Strange 'Connection reset by peer' errors

oldgluuser
Hi Yan,

I have upgraded to 5.2.0 and just about everything looks good. However, a couple of times now I have noticed one or two of the agents losing control, with their logs flooded with messages like:

2013/09/20 15:18:24.633 INFO [ClientCnxn] Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2013/09/20 15:18:24.633 INFO [ClientCnxn] Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
2013/09/20 15:18:24.633 INFO [ClientCnxn] Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2013/09/20 15:18:24.683 INFO [ClientCnxn] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2013/09/20 15:18:24.683 INFO [ClientCnxn] Socket connection established to localhost/127.0.0.1:2181, initiating session
2013/09/20 15:18:24.684 WARN [ClientCnxn] Session 0x0 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

I suspect I am doing something incorrectly. Do you have any ideas? Also, I need to stop and restart the Agent process to recover. Is there any way to restart it automatically after, say, 50 or 100 of these?
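
One hypothetical way to get the "restart after 50 or 100 of these" behaviour is a small watchdog that counts consecutive resets in the agent log and reports when a threshold is reached. This is only a sketch under stated assumptions: the class name ResetWatchdog and the threshold are made up for illustration, the log lines are assumed to be fed to it one at a time, and treating ZooKeeper's "Session establishment complete" message as the sign that the client has recovered is an assumption, not something glu documents. Actually restarting the agent (by hand or from whatever process supervisor you use) is deliberately left out.

```java
import java.util.List;

/**
 * Hypothetical watchdog: counts consecutive "Connection reset by peer" errors
 * seen in agent log lines and reports when a restart threshold is reached.
 */
public class ResetWatchdog {
    private static final String RESET_MARKER = "Connection reset by peer";
    // Assumption: this ZooKeeper client message means the session is healthy again.
    private static final String RECOVERED_MARKER = "Session establishment complete";

    private final int threshold;
    private int consecutiveResets = 0;

    public ResetWatchdog(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /** Feeds one log line; returns true once the threshold has been reached. */
    public boolean observe(String line) {
        if (line.contains(RESET_MARKER)) {
            consecutiveResets++;
        } else if (line.contains(RECOVERED_MARKER)) {
            consecutiveResets = 0; // the client reconnected, so start counting again
        }
        return consecutiveResets >= threshold;
    }

    public int consecutiveResets() {
        return consecutiveResets;
    }

    public static void main(String[] args) {
        ResetWatchdog watchdog = new ResetWatchdog(3);
        List<String> lines = List.of(
            "java.io.IOException: Connection reset by peer",
            "java.io.IOException: Connection reset by peer",
            "INFO [ClientCnxn] Session establishment complete on server ...",
            "java.io.IOException: Connection reset by peer",
            "java.io.IOException: Connection reset by peer",
            "java.io.IOException: Connection reset by peer");
        for (String line : lines) {
            if (watchdog.observe(line)) {
                // A real setup would trigger the (hypothetical) restart action here.
                System.out.println("restart threshold reached");
                break;
            }
        }
    }
}
```

Counting consecutive errors rather than a running total means a single reconnect hiccup that recovers on its own never accumulates toward a restart.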

Re: Strange 'Connection reset by peer' errors

sodul
Yan would be better at troubleshooting this than I am, but it looks like the agent is trying to connect to ZooKeeper on localhost. Is ZooKeeper on the same machine as the agent? If so, do you set your ZooKeeper connection string to localhost:2181 or to 127.0.0.1:2181?

Re: Strange 'Connection reset by peer' errors

frenchyan
Administrator
In reply to this post by oldgluuser
In a real setup, the ZooKeeper connection string should not point at localhost/127.0.0.1. If you have multiple agents on different machines, it obviously cannot work, because each agent would look for ZooKeeper on its own machine. Are you sure you set it up properly in the meta model?
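
To catch this kind of misconfiguration early, one option is to check the connect string an agent is given. The sketch below is a hypothetical helper (ConnectStringCheck is not a glu or ZooKeeper class) that assumes the usual ZooKeeper format host:port[,host:port...][/chroot] and reports which entries point at the local machine. How glu stores the string in the meta model is not shown here. Host names other than localhost are deliberately not resolved, so it only flags literal loopback entries.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: flags loopback hosts in a ZooKeeper-style connect string. */
public class ConnectStringCheck {

    /** Returns the hosts in "host:port[,host:port...][/chroot]" that refer to the local machine. */
    public static List<String> loopbackHosts(String connectString) {
        // Drop an optional chroot suffix such as "/glu"; IP addresses never contain '/'.
        int slash = connectString.indexOf('/');
        String servers = slash >= 0 ? connectString.substring(0, slash) : connectString;

        List<String> result = new ArrayList<>();
        for (String server : servers.split(",")) {
            String hostPort = server.trim();
            // The port follows the last ':', so IPv6 literals such as 0:0:0:0:0:0:0:1 stay intact.
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            if (isLoopback(host)) {
                result.add(host);
            }
        }
        return result;
    }

    private static boolean isLoopback(String host) {
        if (host.equalsIgnoreCase("localhost")) {
            return true;
        }
        // Only strings that look like IPv4/IPv6 literals are passed to InetAddress;
        // ordinary host names are never resolved.
        if (host.matches("[0-9.]+") || host.contains(":")) {
            try {
                return InetAddress.getByName(host).isLoopbackAddress();
            } catch (UnknownHostException e) {
                return false; // not a valid address literal
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(loopbackHosts("localhost:2181"));                          // prints [localhost]
        System.out.println(loopbackHosts("zk1.example.com:2181,127.0.0.1:2181/glu")); // prints [127.0.0.1]
        System.out.println(loopbackHosts("zk1.example.com:2181,zk2.example.com:2181")); // prints []
    }
}
```

The example host names zk1.example.com and zk2.example.com are placeholders. A non-empty result on an agent that is not co-located with ZooKeeper would point to the problem described above.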

Feel free to post your meta model so that I can take a look at it. You can also use the -J option to view the fully expanded meta model: http://pongasoft.github.io/glu/docs/latest/html/setup-tool.html#showing-the-json-model-j

Yan


(Replying to joeg's post of Fri, Sep 20, 2013 at 9:24 AM, shown at the top of this thread.)

Re: Strange 'Connection reset by peer' errors

oldgluuser
No mystery here: I was setting it to 'localhost:2181'. Let me change that and see if it helps! FYI, this was my development instance, with only one ZooKeeper running and everything on the same host, so I thought that would be OK.

Re: Strange 'Connection reset by peer' errors

sodul
If everything is on the same host, then localhost should be OK. That said, I've had issues with other Java apps when mixing localhost and IPv6 (localhost can resolve to the IPv6 loopback ::1 as well as to 127.0.0.1, and your log shows the client trying both), so you might still want to set the IP address explicitly. Your ZooKeeper string would simply be '127.0.0.1:2181'.
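
The log in the first post shows the client trying localhost/0:0:0:0:0:0:0:1 (the IPv6 loopback ::1) before localhost/127.0.0.1, which is consistent with localhost resolving to both addresses. A quick way to check what localhost resolves to on a given machine is to ask the JVM directly. This minimal sketch (LocalhostResolution is a hypothetical name) prints every address for localhost and for the literal 127.0.0.1. The result for localhost depends on the OS resolver configuration (for example /etc/hosts) and JVM settings, whereas an IP literal is parsed without a lookup and yields exactly one IPv4 address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

public class LocalhostResolution {

    /** Lists every address a host name resolves to, in the order the JVM returns them. */
    static String[] resolve(String host) throws UnknownHostException {
        return Arrays.stream(InetAddress.getAllByName(host))
                .map(InetAddress::getHostAddress) // e.g. "0:0:0:0:0:0:0:1" for ::1
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Machine-dependent: may list the IPv6 loopback as well as 127.0.0.1.
        System.out.println("localhost -> " + Arrays.toString(resolve("localhost")));
        // An IP literal is parsed directly, so this always prints "127.0.0.1 -> [127.0.0.1]".
        System.out.println("127.0.0.1 -> " + Arrays.toString(resolve("127.0.0.1")));
    }
}
```

If the first line lists more than one address, that matches the pattern in the log, and using '127.0.0.1:2181' removes the ambiguity.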