We have a bunch of agents scattered across different hosts, and they are working fine. I ran into strange behavior when the glu console tries to run ps (from the console UI) against one agent.
When I ps this particular agent, the console throws the exception below, even though the agent shows up in the console and the agent logs say it connected to ZooKeeper successfully.
After some troubleshooting, I replaced the non-working agent with a working one (from a different host), but had no luck.
The only addition to the generated distribution is that the agents are started with additional Java SSL keystore parameters to satisfy a security requirement for the front applications.
Any advice or suggestions? Thanks!!
2015/09/21 21:54:43.525 ERROR [GrailsExceptionResolver] RecoverableAgentException occurred when processing request: [GET] /console/agents/ps/xxxx-hostname
Communication Error (1001) - The connector failed to complete the communication with the server. Stacktrace follows:
org.linkedin.glu.agent.rest.client.RecoverableAgentException: Communication Error (1001) - The connector failed to complete the communication with the server
Sep 21, 2015 10:13:11 PM org.restlet.resource.ClientResource retry
INFO: A recoverable error was detected (1001), attempting again in 2000 ms.
Sep 21, 2015 10:13:13 PM org.restlet.ext.httpclient.internal.HttpMethodCall sendRequest
WARNING: An error occurred during the communication with the remote HTTP server.
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
Does it fail only when using ps? What happens if you try to invoke a command (for example, uptime)? Do you see the same issue?
The fact that the agent can talk to ZooKeeper and shows up in the console does not mean that the console can talk to the agent: I assume that ZooKeeper is not hosted on the same machine as the console so you may have a connectivity issue between the console and the agent host (but not between the agent host and ZooKeeper host).
In the past, when I have seen this problem, it was a network connection issue between the console and the agent. You may want to talk to a sysadmin and see if they can help.
I get failures on all commands from the console for that particular agent. ZooKeeper and the console are on the same host, and curl and telnet both work when I hit the agent's host and port from the console host. It could be the network; is there any way to track down the connectivity issue?
I do see the warnings below in the ZooKeeper logs.
2015-09-21 22:10:42,787 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14fe1d8d7af000b, likely client has closed socket
I would suggest trying the agent-cli.sh command: it lets you talk to the agent directly using the same REST API that the console uses. You can run this command on the console host, talking to the agent on the agent host. You can also run it from a different host to see if the result is the same.
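Since curl and telnet already reach the agent port, and the console log shows javax.net.ssl.SSLPeerUnverifiedException (which Java raises when the peer's identity has not been verified), it may also be worth checking the TCP and TLS layers separately. Below is a minimal sketch in Python, not part of glu: the function name probe_agent and its parameters are made up for illustration, and it assumes the agent expects a TLS connection and possibly a client certificate. It only reports which stage fails; it does not call the agent REST API.

```python
import socket
import ssl


def probe_agent(host, port, timeout=5.0, ca_file=None, cert_file=None, key_file=None):
    """Return (stage, detail), where stage is 'tcp', 'tls', or 'ok'.

    Hypothetical helper for illustration only; it is not part of glu.
    """
    # Trust the given CA bundle (PEM) if provided, else the system defaults.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Assumption: the agent certificate may not match the hostname, so only
    # the certificate chain is verified here, not the name.
    ctx.check_hostname = False
    if cert_file:
        # Assumption: the agent may require a client certificate (PEM).
        ctx.load_cert_chain(cert_file, key_file)

    # Stage 1: plain TCP connect (roughly what telnet checks).
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", str(exc))

    # Stage 2: TLS handshake on top of the established connection.
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return ("ok", tls.version())
    except (ssl.SSLError, OSError) as exc:
        return ("tls", str(exc))
    finally:
        raw.close()
```

Running something like probe_agent("xxxx-hostname", <agent port>, ca_file=..., cert_file=..., key_file=...) from the console host and again from another host would show whether the failure is at the network level ('tcp') or in the certificate exchange ('tls'). A 'tls' result on only the failing agent would point at the keystore or truststore used on that host rather than at the network.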