Detected recoverable error while talking to the agent


Detected recoverable error while talking to the agent

hansg01
I have been successfully implementing glu deployment, but there is one issue: the deployment sits idle for 15 minutes. Once this fixed time span is over, the script starts executing and completes the deployment. The total deployment time is therefore 15 minutes longer than usual, which is the main problem. The agent logs nothing, while the console logs:

WARN [RecoverableAgent] #0: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:15.967 WARN [RecoverableAgent] #0: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:24.980 WARN [RecoverableAgent] #1: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:24.982 WARN [RecoverableAgent] #1: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:33.993 WARN [RecoverableAgent] #2: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:33.994 WARN [RecoverableAgent] #2: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:43.007 WARN [RecoverableAgent] #3: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:43.009 WARN [RecoverableAgent] #3: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:52.021 WARN [RecoverableAgent] #4: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:52.022 WARN [RecoverableAgent] #4: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:01.033 WARN [RecoverableAgent] #5: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:01.034 WARN [RecoverableAgent] #5: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:10.046 WARN [RecoverableAgent] #6: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:10.047 WARN [RecoverableAgent] #6: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:19.058 WARN [RecoverableAgent] #7: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:19.060 WARN [RecoverableAgent] #7: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:28.069 WARN [RecoverableAgent] #8: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:28.070 WARN [RecoverableAgent] #8: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:37.083 WARN [RecoverableAgent] #9: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:24:37.084 WARN [RecoverableAgent] #9: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:29:30.298 INFO [AbstractValidatingSessionManager] Validating all active sessions...
2014/08/29 02:29:30.298 INFO [AbstractValidatingSessionManager] Finished session validation.  No sessions were stopped.
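
The spacing of the retries can be read off the timestamps in the console log above. As a minimal sketch in Python (not part of glu; `retry_gaps` and the line pattern are hypothetical helpers that assume the console log format shown above), the following computes the gap between consecutive retry attempts:

```python
import re
from datetime import datetime

# Assumed line shape, taken from the console log above:
#   2014/08/29 02:23:15.967 WARN [RecoverableAgent] #0: detected recoverable error ...
LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"WARN \[RecoverableAgent\] #(?P<attempt>\d+):"
)


def retry_gaps(log_text):
    """Return the seconds elapsed between consecutive retry attempts."""
    first_seen = {}
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            attempt = int(match.group("attempt"))
            ts = datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S.%f")
            # Each attempt is logged twice; keep only the first timestamp.
            first_seen.setdefault(attempt, ts)
    attempts = sorted(first_seen)
    return [
        (first_seen[b] - first_seen[a]).total_seconds()
        for a, b in zip(attempts, attempts[1:])
    ]


# Three lines copied from the console log above.
SAMPLE = """
2014/08/29 02:23:15.967 WARN [RecoverableAgent] #0: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:24.980 WARN [RecoverableAgent] #1: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
2014/08/29 02:23:33.993 WARN [RecoverableAgent] #2: detected recoverable error while talking to the agent [ignored]: Communication Error (1001) - The connector failed to complete the communication with the server
"""

if __name__ == "__main__":
    print(retry_gaps(SAMPLE))
```

In the log above the attempts are about 9 seconds apart, so the ten retries (#0 to #9) span roughly 81 seconds of the idle period.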

Re: Detected recoverable error while talking to the agent

frenchyan
Administrator
You get this error (RecoverableAgent communication error) whenever the console tries to talk to an agent and the call times out. It is unclear why this is happening in your specific scenario, but there are several possible causes:

* (congested) network traffic between the console and the agent... check with your network administrator
* big deployments (many agents at the same time): from a glu point of view this is not a big deal, but it may put a heavy load on your infrastructure. For example, if the install phase of your script downloads packages/tarballs from a single download server and many agents do this at the same time, it could saturate your network pipes and/or the download server
* I have also seen cases where the script installs and deploys a Java application that is really heavy at boot time (loading caches, lots of gc, etc.); since it runs on the same host/node as the agent, it may impact the agent as well


As suggestions:
* try a serial deployment instead of a parallel one and see if that changes anything
* ask your network administrator to monitor the network traffic while you are doing a deployment
* check the agent log file and gc log files
* monitor the load and network activity on an agent host to see if something funky is happening

Yan

Re: Detected recoverable error while talking to the agent

hansg01
Below are the gc logs from one of the agents:

2014-09-01T22:42:29.573-0700: 1058.690: [GC 163630K->44582K(221184K), 0.0417030 secs]
2014-09-01T22:42:40.275-0700: 1069.392: [GC 175654K->82562K(351744K), 0.0929390 secs]
2014-09-01T22:42:50.945-0700: 1080.062: [GC 328834K->106786K(367616K), 0.1127800 secs]
2014-09-01T22:42:51.150-0700: 1080.267: [Full GC 106786K->88595K(422400K), 0.5060740 secs]
2014-09-01T22:43:11.518-0700: 1100.635: [GC 334867K->111417K(402432K), 0.0953060 secs]

They don't look good.

Re: Detected recoverable error while talking to the agent

frenchyan
Administrator
Can you please explain what makes you say that they don't look good?
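
One way to put numbers on the excerpt is to total the pause times. The following is a minimal sketch in Python (not part of glu; `summarize_gc` is a hypothetical helper, and the pattern assumes only the simple entry format shown in the excerpt, not every JVM GC log format):

```python
import re

# Assumed entry shape, taken from the excerpt:
#   2014-09-01T22:42:29.573-0700: 1058.690: [GC 163630K->44582K(221184K), 0.0417030 secs]
GC_ENTRY = re.compile(
    r"(?P<uptime>\d+\.\d+): \[(?P<kind>Full GC|GC) "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<pause>\d+\.\d+) secs\]"
)


def summarize_gc(log_text):
    """Summarize collection count and pause time for a GC log excerpt."""
    entries = [m.groupdict() for m in GC_ENTRY.finditer(log_text)]
    pauses = [float(e["pause"]) for e in entries]
    uptimes = [float(e["uptime"]) for e in entries]
    # JVM uptime covered by the excerpt, first entry to last entry.
    window = uptimes[-1] - uptimes[0] if len(uptimes) > 1 else 0.0
    return {
        "collections": len(entries),
        "full_collections": sum(1 for e in entries if e["kind"] == "Full GC"),
        "total_pause_secs": sum(pauses),
        "max_pause_secs": max(pauses, default=0.0),
        "window_secs": window,
    }


# The five entries posted in this thread.
SAMPLE = (
    "2014-09-01T22:42:29.573-0700: 1058.690: [GC 163630K->44582K(221184K), 0.0417030 secs] "
    "2014-09-01T22:42:40.275-0700: 1069.392: [GC 175654K->82562K(351744K), 0.0929390 secs] "
    "2014-09-01T22:42:50.945-0700: 1080.062: [GC 328834K->106786K(367616K), 0.1127800 secs] "
    "2014-09-01T22:42:51.150-0700: 1080.267: [Full GC 106786K->88595K(422400K), 0.5060740 secs] "
    "2014-09-01T22:43:11.518-0700: 1100.635: [GC 334867K->111417K(402432K), 0.0953060 secs]"
)

if __name__ == "__main__":
    for key, value in summarize_gc(SAMPLE).items():
        print(key, value)
```

For the five entries posted, this gives about 0.85 seconds of total pause over roughly 42 seconds of JVM uptime, with the single full GC (about 0.5 seconds) being the longest pause.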

Yan