Last set of unknown questions

oldgluuser
Thank you for all the answers thus far.  We are having a review very soon to decide on a technology choice, so it is good to have the right information!

Two more questions (these will be my last!)

1. When ZooKeeper is upgraded, how is that managed? In other words, how does one upgrade ZooKeeper?

2. What happens if an agent or the console goes down? What monitors the agents, and what monitors the console?

Thanks for any insight!

Re: Last set of unknown questions

sodul
1. I never bothered with that. Yan upgraded ZooKeeper with the 4.7 release, but this is quite rare. I guess you would do it manually if you ever had to.

2. If an agent goes down, nothing really monitors that. The console will show the agent as missing in the dashboard if that agent is listed in the current model. The nice thing is that if the agent goes down and you restart it, it comes back in exactly the same state as before. There is a gotcha, however: if your machine rebooted, the deployed apps will still be marked as Started. You can extend the monitoring step in the Groovy script to correct that.
If the console goes down, nothing happens to the agents. Restart the console and everything will be back in the state it was in.
As for monitoring the console and agents, your Ops teams should have monitoring solutions in place. We use Scout and Splunk.

Re: Last set of unknown questions

frenchyan
Administrator
In reply to this post by oldgluuser
1. Upgrading ZooKeeper simply follows the upgrade procedure from the Apache ZooKeeper web site and is done manually (I am working on a glu setup and may have something more automatic, but I cannot promise it yet :). The good news is that you usually have a cluster of 3 ZooKeepers, so you do a rolling upgrade, one server at a time, and the cluster keeps working throughout: no glu downtime is required. The upgrade itself is usually: stop ZooKeeper, copy the new jar file, restart ZooKeeper...
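The reason a 3-server ensemble survives a one-at-a-time upgrade is quorum: ZooKeeper keeps serving as long as a strict majority of its servers is up. Here is a minimal, simulated sketch of that rolling order in Python. The `ZkServer` class, the `rolling_upgrade` function, and the version labels are hypothetical; "stop", "copy the jar", and "restart" are only simulated by flipping fields, not real ZooKeeper or glu calls.

```python
from dataclasses import dataclass


@dataclass
class ZkServer:
    """Hypothetical, simulated ZooKeeper server: only what the upgrade plan needs."""
    name: str
    version: str
    up: bool = True


def has_quorum_without(ensemble, excluded):
    """True if a strict majority of the ensemble is still up with `excluded` stopped."""
    up = sum(1 for s in ensemble if s is not excluded and s.up)
    return up > len(ensemble) // 2


def rolling_upgrade(ensemble, new_version):
    """Upgrade one server at a time, refusing any step that would lose quorum."""
    log = []
    for server in ensemble:
        if server.version == new_version:
            continue  # already upgraded
        if not has_quorum_without(ensemble, server):
            raise RuntimeError(f"stopping {server.name} would lose quorum")
        server.up = False             # 1. stop ZooKeeper (simulated)
        log.append(f"stopped {server.name}")
        server.version = new_version  # 2. copy the new jar file (simulated)
        server.up = True              # 3. restart ZooKeeper (simulated)
        # A real script would wait here until the server has rejoined the ensemble.
        log.append(f"restarted {server.name} on {new_version}")
    return log
```

With 3 servers, only one may be down at any moment, which is exactly the "one at a time" rule above; if a server is already down for another reason, the sketch refuses to continue rather than take the cluster below quorum.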

2. If the agent goes down for any reason, the console will detect it and show it as a missing agent. So it is not monitoring in the sense that you get an alert somewhere, but it does appear as a problem in the console. Note that you can write a few lines of code that listen to ZooKeeper to detect an agent failure and react accordingly (see my post: http://www.pongasoft.com/blog/yan/glu/2011/03/18/building-monitoring-solution-with-glu/)
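The "few lines of code" idea boils down to comparing the agents you expect (for example, those listed in the current model) with the agents currently alive in ZooKeeper, and alerting on changes. The following Python sketch shows only that comparison logic. It assumes you already receive the list of live agent names from a ZooKeeper child watch; the ZooKeeper path and client wiring are deliberately left out, and `MissingAgentWatcher`, `diff_agents`, and the agent names are hypothetical, not glu's API.

```python
def diff_agents(expected, live):
    """Return (missing, unexpected) agent names, sorted for stable output."""
    missing = sorted(set(expected) - set(live))
    unexpected = sorted(set(live) - set(expected))
    return missing, unexpected


class MissingAgentWatcher:
    """Alerts once when an expected agent disappears and once when it comes back."""

    def __init__(self, expected_agents, alert):
        self.expected = set(expected_agents)
        self.alert = alert    # any callable, e.g. something that sends an email or a page
        self.missing = set()  # agents already reported as missing

    def on_children_changed(self, live_agents):
        """Call this with the current list of live agent names on every change."""
        missing, _ = diff_agents(self.expected, live_agents)
        for agent in missing:
            if agent not in self.missing:
                self.alert(f"agent {agent} is missing")
        for agent in sorted(self.missing - set(missing)):
            self.alert(f"agent {agent} is back")
        self.missing = set(missing)
```

With a Python ZooKeeper client such as kazoo, `on_children_changed` could plausibly be registered as a `ChildrenWatch` callback on whatever node the agents register under; that wiring is an assumption and is not shown here.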

As Stephane pointed out, if an agent goes down (agents have been designed to be pretty robust, so they rarely go down), it restarts in the same state it was in when it went down. This allows upgrading the agent (a nice feature of glu) while the system is up and running, without impacting the apps deployed by glu. If the agent goes down because the machine it is running on goes down, then when the machine comes back up, the agent will be up in the same state it was in, so it will believe that the apps that were started are still up. However, if you use the same trick (timers + monitor) that the JettyGluScript uses to monitor the apps glu starts, it is very easy to detect and report the failure.
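The JettyGluScript code is not shown in this thread, so here is a language-neutral sketch of the timers + monitor idea in Python, not glu's Groovy API. After a reboot the agent restores the app as "running"; a periodic monitor tick checks whether the process actually exists and, if not, flips the state and records an error. `MonitoredApp`, `run_timer`, the state names, and the `is_alive` check are all hypothetical; on POSIX a real `is_alive` might use `os.kill(pid, 0)`, which raises `ProcessLookupError` when no such process exists.

```python
import time


class MonitoredApp:
    """Hypothetical app whose 'running' state was restored from the agent's saved state."""

    def __init__(self, name, pid, is_alive):
        self.name = name
        self.pid = pid
        self.is_alive = is_alive  # callable(pid) -> bool; an assumption, see above
        self.state = "running"    # what the agent believes after restarting
        self.error = None

    def monitor(self):
        """One monitor tick: detect an app that is marked running but is gone."""
        if self.state == "running" and not self.is_alive(self.pid):
            self.state = "stopped"
            self.error = f"{self.name} (pid {self.pid}) is not running"
        return self.state


def run_timer(app, interval_s, ticks, sleep=time.sleep):
    """Call app.monitor() every interval_s seconds, `ticks` times (stand-in for a timer)."""
    states = []
    for _ in range(ticks):
        states.append(app.monitor())
        sleep(interval_s)
    return states
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting; in a long-running agent the timer would simply keep ticking for as long as the app is supposed to be running.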

As for the console, if it goes down, you simply restart it. As pointed out many times in the documentation, we strongly encourage you not to use the database that comes with glu. Nothing built into glu monitors the console.

Yan