wait for state does not wake up on forceChangeState
I have a deploy script that monitors the state of a running application and can change the state of the mountpoint when the application is not running. It is based on the monitor in the JettyGluScript and uses stateManager.forceChangeState to change the state.
I want to monitor when the application terminates, so I have another program that waits for the state to change using the agent REST interface:
However, it seems that the call does not return until the timeout has expired, even if the state changes much earlier.
Looking at the source code of org.linkedin.groovy.util.state.StateMachineImpl, I can see that the forceChangeState method does not call notifyAll at the end. Should it not do this to wake up anyone waiting for state changes?
The method executeAction does call notifyAll, and I assume this is the reason that waiting for a state change during deployment responds much faster.
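To illustrate what I mean, here is a minimal Java sketch of the wait/notifyAll pattern. It is not the actual StateMachineImpl code: the class StateMachineSketch and all of its method names are hypothetical, and the 100 ms delay and 1 s timeout are arbitrary values chosen for the demonstration. It only shows that a waiter blocked in wait() is not woken by a state change unless notifyAll is called, so it sleeps until its timeout expires.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a state machine; NOT the real glu implementation.
public class StateMachineSketch {
    private String currentState = "running";

    // Blocks until currentState equals 'state' or the timeout elapses.
    public synchronized boolean waitForState(String state, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!currentState.equals(state)) {
            long remainingMillis =
                TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false; // timed out without seeing the wanted state
            }
            wait(remainingMillis); // sleeps until notified or until the timeout
        }
        return true;
    }

    // Changes the state but wakes nobody: waiters only notice at their timeout.
    public synchronized void forceChangeStateWithoutNotify(String newState) {
        currentState = newState;
    }

    // Changes the state and wakes every thread blocked in waitForState.
    public synchronized void forceChangeStateWithNotify(String newState) {
        currentState = newState;
        notifyAll();
    }

    // Measures how long a waiter blocks when the state changes after 100 ms.
    public static long measureWaitMillis(boolean notifyWaiters) {
        StateMachineSketch sm = new StateMachineSketch();
        Thread changer = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (notifyWaiters) {
                sm.forceChangeStateWithNotify("stopped");
            } else {
                sm.forceChangeStateWithoutNotify("stopped");
            }
        });
        long start = System.nanoTime();
        changer.start();
        try {
            sm.waitForState("stopped", 1000);
            changer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("with notifyAll:    " + measureWaitMillis(true) + " ms");
        System.out.println("without notifyAll: " + measureWaitMillis(false) + " ms");
    }
}
```

In this sketch, the run with notifyAll should return shortly after the 100 ms state change, while the run without it should block for roughly the full 1 s timeout, which matches the behaviour I am seeing.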
How critical is this issue? I imagine that a "quick" workaround for now is to reduce the timeout. In a way, reducing the timeout is not a bad thing: I am not sure I would recommend blocking for 2 minutes at a time, since in the end you are consuming a resource on the agent (TCP connection, etc.). The console, for example, "loops" every 10s.
Note that to achieve what you want, I would actually recommend a different approach: simply connect to ZooKeeper and listen to its traffic. That way you do not tie up a resource on your agent, you get notified right away, and you can listen to many agents/mountPoints with only one connection to your ZooKeeper cluster, as opposed to 1 connection per agent per mountPoint. A blog post I wrote a while ago, http://www.pongasoft.com/blog/yan/glu/2011/03/18/building-monitoring-solution-with-glu/, gives some examples of how to monitor ZooKeeper (in the "Collecting and processing the monitoring data" section).
Note that I am not trying to evade the issue. It is a bug and I will fix it. I am just providing feedback.
Re: wait for state does not wake up on forceChangeState
Thanks for the quick reply.
The issue is not that critical right now. I just wanted to confirm the behaviour I was seeing, and report it in case anyone else stumbled into it.
I was thinking along the same lines as the workaround you suggest: waiting with a shorter timeout in a loop and then checking the returned reply to determine when the process is actually done (I can also use this loop to extract some progress information in the waiting program). I think we will try that first, to see if it is good enough for our purpose.
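Roughly, the loop I have in mind looks like the Java sketch below. The class ShortTimeoutPoller and its waitUntil method are hypothetical names, and the waitCall parameter is only a placeholder for the real agent REST request: it is assumed to block for at most the given number of milliseconds and return the state it observed. The actual request and reply parsing are left out.

```java
import java.util.function.LongFunction;

// Hypothetical helper; waitCall stands in for the real agent REST request.
public class ShortTimeoutPoller {

    // Waits in short slices until wantedState is observed or overallMillis elapses.
    // waitCall receives a timeout in milliseconds and returns the observed state.
    public static boolean waitUntil(String wantedState,
                                    LongFunction<String> waitCall,
                                    long sliceMillis,
                                    long overallMillis) {
        long deadline = System.nanoTime() + overallMillis * 1_000_000L;
        while (true) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false; // overall deadline passed
            }
            // Never block longer than one slice, nor past the overall deadline.
            String state = waitCall.apply(Math.min(sliceMillis, remainingMillis));
            if (wantedState.equals(state)) {
                return true;
            }
            // Progress information could be extracted from the reply here.
        }
    }
}
```

With a slice of about 10s (the interval the console is said to use), the worst-case delay before noticing a forced state change is one slice rather than the full timeout.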
Thanks also for the point about tying up resources on the agent and for the link to the blog post.