Yan, I'm wondering if you've ever considered storing the static model for a fabric in zookeeper, rather than just on the console server.
Our topology has several remote datacenters, each of which requires its own glu console so that we can still manage it if we lose the dedicated connection between our office and the datacenter (we would VPN to the datacenter over a different line).
I would like to also have one "master" console that was configured to manage fabrics at all of our remote data centers, each with its own zookeeper ensemble. This obviously does not work quite as well as you might hope because when I deploy a model to the master console, the backup console servers are unaware of the change and vice versa.
It seems to me that you could store the static model for a fabric in zookeeper, and orchestration engines could watch zookeeper for changes in the static model, much like they watch zookeeper for changes in the live model.
I think this is an interesting question. There are several points:
1. Static model in ZooKeeper
* ZooKeeper is not really meant to store big content. I know that the LinkedIn model is ~20MB, which is not appropriate for ZooKeeper (which I think by default limits entries to 1MB).
* a ZooKeeper cluster should not span datacenters, and that seems to be exactly what you would want to do, so I don't think it would work very well.
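To make the size concern concrete, here is a minimal Python sketch that checks whether a serialized model would fit in a single znode. It assumes the model is serialized as JSON and that the ensemble uses ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes (just under 1MB). `fits_in_znode` is a hypothetical helper, not part of glu or ZooKeeper, and the check is only approximate because it ignores request overhead.

```python
import json

# ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1MB.
# Assumption: the ensemble has not overridden this setting.
DEFAULT_ZNODE_LIMIT = 0xFFFFF


def fits_in_znode(model, limit=DEFAULT_ZNODE_LIMIT):
    """Return (fits, size_in_bytes) for a model serialized as UTF-8 JSON.

    Approximate: compares only the payload size against the limit.
    """
    payload = json.dumps(model).encode("utf-8")
    return len(payload) <= limit, len(payload)


if __name__ == "__main__":
    small = {"fabric": "demo", "entries": []}
    # Synthetic stand-in for a large model: 20 entries of about 1MB each.
    large = {"fabric": "demo", "entries": ["x" * 1_000_000] * 20}
    print(fits_in_znode(small))
    print(fits_in_znode(large)[0])
```

A model in the ~20MB range would fail this check by a wide margin, so it would have to be split across many znodes or stored elsewhere.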
2. "Master" glu
I think what you are really asking about is what you call a "master" console. I like the idea a lot. In my mind it should be a piece of software that can talk to the (REST) API of the various consoles: it is not a console per se, but something different. The consoles are not backups; they are the real deal, handling their own "fabrics". The "master" or "central" one simply talks to them (it can have its own storage) and provides an aggregate view of the system. In your example, you would "upload" a model to the "master" and it would propagate it to the appropriate console by invoking that console's "upload model" API.
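A rough Python sketch of that propagation step might look like the following. Everything here is hypothetical: `MasterConsole`, the `Uploader` callable, and the example URL are illustrative names, and the sketch assumes the model names its fabric in a top-level `"fabric"` key. The actual call to a console's "upload model" API is left to the injected function rather than guessed at.

```python
import json
from typing import Callable, Dict

# Hypothetical signature: (console_base_url, fabric, model_json) -> None.
# A real implementation would call the console's "upload model" REST API here.
Uploader = Callable[[str, str, str], None]


class MasterConsole:
    """Routes a model to the console that owns the model's fabric."""

    def __init__(self, consoles: Dict[str, str], upload: Uploader):
        self._consoles = dict(consoles)  # fabric name -> console base URL
        self._upload = upload

    def upload_model(self, model: dict) -> str:
        # Assumption: the model names its fabric in a top-level "fabric" key.
        fabric = model["fabric"]
        if fabric not in self._consoles:
            raise KeyError(f"no console registered for fabric {fabric!r}")
        console_url = self._consoles[fabric]
        self._upload(console_url, fabric, json.dumps(model))
        return console_url


if __name__ == "__main__":
    sent = []
    master = MasterConsole(
        {"dc1-prod": "https://console.dc1.example.com"},
        upload=lambda url, fabric, body: sent.append((url, fabric)),
    )
    master.upload_model({"fabric": "dc1-prod", "entries": []})
    print(sent)
```

Injecting the upload function keeps the routing logic testable without a running console and leaves the choice of HTTP client and authentication open.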
I am not sure if this is really part of glu. It could definitely be built entirely on top of glu so that may be a separate project. I am sure there are some APIs that are missing in glu in order to make it work but we could add the ones that are missing.
Let's see what people think of this. If there is enough demand for it, I would consider looking into it for sure. That seems like a pretty reasonable use case.
We do not have zookeeper ensembles that span datacenters; each cluster has its own zookeeper ensemble since latency has to be very low. I like your idea of a master console that controls the regular consoles, though it seems that it would have largely the same interface as the existing console, only it would pass commands through to the sub-consoles rather than directly to agents.
Perhaps I will look into this -- it would be very nice to, at the least, have a service that could provide the console server and zookeeper ensemble for a given fabric, which would then allow me to proxy commands/queries to the appropriate place.
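As a starting point, such a lookup service could be as small as the sketch below. `FabricLocation` and `FabricDirectory` are hypothetical names, and the hosts in the usage example are placeholders; a real service would load this mapping from configuration or its own storage.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class FabricLocation:
    fabric: str
    console_url: str  # base URL of the console that manages this fabric
    zk_connect: str   # ZooKeeper connect string, e.g. "host1:2181,host2:2181"


class FabricDirectory:
    """Answers: which console and which ZooKeeper ensemble serve a fabric?"""

    def __init__(self, locations: Iterable[FabricLocation]):
        self._by_fabric: Dict[str, FabricLocation] = {}
        for loc in locations:
            # Each fabric lives in exactly one ensemble, so reject duplicates.
            if loc.fabric in self._by_fabric:
                raise ValueError(f"duplicate fabric {loc.fabric!r}")
            self._by_fabric[loc.fabric] = loc

    def locate(self, fabric: str) -> FabricLocation:
        # Raises KeyError for a fabric that was never registered.
        return self._by_fabric[fabric]


if __name__ == "__main__":
    directory = FabricDirectory([
        FabricLocation("dc1-prod", "https://console.dc1.example.com",
                       "zk1.dc1.example.com:2181,zk2.dc1.example.com:2181"),
    ])
    print(directory.locate("dc1-prod").zk_connect)
```

A proxy can then call `locate(fabric)` once per request and forward the command or query to the returned console.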
We too like this idea, given our environment of multiple data centers. However, I'd like to know more about the thinking behind having separate ZooKeeper ensembles for separate fabrics. That makes sense given that most fabrics are environmentally isolated, but does it work well in this situation?
Due to a ZooKeeper limitation (not glu), it is not recommended to have ZooKeeper clusters spanning multiple data centers. Since one fabric is defined in a ZooKeeper cluster, a fabric should not span data centers and in general represents a subset (of a data center).