We are in the middle of a critical review of our use of Glu from a security standpoint, and two questions have come up:
(1) how the Glu server authenticates itself to the Glu agents, and
(2) how the Glu agents authenticate themselves to ZooKeeper.
For the first question, the team specifically wants to know the steps involved and the type of public/private keys used. The documentation covers key generation but gives few details beyond that.
As for the second question, I don't see anything about it in the documentation.
In addition, the Glu agent makes requests back to some location (a repo or an app) to fetch the Glu script it runs on the host. What is the recommended security practice around this (e.g., authenticating the fetch request)?
The passwords to the keystore and truststore are obfuscated and stored in the config file. Obfuscation is not encryption (anyone who can read the config file can recover the passwords), so what matters is that neither console.keystore (which holds the most important private key) nor the console config file (which contains the obfuscated passwords) is accessible to any other user. In other words, the machine on which the console is installed should be well protected. It is also highly recommended to deploy the console webapp over HTTPS, since users are required to log in (and hence provide their credentials).
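Since this protection comes down to file ownership and permissions, it is easy to check with a small script. The sketch below is POSIX-only and assumes the console runs as a dedicated user who owns both files; it is not part of glu, and any paths you pass in are your own.

```python
import os
import stat
from typing import Iterable, List


def is_private(path: str) -> bool:
    """True if the file is owned by the current user and has no group/other permission bits."""
    st = os.stat(path)
    owned_by_me = st.st_uid == os.getuid()
    # Any read, write or execute bit for group or others lets another user access the file.
    no_shared_bits = (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return owned_by_me and no_shared_bits


def exposed_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that fail is_private (assumes every path exists)."""
    return [p for p in paths if not is_private(p)]
```

Run as the console user, `exposed_files([...])` over the keystore and the console config file should return an empty list; anything it returns needs to be tightened (e.g. `chmod 600`).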
Currently, the agents do not authenticate to ZooKeeper at all. The agents mostly write to ZooKeeper; the only time they actually read from it is to get their configuration at boot time. If you are worried about somebody tampering with ZooKeeper so that the agents read the wrong configuration, you can entirely disable reading configuration from ZooKeeper and store all configuration parameters in the agent config file (which is a lot easier now with easy setup).
As for the agent fetching the script and the app, there is no built-in protection of any kind. I would recommend (but I am no security expert...) that you include a checksum in the URL pointing to your scripts and/or webapps: since this checksum is part of the model, which only an admin can change in glu, you can have the repo check that what is being requested has not been tampered with (which is usually what matters in this case).
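To make the checksum idea concrete, here is a minimal repo-side sketch. It assumes the model stores each script/webapp URL with a hypothetical `sha256` query parameter (e.g. `http://repo.example.com/scripts/MyScript.groovy?sha256=<hex digest>`) and that the repo has already loaded the artifact bytes for the requested path; neither the parameter name nor the repo logic is a glu feature.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qs, urlparse


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest: the value an admin would put in the model URL."""
    return hashlib.sha256(data).hexdigest()


def expected_checksum(url: str) -> Optional[str]:
    """Return the single 'sha256' query value (hypothetical name), or None if missing or repeated."""
    values = parse_qs(urlparse(url).query).get("sha256", [])
    return values[0].strip().lower() if len(values) == 1 else None


def may_serve(url: str, artifact: bytes) -> bool:
    """Serve the artifact only if its digest matches the checksum carried in the URL."""
    expected = expected_checksum(url)
    if expected is None:
        return False  # refuse rather than serve unverified content
    return expected == sha256_hex(artifact)
```

Because the expected value lives in the model, someone who swaps the file on the repo cannot also change the checksum the agent asks for. Note that this checks integrity only; it does not authenticate the agent making the request.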
> ...you can disable entirely reading configuration
> from ZooKeeper and store all configuration parameters in the agent config
> file (which is a lot easier now with easy setup).
Not related to security: if we disable the use of ZooKeeper, where do the agents store their live model and other status information? Also, how does the console know which agents are present and which are not? My understanding is that each agent uses an ephemeral node to indicate its presence.
The agents can function without using ZooKeeper at all (and you can disable ZooKeeper entirely). That being said, without ZooKeeper the rest of glu (console/orchestration engine) will not work, since it relies on ZooKeeper.
The reason the agent can function with or without ZooKeeper is twofold:
1) you can use the agent alone (I know of at least one person who was using the agent by itself as a remote script execution engine, calling the agent REST API directly)
2) theoretically, the orchestration engine could work without ZooKeeper by polling/pinging the agents. This feature was never implemented because nobody has ever requested it (and it would be a lot of work...).
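For illustration only, the polling alternative in point 2 might look roughly like the sketch below: ping each agent, then compare rounds to see which agents appeared or disappeared, which is the information the ephemeral nodes provide today. The `ping` callable and the agent names are hypothetical; glu does not implement any of this.

```python
from typing import Callable, Iterable, Set, Tuple


def poll_agents(agents: Iterable[str], ping: Callable[[str], bool]) -> Set[str]:
    """Return the agents that answered a ping; any exception counts as 'not present'."""
    alive = set()
    for agent in agents:
        try:
            if ping(agent):
                alive.add(agent)
        except Exception:
            pass  # unreachable or failing agent is treated as absent
    return alive


def diff_presence(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (appeared, disappeared) between two polling rounds."""
    return current - previous, previous - current
```

One trade-off: with polling, how quickly a dead agent is noticed depends on the polling interval, whereas with ZooKeeper it depends on the session timeout after which the agent's ephemeral node is removed.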