Help with the direction for glu


frenchyan
Administrator
Hi guys

I would like to get some help and feedback deciding on what to work on next for glu. It is very unclear to me how much/little glu is actually being used in the real world. Besides this forum, the only other way I can track is how many times glu is downloaded (on bintray).

So the best way is to ask the users (you) what they would want to see next. There is a list of open tickets on GitHub (the ones marked "upcoming" are actually already implemented and awaiting the next release :).

I can look at the list and pick something that sounds interesting to me but the ROI may not be worth it. For example I spent probably 2 months on glu-58 (the "easy" glu setup) and although I believe it is so much better and easier than it used to be, I have gotten pretty much 0 feedback on it. So in the end I am not sure that it helped glu in any way.

So I am asking if you can take a look at the list of tickets and tell me which ones are the most important to you. And also if there is anything else which is currently not captured in a ticket (in which case it needs to be added).

Please make your voice heard, this is your chance to shape the future and direction of glu!

Thanks
Yan

Re: Help with the direction for glu

sodul
Here are my votes, in order of importance to us (more to less):
- Make URLs copy/paste friendly: https://github.com/pongasoft/glu/issues/153: very needed for collaboration and improved bug reports of our apps.
- Add "tail -f" for log files: https://github.com/pongasoft/glu/issues/187: the alternative is to ssh to a host, tailing/streaming the logs in Glu as they get written would reduce our dependency on ssh.
- Failed deployments show the head of the groovy script output, not tail: https://github.com/pongasoft/glu/issues/173

My understanding is that 153 and 187 are non-trivial to implement, but adding them would probably increase glu adoption as they would reduce the need to rely on ssh access.

I've added a new one which I've implemented in a project similar to Glu at Netflix:
- add regex filter option when tailing remote logs: https://github.com/pongasoft/glu/issues/244


I'll try to work on 174, 202 and 233 if I can find some spare time.

Re: Help with the direction for glu

yoron
In reply to this post by frenchyan
Hi Yan,

One feature I didn't find in the list, and which I think would be a good improvement, is better error reporting when a plan execution fails. Currently we only get the x-glu-completion header when GETting an execution status; it would be very helpful to have an actual message in the response that describes what went wrong when a deployment fails.

Thanks again, Glu is a great product!

Re: Help with the direction for glu

sodul
Yoron, what you want might be related to 173 and 174.

Re: Help with the direction for glu

Lucas McGrew
In reply to this post by frenchyan
Yan, we are still actively using glu to manage over a dozen fabrics.  I think our top priority feature request is some kind of master glu console.  Because our clusters are remotely located in different data centers, each one has its own zookeeper ensemble and glu console.  It would be really nice if I could run a single console in our local datacenter that could control the remote fabrics and then only rely on the remote consoles as a failover if a data link between the "master" console and the remote console were to fail.

Re: Help with the direction for glu

sodul
Lucas ... any reason you cannot put all your fabrics on a single console? You would keep the zookeeper instances where they are and use a single console for all the DCs. As long as that single console can reach the zookeeper instances and the agents (not the other way around) everything should work well.

I've used Glu with agents in EC2 in multiple regions with a console in our local datacenter in the past. We are currently setting up more data centers for our product and the plan is to have one console for all of them.

If you are concerned about local control in case of an outage, you can set up live replication of your console (+database) to another datacenter (I actually recommend doing that).

Re: Help with the direction for glu

frenchyan
Administrator
In reply to this post by Lucas McGrew
Regarding the "master" console, are you talking about a) cross-fabric orchestration, or b) simply the ability to "easily" switch from one console to the next? For b), imagine a "master" frame at the top (or bottom) of the screen listing all fabrics, with the rest of the page showing a given fabric from a given console (the master frame would know which fabric is located on which console). Whenever you click a fabric in the master area (or select it in a drop-down), the other part of the page is simply rendered by pointing to the right console. Would that work?

I could even imagine implementing b) with a chrome extension or something like this...

Yan

Re: Help with the direction for glu

sodul
In reply to this post by Lucas McGrew
Lucas, I forgot to mention: we control our Glu fabrics on multiple Glu consoles from a single Jenkins master. We do not get the 'glu dashboard' since the Jenkins UI does not provide this feature, but thanks to Jenkins' more advanced access control management we can easily give full redeployment control of the QA environment to anyone with an LDAP account, more limited access to our staging environments and very limited access to our production environments. This could be an alternative for you to unify control of the deployments from a single server. You will have to write your own equivalent of glu_model.py though.

I've attached a sample log output from such a Jenkins job (slightly sanitized, mostly build names):
00:00:00.000 Started by user stephane
00:00:00.016 [EnvInject] - Loading node environment variables.
00:00:00.035 Building remotely on slave5 in workspace /home/builder/workspace/deploy-stable-to-QaStable
00:00:00.489 Cleaning up /home/builder/workspace/deploy-stable-to-QaStable/glu_model
00:00:00.500 Updating https://code/glu_model at revision '2013-09-02T09:54:22.975 +0000'
00:00:00.654 At revision 10232
00:00:01.174 Cleaning up /home/builder/workspace/deploy-stable-to-QaStable/gluscripts
00:00:01.186 Deleting /home/builder/workspace/deploy-stable-to-QaStable/gluscripts/glu.pyc
00:00:01.187 Updating https://code/gluscripts at revision '2013-09-02T09:54:22.975 +0000'
00:00:01.338 At revision 10232
00:00:02.173 Cleaning up /home/builder/workspace/deploy-stable-to-QaStable/console-cli
00:00:02.186 Deleting /home/builder/workspace/deploy-stable-to-QaStable/console-cli/lib/python/gluconsole/__init__.pyc
00:00:02.187 Deleting /home/builder/workspace/deploy-stable-to-QaStable/console-cli/lib/python/gluconsole/rest.pyc
00:00:02.190 Updating https://code/org.linkedin.glu.packaging-all-5.0.0/console-cli at revision '2013-09-02T09:54:22.975 +0000'
00:00:02.345 At revision 10232
00:00:03.198 no change for https://code/glu_model since the previous build
00:00:03.198 no change for https://code/gluscripts since the previous build
00:00:03.199 no change for https://code/org.linkedin.glu.packaging-all-5.0.0/console-cli since the previous build
00:00:03.201 No emails were triggered.
00:00:03.254 [deploy-stable-to-QaStable] $ /bin/bash -xe /tmp/hudson6386226257473454902.sh
00:00:03.262 + cd glu_model
00:00:03.262 + ./glu_model.py -e corp -f QaStable -u jenkins -x ******** -c deploy -b ''
00:00:03.350 Builds:  ['foo-0.3.0-20130818.062121-1.war', 'shnbin-foo-5.tgz', 'bar-0.1.1-20130828.231036-1.jar', 'shnbin-bar-1.tgz']
00:00:03.378 http://articorp.corp.sjc.shn/artifactory/api/search/artifact?name=all-defaults.json
00:00:03.422 Artifact metadata at http://articorp/artifactory/api/storage/settings/all-defaults.json
00:00:03.441 78a0869ced81a2ede2d8a5637273e75a
00:00:03.447 http://articorp/artifactory/api/search/artifact?name=corp-defaults.json
00:00:03.484 Artifact metadata at http://articorp/artifactory/api/storage/settings/corp-defaults.json
00:00:03.490 2c93b0a1461ef61a7270a240c290d517
00:00:03.497 http://articorp/artifactory/api/search/artifact?name=corp-QaStable.json
00:00:03.540 Artifact metadata at http://articorp/artifactory/api/storage/settings/corp-QaStable.json
00:00:03.547 178f9f62382ac13cc0ea71a7da8b587e
00:00:03.769 2013-09-02 09:54:26,746: > ../console-cli/bin/console-cli.py -u jenkins -x ******** -c http://glucorp -f QaStable -M QaStable.json load
00:00:03.981 Model loaded successfully: id=88621a5eb5dfc4644054fae9baec4e19a7f02a34
00:00:03.994 2013-09-02 09:54:26,970: QaStable.json loaded successfuly
00:00:03.994 2013-09-02 09:54:26,971: > ../console-cli/bin/console-cli.py -u jenkins -x ******** -c http://glucorp -f QaStable -p --all deploy
00:00:04.211 2013-09-02 09:54:27,187 [INFO] - gluconsole.rest.Client - executing plan: plan/8318a091-948c-4a96-b96b-0b9bfb53f2da/execution
00:00:04.273 2013-09-02 09:54:27,250 [INFO] - gluconsole.rest.Client - status url = plan/8318a091-948c-4a96-b96b-0b9bfb53f2da/execution/341
00:00:04.294 2013-09-02 09:54:27,271 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:06.318 2013-09-02 09:54:29,294 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:08.341 2013-09-02 09:54:31,318 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:10.360 2013-09-02 09:54:33,337 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:12.383 2013-09-02 09:54:35,360 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:14.407 2013-09-02 09:54:37,383 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:16.431 2013-09-02 09:54:39,407 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:18.453 2013-09-02 09:54:41,430 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:20.477 2013-09-02 09:54:43,454 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:22.502 2013-09-02 09:54:45,477 [INFO] - gluconsole.rest.Client - InProgress: 0% complete
00:00:24.518 2013-09-02 09:54:47,494 [INFO] - gluconsole.rest.Client - InProgress: 12% complete
00:00:26.542 2013-09-02 09:54:49,518 [INFO] - gluconsole.rest.Client - InProgress: 31% complete
00:00:28.566 2013-09-02 09:54:51,543 [INFO] - gluconsole.rest.Client - InProgress: 31% complete
00:00:30.584 2013-09-02 09:54:53,561 [INFO] - gluconsole.rest.Client - InProgress: 34% complete
00:00:32.608 2013-09-02 09:54:55,585 [INFO] - gluconsole.rest.Client - InProgress: 40% complete
00:00:34.633 2013-09-02 09:54:57,609 [INFO] - gluconsole.rest.Client - InProgress: 46% complete
00:00:36.656 2013-09-02 09:54:59,633 [INFO] - gluconsole.rest.Client - InProgress: 46% complete
00:00:38.681 2013-09-02 09:55:01,657 [INFO] - gluconsole.rest.Client - InProgress: 50% complete
00:00:40.704 2013-09-02 09:55:03,681 [INFO] - gluconsole.rest.Client - InProgress: 50% complete
00:00:42.726 2013-09-02 09:55:05,703 [INFO] - gluconsole.rest.Client - InProgress: 56% complete
00:00:44.750 2013-09-02 09:55:07,727 [INFO] - gluconsole.rest.Client - InProgress: 71% complete
00:00:46.773 2013-09-02 09:55:09,750 [INFO] - gluconsole.rest.Client - InProgress: 81% complete
00:00:48.796 2013-09-02 09:55:11,772 [INFO] - gluconsole.rest.Client - InProgress: 81% complete
00:00:50.820 2013-09-02 09:55:13,796 [INFO] - gluconsole.rest.Client - InProgress: 84% complete
00:00:52.842 2013-09-02 09:55:15,819 [INFO] - gluconsole.rest.Client - InProgress: 87% complete
00:00:54.861 2013-09-02 09:55:17,838 [INFO] - gluconsole.rest.Client - InProgress: 87% complete
00:00:56.884 2013-09-02 09:55:19,861 [INFO] - gluconsole.rest.Client - InProgress: 90% complete
00:00:58.907 2013-09-02 09:55:21,884 [INFO] - gluconsole.rest.Client - InProgress: 93% complete
00:01:00.931 2013-09-02 09:55:23,907 [INFO] - gluconsole.rest.Client - Completed : 100:COMPLETED
00:01:00.932 100:COMPLETED
00:01:00.948 2013-09-02 09:55:23,924: deployed successfuly
00:01:01.009 Archiving artifacts
00:01:01.126 No emails were triggered.
00:01:01.134 Finished: SUCCESS

Here 'Build with Parameters' goes to a page with a series of dropdowns of recent builds for all components deployed in the fabrics. Some of these fabrics redeploy automatically with the latest builds a few minutes after a commit, thanks to Jenkins' ability to chain builds.


Re: Help with the direction for glu

oldgluuser
In reply to this post by frenchyan
On #240 status - yes!

On no feedback for 'Easy Setup', I have only had time to give the doc a quick read.  Still on 5.0.0.  Should be upgrading within 2 weeks.

+1 for #183

+2 for #91 (can I do that :)

Along with the others, I would love to see #174, but I need support from the REST Api as we are not using the python CLI.  Additionally, it would be HUGE to have an easy way to send back status/error/failure information.

Finally, one I didn't see, but we talked about in one of the threads - the REST Api should be able to do 'partial plans' the way the console can.    Just think of the Tomcat/Jetty use case.  To deploy a new/updated jar, I only want to stop, configure, and start.   Who is supposed to create a Feature request for this?  If I am, sorry, let me know and I'll do it.

hth,

Re: Help with the direction for glu

frenchyan
Administrator
On Thu, Sep 5, 2013 at 7:07 AM, joeg [via glu] <[hidden email]> wrote:
> On #240 status - yes!

Happy to hear!

> +1 for #183

Makes sense (and not a biggy).

> +2 for #91 (can I do that :)

This is Windows support. This is definitely something that I will not be able to work on. I don't use Windows and know nothing about Windows shell scripting. Recently I unfortunately had to make glu even more unix specific by forking for shell.cp and shell.mv, as nothing handles them properly (whether it is ant, commons-io, etc... I looked at a LOT of libraries...). I still plan to fix this, bite the bullet, and reimplement cp/mv entirely in java (dog work, painful, etc...), which would make the port to Windows easier. That being said, all the shell scripting in glu would have to be rewritten for Windows. I am more than happy to take pull requests.

> Along with the others, I would love to see #174, but I need support from the REST Api as we are not using the python CLI. Additionally, it would be HUGE to have an easy way to send back status/error/failure information.

Ok I will keep this in mind.

> Finally, one I didn't see, but we talked about in one of the threads - the REST Api should be able to do 'partial plans' the way the console can. Just think of the Tomcat/Jetty use case. To deploy a new/updated jar, I only want to stop, configure, and start. Who is supposed to create a Feature request for this? If I am, sorry, let me know and I'll do it.

Isn't it this issue: https://github.com/pongasoft/glu/issues/228, which was implemented in 5.2.0? If that is the case, then it is already available from the REST api.

Yan 

Re: Help with the direction for glu

sodul
In reply to this post by oldgluuser
Reconfigure is available in the latest Glu console. I have not checked the REST API but I assume you can use that now. I intend to add Reconfigure to the console-cli if the API allows.

UPDATE: Yan beat me to the post ;-)

Re: Help with the direction for glu

oldgluuser
In reply to this post by frenchyan
About issue #228.  That is great news that it is already implemented, but I'm not sure how I use it!  The discussion thread talked about a workaround and the bug looks like it was closed as it was implemented.  But how do I do reconfigure (ie. stop, unconfigure, configure, start) with the new implementation?  Or do I use the workaround?

Also - I don't see a way to ask for status.  Minimally, I am looking for something analogous to the standard tomcat.sh script where one uses:
    ./tomcat.sh status
and gets back some information.  Ideally, I would like to be able to set some type of built-in 'status' variable in my agent gluscript and have a way to pull that back.   [Am doing this round-about now by writing such data into my own section of zookeeper....]

Ok, stupid question #95.  When you say 'built-in' you mean I just use "reconfigure" instead of something like 'redeploy'?  That would be very nice!

Re: Help with the direction for glu

oldgluuser
I'm not sure if this is a new feature request or if there is already a (standard) way of doing this.

I think a common case (ok, I have this case!) often requires coordination amongst a set of Agents.  So I have Agent#1 on Machine#1 managing the database installation there.  I have Agent#2 on Machine#2 managing the Tomcat (or Jetty) installation there.  When the web application (war file) is updated, most often it comes with (in our case, Flyway) migration scripts that need to be run against the database.

While Flyway allows Agent#2 to easily run the migration script it receives along with the war file, the database needs to be backed up first.  Agent#1 on Machine#1 needs to do this, and only after it completes can Agent#2 resume.

Do others already do cross-agent coordination like this (if so, how is this currently done)?  Or is this a new feature?

Re: Help with the direction for glu

frenchyan
Administrator
In reply to this post by oldgluuser
You use planAction="reconfigure" in your rest call (http://pongasoft.github.io/glu/docs/latest/html/orchestration-engine.html#goe-rest-api-representing-a-plan). I forgot to update the first row of the table in the documentation with the new value. Also, as an added bonus, the code is now generic: if you have your own custom plans, you can generate them from the rest call as well (before, it only worked with the predefined plan actions).
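A minimal sketch of what such a rest call could look like from Python. Only planAction="reconfigure" comes from this thread; the /console/rest/v1/<fabric>/plans path, the systemFilter parameter, the host, port and agent name are assumptions to be checked against the orchestration engine documentation linked above. The request is built but not sent, and authentication is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_plan_request(console_url, fabric, plan_action, system_filter=None):
    """Build (but do not send) a POST asking the console to create a plan."""
    params = {"planAction": plan_action}
    if system_filter:
        # Assumed parameter name; restricts the plan to part of the system.
        params["systemFilter"] = system_filter
    body = urlencode(params).encode("ascii")
    # Assumed URL layout: <console>/console/rest/v1/<fabric>/plans
    url = f"{console_url.rstrip('/')}/console/rest/v1/{fabric}/plans"
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical host and agent name, for illustration only.
req = build_plan_request(
    "http://glucorp:8080", "QaStable", "reconfigure",
    system_filter="agent='agent-2'",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen (with credentials added) would submit it; a custom plan action would go in the same planAction field.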

In regard to your status, you can already do this by having a status variable defined in your glu script and when you request the live model (http://pongasoft.github.io/glu/docs/latest/html/orchestration-engine.html#view-live-model), it will be part of the json result: all fields are exported to ZooKeeper automatically (see http://pongasoft.github.io/glu/docs/latest/html/glu-script.html#fields). For example the picture in this section (http://pongasoft.github.io/glu/docs/latest/html/tutorial.html#viewing-entry-details) shows the scriptState section and port and pid are variables like this. It is then fairly trivial to write a small cli wrapper, that will take whatever input you want (most likely agent name and mount point), will fetch the live model, extract the status from the scriptState section and render it the way you want.

The previous method requires your script to have a timer that updates the status variable on a periodic basis, but if you already do this (like the built-in jetty script does), then it is not a biggy.
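The live-model approach could be sketched roughly as follows. The JSON layout used here (an entries list whose items carry agent, mountPoint and metadata.scriptState.script) and the status field itself are assumptions for illustration; check them against the actual live model returned by your console. Fetching the model over HTTP is left out so the extraction logic stands on its own.

```python
import json


def extract_script_fields(live_model, agent, mount_point):
    """Return the script fields of the matching entry, or None if not found."""
    for entry in live_model.get("entries", []):
        if entry.get("agent") == agent and entry.get("mountPoint") == mount_point:
            script_state = entry.get("metadata", {}).get("scriptState", {})
            return script_state.get("script", {})
    return None


# Hand-written sample standing in for the console's live model response.
sample_model = json.loads("""
{
  "fabric": "QaStable",
  "entries": [
    {
      "agent": "agent-1",
      "mountPoint": "/sample/i001",
      "metadata": {
        "scriptState": {
          "script": {"port": 9000, "pid": 1234, "status": "ok"}
        }
      }
    }
  ]
}
""")

fields = extract_script_fields(sample_model, "agent-1", "/sample/i001")
print(fields["status"] if fields else "entry not found")
```

A small cli wrapper would take the agent name and mount point as arguments, fetch the live model from the console, and print whichever field the glu script exposes.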

Another way to do it would be to have another closure on your glu script that you can invoke on demand:

def status = {
 ... 
}

At this stage, the api lets you have any kind of "call" on your glu script (https://github.com/pongasoft/glu/blob/master/agent/org.linkedin.glu.agent-api/src/main/groovy/org/linkedin/glu/agent/api/Agent.groovy#L112) and it is exposed via the rest api of the agent only. It is not exposed in the agent-cli or in the console, and this is the point of this ticket: https://github.com/pongasoft/glu/issues/190. The good news is that the mechanisms are in place... it is just a matter of exporting it. If this is what you want then we can add 1 vote to 190 :)


Re: Help with the direction for glu

frenchyan
Administrator
In reply to this post by oldgluuser
In regards to your question about cross-agent coordination, there is nothing in glu that specifically handles this case.

One way that I can picture this being implemented is by issuing a sequence of small plans/calls:

execute deployment plan -> undeploy agent-2
execute call -> backupDb agent-1 (this is the new call I was mentioning in my previous message)
execute deployment plan -> deploy agent-2

I have also had something in the back of my mind that would let you express this sequence of steps as a plan (with some dsl) that can be fed to glu and glu would execute those 3 steps for you.
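Until something like that exists, the three steps can be driven from a small external script. The sketch below only shows the sequencing logic: each step is a hypothetical callable standing in for "execute a plan through the console and wait for the result" or "invoke the backupDb call", and the sequence stops at the first step that fails so that agent-2 is never redeployed on top of a database that was not backed up.

```python
def run_sequence(steps):
    """Run (name, callable) steps in order; stop at the first one that fails.

    Returns (completed_names, failed_name); failed_name is None on success.
    """
    completed = []
    for name, step in steps:
        if not step():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical stand-ins: real versions would call the console (plans)
# or the agent (backupDb call) and return True only on success.
def undeploy_agent_2():
    return True


def backup_db_agent_1():
    return False  # simulate a failed backup


def deploy_agent_2():
    return True


completed, failed = run_sequence([
    ("undeploy agent-2", undeploy_agent_2),
    ("backupDb agent-1", backup_db_agent_1),
    ("deploy agent-2", deploy_agent_2),
])
print("completed:", completed)
print("failed at:", failed)
```

With the simulated backup failure, the deploy step is never attempted.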

Yan

Re: Help with the direction for glu

oldgluuser
In reply to this post by frenchyan
frenchyan wrote:
> ...
>
> At this stage, the api lets you have any kind of "call" on your glu script
> (https://github.com/pongasoft/glu/blob/master/agent/org.linkedin.glu.agent-api/src/main/groovy/org/linkedin/glu/agent/api/Agent.groovy#L112)
> and it is exposed via the rest api of the agent only. It is not exposed in
> the agent-cli or in the console and this is the point of this ticket:
> https://github.com/pongasoft/glu/issues/190 The good news is that the
> mechanism are in place... it is just a matter of exporting it. If this is
> what you want then we can add 1 vote to 190 :)

Yes - add a vote to 190!  That would be extremely useful.  The doc on the REST Api is a TODO - is most of the Api 'documented' in a single implementation class?

Re: Help with the direction for glu

frenchyan
Administrator
Ok for 190.

In regards to the rest api, some of the api is detailed at various places in the agent doc page (http://pongasoft.github.io/glu/docs/latest/html/agent.html). The reason why it is a TODO in the doc is because in general (99% of the time) you do not talk to the agent directly via its rest api, but you talk to the console via the orchestration engine rest api which is thoroughly documented here: http://pongasoft.github.io/glu/docs/latest/html/orchestration-engine.html#rest-api 

The console then talks to the agent using the agent rest api.


Each Resource handles a path (ex: MountPointResource handles /mountPoint) and the source code is usually explicit in terms of methods (GET, POST,... ) and arguments.

Yan


Re: Help with the direction for glu

lukestephenon
In reply to this post by frenchyan
Hi Yan,

Thanks for the awesome effort on glu.  Just wanted to say I am using glu.  I'm a fan of the current architecture of the agents and using zookeeper to maintain process state. At the moment I don't have any major suggestions for the direction going forward.

Thanks

Luke

Re: Help with the direction for glu

frenchyan
Administrator
Very glad to hear :)

Yan

Re: Help with the direction for glu

frenchyan
Administrator
In reply to this post by yoron
This is in reply to yoron.

If you issue a HEAD (http://pongasoft.github.io/glu/docs/latest/html/orchestration-engine.html#check-status-of-plan-execution) then yes, you get only headers... this is on purpose... HEAD does not have a body.

If you issue a GET (http://pongasoft.github.io/glu/docs/latest/html/orchestration-engine.html#view-execution-plan) you get the full xml, which contains the failure...

Isn't that what you need?

It may not be available from the console-cli (and that is the point of issue 174), but if you use the REST api directly, it is available: use GET instead of HEAD.
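As a rough illustration of the difference, the sketch below builds both requests against the same execution URL and parses a completion value. The plan/<planId>/execution/<executionId> path and the "100:COMPLETED" value are taken from the Jenkins log earlier in this thread; the base URL is an assumption, and so is the idea that an in-progress value is a bare percentage. The requests are built but not sent.

```python
from urllib.request import Request


def execution_request(base_url, plan_id, execution_id, method):
    """Build a HEAD (headers only) or GET (full xml body) request."""
    url = f"{base_url.rstrip('/')}/plan/{plan_id}/execution/{execution_id}"
    return Request(url, method=method)


def parse_completion(value):
    """Split a completion value such as '100:COMPLETED' into (percent, status).

    A value without ':' is treated as a bare percentage (assumption).
    """
    percent, _, status = value.partition(":")
    return int(percent), status or None


# Assumed base URL; plan and execution ids copied from the log above.
base = "http://glucorp/console/rest/v1/QaStable"
plan_id = "8318a091-948c-4a96-b96b-0b9bfb53f2da"

head = execution_request(base, plan_id, 341, "HEAD")  # status header only
get = execution_request(base, plan_id, 341, "GET")    # xml with the failure

print(head.get_method(), head.full_url)
print(get.get_method(), get.full_url)
print(parse_completion("100:COMPLETED"))
```

A client that polls with HEAD while the plan runs can issue one GET at the end, and only then parse the xml for the failure details.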

Yan