Windows auto upgrade

lukestephenon
Hello,

About to start on some work to implement the agent auto upgrade on windows.  Some questions / advance notice:

1. Has anyone else already completed this?  I don't want to reinvent the wheel.

2. I'm running the agent as a windows service using commons daemon.  The changes to run the
I've not yet updated the agent build process (I've some custom packaging which adds some additional utils to the agent).  The actual code changes to get the agent running on windows I took from github.com/bhagatsingh/utils-misc and github.com/bhagatsingh/glu.  With those changes I've found the agent to be stable on windows.

I was planning on using a similar approach to the unix auto upgrade, but changing AutoUpgradeScript.groovy to invoke windows bat scripts which perform the same logic.  Also, I've noticed that the current glu upgrade script uses user.dir, which I've found to be unreliable when running as a windows service.  The agent exposes glu.agent.homeDir, which is set correctly.  Other than that change, I'm planning on launching the unix vs win specific scripts by detecting the OS using System.getProperty("os.name").
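A minimal sketch of the OS detection and script lookup described above, in plain Java (the same JDK calls work from Groovy). The script names, the `bin` subdirectory, and the assumption that glu.agent.homeDir can be read as a JVM system property are illustrative only, not taken from the glu code base.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class UpgradeScriptChooser {
    // True when the os.name value looks like a Windows platform.
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Picks a script file name for the platform; both names are hypothetical.
    static String scriptName(String osName) {
        return isWindows(osName) ? "agent-upgrade.bat" : "agent-upgrade.sh";
    }

    // Resolves the script under <homeDir>/bin instead of relying on user.dir.
    static Path scriptPath(String homeDir, String osName) {
        return Paths.get(homeDir, "bin", scriptName(osName));
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        // Assumption: glu.agent.homeDir is visible as a system property.
        // The user.dir fallback is only there so the sketch runs outside the agent.
        String homeDir = System.getProperty("glu.agent.homeDir",
                                            System.getProperty("user.dir"));
        System.out.println(scriptPath(homeDir, osName));
    }
}
```

Keeping the detection in one small helper means the rest of the upgrade logic does not need to know which platform it is on.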

Planning on using the same approach of spawning a background process which will:
1. Stop the current windows service
2. Uninstall the current windows service (classpath lib location has changed so need to update commons daemon config)
3. Install the new windows service
4. Start the new windows service
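The four steps above could be expressed as commons-daemon procrun invocations, roughly as sketched here. The service name, the prunsrv.exe location, and the classpath are hypothetical, and a real install step would need more options (start/stop classes, JVM settings) than the single `--Classpath` shown.

```java
import java.util.Arrays;
import java.util.List;

public class ServiceUpgradeSteps {
    // Builds the stop / delete / install / start command lines in order.
    static List<List<String>> commands(String prunsrv, String service, String newClasspath) {
        return Arrays.asList(
            Arrays.asList(prunsrv, "//SS//" + service),   // 1. stop the current service
            Arrays.asList(prunsrv, "//DS//" + service),   // 2. delete (uninstall) it
            Arrays.asList(prunsrv, "//IS//" + service,    // 3. install with the new classpath
                          "--Classpath=" + newClasspath),
            Arrays.asList(prunsrv, "//ES//" + service));  // 4. start the new service
    }

    // Runs each command and stops at the first non-zero exit code.
    static void runAll(List<List<String>> commands) throws Exception {
        for (List<String> command : commands) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IllegalStateException("Step failed (" + exit + "): " + command);
            }
        }
    }

    public static void main(String[] args) {
        // Only prints the commands; call runAll(...) on a Windows host to execute them.
        for (List<String> command : commands("prunsrv.exe", "glu-agent", "C:\\glu\\lib\\*")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

Failing fast on the first non-zero exit code at least makes it visible which of the four steps went wrong.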

A lot can go wrong here.  Unfortunately it's not as simple as switching 'version.txt' on windows with commons daemon.

Feel free to comment with any tips / suggestions.

Yan, would you have anything against using the commons-daemon framework if I submitted a pull request?

Thanks

Re: Windows auto upgrade

frenchyan
Administrator
I have no issue for or against the commons-daemon framework. At this stage, it is not very clear to me how the Windows version of glu re-integrates with the main development of glu itself.

Yan

Re: Windows auto upgrade

lukestephenon
Hi Yan,

The windows version of the agent will hopefully re-integrate nicely with the main development of glu.  To run on windows, I've added a bat file to agent-server/bin which uses apache commons daemon to register as a windows service.  After that, the agent can be managed from the windows services list.

2 additional classes were required to launch the agent on windows (one to launch and one to destroy)

The config file I've created only supports the JVM system properties, not the friendly environment variable names like GLU_AGENT_PORT.  E.g.:
-Dglu.agent.fabric=dev
-Dglu.agent.port=12906
-Dglu.agent.zookeeper.root=/org/glu/DevOps
-Dglu.agent.zkConnectString=myzookeeper
-Dglu.agent.name=luckyluke

I've come unstuck again with stdout and stderr not being closed on windows until all spawned background processes terminate (even if the output for those processes was directed to a logfile).  I reported this earlier http://glu.977617.n3.nabble.com/shell-exec-hangs-launching-background-process-windows-td4026571.html, but in relation to the agent upgrade this issue prevents the agent from shutting down until the async script to shut the agent down exits.

To work around this I've made the following changes to ShellExec:
  private def executeBlockingCall = {

    // if stdin then provide it to subprocess
    _stdin?.withStream { InputStream sis ->
      new BufferedOutputStream(_process.outputStream).withStream { os ->
        os << new BufferedInputStream(sis)
      }
    }

    log.info("Waiting for process to terminate")

    // we wait for the process to be done
    int exitValue = _process.waitFor()

    log.info("Process has exited with {}", exitValue)

    // make sure that the stream reader threads complete properly
    // on windows, the output streams are not closed when a background process is launched by the process.
    // Wait 1 second and report what output has been received up to that point.
    [StreamType.stdout, StreamType.stderr].each {
      try {
        _processIO[it]?.future?.get(1, TimeUnit.SECONDS)
      } catch (TimeoutException ex) {
        log.info("Timeout waiting for output stream {} to close", it)
      }
    }

    log.info("Output streams have been read")

In summary, only wait for 'process.waitFor()' to complete and then grab whatever logs are available at that point (1 second grace period).

Any concerns with this approach?  I could put in logic so that on unix the future still waits for completion with no timeout.

Re: Windows auto upgrade

frenchyan
Administrator
The main issue is that it seems that the code you added is simply waiting an extra second and the future is not completed: doesn't it mean that the underlying thread is still running and will never complete? That could be a serious issue...

Yan

Re: Windows auto upgrade

lukestephenon
Hi Yan,

Thanks for the suggestion.  I've updated the code to interrupt and cancel the Future when the TimeoutException is caught.

https://github.com/lukestephenson/glu/blob/57d9b5840b7ed9c92adfdde1722349b742f301bb/utils/org.linkedin.glu.utils/src/main/groovy/org/linkedin/glu/groovy/utils/shell/ShellExec.groovy#L303
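
The idea of bounding the wait and cancelling the Future on timeout can be sketched in plain Java as follows. This is an illustration of the general pattern, not the code in the linked commit; the class and method names are made up.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStreamWait {
    // Waits up to timeoutMillis for the future. On timeout the future is
    // cancelled with interruption so its thread is not left running.
    // Returns true if the task finished (normally or with an error).
    static boolean awaitOrCancel(Future<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return false;
        } catch (ExecutionException e) {
            return true;         // the task ended, although with an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            future.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for a stream reader that never sees end-of-stream.
        Future<?> stuck = pool.submit(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("completed: " + awaitOrCancel(stuck, 1000));
        System.out.println("cancelled: " + stuck.isCancelled());
        pool.shutdown();
    }
}
```

One caveat: cancel(true) only interrupts the thread. A reader blocked in a plain InputStream.read() may not react to the interrupt, so closing the underlying stream may also be needed before the thread actually exits.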

Luke