Agent loose state information on restart

Jesper B
Well, that is probably not so strange...
I have a number of glu Groovy scripts that all rely heavily on a custom class (GlassfishAppDeployer) that handles deployment of apps and resources to a Glassfish server.
The glassfishAppDeployer object is created when the glu method "install" is invoked.
Once created, it is used heavily in all glu methods (install, configure, start, stop, unconfigure, etc.).
Now the thing is: I cannot be sure that the glassfishAppDeployer object exists! The agent server might have been restarted, and thus the object might have been zapped, leaving me with a null pointer exception.
I guess this is notoriously hard to handle, but it would be nice if the agent server were able to persist objects (maybe just some predefined ones?) regularly. Do you have any plans for such a feature?

regards
jesper
 
Re: Agent loose state information on restart

frenchyan
Administrator
If you are storing your class as a field in your script and your class is serializable then it should automatically be saved and restored for you.

ex: 

class GlassfishAppDeployer implements Serializable
{
 ...
}

class MyGluScript
{
  def glassfishAppDeployer

  def install = {
    glassfishAppDeployer = new GlassfishAppDeployer(params)
  }

  def start = {
    glassfishAppDeployer.xxx()
  }
}

Yan
Re: Agent loose state information on restart

Thomas Pii
Hello Yan,

Sorry we never got back to you about this problem.
We were never able to make this work even after making the GlassfishAppDeployer class serializable, so it was put aside for later.

I am taking a second look at it now, and one thing I noticed was that the 'glassfishAppDeployer' property of the script is still not shown in the glu console, even after making the class serializable. So I went looking for what it takes for a property to be shown, to find out why it is not in this case.

I can see that the ScriptState.isSerializable method calls LangUtils.deepClone, so I tried calling that method myself for the 'glassfishAppDeployer' field in the script. This resulted in a ClassNotFoundException for the GlassfishAppDeployer class. That class is stored in the jar file with the deploy script. It seems that LangUtils.deepClone cannot resolve classes inside the jar file. We use the "script" : "class:/<FQCN>?cp=<URI to jar>" form in the model when loading the script for deployment. If I instead put the jar file in the agent's 'lib' directory, everything works fine.


I have made an example that I think also demonstrates a similar problem using the tutorial:

Groovy script file content:

class MyScript {
    static class Foo implements Serializable {
        def bar = "Foo.Bar"
    }
    def foo
    def install = {
        foo = new Foo()
        log.info "Install: foo.bar is ${foo.bar}"
        org.linkedin.util.lang.LangUtils.deepClone((Serializable) foo)
    }
    def uninstall = {
        log.info "Uninstall: foo.bar is ${foo.bar}"
    }
}

GLU Model (The script must be loaded from an external file to demonstrate the problem):

{
  "entries": [
    {
      "agent": "agent-1",
      "mountPoint": "/test",
      "script": "file:///home/thp/MyScript.groovy"
    }
  ],
  "fabric": "glu-dev-1",
  "id": "e0fad40db3aae3b432bceee66b2f11f994da766f",
  "metadata": {
    "name": "Empty System Model"
  }
}

Starting the tutorial and deploying this throws a "java.lang.ClassNotFoundException: MyScript$Foo" exception when calling deepClone.
If you disable the deepClone line and deploy the script, then you can see that the foo property is not shown in the glu console agent page.
If you stop and start the tutorial, and then try to undeploy the fabric, it throws a NullPointerException when it logs the value of 'foo.bar' during the uninstall phase. The 'foo' field is null after the restart. This is the same problem we saw originally (although through a script loaded from a .jar file).

I am guessing that the problem is the class loaders used in the *ScriptFactory classes, and that those loaders need to be used when checking for serializability and when restoring state.
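
To illustrate the idea, here is a minimal sketch of a deep clone that resolves classes through an explicitly supplied class loader. It is not glu's LangUtils.deepClone, whose internals are not shown in this thread; the names LoaderAwareObjectInputStream and deepCloneWith are made up for the example, and the GroovyClassLoader only stands in for whatever loader the *ScriptFactory classes use.

```groovy
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.ObjectStreamClass

// Hypothetical helper: an ObjectInputStream that asks a given class loader first.
class LoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader

    LoaderAwareObjectInputStream(InputStream input, ClassLoader loader) {
        super(input)
        this.loader = loader
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        try {
            // Look the class up in the supplied loader (e.g. the one that loaded the script)
            return Class.forName(desc.name, false, loader)
        } catch (ClassNotFoundException ignored) {
            // Fall back to the default lookup (this also covers primitive types)
            return super.resolveClass(desc)
        }
    }
}

// Hypothetical helper: serialize the value, then read it back using the given loader.
def deepCloneWith(Serializable value, ClassLoader loader) {
    def bytes = new ByteArrayOutputStream()
    def output = new ObjectOutputStream(bytes)
    output.writeObject(value)
    output.close()

    def input = new LoaderAwareObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()), loader)
    try {
        return input.readObject()
    } finally {
        input.close()
    }
}

// Usage: a class known only to a separate loader, similar to a script loaded from a file or jar
def scriptLoader = new GroovyClassLoader()
def fooClass = scriptLoader.parseClass('class Foo implements Serializable { def bar = "Foo.Bar" }')
def copy = deepCloneWith(fooClass.getDeclaredConstructor().newInstance(), fooClass.classLoader)
println copy.bar
```

The important part is the resolveClass override: a plain ObjectInputStream picks a class loader from the calling code, which would explain why a class that lives only in the script's loader cannot be found.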

Regards,
Thomas
Re: Agent loose state information on restart

frenchyan
Administrator
I think your analysis makes a lot of sense. Thanks for the details! I created a ticket to track it: https://github.com/pongasoft/glu/issues/306

Unfortunately I am travelling and will be moving very soon so it's going to be hard for me to get to it shortly. I will do my best.

Yan
Re: Agent loose state information on restart

Thomas Pii
Thanks Yan,

There is no hurry to get it fixed for us. I think I have found an easy workaround for it.
I store the required state information in a map in the script itself and then lazily initialize the deployer when I need it:

    // Plain map of settings: this is the state that gets stored
    Map glassfishProperties = [:]
    // Must be transient or glu will attempt to call getDeployer during installation when enumerating script properties
    transient def deployer
    def getDeployer() {
      // Rebuild the deployer on first use (e.g. after an agent restart)
      if (deployer == null) {
        deployer = new GlassfishAppDeployer(glassfishProperties)
      }
      return deployer
    }

regards,
Thomas
Re: Agent loose state information on restart

frenchyan
Administrator
In the end I think it is a much better solution anyway. The state of the script gets stored on the filesystem (so that you can stop and start the agent at will without losing the state) as well as in ZooKeeper (so that the orchestration engine/console is notified of changes in the state machine, and the fields are available in the console/REST API or via some ZooKeeper listener for your own purposes). The storage on the filesystem does use Java serialization, BUT the storage in ZooKeeper does not (it is ONLY written by the agent, never read): it uses JSON serialization... so there is no way that the JSON serialization would be able to serialize a GlassfishAppDeployer, even if it were Java serializable. This was done deliberately, because ZooKeeper was never meant to be the permanent storage, and you should be able to just look at the data without needing the proper class loader to do so (how would the console do that? It doesn't have the class loader, nor can it recreate it).

What you are doing is the recommended approach: always store JSON-friendly types (basic types + maps and collections of same) and rebuild the actual object at will (which should happen ONLY in the event that the agent is stopped and restarted... so very infrequently!).
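
As a rough sketch of that approach (not glu code): the snippet below keeps only a plain map as state, round-trips it through JSON to mimic a stop and start of the agent, and rebuilds the real object lazily. DemoDeployer and DemoScript are hypothetical stand-ins, the host/port values are made up, and groovy.json is used here only because glu's own JSON code does not appear in this thread.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical stand-in for the real deployer: an opaque object built from plain settings
class DemoDeployer {
    final Map settings

    DemoDeployer(Map settings) {
        this.settings = settings
    }

    String describe() {
        "deploying to ${settings.host}:${settings.port}"
    }
}

// Hypothetical stand-in for a glu script: only the map is meant to be stored
class DemoScript {
    Map glassfishProperties = [:]
    transient DemoDeployer deployer

    DemoDeployer getDeployer() {
        // Rebuilt on first use, e.g. after a restart left the field null
        if (deployer == null) {
            deployer = new DemoDeployer(glassfishProperties)
        }
        return deployer
    }
}

// Before the "restart": the state is just a map of basic types
def before = new DemoScript(glassfishProperties: [host: 'localhost', port: 4848])
String json = JsonOutput.toJson(before.glassfishProperties)

// After the "restart": the map is restored from JSON and the deployer is rebuilt on demand
def restored = new DemoScript(glassfishProperties: new JsonSlurper().parseText(json) as Map)
println json
println restored.deployer.describe()
```

Because the map holds only strings and numbers, anyone can read the stored data without the script's class loader, which is the point made above about the console.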

I still believe that glu should handle serialization properly with the proper class loader so I will fix it eventually, but fixing it would not help in seeing the inside of an opaque object in the console.

Yan