The Importance of Executable Class Bodies

I spent the past few days at JavaOne, where I gave a well-received talk on Ruby, and got to attend a number of sessions on both Ruby and other related technologies.

Out of curiosity, I went to a session on Groovy, a language that has a syntax that is derived directly from Java, but with semantics that are fairly close to Ruby's. Groovy is missing a number of features that Ruby has, and is more clunky in a number of cases.

For instance, while Ruby has pure open classes, Groovy allows you to open or reopen the metaclass of a class and insert new methods. Groovy 1.6 (released in February) added the ability to insert a number of methods to a metaclass at once.

But what I want to discuss here is another distinction. Unlike Ruby, Groovy does not allow executable code anywhere. Instead Groovy classes are compiled, so runtime code execution inside of class bodies can not work. This means that a large number of the features that make Rails stand out, like declarative callbacks and validations, various forms of accessors, runtime method generation based on introspecting the database, and other per-class mutable structures cannot be implemented nearly as elegantly in Groovy.

In case you're not familiar, Ruby doesn't need annotations, because class bodies in Ruby are simply executable code, with a self that is the class that is being defined.

Let's take a simple example. In Ruby, accessors are defined as followed:

class Car  
  attr_accessor :model, :make
end  

In Groovy, accessors are defined as:

class Car {  
  model
  make
}

At first glance, they seem pretty similar. In both cases, getters and setters are added, and new fields (in the respective languages) exist. The difference is that while Groovy needed to add new syntax to support this, Ruby's version can be implemented in Ruby itself:

class Class  
  def attr_accessor(*names)
    names.each do |name|
      class_eval "def #{name}() @#{name} end"
      class_eval "def #{name}=(val) @#{name} = val end"      
    end
  end
end  

Rubinius, a complete implementation of Ruby in Ruby, implements attr_accessor as:

class Class  
  def attr_reader(name)
    meth = Rubinius::AccessVariable.get_ivar name
    @method_table[name] = meth
    return nil
  end

  def attr_writer(name)
    meth = Rubinius::AccessVariable.set_ivar name
    @method_table["#{name}=".to_sym] = meth
    return nil
  end

  def attr_accessor(name)
    attr_reader(name)
    attr_writer(name)
    return true
  end
end  

Here, Rubinius exposes the method table to Ruby, and we store a method representing the instance variable directly into the method table. Obviously, this requires more Ruby infrastructure, but it shows how powerful "everything is executable code" can be.

In an effort to support Ruby's declarative style, Groovy has added what they call "AST Transformations", which allows a declarative rule plus some code to be converted, at compile time, into different code to be passed into the compiler.

To make this immediately useful, they shipped a bunch of these annotations with Groovy 1.6, so we can take a look at how this is supposed to work. One example is their "Lazy" annotation, which allows the creation of an accessor that is initialized to something slow, so you want to defer initializing the accessor until it is actually accessed. It works like this (from the Groovy documentation):

class Person {  
  @Lazy pets = ['Cat', 'Dog', 'Bird']
}

Assuming that creating that Array was slow, this would defer loading the Array until pets was accessed. Pretty nice. Unfortunately, implementing this nice abstraction is a non-trivial operation:

package org.codehaus.groovy.transform;

import org.codehaus.groovy.ast.*;  
import org.codehaus.groovy.ast.expr.*;  
import org.codehaus.groovy.ast.stmt.*;  
import org.codehaus.groovy.control.CompilePhase;  
import org.codehaus.groovy.control.SourceUnit;  
import org.codehaus.groovy.runtime.MetaClassHelper;  
import org.codehaus.groovy.syntax.Token;  
import org.objectweb.asm.Opcodes;

/**
 * Handles generation of code for the @Lazy annotation
 *
 * @author Alex Tkachman
 */
@GroovyASTTransformation(phase= CompilePhase.CANONICALIZATION)
public class LazyASTTransformation implements ASTTransformation, Opcodes {

    public void visit(ASTNode[] nodes, SourceUnit source) {
        if (!(nodes[0] instanceof AnnotationNode) || !(nodes[1] instanceof AnnotatedNode)) {
            throw new RuntimeException("Internal error: wrong types: $node.class / $parent.class");
        }

        AnnotatedNode parent = (AnnotatedNode) nodes[1];
        AnnotationNode node = (AnnotationNode) nodes[0];

        if (parent instanceof FieldNode) {
            FieldNode fieldNode = (FieldNode) parent;
            final Expression init = getInitExpr(fieldNode);

            fieldNode.rename("$" + fieldNode.getName());
            fieldNode.setModifiers(ACC_PRIVATE | (fieldNode.getModifiers() & (~(ACC_PUBLIC|ACC_PROTECTED))));

            create(fieldNode, init);
        }
    }

    private void create(FieldNode fieldNode, final Expression initExpr) {
        BlockStatement body = new BlockStatement();
        final FieldExpression fieldExpr = new FieldExpression(fieldNode);
        if ((fieldNode.getModifiers() & ACC_VOLATILE) == 0) {
            body.addStatement(new IfStatement(
                    new BooleanExpression(new BinaryExpression(fieldExpr, Token.newSymbol("!=",-1,-1), ConstantExpression.NULL)),
                    new ExpressionStatement(fieldExpr),
                    new ExpressionStatement(new BinaryExpression(fieldExpr, Token.newSymbol("=",-1,-1), initExpr))
            ));
        }
        else {
            body.addStatement(new IfStatement(
                new BooleanExpression(new BinaryExpression(fieldExpr, Token.newSymbol("!=",-1,-1), ConstantExpression.NULL)),
                new ReturnStatement(fieldExpr),
                new SynchronizedStatement(
                        VariableExpression.THIS_EXPRESSION,
                        new IfStatement(
                                new BooleanExpression(new BinaryExpression(fieldExpr, Token.newSymbol("!=",-1,-1), ConstantExpression.NULL)),
                                    new ReturnStatement(fieldExpr),
                                    new ReturnStatement(new BinaryExpression(fieldExpr,Token.newSymbol("=",-1,-1), initExpr))
                        )
                )
            ));
        }
        final String name = "get" + MetaClassHelper.capitalize(fieldNode.getName().substring(1));
        fieldNode.getDeclaringClass().addMethod(name, ACC_PUBLIC, fieldNode.getType(), Parameter.EMPTY_ARRAY, ClassNode.EMPTY_ARRAY, body);
    }

    private Expression getInitExpr(FieldNode fieldNode) {
        Expression initExpr = fieldNode.getInitialValueExpression();
        fieldNode.setInitialValueExpression(null);

        if (initExpr == null)
          initExpr = new ConstructorCallExpression(fieldNode.getType(), new ArgumentListExpression());

        return initExpr;
    }
}

Pretty ugly. If you look closely at the code, you'll see that the amount of code necessary to express even simple concepts is huge, because you're manipulating an actual AST.

A similar feature in Ruby might look like:

class Person  
  lazy(:pets) { ["Cat", "Dog", "Bird"] }
end  

And the implementation:

class Class  
  def lazy(name, &block)
    define_method("_lazy_#{name}", &block)
    class_eval "def #{name}() @#{name} ||= _lazy_#{name} end"
  end
end  

Because the lazy method is just a method being run on class at runtime, we can evaluate code live into the context. First, we define a method on the object called lazypets. Next, we define a method called pets that memoizes the results of calling that method into an instance variable. And that's it.

A slightly slower solution in Ruby that doesn't require eval is:

class Class  
  def lazy(name)
    ivar = "@#{name}"
    define_method(name) do
      instance_variable_get(ivar) || instance_variable_set(ivar, yield)
    end
  end
end  

In this case, since we're defining the method in a block, we still have access to the block that was passed in to the original lazy method, so we can yield to it inside the new method. Pretty cool, no?

Because all code is executable in Ruby, it's easy to abstract away repetitive code in around the same number of lines as it took to write the code in the first place. With these simple examples, it would be possible to implement a simpler way to express these transforms. But as these sorts of things are expected to compose well with each other, the flexibility of executable, runtime code starts to really add up, in the same way that languages that are dynamic at runtime can be more flexible and powerful than languages that try to precompute everything at compile time.