How to Marshal Procs Using Rubinius

The primary reason I enjoy working with Rubinius is that it exposes, to Ruby, much of the internal machinery that controls the runtime semantics of the language. Further, it exposes that machinery primarily in order to enable user-facing semantics that are typically implemented in the host language (C for MRI, C and C++ for MacRuby, Java for JRuby) to be implemented in Ruby itself.

There is, of course, quite a bit of low-level functionality in Rubinius implemented in C++, but a surprising number of things are implemented in pure Ruby.

One example is the Binding object. To create a new binding in Rubinius, you call Binding.setup:

def self.setup(variables, code, static_scope, recv=nil)
  bind = allocate()

  bind.self = recv || variables.self
  bind.variables = variables
  bind.code = code
  bind.static_scope = static_scope
  return bind
end

This method takes a number of more primitive constructs, which I will explain as this article progresses, but we can describe the constructs that make up the high-level Ruby Binding in pure Ruby.

In fact, Rubinius implements Kernel#binding itself in terms of Binding.setup.

def binding
  return Binding.setup(
    Rubinius::VariableScope.of_sender,
    Rubinius::CompiledMethod.of_sender,
    Rubinius::StaticScope.of_sender,
    self)
end

Yes, you're reading that right. Rubinius exposes the ability to extract the constructs that make up a binding, one at a time, from a caller's scope. And this is not just a hack (like Binding.of_caller for a short time in MRI). It's core to how Rubinius manages eval, which of course makes heavy use of bindings.

Marshalling Procs

For a while, I have wanted the ability to Marshal.dump a proc in Ruby. MRI has historically disallowed it, but there's nothing conceptually impossible about it. A proc itself is a blob of executable code, a local variable scope (which is just a bunch of pointers to other objects), and a constant lookup scope. Rubinius exposes each of these constructs to Ruby, so Marshaling a proc simply means figuring out how to Marshal each of these constructs.

Let's take a quick detour to learn about the constructs in question.

Rubinius::StaticScope

Rubinius represents Ruby's constant lookup scope as a Rubinius::StaticScope object. Perhaps the easiest way to understand it would be to look at Ruby's built-in Module.nesting function.

module Foo
  p Module.nesting

  module Bar
    p Module.nesting
  end
end

module Foo::Bar
  p Module.nesting
end

# Output:
# [Foo]
# [Foo::Bar, Foo]
# [Foo::Bar]

Every execution context in Rubinius has a Rubinius::StaticScope, which may optionally have a parent scope. In general, the top static scope (the static scope with no parent) in any execution context is Object.

Because Rubinius allows us to get the static scope of a calling method, we can implement Module.nesting in Rubinius:

def nesting
  scope = Rubinius::StaticScope.of_sender
  nesting = []
  while scope and scope.module != Object
    nesting << scope.module
    scope = scope.parent
  end
  nesting
end

A static scope also has an addition property called current_module, which is used during class_eval to define which module the runtime should add new methods to.

Adding Marshal.dump support to a static scope is therefore quite easy:

class Rubinius::StaticScope
  def marshal_dump
    [@module, @current_module, @parent]
  end

  def marshal_load(array)
    @module, @current_module, @parent = array
  end
end

These three instance variables are defined as Rubinius slots, which means that they are fully accessible to Ruby as instance variables, but don't show up in the instance_variables list. As a result, we need to explicitly dump the instance variables that we care about and reload them later.

Rubinius::CompiledMethod

A compiled method holds the information necessary to execute a blob of Ruby code. Some important parts of a compiled method are its instruction sequence (a list of the compiled instructions for the code), a list of any literals it has access to, names of local variables, its method signature, and a number of other important characteristics.

It's actually quite a complex structure, but Rubinius has already knows how to convert an in-memory CompiledMethod into a String, as it dumps compiled Ruby files into compiled files as part of its normal operation. There is one small caveat: this String form that Rubinius uses for its compiled method does not include its static scope, so we will need to include the static scope separately in the marshaled form. Since we already told Rubinius how to marshal a static scope, this is easy.

class Rubinius::CompiledMethod
  def _dump(depth)
    Marshal.dump([@scope, Rubinius::CompiledFile::Marshal.new.marshal(self)])
  end

  def self._load(string)
    scope, dump = Marshal.load(string)
    cm = Rubinius::CompiledFile::Marshal.new.unmarshal(dump)
    cm.scope = scope
    cm
  end
end

Rubinius::VariableScope

A variable scope represents the state of the current execution context. It contains all of the local variables in the current scope, the execution context currently in scope, the current self, and several other characteristics.

I wrote about the variable scope before. It's one of my favorite Rubinius constructs, because it provides a ton of useful runtime information to Ruby that is usually locked away inside the native implementation.

Dumping and loading the VariableScope is also easy:

class VariableScope
  def _dump(depth)
    Marshal.dump([@method, @module, @parent, @self, nil, locals])
  end

  def self._load(string)
    VariableScope.synthesize *Marshal.load(string)
  end
end

The synthesize method is new to Rubinius master; getting a new variable scope previously required synthesizing its locals using class_eval, and the new method is better.

Rubinius::BlockEnvironment

A Proc is basically nothing but a wrapper around a Rubinius::BlockEnvironment, which wraps up all of the objects we've been working with so far. Its scope attribute is a VariableScope and its code attribute is a CompiledMethod.

Dumping it should be quite familiar by now.

class BlockEnvironment
  def marshal_dump
    [@scope, @code]
  end

  def marshal_load(array)
    scope, code = *array
    under_context scope, code
  end
end

The only thing new here is the under_context method, which gives a BlockEnvironment its variable scope and compiled method. Note that we dumped the static scope along with the compiled method above.

Proc

Finally, a Proc is just a wrapper around a BlockEnvironment, so dumping it is easy:

class Proc
  def _dump(depth)
    Marshal.dump(@block)
  end

  def self._load(string)
    block = Marshal.load(string)
    self.__from_block__(block)
  end
end

The from_block method constructs a new Proc from a BlockEnvironment.

So there you have it. Dumping and reloading Proc objects in pure Ruby using Rubinius! (the full source is at https://gist.github.com/1378518).