How to Marshal Procs Using Rubinius
The primary reason I enjoy working with Rubinius is that it exposes, to Ruby, much of the internal machinery that controls the runtime semantics of the language. Further, it exposes that machinery primarily in order to enable user-facing semantics that are typically implemented in the host language (C for MRI, C and C++ for MacRuby, Java for JRuby) to be implemented in Ruby itself.
There is, of course, quite a bit of low-level functionality in Rubinius implemented in C++, but a surprising number of things are implemented in pure Ruby.
One example is the Binding object. To create a new binding in Rubinius, you call Binding.setup:
def self.setup(variables, code, static_scope, recv=nil)
  bind = allocate()
  bind.self = recv || variables.self
  bind.variables = variables
  bind.code = code
  bind.static_scope = static_scope
  return bind
end
This method takes a number of more primitive constructs, which I will explain as this article progresses, but we can describe the constructs that make up the high-level Ruby Binding in pure Ruby.
In fact, Rubinius implements Kernel#binding itself in terms of Binding.setup.
def binding
  return Binding.setup(
    Rubinius::VariableScope.of_sender,
    Rubinius::CompiledMethod.of_sender,
    Rubinius::StaticScope.of_sender,
    self)
end
Yes, you're reading that right. Rubinius exposes the ability to extract the constructs that make up a binding, one at a time, from a caller's scope. And this is not just a hack (like Binding.of_caller for a short time in MRI). It's core to how Rubinius manages eval, which of course makes heavy use of bindings.
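As a small illustration of what a binding carries, here is a sketch that uses only standard Ruby, so it does not depend on any Rubinius-specific API. It shows a Binding capturing both self and the local variables of the scope where it was created. The Greeter class and the scope_for method are made up for this example.

```ruby
# Hypothetical class, used only for illustration.
class Greeter
  def initialize(name)
    @name = name
  end

  # Returns a Binding for this method's scope: self is the Greeter,
  # and `greeting` is a local variable captured by the binding.
  def scope_for(greeting)
    binding
  end
end

b = Greeter.new("Rubinius").scope_for("Hello")

# eval runs the string against the captured self and locals.
puts b.eval('"#{greeting}, #{@name}!"')   # Hello, Rubinius!
puts b.receiver.class                     # Greeter
puts b.local_variables.inspect            # [:greeting]
```

The string passed to eval can see both @name (through the captured self) and greeting (through the captured locals), which are the same pieces Binding.setup receives explicitly.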
Marshalling Procs
For a while, I have wanted the ability to Marshal.dump a proc in Ruby. MRI has historically disallowed it, but there's nothing conceptually impossible about it. A proc itself is a blob of executable code, a local variable scope (which is just a bunch of pointers to other objects), and a constant lookup scope. Rubinius exposes each of these constructs to Ruby, so marshaling a proc simply means figuring out how to marshal each of these constructs.
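To see the limitation concretely, this sketch attempts the dump in standard Ruby. On MRI, Marshal.dump raises a TypeError when handed a Proc. The helper name dumpable? is made up for the example.

```ruby
# Hypothetical helper: reports whether Marshal can serialize the object.
def dumpable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false
end

adder = proc { |a, b| a + b }

puts dumpable?([1, "two", :three])   # true
puts dumpable?(adder)                # false on MRI
```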
Let's take a quick detour to learn about the constructs in question.
Rubinius::StaticScope
Rubinius represents Ruby's constant lookup scope as a Rubinius::StaticScope object. Perhaps the easiest way to understand it would be to look at Ruby's built-in Module.nesting method.
module Foo
  p Module.nesting
  module Bar
    p Module.nesting
  end
end
module Foo::Bar
  p Module.nesting
end
# Output:
# [Foo]
# [Foo::Bar, Foo]
# [Foo::Bar]
Every execution context in Rubinius has a Rubinius::StaticScope, which may optionally have a parent scope. In general, the top static scope (the static scope with no parent) in any execution context is Object.
Because Rubinius allows us to get the static scope of a calling method, we can implement Module.nesting in Rubinius:
def nesting
  scope = Rubinius::StaticScope.of_sender
  nesting = []
  while scope and scope.module != Object
    nesting << scope.module
    scope = scope.parent
  end
  nesting
end
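The same chain of scopes drives constant lookup, which is why the static scope has to travel with a marshaled proc. The following sketch uses standard Ruby only. The module names Outer and Inner and the constant LIMIT are made up for the example. It shows that a constant defined in Outer is visible when Inner is opened inside Outer, but not when Inner is opened with the compact Outer::Inner form.

```ruby
module Outer
  LIMIT = 10

  module Inner
    # Lexical scope chain here: Outer::Inner, then Outer.
    def self.nested_lookup
      LIMIT
    end
  end
end

module Outer::Inner
  # Lexical scope chain here: Outer::Inner only (Outer is not on it).
  def self.compact_lookup
    LIMIT
  rescue NameError
    :not_found
  end
end

puts Outer::Inner.nested_lookup.inspect    # 10
puts Outer::Inner.compact_lookup.inspect   # :not_found
```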
A static scope also has an additional property called current_module, which is used during class_eval to define which module the runtime should add new methods to.
Adding Marshal.dump support to a static scope is therefore quite easy:
class Rubinius::StaticScope
  def marshal_dump
    [@module, @current_module, @parent]
  end

  def marshal_load(array)
    @module, @current_module, @parent = array
  end
end
These three instance variables are defined as Rubinius slots, which means that they are fully accessible to Ruby as instance variables, but don't show up in the instance_variables list. As a result, we need to explicitly dump the instance variables that we care about and reload them later.
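The marshal_dump and marshal_load hooks are part of standard Ruby, so the pattern can be tried outside Rubinius. Here is a minimal sketch with a made-up Point class that chooses which instance variables to serialize, in the same way the StaticScope code picks its three slots.

```ruby
# Hypothetical class: keeps a cache that should not be serialized.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
    @cache = {}   # derived state we choose not to dump
  end

  # Marshal calls this to get a plain, dumpable representation.
  def marshal_dump
    [@x, @y]
  end

  # Marshal calls this on a freshly allocated object; initialize is skipped.
  def marshal_load(array)
    @x, @y = array
    @cache = {}
  end
end

copy = Marshal.load(Marshal.dump(Point.new(3, 4)))
puts [copy.x, copy.y].inspect   # [3, 4]
```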
Rubinius::CompiledMethod
A compiled method holds the information necessary to execute a blob of Ruby code. Some important parts of a compiled method are its instruction sequence (a list of the compiled instructions for the code), a list of any literals it has access to, names of local variables, its method signature, and a number of other important characteristics.
It's actually quite a complex structure, but Rubinius already knows how to convert an in-memory CompiledMethod into a String, as it dumps compiled Ruby code into compiled files as part of its normal operation. There is one small caveat: the String form that Rubinius uses for a compiled method does not include its static scope, so we will need to include the static scope separately in the marshaled form. Since we already told Rubinius how to marshal a static scope, this is easy.
class Rubinius::CompiledMethod
  def _dump(depth)
    Marshal.dump([@scope, Rubinius::CompiledFile::Marshal.new.marshal(self)])
  end

  def self._load(string)
    scope, dump = Marshal.load(string)
    cm = Rubinius::CompiledFile::Marshal.new.unmarshal(dump)
    cm.scope = scope
    cm
  end
end
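The _dump and _load pair used here is also standard Ruby: _dump must return a String, and the class-level _load receives that String back. As a rough analogy that runs without Rubinius, this sketch wraps a payload that already has its own string format (JSON stands in for the compiled-method format) together with one extra field the format does not carry. The Settings class and its fields are made up for the example.

```ruby
require "json"

# Hypothetical class: `values` has its own string format (JSON),
# while `source` has to be carried alongside it.
class Settings
  attr_reader :values, :source

  def initialize(values, source)
    @values = values
    @source = source
  end

  # Marshal calls _dump and expects a String back.
  def _dump(depth)
    Marshal.dump([@source, JSON.generate(@values)])
  end

  # Marshal calls the class-level _load with that same String.
  def self._load(string)
    source, json = Marshal.load(string)
    new(JSON.parse(json), source)
  end
end

original = Settings.new({ "retries" => 3 }, :defaults)
copy = Marshal.load(Marshal.dump(original))
puts copy.values["retries"]   # 3
puts copy.source.inspect      # :defaults
```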
Rubinius::VariableScope
A variable scope represents the state of the current execution context. It contains all of the local variables in the current scope, the execution context currently in scope, the current self, and several other characteristics.
I wrote about the variable scope before. It's one of my favorite Rubinius constructs, because it provides a ton of useful runtime information to Ruby that is usually locked away inside the native implementation.
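Standard Ruby exposes a small slice of the same information through Binding. This sketch uses plain Ruby and no Rubinius-specific API, and make_counter is a made-up name. It shows a proc and the method that created it sharing one set of locals, which is the kind of state a variable scope holds.

```ruby
# Hypothetical helper: returns a proc plus the binding of the scope
# in which that proc was created.
def make_counter
  count = 0
  increment = proc { count += 1 }
  [increment, binding]
end

increment, scope = make_counter
increment.call
increment.call

# Both views see the same `count` variable.
puts scope.local_variable_get(:count)               # 2
puts increment.binding.local_variable_get(:count)   # 2
```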
Dumping and loading the VariableScope is also easy:
class VariableScope
  def _dump(depth)
    Marshal.dump([@method, @module, @parent, @self, nil, locals])
  end

  def self._load(string)
    VariableScope.synthesize(*Marshal.load(string))
  end
end
The synthesize method is new to Rubinius master; getting a new variable scope previously required synthesizing its locals using class_eval, and the new method is better.
Rubinius::BlockEnvironment
A Proc is basically nothing but a wrapper around a Rubinius::BlockEnvironment, which wraps up all of the objects we've been working with so far. Its scope attribute is a VariableScope and its code attribute is a CompiledMethod.
Dumping it should be quite familiar by now.
class BlockEnvironment
  def marshal_dump
    [@scope, @code]
  end

  def marshal_load(array)
    scope, code = *array
    under_context scope, code
  end
end
The only thing new here is the under_context method, which gives a BlockEnvironment its variable scope and compiled method. Note that we dumped the static scope along with the compiled method above.
Proc
Finally, a Proc is just a wrapper around a BlockEnvironment, so dumping it is easy:
class Proc
  def _dump(depth)
    Marshal.dump(@block)
  end

  def self._load(string)
    block = Marshal.load(string)
    self.__from_block__(block)
  end
end
The __from_block__ method constructs a new Proc from a BlockEnvironment.
So there you have it: dumping and reloading Proc objects in pure Ruby using Rubinius! (The full source is at https://gist.github.com/1378518.)