Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

How to Marshal Procs Using Rubinius

The primary reason I enjoy working with Rubinius is that it exposes, to Ruby, much of the internal machinery that controls the runtime semantics of the language. Further, it exposes that machinery primarily in order to enable user-facing semantics that are typically implemented in the host language (C for MRI, C and C++ for MacRuby, Java for JRuby) to be implemented in Ruby itself.

There is, of course, quite a bit of low-level functionality in Rubinius implemented in C++, but a surprising number of things are implemented in pure Ruby.

One example is the Binding object. To create a new binding in Rubinius, you call Binding.setup:

def self.setup(variables, code, static_scope, recv=nil)
  bind = allocate()
 
  bind.self = recv || variables.self
  bind.variables = variables
  bind.code = code
  bind.static_scope = static_scope
  return bind
end

This method takes a number of more primitive constructs, which I will explain as this article progresses. The important point is that the constructs that make up the high-level Ruby Binding can be described, and assembled, in pure Ruby.

In fact, Rubinius implements Kernel#binding itself in terms of Binding.setup.

def binding
  return Binding.setup(
    Rubinius::VariableScope.of_sender,
    Rubinius::CompiledMethod.of_sender,
    Rubinius::StaticScope.of_sender,
    self)
end

Yes, you’re reading that right. Rubinius exposes the ability to extract the constructs that make up a binding, one at a time, from a caller’s scope. And this is not just a hack (like Binding.of_caller for a short time in MRI). It’s core to how Rubinius manages eval, which of course makes heavy use of bindings.
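For comparison, standard Ruby only hands you the finished Binding, not the pieces it was built from. Here is a minimal sketch in plain Ruby of what a binding carries: the local variables and the self of the place where it was created. The Counter class and its capture method are hypothetical names used only for illustration.

```ruby
# Hypothetical class, used only so that self is something recognizable.
class Counter
  def capture
    count = 41   # a local variable in this method's scope
    binding      # Kernel#binding packages up the locals and self
  end
end

counter = Counter.new
b = counter.capture

# The binding remembers the local variable...
captured_count = b.local_variable_get(:count)
# ...and the receiver (self) of the method it came from.
captured_self = b.receiver
# eval runs new code inside that captured context.
evaluated = eval("count + 1", b)
```

Everything eval needs is reachable through the binding, which is why an implementation built around bindings has to be able to construct them from their parts.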

Marshalling Procs

For a while, I have wanted the ability to Marshal.dump a proc in Ruby. MRI has historically disallowed it, but there’s nothing conceptually impossible about it. A proc itself is a blob of executable code, a local variable scope (which is just a bunch of pointers to other objects), and a constant lookup scope. Rubinius exposes each of these constructs to Ruby, so Marshaling a proc simply means figuring out how to Marshal each of these constructs.
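To see the restriction for yourself, here is a small sketch in plain Ruby. It assumes an interpreter, such as MRI, without the patches described in this article; the variable names are only for illustration.

```ruby
greeting = "hello"
shout = proc { greeting.upcase }   # closes over the local `greeting`

begin
  Marshal.dump(shout)
  dumped = true
rescue TypeError => e
  # MRI refuses to serialize Proc objects.
  dumped = false
  error_message = e.message
end
```

The proc itself works fine; it is only the serialization step that the runtime rejects.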

Let’s take a quick detour to learn about the constructs in question.

Rubinius::StaticScope

Rubinius represents Ruby’s constant lookup scope as a Rubinius::StaticScope object. Perhaps the easiest way to understand it would be to look at Ruby’s built-in Module.nesting function.

module Foo
  p Module.nesting
 
  module Bar
    p Module.nesting
  end
end
 
module Foo::Bar
  p Module.nesting
end
 
# Output:
# [Foo]
# [Foo::Bar, Foo]
# [Foo::Bar]

Every execution context in Rubinius has a Rubinius::StaticScope, which may optionally have a parent scope. In general, the top static scope (the static scope with no parent) in any execution context is Object.

Because Rubinius allows us to get the static scope of a calling method, we can implement Module.nesting in Rubinius:

def nesting
  scope = Rubinius::StaticScope.of_sender
  nesting = []
  while scope and scope.module != Object
    nesting << scope.module
    scope = scope.parent
  end
  nesting
end

A static scope also has an additional property called current_module, which is used during class_eval to define which module the runtime should add new methods to.
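The observable effect of that split can be sketched in plain Ruby. This shows the behavior that, as I understand it, current_module exists to support; it does not touch any Rubinius internals, and Outer, Target and lookup are hypothetical names.

```ruby
module Outer
  NAME = "outer"
  class Target; end

  Target.class_eval do
    # `def` adds the method to Target...
    def lookup
      NAME   # ...but NAME is still resolved lexically, through Outer.
    end
  end
end

result            = Outer::Target.new.lookup
defined_on_target = Outer::Target.instance_methods(false).include?(:lookup)
target_has_name   = Outer::Target.const_defined?(:NAME)
```

Inside the block, the module used for constant lookup and the module that receives new methods are two different things, which is why a scope needs to track both.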

Adding Marshal.dump support to a static scope is therefore quite easy:

class Rubinius::StaticScope
  def marshal_dump
    [@module, @current_module, @parent]
  end
 
  def marshal_load(array)
    @module, @current_module, @parent = array
  end
end

These three instance variables are defined as Rubinius slots, which means that they are fully accessible to Ruby as instance variables, but don’t show up in the instance_variables list. As a result, we need to explicitly dump the instance variables that we care about and reload them later.
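The marshal_dump and marshal_load hooks are part of standard Ruby, so the same pattern can be tried anywhere. A minimal sketch, using a hypothetical Temperature class that chooses explicitly which state to serialize:

```ruby
# Hypothetical class with a cache we do not want serialized.
class Temperature
  attr_reader :celsius

  def initialize(celsius)
    @celsius = celsius
    @cache = {}   # transient state, rebuilt on load
  end

  # Return only the state worth saving.
  def marshal_dump
    [@celsius]
  end

  # Called on a freshly allocated object with whatever marshal_dump returned.
  def marshal_load(array)
    @celsius = array.first
    @cache = {}
  end
end

copy = Marshal.load(Marshal.dump(Temperature.new(21.5)))
```

Note that marshal_load runs on an allocated but uninitialized object, so it has to set up every instance variable the object needs.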

Rubinius::CompiledMethod

A compiled method holds the information necessary to execute a blob of Ruby code. Some important parts of a compiled method are its instruction sequence (a list of the compiled instructions for the code), a list of any literals it has access to, names of local variables, its method signature, and a number of other important characteristics.

It’s actually quite a complex structure, but Rubinius already knows how to convert an in-memory CompiledMethod into a String, as it dumps compiled Ruby code into compiled files as part of its normal operation. There is one small caveat: the String form that Rubinius uses for a compiled method does not include its static scope, so we will need to include the static scope separately in the marshaled form. Since we already told Rubinius how to marshal a static scope, this is easy.

class Rubinius::CompiledMethod
  def _dump(depth)
    Marshal.dump([@scope, Rubinius::CompiledFile::Marshal.new.marshal(self)])
  end
 
  def self._load(string)
    scope, dump = Marshal.load(string)
    cm = Rubinius::CompiledFile::Marshal.new.unmarshal(dump)
    cm.scope = scope
    cm
  end
end
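The _dump and _load pair is the other standard Marshal protocol: the instance returns a String, and a class method rebuilds the object from it. Here is a minimal plain-Ruby sketch that mirrors the nested Marshal.dump trick used above; the Version class is hypothetical.

```ruby
# Hypothetical class used only to demonstrate the _dump/_load protocol.
class Version
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Must return a String; here we nest another Marshal.dump inside it.
  def _dump(depth)
    Marshal.dump([@major, @minor])
  end

  # Receives that String back and builds a fresh object.
  def self._load(string)
    new(*Marshal.load(string))
  end
end

restored = Marshal.load(Marshal.dump(Version.new(1, 9)))
```

Because _load is a class method, it can route construction through any factory it likes, which is what makes it a good fit for objects that cannot simply be allocated and filled in.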

Rubinius::VariableScope

A variable scope represents the state of the current execution context. It contains all of the local variables in the current scope, the execution context currently in scope, the current self, and several other characteristics.

I wrote about the variable scope before. It’s one of my favorite Rubinius constructs, because it provides a ton of useful runtime information to Ruby that is usually locked away inside the native implementation.

Dumping and loading the VariableScope is also easy:

class Rubinius::VariableScope
  def _dump(depth)
    Marshal.dump([@method, @module, @parent, @self, nil, locals])
  end
 
  def self._load(string)
    # self is Rubinius::VariableScope here, so call synthesize directly
    synthesize(*Marshal.load(string))
  end
end

The synthesize method is new to Rubinius master; getting a new variable scope previously required synthesizing its locals using class_eval, and the new method is better.

Rubinius::BlockEnvironment

A Proc is basically nothing but a wrapper around a Rubinius::BlockEnvironment, which wraps up all of the objects we’ve been working with so far. Its scope attribute is a VariableScope and its code attribute is a CompiledMethod.
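Plain Ruby offers a glimpse of the scope a proc wraps. In this sketch (make_counter is a hypothetical helper), two procs created in the same method share one set of locals, and that shared scope is reachable through Proc#binding:

```ruby
# Hypothetical helper that returns two procs closing over the same local.
def make_counter
  count = 0
  increment = proc { count += 1 }
  read      = proc { count }
  [increment, read]
end

increment, read = make_counter
increment.call
increment.call

# Both procs see the same variable scope, not a copy of it.
current = read.call
# The captured scope is also visible through the proc's binding.
via_binding = increment.binding.local_variable_get(:count)
```

This is why serializing a proc means serializing its variable scope along with its code: the code alone has no idea what count is.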

Dumping it should be quite familiar by now.

class Rubinius::BlockEnvironment
  def marshal_dump
    [@scope, @code]
  end
 
  def marshal_load(array)
    scope, code = array
    under_context scope, code
  end
end

The only thing new here is the under_context method, which gives a BlockEnvironment its variable scope and compiled method. Note that we dumped the static scope along with the compiled method above.

Proc

Finally, a Proc is just a wrapper around a BlockEnvironment, so dumping it is easy:

class Proc
  def _dump(depth)
    Marshal.dump(@block)
  end
 
  def self._load(string)
    block = Marshal.load(string)
    self.__from_block__(block)
  end
end

The __from_block__ method constructs a new Proc from a BlockEnvironment.

So there you have it. Dumping and reloading Proc objects in pure Ruby using Rubinius! (the full source is at https://gist.github.com/1378518).

8 Responses to “How to Marshal Procs Using Rubinius”

Cute! It is actually possible to do this from Ruby in JRuby too. It isn’t necessary for an impl to be implemented in Ruby for this to be possible…just accessible from Ruby.

In JRuby, everything…even the parts that are in Java…is accessible from Ruby. You can get raw access to a binding, scopes, class and module internals…whatever. But we don’t normally expose any of it since we believe it is important not to expose impl-specific logic to user code unless the user opts in. Miraculous things might be possible if users access JRuby internals directly, but that’s not Ruby.

In any case, it’s neat to see Rubinius can do this too. I hope nobody ever does it :) If you want procs to be marshalable, perhaps we can try to get it added to Ruby instead, rather than encouraging people to use one impl’s internal APIs :)

@Charles the fact that, in Rubinius, these things are exposed to Ruby *in order to be used in its own implementations of core Ruby methods* is significant. It means that learning about their purpose and how they work is very easy, and that they have more of a flavor of public API than Java methods that happen to be exposed to Ruby through the standard JI interface.

I’m not sure how that’s relevant. What you’re showing here is cute…a glimpse inside Rubinius internals. Are you advocating that people should use this in their code? Advocating that users should embrace and extend Ruby in ways that are specific to a single implementation?

Whether the APIs are nice or “Ruby flavored” is entirely beside the point. They could be strawberry flavored and it would still be internal APIs used in an implementation-specific way to produce something…not Ruby. Ruby does not marshal procs. Perhaps it should…and perhaps it’s possible to add it. But I think you’ll agree it’s a bad idea to advocate forking Ruby using one implementation’s features, regardless of how fun and Ruby-flavored it is.

And in case you’re not advocating people doing this in their own code…then so be it, I agree it’s a cute hack :)

I didn’t get the feeling that Yehuda was advocating anything. And this writeup is more than “a cute hack”.

I actually had a need for something similar to this the other day. I was implementing a scriptable system (scripting in Ruby, implemented in Ruby) in which some of the method calls the user could invoke were external and long running (on the order of hours, even days to complete). It would have made sense for me to record the execution context and reload it when I had received notification that their processing requests had completed. Then I could have just resumed the script. But since Ruby didn’t support serializable continuations, I resorted to a different approach. I like knowing that my initial approach is probably possible under a different runtime, even if it isn’t standard Ruby.

Seaside utilized continuations, which made for a very approachable way of developing web apps. Starting with the code above, a similar approach could be adapted in Ruby.

Thanks for sharing Yehuda.

@Jim Your problem sounds like a good candidate for MagLev (http://maglev.github.com/) if I’m not mistaken (which I very well could be).

Having access to your program’s context hardly seems like it can be a bad thing. Call it reflection instead of internals if it makes you feel better. http://gilesbowkett.blogspot.com/2009/07/do-you-believe-in-magic.html

I implemented this many years ago for MRI 1.6 and 1.8 in a library called nodewrap, which was one of the original inspirations for Evan creating Rubinius. The project has now evolved into a more general library called ruby-internal:

http://rubystuff.org/ruby-internal/

It can still be used to marshal procs:

irb(main):005:0> require 'internal/proc'
=> true
irb(main):006:0> Marshal.dump(p)
=> “04\bu:\tProc01\26704\b[\au:22Node::NEWLINE01\24004\b[\ai\374\314\212\350\333{\ni\374\f\214\350\333[\ni03?\32031\”\n(irb)[\ai17i0600i\374D\213\350\333[\ni03?P31\”\n(irb)i\374\f\214\350\333i060i\374\b\213\350\333[\ni03?(31\”\n(irb)i\374D\213\350\333:06+i\374\362\214\350\333i\374\314\212\350\333[\ni03?\37032\”\n(irb)i\374\b\213\350\33300i\374\362\214\350\333[\ni03?\32031\”\n(irb)[\ai17i06000″

It can even marshal methods and entire classes.

I never really did much with this other than implement it, because I always felt like it was a solution looking for a problem. It’s a lot more practical to marshal your state than to marshal your code.

Jim: Serializing continuations is a completely different world. This example serializes only the variables captured by the proc and a bit of additional state. It’s not serializing actual call frames, as would be necessary for serializing continuations. That’s a whole other ball of wax. Maglev is able to do this because it’s designed to support it out of the box…the data on the stack is transportable, unlike the data on the stack of most other VMs.

Brennan: I wouldn’t say it’s a bad thing…just a thing that ties you to a specific impl. In every case I know of, allowing users reflective access to the call stack eventually bites you in the ass. At the very least, it’s extremely difficult to optimize; just ask the folks who have attempted to implement Python’s frame access. I only object to hacks like this because they’ll never work on other implementations with this API, and even if there were a standard API made available it would paint Ruby into a very troublesome corner.

Anyway, like I say, it’s neat to see into Rubinius internals. Here’s the same thing in JRuby…don’t do it: https://gist.github.com/1378616
