Getting Comfortable With Rubinius' Pure-Ruby Internals

You probably know that Rubinius is a Ruby whose implementation is mostly written in Ruby. While that sounds nice in theory, you may not know what that means in practice. Over the past several years, I've contributed on and off to Rubinius, and feel that as Rubinius has matured since the 1.0 release, a lot of its promise has gelled.

One of the great things about Rubinius is that it exposes, as first-class concepts, a lot of things that are merely implicit in other Ruby implementations. For instance, Ruby methods have a variable scope which is implicitly accessible by blocks created in the scope. Rubinius exposes that scope as Rubinius::VariableScope. It also exposes Ruby bindings as richer objects that you can create yourself and use. In this post, I will talk about Rubinius::VariableScope, how it works, and how it fits into the Rubinius codebase.

Prologue

For those of you who don't know much about Rubinius, it's a Ruby implementation whose core classes are mostly written in Ruby. Some functionality, such as the core of the object model and various low-level methods, is written in C++ and exposed to Ruby via the primitive system. However, that functionality is typically fairly low-level, and the vast majority of Rubinius is written on top of those primitives. For example, a Ruby String in Rubinius is a mostly pure-Ruby object with a backing ByteArray, and ByteArray's implementation is mostly written as primitives.

For instance, String's allocate method looks like this:

def self.allocate
  str = super()
  str.__data__ = Rubinius::ByteArray.new(1)
  str.num_bytes = 0
  str.characters = 0
  str
end

If you just want to get to the meat of the matter, feel free to skip this section and go directly to the Internals section below.

In general, Rubinius' code is broken up into three stages. The first stage, called alpha, creates just enough raw Ruby constructs to get to the next stage, the bootstrap stage. It defines methods like Class#new, Kernel#send, Kernel#raise, Kernel#clone, basic methods on Module, Object, String, and Symbol, and some classes on the Rubinius module intended for internal use. It's a pretty short file, weighing in at under 1,000 lines, and I'd encourage you to read it through. It is located in kernel/alpha.rb.

Next, the bootstrap stage creates the basic methods needed on many of Ruby's core classes. In some cases, it defines a simpler version of a common structure, like Rubinius::CompactLookupTable and Rubinius::LookupTable instead of Hash. It also contains Rubinius constructs like the Rubinius::CompiledMethod, Rubinius::MethodTable, Rubinius::StaticScope and Rubinius::VariableScope. Many of these methods are defined as primitives. Primitive methods are hooked up in Ruby; check out the bootstrap definition of Rubinius::VariableScope:

module Rubinius
  class VariableScope
    def self.of_sender
      Ruby.primitive :variable_scope_of_sender
      raise PrimitiveFailure, "Unable to get VariableScope of sender"
    end

    def self.current
      Ruby.primitive :variable_scope_current
      raise PrimitiveFailure, "Unable to get current VariableScope"
    end

    def locals
      Ruby.primitive :variable_scope_locals
      raise PrimitiveFailure, "Unable to get VariableScope locals"
    end

    # To handle Module#private, protected
    attr_accessor :method_visibility
  end
end

The bootstrap definitions are in kernel/bootstrap. The file kernel/bootstrap/load_order.txt defines the order that bootstrap methods are loaded.

Next, Rubinius fleshes out the core classes, mostly using Ruby methods. Primitives are still used in this stage, but only sparingly. For instance, the Hash class is written entirely in Ruby, using a Rubinius::Tuple for its entries. This code can be found in kernel/common.

Some additional code lives in kernel/platform, which contains platform-specific functionality, like File and POSIX functionality, and kernel/delta, which runs after kernel/common. For the full description of the bootstrapping process, check out Rubinius: Bootstrapping.

Internals

One of the really nice things about Rubinius is that many concepts that are internal in traditional Ruby are exposed directly into the Ruby runtime. In many cases, this is so that more of Rubinius can be implemented in pure Ruby. In other words, if a core method in MRI needs some behavior, it is often implemented directly in C and unavailable to custom Ruby code.

One good example is the behavior of $1 in a Ruby method. If you call the =~ method, MRI's internal C code will walk back up to the caller and set an internal match object on it. If you write your own method that uses the =~ method, there is no way for you to set the match object on the caller. However, since Rubinius itself implements these methods in Ruby, it needs a way to perform this operation directly from Ruby code. Check out the Rubinius implementation of =~:

class Regexp
  def =~(str)
    # unless str.nil? because it's nil and only nil, not false.
    str = StringValue(str) unless str.nil?

    match = match_from(str, 0)
    if match
      Regexp.last_match = match
      return match.begin(0)
    else
      Regexp.last_match = nil
      return nil
    end
  end
end

There are a few interesting things going on here. First, Rubinius calls StringValue(str), which is a Ruby method that implements Ruby's String coercion protocol. If you follow the method, you will find that it calls Type.coerce_to(obj, String, :to_str), which then, in pure Ruby, coerces the object into a String. You can use this method anywhere in your code if you want to make use of the String coercion protocol with the standard error messages.
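To make the coercion protocol concrete, here is a minimal sketch of what such a helper might look like in plain Ruby. The name coerce_to_sketch, the Label class, and the error messages are assumptions made for illustration; this is not the Rubinius source, and the real Type.coerce_to may differ in its details.

```ruby
# Hypothetical stand-in for a coercion helper; not the Rubinius implementation.
def coerce_to_sketch(obj, cls, meth)
  # Already the right type: nothing to do.
  return obj if obj.kind_of?(cls)

  # The object must opt in to the protocol by defining the conversion method.
  unless obj.respond_to?(meth)
    raise TypeError, "can't convert #{obj.class} into #{cls}"
  end

  result = obj.__send__(meth)

  # The conversion method must actually return the requested type.
  unless result.kind_of?(cls)
    raise TypeError, "#{obj.class}##{meth} should return #{cls}"
  end

  result
end

# A hypothetical class that opts in to implicit String conversion.
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

coerce_to_sketch(Label.new("hi"), String, :to_str) # returns "hi"
```

Passing an object that does not define to_str, such as an Integer, raises a TypeError in this sketch instead of silently calling to_s.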

Next, Rubinius actually invokes the match call, which ends up terminating in a primitive called search_region, which binds directly into the Oniguruma regular expression engine. The interesting part is next: Regexp.last_match = match. This code sets the $1 property on the caller, from Ruby. This means that if you write a custom method involving regular expressions and want it to obey the normal $1 protocol, you can, because if Rubinius needs functionality like this, it is by definition exposed to Ruby.

VariableScope

One result of this is that you can get access to internal structures like the current "variable scope". A variable scope is an object that wraps the notion of which locals are available (both real locals and locals defined using eval), as well as the current contents of each of those variables. If the current context is a block, a VariableScope also has a pointer to the parent VariableScope. You can get access to the current VariableScope by using VariableScope.current.

def foo
  x = 1
  p [:root, Rubinius::VariableScope.current.locals]
  ["a", "b", "c"].each do |item|
    p [:parent, Rubinius::VariableScope.current.parent.locals]
    p [:item, Rubinius::VariableScope.current.locals]
  end
end

foo

The output will be:

[:root, #<Rubinius::Tuple: 1>]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "a">]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "b">]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "c">]

A Rubinius::Tuple is a Ruby object that is similar to an Array, but is fixed-size, simpler, and usually more performant. The VariableScope has a number of additional useful methods, which you can learn about by reading kernel/bootstrap/variable_scope.rb and kernel/common/variable_scope.rb. One example: locals_layout gives you the names of the local variables for each slot in the tuple. You can also find out the current visibility of methods that would be declared in the scope (for instance, if you call private, the VariableScope after that call will have a method_visibility of :private).
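If you don't have a Rubinius build handy, the closest portable analogue is Binding, which is far less capable. The sketch below assumes Ruby 2.2 or later, where Binding#local_variables and Binding#local_variable_get are available; snapshot_locals and binding_demo are hypothetical names used only for illustration. Unlike VariableScope, a Binding flattens the block's locals and the enclosing method's locals into one list and offers no parent chain to walk.

```ruby
# Hypothetical helper: turn a Binding into a Hash of local names to values.
def snapshot_locals(b)
  b.local_variables.each_with_object({}) do |name, acc|
    acc[name] = b.local_variable_get(name)
  end
end

def binding_demo
  x = 1
  ["a"].map do |item|
    # The block's binding sees both its own local (item) and the method's (x).
    snapshot_locals(binding)
  end
end

binding_demo.first # a Hash containing :item => "a" and :x => 1
```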

Perhaps the most exciting thing about VariableScope is that you can look up the VariableScope of your calling method. Check this out:

def root
  x = 1
  ["a", "b", "c"].each do |item|
    call_me
  end
end

def call_me
  sender = Rubinius::VariableScope.of_sender
  p [:parent, sender.parent.locals]
  p [:item, sender.locals]
end

root

Here, you can get the VariableScope of the caller and read any live information from it. You can tell that the method was called from inside a block, what the locals inside the block contain, and what block contexts exist up to the method root. This is extremely powerful, and gives you a level of introspection into a running Ruby program heretofore not possible (or very difficult). Better yet, the existence of this feature doesn't slow down the rest of the program, because Rubinius uses it internally for common Ruby functionality like the implementation of private and protected.

Implementing caller_locals

Now that we understand the VariableScope object, what if we want to print out a list of all of the local variables in the caller of a method?

def main
  a = "Hello"

  [1,2,3].each do |b|
    say_hello do
      puts "#{a}: #{b}"
    end
  end
end

def say_hello
  # this isn't working as we expect, and we want to look at
  # the local variables of the calling method to see what's
  # going on
  yield
ensure
  puts "Goodbye"
end

Let's implement a method called caller_locals which will give us all of the local variables in the calling method. We will not be able to use VariableScope.of_sender, since that only goes one level back, and since we will implement caller_locals in Ruby, we will need to go two levels back. Thankfully, Rubinius::VM.backtrace has the information we need.

Rubinius::VM.backtrace takes two parameters. The first is the number of levels back to start from. In this case we want to start two levels back (the caller will be the say_hello method, and we want its caller). The second parameter indicates whether the Location objects should contain the VariableScope, which is what we want.

NOTE: Yes, you read that right. In Rubinius, you can obtain a properly object-oriented backtrace object, where each entry is a Location object. Check out kernel/common/location.rb to see the definition of the class.

In this case, if we pass true to Rubinius::VM.backtrace, each Location will have a method called variables, which returns a Rubinius::VariableScope object.

def caller_locals
  variables = Rubinius::VM.backtrace(2, true).first.variables
end

The next thing you need to know is that each block in a method has its own VariableScope and a parent pointing at the VariableScope for the parent block.

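Each VariableScope also has a pointer to its method, an object that, among other things, keeps a list of the local variable names. That list of names and the scope's tuple of local values are ordered the same way, so the name at a given index belongs to the value at the same index.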

To collect up the locals from the caller, we will walk this chain of VariableScope objects until we get nil.

def caller_locals
  variables = Rubinius::VM.backtrace(2, true).first.variables

  locals = []

  while variables
    names  = variables.method.local_names
    values = variables.locals

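    # Build the Hash from name/value pairs directly; flattening would
    # break as soon as a local variable holds an Array.
    locals << Hash[names.zip(values)]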
    variables = variables.parent
  end

  locals
end

This will return an Array of local variable Hashes, starting from the caller's direct scope and working up to the method body itself. Let's change the original method to use caller_locals:

def say_hello
  # this isn't working as we expect, and we want to look at
  # the local variables of the calling method to see what's
  # going on. Let's use caller_locals.
  p caller_locals
  yield
ensure
  puts "Goodbye"
end

If we run the program now, the output will be:

[{:b=>1}, {:a=>"Hello"}]
Goodbye
[{:b=>2}, {:a=>"Hello"}]
Goodbye
[{:b=>3}, {:a=>"Hello"}]
Goodbye
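To experiment with the walk itself outside of Rubinius, you can model the scope chain with a plain Struct. This is only a sketch: FakeScope and collect_locals are hypothetical names, with names standing in for method.local_names and values standing in for locals.

```ruby
# FakeScope is a hypothetical stand-in for Rubinius::VariableScope.
# names plays the role of method.local_names, values the role of locals.
FakeScope = Struct.new(:names, :values, :parent)

# Walk from the innermost scope outward until there is no parent left.
def collect_locals(scope)
  locals = []

  while scope
    # Names and values share the same ordering, so zip pairs them up.
    locals << Hash[scope.names.zip(scope.values)]
    scope = scope.parent
  end

  locals
end

# Model the first iteration of main: a block scope nested in a method scope.
method_scope = FakeScope.new([:a], ["Hello"], nil)
block_scope  = FakeScope.new([:b], [1], method_scope)

collect_locals(block_scope) # an Array equal to [{:b => 1}, {:a => "Hello"}]
```

The innermost scope comes first, which matches the ordering caller_locals produces above.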

This is just the tip of the iceberg of what is possible because of Rubinius' architecture. To be clear, these features are not accidents; because so much of Rubinius (including things like backtraces) is implemented in Ruby, it is forced to expose details of its internals to normal Ruby programs. And because the Rubinius kernel is real Ruby (and not some kind of special Ruby subset), anything that works in the kernel will work in your Ruby code as well.