Getting Comfortable With Rubinius' Pure-Ruby Internals
You probably know that Rubinius is a Ruby whose implementation is mostly written in Ruby. While that sounds nice in theory, you may not know what that means in practice. Over the past several years, I've contributed on and off to Rubinius, and feel that as Rubinius has matured since the 1.0 release, a lot of its promise has gelled.
One of the great things about Rubinius is that it exposes, as first-class concepts, a lot of things that are merely implicit in other Ruby implementations. For instance, Ruby methods have a variable scope which is implicitly accessible by blocks created in the scope. Rubinius exposes that scope as Rubinius::VariableScope. It also exposes Ruby bindings as richer objects that you can create yourself and use. In this post, I will talk about Rubinius::VariableScope, how it works, and how it fits into the Rubinius codebase.
Prologue
For those of you who don't know much about Rubinius, it's a Ruby implementation whose core classes are mostly written in Ruby. Some functionality, such as the core of the object model and low-level methods, is written in C++ and exposed to Ruby via its primitive system. However, that functionality is typically fairly low-level, and the vast majority of the functionality is written on top of those primitives. For example, a Ruby String in Rubinius is a mostly pure-Ruby object with a backing ByteArray, and ByteArray's implementation is mostly written as a primitive.
For instance, String's allocate method looks like this:
def self.allocate
  str = super()
  str.__data__ = Rubinius::ByteArray.new(1)
  str.num_bytes = 0
  str.characters = 0
  str
end
If you just want to get to the meat of the matter, feel free to skip this section and go directly to the Internals section below.
In general, Rubinius' code is broken up into three stages. The first stage, called alpha, creates just enough raw Ruby constructs to get to the next stage, the bootstrap stage. It defines methods like Class#new, Kernel#send, Kernel#raise, Kernel#clone, basic methods on Module, Object, String, and Symbol, and some classes on the Rubinius module that are used internally. It's a pretty short file, weighing in at under 1,000 lines, and I'd encourage you to read it through. It is located in kernel/alpha.rb.
Next, the bootstrap stage creates the basic methods needed on many of Ruby's core classes. In some cases, it defines a simpler version of a common structure, like Rubinius::CompactLookupTable and Rubinius::LookupTable instead of Hash. It also contains Rubinius constructs like Rubinius::CompiledMethod, Rubinius::MethodTable, Rubinius::StaticScope and Rubinius::VariableScope. Many of these methods are defined as primitives. Primitive methods are hooked up in Ruby; check out the bootstrap definition of Rubinius::VariableScope:
module Rubinius
  class VariableScope
    def self.of_sender
      Ruby.primitive :variable_scope_of_sender
      raise PrimitiveFailure, "Unable to get VariableScope of sender"
    end

    def self.current
      Ruby.primitive :variable_scope_current
      raise PrimitiveFailure, "Unable to get current VariableScope"
    end

    def locals
      Ruby.primitive :variable_scope_locals
      raise PrimitiveFailure, "Unable to get VariableScope locals"
    end

    # To handle Module#private, protected
    attr_accessor :method_visibility
  end
end
The bootstrap definitions are in kernel/bootstrap. The file kernel/bootstrap/load_order.txt defines the order in which bootstrap methods are loaded.
Next, Rubinius fleshes out the core classes, mostly using Ruby methods. There is some primitive use in this stage, but it is few and far between. For instance, the Hash class is written entirely in Ruby, using a Rubinius::Tuple for its entries. This code can be found in kernel/common.
Some additional code lives in kernel/platform, which contains platform-specific functionality, like File and POSIX functionality, and kernel/delta, which runs after kernel/common. For the full description of the bootstrapping process, check out Rubinius: Bootstrapping.
Internals
One of the really nice things about Rubinius is that many concepts that are internal in traditional Ruby are exposed directly into the Ruby runtime. In many cases, this is so that more of Rubinius can be implemented in pure Ruby. In other words, if a core method in MRI needs some behavior, it is often implemented directly in C and unavailable to custom Ruby code.
One good example is the behavior of $1 in a Ruby method. If you call the =~ method, MRI's internal C code will walk back up to the caller and set an internal match object on it. If you write your own method that uses the =~ method, there is no way for you to set the match object on the caller. However, since Rubinius itself implements these methods in Ruby, it needs a way to perform this operation directly from Ruby code. Check out the Rubinius implementation of =~:
class Regexp
  def =~(str)
    # unless str.nil? because it's nil and only nil, not false.
    str = StringValue(str) unless str.nil?
    match = match_from(str, 0)
    if match
      Regexp.last_match = match
      return match.begin(0)
    else
      Regexp.last_match = nil
      return nil
    end
  end
end
There are a few interesting things going on here. First, Rubinius calls StringValue(str), which is a Ruby method that implements Ruby's String coercion protocol. If you follow the method, you will find that it calls Type.coerce_to(obj, String, :to_str), which then, in pure Ruby, coerces the object into a String. You can use this method anywhere in your code if you want to make use of the String coercion protocol with the standard error messages.
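To make the coercion protocol concrete, here is a minimal sketch of it in portable Ruby. The method name string_value and the class Nickname are hypothetical, and the error messages are illustrative rather than copied from Rubinius; the real logic lives in Type.coerce_to.

```ruby
# Hypothetical stand-in for StringValue / Type.coerce_to(obj, String, :to_str).
# The error messages are illustrative, not the ones Rubinius uses.
def string_value(obj)
  # Already a String: nothing to coerce.
  return obj if obj.is_a?(String)

  # The protocol only applies to objects that opt in by defining to_str.
  unless obj.respond_to?(:to_str)
    raise TypeError, "can't convert #{obj.class} into String"
  end

  result = obj.to_str

  # to_str must actually hand back a String.
  unless result.is_a?(String)
    raise TypeError, "#{obj.class}#to_str should return String"
  end

  result
end

# A hypothetical class that opts in to implicit String conversion.
class Nickname
  def initialize(name)
    @name = name
  end

  def to_str
    @name
  end
end

string_value("plain")                # returns "plain"
string_value(Nickname.new("wycats")) # returns "wycats"
```

An object that does not define to_str, such as an Integer, makes this sketch raise a TypeError instead of silently calling to_s.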
Next, Rubinius actually invokes the match call, which ends up terminating in a primitive called search_region, which binds directly into the Oniguruma regular expression engine. The interesting part is next: Regexp.last_match = match. This code sets the match object behind $1 on the caller, from Ruby. This means that if you write a custom method involving regular expressions and want it to obey the normal $1 protocol, you can, because if Rubinius needs functionality like this, it is by definition exposed to Ruby.
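The limitation that motivates this is easy to see in plain Ruby. In the sketch below (custom_match and caller_side are hypothetical names), the match data lands in the frame of the method that ran =~, so the method that called the wrapper never sees it:

```ruby
# Wraps =~ in a custom method. The match data is recorded in the frame of
# custom_match, not in the frame of whoever called it.
def custom_match(pattern, string)
  pattern =~ string
  $1 # visible here, inside the frame that ran =~
end

def caller_side
  inner = custom_match(/(\w+)@/, "wycats@example.com")
  # No match was ever recorded in this frame, so $~ is still nil here.
  [inner, $~]
end

p caller_side # prints ["wycats", nil]
```

Without something like Regexp.last_match=, custom_match has no way to push the match object one frame further up.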
VariableScope
One result of this is that you can get access to internal structures like the current "variable scope". A variable scope is an object that wraps the notion of which locals are available (both real locals and locals defined using eval), as well as the current contents of each of those variables. If the current context is a block, a VariableScope also has a pointer to the parent VariableScope. You can get access to the current VariableScope by using Rubinius::VariableScope.current.
def foo
  x = 1
  p [:root, Rubinius::VariableScope.current.locals]
  ["a", "b", "c"].each do |item|
    p [:parent, Rubinius::VariableScope.current.parent.locals]
    p [:item, Rubinius::VariableScope.current.locals]
  end
end

foo
The output will be:
[:root, #<Rubinius::Tuple: 1>]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "a">]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "b">]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "c">]
A Rubinius::Tuple is a Ruby object that is similar to an Array, but is fixed-size, simpler, and usually more performant. The VariableScope has a number of additional useful methods, which you can learn about by reading kernel/bootstrap/variable_scope.rb and kernel/common/variable_scope.rb. One example: locals_layout gives you the names of the local variables for each slot in the tuple. You can also find out the current visibility of methods that would be declared in the scope (for instance, if you call private, the VariableScope after that call will have a method_visibility of :private).
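The behavior that method_visibility tracks can be observed from plain Ruby, without touching any Rubinius API. The sketch below (the class Example is hypothetical) shows only the visible effect: a bare private changes the default for the rest of the current class body, and a fresh class body starts out public again.

```ruby
class Example
  def visible
    :visible
  end

  private # no arguments: switches the default for the rest of this class body

  def hidden
    :hidden
  end
end

# Reopening the class starts a fresh scope, so the default is public again.
class Example
  def later
    :later
  end
end

p Example.public_method_defined?(:visible)  # prints true
p Example.private_method_defined?(:hidden)  # prints true
p Example.public_method_defined?(:later)    # prints true
```

Because the visibility is a property of the scope rather than of the class, the second class body is unaffected by the private call in the first.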
Perhaps the most exciting thing about VariableScope is that you can look up the VariableScope of your calling method. Check this out:
def root
  x = 1
  ["a", "b", "c"].each do |item|
    call_me
  end
end

def call_me
  sender = Rubinius::VariableScope.of_sender
  p [:parent, sender.parent.locals]
  p [:item, sender.locals]
end

root
Here, you get the VariableScope of the caller and can read any live information from it. You can tell that the method was called from inside a block, what the locals inside the block contain, and what block contexts exist up to the method root. This is extremely powerful, and gives you a level of introspection into a running Ruby program that was previously impossible (or very difficult). Better yet, the existence of this feature doesn't slow down the rest of the program, because Rubinius uses it internally for common Ruby functionality like the implementation of private and protected.
Implementing caller_locals
Now that we understand the VariableScope object, what if we want to print out a list of all of the local variables in the caller of a method?
def main
  a = "Hello"
  [1,2,3].each do |b|
    say_hello do
      puts "#{a}: #{b}"
    end
  end
end

def say_hello
  # this isn't working as we expect, and we want to look at
  # the local variables of the calling method to see what's
  # going on
  yield
ensure
  puts "Goodbye"
end
Let's implement a method called caller_locals which will give us all of the local variables in the calling method. We will not be able to use VariableScope.of_sender, since that only goes one level back, and since we will implement caller_locals in Ruby, we will need to go two levels back. Thankfully, Rubinius::VM.backtrace has the information we need.
Rubinius::VM.backtrace takes two parameters. The first is the number of levels back to start from. In this case we want to start two levels back (the caller will be the say_hello method, and we want its caller). The second parameter indicates whether the Location objects should contain the VariableScope, which is what we want.
NOTE: Yes, you read that right. In Rubinius, you can obtain a properly object-oriented backtrace object, where each entry is a Location object. Check out kernel/common/location.rb to see the definition of the class.
In this case, if we pass true to Rubinius::VM.backtrace, each Location will have a method called variables, which returns a Rubinius::VariableScope object.
def caller_locals
  variables = Rubinius::VM.backtrace(2, true).first.variables
end
The next thing you need to know is that each block in a method has its own VariableScope and a parent pointing at the VariableScope for the parent block.
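Each VariableScope also has a pointer to its method, an object that, among other things, keeps a list of the local variable names. That list of names and the scope's locals tuple are ordered the same way, so the name at a given index belongs to the value at the same index.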
To collect up the locals from the caller, we will walk this chain of VariableScope objects until we get nil.
def caller_locals
  variables = Rubinius::VM.backtrace(2, true).first.variables
  locals = []
  while variables
    names = variables.method.local_names
    values = variables.locals
    # flatten(1) removes only the pairing level, so a local whose value
    # is itself an Array is kept intact
    locals << Hash[*names.zip(values).flatten(1)]
    variables = variables.parent
  end
  locals
end
This will return an Array of local variable Hashes, starting from the caller's direct scope and working up to the method body itself. Let's change the original method to use caller_locals:
def say_hello
  # this isn't working as we expect, and we want to look at
  # the local variables of the calling method to see what's
  # going on. Let's use caller_locals.
  p caller_locals
  yield
ensure
  puts "Goodbye"
end
If we run the program now, the output will be:
[{:b=>1}, {:a=>"Hello"}]
Goodbye
[{:b=>2}, {:a=>"Hello"}]
Goodbye
[{:b=>3}, {:a=>"Hello"}]
Goodbye
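If you don't have Rubinius handy, the scope-chain walk that caller_locals performs can be modeled in portable Ruby. This is only a sketch: ToyScope and collect_locals are hypothetical stand-ins for Rubinius::VariableScope and the loop above, and the scopes are built by hand rather than taken from a backtrace.

```ruby
# ToyScope is a hypothetical stand-in for Rubinius::VariableScope: it pairs
# an ordered list of local names with an equally ordered list of values and
# points at the scope that encloses it (nil for the method body).
ToyScope = Struct.new(:names, :values, :parent)

# Walk from the innermost scope outward, building one Hash per scope.
def collect_locals(scope)
  locals = []
  while scope
    # Pair each name with the value stored in the same slot.
    locals << Hash[scope.names.zip(scope.values)]
    scope = scope.parent
  end
  locals
end

# Hand-built equivalent of the first iteration of main: the block scope
# holds b and points at the method scope, which holds a.
method_scope = ToyScope.new([:a], ["Hello"], nil)
block_scope  = ToyScope.new([:b], [1], method_scope)

p collect_locals(block_scope)
```

The result is an Array of two Hashes, block locals first and method locals second, which matches the shape of each line of output above.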
This is just the tip of the iceberg of what is possible because of Rubinius' architecture. To be clear, these features are not accidents; because so much of Rubinius (including things like backtraces) is implemented in Ruby, it is forced to expose details of its internals to normal Ruby programs. And because the Rubinius kernel is real Ruby (and not some kind of special Ruby subset), anything that works in the kernel will work in your Ruby code as well.