Yehuda Katz is a member of the Ruby on Rails core team, and lead developer of the Merb project. He is a member of the jQuery Core Team, and a core contributor to DataMapper. He contributes to many open source projects, like Rubinius and Johnson, and works on some he created himself, like Thor.

@olivierlacan sorry :( my office sucks for podcasts; WiFi and rooms. Was the content bad or just the audio? I wish to improve!

Archive for November, 2009

Metaprogramming in Ruby: It’s All About the Self

After writing my last post on Rails plugin idioms, I realized that Ruby metaprogramming, at its core, is actually quite simple.

It comes down to the fact that all Ruby code is executed code–there is no separate compile or runtime phase. In Ruby, every line of code is executed against a particular self. Consider the following five snippets:

class Person
  def self.species
    "Homo Sapien"
  end
end
 
class Person
  class << self
    def species
      "Homo Sapien"
    end
  end
end
 
class << Person
  def species
    "Homo Sapien"
  end
end
 
Person.instance_eval do
  def species
    "Homo Sapien"
  end
end
 
def Person.species
  "Homo Sapien"
end

All five of these snippets define a Person.species that returns Homo Sapien. Now consider another set of snippets:

class Person
  def name
    "Matz"
  end
end
 
Person.class_eval do
  def name
    "Matz"
  end
end

These snippets all define a method called name on the Person class. So Person.new.name will return “Matz”. For those familiar with Ruby, this isn’t news. When learning about metaprogramming, each of these snippets is presented in isolation: another mechanism for getting methods where they “belong”. In fact, however, there is a single unified reason that all of these snippets work the way they do.

First, it is important to understand how Ruby’s metaclass works. When you first learn Ruby, you learn about the concept of the class, and that each object in Ruby has one:

class Person
end
 
Person.class #=> Class
 
class Class
  def loud_name
    "#{name.upcase}!"
  end
end
 
Person.loud_name #=> "PERSON!"

Person is an instance of Class, so any methods added to Class are available on Person as well. What they don’t tell you, however, is that each object in Ruby also has its own metaclass, a Class that can have methods, but is only attached to the object itself.

matz = Object.new
def matz.speak
  "Place your burden to machine's shoulders"
end

What’s going on here is that we’re adding the speak method to matz‘s metaclass, and the matz object inherits from its metaclass and then Object. The reason this is somewhat less clear than ideal is that the metaclass is invisible in Ruby:

matz = Object.new
def matz.speak
  "Place your burden to machine's shoulders"
end
 
matz.class #=> Object

In fact, matz‘s “class” is its invisible metaclass. We can even get access to the metaclass:

metaclass = class << matz; self; end
metaclass.instance_methods.grep(/speak/) #=> ["speak"]

At this point in other articles on this topic, you’re probably struggling to keep all of the details in your head; it seems as though there are so many rules. And what’s this class << matz thing anyway?

It turns out that all of these weird rules collapse down into a single concept: control over the self in a given part of the code. Let’s go back and take a look at some of the snippets we looked at earlier:

class Person
  def name
    "Matz"
  end
 
  self.name #=> "Person"
end

Here, we are adding the name method to the Person class. Once we say class Person, the self until the end of the block is the Person class itself.

Person.class_eval do
  def name
    "Matz"
  end
 
  self.name #=> "Person"
end

Here, we’re doing exactly the same thing: adding the name method to instances of the Person class. In this case, class_eval is setting the self to Person until the end of the block. This is all perfectly straight forward when dealing with classes, but it’s equally straight forward when dealing with metaclasses:

def Person.species
  "Homo Sapien"
end
 
Person.name #=> "Person"

As in the matz example earlier, we are defining the species method on Person‘s metaclass. We have not manipulated self, but you can see using def with an object attaches the method to the object’s metaclass.

class Person
  def self.species
    "Homo Sapien"
  end
 
  self.name #=> "Person"
end

Here, we have opened the Person class, setting the self to Person for the duration of the block, as in the example above. However, we are defining a method on Person‘s metaclass here, since we’re defining the method on an object (self). Also, you can see that self.name while inside the person class is identical to Person.name while outside it.

class << Person
  def species
    "Homo Sapien"
  end
 
  self.name #=> ""
end

Ruby provides a syntax for accessing an object’s metaclass directly. By doing class << Person, we are setting self to Person‘s metaclass for the duration of the block. As a result, the species method is added to Person‘s metaclass, rather than the class itself.

class Person
  class << self
    def species
      "Homo Sapien"
    end
 
    self.name #=> ""
  end
end

Here, we combine several of the techniques. First, we open Person, making self equal to the Person class. Next, we do class << self, making self equal to Person‘s metaclass. When we then define the species method, it is defined on Person‘s metaclass.

Person.instance_eval do
  def species
    "Homo Sapien"
  end
 
  self.name #=> "Person"
end

The last case, instance_eval, actually does something interesting. It breaks apart the self into the self that is used to execute methods and the self that is used when new methods are defined. When instance_eval is used, new methods are defined on the metaclass, but the self is the object itself.

In some of these cases, the multiple ways to achieve the same thing arise naturally out of Ruby’s semantics. After this explanation, it should be clear that def Person.species, class << Person; def species, and class Person; class << self; def species aren’t three ways to achieve the same thing by design, but that they arise out of Ruby’s flexibility with regard to specifying what self is at any given point in your program.

On the other hand, class_eval is slightly different. Because it take a block, rather than act as a keyword, it captures the local variables surrounding it. This can provide powerful DSL capabilities, in addition to controlling the self used in a code block. But other than that, they are exactly identical to the other constructs used here.

Finally, instance_eval breaks apart the self into two parts, while also giving you access to local variables defined outside of it.

In the following table, defines a new scope means that code inside the block does not have access to local variables outside of the block.

mechanism method resolution method definition new scope?
class Person Person same yes
class << Person Person’s metaclass same yes
Person.class_eval Person same no
Person.instance_eval Person Person’s metaclass no

Also note that class_eval is only available to Modules (note that Class inherits from Module) and is an alias for module_eval. Additionally, instance_exec, which was added to Ruby in 1.8.7, works exactly like instance_eval, except that it also allows you to send variables into the block.

UPDATE: Thank you to Yugui of the Ruby core team for correcting the original post, which ignored the fact that self is broken into two in the case of instance_eval.

Better Ruby Idioms

Carl and I have been working on the plugins system over the past few days. As part of that process, we read through the Rails Plugin Guide. While reading through the guide, we noticed a number of idioms presented in the guide that are serious overkill for the task at hand.

I don’t blame the author of the guide; the idioms presented are roughly the same that have been used since the early days of Rails. However, looking at them brought back memories of my early days using Rails, when the code made me feel as though Ruby was full of magic incantations and ceremony to accomplish relatively simple things.

Here’s an example:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    # any method placed here will apply to classes, like Hickwall
    def acts_as_something
      send :include, InstanceMethods
    end
  end
 
  module InstanceMethods
    # any method placed here will apply to instaces, like @hickwall
  end
end

To begin with, the send is completely unneeded. The acts_as_something method will be run on the Class itself, giving the method access to include, a private method.

This code intended to be used as follows:

class ActiveRecord::Base
  include Yaffle
end
 
class Article < ActiveRecord::Base
  acts_as_yaffle
end

What the code does is:

  1. Register a hook so that when the module is included, the ClassMethods are extended onto the class
  2. In that module, define a method that includes the InstanceMethods
  3. So that you can say acts_as_something in your code

The crazy thing about all of this is that it’s completely reinventing the module system that Ruby already has. This would be exactly identical:

module Yaffle
  # any method placed here will apply to classes, like Hickwall
  def acts_as_something
    send :include, InstanceMethods
  end
 
  module InstanceMethods
    # any method placed here will apply to instances, like @hickwall
  end
end

To be used via:

class ActiveRecord::Base
  extend Yaffle
end
 
class Article < ActiveRecord::Base
  acts_as_yaffle
end

In a nutshell, there’s no point in overriding include to behave like extend when Ruby provides both!

To take this a bit further, you could do:

module Yaffle
  # any method placed here will apply to instances, like @hickwall, 
  # because that's how modules work!
end

To be used via:

class Article < ActiveRecord::Base
  include Yaffle
end

In effect, the initial code (override included hook to extend a method on, which then includes a module) is two layers of abstraction around a simple Ruby include!

Let’s look at a few more examples:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    def acts_as_yaffle(options = {})
      cattr_accessor :yaffle_text_field
      self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    end
  end
end
 
ActiveRecord::Base.send :include, Yaffle

Again, we have the idiom of overriding include to behave like extend (instead of just using extend!).

A better solution:

module Yaffle
  def acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = options[:yaffle_text_field].to_s || "last_squawk"
  end
end
 
ActiveRecord::Base.extend Yaffle

In this case, it’s appropriate to use an acts_as_yaffle, since you’re providing additional options which could not be encapsulated using the normal Ruby extend.

Another “more advanced” case:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    def acts_as_yaffle(options = {})
      cattr_accessor :yaffle_text_field
      self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
      send :include, InstanceMethods
    end
  end
 
  module InstanceMethods
    def squawk(string)
      write_attribute(self.class.yaffle_text_field, string.to_squawk)
    end
  end
end
 
ActiveRecord::Base.send :include, Yaffle

Again, we have the idiom of overriding include to pretend to be an extend, and a send where it is not needed. Identical functionality:

module Yaffle
  def acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    include InstanceMethods
  end
 
  module InstanceMethods
    def squawk(string)
      write_attribute(self.class.yaffle_text_field, string.to_squawk)
    end
  end
end
 
ActiveRecord::Base.extend Yaffle

Of course, it is also possible to do:

module Yaffle
  def squawk(string)
    write_attribute(self.class.yaffle_text_field, string.to_squawk)
  end
end
 
class ActiveRecord::Base
  def self.acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    include Yaffle
  end
end

Since the module is always included in ActiveRecord::Base, there is no reason that the earlier code, with its additional modules and use of extend, is superior to simply reopening the class and adding the acts_as_yaffle method directly. Now we can put the squawk method directly inside the Yaffle module, where it can be included cleanly.

It may not seem like a huge deal, but it significantly reduces the amount of apparent magic in the plugin pattern, making it more accessible for new users. Additionally, it exposes the new user to include and extend quickly, instead of making them feel as though they were magic incantations requiring the use of send and special modules named ClassMethods in order to get them to work.

To be clear, I’m not saying that these idioms aren’t sometimes needed in special, advanced cases. However, I am saying that in the most common cases, they’re huge overkill that obscures the real functionality and confuses users.

Using the New Gem Bundler Today

As you might have heard, Carl and I released a new project that allows you to bundle your gems (both pure-ruby and native) with your application. Before I get into the process for using the bundler today, I’d like to go into the design goals of the project.

  • The bundler should allow the specification of all dependencies in a separate place from the application itself. In other words, it should be possible to determine the dependencies for an application without needing to start up the application.
  • The bundler should have a built-in dependency resolving mechanism, so it can determine the required gems for an entire set of dependencies.
  • Once the dependencies are resolved, it should be possible to get the application up and running on a new system without needing to check Rubyforge (or gemcutter) again. This is especially important for compiled gems (it should be possible to get the list of required gems once and compile on remote systems as desired).
  • Above all else, the bundler should provide a reproducible installation of Ruby applications. New gem releases or down remote servers should not be able to impact the successful installation of an application. In most cases, git clone; gem bundle should be all that is needed to get an application on a new system and up and running.
  • Finally, the bundler should not assume anything about Rails applications. While it should work flawlessly in the context of a Rails application, this should be because a Rails application is a Ruby application.

Using the Bundler Today

To use the gem bundler today in a non-Rails application, follow the following steps:

  1. gem install bundler
  2. Create a Gemfile in the root of your application
  3. Add dependencies to your Gemfile. See below for more details on the sorts of things you can do in the Gemfile. At the simplest level, gem "gem_name", "version" will add a dependency of the gem and version to your application
  4. At the root, run gem bundle. The bundler should tell you that it is resolving dependencies, then downloading and installing the gems.
  5. Add vendor/gems/gems, vendor/gems/specifications, vendor/gems/doc, and vendor/gems/environment.rb to your .gitignore file
  6. Inside your application, require vendor/gems/environment.rb to add the bundled dependencies to your load paths.
  7. Use Bundler.require_env :optional_environment to actually require the files.
  8. After committing, run gem bundle in a fresh clone to re-expand the gems. Since you left the vendor/gems/cache in your source control, new machines will be guaranteed to use the same files as the original machine, requiring no remote dependencies

The bundler will also install binaries into the app’s bin directory. You can, therefore, run bin/rackup for instance, which will ensure that the local bundle, rather than the system, is used. You can also run gem exec rackup, which runs any command using the local bundle. This allows things like gem exec ruby -e "puts Nokogiri::VERSION" or the even more adventurous gem exec bash, which will open a new shell in the context of the bundle.

Gemfile

You can do any of the following in the Gemfile:

  • gem "name", "version": version may be a strict version or a version requirement like &gt;= 1.0.6. The version is optional.
  • gem "name", "version", :require_as =&gt; "file": the require_as allows you to specify which file should be required when the require_env is called. By default, it is the gem’s name
  • gem "name", "version", :only =&gt; :testing: The environment name can be anything. It is used later in your require_env call. You may specify either :only, or :except constraints
  • gem "name", "version", :git =&gt; "git://github.com/wycats/thor": Specify a git repository to be used to satisfy the dependency. You must use a hard dependency (“1.0.6″) rather than a soft dependency (“>= 1.0.6″). If a .gemspec is found in the repository, it is used for further dependency lookup. If the repository has multiple .gemspecs, each directory will a .gemspec will be considered a gem.
  • gem "name", "version", :git =&gt; "git://github.com/wycats/thor", :branch =&gt; "experimental": Further specify a branch, tag, or ref to use. All of :branch, :tag, and :ref are valid options
  • gem "name", "version", :vendored_at =&gt; "vendor/nokogiri": In the next version of bundler, this option will be changing to :path. This specifies that the dependency can be found in the local file system, rather than remotely. It is resolved relative to the location of the Gemfile
  • clear_sources: Empties the list of gem sources to search inside of.
  • source "http://gems.github.com": Adds a gem source to the list of available gem sources.
  • bundle_path "vendor/my_gems": Changes the default location of bundled gems from vendor/gems
  • bin_path "my_executables": Changes the default location of the installed executables
  • disable_system_gems: Without this command, both bundled gems and system gems will be used. You can therefore have things like ruby-debug in your system and use it. However, it also means that you may be using something in development mode that is installed on your system but not available in production. For this reason, it is best to disable_system_gems
  • disable_rubygems: This completely disables rubygems, reducing startup times considerably. However, it often doesn’t work if libraries you are using depend on features of Rubygems. In this mode, the bundler shims the features of Rubygems that we know people are using, but it’s possible that someone is using a feature we’re unaware of. You are free to try disable_rubygems first, then remove it if it doesn’t work. Note that Rails 2.3 cannot be made to work in this mode
  • only :environment { gem "rails" }: You can use only or except in block mode to specify a number of gems at once

Bundler process

When you run gem bundle, a few things happen. First, the bundler attempts to resolve your list of dependencies against the gems you have already bundled. If they don’t resolve, the metadata for each specified source is fetched and the gems are downloaded. Next (either way), the bundler checks to see whether the downloaded gems are expanded. For any gem that is not yet expanded, the bundler expands it. Finally, the bundler creates the environment.rb file with the new settings. This means that running gem bundler over and over again will be extremely fast, because after the first time, all gems are downloaded and expanded. If you change settings, like disable_rubygems, running gem bundle again will simply regenerate the environment.rb.

Rails 2.3

To get this working with Rails 2.3, you need to create a preinitializer.rb and insert the following:

require "#{File.dirname(__FILE__)}/../vendor/bundler_gems/environment"
 
class Rails::Boot
  def run
    load_initializer
    extend_environment
    Rails::Initializer.run(:set_load_path)
  end
 
  def extend_environment
    Rails::Initializer.class_eval do
      old_load = instance_method(:load_environment)
      define_method(:load_environment) do
        Bundler.require_env RAILS_ENV
        old_load.bind(self).call
      end
    end
  end
end

It’s a bit ugly, but you can copy and paste that code and forget it. Astute readers will notice that we’re using vendor/bundler_gems/environment.rb. This is because Rails 2.3 attaches special, irrevocable meaning to vendor/gems. As a result, make sure to do the following in your Gemfile: bundle_path "vendor/bundler_gems".

Gemcutter uses this setup and it’s working great for them.

Bundler 0.7

We’re going to be releasing Bundler 0.7 tomorrow. It has some new features:

  • List outdated gems by passing --outdated-gems. Bundler conservatively does not update your gems simply because a new version came out that satisfies the requirement. This is so that you can be sure that the versions running on your local machine will make it safely to production. This will allow you to check for outdated gems so you can decide whether to update your gems with –update. Hat tip to manfred, who submitted this patch as part of his submission to the Rumble at Ruby en Rails
  • Specify the build requirements for gems in a YAML file that you specify with --build-options. The file looks something like this:
    mysql:
      config: /path/to/mysql_config

    This is equivalent to –with-mysql-config=/path/to/mysql_config

  • Specify :bundle =&gt; false to indicate that you want to use system gems for a particular dependency. This will ensure that it gets resolved correctly during dependency resolution but does not need to be included in the bundle
  • Support for multiple bundles containing multiple platforms. This is especially useful for people moving back and forth between Ruby 1.8 and 1.9 and don’t want to constantly have to nuke and rebundle
  • Deprecate :vendored_at and replace with :path
  • A new directory DSL method in the Gemfile:
    directory "vendor/rails" do
      gem "activesupport", :path =&gt; "activesupport" # :path is optional if it's identical to the gem name
                                                    # the version is optional if it can be determined from
                                                    # a gemspec
    end
  • You can do the same with the git DSL method
    git "git://github.com/rails/rails.git" do # :branch, :tag, or :ref work here
      gem "activesupport", :path =&gt; "activesupport" # same rules as directory, except that the files are
                                                    # first downloaded from git.
    end
  • Fix some bugs in resolving prerelease dependencies