Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

Archive for the ‘Ruby’ Category

August Tokaido Update

It’s been a while since I posted anything on my blog, and I figured I’d catch everyone up on the work I’ve been doing on Tokaido.

Components

Tokaido itself is made up of a number of components, which I am working on in parallel:

  • Ruby binary build, statically compiled
  • A logging and alerting UI
  • Remote Notifications for Rails 3+
  • Integration with Puma Express
  • Integration with code quality tools
  • Resolve bundler issue related to binary builds and deployment to Heroku
  • Work with community members to start shipping binary builds of popular gems (gated on fixing the bundler bug)

A number of people are doing parts of the work to make Tokaido a reality. I specifically want to thank:

  • Michal Papis of the rvm team for using the sm framework to make the Tokaido build maintainable over time and doing the heavy lifting to take the initial spike I did and get a reproducible binary build
  • Terence Lee of Heroku for packaging up these early binary builds for use at several Rails Girls events, with great success!

Ruby Binary Build, Statically Compiled

This is the first work I did, a few months ago, with help from Michal Papis. I detailed the hard parts of making Ruby statically compiled in June’s status update. Since then, Terence Lee (@hone02) has used the binary build at several Rails Girls events, dramatically reducing the time needed for Rails installation on OSX. In addition to the Tokaido binary build, Terence also precompiled a number of gems, and built a script to download the entire thing as a zip, install it into the user’s home directory, and modify their ~/.profile to include it.

This strategy works well for trainings, but has a number of limitations:

  • Since it relies on precompiled gems already existing in the gem home, the gems cannot be removed or upgraded without requiring a C compiler. The correct solution is what gem authors do for Windows: ship versions of binary gems with specific OSX designations. Currently, binary gems do not interact well with Heroku deploys (or anyone using bundle install --deployment), so we have been working to resolve those issues before foisting a bunch of new binary gems on the world. See below.
  • It relies on modifying all instances of the Terminal, which means that some system edge-cases leak into this solution. It also relies on modifying ~/.profile, which may not work depending on what shell the user is using and what other startup scripts the user is running. For Tokaido, we will have a way to launch a Terminal that reliably injects itself without these problems, and without polluting all instances of Terminal.
  • It is difficult to upgrade components, like patch levels of Ruby or versions of C dependencies like libyaml. Tokaido.app will store its copy of Ruby and gems in a sandbox, loadable from the UI (as I described in the second bullet), which makes it easy for the .app to upgrade patch levels of Ruby or whatever components it needs.

Because I understand that many people WANT the ability to have their Ruby take over their terminal, Tokaido will integrate with rvm (and rbenv, if possible) to mount Tokaido.app as the default version of Ruby in your shell.

A Logging and Alerting UI

The primary UI for Tokaido will be a logging and alerting UI for your Rails application. I expect that you will spend the most time in the “Requests” UI, which will show you a list of the previous requests, and the log of the current request. My goal is to use the logging UI to improve on the typical “tail the log” experience. I’ve been working with Austin Bales, a designer at do.com, to develop some mockups, and hope to have something for people to look at in the next few weeks.

In addition to an improved logging experience, Tokaido will alert you when something has gone wrong in your request. This may include exceptions (5xx) or information provided by plugins (bullet can let you know when your code is triggering an N+1 query; see the README for more information). The list of prior requests will highlight requests with issues, and problems in the notifications tray will directly link you to the request where the error occurred.

If everything goes well, the Tokaido UI will replace your logging workflow, without impacting the rest of the tasks you perform from the command-line.

Remote Notifications for Rails 3+

Rails 3 shipped with a new instrumentation API, which provides detailed information about many events that happen inside of Rails. This system is used by Rails’ own logging system, and I would like to use it in Tokaido as well.
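The real API is ActiveSupport::Notifications; as a rough illustration of the publish/subscribe model it provides (the module and method names below are a simplified sketch, not the actual Rails implementation), it works something like this:

```ruby
# Minimal sketch of the Rails 3 instrumentation model: publishers wrap
# work in `instrument`, and subscribers receive the event name, timing
# information, and a payload. This mirrors the shape of the real API
# without being the actual ActiveSupport::Notifications code.
module MiniNotifications
  @subscribers = Hash.new { |hash, key| hash[key] = [] }

  def self.subscribe(name, &block)
    @subscribers[name] << block
  end

  def self.instrument(name, payload = {})
    start  = Time.now
    result = yield if block_given?
    finish = Time.now
    @subscribers[name].each { |sub| sub.call(name, start, finish, payload) }
    result
  end
end

events = []
MiniNotifications.subscribe("sql.active_record") do |name, start, finish, payload|
  events << [name, payload[:sql]]
end

MiniNotifications.instrument("sql.active_record", sql: "SELECT 1") { :ran }
# events now contains ["sql.active_record", "SELECT 1"]
```

Rails’ own logging subscribes to events like sql.active_record and process_action.action_controller in exactly this fashion.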

The instrumentation API provides all of the information we need, but no built-in way to communicate those notifications across processes. Because Tokaido.app will not run in the same process as the Rails app, I have been working on a gem (remote_notifications) that provides a standard way to send these notifications to another process.

This gem also backports Aaron’s Rails instrumentation work (see his talk at Railsberry for more information) to Rails 3.0 and 3.1, which makes it possible to build a reliable tree from notifications, even on systems with low clock resolution.

It also includes a mechanism for subscribing to notifications sent from another process using the regular notifications API, and pluggable serializers and deserializers. It isn’t quite done yet, but I should have an initial release soon.
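The gem’s actual wire format isn’t settled here, but the pluggable-serializer idea can be sketched: reduce an event to plain data, encode it with whichever serializer is plugged in, and rebuild it on the other side. The class and method names below are hypothetical, not remote_notifications’ API:

```ruby
require "json"

# Hypothetical pluggable serializer for instrumentation events. An event
# (name, start time, finish time, payload) is flattened to JSON for the
# trip across the process boundary, then reconstructed on the other side.
class JSONEventSerializer
  def dump(name, start, finish, payload)
    JSON.generate("name" => name, "start" => start.to_f,
                  "finish" => finish.to_f, "payload" => payload)
  end

  def load(string)
    data = JSON.parse(string)
    [data["name"], Time.at(data["start"]), Time.at(data["finish"]),
     data["payload"]]
  end
end

serializer = JSONEventSerializer.new
wire = serializer.dump("sql.active_record", Time.at(0), Time.at(1),
                       "sql" => "SELECT 1")
name, start, finish, payload = serializer.load(wire)
# name == "sql.active_record"; payload == { "sql" => "SELECT 1" }
```

A Marshal-based serializer could round-trip richer Ruby objects between two Ruby processes, while a JSON one like this keeps the stream readable by non-Ruby tools such as a Cocoa app.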

Integration with Puma Express

Puma is a new threaded web server built by Evan Phoenix of Rubinius fame. Puma Express manages Puma servers, automatically setting up the DNS (appname.dev) for you. This is similar to the approach used by Pow, but without a Node dependency.

Tokaido will integrate with Puma Express, so you will not need to manually boot and shut down your server. Because Tokaido also takes care of logging, there shouldn’t be any need for dedicated tabs with persistent running processes when you use Tokaido.

You will also be able to install an executable into your system (à la GitX and TextMate) to make it easy to boot up a Tokaido in the context of the current application:

$ tokaido .

This will be especially useful for people using the rvm “mounted” Tokaido.

Integration with Code Quality Tools

By default, Tokaido will integrate with flog, flay and rails-best-practices. It will periodically scan your Rails app for problems and notify you via the app’s notification tray. This mechanism will be extensible, so a future version of Tokaido might integrate with Code Climate or support code quality plugins.

Binary Gems and Heroku

Bundler provides a mechanism for deployment (bundle install --deployment) that specifically rejects deploys that require changes to the Gemfile.lock. When deploying a Gemfile.lock that was built using binary gems from a different platform, this mechanism rejects the deploy. At present, only Windows makes heavy use of binary gems, and Heroku’s solution to date has been to simply remove the Gemfile.lock and re-resolve dependencies.

Unfortunately, this solution eliminates the most important guarantee made by bundler, and is untenable in the long-term. If OS X users started to use binary gems more broadly, this would bring this problem into much wider circulation. Also, because platform-specific gems can contain alternate dependencies (see, for example, Nokogiri 1.4.2), it is important that bundle install support alternate platforms more extensively than with a naïve solution.

There are a few possible solutions:

  • Do not use --deployment on Heroku. This would allow bundler to update the Gemfile.lock for the new platform. It would also mean that if a developer didn’t update their Gemfile.lock before deploying, they might run unexpected code. This is a short-term fix, but would probably alleviate most of the symptoms without introducing a lot of new caveats.
  • Fix --deployment to allow changes to the Gemfile.lock, but only in response to a genuine platform change. Unfortunately, because of cases like Nokogiri 1.4.2, this could theoretically result in totally different code running in production.
  • Provide a mechanism for developers to explicitly add a deployment environment: bundle platform add x86_64-linux. This would pre-resolve gems across all available platforms and ensure that the same gem versions could be used everywhere.
  • Improve the bundler resolution mechanism to allow gems with the same name, version and dependencies to be treated the same. Because dependencies can also exhibit this problem (a dependency of some JRuby gem could be another JRuby gem with its own non-standard dependencies), this would need to ensure that the entire subtree was the same.

The last solution is most promising, because the platform-specific gems Tokaido is concerned with are simply precompiled variants of the same gem. In the vast majority of cases, this simply means that the gem author is saving the end-developer the step of using a C compiler, but doesn’t actually change a lot else.

Unfortunately, Rubygems itself doesn’t distinguish between precompiled variants of gems and gems with different dependencies, so we will have to add smarts into bundler to detect the difference. My guess is that we will use a short-term fix like the first option while working on a longer-term fix like the last option.

Work Time

As I said in the original Kickstarter, I planned to take time off from work to work on the project. I have structured that time by working mornings on Tokaido, and shifting client work to the afternoons and evenings.

I don’t have a specific ship-date in mind for Tokaido yet, but you should start seeing more work-product as the project progresses, starting over the next few weeks.

Thanks for your patience. It shall be rewarded!

How to Marshal Procs Using Rubinius

The primary reason I enjoy working with Rubinius is that it exposes, to Ruby, much of the internal machinery that controls the runtime semantics of the language. Further, it exposes that machinery primarily in order to enable user-facing semantics that are typically implemented in the host language (C for MRI, C and C++ for MacRuby, Java for JRuby) to be implemented in Ruby itself.

There is, of course, quite a bit of low-level functionality in Rubinius implemented in C++, but a surprising number of things are implemented in pure Ruby.

One example is the Binding object. To create a new binding in Rubinius, you call Binding.setup:

def self.setup(variables, code, static_scope, recv=nil)
  bind = allocate()
 
  bind.self = recv || variables.self
  bind.variables = variables
  bind.code = code
  bind.static_scope = static_scope
  return bind
end

This method takes a number of more primitive constructs, which I will explain as this article progresses, but we can describe the constructs that make up the high-level Ruby Binding in pure Ruby.

In fact, Rubinius implements Kernel#binding itself in terms of Binding.setup.

def binding
  return Binding.setup(
    Rubinius::VariableScope.of_sender,
    Rubinius::CompiledMethod.of_sender,
    Rubinius::StaticScope.of_sender,
    self)
end

Yes, you’re reading that right. Rubinius exposes the ability to extract the constructs that make up a binding, one at a time, from a caller’s scope. And this is not just a hack (like Binding.of_caller for a short time in MRI). It’s core to how Rubinius manages eval, which of course makes heavy use of bindings.
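Bindings built this way behave like ordinary bindings. Plain Ruby (any implementation, not just Rubinius) demonstrates what those three constructs collectively capture:

```ruby
# A Binding captures the local variable scope, self, and constant scope
# at the point of the `binding` call, so eval can run code "as if" it
# appeared at that point in the program.
def make_binding
  secret = 42
  binding
end

b = make_binding
eval("secret", b)      # => 42  (reads the captured local)
eval("secret = 1", b)  # mutates the captured variable scope
eval("secret", b)      # => 1
```

Rubinius’ VariableScope, CompiledMethod, and StaticScope are exactly the pieces that make this capture possible, which is why eval is built on them.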

Marshalling Procs

For a while, I have wanted the ability to Marshal.dump a proc in Ruby. MRI has historically disallowed it, but there’s nothing conceptually impossible about it. A proc itself is a blob of executable code, a local variable scope (which is just a bunch of pointers to other objects), and a constant lookup scope. Rubinius exposes each of these constructs to Ruby, so Marshaling a proc simply means figuring out how to Marshal each of these constructs.

Let’s take a quick detour to learn about the constructs in question.

Rubinius::StaticScope

Rubinius represents Ruby’s constant lookup scope as a Rubinius::StaticScope object. Perhaps the easiest way to understand it would be to look at Ruby’s built-in Module.nesting function.

module Foo
  p Module.nesting
 
  module Bar
    p Module.nesting
  end
end
 
module Foo::Bar
  p Module.nesting
end
 
# Output:
# [Foo]
# [Foo::Bar, Foo]
# [Foo::Bar]

Every execution context in Rubinius has a Rubinius::StaticScope, which may optionally have a parent scope. In general, the top static scope (the static scope with no parent) in any execution context is Object.

Because Rubinius allows us to get the static scope of a calling method, we can implement Module.nesting in Rubinius:

def nesting
  scope = Rubinius::StaticScope.of_sender
  nesting = []
  while scope and scope.module != Object
    nesting << scope.module
    scope = scope.parent
  end
  nesting
end

A static scope also has an additional property called current_module, which is used during class_eval to define which module the runtime should add new methods to.

Adding Marshal.dump support to a static scope is therefore quite easy:

class Rubinius::StaticScope
  def marshal_dump
    [@module, @current_module, @parent]
  end
 
  def marshal_load(array)
    @module, @current_module, @parent = array
  end
end

These three instance variables are defined as Rubinius slots, which means that they are fully accessible to Ruby as instance variables, but don’t show up in the instance_variables list. As a result, we need to explicitly dump the instance variables that we care about and reload them later.

Rubinius::CompiledMethod

A compiled method holds the information necessary to execute a blob of Ruby code. Some important parts of a compiled method are its instruction sequence (a list of the compiled instructions for the code), a list of any literals it has access to, names of local variables, its method signature, and a number of other important characteristics.

It’s actually quite a complex structure, but Rubinius already knows how to convert an in-memory CompiledMethod into a String, as it dumps compiled Ruby code into compiled files as part of its normal operation. There is one small caveat: the String form that Rubinius uses for its compiled method does not include its static scope, so we will need to include the static scope separately in the marshaled form. Since we already told Rubinius how to marshal a static scope, this is easy.

class Rubinius::CompiledMethod
  def _dump(depth)
    Marshal.dump([@scope, Rubinius::CompiledFile::Marshal.new.marshal(self)])
  end
 
  def self._load(string)
    scope, dump = Marshal.load(string)
    cm = Rubinius::CompiledFile::Marshal.new.unmarshal(dump)
    cm.scope = scope
    cm
  end
end

Rubinius::VariableScope

A variable scope represents the state of the current execution context. It contains all of the local variables in the current scope, the execution context currently in scope, the current self, and several other characteristics.

I wrote about the variable scope before. It’s one of my favorite Rubinius constructs, because it provides a ton of useful runtime information to Ruby that is usually locked away inside the native implementation.

Dumping and loading the VariableScope is also easy:

class Rubinius::VariableScope
  def _dump(depth)
    Marshal.dump([@method, @module, @parent, @self, nil, locals])
  end
 
  def self._load(string)
    VariableScope.synthesize *Marshal.load(string)
  end
end

The synthesize method is new to Rubinius master; previously, getting a new variable scope required synthesizing its locals using class_eval, and the new method is much cleaner.

Rubinius::BlockEnvironment

A Proc is basically nothing but a wrapper around a Rubinius::BlockEnvironment, which wraps up all of the objects we’ve been working with so far. Its scope attribute is a VariableScope and its code attribute is a CompiledMethod.

Dumping it should be quite familiar by now.

class Rubinius::BlockEnvironment
  def marshal_dump
    [@scope, @code]
  end
 
  def marshal_load(array)
    scope, code = *array
    under_context scope, code
  end
end

The only thing new here is the under_context method, which gives a BlockEnvironment its variable scope and compiled method. Note that we dumped the static scope along with the compiled method above.

Proc

Finally, a Proc is just a wrapper around a BlockEnvironment, so dumping it is easy:

class Proc
  def _dump(depth)
    Marshal.dump(@block)
  end
 
  def self._load(string)
    block = Marshal.load(string)
    self.__from_block__(block)
  end
end

The __from_block__ method constructs a new Proc from a BlockEnvironment.

So there you have it. Dumping and reloading Proc objects in pure Ruby using Rubinius! (the full source is at https://gist.github.com/1378518).

New Hope for The Ruby Specification

For a few years, a group of Japanese academics have been working on formalizing the Ruby programming language into a specification they hoped would be accepted by ISO. From time to time, I have read through it, and I had one major concern.

Because Ruby 1.9 was still in a lot of flux when they were drafting the specification, the authors left a lot of details out. Unfortunately, some of these details are extremely important. Here’s one example, from the specification of String#[]:

Behavior:
 
a) If the length of args is 0 or larger than 2, raise a
   direct instance of the class ArgumentError.
b) Let P be the first element of args. Let n be the length
   of the receiver.
c) If P is an instance of the class Integer, let b be the
   value of P.
   1) If the length of args is 1:
      i)   If b is smaller than 0, increment b by n. If b is
           still smaller than 0, return nil.
      ii)  If b >= n, return nil.
      iii) Create an instance of the class Object which
           represents the bth character of
           the receiver and return this instance.

The important bit here is c(1)(iii), which says to create “an instance of the class Object which represents the bth character of the receiver”. The reason for this ambiguity, as best as I can determine, is that Ruby 1.8 and Ruby 1.9 differ on the behavior:

1.8 >> "hello"[1]
=> 101
 
1.9 >> "hello"[1]
=> "e"

Of course, neither of these results in a direct instance of the class Object, but since Fixnums and Strings are both “instances of the class Object”, this is technically true. Unfortunately, any real-life Ruby code will need to know what actual object this method will return.

Another very common reason for unspecified behaviors is a failure to specify Ruby’s coercion protocol, so String#+ is unspecified if the other is not a String, even though all Ruby implementations will call to_str on the other to attempt to coerce it. The coercion protocol has been underspecified for a long time, and it’s understandable that the group punted on it, but because it is heavily relied on by real-life code, it is important that we actually describe the behavior.

This week, I am in Matsue in Japan for RubyWorld, and I was glad to learn that the group working on the ISO specification sees the current work as a first step that will continue with a more rigid specification of currently “unspecified” behavior based on Ruby 1.9.

The word “unspecified” appears 170 times in the current draft of the Ruby specification. I hope that the next version will eliminate most if not all of these unspecified behaviors in favor of explicit behavior or explicitly requiring an exception to be thrown. In cases that actually differ between implementations (for instance, Rubinius allows Class to be subclassed), I would hope that these unspecified behaviors be the subject of some discussion at the implementor level.

In any event, I am thrilled at the news that the Ruby specification will become less ambiguous in the future!

What’s Up With All These Changes in Rails?

Yesterday, there was a blog post entitled “What the Hell is Happening to Rails” that stayed at the number one spot on Hacker News for quite a while. The post and many (but not most) of the comments on the post reflect deep-seated concern about the recent direction of Rails. Others have addressed the core question about change in the framework, but I’d like to address questions about specific changes that came up in the post and comments.

The intent of this post is not to nitpick the specific arguments that were made, or to address the larger question of how much change is appropriate in the framework, but rather to provide some background on some of the changes that have been made since Rails 2.3 and to explain why we made them.

Block Helpers

I too get a feeling of “change for the sake of change” from Rails at times. That’s obviously not something they’re doing, as all the changes have some motivation, but at times it feels a bit like churn.
At one point in time, you did forms like <%= form …. and then they switched to <% form …. do and now they’ve switched back to <%= form … do again.
Also, the upgrade to Rails 3 is not an easy one. Yeah, you get some nice stuff, but because it’s so painful, it’s not happening for a lot of people, which is causing more problems.

Prior to Rails 3.0, Rails never used <%= form_for because it was technically very difficult to make it work. I wrote a post about it in Summer 2009 that walked through the technical problems. The short version is that every ERB template is compiled into a Ruby method, and reliably converting <%= with blocks proved to be extremely complicated.

However, knowing when to use <% and when to use <%= caused major issues for new developers, and it was a major 3.0 priority to make this work. In addition, because of the backwards compatibility issue, we went to extreme Ruby-exploiting lengths to enable deprecation warnings about the change, so that we could fix the issue for new apps, but also not break old apps.

The night José and I figured out how to do this (the Engine Yard party at Mountain West RubyConf 2009), we were pretty close to coming to the conclusion that it couldn’t be done short of using a full language parser in ERB, which should give you some sense of how vexing a problem it was for us.
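The compilation step is easy to see on any Ruby: ERB turns a template into Ruby source, and `<%=` compiles differently from `<%`, which is exactly where block helpers got tricky (a block helper’s return value needs to be captured after its block runs):

```ruby
require "erb"

# Every ERB template is compiled into Ruby code; ERB#src exposes the
# generated source. `<%= expr %>` appends expr's value to the output
# buffer, while `<% stmt %>` merely executes. A helper taking a block
# fits neither form cleanly, which is what made <%= form_for ... do hard.
template = ERB.new("Total: <%= 1 + 1 %>")
puts template.src               # the generated Ruby for this template
puts template.result(binding)   # prints "Total: 2"
```

Seeing the generated source makes it clear why a naïve `<%= helper do %>` would append the helper’s return value before the block body had even been evaluated.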

Performance

The general disregard for improving performance, inherited from Ruby, is also something endemic from a while back.

Yes, there have been some performance regressions in Rails 3.0. However, the idea that neither the Rails core team nor the Ruby team cares about performance doesn’t pass the smell test.

Aaron Patterson, the newest core team member, worked full time for almost a year to get the new ActiveRecord backend into decent performance shape. Totally new code often comes with some performance regressions, and the changes to ActiveRecord were important and a long-time coming. Many of us didn’t see the magnitude of the initial problem until it was too late for 3.0, but we take the problem extremely seriously.

Ruby core (“MRI”) itself has sunk an enormous amount of time into performance improvements in Ruby 1.9, going so far as to completely rewrite the core VM from scratch. This resulted in significant performance improvements, and the Ruby team continues to work on improvements to performance and to core systems like the garbage collector.

The Ruby C API poses some long-term problems for the upper-bound of Ruby performance improvements, but the JRuby and Rubinius projects are showing how you can use Ruby C extensions inside a state-of-the-art virtual machine. Indeed, the JRuby and Rubinius projects show that the Ruby community both cares about, and is willing to invest significant energy into improving the performance of Ruby.

Heroku and the Asset Pipeline

The assets pipeline feels horrible, it’s really slow. I upgraded to Rails 3.1rc, realized it fights with Heroku unless upgrading to the Cedar stack

The problem with Heroku is that the default Gemfile that comes with Rails 3.1 currently requires a JavaScript runtime to boot, even in production. This is clearly wrong and will be fixed posthaste. Requiring Rails apps to have node or a compiled v8 in production is an unacceptable burden.

On the flip side, the execjs gem, which manages compiling CoffeeScript (and more importantly minifying your JavaScript), is actually a pretty smart piece of work. It turns out that both Mac OSX and Windows ship with usable JavaScript binaries, so in development, most Rails users will already have a JavaScript library ready to use.

It’s worth noting that a JavaScript engine is needed to run “uglify.js”, the most popular and most effective JavaScript minifier. It is best practice to minify your JavaScript before deployment, so you can feel free to format and comment your code as you like without affecting the payload. You can learn more about minification in this excellent post by Steve Souders from a few years back. Rails adding minification by default is an unambiguous improvement in the workflow of Rails applications, because it makes it easy (almost invisible) to do something that everyone should be doing, but which has previously been something of a pain.

Again, making node a dependency in production is clearly wrong, and will be removed before the final release.

Change for Change’s Sake

The problem with Rails is not the pace of change so much as the wild changes of direction it takes, sometimes introducing serious performance degradations into official releases. Sometimes it’s hard to see a guiding core philosophy other than the fact they want to be on the shiny edge

When Rails shipped, it came with a number of defaults:

  • ActiveRecord
  • Test::Unit
  • ERB
  • Prototype

Since 2004, alternatives to all of those options have been created, and in many cases (RSpec, jQuery, Haml) have become somewhat popular. Rails 3.0 made no changes to the defaults, despite much clamoring for a change from Prototype to jQuery. Rails 3.1 changed to jQuery only when it became overwhelmingly clear that jQuery was the de facto standard on the web.

As someone who has sat in on many discussions about changing defaults, I can tell you that defaults in Rails are not changed lightly, and certainly not “to be on the shiny edge.” In fact, I think that Rails is more conservative than most would expect in changing defaults.

Right, but the problem is that there doesn’t seem to be a ‘right’ way, That’s the problem.

We were all prototype a few years ago, now it jquery … we (well I) hadn’t heard of coffeescript till a few months ago and now its a default option in rails, The way we were constructing ActiveRecord finders had been set all through Rails 2, now we’ve changed it, the way we dealt with gems was set all through rails 2 now its changed completely in Rails 3.

I like change, I like staying on the cutting edge of web technologies, but I don’t want to learn something, only to discard it and re-do it completely to bring it up to date with a new way of doing things all the time.

First, jQuery has become the de facto standard on the web. As I said earlier, Rails resisted making this change in 3.0, despite a lot of popular demand, and the change to jQuery is actually an example of the stability of Rails’ default choices over time, rather than the opposite.

Changes to gem handling evolved as the Rails community evolved to use gems more. In Rails 1, all extensions were implemented as plugins that got pulled out of svn and dumped into your project. This didn’t allow for versioning or dependencies, so Rails 2 introduced first-class support for Rails plugins as gems.

During the Rails 2 series, Rails added a dependency on Rack, which caused serious problems when Rails was used with other gems that rely on Rack, due to the way that raw Rubygems handles dependency activation. Because Rails 3 uses gem dependencies more extensively, we spent a year building bundler, which adds per-application dependency resolution to Rubygems. This was simply the natural evolution of the way that Rails has used dependencies over time.

The addition of CoffeeScript is interesting, because it’s a pretty young technology, but it’s also not really different from shipping a new template handler. When you create a new Rails app, the JavaScript file is a regular JS file, and the asset compilation and dependency support does not require CoffeeScript. Shipping CoffeeScript is basically like shipping Builder: should you want it, it’s there for you. Since we want to support minification out of the box anyway, CoffeeScript doesn’t add any new requirements. And since it’s just there in the default Gemfile, as opposed to included in Rails proper like Builder, turning it off (if you really want to) is as simple as removing a line in your Gemfile. Nothing scary here.

ActiveRecord Changes

The way we were constructing ActiveRecord finders had been set all through Rails 2, now we’ve changed it

The Rails core team does seem to treat the project as if it’s a personal playground

One of the biggest problems with ActiveRecord was the way it internally used Strings to represent queries. This meant that changes to queries often required gsubbing Strings to make simple changes. Internally, it was a mess, and it expressed itself in the public API in how conditions were generated, and more importantly how named scopes were created.

One goal of the improvements in Rails 3 was to get rid of the ad-hoc query generation in Rails 2 and replace it with something better. ActiveRelation, this library, was literally multiple years in the making, and a large amount of the energy in the Rails 3.0 process was spent on integrating ActiveRelation.

From a user-facing perspective, we wanted to unify all of the different ways that queries were made. This means that scopes, class methods, and one-time-use queries all use the same API. As in the <%= case, a significant amount of effort was spent on backwards compatibility with Rails 2.3. In fact, we decided to hold onto support for the old API at least as long as Rails 3.2, in order to soften the community transition to the new API.

In general, we were quite careful about backwards compatibility in Rails 3.0, and while a project as complex as Rails was not going to be perfect in this regard, characterizing the way we handled this as “irresponsible” or “playground” disregards the tremendous amount of work and gymnastics that the core team and community contributors put into supporting Rails 2.3 APIs across the entire codebase when changes were made.

Gem Versioning and Bundler: Doing it Right

Recently, an upgrade to Rake (from version 0.8.7 to version 0.9.0) has re-raised the issue of dependencies and versioning in the Ruby community. I wanted to take the opportunity to reiterate some of the things I talked about back when working on Bundler 1.0. First, I’ll lay out some basic rules for the road, and then go into some detail about the rationale.

Basic Versioning Rules for Apps

  1. After bundling, always check your Gemfile.lock into version control. If you do this, you do not need to specify exact versions of gems in your Gemfile. Bundler will take care of ensuring that all systems use the same versions.
  2. After updating your Gemfile, always run bundle install first. This will conservatively update your Gemfile.lock. This means that except for the gem that you changed in your Gemfile, no other gem will change.
  3. If a conservative update is impossible, bundler will prompt you to run bundle update [somegem]. This will update the gem and any necessary dependencies. It will not update unrelated gems.
  4. If you want to fully re-resolve all of your dependencies, run bundle update. This will re-resolve all dependencies from scratch.
  5. When running an executable, ALWAYS use bundle exec [command]. Quoting from the bundler documentation: In some cases, running executables without bundle exec may work, if the executable happens to be installed in your system and does not pull in any gems that conflict with your bundle. However, this is unreliable and is the source of considerable pain. Even if it looks like it works, it may not work in the future or on another machine. See below, “Executables” for more information and advanced usage.
  6. Remember that you can always go back to your old Gemfile.lock by using git checkout Gemfile.lock or the equivalent command in your version control.
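Putting these rules together, a typical application keeps its Gemfile loose and relies on Gemfile.lock for precision. A sketch (the gem names and versions here are illustrative):

```ruby
# Gemfile -- loose requirements are fine here; after `bundle install`,
# Gemfile.lock records the exact resolved version of every gem, and that
# lockfile (checked into version control) is what every machine uses.
source "http://rubygems.org"

gem "rails", "~> 3.0.0"   # any 3.0.x release
gem "nokogiri"            # unversioned: the lockfile still pins one version
```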

Executables

When you install a gem to the system, Rubygems creates wrappers for every executable that the gem makes available. When you run an executable from the command line without bundle exec, this wrapper invokes Rubygems, which then uses the normal Rubygems activation mechanism to invoke the gem’s executable. The details have changed somewhat in the past several months, but in general, Rubygems will invoke the latest version of the gem installed in your system, even if your Gemfile.lock specifies a different version. In addition, it will activate the latest (compatible) installed version of dependencies of that gem, even if a different version is specified in your Gemfile.lock.

This means that invoking executables as normal system executables bypasses bundler’s locked dependencies. In many cases, this will not pose a problem, because developers of your app tend to have the right version of the system-installed executable. For a long time, the Rake gem was a good example of this phenomenon, as most Gemfile.locks declared Rake 0.8.7, and virtually all Ruby developers had Rake 0.8.7 installed in their system.

As a result, users fell into the unfortunate belief that running system executables was compatible with bundler’s locked dependencies. To work around some of the remaining cases, people often advocate the use of rvm gemsets. Combined with manually setting up application-specific gemsets, this can make sure that the “system executables” as provided via the gemset remain compatible with the Gemfile.lock.

Unfortunately, this kludge (and others) sufficiently reduced the pain that most people ignored the advice of the bundler documentation to always use bundle exec when running executables tied to gems in the application’s Gemfile.lock.

It’s worth noting that typing in rake foo (or anyexecutable foo) in the presence of a Gemfile.lock, and expecting it to execute in the bundler sandbox doesn’t make any sense, since you’re not invoking Bundler. Bundler’s sandbox relies on its ability to be present at the very beginning of the Ruby process, and to therefore have the ability to ensure that the versions of all loaded libraries will reflect the ones listed in the Gemfile.lock. By running a system executable, you are executing Ruby code before Bundler can modify the load path and replace the normal Rubygems loading mechanism, allowing arbitrary unmanaged gems to get loaded into memory. Once that happens, all bets are off.

bundle install --binstubs

In order to alleviate some of the noise of bundle exec, Bundler 1.0 ships with a --binstubs flag, which will create a bin directory containing each of the executables that the application’s gems expose. Running bin/cucumber, for instance, is the equivalent of running bundle exec cucumber.

The generated bin directory should contain portable versions of the executables, so it should be safe to add them to version control.
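A generated binstub is just a small Ruby script. Roughly, it does something like this (a sketch of the idea, not the exact file Bundler generates):

```ruby
#!/usr/bin/env ruby
# bin/cucumber -- sketch of a Bundler-generated binstub

require 'rubygems'

# Point Bundler at the application's Gemfile before any gem code loads
ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../../Gemfile', __FILE__)
require 'bundler/setup'

# Hand off to the real executable of the locked version of the gem
load Gem.bin_path('cucumber', 'cucumber')
```

Because `bundler/setup` runs before anything else, the executable sees exactly the versions in Gemfile.lock, which is the same guarantee bundle exec provides.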

The rails Command

The only exception to the above rules is the rails executable. As of Rails 3.0, the executable simply looks for a script/rails file, and execs that file. The generated script/rails loads the local boot environment, which invokes bundler immediately:

# This command will automatically be run when you run "rails" with Rails 3 gems installed from the root of your application.
 
APP_PATH = File.expand_path('../../config/application',  __FILE__)
require File.expand_path('../../config/boot',  __FILE__)
require 'rails/commands'

The generated boot.rb is very simple:

require 'rubygems'
 
# Set up gems listed in the Gemfile.
ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../../Gemfile', __FILE__)
 
require 'bundler/setup' if File.exists?(ENV['BUNDLE_GEMFILE'])

In short, the rails executable does a bunch of work to guarantee that your application’s bootstrapping logic runs before any dependencies are loaded, and it uses Kernel#exec to purge the current process of any already-loaded gems. This is not something that general-purpose gems can or should do, and it’s an open question whether making the rails executable work without bundle exec sufficiently muddies the waters with regard to other gems as to render the benefit not worth it.

Getting Comfortable With Rubinius’ Pure-Ruby Internals

You probably know that Rubinius is a Ruby whose implementation is mostly written in Ruby. While that sounds nice in theory, you may not know what that means in practice. Over the past several years, I’ve contributed on and off to Rubinius, and feel that as Rubinius has matured since the 1.0 release, a lot of its promise has gelled.

One of the great things about Rubinius is that it exposes, as first-class concepts, a lot of things that are merely implicit in other Ruby implementations. For instance, Ruby methods have a variable scope which is implicitly accessible by blocks created in the scope. Rubinius exposes that scope as Rubinius::VariableScope. It also exposes Ruby bindings as richer objects that you can create yourself and use. In this post, I will talk about Rubinius::VariableScope, how it works, and how it fits into the Rubinius codebase.

Prologue

For those of you who don’t know much about Rubinius, it’s a Ruby implementation whose core classes are mostly written in Ruby. Some functionality, such as the core of the object model, as well as low-level methods, are written in C++ and exposed into Ruby via its primitive system. However, the functionality is typically fairly low-level, and the vast majority of the functionality is written on top of those primitives. For example, a Ruby String in Rubinius is a mostly pure-Ruby object with a backing ByteArray, and ByteArray’s implementation is mostly written as a primitive.

For instance, String’s allocate method looks like this:

def self.allocate
  str = super()
  str.__data__ = Rubinius::ByteArray.new(1)
  str.num_bytes = 0
  str.characters = 0
  str
end

If you just want to get to the meat of the matter, feel free to skip this section and go directly to the Internals section below.

In general, Rubinius’ code is broken up into three stages. The first stage, called alpha, creates just enough raw Ruby constructs to get to the next stage, the bootstrap stage. It defines methods like Class#new, Kernel#__send__, Kernel#raise, Kernel#clone, basic methods on Module, Object, String, and Symbol, and some classes on the Rubinius module, used for internal use. It’s a pretty short file, weighing in at under 1,000 lines, and I’d encourage you to read it through. It is located in kernel/alpha.rb.

Next, the bootstrap stage creates the basic methods needed on many of Ruby’s core classes. In some cases, it defines a simpler version of a common structure, like Rubinius::CompactLookupTable and Rubinius::LookupTable instead of Hash. It also contains Rubinius constructs like the Rubinius::CompiledMethod, Rubinius::MethodTable, Rubinius::StaticScope and Rubinius::VariableScope. Many of these methods are defined as primitives. Primitive methods are hooked up in Ruby; check out the bootstrap definition of Rubinius::VariableScope:

module Rubinius
  class VariableScope
    def self.of_sender
      Ruby.primitive :variable_scope_of_sender
      raise PrimitiveFailure, "Unable to get VariableScope of sender"
    end
 
    def self.current
      Ruby.primitive :variable_scope_current
      raise PrimitiveFailure, "Unable to get current VariableScope"
    end
 
    def locals
      Ruby.primitive :variable_scope_locals
      raise PrimitiveFailure, "Unable to get VariableScope locals"
    end
 
    # To handle Module#private, protected
    attr_accessor :method_visibility
  end
end

The bootstrap definitions are in kernel/bootstrap. The file kernel/bootstrap/load_order.txt defines the order that bootstrap methods are loaded.

Next, Rubinius fleshes out the core classes, mostly using Ruby methods. There is some primitive use in this stage, but it’s fairly few and far between. For instance, the Hash class is written entirely in Ruby, using a Rubinius::Tuple for its entries. This code can be found in kernel/common.

Some additional code lives in kernel/platform, which contains platform-specific functionality, like File and POSIX functionality, and kernel/delta, which runs after kernel/common. For the full description of the bootstrapping process, check out Rubinius: Bootstrapping.

Internals

One of the really nice things about Rubinius is that many concepts that are internal in traditional Ruby are exposed directly into the Ruby runtime. In many cases, this is so that more of Rubinius can be implemented in pure Ruby. By contrast, if a core method in MRI needs some behavior, that behavior is often implemented directly in C and unavailable to custom Ruby code.

One good example is the behavior of $1 in a Ruby method. If you call the =~ method, MRI’s internal C code will walk back up to the caller and set an internal match object on it. If you write your own method that uses the =~ method, there is no way for you to set the match object on the caller. However, since Rubinius itself implements these methods in Ruby, it needs a way to perform this operation directly from Ruby code. Check out the Rubinius implementation of =~:

class Regexp
  def =~(str)
    # unless str.nil? because it's nil and only nil, not false.
    str = StringValue(str) unless str.nil?
 
    match = match_from(str, 0)
    if match
      Regexp.last_match = match
      return match.begin(0)
    else
      Regexp.last_match = nil
      return nil
    end
  end
end

There are a few interesting things going on here. First, Rubinius calls StringValue(str), which is a Ruby method that implements Ruby’s String coercion protocol. If you follow the method, you will find that it calls Type.coerce_to(obj, String, :to_str), which then, in pure-Ruby, coerces the object into a String. You can use this method anywhere in your code if you want to make use of the String coercion protocol with the standard error messages.
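In plain Ruby, that coercion protocol can be sketched like this (a hedged approximation; the real implementation lives in Rubinius’ kernel, and the names below are illustrative):

```ruby
# Sketch of the String coercion protocol: return obj if it is already a
# cls, otherwise try obj.send(meth) and verify the result's type.
def coerce_to(obj, cls, meth)
  return obj if obj.kind_of?(cls)

  begin
    ret = obj.__send__(meth)
  rescue StandardError => e
    raise TypeError, "Coercion error: #{obj.inspect}.#{meth} => #{cls} failed (#{e.message})"
  end

  unless ret.kind_of?(cls)
    raise TypeError, "Coercion error: #{obj.inspect}.#{meth} did not return a #{cls}"
  end
  ret
end

# An object that participates in the protocol via to_str
class PathLike
  def to_str
    "/tmp/file"
  end
end

coerce_to(PathLike.new, String, :to_str)  # => "/tmp/file"
```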

Next, Rubinius actually invokes the match call, which ends up terminating in a primitive called search_region, which binds directly into the Oniguruma regular expression engine. The interesting part is next: Regexp.last_match = match. This code sets the $1 property on the caller, from Ruby. This means that if you write a custom method involving regular expressions and want it to obey the normal $1 protocol, you can, because if Rubinius needs functionality like this, it is by definition exposed to Ruby.

VariableScope

One result of this is that you can get access to internal structures like the current “variable scope”. A variable scope is an object that wraps the notion of which locals are available (both real locals and locals defined using eval), as well as the current contents of each of those variables. If the current context is a block, a VariableScope also has a pointer to the parent VariableScope. You can get access to the current VariableScope by using VariableScope.current.

def foo
  x = 1
  p [:root, Rubinius::VariableScope.current.locals]
  ["a", "b", "c"].each do |item|
    p [:parent, Rubinius::VariableScope.current.parent.locals]
    p [:item, Rubinius::VariableScope.current.locals]
  end
end
 
foo

The output will be:

[:root, #<Rubinius::Tuple: 1>]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "a">]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "b">]
[:parent, #<Rubinius::Tuple: 1>]
[:item, #<Rubinius::Tuple: "c">]

A Rubinius::Tuple is a Ruby object that is similar to an Array, but is fixed-size, simpler, and usually more performant. The VariableScope has a number of additional useful methods, which you can learn about by reading kernel/bootstrap/variable_scope.rb and kernel/common/variable_scope.rb. One example: locals_layout gives you the names of the local variables for each slot in the tuple. You can also find out the current visibility of methods that would be declared in the scope (for instance, if you call private, the VariableScope after that call will have a method_visibility of :private).

Perhaps the most exciting thing about VariableScope is that you can look up the VariableScope of your calling method. Check this out:

def root
  x = 1
  ["a", "b", "c"].each do |item|
    call_me
  end
end
 
def call_me
  sender = Rubinius::VariableScope.of_sender
  p [:parent, sender.parent.locals]
  p [:item, sender.locals]
end
 
root

In this case, you can get the VariableScope of the caller, and get any live information from it. In this case, you can tell that the method was called from inside a block, what the locals inside the block contain, and what block contexts exist up to the method root. This is extremely powerful, and gives you a level of introspection into a running Ruby program heretofore not possible (or very difficult). Better yet, the existence of this feature doesn’t slow down the rest of the program, because Rubinius uses it internally for common Ruby functionality like the implementation of private and protected.

Implementing caller_locals

Now that we understand the VariableScope object, what if we want to print out a list of all of the local variables in the caller of a method? Suppose we have:

def main
  a = "Hello"
 
  [1,2,3].each do |b|
    say_hello do
      puts "#{a}: #{b}"
    end
  end
end
 
def say_hello
  # this isn't working as we expect, and we want to look at
  # the local variables of the calling method to see what's
  # going on
  yield
ensure
  puts "Goodbye"
end

Let’s implement a method called caller_locals which will give us all of the local variables in the calling method. We will not be able to use VariableScope.of_sender, since that only goes one level back, and since we will implement caller_locals in Ruby, we will need to go two levels back. Thankfully, Rubinius::VM.backtrace has the information we need.

Rubinius::VM.backtrace takes two parameters. The first is the number of levels back to start from. In this case we want to start two levels back (the caller will be the say_hello method, and we want its caller). The second parameter indicates whether the Location objects should contain the VariableScope, which is what we want.

NOTE: Yes, you read that right. In Rubinius, you can obtain a properly object-oriented backtrace object, where each entry is a Location object. Check out kernel/common/location.rb to see the definition of the class.

In this case, if we pass true to Rubinius::VM.backtrace, each Location will have a method called variables, which returns a Rubinius::VariableScope object.

def caller_locals
  variables = Rubinius::VM.backtrace(2, true).first.variables
end

The next thing you need to know is that each block in a method has its own VariableScope and a parent pointing at the VariableScope for the parent block.

Each VariableScope also has a pointer to its method, an object that, among other things, keeps a list of the local variable names. These lists are both ordered the same.
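The name/value pairing used in the next step can be seen in isolation in plain Ruby:

```ruby
# Pair up parallel name and value lists into a Hash, the same trick
# used in caller_locals below.
names  = [:a, :b]
values = ["Hello", 1]

Hash[*names.zip(values).flatten]  # => {:a => "Hello", :b => 1}
```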

To collect up the locals from the caller, we will walk this chain of VariableScope objects until we get nil.

def caller_locals
  variables = Rubinius::VM.backtrace(2, true).first.variables
 
  locals = []
 
  while variables
    names  = variables.method.local_names
    values = variables.locals
 
    locals << Hash[*names.zip(values).flatten]
    variables = variables.parent
  end
 
  locals
end

This will return an Array of local variable Hashes, starting from the caller’s direct scope and working up to the method body itself. Let’s change the original method to use caller_locals:

def say_hello
  # this isn't working as we expect, and we want to look at
  # the local variables of the calling method to see what's
  # going on. Let's use caller_locals.
  p caller_locals
  yield
ensure
  puts "Goodbye"
end

If we run the program now, the output will be:

[{:b=>1}, {:a=>"Hello"}]
Goodbye
[{:b=>2}, {:a=>"Hello"}]
Goodbye
[{:b=>3}, {:a=>"Hello"}]
Goodbye

This is just the tip of the iceberg of what is possible because of Rubinius’ architecture. To be clear, these features are not accidents; because so much of Rubinius is written in Ruby (including things like backtraces), it is forced to expose details of its internals to normal Ruby programs. And because the Rubinius kernel is real Ruby (and not some kind of special Ruby subset), anything that works in the kernel will work in your Ruby code as well.

Ruby 2.0 Refinements in Practice

First Shugo announced them at RubyKaigi. Then Matz showed some improved syntax at RubyConf. But what are refinements all about, and what would they be used for?

The first thing you need to understand is that the purpose of refinements in Ruby 2.0 is to make monkey-patching safer. Specifically, the goal is to make it possible to extend core classes, but to limit the effect of those extensions to a particular area of code. Since the purpose of this feature is to make monkey-patching safer, let’s take a look at a dangerous case of monkey-patching and see how this new feature would improve the situation.
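Before diving into a real-world failure, here is a minimal, self-contained sketch of the feature as it eventually shipped (the Shout and Greeter names are mine, invented for illustration):

```ruby
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Greeter
  using Shout  # activates the refinement lexically, for this body only

  def self.greet(name)
    name.shout  # the refined method is visible here
  end
end

Greeter.greet("ruby")        # => "RUBY!"
"ruby".respond_to?(:shout)   # => false -- the patch does not leak out
```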

A few months ago, I encountered a problem where some accidental monkey-patches in Right::AWS conflicted with Rails’ own monkey-patches. In particular, here is their code:

unless defined? ActiveSupport::CoreExtensions
  class String #:nodoc:
    def camelize()
      self.dup.split(/_/).map{ |word| word.capitalize }.join('')
    end
  end
end

Essentially, Right::AWS is trying to make a few extensions available to itself, but only if they were not defined by Rails. In that case, they assume that the Rails version of the extension will suffice. They did this quite some time ago, so these extensions represent a pretty old version of Rails. They assume (without any real basis) that every future version of ActiveSupport will return an expected value from camelize.

Unfortunately, Rails 3 changed the internal organization of ActiveSupport, and removed the constant name ActiveSupport::CoreExtensions. As a result, these monkey-patches got activated. Let’s take a look at what the Rails 3 version of the camelize helper looks like:

class String
  def camelize(first_letter = :upper)
    case first_letter
      when :upper then ActiveSupport::Inflector.camelize(self, true)
      when :lower then ActiveSupport::Inflector.camelize(self, false)
    end
  end
end
 
module ActiveSupport
  module Inflector
    extend self
 
    def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
      if first_letter_in_uppercase
        lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
      else
        lower_case_and_underscored_word.to_s[0].chr.downcase + camelize(lower_case_and_underscored_word)[1..-1]
      end
    end
  end
end

There are a few differences here, but the most important one is that in Rails, “foo/bar” becomes “Foo::Bar”. The Right::AWS version converts that same input into “Foo/bar”.
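The divergence is easy to demonstrate with both implementations reduced to plain methods (a sketch; the helper names are mine, and the Rails version is reduced to the gsub logic shown above):

```ruby
# Right::AWS-style camelize: capitalize each underscore-separated word
def old_camelize(str)
  str.dup.split(/_/).map { |word| word.capitalize }.join('')
end

# Rails 3-style camelize: slashes become "::" with the next letter
# upcased, then leading/underscored letters are upcased
def rails_camelize(str)
  str.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
end

old_camelize("admin/posts")    # => "Admin/posts"
rails_camelize("admin/posts")  # => "Admin::Posts"
```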

Now here’s the wrinkle. The Rails router uses camelize to convert controller paths like “admin/posts” to “Admin::Posts”. Because Right::AWS overrides camelize with this (slightly) incompatible implementation, the Rails router ends up trying to find an “Admin/posts” constant, which Ruby correctly complains isn’t a valid constant name. While situations like this are rare, it’s mostly because of an extremely diligent library community, and a general eschewing of applying these kinds of monkey-patches in library code. In general, Right::AWS should have done something like Right::Utils.camelize in their code to avoid this problem.

Refinements allow us to make these kinds of aesthetically pleasing extensions for our own code with the guarantee that they will not affect any other Ruby code.

First, instead of directly reopening the String class, we would create a refinement in the ActiveSupport module:

module ActiveSupport
  refine String do
    def camelize(first_letter = :upper)
      case first_letter
        when :upper then ActiveSupport::Inflector.camelize(self, true)
        when :lower then ActiveSupport::Inflector.camelize(self, false)
      end
    end
  end
end

What we have done here is define a String refinement that we can activate elsewhere with the using method. Let’s use the refinement in the router:

module ActionDispatch
  module Routing
    class RouteSet
      using ActiveSupport
 
      def controller_reference(controller_param)
        unless controller = @controllers[controller_param]
          controller_name = "#{controller_param.camelize}Controller"
          controller = @controllers[controller_param] =
            ActiveSupport::Dependencies.ref(controller_name)
        end
        controller.get
      end      
    end
  end
end

It’s important to note that the refinement only applies to methods physically inside the same block. It will not apply to other methods in ActionDispatch::Routing::RouteSet defined in a different block. This means that we can use different refinements for different groups of methods in the same class, by defining the methods in different class blocks, each with their own refinements. So if I reopened the RouteSet class somewhere else:

module ActionDispatch
  module Routing
    class RouteSet
      using RouterExtensions
 
      # I can define a special version of camelize that will be used
      # only in methods defined in this physical block
      def route_name(name)
        name.camelize
      end
    end
  end
end

Getting back to the real-life example, even though Right::AWS created a global version of camelize, the ActiveSupport version (applied via using ActiveSupport) will be used. This means that we are guaranteed that our code (and only our code) uses the special version of camelize.

It’s also important to note that only explicit calls to camelize in the physical block will use the special version. For example, let’s imagine that some library defines a global method called constantize, and uses a camelize refinement:

module Protection
  refine String do
    def camelize()
      self.dup.split(/_/).map{ |word| word.capitalize }.join('')
    end
  end
end
 
class String #:nodoc:
  using Protection
 
  def constantize
    Object.module_eval("::#{camelize}", __FILE__, __LINE__)
  end
end

Calling String#constantize anywhere will internally call the String#camelize from the Protection refinement to do some of its work. Now let’s say we create a String refinement with an unusual camelize method:

module Wycats
  refine String do
    def camelize
      result = dup.split(/_/).map(&:capitalize).join
      "_#{result}_"
    end
  end
end
 
module Factory
  using Wycats
 
  def self.create(class_name, string)
    klass = class_name.constantize
    klass.new(string.camelize)
  end
end
 
class Person
  def initialize(string)
    @string = string
  end
end
 
Factory.create("Person", "wycats")

Here, the Wycats refinement should not leak into the call to constantize. If it did, it would mean that any call into any method could leak a refinement into that method, which is the opposite of the purpose of the feature. Once you realize that refinements apply lexically, they create a very orderly, easy to understand way to apply targeted monkey patches to an area of code.

In my opinion, the most important feature of refinements is that you can see the refinements that apply to a chunk of code (delineated by a physical class body). This allows you to be sure that the changes you are making only apply where you want them to apply, and makes refinements a real solution to the general problem of wanting aesthetically pleasing extensions with the guarantee that you can’t break other code. In addition, refinements protect diligent library authors even when other library authors (or app developers) make global changes, which makes it possible to use the feature without system-wide adoption. I, for one, am looking forward to it.

Postscript

There is one exception to the lexical rule, which is that refinements are inherited from the calling scope when using instance_eval. This actually gives rise to some really nice possibilities, which I will explore in my next post.

Bundler: As Simple as What You Did Before

One thing we hear a lot by people who start to use bundler is that the workflow is more complicated than it used to be when they first start. Here’s one (anonymized) example: “Trying out Bundler to package my gems. Still of the opinion its over-complicating a relatively simple concept, I just want to install gems.”

Bundler has a lot of advanced features, and it’s definitely possible to model fairly complex workflows. However, we designed the simple case to be extremely simple, and to usually be even less work than what you did before. The problem often comes when trying to handle a slightly off-the-path problem, and using a much more complex solution than you need to. This can make *everything* much more complicated than it needs to be.

In this post, I’ll walk through the bundler happy path, and show some design decisions we made to keep things moving as smoothly as before, but with far fewer snags and problems than the approach you were using before. I should be clear that there are probably bugs in some cases when using a number of advanced features together, and we should fix those bugs. However, they don’t reflect core design decisions of bundler.

In the Beginning

When you create a Rails application for the first time, the Rails installer creates a Gemfile for you. You’ll note that you don’t actually have to run bundle install before starting the Rails server.

$ gem install rails
Successfully installed activesupport-3.0.0
Successfully installed builder-2.1.2
Successfully installed i18n-0.4.1
Successfully installed activemodel-3.0.0
Successfully installed rack-1.2.1
Successfully installed rack-test-0.5.6
Successfully installed rack-mount-0.6.13
Successfully installed tzinfo-0.3.23
Successfully installed abstract-1.0.0
Successfully installed erubis-2.6.6
Successfully installed actionpack-3.0.0
Successfully installed arel-1.0.1
Successfully installed activerecord-3.0.0
Successfully installed activeresource-3.0.0
Successfully installed mime-types-1.16
Successfully installed polyglot-0.3.1
Successfully installed treetop-1.4.8
Successfully installed mail-2.2.6.1
Successfully installed actionmailer-3.0.0
Successfully installed thor-0.14.2
Successfully installed railties-3.0.0
Successfully installed rails-3.0.0
22 gems installed
$ rails s
=> Booting WEBrick
=> Rails 3.0.0 application starting in development on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
[2010-09-30 12:25:51] INFO  WEBrick 1.3.1
[2010-09-30 12:25:51] INFO  ruby 1.9.2 (2010-08-18) [x86_64-darwin10.4.0]
[2010-09-30 12:25:51] INFO  WEBrick::HTTPServer#start: pid=26380 port=3000

If you take a look at your directory, you’ll see that Bundler noticed that you didn’t yet have a Gemfile.lock, saw that you had all the gems that you needed already in your system, and created a Gemfile.lock for you. If you didn’t have a necessary gem, the Rails server would error out. For instance, if you were missing Erubis, you’d get Could not find RubyGem erubis (~> 2.6.6). In this case, you’d run bundle install to install the gems.

As you can see, bundler is providing some added features for you. If you take your application to a new development or production machine with the Gemfile.lock, you will always use exactly the same gems. Additionally, bundler will resolve all your dependencies at once, completely eliminating the problem exemplified by this thread on the Rails core mailing list. But you didn’t have to run bundle install or any bundle command unless you were missing some needed gems. This makes the core experience of bundler the same as “I just want to install gems”, while adding some new benefits.

Adding Gems

When you add a gem that is already installed on your system, you just add it to the Gemfile and start the server. Bundler will automatically pick it up and add it to the Gemfile.lock. Here again, you don’t need to run bundle install. To install the gem you added, you can run bundle install or you can even just use gem install to install the missing gems.

Once again, while bundler is handling a lot for you behind the scenes (ensuring compatibility of gems, tracking dependencies across machines), if you have all the gems that you need (and specified in your Gemfile) already installed on your system, no additional commands are required. If you don’t, a simple bundle install will get you up to date.

Deployment

After developing your application, you will probably need to deploy it. With bundler, if you are deploying your application with capistrano, you just add require "bundler/capistrano" to your deploy.rb. This will automatically install the gems in your Gemfile, using deployment-friendly settings. That’s it!

Before bundler, you had two options:

  • Most commonly, “make sure that all the gems I need are on the remote system”. This could involve including the gem install in the capistrano task, or even sshing into the remote system to install the needed gems. This solution would work, but would often leave no trail about what exactly happened. This was especially problematic for applications with a lot of dependencies, or applications with sporadic maintenance
  • Using the GemInstaller gem, which allowed you to list the gems you were using, and then ran gem install on each of the gems. Heroku used a similar approach, with a .gems manifest. While this solved the problem of top-level dependencies, it did not lock in dependencies of dependencies. This caused extremely common problems with rack and activesupport, and a long tail of other issues. One particularly egregious example is DreamHost, which directs users to use a brittle patch to Rails

In both of these cases, there was no guarantee that the gems you were using in development (including dependencies of dependencies like Rack and Active Support) remained the same after a deploy. One recent example that I saw was that carrierwave added an updated dependency on the newest activesupport, which depends on Ruby 1.8.7. Without making any change to his .gems manifest, Heroku failed to deploy his application on their Ruby 1.8.6 stack.

With bundler, this sort of problem can never happen, because you’re always running exactly the same versions of third-party code in production as you are in development and staging. This should increase your confidence in deployments, especially when you come back to an application months after it was originally developed.

Tweaked Gems

Some times, you will need to make small changes to one of your dependencies, and deploy your application with those changes. Of course, in some cases you can make the tweaks via monkey-patching, but in other cases, the only real solution is to change the gem itself.

With bundler, you can simply fork the gem on Github, make your changes, and change the gem line in your Gemfile to include a reference to the git repository, like this:

# change this
gem "devise"
 
# to this
gem "devise", :git => "git://github.com/plataformatec/devise.git"

This behaves exactly the same as a gem version, but uses your tweaks from Github. It participates in dependency resolution just like a regular gem, and can be installed with bundle install just like a regular gem. Once you’ve added a git repository, you run bundle install to install it (since the normal gem command can’t install git repositories).

You can use the :branch, :tag, or :ref flags to specify a particular branch, tag, or revision. By default, :git gems use the master branch. If you use master or a :branch, you can update the gem by running bundle update gem_name, which will check out the latest revision of the branch in question.
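For example (the branch name and revision here are illustrative, not real devise refs):

```ruby
# Gemfile – pin a git gem to a branch, tag, or exact revision
gem "devise", :git => "git://github.com/plataformatec/devise.git",
              :branch => "rails3"     # track a branch (updatable via bundle update)

# or lock to an immutable point in history:
#   gem "devise", :git => "...", :tag => "v1.1.1"
#   gem "devise", :git => "...", :ref => "0ca48f2"
```

Branches move, so `bundle update devise` will pull new commits; a :tag or :ref pin never changes until you edit the Gemfile.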

Conclusion

Using bundler for the sorts of things you would previously have handled manually should be easier than ever. Bundler handles almost all of the automatable record-keeping for you, while offering the incredible guarantee that you will always run the same versions of third-party code.

There are a lot of advanced features of Bundler, and you can learn a lot more about them at the bundler web site.

A Tale of Abort Traps (or Always Question Your Assumptions)

For a few months now, the bundler team has been getting intermittent reports of segfaults in C extensions that happen when using bundler with rvm. A cursory investigation revealed that the issue was that the C extensions were compiled for the wrong version of Ruby.

For instance, we would get reports of segfaults in nokogiri when using Ruby 1.9 that resulted from the $GEM_HOME in Ruby 1.9 containing a .bundle file compiled for Ruby 1.8. We got a lot of really angry bug reports, and a lot of speculation that we were doing something wrong with an obvious fix.

I finally ran into the issue myself, on my own machine a couple days ago, and tracked it down, deep into the fires of Mount Doom. A word of warning: this story may shock you.

It Begins

I usually use rvm for my day-to-day work, but I tend not to use gemsets unless I want a guaranteed clean environment for debugging something. Bundler takes care of isolation, even in the face of a lot of different gems mixed together in a global gemset. Last week, I was deep in the process of debugging another issue, so I was making pretty heavy use of gemsets (and rvm gemset empty).

I had switched to Ruby 1.9.2 (final, which had just come out), created a new gemset, and run bundle install on the project I was working on, which included thin in its Gemfile. Among other things, bundler installed thin “with native extensions”. I use bundler a lot, and I had run this exact command probably thousands of times.

This time, however, I got a segfault in Ruby 1.9 that pointed at the require call to the rubyeventmachine.bundle file.

Debugging

Now that I had the bug on a physical machine, I started debugging. Our working hypothesis was that bundler or Rubygems was somehow compiling gems against Ruby 1.8, even when on Ruby 1.9, but we couldn’t figure out exactly how it could be happening.

The first thing I did was run otool -L on the binary (rubyeventmachine.bundle):

$ otool -L rubyeventmachine.bundle 
rubyeventmachine.bundle:
  /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/libruby.1.dylib
  (compatibility version 1.8.0, current version 1.8.7)
  /usr/lib/libssl.0.9.8.dylib (compatibility version 0.9.8, current version 0.9.8)
  /usr/lib/libcrypto.0.9.8.dylib (compatibility version 0.9.8, current version 0.9.8)
  /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.3)
  /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.0.1)
  /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.9.0)

So that’s weird. I had run bundle install from inside an rvm gemset, yet the extension was compiled against the libruby that ships with OSX. I had always suspected that the problem was a leak from another rvm-installed Ruby, so this was definitely a surprise.

Just out of curiosity, I ran which bundle:

$ which bundle
/usr/bin/bundle

Ok, now I knew something was rotten. I printed out the $PATH:

$ echo $PATH
/Users/wycats/.rvm/gems/ruby-1.9.2-p0/bin:...
$ ls /Users/wycats/.rvm/gems/ruby-1.9.2-p0/bin
asdf*             erubis*           rake2thor*        sc-build-number*
autospec*         prettify_json.rb* redcloth*         sc-docs*
bundle*           rackup*           ruby-prof*        sc-gen*
edit_json.rb*     rake*             sc-build*         sc-init*

In other words, a big fat WTF.

I asked around, and some people had the vague idea that there was a $PATH cache in Unix shells. Someone pointed me at this post about clearing the cache.

Sure enough, running hash -r fixed the output of which. I alerted Wayne of rvm to this problem, and he threw in a fix to rvm that cleared the cache when switching Rubies. For the moment, everything seemed fine.

Digging Further

I still didn’t exactly understand how this condition could happen in the first place. When I went digging, I discovered that shells almost uniformly clear the path cache when modifying the $PATH. Since rvm pushes its bin directory onto the $PATH when you switch to a different Ruby, I couldn’t understand how exactly this problem was happening.

I read through Chapter 3 of the zsh guide, and it finally clicked:

The way commands are stored has other consequences. In particular, zsh won’t look for a new command if it already knows where to find one. If I put a new ls command in /usr/local/bin in the above example, zsh would continue to use /bin/ls (assuming it had already been found). To fix this, there is the command rehash, which actually empties the command hash table, so that finding commands starts again from scratch. Users of csh may remember having to type rehash quite a lot with new commands: it’s not so bad in zsh, because if no command was already hashed, or the existing one disappeared, zsh will automatically scan the path again; furthermore, zsh performs a rehash of its own accord if $path is altered. So adding a new duplicate command somewhere towards the head of $path is the main reason for needing rehash.

By using the hash command (which prints out all the entries in this cache), I was able to confirm that the same behavior exists in bash, but it seems that the which command (which I was using for testing the problem) implicitly rehashes in bash.

During this time, I also took a look at /usr/bin/bundle, which looks like this:

$ cat /usr/bin/bundle
#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
#
# This file was generated by RubyGems.
#
# The application 'bundler' is installed as part of a gem, and
# this file is here to facilitate running it.
#
 
require 'rubygems'
 
version = ">= 0"
 
if ARGV.first =~ /^_(.*)_$/ and Gem::Version.correct? $1 then
  version = $1
  ARGV.shift
end
 
gem 'bundler', version
load Gem.bin_path('bundler', 'bundle', version)

As you can see, executable wrappers created by Rubygems hardcode the version of Ruby that was used to install them. This is one of the reasons that rvm keeps its own directory for installed executables.

A Working Hypothesis

It took me some time to figure all this out, during which time I was speaking to a bunch of friends and the guys in #zsh. I formed a working hypothesis. I figured that people were doing something like this:

  1. Install bundler onto their system, and use it there
  2. Need to work on a new project, so switch to rvm, and create a new gemset
  3. Run bundle install, resulting in an error
  4. Run gem install bundler, to install bundler
  5. Run bundle install, which works, but has a subtle bug

Let’s walk through each of these steps, and unpack exactly what happens.

Install bundler on their system

This results in an executable at /usr/bin/bundle that looks like this:

#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
#
# This file was generated by RubyGems.
#
# The application 'bundler' is installed as part of a gem, and
# this file is here to facilitate running it.
#
 
require 'rubygems'
 
version = ">= 0"
 
if ARGV.first =~ /^_(.*)_$/ and Gem::Version.correct? $1 then
  version = $1
  ARGV.shift
end
 
gem 'bundler', version
load Gem.bin_path('bundler', 'bundle', version)

There are two important things here. First, it hardcodes the shebang to the system Ruby location. Second, it uses Rubygems to look up the location of the executable shipped with bundler, which will respect GEM_HOME.

Switch to rvm, and create a new gemset

Switching to an rvm gemset does a few relevant things. First, it prepends ~/.rvm/gems/ruby-1.9.2-p0@gemset/bin onto the $PATH, which effectively resets the shell’s built-in command cache. Second, it sets $GEM_HOME to ~/.rvm/gems/ruby-1.9.2-p0@gemset. The $GEM_HOME is where Ruby both looks for gems and installs them.
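You can see how Rubygems honors this variable from Ruby itself (a quick sketch; the printed paths depend on your environment):

```ruby
require "rubygems"

# Gem.dir is the install (and primary lookup) directory; it honors
# $GEM_HOME when that variable is set. Gem.path is the full search path,
# which always includes Gem.dir.
puts Gem.dir
puts Gem.path.inspect
```

Run this inside and outside a gemset and the first line changes, which is exactly why gems installed by /usr/bin/bundle still landed in the gemset.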

Run bundle install, resulting in an error

Specifically,

$ bundle install
/Library/Ruby/Site/1.8/rubygems.rb:777:in `report_activate_error': 
Could not find RubyGem bundler (>= 0) (Gem::LoadError)
	from /Library/Ruby/Site/1.8/rubygems.rb:211:in `activate'
	from /Library/Ruby/Site/1.8/rubygems.rb:1056:in `gem'
	from /usr/bin/bundle:18

Most people don’t take such a close look at this error, interpreting it as the equivalent of command not found: bundle. What’s actually happening is a bit different. Since bundle is installed at /usr/bin/bundle, the shell finds it, and runs it. It uses the system Ruby (hardcoded in its shebang), and the $GEM_HOME set by rvm. Since I just created a brand new gemset, the bundler gem is not found in $GEM_HOME. As a result, the line gem 'bundler', version in the executable fails with the error I showed above.

However, because the shell found the executable, it ends up in the shell’s command cache. You can see the command cache by typing hash. In both bash and zsh, the command cache will include an entry for bundle pointing at /usr/bin/bundle. zsh is more aggressive about populating the cache, so you’ll see the list of all commands in the system in the command cache (which, you’ll recall, was reset when you first switched into the gemset, because $PATH was altered).

Run gem install bundler, to install bundler

This will install the bundle executable to ~/.rvm/gems/ruby-1.9.2-p0@gemset/bin, which is the first entry on the $PATH. It will also install bundler to the $GEM_HOME.

Run bundle install, which works, but has a subtle bug

Here’s where the fun happens. Since we didn’t modify the cache, or call hash -r, the next call to bundle still picks up the bundle in /usr/bin/bundle, which is hardcoded to use system Ruby. However, it will use the version of Bundler we just installed, since it was installed to $GEM_HOME, and Rubygems uses that to look for gems. In other words, even though we’re using system Ruby, the /usr/bin/bundle executable will use the rvm gemset’s Bundler gem we just installed.

Additionally, because $GEM_HOME points to the rvm gemset, bundler will install all its gems to the rvm gemset. Taken together, these factors make it almost completely irrelevant that the system Ruby was used to install gems. After all, who cares which Ruby installed the gems, as long as they end up in the right place for rvm’s Ruby?

There are actually two problems. First, as we’ve seen, the version of Ruby that installs a gem also hardcodes itself into the shebang of the executables for the gem.

Try this experiment:

$ rvm use 1.9.2
$ rvm gemset create experiment
$ rvm gemset use experiment
$ /usr/bin/gem install rack
$ cat ~/.rvm/gems/ruby-1.9.2-p0@experiment/bin/rackup
#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
#
# This file was generated by RubyGems.
#
# The application 'rack' is installed as part of a gem, and
# this file is here to facilitate running it.
#
 
require 'rubygems'
 
version = ">= 0"
 
if ARGV.first =~ /^_(.*)_$/ and Gem::Version.correct? $1 then
  version = $1
  ARGV.shift
end
 
gem 'rack', version
load Gem.bin_path('rack', 'rackup', version)

As a result, any gems that bundler installed will put their executables in the gemset’s location, but still be hardcoded to use the system’s Ruby. This is not actually the problem we’re encountering here, but it could cause some weird results.

More concerning: since our system Ruby installed the gems, it will also link any C extensions it compiles against its own copy of libruby. This shouldn’t be a major issue if you’re using a version of Ruby 1.8 in rvm, but it is a major issue if you’re using Ruby 1.9.

Now we have a gem stored in rvm’s Ruby that contains a .bundle linked against the wrong Ruby. When running your application, Ruby 1.9 will try to load it, and kaboom: segfault.

Postscript

The crazy thing about this story is that a lot of factors conspired to cause the problem. The combination of the shell command cache, $GEM_HOME making it look like the system Ruby was doing the right thing, and hardcoding the version of Ruby in the shebang line of installed gems made this segfault possible.

Thankfully, now that we’ve figured the problem out, the latest version of rvm fixes it. The solution: wrap the gem command in a shell function that calls hash -r afterward.
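A minimal sketch of that fix (illustrative, not rvm’s exact implementation):

```shell
# Wrap `gem` so the shell's command cache is flushed after every
# invocation, forcing the next lookup to re-scan $PATH and find
# newly installed executables ahead of stale system ones.
gem() {
  command gem "$@"   # run the real gem command
  hash -r            # empty the command cache (zsh also accepts rehash)
}
```

With this in place, installing bundler into a fresh gemset means the very next `bundle` call resolves to the gemset’s executable rather than the cached /usr/bin/bundle.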

Amazing.

Using >= Considered Harmful (or, What’s Wrong With >=)

TL;DR Use ~> instead.

Having spent far, far too much time with Rubygems dependencies, and the problems that arise with unusual combinations, I am ready to come right out and say it: you basically never, ever want to use a >= dependency in your gems.

When you specify a dependency for your gem, it should mean that you are fairly sure that the unmodified code in the released gem will continue to work with any future version of the dependency that matches the version you specified. So for instance, let’s take a look at the dependencies listed in the actionpack gem:

activemodel (= 3.0.0.rc, runtime)
activesupport (= 3.0.0.rc, runtime)
builder (~> 2.1.2, runtime)
erubis (~> 2.6.6, runtime)
i18n (~> 0.4.1, runtime)
rack (~> 1.2.1, runtime)
rack-mount (~> 0.6.9, runtime)
rack-test (~> 0.5.4, runtime)
tzinfo (~> 0.3.22, runtime)

Since we release the Rails gems as a unit, we declare hard dependencies on activemodel and activesupport. We declare soft dependencies on builder, erubis, i18n, rack, rack-mount, rack-test, and tzinfo.

You might not know what exactly the ~> version specifier means. Essentially, it decomposes into two specifiers. So ~> 2.1.2 means >= 2.1.2, < 2.2.0. In other words, it means “2.1.x, but not less than 2.1.2”. Specifying ~> 1.0, like many people do for Rack, means “any 1.x”.
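You can check this decomposition against Rubygems itself, since Gem::Requirement implements the ~> operator:

```ruby
require "rubygems"

# ~> 2.1.2 decomposes into >= 2.1.2, < 2.2.0
pessimistic = Gem::Requirement.new("~> 2.1.2")
pessimistic.satisfied_by?(Gem::Version.new("2.1.2"))  # => true
pessimistic.satisfied_by?(Gem::Version.new("2.1.9"))  # => true
pessimistic.satisfied_by?(Gem::Version.new("2.2.0"))  # => false

# ~> 1.0 means "any 1.x"
any_1_x = Gem::Requirement.new("~> 1.0")
any_1_x.satisfied_by?(Gem::Version.new("1.9.4"))      # => true
any_1_x.satisfied_by?(Gem::Version.new("2.0.0"))      # => false
```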

You should make your dependencies as soft as the versioning scheme and release practices of your dependencies will allow. If you’re monkey-patching a library outside of its public API (not a very good practice for libraries), you should probably stick with an = dependency.

One thing is certain, though: you cannot be sure that your gem works with every future version of your dependencies. Sanely versioned gems take the opportunity of a major release to break things, and until you have actually tested against the new versions, it’s madness to claim compatibility. One example: a number of gems declare a dependency on activesupport >= 2.3. In a large number of cases, these gems do not work correctly with ActiveSupport 3.0, since we changed how components of ActiveSupport get loaded to make it easier to cherry-pick.

Now, instead of receiving a version conflict, users of these gems will get cryptic runtime error messages. Even worse, everything might appear to work, until some weird edge-case is exercised in production, one which your tests would have caught.

But What Happens When a New Version is Released?

One reason that people use activesupport >= 2.3 is that, assuming Rails maintains backward-compatibility, their gem will continue to work in newer Rails environments without any difficulty. If everything happens to work, it saves them the time of running their unit tests against newer versions of dependencies and cutting a new release.

As I said before, this is a deadly practice. By specifying appropriate dependencies (based on your confidence in the underlying library’s versioning scheme), you will have a natural opportunity to run your test suite against the new versions, and release a new gem that you know actually works.

This does mean that you will likely want to release patch releases of old versions of your gem. For instance, if I have AuthMagic 1.0, which worked against Rails 2.3, and I release AuthMagic 2.0 once Rails 3.0 comes out, it makes sense to continue patching AuthMagic 1.0 for a little while, so my Rails 2.3 users aren’t left out in the cold.

Applications and Gemfiles

I should be clear that this versioning advice doesn’t necessarily apply to an application using Bundler. That’s because the Gemfile.lock, which you should check into version control, essentially converts all >= dependencies into hard dependencies. However, because you may want to run bundle update at some point in the future, which will update everything to the latest possible versions, you might want to use version specifiers in your Gemfile that seem likely to work into the future.
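As an illustration (the gem name and versions here are hypothetical), a loose requirement in the Gemfile becomes an exact pin in Gemfile.lock until you run bundle update:

```ruby
# Gemfile – a loose constraint the resolver may satisfy many ways
gem "nokogiri", ">= 1.4"

# Gemfile.lock (generated) would then record an exact version, e.g.:
#
#   nokogiri (1.4.3.1)
#
# Every install from this lock uses 1.4.3.1; only `bundle update`
# re-resolves against the loose ">= 1.4" constraint.
```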
