Named Gem Environments and Bundler

In the beginning, Rubygems made a decision to allow multiple versions of individual gems in the system repository of gems. This allowed people to use whatever versions of gems they needed for individual scripts, without having to partition the gems for specific purposes.

This was a nice starting place. Being able to install whatever gems you wanted into one place and have scripts just work, without having to partition your gems into buckets, made Rubygems an extremely pleasant tool to work with. In my opinion, it was a good decision.

As the ecosystem has matured, and more people build gems with dependencies (especially loose dependencies, like "rack", ">= 1.0"), this original sin has introduced the dreaded activation error:

can't activate rspec-rails (= 1.3.2, runtime) for [], 
already activated rspec-rails-2.0.0.beta.6

This error occurs because of the linear way that gems are "activated". For instance, consider the following scenario:

# On your system:
thin (1.2.7)
- daemons (>= 1.0.9)
- eventmachine (>= 0.12.6)
- rack (>= 1.0.0)

actionpack (2.3.5)
- activesupport (= 2.3.5)
- rack (~> 1.0.0)

rack (1.0.0)
rack (1.1.0)
activesupport (2.3.5)
daemons (1.0.9)
eventmachine (0.12.6)

Quickly glancing at these dependencies, you can see that these two gems are "compatible". The gems have the rack dependency in common, and the >= 1.0.0 is compatible with ~> 1.0.0.

However, there are two ways to require these gems. First, you could require actionpack first.

require "action_pack"
# will result in "activating" ActiveSupport 2.3.5 and Rack 1.0.0

require "thin"
# will notice that Rack 1.0.0 is already activated, and satisfies
# the >= 1.0.0 requirement, so just activates daemons and eventmachine

Second, you could require thin first.

require "thin"
# will result in "activating" Rack 1.1.0, Daemons 1.0.9, 
# and EventMachine 0.12.6

require "action_pack"
# will notice that Rack 1.1.0 is already activated, but that
# it is incompatible with ~> 1.0.0. Therefore, it will emit:
# ---
# can't activate rack (~> 1.0.0, runtime) for ["actionpack-2.3.5"], 
# already activated rack-1.1.0 for ["thin-1.2.7"]

In this case, because thin pulled in Rack 1.1, it becomes impossible to load in actionpack, despite the fact that a potentially valid combination exists.
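To make the order dependence concrete, here is a minimal sketch that simulates linear activation. It is not RubyGems' actual implementation: the INSTALLED and DEPENDENCIES tables and the activate and activate_in_order helpers are hypothetical names I made up for illustration. Only Gem::Version and Gem::Requirement are real RubyGems classes.

```ruby
require "rubygems"

# Hypothetical stand-in for the system gem repository listed above.
INSTALLED = {
  "rack"          => %w[1.0.0 1.1.0],
  "activesupport" => %w[2.3.5],
  "daemons"       => %w[1.0.9],
  "eventmachine"  => %w[0.12.6],
}.freeze

# Hypothetical dependency table for the two gems in the scenario.
DEPENDENCIES = {
  "thin"       => { "daemons" => ">= 1.0.9", "eventmachine" => ">= 0.12.6", "rack" => ">= 1.0.0" },
  "actionpack" => { "activesupport" => "= 2.3.5", "rack" => "~> 1.0.0" },
}.freeze

# Activate one gem's dependencies, looking only at what is already activated.
def activate(gem_name, activated)
  DEPENDENCIES.fetch(gem_name).each do |dep, constraint|
    requirement = Gem::Requirement.new(constraint)
    if activated.key?(dep)
      # An earlier require already chose a version; reuse it or fail.
      next if requirement.satisfied_by?(activated[dep])
      raise "can't activate #{dep} (#{constraint}) for #{gem_name}, " \
            "already activated #{dep}-#{activated[dep]}"
    end
    # Nothing activated yet: take the newest installed version that fits.
    versions = INSTALLED.fetch(dep).map { |v| Gem::Version.new(v) }
    activated[dep] = versions.select { |v| requirement.satisfied_by?(v) }.max
  end
  activated
end

# Simulate a sequence of requires, in the order given.
def activate_in_order(*gem_names)
  gem_names.each_with_object({}) { |name, activated| activate(name, activated) }
end

puts activate_in_order("actionpack", "thin")["rack"] # prints 1.0.0

begin
  activate_in_order("thin", "actionpack")
rescue RuntimeError => e
  puts e.message # the rack conflict, because thin already picked rack 1.1.0
end
```

The same two gems succeed or fail depending only on the order of the calls, which is the whole problem in miniature.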

This problem is fundamental to the approach of supporting different versions of the same gem in the system and activating gems linearly. In other words, because no single entity ever has the opportunity to examine the entire list of dependencies, the onus is on the user to make sure that the gem requires are ordered correctly. Sometimes, it means that the user must explicitly require the right version of a child dependency just to make sure that the right versions are loaded.

There are two possible solutions to this problem.

Multiple Named Environments

One solution to this problem is to catch potential conflicts at installation time and ask the user to manage multiple named environments, each with a fully consistent, non-conflicting view of the world.

The best way to implement this looks something like this (I'll use a fictitious gemenv command to illustrate):

$ gemenv install thin
- Installing daemons (1.0.10)
- Installing eventmachine (0.12.10)
- Installing rack (1.1.0)
- Installing thin (1.2.7)

$ gemenv install actionpack -v 2.3.5
- Uninstalling rack (1.1.0)
- Installing rack (1.0.1)
- Installing actionpack (2.3.5)

$ gemenv install rack -v 1.1.0
- rack (1.1.0) conflicts with actionpack (2.3.5)
- if you want rack (1.1.0) and actionpack (2.3.5)
  you will need a new environment.

$ gemenv create rails3
$ gemenv install actionpack -v 3.0.0.beta3
- Installing abstract (1.0.0)
- Installing builder (2.1.2)
- Installing i18n (0.3.7)
- Installing memcache-client (1.8.2)
- Installing tzinfo (0.3.20)
- Installing activesupport (3.0.0.beta3)
- Installing activemodel (3.0.0.beta3)
- Installing erubis (2.6.5)
- Installing rack (1.1.0)
- Installing rack-mount (0.6.3)
- Installing rack-test (0.5.3)
- Installing actionpack (3.0.0.beta3)

$ gemenv use default
$ ruby -e "puts Rack::VERSION"
1.0.1
$ gemenv use rails3
$ ruby -e "puts Rack::VERSION"
1.1.0

Essentially, the single entity with full knowledge of all dependencies is the installer, and the user creates as many environments as he or she needs for the various non-conflicting sets of gems in use.

This works nicely, because it guarantees that once using an environment, all gems available are compatible. Note that the above command is fictitious, but it bears similarity to rip.
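As a rough illustration of the install-time check, here is a minimal sketch of a single named environment. The NamedEnvironment class and its methods are hypothetical (the gemenv command does not exist), and the sketch only refuses conflicting installs; it does not swap in a compatible version of a dependency the way the fictitious session above does. The rack requirement I give actionpack 3.0.0.beta3 is an assumption for the example.

```ruby
require "rubygems"

# Hypothetical model of one named environment: at most one version per gem.
class NamedEnvironment
  Conflict = Class.new(StandardError)

  def initialize
    @versions = {}                                        # gem name => Gem::Version
    @constraints = Hash.new { |hash, name| hash[name] = [] } # gem name => [[owner, requirement]]
  end

  def version_of(name)
    @versions[name]
  end

  # Install name at version, refusing anything that would break the environment.
  def install(name, version, dependencies = {})
    version = Gem::Version.new(version)
    requirements = dependencies.transform_values { |c| Gem::Requirement.new(c) }

    # Does an already-installed gem forbid this version?
    owner, = @constraints[name].find { |_, requirement| !requirement.satisfied_by?(version) }
    raise Conflict, "#{name} (#{version}) conflicts with #{owner}" if owner

    # Does this gem need something the environment already contradicts?
    requirements.each do |dep, requirement|
      installed = @versions[dep]
      next if installed.nil? || requirement.satisfied_by?(installed)
      raise Conflict, "#{name} (#{version}) needs #{dep} (#{requirement}), " \
                      "but #{dep} (#{installed}) is installed"
    end

    @versions[name] = version
    requirements.each { |dep, requirement| @constraints[dep] << ["#{name} (#{version})", requirement] }
    self
  end
end

default_env = NamedEnvironment.new
default_env.install("rack", "1.0.1")
default_env.install("actionpack", "2.3.5", { "rack" => "~> 1.0.0" })

begin
  default_env.install("rack", "1.1.0")
rescue NamedEnvironment::Conflict => e
  puts e.message # prints: rack (1.1.0) conflicts with actionpack (2.3.5)
end

# A second environment holds the combination the first one rejected.
rails3_env = NamedEnvironment.new
rails3_env.install("rack", "1.1.0")
rails3_env.install("actionpack", "3.0.0.beta3", { "rack" => "~> 1.1.0" }) # assumed requirement
```

Because every install is checked against everything already in the environment, nothing inconsistent ever gets in, and require order stops mattering.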

Virtual, Anonymous Environments

Another solution, the one bundler uses, is to allow multiple, conflicting versions of gems to exist in the system repository of packages, but to ensure a lack of conflicts by resolving the dependencies used by individual applications.

First, install conflicting gems. In this case, Rails 2.3 and Rails 3.0 require different, incompatible versions of Rack (1.0.x and 1.1.x).

$ gem install rails
... output ...
$ gem install rails -v 3.0.0.beta3
... output ...

Next, specify which version of Rails to use in your application's Gemfile:

gem "rails", "2.3.5"

When running script/server in an app using Bundler, Bundler can determine that Rails needs Rack 1.0.x, and pulls in Rack 1.0.1 from the system. If you had specified gem "rails", "3.0.0.beta3", bundler would have pulled in Rack 1.1.0 from the system.

In essence, we have the same kind of isolation as the fictitious command above, but instead of manually managing named environments, Bundler creates virtual isolated environments based on the list of gems used in your application.
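The key difference from linear activation is that something gets to look at every requirement before any version is chosen. Here is a deliberately naive sketch of that idea, reusing the thin and actionpack scenario from earlier to keep it short. It is not Bundler's resolver, which also has to handle nested dependencies and backtracking; the SYSTEM_GEMS and REQUIREMENTS tables and the resolve helper are hypothetical names for illustration.

```ruby
require "rubygems"

# Hypothetical snapshot of the system repository: conflicting versions side by side.
SYSTEM_GEMS = {
  "rack"          => %w[1.0.0 1.1.0],
  "activesupport" => %w[2.3.5],
  "daemons"       => %w[1.0.9],
  "eventmachine"  => %w[0.12.6],
}.freeze

# Hypothetical dependency table for the gems an application might list.
REQUIREMENTS = {
  "thin"       => { "daemons" => ">= 1.0.9", "eventmachine" => ">= 0.12.6", "rack" => ">= 1.0.0" },
  "actionpack" => { "activesupport" => "= 2.3.5", "rack" => "~> 1.0.0" },
}.freeze

# Collect every constraint first, then choose one version per dependency.
def resolve(app_gems)
  constraints = Hash.new { |hash, name| hash[name] = [] }
  app_gems.each do |gem_name|
    REQUIREMENTS.fetch(gem_name).each do |dep, constraint|
      constraints[dep] << Gem::Requirement.new(constraint)
    end
  end

  constraints.each_with_object({}) do |(dep, requirements), resolved|
    versions = SYSTEM_GEMS.fetch(dep).map { |v| Gem::Version.new(v) }
    # Newest installed version that satisfies all collected requirements.
    best = versions.select { |v| requirements.all? { |r| r.satisfied_by?(v) } }.max
    raise "no installed version of #{dep} satisfies every requirement" unless best
    resolved[dep] = best
  end
end

puts resolve(%w[thin actionpack])["rack"] # prints 1.0.0
puts resolve(%w[actionpack thin])["rack"] # prints 1.0.0
puts resolve(%w[thin])["rack"]            # prints 1.1.0
```

The order of the list no longer matters, and an application that only uses thin still gets the newer Rack from the same system repository.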

Why Did We Use Virtual, Anonymous Environments?

When considering the tradeoffs between these two solutions, we realized that applications already typically have a list (executable or not) of their dependencies. The gem install command already works great for installing dependencies, and introducing a dependency resolution step there feels awkward and out of place.

Additionally, as an application evolves, it's natural to continue updating its list of gems, keeping a record of the changes as you go. You could keep a separate named environment for each application, but you'd probably want to keep a list of the dependencies in the application anyway so that it's possible to get up and running on a second machine.

In short, since a list of an application's dependencies makes sense anyway, why burden the end-user with the need to maintain separate named environments and manually install gems?

And once we're doing this, why not build a toolchain around installing gems and keeping records of not only the top-level installed gems, but the exact versions of all gems used at a given time? Why not build in support for "edge" gems on your local machine or in git repositories? Why not create workflows for sharing your application with other developers and deploying your application?

As application and gem developers ourselves, we wanted a tool that managed an application's dependencies across the lifecycle of the application, in the context of that application.

That said, there is absolutely room for experimentation in this space. Tools that enforce consistency at install time might be extremely appropriate for managing the gems that you use in scripts, but which you don't share with other machines. Context matters.

Postscript

We commonly hear something to the effect of "but why add all this complexity? Without all these features, bundler could be so much smaller!". The truth is that bundler itself is under 2,000 lines of code, and hasn't grown a whole lot in the past few major revisions.

The new rpg tool, recently released by the venerable Ryan Tomayko, is also in that range. The original rip (currently in the "classic" branch) was also in that ballpark. The new rip (the current master branch) may well demonstrate that a fully-featured dependency management system for Ruby can be written in many fewer lines of code. If so, I'm excited to see the abstractions that they use to make it so.

But the bottom line is that all of the new package management solutions have done a good job of packing features into a small number of new lines of code, bundler included. By starting with good abstractions, it's often possible to add rich new features without having to add a whole lot of new code (or even by deleting code!). We've certainly found that in our journey with bundler.