Rails Bundling -- Revisited

One of the things I spent quite a bit of time on in Merb was trying to get a working gem bundler that we could be really proud of. Merb had particular problems because of the sheer number of gems with interwoven dependencies, so we hit some limitations of Rubygems quite early. Eventually, we settled on a solution with the following characteristics:

  • A single dependencies file that listed out the required dependencies for the application
  • Only the gems you cared about needed to be listed. All dependencies of those gems were resolved automatically
  • A task to go through the dependencies, get the appropriate gems, and install them (merb:gem:install)
  • Gems were installed in a standard rubygems structure inside the application, so normal Rubygems could activate and run them
  • Only the .gem files were stored in source control. These files were expanded out to their full structures on each new machine (via the merb:gem:redeploy task). This allowed us to support native gems without any additional trouble
  • When gems were re-expanded, they took into consideration gems that were already present, meaning that running the deployment task when there were no new gems added to the repo took no time at all (so it could just be added to the normal cap task).

Most importantly, the Merb bundling system relied on a mandatory one-version-per-gem rule that was enforced by keeping the dependencies file in sync with the .gem files in gems/cache. In other words, it would be impossible to have leftover gems or gem activation problems with this system.

There were, however, some flaws. First of all, it was a first pass, written before we knew Rubygems all that well. As a result, the code was clumsier than it needed to be to achieve the task in question. Second, it was coupled to Merb's dependencies DSL and runtime loader (as well as thor), making it somewhat difficult to port to Rails cleanly. It has since been ported, but it is not really possible to maintain the underlying bundling bits independently of the Rails/Merb parts.

Most significantly, while we did solve the problem of conflicting gems to a reasonable extent, it was still possible to end up in a conflicting state at installation time, even when a valid configuration existed.

For Rails, we've discussed hoisting as much of this stuff as possible into Rubygems itself or a standard library that Rails could interact with, that could also be used by others who wished to bundle gems in with an application. And we have a number of projects at Engine Yard that could benefit from a standard bundler that was not directly coupled with Rails or Merb.

It's too early to really use it for anything, but Carl and I have made a lot of progress on a gem bundler along these lines. A big part of the reason this is possible is a project I worked on with Tim Carey-Smith a while back (he really did most of the work) called GemResolver. GemResolver takes a set of dependencies and a gem source index and returns a list of all of the gems, including their dependencies, that need to be installed to satisfy the original list. It searches all of the options, so even if the simple solution would have resulted in the dreaded activation error, it will still be able to find a solution if one exists.
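To make the "search of all options" idea concrete, here is a minimal sketch of a backtracking resolver. It is not GemResolver's actual code or algorithm; the INDEX constant, the gem names in it, and the resolve method are all made up for illustration. The only real APIs used are Gem::Version and Gem::Requirement from Rubygems.

```ruby
require "rubygems"

# Hypothetical in-memory source index (an assumption for this sketch):
# gem name => { version => [[dependency name, requirement], ...] }
INDEX = {
  "app_framework" => {
    "2.0" => [["json_lib", ">= 1.0"], ["template_lib", "~> 1.0"]],
  },
  "template_lib" => {
    "1.1" => [["json_lib", "< 2.0"]],
    "1.0" => [["json_lib", "< 2.0"]],
  },
  "json_lib" => {
    "2.0" => [],
    "1.5" => [],
  },
}

# Returns a hash of gem name => chosen version that satisfies every
# pending requirement, or nil if no combination works.
def resolve(index, pending, chosen = {})
  return chosen if pending.empty?

  (name, requirement), *rest = pending
  req = Gem::Requirement.new(requirement)

  if chosen.key?(name)
    # A version was already picked; it must also satisfy this requirement.
    ok = req.satisfied_by?(Gem::Version.new(chosen[name]))
    return ok ? resolve(index, rest, chosen) : nil
  end

  # Try the newest candidates first, backing up to older ones on failure.
  candidates = index.fetch(name, {}).keys.sort_by { |v| Gem::Version.new(v) }.reverse
  candidates.each do |version|
    next unless req.satisfied_by?(Gem::Version.new(version))
    result = resolve(index, rest + index[name][version], chosen.merge(name => version))
    return result if result
  end

  nil # every candidate led to a conflict
end

solution = resolve(INDEX, [["app_framework", ">= 0"]])
solution.each { |name, version| puts "#{name} #{version}" }
```

With this index, simply taking the newest json_lib (2.0) would conflict with template_lib's "< 2.0" requirement. The search backs up and settles on json_lib 1.5 instead, which is the kind of case where a naive installer ends up with an activation error.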

Unlike the Merb bundler, the new bundler does not assume a particular DSL for specifying dependencies, making it suitable for use with Rails, Merb or other projects that wish to have their own DSL for interacting with the bundler. It works as follows:

  • A Manifest object receives a list of Rubygems sources and dependencies for the application
  • The bundler then fetches the full gem list from each of the sources and resolves the dependencies using GemResolver (which we have merged into the bundler)
  • Once the list is determined, each of the .gem files is retrieved from its source and stashed
  • Next, each gem is installed, without the need to download its dependencies, since the resolution process has already occurred. This guarantees a single version per gem and a working environment that will not produce activation errors in any circumstance
  • The installation step can be run in isolation from the fetching step, so it is possible to expand the gems on remote machines. This means that you can store just the necessary .gem files in version control, and be entirely isolated from network dependencies for deployments
  • Neither the fetching nor the installation step will clobber existing .gem files or installed gems, so if there are no new gems, those steps take no time
  • After installation is complete, environment-specific load-path files are created, which means that the bundler will be able to work with or without Rubygems, even though the installed gems are still inside a normal Rubygems structure.
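Two of the steps above are easy to sketch: skipping gems that are already cached, and writing a load-path file per environment. This is only an illustration under assumptions. The RESOLVED list, the method names, and the environments/ directory are hypothetical and not taken from the bundler; the cache/ and gems/<name>-<version>/ directories follow the normal Rubygems layout.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical output of the resolution step. The :envs key is an
# assumption about how environments might be recorded.
RESOLVED = [
  { name: "json_lib",        version: "1.5", envs: [:default] },
  { name: "test_helper_lib", version: "0.3", envs: [:test] },
]

# Gems whose .gem file is not yet in <bundle_dir>/cache, so a second run
# with nothing new has nothing to fetch.
def gems_to_fetch(bundle_dir, gems)
  gems.reject do |g|
    File.exist?(File.join(bundle_dir, "cache", "#{g[:name]}-#{g[:version]}.gem"))
  end
end

# Writes <bundle_dir>/environments/<env>.rb, a plain Ruby file that puts
# each gem's lib directory on the load path without needing Rubygems.
def write_load_path_file(bundle_dir, env, gems)
  selected = gems.select { |g| g[:envs].include?(:default) || g[:envs].include?(env) }
  lines = selected.map do |g|
    lib = File.join(bundle_dir, "gems", "#{g[:name]}-#{g[:version]}", "lib")
    "$LOAD_PATH.unshift #{lib.inspect}"
  end
  path = File.join(bundle_dir, "environments", "#{env}.rb")
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, lines.join("\n") + "\n")
  path
end

# Usage against a throwaway directory.
Dir.mktmpdir do |dir|
  puts gems_to_fetch(dir, RESOLVED).map { |g| g[:name] }.inspect
  puts File.read(write_load_path_file(dir, :test, RESOLVED))
end
```

An application could then require the generated file for its environment at boot and never touch Rubygems at runtime, while the gems themselves stay in a structure Rubygems can still read.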

I am providing all this detail for the curious. In the end, as a user, your experience will be quite simple:

  1. List out your dependencies, including what environments those dependencies should be used in
  2. Run rake gem:install
  3. Run your Rails app

In other words, quite similar to the existing gem bundling solution, with fewer warts, and a standard system that you can use outside of Rails if you want to.
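Since the bundler does not dictate a DSL, step 1 could look however a framework wants it to. As a purely hypothetical sketch (the DependencyList class and its dependency, only, and define methods are invented here and are not the Rails, Merb, or bundler API), a framework might collect dependencies and their environments like this before handing them to the bundler:

```ruby
# Hypothetical DSL for listing dependencies per environment.
# None of these names come from the bundler itself.
class DependencyList
  attr_reader :entries

  def initialize
    @entries = []
    @current_envs = [:default]
  end

  # Records one dependency for whichever environments are currently active.
  def dependency(name, requirement = ">= 0")
    @entries << { name: name, requirement: requirement, envs: @current_envs }
  end

  # Scopes the dependencies declared in the block to the given environments.
  def only(*envs)
    previous, @current_envs = @current_envs, envs
    yield
  ensure
    @current_envs = previous
  end

  # Evaluates the block against a fresh list and returns the entries.
  def self.define(&block)
    list = new
    list.instance_eval(&block)
    list.entries
  end
end

DEPENDENCIES = DependencyList.define do
  dependency "app_framework", "~> 2.0"

  only :test do
    dependency "test_helper_lib"
  end
end

DEPENDENCIES.each do |d|
  puts "#{d[:name]} (#{d[:requirement]}) in #{d[:envs].join(', ')}"
end
```

The point of the sketch is only that the DSL is a thin layer: it produces a plain list of names, requirements, and environments, and everything after that is the bundler's job.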