Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; he spends his daytime hours at the startup he founded, Tilde Inc. Yehuda is co-author of the best-selling jQuery in Action and Rails 3 in Action. He spends most of his time hacking on open source (his main projects include Thor, Handlebars and Janus) or traveling the world doing evangelism work. He can be found on Twitter as @wycats and on GitHub.

Rails Bundling — Revisited

One of the things I spent quite a bit of time on in Merb was trying to get a working gem bundler that we could be really proud of. Merb had particular problems because of the sheer number of gems with webbed dependencies, so we hit some limitations of Rubygems quite early. Eventually, we settled on a solution with the following characteristics:

  • A single dependencies file that listed out the required dependencies for the application
  • Only the gems you cared about needed to be listed. All dependencies of those gems were resolved automatically
  • A task to go through the dependencies, get the appropriate gems, and install them (merb:gem:install)
  • Gems were installed in a standard rubygems structure inside the application, so normal Rubygems could activate and run them
  • Only the .gem files were stored in source control. These files were expanded out to their full structures on each new machine (via the merb:gem:redeploy task). This allowed us to support native gems without any additional trouble
  • When gems were re-expanded, they took into consideration gems that were already present, meaning that running the deployment task when there were no new gems added to the repo took no time at all (so it could just be added to the normal cap task).

Most importantly, the Merb bundling system relied on a mandatory one-version-per-gem rule that was enforced by keeping the dependencies file in sync with the .gem files in gems/cache. In other words, it would be impossible to have leftover gems or gem activation problems with this system.
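To make the one-version-per-gem rule concrete, here is a minimal sketch of the kind of check it implies. This is not the Merb code; the helper names `duplicate_versions` and `leftover_gems` are invented for illustration, and the filename parsing assumes plain `name-version.gem` files without a platform suffix.

```ruby
# Hypothetical helper: returns { name => [versions] } for every gem that
# appears with more than one version among the cached .gem filenames.
# Assumes "name-version.gem" with no platform suffix.
def duplicate_versions(gem_filenames)
  by_name = Hash.new { |hash, key| hash[key] = [] }
  gem_filenames.each do |file|
    match = file.match(/\A(.+)-(\d[^-]*)\.gem\z/)
    next unless match

    by_name[match[1]] << match[2]
  end
  by_name.select { |_name, versions| versions.size > 1 }
end

# Hypothetical helper: cached gems that the dependencies file no longer
# accounts for. expected_full_names is a list of "name-version" strings.
def leftover_gems(gem_filenames, expected_full_names)
  gem_filenames.map { |file| File.basename(file, ".gem") } - expected_full_names
end
```

If both helpers return empty results for the contents of gems/cache, the cache and the dependencies file agree and no gem is present in two versions.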

There were, however, some flaws. First of all, it was a first pass, before we knew Rubygems all that well. As a result, the code is more clumsy than it needed to be to achieve the task in question. Second, it was coupled to Merb’s dependencies DSL and runtime loader (as well as thor), making it somewhat difficult to port to Rails cleanly. It has since been ported, but it is not really possible to maintain the underlying bundling bits independent of the Rails/Merb parts.

Most importantly, while we did solve the problem of conflicting gems to a reasonable extent, it was still somewhat possible to get into a conflicting state at installation time, even if a possible configuration could be found.

For Rails, we’ve discussed hoisting as much of this stuff as possible into Rubygems itself or a standard library that Rails could interact with, that could also be used by others who wished to bundle gems in with an application. And we have a number of projects at Engine Yard that could benefit from a standard bundler that was not directly coupled with Rails or Merb.

It’s too early to really use it for anything, but Carl and I have made a lot of progress on a gem bundler along these lines. A big part of the reason this is possible is a project I worked on with Tim Carey-Smith a while back (he really did most of the work) called GemResolver. GemResolver takes a set of dependencies and a gem source index and returns a list of all of the gems, including their dependencies, that need to be installed to satisfy the original list. It does a search of all options, so even if the simple solution would have resulted in the dreaded activation error, it will still be able to find a solution if one exists.
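The idea of searching all options can be illustrated with a small backtracking sketch. This is not GemResolver's actual code or API: the `INDEX` data, the gem names in it, and the `resolve` method are made up for the example. Only `Gem::Version` and `Gem::Requirement` come from Rubygems itself.

```ruby
require "rubygems"

# Toy index: gem name => { version => { dependency name => requirement } }.
# All names and versions here are invented for illustration.
INDEX = {
  "app_framework" => {
    "2.0" => { "json_lib" => ">= 2.0" },
    "1.0" => { "json_lib" => ">= 1.0" },
  },
  "legacy_plugin" => {
    "1.0" => { "json_lib" => "< 2.0" },
  },
  "json_lib" => {
    "2.1" => {},
    "1.5" => {},
  },
}.freeze

# Returns { name => version } satisfying every requirement, or nil if no
# combination works. pending is a list of [name, requirement] pairs.
def resolve(pending, chosen = {})
  return chosen if pending.empty?

  (name, requirement), *rest = pending
  req = Gem::Requirement.new(requirement)

  if chosen.key?(name)
    # One version per gem: the version already picked must fit this requirement too.
    return req.satisfied_by?(Gem::Version.new(chosen[name])) ? resolve(rest, chosen) : nil
  end

  # Try the newest candidate first, falling back to older ones.
  candidates = INDEX.fetch(name, {}).keys.sort_by { |v| Gem::Version.new(v) }.reverse
  candidates.each do |version|
    next unless req.satisfied_by?(Gem::Version.new(version))

    deps = INDEX[name][version].to_a
    solution = resolve(rest + deps, chosen.merge(name => version))
    return solution if solution # otherwise backtrack and try the next candidate
  end
  nil
end
```

Calling `resolve([["app_framework", ">= 0"], ["legacy_plugin", ">= 0"]])` first tries the newest app_framework, finds that it forces a json_lib that legacy_plugin cannot accept, and backtracks to the older app_framework. A resolver that simply took the newest version of everything would stop at the conflict instead.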

Unlike the Merb bundler, the new bundler does not assume a particular DSL for specifying dependencies, making it suitable for use with Rails, Merb or other projects that wish to have their own DSL for interacting with the bundler. It works as follows:

  • A Manifest object that receives a list of Rubygems sources and dependencies for the application
  • The bundler then fetches the full gem list from each of the sources and resolves the dependencies using GemResolver (which we have merged into the bundler)
  • Once the list is determined, each of the .gem files is retrieved from their sources and stashed
  • Next, each gem is installed, without the need to download its dependencies, since the resolution process has already occurred. This guarantees a single version per gem and a working environment that will not produce activation errors in any circumstance
  • The installation step can be run in isolation from the fetching step, so it is possible to expand the gems on remote machines. This means that you can store just the necessary .gem files in version control, and be entirely isolated from network dependencies for deployments
  • Neither the fetching nor the installation step will clobber existing .gem files or installed gems, so if there are no new gems, those steps take no time
  • After installation is complete, environment-specific load-path files are created, which means the bundler will be able to work with or without Rubygems, even though the installed gems are still inside a normal Rubygems structure.
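Two of the points above, the no-clobber behaviour and the load-path files, can be sketched in a few lines. This is only an illustration under assumed names: `stash_gem`, `write_load_path_file`, and the `environments/` and `gems/` directory layout are hypothetical, not the bundler's real API or on-disk format.

```ruby
require "fileutils"

# Hypothetical helper: copies a .gem file into the cache unless it is already
# there. Returns true if a copy happened, false if the cached file was kept.
def stash_gem(source_file, cache_dir)
  FileUtils.mkdir_p(cache_dir)
  target = File.join(cache_dir, File.basename(source_file))
  return false if File.exist?(target)

  FileUtils.cp(source_file, target)
  true
end

# Hypothetical helper: writes <bundle_dir>/environments/<env>.rb, a plain Ruby
# file that puts each listed gem's lib directory on the load path.
# gems is a list of "name-version" directory names under <bundle_dir>/gems.
def write_load_path_file(bundle_dir, env, gems)
  env_dir = File.join(bundle_dir, "environments")
  FileUtils.mkdir_p(env_dir)
  path = File.join(env_dir, "#{env}.rb")

  lines = gems.map do |full_name|
    lib = File.expand_path(File.join(bundle_dir, "gems", full_name, "lib"))
    "$LOAD_PATH.unshift #{lib.inspect}"
  end
  File.write(path, lines.join("\n") + "\n")
  path
end
```

Because the generated file only manipulates `$LOAD_PATH`, an application could load it and then `require` its libraries without activating anything through Rubygems.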

I am providing all this detail for the curious. In the end, as a user, your experience will be quite simple:

  1. List out your dependencies, including what environments those dependencies should be used in
  2. Run rake gem:install
  3. Run your Rails app

In other words, quite similar to the existing gem bundling solution, with fewer warts, and a standard system that you can use outside of Rails if you want to.

7 Responses to “Rails Bundling — Revisited”

Sounds like a nice amount of polish on top of the merb bundler – look forward to testing this out!

Let me ask a question – do you really intend to do this and maintain it long-term, or is it just more hype about shit? I don’t intend to be rude, but you guys always start something, it’s very cool, there are blog posts, discussion and so on, and after some time no one maintains it. For example Merb, Rack-router, Thor … or am I wrong? (I’d like to believe so.)

One of the biggest problems for me was the inability to specify gems which should be installed only on development machines and gems for production. This was because merb.thor parses config/dependencies.rb but does not run merb, so I wasn’t able to use Merb.env to determine the actual environment. I developed a solution for installation based on runtime dependencies even before the current solution was out, created a Lighthouse ticket and sent a patch, but no one cared … later I created another ticket to warn about the problems; someone at least noticed and confirmed the ticket, but that was the only thing that happened with this ticket for a fucking long time.

@be4ce: You may not intend to be rude, but you’re doing a good job of it anyway. Yehuda is telling you about code he’s writing and will be giving to you and everyone else for free. You don’t have to like his code or use his code or think he’s smart, but to come onto his blog and talk as if he owes you anything shows a shocking lack of understanding as to how open source works.

And if you really think he’s full of it, you ought to show him up by releasing something that solves the same problem, and does it better. Code speaks.

A couple of questions I have: will this handle gem dependencies *before* loading the app environment (and initializing plugins). And what about local rake files with gem dependencies?

From your post, it sounds like the answer to my first question is “yes”. But I just want to confirm, because that’s one of the biggest shortcomings of the current “config.gem” system.

In order to successfully *load* all of ‘lib/tasks/**/*.rake’, I need to be sure that certain of my gem dependencies exist. If they don’t, then “rake gems:install” crashes. One (ugly hack) solution would be to wrap any rake files that have external dependencies in a begin…rescue block. That’s what I currently do. Another (more elegant?) solution would be to handle gem dependencies prior to loading lib/tasks/**/*.rake. If a gem dependency (that has been flagged as necessary for loading rake) is missing, then simply don’t load lib/tasks/**/*.rake (and print out a warning message).

Both of these would require pulling the gem dependencies out of config/environment.rb and putting them into a separate config file.

At any rate, I’m glad to hear that you’re tackling this! It’s an important issue, and I suspect you’ll do a good job with it.

Good stuff. Is this source anywhere public yet?

Katz, I heard you talk about this at WindyCityRails last weekend and just want you to know this feature alone would make going through an upgrade to Rails 3 worthwhile.

The disaster that is config.gem needs to die. I thought about this a bit awhile ago and think the only way for this to work properly is to be in a separate file with the dependencies there, and am quite happy to see that’s what you found as well.

@chad (or anyone who comes across this) you can get it at http://github.com/wycats/bundler/tree (or since github is down right now http://google.com/search?q=cache%3Ahttp%3A%2F%2Fgithub.com%2Fwycats%2Fbundler%2Ftree )
