RubyGems: Problems and (proposed) Solutions

There's been a fair bit of discussion around RubyGems lately, and some suggestions about what the core problems with RubyGems are.

People have the general sense that there's something wrong with dependencies, and that it might have something to do with multiple versions being installed in one repository. It also seems (to people) that having require do magical things is Bad(tm). And in general, people like knowing exactly what versions of things are being loaded.

To some degree, all of these concerns are valid, and led to the rather hackish solution that we distributed with Merb called merb.thor. What we did:

Created a manifest for your application that would describe the gems and versions you wanted to use. That same manifest was used at runtime to load those gems.
Create a virtual environment just for your application, with the one-version-per-environment rule. This meant that it was always possible to see what versions and gems were being used.
Make it reasonably easy to update the local environment when the manifest changes. Make such changes *not* require knowledge of the dependencies and versions of either the old or new gems.

What we did not do:

Put all the gems in a single directory, so normal Ruby require would work.

At first glance, this seems like a very good idea. Instead of relying on magical runtime load-path manipulation, just take, for instance, the merb-core gem, and stick it in a top-level. Then add that top-level to the load path and you don't need Rubygems at runtime.

The problem with this fabulous idea is that there isn't a consistent way that people use Rubygems. Consider the following scenario:

A gem called "bad-behavior" that has a lib loadpath, but puts server.rb, initializer.rb, and omg.rb at the top-level. In omg.rb, the gem does Dir["#{File.dirname(FILE)}/*"].each {|f| require f }. This works fine when the gem actually owns the entire directory. But if you drop the gem into a larger file structure (similar to how other package managers handle the problem), its top-level is now everyone else's top-level.

Another scenario: A gem called rack-silliness that puts its files in rack/, and then calls Dir["#{File.dirname(FILE)}/"].each {|f| require f } from rack/silliness.rb. Again, this works fine if the gem owns the entire directory, but if multiple gems put things in rack/*, moving everything to a shared structure will fail.

With all that said, if we could use a shared structure, things would automatically fall into place. We wouldn't need rubygems at runtime. It would be easy to have separate environments with the one-version rule. It would be easy to have local environments. All within the existing Rubygems structure.

The solution I promised

So how do we solve this problem? We need to agree to deprecate everything but the following structure for Rubygems:

Given a gem foo, there should be a foo.rb at the top-level, and optionally, a foo directory underneath. No other files or directories are allowed

Update:What I meant here was lib/foo.rb and lib/foo/..., which will be the directory that gets added to the load path. As a result, the vast majority of existing gems would not need to change.

Other solutions that work with Rubygems but use a single shared directory structure assume well-behaved gems only. If we could enforce well-behaved gems, we would both have an excellent solution in Rubygems proper, and make it easier for people to build additional solutions and plugins around the gem format.

So here's my proposal: For the next version of Rubygems, print a warning if installing a gem that does not comply. Over the next few months, get the few existing gem authors who have non-complying gems to release new versions that comply.

At the same time, I will release a gem plugin that provides virtual environments and local environments for Rubygems (I have already been working on this). It will support the one-version rule, named virtual environments, a gem manifest for applications, and gem resolution (thanks to the hard work by Tim Carey-Smith on gem_resolver).

In the interim, we have a slightly clunky solution that will work well. Instead of putting all gems into a single load-path and using that, we leave the current structure (each gem has its own space). Then, when a gem is installed into an environment, we preresolve all load-paths, and keep a list of them. When you switch into an environment, we add those load-paths to the default set of Ruby load-paths, which will behave exactly the same, but still support misbehaving gems.

In the long-term, all gems will be able to live side-by-side in a single load-path, which will allow us to create a cleaner version of the virtual environments (and will improve startup times, especially on JRuby and Google App Engine, but won't have any user-facing implications).

So, are we up for finally getting our gem packaging format under control?

P.S. I am aware that rip was just announced, and is attempting to do a lot of the same things. This blog post has been a long time coming (the ideas were hatched a year ago, and many are available today as part of Merb). What I'd like to do here is take the good ideas that exist in Merb, rip, and the Python community and make them native to Rubygems, addressing the problems I outlined above that are inherent to the transition. It's perfectly fine for rip to simply require well-formed gems, but a solution that gets us from here to there as a community is important.

RubyGems: Problems and (proposed) Solutions

The solution I promised

Enjoy this Post?