Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; he spends his daytime hours at the startup he founded, Tilde Inc. Yehuda is co-author of the best-selling jQuery in Action and Rails 3 in Action. He spends most of his time hacking on open source (his main projects include Thor, Handlebars and Janus) or traveling the world doing evangelism work. He can be found on Twitter as @wycats and on GitHub.

RubyGems: Problems and (proposed) Solutions

There’s been a fair bit of discussion around RubyGems lately, and some suggestions about what the core problems with RubyGems are.

People have the general sense that there’s something wrong with dependencies, and that it might have something to do with multiple versions being installed in one repository. It also seems (to people) that having require do magical things is Bad(tm). And in general, people like knowing exactly what versions of things are being loaded.

To some degree, all of these concerns are valid, and led to the rather hackish solution that we distributed with Merb called merb.thor. What we did:

  • Created a manifest for your application that described the gems and versions you wanted to use. That same manifest was used at runtime to load those gems.
  • Created a virtual environment just for your application, with the one-version-per-environment rule. This meant that it was always possible to see what versions and gems were being used.
  • Made it reasonably easy to update the local environment when the manifest changed, *without* requiring knowledge of the dependencies and versions of either the old or new gems.
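To make the manifest and the one-version rule concrete, here is a minimal sketch. It is not the merb.thor code; the manifest format, the gem versions, and the helper names are all made up for illustration.

```ruby
# Hypothetical manifest: gem name => the exact version the application wants.
MANIFEST = {
  "merb-core" => "1.0.12",
  "rack"      => "1.0.0",
}.freeze

# `installed` is assumed to be a list of [name, version] pairs describing
# what currently sits in the application's environment.
# Returns the names of gems that break the one-version-per-environment rule.
def one_version_violations(installed)
  installed.group_by(&:first)
           .select { |_name, entries| entries.map(&:last).uniq.size > 1 }
           .keys
end

# Returns the names of gems the manifest asks for that the environment
# does not contain at the requested version.
def missing_from_environment(manifest, installed)
  manifest.reject { |name, version| installed.include?([name, version]) }.keys
end
```

With checks like these, updating an environment comes down to comparing the manifest with what is installed, without anyone having to remember the old versions.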

What we did not do:

  • Put all the gems in a single directory, so normal Ruby require would work.

At first glance, this seems like a very good idea. Instead of relying on magical runtime load-path manipulation, just take, for instance, the merb-core gem, and stick it in a top-level directory. Then add that directory to the load path and you don’t need Rubygems at runtime.

The problem with this fabulous idea is that there isn’t a consistent way that people use Rubygems. Consider the following scenario:

A gem called “bad-behavior” has a lib loadpath, but puts server.rb, initializer.rb, and omg.rb at the top-level. In omg.rb, the gem does Dir["#{File.dirname(__FILE__)}/*"].each {|f| require f }. This works fine when the gem actually owns the entire directory. But if you drop the gem into a larger file structure (similar to how other package managers handle the problem), its top-level is now everyone else’s top-level.

Another scenario: A gem called rack-silliness that puts its files in rack/*, and then calls Dir["#{File.dirname(__FILE__)}/*"].each {|f| require f } from rack/silliness.rb. Again, this works fine if the gem owns the entire directory, but if multiple gems put things in rack/*, moving everything to a shared structure will fail.
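Both scenarios come down to the same thing: a glob over "my directory" picks up whatever happens to be there. The sketch below simulates this with empty files in a temporary directory; the file other_gem.rb and the helper names are invented for the demonstration.

```ruby
require "tmpdir"

# The three files the "bad-behavior" gem ships at its top level.
GEM_FILES = %w[initializer.rb omg.rb server.rb].freeze

# What a Dir["#{dir}/*"] style glob would hand to `require`.
def swept_up_by_glob(dir)
  Dir["#{dir}/*"].map { |f| File.basename(f) }.sort
end

# Builds two layouts: one where the gem owns its directory, and one where
# it shares a directory with a file from some other (hypothetical) gem.
def compare_layouts
  Dir.mktmpdir do |root|
    private_dir = File.join(root, "bad-behavior")
    shared_dir  = File.join(root, "shared")
    [private_dir, shared_dir].each do |dir|
      Dir.mkdir(dir)
      GEM_FILES.each { |name| File.write(File.join(dir, name), "") }
    end
    File.write(File.join(shared_dir, "other_gem.rb"), "")
    { private: swept_up_by_glob(private_dir), shared: swept_up_by_glob(shared_dir) }
  end
end
```

In the private layout the glob returns only the gem's own three files. In the shared layout it also returns other_gem.rb, which the gem would then require as if it were its own.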

With all that said, if we *could* use a shared structure, things would automatically fall into place. We wouldn’t need rubygems at runtime. It would be easy to have separate environments with the one-version rule. It would be easy to have local environments. *All within the existing Rubygems structure*.

The solution I promised

So how do we solve this problem? We need to agree to deprecate everything but the following structure for Rubygems:

Given a gem foo, there should be a foo.rb at the top-level, and optionally, a foo directory underneath. No other files or directories are allowed.

Update: What I meant here was lib/foo.rb and lib/foo/…, which will be the directory that gets added to the load path. As a result, the vast majority of existing gems would not need to change.
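A rule this simple is easy to check mechanically. The following is a rough sketch of such a check, not anything Rubygems ships; the method names are mine, and it assumes you already know the gem's name and the directory it adds to the load path.

```ruby
# Returns the entries in lib_dir that the proposed layout would not allow
# for a gem called gem_name: anything other than "<name>.rb" and "<name>/".
def layout_violations(lib_dir, gem_name)
  allowed = ["#{gem_name}.rb", gem_name]
  Dir.children(lib_dir).sort - allowed
end

# A gem complies if <name>.rb exists as a file and nothing else besides
# the optional <name>/ directory sits next to it.
def compliant?(lib_dir, gem_name)
  File.file?(File.join(lib_dir, "#{gem_name}.rb")) &&
    layout_violations(lib_dir, gem_name).empty?
end
```

An installer could call something like compliant? on each load-path directory and print a warning listing whatever layout_violations returns.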

Other solutions that work with Rubygems but use a single shared directory structure *assume* well-behaved gems only. If we could enforce well-behaved gems, we would both have an excellent solution in Rubygems proper, and make it easier for people to build additional solutions and plugins around the gem format.

So here’s my proposal: For the next version of Rubygems, print a warning if installing a gem that does not comply. Over the next few months, get the few existing gem authors who have non-complying gems to release new versions that comply.

At the same time, I will release a gem plugin that provides virtual environments and local environments for Rubygems (I have already been working on this). It will support the one-version rule, named virtual environments, a gem manifest for applications, and gem resolution (thanks to the hard work by Tim Carey-Smith on gem_resolver).

In the interim, we have a slightly clunky solution that will work well. Instead of putting all gems into a single load-path and using that, we leave the current structure (each gem has its own space). Then, when a gem is installed into an environment, we preresolve all load-paths, and keep a list of them. When you switch into an environment, we add those load-paths to the default set of Ruby load-paths, which will behave exactly the same, but still support misbehaving gems.
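As a sketch of this interim approach (not the plugin's actual code), the two steps might look like the following. It assumes every gem's load path is its lib directory; real gems declare their require paths in the gemspec, so a real implementation would read them from there. The method names are hypothetical.

```ruby
# Install time: turn the gem directories in an environment into the list
# of load paths to remember. Assumes each gem exposes only "lib".
def preresolve_load_paths(gem_dirs)
  gem_dirs.map { |dir| File.join(dir, "lib") }
          .select { |lib| File.directory?(lib) }
end

# Switch time: put the remembered paths in front of the existing load
# path, keeping their order and skipping any that are already present.
def activate_environment(load_paths, target = $LOAD_PATH)
  (load_paths - target).reverse_each { |path| target.unshift(path) }
  target
end
```

Since each gem keeps its own directory, a misbehaving gem's glob still only sees its own files, and plain require works without Rubygems resolving anything at runtime.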

In the long-term, all gems will be able to live side-by-side in a single load-path, which will allow us to create a cleaner version of the virtual environments (and will improve startup times, especially on JRuby and Google App Engine, but won’t have any user-facing implications).

So, are we up for finally getting our gem packaging format under control?

P.S. I am aware that rip was just announced, and is attempting to do a lot of the same things. This blog post has been a long time coming (the ideas were hatched a year ago, and many are available today as part of Merb). What I’d like to do here is take the good ideas that exist in Merb, rip, and the Python community and make them native to Rubygems, addressing the problems I outlined above that are inherent to the transition. It’s perfectly fine for rip to simply require well-formed gems, but a solution that gets us from here to there as a community is important.

9 Responses to “RubyGems: Problems and (proposed) Solutions”

So what you’re proposing is essentially to push the namespacing problem onto the gem producer, whereas current rubygems makes this the problem of the consumer. The proposed mechanism will also mean conflating gem names with the actual require paths, which is a fairly big change, and perhaps sub-optimal.

One effect would be requires will need to contain the user name for github gem distribution. Another would be gems that provide files at something other than a top-level namespace would need to be flattened. Currently I have installed win32-process, and win32-clipboard, which provide for requiring ‘win32/process’ and ‘win32/clipboard’ respectively.

In which case, if we’re going to have to change the client code require paths, we can simply make any given gem minimally “compliant” by unpacking into a directory based on its gem name and creating the stub ruby file at the top level which loads the gem within. Alternatively, if you want to avoid overriding require, just put them all in the loadpath – $:.unshift(*Dir[File.dirname(__FILE__) + '/gems/*/lib']).

Another thing to consider is auxiliary files like tests, misc data files etc. By putting them in the same foo/ directory, they would be available within the load path, which seems a bit inelegant.

I think it could be more interesting to see more work on the issue of loading multiple versions of a library, or more generally the ability to require a gem (or whatever) into its own isolated namespace (similar to python’s ability to make namespacing the job of a module consumer). I’ve done some work on this exploiting the anonymous module option of Kernel#load, to provide what is essentially, require ‘mycode’, :into => MyModule.

@charles

I don’t really see this as pushing the problem to the producer… in fact, I think it simplifies the choices of the producer, making it easier to make standard gems.

It is true that multiple gems that install into win32/process and win32/clipboard would become win32-process and win32-clipboard. However, the current situation creates potential conflicts, because no top-level name is owned by anyone. As a result, if I write a gem called win32-pid and put something in win32/process, I might stomp over the win32-process gem. Stuff like this happens in the wild.

Auxiliary files would not need to be contained in the lib directory. I was perhaps a bit imprecise. What I meant to say was that only a single directory could be added to the load path, and that directory must contain only the foo directory and foo.rb. This is already what the vast majority of gems do, since it’s a pretty reasonable practice even without enforcing.

I’ve done some work on loading multiple versions of a library. It’s quite possible for many situations, but not all, because it’s possible for loading a library to manipulate global state or run code in the global context. If gems would agree to put all running code inside a single namespace, I have code that would enable loading a gem into a different namespace (but again, this sort of thing would not work for many existing gems because of how dynamic Ruby is).
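For readers who haven't seen the Kernel#load option mentioned above: passing true as the second argument runs the file inside an anonymous module. The sketch below is not the code either of us wrote; load_isolated is a made-up helper that writes the source to a temporary file so the example is self-contained.

```ruby
require "tempfile"

# Writes `source` to a temporary .rb file and loads it wrapped in an
# anonymous module (the second argument to Kernel#load).
def load_isolated(source)
  Tempfile.create(["isolated", ".rb"]) do |file|
    file.write(source)
    file.flush
    load(file.path, true)
  end
end

# The class defined here lands inside the anonymous module, so it does not
# become a top-level constant. The global variable, however, is still set
# for the whole process, which is the kind of leak described above.
load_isolated(<<~RUBY)
  class IsolatedDemoThing; end
  $isolated_demo_ran = true
RUBY
```

After this runs, IsolatedDemoThing is not defined at the top level, but $isolated_demo_ran is visible everywhere.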

A bit OT but I’d like to expand gem dependencies talk to include platform dependencies.

If we are looking for low hanging fruit that would make lives a lot easier could we do something more with required_ruby_version to better express Ruby version compatibility from authors? Perhaps make it a requirement or use a less inclusive default like the ~> ruby version at the time of gem build?

I’d love gem list, search etc. to show me only gems expected to work on my platform, but I (like most authors?) don’t even set the gemspec property because of the (current?) > 0 default I think. This would be a good first step before attempting a gem install … –test

Obviously this has come about because of Ruby 1.9 but it applies equally to people running any version of Ruby on any platform.
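As a small illustration of how such a check could work, here is a sketch using the Gem::Requirement and Gem::Version classes that Rubygems already provides. The helper name ruby_supported? and the example constraints are made up.

```ruby
require "rubygems"

# True if the given Ruby version satisfies a gem's declared constraint.
# `requirement` is a constraint string as you would give it to
# required_ruby_version, e.g. ">= 0" or "~> 1.9.1".
def ruby_supported?(requirement, ruby_version = RUBY_VERSION)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(ruby_version))
end
```

A fully permissive constraint such as ">= 0" matches every Ruby, which is why it tells a tool like gem list nothing. A pessimistic constraint such as "~> 1.9.1" would let the same tool hide the gem from someone on 1.8.7.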

I couldn’t agree more with the warnings for bad convention usage.

Sometimes there are valid reasons for doing this, but clearly it’s rare. When there are, it’s just one or two more files in the top level, for require (more on this later).

The pain point here for a lot of folks I think is testing and managing the load path. Under good advice, they’re told not to modify the load path, but this kind of layout can become a real pain to run libs locally or do exploratory testing work from the repo.

Gems installed that run an extconf.rb or other external builder should be made platform specific before installation. In other words, especially in ~/.gem, they should be installed as -x86* or -java* if they build on those platforms. These gems should also be explicitly interpreter version specific.

The manifest problem reaches a bit farther, however. It’s not just lib dir layout that needs a hike in order to make memory usage sane (on MRI), it’s also adding some new solution for plugin discovery. At present we’re at the crossroads, where rubygems is now running through a lot of code (on YARV) to avoid loading the gemspecs (manifests). This is great, and works well, until one starts searching for plugins. All of a sudden we’re splatting many a string all over ram, just to find a match.

It makes me almost tempted to go down the amalgamite route and to start hiding away parts of the implementation in order to avoid loading the specs. Amusingly it would have made Tim’s job a lot easier too, being that we could have used a simple cached nested set arrangement in a db, and just selected from it.

Just more food for thought.

My wish for gems…
in no order…

some naming convention–if your gem is named ParseTree, you require ‘parse_tree’ or what not. mime-types you are killing me!

different dependencies based on platform.

ability to have different binaries [i.e. so if I'm in 1.9 and do an install with a binary gem, it won't install a 1.8 binary].

ability to install from a url

searching for gems not be case sensitive.

Sorry they’re not related to your post just wanted to get that off my chest.
-=r

@roger actually, your first wish is handled by my proposal. I’m suggesting that if your gem is named ParseTree, you are only allowed to have ParseTree.rb.

Re: one folder within lib/ – this would prevent the rubygems + hoe plugin mechanism which require a lib/hoe/my_plugin.rb structure. I’ll re-read what you propose again, nonetheless.

Having outlined some existing conflicts, there’s no real need for “plugins” to be within the lib folder of a gem. Why not a “plugins/hoe/my_plugin.rb” structure, for example.

FYI Ryan Davis’ summary of hoe 2’s plugins is at http://blog.zenspider.com/2009/06/hoe-2-electric-boogaloo.html

“What I’d like to do here is take the good ideas that exist in Merb, rip, and the Python community and make them native to Rubygems, addressing the problems I outlined above that are inherent to the transition.”

It wouldn’t do RubyGems any harm if you could also take a page (or two!) out of Perl & CPAN’s book.

/I3az/

One of the things I really liked about the CPAN installation process was that the tests were run for each package that you attempt to install. When there are failures you know up-front rather than later on when you try to use the package. It would be nice if there was a way with either rubygems or rip for the author to specify if you want the tests/specs to be executed prior to installing the package.
