Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc.. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

Archive for December, 2010

Clarifying the Roles of the .gemspec and Gemfile

TL;DR

Although apps and gems look like they share the concept of “dependency”, there are some important differences between them. Gems depend on a name and version range, and intentionally don’t care about where exactly the dependencies come from. Apps have more controlled deployments, and need a guarantee that the exact same code is used on all machines (dev, ci and production).

When developing a gem, use the gemspec method in your Gemfile to avoid duplication. In general, a gem’s Gemfile should contain the Rubygems source and a single gemspec line. Do not check your Gemfile.lock into version control, since it enforces precision that does not exist in the gem command, which is used to install gems in practice. Even if the precision could be enforced, you wouldn’t want it, since it would prevent people from using your library with versions of its dependencies that are different from the ones you used to develop the gem.

When developing an app, check in your Gemfile.lock, since you will use the bundler tool across all machines, and the precision enforced by bundler is extremely desirable for applications.


Since my last post on this topic, a lot of people have asked follow-up questions about the specific roles of the Gemfile, used by bundler, and the .gemspec, used by Rubygems. In short, people seem to feel that there is duplication between the two files during gem development, and end up using yet another tool to reduce the duplication.

Ruby Libraries

Ruby libraries are typically packaged up using the gem format, and distributed using rubygems.org. A gem contains a number of useful pieces of information (this list is not exhaustive):

  • A bunch of metadata, like name, version, description, summary, email address, etc.
  • A list of all the files contained in the package.
  • A list of executables that come with the gem and their location in the gem (usually bin)
  • A list of directories that should be put on the Ruby load path (usually lib)
  • A list of other Ruby libraries that this library needs in order to function (dependencies)

Only the last item, the list of dependencies, overlaps with the purpose of the Gemfile. When a gem lists a dependency, it lists a particular name and a range of version numbers. Importantly, it does not care where the dependency comes from. This makes it easy to internally mirror a Rubygems repository or use a hacked version of a dependency that you install to your system. In short, a Rubygems dependency is a symbolic name. It is not a link to a particular location on the Internet where the code can be found.

Among other things, this characteristic is responsible for the resilience of the overall system.

Also, gem authors choose a range of acceptable versions of their dependencies, and have to operate under the assumption that the versions of the dependencies that the author tested against may change before the gem is used in deployment. This is especially true about dependencies of dependencies.

Ruby Applications

A Ruby application (for instance, a Rails application), has a complex set of requirements that are generally tested against a precise set of third-party code. While the application author may not have cared much (up front), which version of nokogiri the application used, he does care a lot that the version used in development will remain the version used in production. Since the application author controls the deployment environment (unlike the gem author), he does not need to worry about the way that a third-party will use the code. He can insist on precision.

As a result, the application author will want to specify the precise gem servers that the deployment environment should use. This is in contrast to the gem author, who should prefer long-term resilience and mirroring to the brittleness that comes with a short-term guarantee.

The application author often also wants a way to tweak shared community gems for the particular project at hand (or just use the “edge” version of a gem which hasn’t yet been released). Bundler handles this problem by allowing application authors to override a particular gem and version with a git repository or a directory. Because the gem author has specified the gem’s dependencies as a name and version range, the application developer can choose to satisfy that dependency from an alternate source (in this case, git).

Now, the application developer must be able to demand that the gem comes from that particular source. As you can see, there are critical differences between the desires of the gem developer (which are focused around longevity and flexibility) and the app developer (which are focused around absolute guarantees that all of the code, including third-party code, remains precisely the same between environments).

Rubygems and Bundler

The Rubygems libraries, CLI and gemspec API are built around the needs of the gem author and the Rubygems ecosystem. In particular, the gemspec is a standard format for describing all of the information that gets packed with gems then deployed to rubygems.org. For the reasons described above, gems do not store any transient information about where to find dependencies.

On the other hand, bundler is built around the needs of the app developer. The Gemfile does not contain metadata, a list of files in the package, executables exposed by the package, or directories that should be put on the load path. Those things are outside the scope of bundler, which focuses on making it easy to guarantee that all of the same code (including third-party code) is used across machines. It does, however, contain a list of dependencies, including information about where to find them.

Git “Gems”

For the reasons described above, it can sometimes be useful for application developers to declare a dependency on a gem which has not yet been released. When bundler fetches that gem, it uses the .gemspec file found in the git repository’s root to extract its metadata as well as discover additional dependencies declared by the in-development gem. This means that git gems can seamlessly be used by bundler, because their .gemspec allows them to look exactly like normal gems (including, importantly, having more dependencies of its own). The only difference is that the application developer has told bundler exactly where to look for the gem in question.

Gem Development

Active development on gems falls in between these two views of the world, for two reasons:

  • Active development on a gem often involves dependencies that have not yet been released to rubygems.org, so it becomes important to specify the precise location to find those dependencies, during development
  • During gem development, you cannot rely on rubygems to set up an environment with everything on the load path. Instead, you need a hybrid environment, with one gem coming from the file system (the current gem under development), some gems possibly coming from git (where you depend on a gem that hasn’t yet been released), and yet additional gems coming from traditional sources. This is the situation that Rails development is in.

Because gems under active development will eventually become regular gems, they will need to declare all of the regular metadata needed by gem build. This information should be declared in a .gemspec, since the Gemfile is an API for declaring dependencies only.

In order to make it easy to use the bundler tool for gem development without duplicating the list of dependencies in both the .gemspec and the Gemfile, bundler has a single directive that you can use in your Gemfile (documented on the bundler site):

source "http://www.rubygems.org"
 
gemspec

The gemspec line tells bundler that it can fine a .gemspec file alongside the Gemfile. When you run bundle install, bundler will find the .gemspec and treat the local directory as a local, unpacked gem. It will find and resolve the dependencies listed in the .gemspec. Bundler’s runtime will add the load paths listed in the .gemspec to the load path, as well as the load paths of any of the gems listed as dependencies (and so on). So you get to use bundler while developing a gem with no duplication.

If you find that you need to develop against a gem that hasn’t yet been released (for instance, Rails often develops against unreleased Rack, Arel, or Mail gems), you can add individual lines in the Gemfile that tell bundler where to find the gems. It will still use the dependencies listed in the .gemspec for dependency resolution, but it now knows exactly where to find the dependency during gem development.

source "http://www.rubygems.org"
 
gemspec
 
# if the .gemspec in this git repo doesn't match the version required by this
# gem's .gemspec, bundler will print an error
gem "rack", :git => "git://github.com/rack/rack.git"

You wouldn’t want to include this information in the .gemspec, which will eventually be released to Rubygems after the in-development gem has also been released. Again, this makes the overall system more resilient, as it doesn’t depend on transient external URLs. This information is purely something used to set up a complete environment during development, which requires some precision.

This is also why we recommend that people do not check in their Gemfile.lock files in gem development. That file is emitted by bundler, and guarantees that all gems, including dependencies of dependencies, remain the same. However, when doing gem development, you want to know immediately when some change in the overall ecosystem breaks your setup. While you may want to insist on using a particular gem from its git location, you do not want to hardcode your development to a very specific set of gems, only to find out later that a gem released after you ran bundle install, but compatible with your version range, doesn’t work with your code.

It’s not a way to guarantee that your code works on all versions that you specified in your .gemspec, but the incentive to lock down the exact versions of each gem simply doesn’t exist when the users of your code will get it via gem install.

Update

Just to clarify, you should check in your Gemfile.lock in applications, and not in gems. The Gemfile.lock remembers the exact versions and sources of every piece of third-party code that you use. For applications, you want this. When developing a gem, this can obscure issues that will occur because gems are deployed (using the gem command) without the benefit of bundler.

Archives

Categories

Meta