Yehuda Katz is a member of the Ruby on Rails core team, and lead developer of the Merb project. He is a member of the jQuery Core Team, and a core contributor to DataMapper. He contributes to many open source projects, like Rubinius and Johnson, and works on some he created himself, like Thor.
The Web Doesn’t Suck. Browsers Are Innovating.
April 30th, 2010
This week saw a flurry of back-and-forth about the future of the web platform. In the “web sucks” camp were Sachin Agarwal of Posterous (The Web Sucks. Browsers need to innovate) and Joe Hewitt (Sachin summarized some of his tweets at @joehewitt agrees with me).
Chris Blizzard responded with a few narrow examples of what Firefox has been doing lately (geolocation, orientation, WebGL).
On Hacker News, most of the comments took Sachin to task for his argument that standards don’t matter, pointing people back to the “bad old days” of the browser wars.
In my opinion, both camps kind of miss the point.
Sachin made some very pointed factual claims which are just complete, pure hokum. Close to the top of his post, he says:
Web applications don’t have threading, GPU acceleration, drag and drop, copy and paste of rich media, true offline access, or persistence. Are you kidding me?
Really? Are you kidding me. In fact, browsers have implemented, and the WHAT-WG has been busily standardizing, all of these things for quite a number of years now. Threading? Check. GPU acceleration? Check and check. Drag and drop? Check. Offline access and persistence? Check and check.
Almost universally, these features, which exist in shipping browsers today, were based on experiments conducted years ago by browsers (notably Firefox and Safari, more recently Chrome), and did not wait for approval from the standards bodies to begin implementing.
In fact, large swaths of HTML5 are based on proposals from browsers, who have either already implemented the feature (CSS-based animations and transitions) or who then build the feature without waiting for final approval. And this is transparently obvious to anyone, since HTML5 has not yet been approved, and some parts are the subject of internal W3C debate, yet the browsers are implementing the parts they like and innovating in other areas.
You can see this explicitly in the features prefixed with -moz or -webkit, because they’re going ahead and implementing features that have not gotten consensus yet.
In 2010, the WHAT-WG is a functioning place to bring a proposal for further review once you’re conducting some experiments or have a working implementation. Nobody is sitting on their hands waiting for final approval of HTML5. Don’t confuse the fact that people are submitting their ideas to the standards bodies with the misguided idea that only approved standards are being implemented.
And in fact, this is basically never how it’s worked. Apple implemented the <canvas> tag and <input type="search" /> in 2004 (here’s a 2004 rant from the editor of the then-brand-new HTML5 spec). Opera and Apple worked on the <video> tag in 2007. The new CSS flexible box model is based on XUL, which Firefox has had for over a decade. HTML5 drag and drop support is based on implementations shipped by the IE team over a decade ago. Firefox has been extending JavaScript for almost its entire history (most recently with JavaScript 1.8).
CSS transitions, transforms and animations were built by Apple for the iPhone before they were a spec, and only later ported to the desktop. Firefox did the initial experimentation on WebGL in 2006, back when they were calling it Canvas 3d (here’s a working add-on by Mozilla developer Vladimir Vukićević).
Google Gears shipped what would become the Application Cache, Web Storage, and Web Workers, proving out the concepts. It also shipped geolocation, which is now part of the HTML5 spec.
@font-face originally shipped in Internet Explorer 5.5. Apple has added new mobile events (touchstart, touchmove, touchend, onorientationchange) with accompanying JavaScript APIs (Apple developer account needed) without any spec activity that I can discern. Firefox, at the same time, added support for hardware accelerometers.
Even Microsoft bucked the standards bodies by building its own take on the W3C’s “cross-origin-resource-sharing” spec, called Cross domain request, and shipping it in IE8.
It’s perfectly understandable that people who haven’t been following along for the past few years might have missed all of this. But the fact remains that browser vendors have been moving at a very rapid clip, implementing all kinds of interesting features, and then sharing them with other browsers for feedback, and, one hopes, consensus.
Some of these implementations suck (I’ve heard some people say not-nice things about WebGL and drag-and-drop, for instance). But that’s quite a bit different from saying that browsers are sitting on their hands waiting for the W3C or WHAT-WG to tell them what to do.
Named Gem Environments and Bundler
April 21st, 2010
In the beginning, Rubygems made a decision to allow multiple versions of individual gems in the system repository of gems. This allowed people to use whatever versions of gems they needed for individual scripts, without having to partition the gems for specific purposes.
This was a nice starting place. Being able to just install whatever gems into one place and have scripts just work, without having to partition your gems into buckets made Rubygems an extremely pleasant tool to work with. In my opinion, it was a good decision.
As the ecosystem has matured, and more people build gems with dependencies (especially loose dependencies, like "rack", ">= 1.0"), this original sin has introduced the dreaded activation error:
can't activate rspec-rails (= 1.3.2, runtime) for [], already activated rspec-rails-2.0.0.beta.6
This error occurs because of the linear way that gems are “activated”. For instance, consider the following scenario:
    # On your system:
    thin (1.2.7)
      - daemons (>= 1.0.9)
      - eventmachine (>= 0.12.6)
      - rack (>= 1.0.0)
    actionpack (2.3.5)
      - activesupport (= 2.3.5)
      - rack (~> 1.0.0)
    rack (1.0.0)
    rack (1.1.0)
    activesupport (2.3.5)
    daemons (1.0.9)
    eventmachine (0.12.6)
Quickly glancing at these dependencies, you can see that these two gems are “compatible”. The gems have the rack dependency in common, and the >= 1.0.0 is compatible with ~> 1.0.0.
However, there are two ways to require these gems. First, you could require actionpack first.
    require "action_pack"
    # will result in "activating" ActiveSupport 2.3.5 and Rack 1.0.0
    require "thin"
    # will notice that Rack 1.0.0 is already activated, and satisfies
    # the >= 1.0.0 requirement, so just activates daemons and eventmachine
Second, you could require thin first.
    require "thin"
    # will result in "activating" Rack 1.1.0, Daemons 1.0.9,
    # and EventMachine 0.12.6
    require "action_pack"
    # will notice that Rack 1.1.0 is already activated, but that
    # it is incompatible with ~> 1.0.0. Therefore, it will emit:
    # ---
    # can't activate rack (~> 1.0.0, runtime) for ["actionpack-2.3.5"],
    # already activated rack-1.1.0 for ["thin-1.2.7"]
In this case, because thin pulled in Rack 1.1, it becomes impossible to load in actionpack, despite the fact that a potentially valid combination exists.
This problem is fundamental to the approach of supporting different versions of the same gem in the system and activating gems linearly. In other words, because no single entity ever has the opportunity to examine the entire list of dependencies, the onus is on the user to make sure that the gem requires are ordered correctly. Sometimes, it means that the user must explicitly require the right version of a child dependency just to make sure that the right versions are loaded.
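To make the order dependence concrete, here is a minimal sketch that simulates linear activation against the gems listed above. It is not how Rubygems is implemented; `INSTALLED`, `GEM_DEPENDENCIES`, `activate` and `require_in_order` are hypothetical names invented for the illustration. Only `Gem::Version` and `Gem::Requirement` come from Rubygems itself.

```ruby
require "rubygems"

# Hypothetical in-memory stand-in for the system gem repository.
INSTALLED = {
  "rack"          => %w[1.0.0 1.1.0],
  "activesupport" => %w[2.3.5],
  "daemons"       => %w[1.0.9],
  "eventmachine"  => %w[0.12.6],
  "thin"          => %w[1.2.7],
  "actionpack"    => %w[2.3.5]
}

GEM_DEPENDENCIES = {
  "thin"       => { "daemons" => ">= 1.0.9", "eventmachine" => ">= 0.12.6", "rack" => ">= 1.0.0" },
  "actionpack" => { "activesupport" => "= 2.3.5", "rack" => "~> 1.0.0" }
}

class ActivationError < StandardError; end

# Activates one gem the "linear" way: if a version is already active, it must
# satisfy the requirement; otherwise pick the newest installed match.
def activate(name, requirement, activated)
  req = Gem::Requirement.new(requirement)

  if activated.key?(name)
    return if req.satisfied_by?(activated[name])
    raise ActivationError,
          "can't activate #{name} (#{requirement}), already activated #{name}-#{activated[name]}"
  end

  versions = INSTALLED.fetch(name).map { |v| Gem::Version.new(v) }
  version  = versions.select { |v| req.satisfied_by?(v) }.max
  raise ActivationError, "no installed #{name} matches #{requirement}" unless version

  activated[name] = version
  GEM_DEPENDENCIES.fetch(name, {}).each { |dep, dep_req| activate(dep, dep_req, activated) }
end

# Simulates a sequence of top-level requires and returns what got activated.
def require_in_order(*names)
  activated = {}
  names.each { |name| activate(name, ">= 0", activated) }
  activated
end

puts require_in_order("actionpack", "thin")["rack"]  # actionpack first: works

begin
  require_in_order("thin", "actionpack")             # thin first: conflict
rescue ActivationError => e
  puts e.message
end
```

The same set of gems succeeds or fails purely based on the order of the two calls, because each activation only sees what has already been activated.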
There are two possible solutions to this problem.
Multiple Named Environments
One solution to this problem is to catch potential conflicts at installation time and ask the user to manage multiple named environments, each with a fully consistent, non-conflicting view of the world.
The best way to implement this looks something like this (I’ll use a fictitious gemenv command to illustrate):
    $ gemenv install thin
    - Installing daemons (1.0.10)
    - Installing eventmachine (0.12.10)
    - Installing rack (1.1.0)
    - Installing thin (1.2.7)

    $ gemenv install actionpack -v 2.3.5
    - Uninstalling rack (1.1.0)
    - Installing rack (1.0.1)
    - Installing actionpack (2.3.5)

    $ gemenv install rack -v 1.1.0
    - rack (1.1.0) conflicts with actionpack (2.3.5)
    - if you want rack (1.1.0) and actionpack (2.3.5)
      you will need a new environment.

    $ gemenv create rails3
    $ gemenv install actionpack -v 3.0.0.beta3
    - Installing abstract (1.0.0)
    - Installing builder (2.1.2)
    - Installing i18n (0.3.7)
    - Installing memcache-client (1.8.2)
    - Installing tzinfo (0.3.20)
    - Installing activesupport (3.0.0.beta3)
    - Installing activemodel (3.0.0.beta3)
    - Installing erubis (2.6.5)
    - Installing rack (1.1.0)
    - Installing rack-mount (0.6.3)
    - Installing rack-test (0.5.3)
    - Installing actionpack (3.0.0.beta3)

    $ gemenv use default
    $ ruby -e "puts Rack::VERSION"
    1.0.1

    $ gemenv use rails3
    $ ruby -e "puts Rack::VERSION"
    1.1.0
Essentially, the single entity with full knowledge of all dependencies is the installer, and the user creates as many environments as he or she needs for the various non-conflicting sets of gems in use.
This works nicely, because it guarantees that once using an environment, all gems available are compatible. Note that the above command is fictitious, but it bears similarity to rip.
Virtual, Anonymous Environments
Another solution, the one bundler uses, is to allow multiple, conflicting versions of gems to exist in the system repository of packages, but to ensure a lack of conflicts by resolving the dependencies used by individual applications.
First, install conflicting gems. In this case, Rails 2.3 and Rails 3.0 require different, incompatible versions of Rack (1.0.x and 1.1.x).
    $ gem install rails
    ... output ...
    $ gem install rails -v 3.0.0.beta3
    ... output ...
Next, specify which version of Rails to use in your application’s Gemfile:
gem "rails", "2.3.5"
When running script/server in an app using Bundler, Bundler can determine that Rails needs Rack 1.0.x, and pulls in Rack 1.0.1 from the system. If you had specified gem "rails", "3.0.0.beta3", bundler would have pulled in Rack 1.1.0 from the system.
In essence, we have the same kind of isolation as the fictitious command above, but instead of manually managing named environments, Bundler creates virtual isolated environments based on the list of gems used in your application.
Why Did We Use Virtual, Anonymous Environments?
When considering the tradeoffs between these two solutions, we realized that applications already typically have a list (executable or not) of their dependencies. The gem install command already works great for installing dependencies, and introducing a dependency resolution step there feels awkward and out of place.
Additionally, as an application evolves, it’s natural to continue updating its list of gems, keeping a record of the changes as you go. You could keep a separate named environment for each application, but you’d probably want to keep a list of the dependencies in the application anyway so that it’s possible to get up and running on a second machine.
In short, since a list of an application’s dependencies makes sense anyway, why burden the end-user with the need to maintain separate named environments and manually install gems?
And once we’re doing this, why not build a toolchain around installing gems and keeping records of not only the top-level installed gems, but the exact versions of all gems used at a given time? Why not build in support for “edge” gems on your local machine or in git repositories? Why not create workflows for sharing your application with other developers and deploying your application?
As application and gem developers ourselves, we wanted a tool that managed an application’s dependencies across its entire lifecycle, in the context of the application itself.
That said, there is absolutely room for experimentation in this space. Tools that enforce consistency at install time might be extremely appropriate for managing the gems that you use in scripts, but which you don’t share with other machines. Context matters.
Postscript
We commonly hear something to the effect of “but why add all this complexity? Without all these features, bundler could be so much smaller!”. The truth is that the bundler code itself is under 2,000 lines of code, and hasn’t grown a whole lot in the past few major revisions.
The new rpg tool, recently released by the venerable Ryan Tomayko, is also in that range. The original rip (currently in the “classic” branch) was also in that ballpark. The new rip (the current master branch), may well demonstrate that a fully-featured dependency management system for Ruby can be written in many fewer lines of code. If so, I’m excited to see the abstractions that they use to make it so.
But the bottom line is that all of the new package management solutions have done a good job of packing features into a small number of new lines of code, bundler included. By starting with good abstractions, it’s often possible to add rich new features without having to add a whole lot of new code (or even by deleting code!). We’ve certainly found that in our journey with bundler.
Ruby Require Order Problems
April 17th, 2010
Bundler has inadvertently exposed a number of require order issues in existing gems. I figured I’d take the opportunity to talk about them. There are basically two kinds of gem ordering issues:
Missing Requires
Imagine a gem that uses nokogiri, but never requires it. Instead, it assumes that something that is required before it will do the requiring. This happens relatively often with Rails plugins (which were previously able to assume that they were loaded at a particular time and place).
A well-known example of this problem is gems that didn’t require “yaml” because Rubygems required it. When Rubygems removed its require in 1.3.6, a number of gems broke. In general, bundler assumes that gems require what they need. If gems do so, this class of ordering problems is eliminated.
Constant Definition as API
This is where a gem checked for defined?(SomeConstant) to decide whether to activate some functionality.
This is a bit tricky. Essentially, the gem is saying that the API for using it is “require some other gem before me”. Personally, I consider that to be a problematic API, because it’s very implicit, and can be hard to track down exactly who did what require.
A better solution is to provide an explicit hook for people to activate the optional functionality. For instance, Haml provides Haml.init_rails(binding), which you can run after activating Haml.
This is slightly more manual than some would like, but the API of “make sure you require the optional dependencies before me” is also manual, and more error-prone.
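The difference between the two APIs can be sketched in a few lines. `Fancy` and `Framework` are hypothetical stand-ins (for a gem like Haml and a framework like Rails); neither is a real library, and the method names are made up for the example.

```ruby
# Hypothetical gem with an optional integration for a hypothetical framework.
module Fancy
  @integrations = []

  class << self
    attr_reader :integrations

    # Implicit API: silently does nothing unless the framework happened to be
    # required before this line ran.
    def implicit_setup
      @integrations << :framework if defined?(::Framework)
    end

    # Explicit hook: the caller decides when to turn the integration on, and
    # gets a clear error if the framework really is missing.
    def init_framework
      raise "Framework is not loaded" unless defined?(::Framework)
      @integrations << :framework unless @integrations.include?(:framework)
    end
  end
end

Fancy.implicit_setup   # Framework is not loaded yet, so nothing happens

module Framework; end  # stands in for `require "framework"` happening later

Fancy.init_framework   # works regardless of which gem was required first
```

With the constant check, the wrong require order fails silently; with the hook, the user states the intent and any failure is loud.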
Even if Bundler “respected require order”, which we plan to do (in some way) in 0.10, it’s still up to the user of the gem to ensure that they listed the optional gem above the required gem. This is not ideal.
A workaround that works great in Bundler 0.9 is to simply require the order-dependent gems above Bundler.require. We do this in Rails, so that gems can test for the existence of the Rails constant to decide whether to add optional Rails dependencies.
    require "rails/all"
    Bundler.require(:default, Rails.env)
In the case of shoulda and mocha, a better solution could be:
    # Gemfile
    group :test do
      gem "shoulda"
      gem "mocha"
      # possible other gems where order doesn't matter
    end

    # application.rb
    Bundler.require(:default)
    Bundler.require(Rails.env) unless Rails.env.test?

    # test_helper.rb
    # Since the order matters, require these gems manually, in the right order
    require "shoulda"
    Bundler.require(:test)
In my opinion, if you have gems that specifically depend on other gems, it is appropriate to manually require them first before automatically requiring things using Bundler.require. You should treat Bundler.require as a shortcut for listing out a stack of requires.
Possible solutions?
One long-term solution, if we get gem metadata in Rubygems 1.4 (or optional dependencies down the line), would be to specify that a gem has optional dependencies on another gem and specify a file to run when both gems are available.
For instance, the Haml gem could say:
    # gemspec
    s.integrates_with "rails", "~> 3.0.0.beta2", "haml/rails"

    # haml/rails.rb
    require "haml"
    require "rails"
    Haml.init_rails(binding)
Bundler would handle this in Bundler.require (or possibly Bundler.setup).
If this feature existed in Rubygems, it would work via gem activation. If the Haml gem was activated after the Rails gem, it would require “haml/rails” immediately.
If the Haml gem was activated otherwise, it wouldn’t do anything until the Rails gem was activated. When the Rails gem was activated, it would require the file.
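The activation behavior described above could look roughly like the sketch below. Nothing like this exists in Rubygems; `IntegrationRegistry` and its methods are hypothetical, version constraints are left out, and instead of actually calling `require` the sketch just records which integration files it would have loaded.

```ruby
# Hypothetical registry of "integrates_with" declarations.
class IntegrationRegistry
  attr_reader :loaded

  def initialize
    @declarations = []  # [gem, other_gem, file] triples
    @activated    = []
    @loaded       = []  # files a real implementation would `require`
  end

  # Records that `gem_name` wants `file` loaded once `other_gem` is also active.
  def integrates_with(gem_name, other_gem, file)
    @declarations << [gem_name, other_gem, file]
  end

  # Called on every gem activation; loads any integration file whose two
  # gems are now both active, no matter which one arrived first.
  def activate(gem_name)
    @activated << gem_name unless @activated.include?(gem_name)

    @declarations.each do |owner, other, file|
      next unless @activated.include?(owner) && @activated.include?(other)
      @loaded << file unless @loaded.include?(file)
    end
  end
end

registry = IntegrationRegistry.new
registry.integrates_with("haml", "rails", "haml/rails")

registry.activate("haml")   # rails not active yet, nothing is loaded
registry.activate("rails")  # both active now, so "haml/rails" is loaded
p registry.loaded
```

Because the check runs on every activation, the user no longer has to care about the order in which the two gems are listed or required.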
Some of the Problems Bundler Solves
April 12th, 2010
This post does not attempt to convince you to use bundler, or compare it to alternatives. Instead, I will try to articulate some of the problems that bundler tries to solve, since people have often asked. To be clear, users of bundler should not need to understand these issues, but some might be curious.
If you’re looking for information on bundler usage, check out the official Bundler site.
Dependency Resolution
This is the problem most associated with bundler. In short, by asking you to list all of your dependencies in a single manifest, bundler can determine, up front, a valid list of all of the gems and versions needed to satisfy that manifest.
Here is a simple example of this problem:
    $ gem install thin
    Successfully installed rack-1.1.0
    Successfully installed eventmachine-0.12.10
    Successfully installed daemons-1.0.10
    Successfully installed thin-1.2.7
    4 gems installed

    $ gem install rails
    Successfully installed activesupport-2.3.5
    Successfully installed activerecord-2.3.5
    Successfully installed rack-1.0.1
    Successfully installed actionpack-2.3.5
    Successfully installed actionmailer-2.3.5
    Successfully installed activeresource-2.3.5
    Successfully installed rails-2.3.5
    7 gems installed

    $ gem dependency actionpack -v 2.3.5
    Gem actionpack-2.3.5
      activesupport (= 2.3.5, runtime)
      rack (~> 1.0.0, runtime)

    $ gem dependency thin
    Gem thin-1.2.7
      daemons (>= 1.0.9, runtime)
      eventmachine (>= 0.12.6, runtime)
      rack (>= 1.0.0, runtime)

    $ irb
    >> require "thin"
    => true
    >> require "actionpack"
    Gem::LoadError: can't activate rack (~> 1.0.0, runtime)
    for ["actionpack-2.3.5"], already activated rack-1.1.0 for ["thin-1.2.7"]
What happened here?
Thin declares that it can support any version of Rack above 1.0. ActionPack declares that it can support versions 1.0.x of Rack. When we require thin, it looks for the highest version of Rack that thin can support (1.1), and makes it available on the load path. When we require actionpack, it notes that the version of Rack already on the load path (1.1) is incompatible with actionpack (which requires 1.0.x) and throws an exception.
Thankfully, newer versions of Rubygems provide reasonable information about exactly what gem ("thin 1.2.7") put Rack 1.1.0 on the load path. Unfortunately, there is often nothing the end user can do about it.
Rails could theoretically solve this problem by loosening its Rack requirement, but that would mean that ActionPack declared compatibility with any future version of Rack, a declaration ActionPack is unwilling to make.
The user can solve this problem by carefully ordering requires, but the user is never in control of all requires, so the process of figuring out the right order to require all dependencies can get quite tricky.
It is conceptually possible in this case, but it gets extremely hard when more than a few dependencies are in play (as in Rails 3).
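By contrast, once every requirement is known up front, the order stops mattering. The sketch below is a deliberately naive illustration, not bundler's resolver: it collects every requirement placed on each gem and picks the newest version that satisfies all of them, with no backtracking, and it reads each gem's dependencies from its newest version only. `INDEX` and `resolve` are hypothetical names.

```ruby
require "rubygems"

# Hypothetical index: gem name => { version => { dependency => requirement } }.
INDEX = {
  "thin"          => { "1.2.7" => { "daemons" => ">= 1.0.9", "eventmachine" => ">= 0.12.6", "rack" => ">= 1.0.0" } },
  "actionpack"    => { "2.3.5" => { "activesupport" => "= 2.3.5", "rack" => "~> 1.0.0" } },
  "rack"          => { "1.0.1" => {}, "1.1.0" => {} },
  "activesupport" => { "2.3.5" => {} },
  "daemons"       => { "1.0.10" => {} },
  "eventmachine"  => { "0.12.10" => {} }
}

# Gathers all requirements first, then chooses versions. No backtracking.
def resolve(manifest)
  requirements = Hash.new { |hash, name| hash[name] = [] }
  expanded     = {}
  queue        = manifest.to_a

  until queue.empty?
    name, requirement = queue.shift
    requirements[name] << Gem::Requirement.new(requirement)
    next if expanded[name]

    expanded[name] = true
    newest = INDEX.fetch(name).keys.max_by { |v| Gem::Version.new(v) }
    queue.concat(INDEX[name][newest].to_a)
  end

  requirements.each_with_object({}) do |(name, reqs), resolved|
    candidates = INDEX[name].keys.map { |v| Gem::Version.new(v) }
    version    = candidates.select { |v| reqs.all? { |r| r.satisfied_by?(v) } }.max
    raise "no version of #{name} satisfies every requirement" unless version

    resolved[name] = version.to_s
  end
end

p resolve("thin" => ">= 0", "actionpack" => "2.3.5")
```

Both thin's `>= 1.0.0` and actionpack's `~> 1.0.0` are considered before any version of rack is chosen, so rack 1.0.1 is selected whichever gem is listed first.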
Groups of Dependencies
When writing applications for deployments, developers commonly want to group their dependencies. For instance, you might use SQLite in development but Postgres in production.
For most people, the most important part of the grouping problem is making it possible to install the gems in their Gemfile, except the ones in specific groups. This introduces two additional problems.
First, consider the following Gemfile:
    gem "rails", "2.3.5"

    group :production do
      gem "thin"
    end
Bundler allows you to install the gems in a Gemfile minus the gems in a specific group by running bundle install --without production. In this case, since rails depends on Rack, specifying that you don’t want to include thin means no thin, no daemons and no eventmachine but yes rack. In other words, we want to exclude the gems in the group specified, and any dependencies of those gems that are not dependencies of other gems.
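One way to think about that rule: install the dependency closure of everything that is not excluded. The sketch below assumes a heavily simplified graph (rails depending only on rack) and uses hypothetical names (`GROUP_DEPENDENCIES`, `GROUPED_GEMFILE`, `gems_to_install`); it is not bundler's code.

```ruby
require "set"

# Simplified, hypothetical dependency graph for the Gemfile above.
GROUP_DEPENDENCIES = {
  "rails" => %w[rack],
  "thin"  => %w[daemons eventmachine rack]
}

GROUPED_GEMFILE = {
  default:    %w[rails],
  production: %w[thin]
}

# Returns the given gems plus everything they depend on, transitively.
def closure(names)
  seen  = Set.new
  stack = names.dup

  until stack.empty?
    name = stack.pop
    next unless seen.add?(name)  # add? returns nil when already present
    stack.concat(GROUP_DEPENDENCIES.fetch(name, []))
  end

  seen
end

# Gems to install when the groups in `without` are excluded.
def gems_to_install(without: [])
  kept = GROUPED_GEMFILE.reject { |group, _| without.include?(group) }.values.flatten
  closure(kept).to_a.sort
end

p gems_to_install(without: [:production])
p gems_to_install
```

Rack survives the exclusion because rails still needs it, while daemons and eventmachine disappear along with thin.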
Second, consider the following Gemfile:
    gem "soap4r", "1.5.8"

    group :production do
      gem "dm-salesforce", "0.10.3"
    end
The soap4r gem depends on httpclient >= 2.1.1, while the dm-salesforce gem depends on httpclient = 2.1.5.2. Initially, when you did bundle install --without production, we did not include gems in the production group in the dependency resolution process.
Suppose that httpclient 2.1.5.2 and httpclient 2.2 both exist on Rubyforge.org. In development mode, your app will use the latest version (2.2), but in production, when dm-salesforce is included, the older version will be used.
Note that this happened even though you specified only hard versions at the top level, because not all gems use hard versions as their dependencies.
To solve this problem, Bundler downloads (but does not install) all gems, including gems in groups that you exclude (via --without). This allows you to specify gems with C extensions that can only compile in production (or testing requirements that depend on OSX for compilation) while maintaining a coherent list of gems used across all of these environments.
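The httpclient discrepancy is easy to reproduce with Rubygems' own requirement classes. In this sketch, `AVAILABLE_HTTPCLIENT` and `pick_httpclient` are hypothetical; the version numbers and requirements are the ones from the example above.

```ruby
require "rubygems"

# The two httpclient versions assumed to exist in the remote source.
AVAILABLE_HTTPCLIENT = %w[2.1.5.2 2.2].map { |v| Gem::Version.new(v) }

# Newest available version that satisfies every requirement given.
def pick_httpclient(requirements)
  reqs = requirements.map { |r| Gem::Requirement.new(r) }
  AVAILABLE_HTTPCLIENT.select { |v| reqs.all? { |r| r.satisfied_by?(v) } }.max
end

soap4r        = ">= 2.1.1"    # from the default group
dm_salesforce = "= 2.1.5.2"   # from the production group

# Resolving without the production group ignores dm-salesforce's constraint.
puts pick_httpclient([soap4r])                 # 2.2

# Resolving with every group included, even ones that will not be installed.
puts pick_httpclient([soap4r, dm_salesforce])  # 2.1.5.2
```

Resolving against all groups every time is what keeps development and production on the same httpclient.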
System Gems
In 0.8 and before, bundler installed all gems in the local application. This provided a neat sandbox, but broke the normal path for running a new Rails app:
    $ gem install rails
    $ rails myapp
    $ cd myapp
    $ rails server
Instead, in 0.8, you’d have to do:
    $ gem install rails
    $ rails myapp
    $ cd myapp
    $ gem bundle
    $ rails server
Note that the gem bundle command became bundle install in Bundler 0.9.
In addition, this meant that Bundler needed to download and install commonly used gems over and over again if you were working on multiple apps. Finally, every time you changed the Gemfile, you needed to run gem bundle again, adding a “build step” that broke the flow of early Rails application development.
In Bundler 0.9, we listened to this feedback, making it possible for bundler to use gems installed in the system. This meant that the ideal Rails installation steps could work, and you could share common gems between applications.
However, there were a few complications.
Since we now use gems installed in the system, Bundler resolves the dependencies in your Gemfile against your system sources at runtime, making a list of all of the gems to push onto the load path. Calling Bundler.setup kicks off this process. If you specified some gems not to install, we needed to make sure bundler did not try to find those gems in the system.
In order to solve this problem, we create a .bundle directory inside your application that remembers any settings that need to persist across bundler invocations.
Unfortunately, this meant that we couldn’t simply have people run sudo bundle install because root would own your application’s .bundle directory.
On OSX, root owns all paths that are, by default, in $PATH. It also owns the default GEM_HOME. This has two consequences. First, we could not trivially install executables from bundled gems into a system path. Second, we could not trivially install gems into a place that gem list would see.
In 0.9, we solved this problem by placing gems installed by bundler into BUNDLE_PATH, which defaults to ~/.bundle/#{RUBY_ENGINE}/#{RUBY_VERSION}. rvm, which does not install executables or gems into a path owned by root, helpfully sets BUNDLE_PATH to the same location as GEM_HOME. This means that when using rvm, gems installed via bundle install appear in gem list.
This also means that when not using rvm, you need to use bundle exec to place the executables installed by bundler onto the path and set up the environment.
In 0.10, we plan to bump up the permissions (by shelling out to sudo) when installing gems so we can install to the default GEM_HOME and install executables to a location on the $PATH. This will make executables created by bundle install available without bundle exec and will make gems installed by bundle install available to gem list on OSX without rvm.
Another complication: because gems no longer live in your application, we needed a way to snapshot the list of all versions of all gems used at a particular time, to ensure consistent versions across machines and across deployments.
We solved this problem by introducing a new command, bundle lock, which created a file called Gemfile.lock with a serialized representation of all versions of all gems in use.
However, in order to make Gemfile.lock useful, it would need to work in development, testing, and production, even if you ran bundle install --without production in development and then ran bundle lock. Since we had already decided that we needed to download (but not install) gems even if they were excluded by --without, we could easily include all gems (including those from excluded groups) in the Gemfile.lock.
Initially, we didn’t serialize groups exactly right in the Gemfile.lock, causing inconsistencies between how groups behaved in unlocked and locked mode. Fixing this required a small change in the lock file format, which caused a small amount of frustration among users of early versions of Bundler 0.9.
Git
Very early (0.5 era) we decided that we would support prerelease “gems” that lived in git repositories.
At first, we figured we could just clone the git repositories and add the lib directory to the load path when the user ran Bundler.setup.
We abstracted away the idea of “gem source”, making it possible for gems to be found in system rubygems, remote rubygems, or git repositories. To specify that a gem was located in a git “source”, you could say:
gem "rspec-core", "2.0.0.beta.6", :git => "git://github.com/rspec/rspec-core.git"
This says: “You’ll find version 2.0.0.beta.6 in git://github.com/rspec/rspec-core.git”.
However, there were a number of issues involving git repositories.
First, if a prerelease gem had dependencies, we’d want to include those dependencies in the dependency graph. However, simply trying to run rake build was a nonstarter, as a huge number of prerelease gems have dependencies in their rake file that are only available to a tool like bundler once the gem is built (a chicken and egg problem). On the flip side, if another gem depended on a gem provided by a git repository, we were asking users to supply the version, an error-prone process since the version could change in the git repository and bundler wouldn’t be the wiser.
To solve this, we asked gem authors to put a .gemspec in the root of their repository, which would allow us to see the dependencies. A lot of people were familiar with this process, since github had used it for a while for automatically generating gems from git repositories.
At first, we assumed (like github did) that we could execute the .gemspec standalone, out of the context of its original repository. This allowed us to avoid cloning the full repository simply to resolve dependencies. However, a number of gems required files that were in the repository (most commonly, they required a version file from the gem itself to avoid duplication), so we modified bundler to do a full checkout of the repository so we could execute the gemspec in its original context.
Next, we found that a number of git repositories (notably, Rails) actually contained a number of gems. To support this, we allowed any number of .gemspec files in a repository, and would evaluate each in the context of its root. This meant that a git repository was more analogous to a gem source (like Rubygems.org) than a single .gem file.
Soon enough, people started complaining that they tried to use prerelease gems like nokogiri from git and bundler wasn’t compiling C extensions. This proved tricky, because the process that Rubygems uses to compile C extensions is more than a few lines, and we wanted to reuse the logic if possible.
In most cases, we were able to solve this problem by having bundler run gem build gem_name.gemspec on the gemspec, and using Rubygems’ native C extension process to compile the gem.
In a related problem, we started receiving reports that bundler couldn’t find rake while trying to compile C extensions. It turns out that Rubygems supports a rake compile mode if you use s.extensions = %w(Rakefile) or something containing mkrf. This essentially means that Rubygems itself has an implicit dependency on Rake. Since we sort the installed gems to make sure that dependencies get installed before the gems that depend on them, we needed to make sure that Rake was installed before any gem.
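The ordering constraint is a topological sort with one special case. Here is a minimal sketch using Ruby's standard `TSort` module; the `InstallOrder` class and the example graph are hypothetical, and the sketch assumes rake has no dependencies of its own, so moving it to the front is safe.

```ruby
require "tsort"

# Orders gems so that dependencies come before the gems that need them,
# and rake comes before everything else.
class InstallOrder
  include TSort

  # dependencies: { gem name => [names of gems it depends on] }
  def initialize(dependencies)
    @dependencies = dependencies
  end

  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @dependencies.fetch(name, []).each(&block)
  end

  def order
    sorted = tsort  # children (dependencies) first, then their parents
    sorted.include?("rake") ? ["rake"] + (sorted - ["rake"]) : sorted
  end
end

graph = {
  "thin"         => %w[daemons eventmachine rack],
  "nokogiri"     => [],
  "daemons"      => [],
  "eventmachine" => [],
  "rack"         => [],
  "rake"         => []
}

p InstallOrder.new(graph).order
```

Nothing in the graph says that nokogiri needs rake, which is exactly why the implicit dependency has to be handled as a special case.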
For git gems, we needed to make sure that Gemfile.lock remembered the exact revision used when bundler took the snapshot. This required some more abstraction, so sources could provide and load in agnostic information that they could use to reinstall everything identically to when bundler took the snapshot.
If a git gem didn’t supply a .gemspec, we needed to create a fake .gemspec that we could use throughout the process, based on the name and version the user specified for the repository. This would allow it to participate in the dependency resolution process, even if the repository itself didn’t provide a .gemspec.
If a repository did provide a .gemspec, and the user supplied a version or version range, we needed to confirm that the version provided matched the version specified in the .gemspec.
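Both cases can be illustrated with plain Rubygems objects. `placeholder_spec` and `matches_request?` are hypothetical helpers written for this sketch, not bundler's internals; `Gem::Specification` and `Gem::Requirement` are the real Rubygems classes.

```ruby
require "rubygems"

# Builds a stand-in specification for a repository that ships no .gemspec,
# using only the name and version the user wrote in the Gemfile.
def placeholder_spec(name, version)
  Gem::Specification.new do |s|
    s.name    = name
    s.version = version
    s.summary = "Placeholder for a git repository without a .gemspec"
  end
end

# Checks that a spec found in the repository matches what the user asked for.
def matches_request?(spec, requested)
  Gem::Requirement.new(requested).satisfied_by?(spec.version)
end

spec = placeholder_spec("rspec-core", "2.0.0.beta.6")

puts matches_request?(spec, "2.0.0.beta.6")  # true
puts matches_request?(spec, "= 1.3.2")       # false
```

A placeholder like this carries no dependencies, so the resolver treats the repository as a leaf in the dependency graph.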
We checked out the git repositories into BUNDLE_PATH (again, defaulting to ~/.bundle/#{RUBY_ENGINE}/#{RUBY_VERSION} or $GEM_HOME with rvm) using the --bare option. This allows us to share git repositories like the rails repository, and then make local checkouts of specific revisions, branches or tags as specified by individual Gemfiles.
One final problem: if your Gemfile looks like this:
    source "http://rubygems.org"
    gem "nokogiri"
    gem "rails", :git => "git://github.com/rails/rails.git", :tag => "v2.3.4"
You do not expect bundler to pull in the version from Rubygems.org, even though it’s newer. Because bundler treats the git repository as a gem source, it initially pulled in the latest version of the gem, regardless of the source. To solve this problem, we added the concept of “pinned dependencies” to the dependency resolver, allowing us to ask it to skip traversing paths that got the rails dependencies from other sources.
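Conceptually, pinning just narrows the candidates the resolver is allowed to consider for a gem. The sketch below is a rough illustration with hypothetical names (`Candidate`, `candidates_for`, and the `:git` and `:rubygems` source labels); the real resolver is more involved.

```ruby
# A gem version as offered by one particular source.
Candidate = Struct.new(:name, :version, :source)

# Returns the candidates the resolver may consider for `name`. If the gem is
# pinned to a source, candidates from every other source are skipped.
def candidates_for(name, index, pins)
  found  = index.select { |candidate| candidate.name == name }
  pinned = pins[name]
  pinned ? found.select { |candidate| candidate.source == pinned } : found
end

index = [
  Candidate.new("rails",    "2.3.5", :rubygems),
  Candidate.new("rails",    "2.3.4", :git),
  Candidate.new("nokogiri", "1.4.1", :rubygems)
]

pins = { "rails" => :git }  # rails was declared with :git in the Gemfile

p candidates_for("rails", index, pins).map(&:version)     # only the git copy
p candidates_for("nokogiri", index, pins).map(&:version)  # unpinned, unchanged
```

The newer rails on Rubygems.org never enters the picture, so "newest wins" cannot override the source the user asked for.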
Paths
Now that we had git repositories working, it was a hop, skip and jump to support any path. We could use all of the same heuristics as we used for git repositories (including using gem build to install C extensions and having multiple versions) on any path in the file system.
With so many sources in the mix, we started seeing cases where people had different gems with the exact same name and version in different sources. Most commonly, people would have created a gem from a local checkout of something (like Rack), and then, when the final version of the gem was released to Rubygems.org, we were still using the version installed locally.
We tried to solve this problem by forcing a lookup in Rubygems.org for a gem, but this contrasted with people who didn’t want to have to hit a remote repository when they had all the gems locally.
When we first started talking to early adopters, they were incredulous that this could happen. “If you do something stupid like that, f*** you”. One by one, those very same people fell victim to the “bug”. Unfortunately, it manifests itself as can't find active_support/core_ext/something_new, which is extremely confusing and can appear to be a generic “bundler bug”. This is especially problematic if the dependencies change in two copies of the gem with identical names and versions.
To solve this problem, we decided that if you had snapshotted the repository via bundle lock and had all of the required gems on your local machine, we would not try to hit a remote. However, if you run bundle install otherwise, we always check to see if there is a newer version in the remote.
In fact, this class of error (two different copies of a gem with the same name and version) has resulted in a fairly intricate prioritization system, which can be different in different scenarios. Unfortunately, the principle of least surprise requires that we tweak these priorities for different scenarios.
While it seems that we could just say “if you rake install a gem you’re on your own”, it’s very common, and people expect things to mostly work even in this scenario. Small tweaks to these priorities have also resulted in small changes in behavior between versions of 0.9 (but only in cases where gems with the exact same name and version, in different sources, provide different code).
In fact, because of the overall complexity of the problem, and because of different ways that these features can interact, very small tweaks to different parts of the system can result in unexpected changes. We’ve gotten pretty good at seeing the likely outcome of these tweaks, but they can be baffling to users of bundler. A major goal of the lead-in to 1.0 has been to increase determinism, even in cases where we have to arbitrarily pick a “right” answer.
Conclusion
This is just a small smattering of some of the problems we’ve encountered while working on bundler. Because the problem is non-trivial (and parts are NP-complete), adding an apparently simple feature can upset the equilibrium of the entire system. More frustratingly, adding features can sometimes change “undefined” behavior, accidentally breaking a working system as the result of an upgrade.
As we head into 0.10 and 1.0, we hope to add some additional features to smooth out the typical workflows, while stabilizing some of the seeming indeterminism in Bundler today. One example is imposing a standard require order for gems in the Gemfile, which is currently “undefined”.
Thanks for listening, and getting to the end of this very long post.
Using .gemspecs as Intended
April 2nd, 2010
When you clone a repository containing a Unix tool (or download a tarball), there’s a standard way to install it. This is expected to work without any other dependencies, on all machines where the tool is supported.
$ autoconf
$ ./configure
$ make
$ sudo make install
This provides a standard way to download, build and install Unix tools. In Ruby, we have a similar (little-known) standard:
$ gem build gem_name.gemspec
$ gem install gem_name-version.gem
If you opt-into this convention, not only will it simplify the install process for your users, but it will make it possible for bundler (and other future automated tools) to build and install your gem (including binaries, proper load path handling and compilation of C extensions) from a local path or git repository.
What to Do
Create a .gemspec in the root of your repository and check it in.
Feel free to use dynamic code in here. When your gem is built, Rubygems will run that code and create a static representation. This means it’s fine to pull your gem’s version or other shared details out of your library itself. Do not, however, use other libraries or dependencies.
You can also use Dir[] in your .gemspec to get a list of files (and remove files you don’t want with -; see the example below).
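As a small illustration of the Dir[] and - combination, here is a sketch that builds a throwaway directory so it can run anywhere. The file names and the demo_file_list method are made up for the example; in a real .gemspec only the last expression would appear, assigned to s.files:

```ruby
require "tmpdir"
require "fileutils"

# Build a temporary project layout, then compute a file list the way a
# .gemspec might. All paths here are hypothetical.
def demo_file_list
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, "lib/demo"))
    %w(lib/demo.rb lib/demo/version.rb lib/demo/scratch.tmp README.md).each do |path|
      File.write(File.join(root, path), "")
    end

    Dir.chdir(root) do
      # Every file under lib/ plus the README, minus the files we don't ship.
      (Dir["lib/**/*.*"] + %w(README.md) - Dir["lib/**/*.tmp"]).sort
    end
  end
end

p demo_file_list
```

Because Dir[] returns a plain Array, ordinary Array operators such as + and - are all that is needed.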
Here’s bundler’s .gemspec:
# -*- encoding: utf-8 -*-
lib = File.expand_path('../lib/', __FILE__)
$:.unshift lib unless $:.include?(lib)

require 'bundler/version'

Gem::Specification.new do |s|
  s.name        = "bundler"
  s.version     = Bundler::VERSION
  s.platform    = Gem::Platform::RUBY
  s.authors     = ["Carl Lerche", "Yehuda Katz", "André Arko"]
  s.email       = ["carlhuda@engineyard.com"]
  s.homepage    = "http://github.com/carlhuda/bundler"
  s.summary     = "The best way to manage your application's dependencies"
  s.description = "Bundler manages an application's dependencies through its entire life, across many machines, systematically and repeatably"

  s.required_rubygems_version = ">= 1.3.6"
  s.rubyforge_project         = "bundler"

  s.add_development_dependency "rspec"

  s.files        = Dir.glob("{bin,lib}/**/*") + %w(LICENSE README.md ROADMAP.md CHANGELOG.md)
  s.executables  = ['bundle']
  s.require_path = 'lib'
end
If you didn’t already know this, the DSL for gem specifications is already pretty clean and straightforward; there is no need to generate your gemspec using alternative tools.
Your gemspec should run standalone, ideally with no additional dependencies. You can assume its __FILE__ is located in the root of your project.
When it comes time to build your gem, use gem build.
$ gem build bundler.gemspec
This will spit out a .gem file properly named with a static version of the gem specification inside, having resolved things like Dir.glob calls and the version you might have pulled in from your library.
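To get a feel for what a "static version" of a specification means, the sketch below evaluates a specification with computed values and prints it back with Gem::Specification#to_ruby, which renders a spec as plain Ruby with every value already resolved. The gem name, version and file list are made up, and this is only an illustration of the idea, not a claim about the exact format stored inside a .gem file:

```ruby
require "rubygems"

# A specification with "dynamic" parts that are evaluated when the block runs.
# Name, version parts and file list are hypothetical.
def static_spec_source
  spec = Gem::Specification.new do |s|
    s.name    = "demo"
    s.version = [0, 1, 0].join(".")
    s.summary = "A demo gem"
    s.authors = ["Someone"]
    # Stands in for a Dir.glob call that filters the files to ship
    s.files   = %w(lib/demo.rb lib/demo/version.rb tmp/scratch.rb).grep(/\Alib\//)
  end

  spec.to_ruby
end

puts static_spec_source
```

The printed source contains the literal version string and the literal file names; the join and grep calls are gone.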
Next, you can push your gem to Rubygems.org quickly and painlessly:
$ gem push bundler-0.9.15.gem
If you’ve already provided credentials, you’ve now published your gem. If not, you will be asked for your credentials (once per machine).
You can easily automate this process using Rake:
$LOAD_PATH.unshift File.expand_path("../lib", __FILE__)
require "bundler/version"

task :build do
  system "gem build bundler.gemspec"
end

task :release => :build do
  system "gem push bundler-#{Bundler::VERSION}.gem"
end
Using tools that are built into Ruby and Rubygems creates a more streamlined, conventional experience for all involved. Instead of trying to figure out what command to run to create a gem, expect to be able to run gem build mygem.gemspec.
A nice side-effect of this is that those who check in valid .gemspec files can take advantage of tools like bundler that allow git repositories to stand in for gems. By using the gem build convention, bundler is able to generate binaries and compile C extensions from local paths or git repositories in a conventional, repeatable way.
Try it. You’ll like it.
Ruby’s Implementation Does Not Define its Semantics
February 25th, 2010
When I was first getting started with Ruby, I heard a lot of talk about blocks, and how you could “cast” them to Procs by using the & operator when calling methods. Last week, in comments about my last post (Ruby is NOT a Callable Oriented Language (It’s Object Oriented)), I heard that claim again.
To be honest, I never really thought that much about it, and the idea of “casting” a block to a Proc never took hold in my mental model, but when discussing my post with a number of people, I realized that a lot of people have this concept in their mental model of Ruby.
It is not part of Ruby’s semantics.
In some cases, Ruby’s internal implementation performs optimizations by eliminating the creation of objects until you specifically ask for them. Those optimizations are completely invisible, and again, not part of Ruby’s semantics.
Let’s look at a few examples.
Blocks vs. Procs
Consider the following scenarios:
def foo
  yield
end

def bar(&block)
  puts block.object_id
  baz(&block)
end

def baz(&block)
  puts block.object_id
  yield
end

foo { puts "HELLO" } #=> "HELLO"
bar { puts "HELLO" } #=> "2148083200\n2148083200\nHELLO"
Here, I have three methods using blocks in different ways. In the first method (foo), I don’t specify the block at all, yielding to the implicit block. In the second case, I specify a block parameter, print its object_id, and then send it on to the baz method. In the third case, I specify the block, print out its object_id and yield to the block.
In Ruby’s semantics, these three uses are identical. In the first case, yield calls an implicit Proc object. In the second case, it takes an explicit Proc, then sends it on to the next method. In the last case, it takes an explicit Proc, but yields to the implicit copy of the same object.
So what’s the & thing for?
In Ruby, in addition to normal arguments, methods can receive a Proc in a special slot. All methods can receive such an argument, and Procs passed in that slot are silently ignored if not yielded to:
def foo
  puts "HELLO"
end

foo { something_crazy } #=> "HELLO"
On the other hand, if you want a method to receive a Proc in that slot, and thus be able to yield to it, you specify that by prefixing it with an &:
def foo
  yield
end

my_proc = Proc.new { puts "HELLO" }
foo(&my_proc)
Here, you’re telling the foo method that my_proc is not a normal argument; it should be placed into the proc slot and made available to yield.
Additionally, if you want access to the Proc object, you can give it a name:
def foo(&block)
  puts block.object_id
  yield
end

foo { puts "HELLO" } #=> "2148084320\nHELLO"
This simply means that you want access to the implicit Proc in a variable named block.
Because, in most cases, you’re passing in a block (using do/end or {}), and calling it using yield, Ruby provides some syntax sugar to make that simple case more pleasing. That does not, however, mean that there is a special block construct in Ruby’s semantics, nor does it mean that the & is casting a block to a Proc.
You can tell that blocks are not being semantically wrapped and unwrapped because blocks passed along via & share the same object_id across methods.
Mental Models
For the following code there are two possible mental models.
def foo(&block)
  puts block.object_id
  yield
end

b = Proc.new { puts "OMG" }
puts b.object_id
foo(&b) #=> 2148084040\n2148084040\nOMG
In the first, the &b unwraps the Proc object, and the &block recasts it into a Proc. However, it somehow also wraps it back into the same wrapper that it came from in the first place.
In the second, the &b puts the b Proc into the block slot in foo’s argument list, and the &block gives the implicit Proc a name. There is no need to explain why the Proc has the same object_id; it is the same Object!
These two mental models are perfectly valid (the first actually reflects Ruby’s internal implementation). I claim that those who want to use the first mental model have the heavy burden of introducing the new concept to the Ruby language of a non-object block, and that as a result, it should be generally rejected.
Metaclasses
Similarly, in Ruby’s internal implementation, an Object does not get a metaclass until you ask for one.
obj = Object.new
# internally, obj does not have a metaclass here

obj.to_s
# internally, Ruby skips right up to Object when searching for #to_s, since
# it knows that no metaclass exists

def obj.hello
  puts "HELLO"
end
# now, Ruby internally rewrites obj's class pointer to point to a new internal
# metaclass which has the hello method on it

obj.to_s
# Now, Ruby searches obj's metaclass before jumping up to Object

obj.to_s
# Now, Ruby skips the search because it's already cached the method
# lookup
All of the comments in the above snippet are correct, but semantically, none of them are important. In all cases, Ruby is semantically looking for methods in obj’s metaclass, and when it doesn’t find any, it searches higher up. In order to improve performance, Ruby skips creating a metaclass if no methods or modules are added to an object’s metaclass, but that doesn’t change Ruby’s semantics. It’s just an optimization.
By thinking in terms of Ruby’s implementation, instead of Ruby’s semantics, you are forced to think about a mutable class pointer and consider the possibility that an object has no metaclass.
Mental Models
Again, there are two possible mental models. In the first, Ruby objects have a class pointer, which they manipulate to point to new metaclass objects which are created only when methods or modules are added to an object. Additionally, Ruby objects have a method cache, which they use to store method lookups. When a method is looked up twice, in this mental model, some classes are skipped because Ruby already knows that they don’t have the method.
In the second mental model, all Ruby objects have a metaclass, and method lookup always goes through the metaclass and up the superclass chain until a method is found.
As before, I claim that those who want to impose the first mental model on Ruby programmers have the heavy burden of introducing the new concepts of “class pointer” and “method cache”, which are not Ruby objects and have no visible implications on Ruby semantics.
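The second mental model can be checked from plain Ruby. The sketch below assumes Ruby 1.9.2 or later, where Object#singleton_class is available; the metaclass_demo method is a made-up wrapper so the results can be inspected:

```ruby
# Observe an object's metaclass before and after adding a singleton method.
def metaclass_demo
  obj = Object.new
  before = obj.singleton_class   # answers even though no singleton methods exist yet

  def obj.hello
    "HELLO"
  end

  after = obj.singleton_class

  {
    same_object: before.equal?(after),            # the metaclass did not change identity
    methods:     after.instance_methods(false),   # methods defined directly on it
    superclass:  after.superclass                 # lookup continues up from here
  }
end

p metaclass_demo
```

From the program's point of view the metaclass was always there: asking for it before and after defining hello yields the same object, and whatever the implementation did lazily stays invisible.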
Regular Expression Matches
In Ruby, certain regular expression operations create implicit local variables that reflect parts of the match:
def foo(str)
  str =~ /e(.)l/
  p $~
  p $`
  p $'
  p $1
  p $2
end

foo("hello") #=> #<MatchData "ell" 1:"l">\n"h"\n"o"\n"l"\nnil
This behavior is mostly inherited from Perl, and Matz has said a few times that he would not support Perl’s backrefs if he had it to do over again. However, they provide another opportune example of implicit objects in Ruby.
Mental Models
In this case, there are three possible mental models.
In the first mental model, if you don’t use any $ local variables, they don’t exist. When you use a specific one, it springs into existence. For instance, when using $1, Ruby looks at some internal representation of the last match and retrieves the first capture. If you use $~, Ruby creates a MatchData object out of it.
In the second mental model, when you call a method that uses regular expressions, Ruby walks back up the stack frames, and inserts the $ local variables on it when it finds the original caller. If you later use the variables, they are already there. Ruby must be a little bit clever, because the most recent frame on the stack (which might include C frames, Java frames, or internal Ruby frames in Rubinius) is not always the calling frame.
In the last mental model, when you call a method that uses regular expressions, there is an implicit match object available (similar to the implicit Proc object that is available in methods). The $~ variable is mapped to that implicit object, while the $1 variable is the equivalent of $~[1].
Again, this last mental model introduces the least burdensome ideas into Ruby. The first mental model introduces the idea of an internal representation of the last match, while the second mental model (which again, has the upside of being how most implementations actually do it) introduces the concept of stack frames, which are not Ruby objects.
The last mental model uses an actual Ruby object, and does not introduce new concepts. Again, I prefer it.
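A short sketch of the last mental model in action. The method names match_parts and caller_keeps_own_match are invented for the example. It shows that the special variables are views onto the implicit MatchData object, and that each method has its own implicit match:

```ruby
# Each $ variable has an equivalent spelled in terms of the $~ object.
def match_parts(str)
  str =~ /e(.)l/
  {
    first:              $1,
    via_match_data:     $~[1],
    pre:                $`,
    pre_via_match_data: $~.pre_match
  }
end

# A match performed inside another method does not disturb the caller's
# implicit match object.
def caller_keeps_own_match
  "hello" =~ /e(.)l/
  match_parts("world xeyl")   # runs its own match, captures "y"
  $1                          # still the caller's capture
end

p match_parts("hello")
p caller_keeps_own_match
```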
Conclusion
In a number of places, it is possible to imbue Ruby semantics with mental models that reflect the actual Ruby implementation, or the fact that it’s possible to imagine that a Ruby object only springs into existence when it is asked for.
However, these mental models require that Ruby programmers add non-objects to the semantics of Ruby, and require contortions to explain away Ruby’s own efforts to hide these internals from the higher-level constructs of the language. For instance, while Ruby internally wraps and unwraps Procs when passing them to methods, it makes sure that the Proc object attached to a block is always the same, in an effort to hide the internal details from programmers.
As a result, explaining Ruby’s semantics in terms of these internals requires contortions and new constructs that are not natively part of Ruby’s object model, and those explanations should be avoided.
To be clear, I am not arguing that it’s not useful to understand Ruby’s implementation. It can, in fact, be quite useful, just as it’s useful to understand how C’s calling convention works under the covers. However, just as day-to-day programmers in C don’t need to think about the emitted Assembler, day-to-day Ruby programmers don’t need to think about the implementation. And finally, just as C implementations are free to use different calling conventions without breaking existing processor-agnostic C (or the mental model that C programmers use), Ruby implementations are free to change the internal implementation of these constructs without breaking pure-Ruby code or the mental model Ruby programmers use.
Ruby is NOT a Callable Oriented Language (It’s Object Oriented)
February 21st, 2010
I recently ran across a presentation entitled Python vs. Ruby: A Battle to the Death. I didn’t consider it to be a particularly fair battle, and may well reply in more detail in a later post.
However, what struck me as most worthy of explanation was the presenter’s concern about the fact that Procs are not callable via parens.
x = Proc.new { puts "HELLO" }
x()    #=> undefined method `x' for #<Object:0x1001bd298>
x.call #=> "HELLO"
x[]    #=> "HELLO"
For those coming from a callable-oriented language, like Python, this seems horribly inconsistent. Why are methods called with (), while Procs are called with []?
But what’s going on here is that Ruby doesn’t have a notion of “callable”, like Python. Instead, it has a pervasive notion of Object. Here, x is an instance of the Proc class. Both call and [] are methods on the Proc class.
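To make the "these are just methods" point concrete, here is a small sketch. GREETING and the Shouter class are hypothetical names. Since call and [] are ordinary methods, any object can define them, and the .() shorthand (Ruby 1.9 and later) is sugar for sending call:

```ruby
GREETING = Proc.new { |name| "HELLO #{name}" }

# Three spellings of the same thing: sending a message to a Proc object.
PROC_RESULTS = [
  GREETING.call("world"),
  GREETING["world"],
  GREETING.("world")      # .() is sugar for .call
]

# Any object can respond to the same messages; nothing here is Proc-specific.
class Shouter
  def call(word)
    word.upcase + "!"
  end
  alias_method :[], :call
end

p PROC_RESULTS
p Shouter.new.("hi"), Shouter.new["hi"]
```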
Designing for the Common Case: Calling Methods
Coming from a callable-oriented language, this might seem jarring. But Ruby is designed around Objects and the common cases of working with objects.
Calling methods is far more common than wanting to get an instance of Method, so Ruby optimizes the case of calling methods, with a slightly less elegant form to access an instance:
class Greeter
  def say
    puts "Hello world!"
  end
end

Greeter.new.say          #=> "Hello world!"
Greeter.method(:new)     #=> #<Method: Class#new>
Greeter.new.method(:say) #=> #<Method: Greeter#say>

# This is so that you don't have to say:
Greeter.new().say()
Ruby considers the common case of calling methods, and optimizes that case while still making the less common case possible.
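The less common case is still fully available. As a sketch (this Greeter variant returns a string and takes an optional name, purely so the results are easy to inspect), a Method instance is itself an object that responds to call and can be handed to another method in the block slot with &:

```ruby
class Greeter
  def say(name = "world")
    "Hello #{name}!"
  end
end

say = Greeter.new.method(:say)   # an instance of Method

puts say.call                    # invoke it like any other object with #call
puts say.call("Ruby")
p %w(Matz Yehuda).map(&say)      # & converts the Method with to_proc
p say.owner                      # the class the method was defined in
```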
Designing for the Common Case: How Blocks Are Really Used
One of the reasons that the Proc.new case throws off Rubyists in debates is that Rubyists literally never call Proc objects using [].
In Ruby, Procs are the object passed to methods when using block syntax. Here is how Procs are actually used:
def hello
  puts "Hello world!"
  yield
  puts "Goodbye cruel world!"
end

hello { puts "I am in the world!" }
When examining languages that support passing anonymous functions to functions (like JavaScript), it turns out that the vast majority of such cases involve a single anonymous function. As a result, Matz (inspired by Smalltalk) built in the idea of a block as a core construct in Ruby. In the vast majority of cases, blocks are created using lightweight syntax ({} or do/end) and called using yield.
In some cases, blocks are passed from one method to the next, before they are finally called using yield:
def step1(&block)
  puts "Step 1"
  step2(&block)
end

def step2
  puts "Step 2"
  yield
end

step1 { puts "Do the action!" } #=> "Step 1\nStep 2\nDo the action!"
As you can see, Ruby builds in the idea of calling a block into the core language. I searched through Rails (a fairly large codebase) for instances of using [] to call a Proc and while we use blocks extremely commonly, we don’t use [] to call them.
I suspect that the reason this comes up is that people who are used to having to define standalone functions, pass them around, and then call them are looking for the analogous constructs in Ruby, but are missing the different paradigm used by Ruby.
Consistent Method Execution
Fundamentally, the issue here comes down to this:
def foo
  proc {}
end

foo()
In Ruby, methods are invoked with or without parentheses. All methods return values, which are always Objects. All Objects have methods. So foo() is a method call that returns a Proc object. It’s extremely consistent, with very few axioms. The fact that the axioms aren’t the same as those in a callable-oriented language doesn’t make them “weird”.
AbstractQueryFactoryFactories and alias_method_chain: The Ruby Way
February 15th, 2010
In the past week, I read a couple of posts that made me really want to respond with a coherent explanation of how I build modular Ruby code.
The first post, by Nick Kallen of Twitter, gushed about the benefits of PerQueryTimingOutQueryFactory and called out Ruby (and a slew of other “hipster” languages) for using language features (like my “favorite” alias_method_chain) and leveraging dynamicism to solve problems that he argues are more appropriately solved with laugh-inducing pattern names:
In a very dynamic language like Ruby, open classes and method aliasing (e.g., alias_method_chain) mitigate this problem, but they don’t solve it. If you manipulate a class to add logging, all instances of that class will have logging; you can’t take a surgical approach and say “just objects instantiated in this context”.
If you haven’t read it yet, you should probably read it now (at least skim it).
As if on cue, a post by Pivot Rob Olson demonstrated the lengths some Rubyists will go to torture alias_method_chain to solve essentially the same problem that Nick addressed.
In short, while I agree in principle with Nick, his examples and the jargon he used demonstrated exactly why so few Rubyists take his point seriously. It is possible to write modular code in Ruby with the same level of flexibility but with far less code and fewer concept hoops to jump through.
Let’s take a look at the problem Rob was trying to solve:
module Teacher
  def initialize
    puts "initializing teacher"
  end
end

class Person
  include Teacher

  def initialize
    puts "initializing person"
  end
end

# Desired output:
# > Person.new
# initializing teacher
# initializing person
This is a classic problem involving modularity. In essence, Rob wants to be able to “decorate” the Person class to include teacher traits.
Nick’s response would have been to create a factory that creates a Person proxy decorated with Teacher properties. And he would have been technically correct, but that description obscures the Ruby implementation, and makes it sound like we need new “Factory” and “Decorator” objects, as we do, in fact, need when programming in Java.
In Ruby, you’d solve this problem thusly:
# The base person implementation. Never instantiate this.
# Instead, create a subclass that mixes in appropriate modules.
class AbstractPerson
  def initialize
    puts "Initializing person"
  end
end

# Provide additional "teacher" functionality as a module. This can be
# mixed into subclasses of AbstractPerson, giving super access to
# methods on AbstractPerson
module Teacher
  def initialize
    puts "Initializing teacher"
    super
  end
end

# Our actual Person class. Mix in whatever modules you want to
# add new functionality.
class Person < AbstractPerson
  include Teacher
end

# > Person.new
# Initializing teacher
# Initializing person
Including modules essentially decorates existing classes with additional functionality. You can include multiple modules to layer on existing functionality, but you don’t need to create special factory or decorator objects to make this work.
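As a sketch of layering more than one module, the variant below adds a second, hypothetical Parent module and records calls in an array instead of printing, so the order can be inspected. The attr_reader and the log array are additions for the example only:

```ruby
class AbstractPerson
  attr_reader :log

  def initialize
    (@log ||= []) << "person"
  end
end

module Teacher
  def initialize
    (@log ||= []) << "teacher"
    super
  end
end

# Hypothetical second "decorator", added for illustration.
module Parent
  def initialize
    (@log ||= []) << "parent"
    super
  end
end

class Person < AbstractPerson
  include Teacher
  include Parent   # included last, so it is searched first
end

p Person.new.log
p Person.ancestors.take(4)
```

Each super call moves one step along Person.ancestors, which is what lets the modules stack without knowing about each other.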
For those following along, the classes used here are “factories”, and the modules are “decorators”. But just as it’s not useful to constantly think about classes as “structs with function pointers” because that’s historically how they were implemented, I’d argue it’s not useful to constantly think about classes and modules as factories and decorators, simply because they’re analogous to those concepts in languages like Java.
The Case of the PerQueryTimingFactoryFactory
Nick’s example is actually a great example of a case where modularity is important. In this case, he has a base Query class that he wants to extend to add support for timeouts. He wrote his solution in Scala; I’ll transcode it into Ruby.
Feel free to skim the examples that follow. I’m transcoding the Scala into Ruby to demonstrate something which you will be able to understand without fully understanding the examples.
class QueryProxy
  def initialize(query)
    @query = query
  end

  def select
    delegate { @query.select { yield } }
  end

  def execute
    delegate { @query.execute }
  end

  def cancel
    @query.cancel
  end

  def delegate
    yield
  end
end
Then, in order to add support for Timeouts, he creates a new subclass of QueryProxy:
class TimingOutQuery < QueryProxy
  def initialize(query, timeout)
    @timeout = timeout
    @query = query
  end

  def delegate
    begin
      Timeout.timeout(@timeout) do
        yield
      end
    rescue Timeout::Error
      cancel
      raise SqlTimeoutException
    end
  end
end
Next, in order to instantiate a TimingOutQuery, he creates a TimingOutQueryFactory:
class TimingOutQueryFactory
  def initialize(query_factory, timeout)
    @query_factory = query_factory
    @timeout = timeout
  end

  def call(connection, query, *args)
    TimingOutQuery.new(@query_factory.call(connection, query, *args), @timeout)
  end
end
As his coup de grâce, he shows how, now that everything is so modular, it is trivial to extend this system to support timeouts that were per-query.
class PerQueryTimingOutQueryFactory
  def initialize(query_factory, timeouts)
    @query_factory = query_factory
    @timeouts = timeouts
  end

  def call(connection, query, *args)
    TimingOutQuery.new(@query_factory.call(connection, query, *args), @timeouts[query])
  end
end
This is all true. By using factories and proxies, as you would in Java, this Ruby code is modular. It is possible to create a new kind of QueryFactory trivially.
However, this code, by tacking close to vocabulary created to describe Java patterns, rebuilds functionality that exists natively in Ruby. It would be equivalent to creating a Hash of Procs in Ruby when a Class would do.
The Case: Solved
Ruby natively provides factories, proxies and decorators via language features. In fact, that vocabulary obscures the obvious solution to Nick’s problem.
# No need for a proxy at all, so we skip it
module Timeout
  # super allows us to delegate to the Query this
  # module is included into, even inside a block
  def select
    timeout { super }
  end

  def execute
    timeout { super }
  end

  # We get the cancel delegation natively, because
  # we can use subclasses, rather than separate
  # proxy object, to implement the proxy

  private

  # Since we're not using a proxy, we'll just implement
  # the timeout method directly, and skip "delegate"
  def timeout
    # The Timeout module expects a duration method
    # which classes that include Timeout should provide
    Timeout.timeout(duration) do
      yield
    end
  rescue Timeout::Error
    cancel
    raise SqlTimeoutException
  end
end

# Classes in Ruby serve double duty as "proxies" and
# "factories". This behavior is part of Ruby semantics.
class TimingOutQuery < Query
  include Timeout

  private

  # implement duration to hardcode the value of 1
  def duration
    1
  end
end

# Creating a second subclass of Query, this time with
# per-query timeout semantics.
class PerQueryTimingOutQuery < Query
  TIMEOUTS = Hash.new(0.5).merge("query1" => 1, "query2" => 3)

  include Timeout

  private

  def duration
    TIMEOUTS[query]
  end
end
As Nick would point out, what we’re doing here, from a very abstract perspective, isn’t all that different from his example. Our subclasses are proxies, our modules are decorators, and our classes are serving as factories. However, forcing that verbiage on built-in Ruby language features, in my opinion, only serves to complicate matters. More importantly, by starting to think about the problem in terms of the Java-inspired patterns, it’s easy to end up building code that looks more like Nick’s example than my solution above.
For the record, I think that designing modularly is very important, and while Ruby provides built-in support for these modular patterns, we don’t see enough usage of them. However, we should not assume that the overuse of poor modularity patterns (like alias_method_chain) results from a lack of discussion around proxies, decorators, and factories.
By the way, ActionController in Rails 3 provides an abstract superclass called ActionController::Metal, a series of modules that users can mix in to subclasses however they like, and a pre-built ActionController::Base with all the modules mixed in (to provide the convenient “default” experience). Additionally, users or extensions can easily provide additional modules to mix in to ActionController::Metal subclasses. This is precisely the pattern I am describing here, and I strongly recommend that Rubyists use it more when writing code they wish to be modular.
Postscript: Scala
When researching for this article, I wondered why Nick hadn’t used Scala’s equivalent to Ruby’s modules (traits) in his examples. It would be possible to write Scala code that was extremely similar to my preferred solution to the problem. I asked both Nick and the guys in #scala. Both said that while traits could solve this problem in Scala, they could not be used flexibly enough at runtime.
In particular, Nick wanted to be able to read the list of “decorators” to use at runtime, and compose something that could create queries with the appropriate elements. According to the guys in #scala, it’s a well-understood issue, and Kevin Wright has a compiler plugin to solve this exact problem.
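For comparison, here is a rough sketch of that kind of runtime composition using Ruby modules. Every name in it (Query, Logging, Timing, DECORATORS, build_query_class) is hypothetical and greatly simplified. It only shows that a list of module names read at runtime can be turned into a class with those modules mixed in:

```ruby
class Query
  def execute
    ["query"]
  end
end

module Logging
  def execute
    ["logging"] + super
  end
end

module Timing
  def execute
    ["timing"] + super
  end
end

# Lookup table from names (as they might appear in a config file) to modules.
DECORATORS = { "logging" => Logging, "timing" => Timing }

# Build an anonymous subclass of Query with the named modules mixed in.
# The first name listed ends up outermost in the lookup chain.
def build_query_class(names)
  mods = names.map { |name| DECORATORS.fetch(name) }
  Class.new(Query) { include(*mods) unless mods.empty? }
end

p build_query_class(%w(logging timing)).new.execute
p build_query_class([]).new.execute
```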
Finally, the guys there seemed to generally agree with my central thesis: that thinking about problems in terms of patterns originally devised for Java can leave a better, more implementation-appropriate solution sitting on the table, even when the better solution can be thought of in terms of the older pattern (with some contortions).
The Blind Men and the Elephant: A Story of Noobs
February 9th, 2010
If you will indulge me, I’d like to paraphrase a familiar tale:
Once upon a time, deep in the forest, there was a tribe of elephant curators. The elders of this tribe kept sophisticated, detailed notes about the proper care and feeding of elephants, and the villagers tended to follow along.
Eventually, they dedicated a large section of the local library to books and articles on the care and feeding of elephants.
One day, a group of blind nomads appeared in the village. Each of the blind men went to greet the villagers, and were met with welcomes. Wanting to be helpful, they walked over to one of the elephants and tried to learn about it.
The first man, who stood next to the elephant’s tail said, “I feel a snake”.
The second, who stood next to the elephant’s leg said, “I feel a tree trunk”.
And so on.
One of the group of nomads, who thought he felt a snake, went to the elders of the village and asked, “I would like to help. How can I feed this creature?”
The elder replied: “Sir, if you can’t be bothered to search the library for information on the care and feeding of elephants, surely you are wasting our time”.
Years passed and after a grueling series of trials, the blind men became integrated into the culture, becoming some of the most successful at caring for the elephants.
Eventually, another group of blind nomads appeared. The entire village, including the original group, proceeded to berate the new travelers. “We’ve spent quite a bit of time putting together a section of the library about how to care for these animals. You are wasting our time”.
Of course, the travelers did not know the creatures were elephants, and so they entered the library, searching for books on feeding snakes and caring for trees.
Eventually, an elderly blind man of the original group stood up and said: “Have we forgotten that we, too, started in this confused state? We should help these travelers and perhaps they will become as wise and helpful as we became”.
To many participants in open source communities, this is a familiar tale. When a developer first comes across an open source project, either to use it in a project or to help, he is like a blind man feeling an elephant.
It’s easy to spit out “lmgtfy.com” or RTFM, but in truth, these beginners barely know where to look. All too often, we (open source leaders) assume that if someone couldn’t figure out the right search term on Google, they can never become a viable community member.
When I first started working on Rails, I distinctly remember not knowing what the request method in Rails controllers was. To some degree, this could be attributed to its exclusion in api.rubyonrails.org, but some judicious Googling turned up the ActionController::Request class. Writing this post years later, the request method still does not reside in the API docs, but I found the Request documentation in seconds.
The problem is that a new developer simply has no conceptual model for the problem at all. In most cases, the “noob” can stare at “the f***ing manual” all day and simply fail to find something staring him in the face. Importantly, this does not reflect a failing on the part of the new developer. Virtually everyone I know who worked their way from noob to senior Rails developer started out feeling around the elephant.
As open source leaders, if we are interested in growing our communities, we should treat new developers as confused people with real potential. That’s not to say that sinking dozens of hours down a black hole is a good use of time. On the other hand, the mismatch between how we think about problems once we become experienced and the way we feel around like a blind man when getting started makes the experience of getting started with an open source project far more painful than it needs to be.
Using Bundler in Real Life
February 9th, 2010
A lot of people have asked me what the recommended workflows for bundler are. Turns out, they’re quite simple.
Let’s step through a few use-cases.
You Get a Repo for the First Time
You’ve just checked out a git (or other) repository for an application that uses bundler. Regardless of any other features of bundler in use, just run:
bundle install
This will resolve all dependencies and install the ones that aren’t already installed in system gems or in your system’s bundler cache.
You Update a Repo Using Bundler
If you update a repository using bundler, and it has updated its dependencies in the Gemfile, regardless of any other features in use, just run:
bundle install
As above, this will resolve all dependencies and install any gems that are not already installed.
You have created a new Rails application
If you’ve created a new Rails application, go inside it and run:
bundle install
This will make sure that Rails’ dependencies (such as SQLite3) are available in system gems or your system’s bundler cache. In most cases, this will not need to install anything.
To check whether your system already satisfies the application’s dependencies, run:
bundle check
As you work in the application, you may wish to add dependencies. To do so, update the Gemfile, and run:
bundle install
You are ready to share your new Rails application
Now that you have a working Rails application and wish to share it, you might wish to ensure that your collaborators get the same versions of the gems as you have. For instance, if webrat specified “nokogiri >= 1.4” as a dependency, you might want to ensure that an update to nokogiri does not change the actual gems that bundle install will install.
To achieve this, run:
bundle lock
This will create a new file called Gemfile.lock in your root directory that contains the dependencies that you specified, as well as the fully resolved dependency graph. Check that file into source control.
When your collaborators receive the repository and run bundle install, bundler will use the resolved dependencies in Gemfile.lock.
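To see why an open-ended requirement such as “nokogiri >= 1.4” can drift over time, here is a minimal Ruby sketch using RubyGems’ own Gem::Requirement and Gem::Version classes. The version numbers are illustrative only, not a list of real nokogiri releases, and the exact requirement at the end merely models the effect of a lock; it is not the Gemfile.lock format.

```ruby
require "rubygems"

# The open-ended requirement from the webrat example above.
requirement = Gem::Requirement.new(">= 1.4")

# Illustrative version numbers only (an assumption for this sketch).
candidates = %w[1.3.3 1.4.0 1.4.1 1.5.0].map { |v| Gem::Version.new(v) }

# Every version that satisfies the requirement is a legal resolution,
# so the version chosen can change whenever a newer release appears.
acceptable = candidates.select { |v| requirement.satisfied_by?(v) }
puts acceptable.map(&:to_s).join(", ")   # prints "1.4.0, 1.4.1, 1.5.0"

# An exact requirement admits a single version, which is the effect
# a lock has for each gem in the resolved graph.
pinned = Gem::Requirement.new("= 1.4.1")
locked = candidates.select { |v| pinned.satisfied_by?(v) }
puts locked.map(&:to_s).join(", ")       # prints "1.4.1"
```

Without a lock, two collaborators running bundle install on different days could each end up with a different entry from the first list.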
You have a locked application, and wish to add a new dependency
If you add a new dependency to your Gemfile in a locked application, Bundler will give you an error if you try to perform any operations.
You will want to run bundle unlock to remove the lock, then bundle install to ensure that the new dependencies are installed on your system, and finally bundle lock again to relock your application to the new dependencies.
We will add a command in a near-future version to perform all these steps for you (something like bundle install --relock).
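Until such a command exists, the three steps above can be scripted. The following is a minimal Ruby sketch, not part of Bundler: the names relock, RELOCK_STEPS, and runner are hypothetical, and the only Bundler commands it assumes are the three named in this section.

```ruby
# Hypothetical helper, not a Bundler API. It runs the three commands
# described above, in order, and stops at the first one that fails.
RELOCK_STEPS = [
  %w[bundle unlock],   # remove the existing lock
  %w[bundle install],  # resolve and install the new dependencies
  %w[bundle lock]      # relock to the newly resolved graph
].freeze

# `runner` is injectable so the sequence can be exercised without
# invoking Bundler; by default each step is passed to Kernel#system.
def relock(runner: ->(cmd) { system(*cmd) })
  RELOCK_STEPS.each do |cmd|
    ok = runner.call(cmd)
    raise "#{cmd.join(' ')} failed" unless ok
  end
  true
end

# From the root of a locked application, call: relock
```

Stopping at the first failure matters here: if bundle install cannot resolve the new dependency, the script does not go on to relock a half-updated application.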
You want a self-contained application
In many cases, it is desirable to be able to have a self-contained application that you can share with others which contains all of the required gems.
In addition to a general desire to remove a dependency on Gemcutter, you might have dependencies on gems that are not on a publicly accessible gem repository.
To collect up all gems and place them into your app, run:
bundle pack
When running bundle install in the future, Bundler will use packed gems, if available, in preference to gems available in other sources.
Conclusion
I hope these workflows have clarified the intent of Bundler 0.9 (and 1.0). During our work on earlier versions, the lack of these workflows came up again and again as a source of frustration. This was the primary reason for the big changes in 0.9, so I hope you find them useful.
