Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; he spends his daytime hours at the startup he founded, Tilde Inc. Yehuda is co-author of the best-selling jQuery in Action and Rails 3 in Action. He spends most of his time hacking on open source—his main projects, like Thor, Handlebars and Janus—or traveling the world doing evangelism work. He can be found on Twitter as @wycats and on GitHub.

What’s New in Bundler 1.0.0.rc.1

Taking into consideration the huge amount of feedback we received during the Bundler 0.9 series, we streamlined Bundler 1.0 significantly, and made it fit user expectations better.

Whether or not you have used Bundler before, the easiest way to get up to speed is to read the following notes and visit http://gembundler.com/v1.0 for more in-depth information.

(Note that gembundler.com is still being updated for the 1.0 changes; it should be ready by the final release.)

Starting a new project with bundler

When you generate a new Rails application, Rails will create a Gemfile for you, which has everything needed to boot your application.

Otherwise, you can use bundle init to create a stub Gemfile, ready to go.
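For reference, a Gemfile is just a small Ruby DSL. A minimal one might look something like this (the gems listed here are only illustrative examples, not what bundle init actually writes):

source "http://rubygems.org"

gem "rails", "3.0.0.rc"
gem "nokogiri"

group :test do
  gem "rspec"
end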

First, run bundle install to make sure that you have all the needed dependencies. If you already have them, this step will finish almost instantly.

Bundler will automatically create a file called Gemfile.lock. This file is a snapshot of your application’s dependencies at that time.

You SHOULD check both files into version control. This will ensure that all team members (as well as your production server) are working with identical dependencies.

Checking out an existing project using bundler

After checking out an existing project using bundler, check to make sure that the Gemfile.lock snapshot is checked in. If it is not, you may end up using different dependencies than the person who last used and tested the project.

Next, run bundle install. This command will check whether you already have all the required dependencies on your system. If you do not, it will fetch the dependencies and install them.

Updating dependencies

If you modify the dependencies in your Gemfile, first try to run bundle install, as usual. Bundler will attempt to update only the gems you have modified, leaving the rest of the snapshot intact.

This may not be possible if the changes conflict with other gems in the snapshot (or their dependencies). If this happens, Bundler will instruct you to run bundle update. This will re-resolve all dependencies from scratch.

The bundle update command will update the versions of all gems in your Gemfile, while bundle install will only update the gems that have changed since the last bundle install.

After modifying dependencies, make sure to check your Gemfile and Gemfile.lock into version control.

By default, gems are installed to your system

If you follow the instructions above, Bundler will install the gems into the same place as gem install.

If necessary, Bundler will prompt you for your sudo password.

You can see the location of a particular gem with bundle show [GEM_NAME]. You can open it in your default editor with bundle open [GEM_NAME].

Bundler will still isolate your application from other gems. Installing your gems into a shared location allows multiple projects to avoid downloading the same gem over and over.

You might want to install your bundled gems to a different location, such as a directory in the application itself. This will ensure that each application has its own copies of the gems, and provides an extra level of isolation.

To do this, run the install command with bundle install /path/to/location. You can use a relative path as well: bundle install vendor.

In RC1, this command will still use gems from the system if they are already there (the path only affects newly installed gems). To ensure that all of your gems are located in the path you specified, run bundle install path --disable-shared-gems.

In Bundler 1.0 final, bundle install path will default to --disable-shared-gems.

Deployment

When deploying, we strongly recommend that you isolate your gems into a local path (using bundle install path --disable-shared-gems). The final version of bundler will come with a --production flag, encapsulating all of the best deployment practices.

For now, please follow these recommendations (described using Capistrano concepts; a sketch of the equivalent Capistrano tasks follows the list):

  • Make sure to always check in a Gemfile.lock that is up to date. This means that after modifying your Gemfile, you should ALWAYS run bundle install.
  • Symlink the vendor/bundle directory into the application’s shared location (symlink release_path/current/vendor/bundle to release_path/shared/bundled_gems)
  • Install your bundle by running bundle install vendor/bundle --disable-shared-gems
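Here is a rough Capistrano 2 sketch of those recommendations. The task name and hook are illustrative, not something Bundler provides, and the paths assume the standard shared_path/release_path layout:

namespace :bundler do
  task :install, :roles => :app do
    # make sure the shared gem directory and the release's vendor directory exist
    run "mkdir -p #{shared_path}/bundled_gems #{release_path}/vendor"
    # point vendor/bundle in the new release at the shared bundled_gems directory
    run "ln -nfs #{shared_path}/bundled_gems #{release_path}/vendor/bundle"
    # install the bundle into that path, ignoring gems installed in the system
    run "cd #{release_path} && bundle install vendor/bundle --disable-shared-gems"
  end
end

after "deploy:update_code", "bundler:install"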

Encodings, Unabridged

Last week, I wrote somewhat extensively about the general problem of encodings in Ruby 1.9.

For those who didn’t read that post, let me start with a quick refresher.

What’s an Encoding?

An encoding specifies how to take a list of characters (such as “hello”) and persist them onto disk as a sequence of bytes. You’re probably familiar with the ASCII encoding, which specifies how to store English characters in a single byte each (taking up the space in 0-127, leaving 128-255 empty).

Another common encoding is ISO-8859-1 (or Latin-1), which uses ASCII’s designations for the first 128 characters (0-127), and assigns the numbers 128-255 to Latin characters (such as “é” or “ü”).

Obviously, 256 characters isn’t enough for all languages, so there are a number of ISO-8859-* encodings, each of which assigns the numbers 128-255 for its own purposes (for instance, ISO-8859-5 uses that space for the Cyrillic characters used in Russian).

Unfortunately, the raw bytes themselves do not contain an “encoding specifier” of any kind, and the exact same bytes can mean something in Western characters, Russian, Japanese, or any other language, depending on the character set that was originally used to store the characters as bytes.

As a general rule, protocols (such as HTTP) provide a mechanism for specifying the encoding. For instance, in HTTP, you can specify the encoding in the Content-Type header, like this: Content-Type: text/html; charset=UTF-8. However, this is not a requirement, so it is possible to receive some content over HTTP and not know its encoding.

This brings us to an important point: Strings have no inherent encoding. By default, Strings are just BINARY data. Since the data could be encoded using any number of different incompatible encodings, simply combining BINARY data from different sources could easily result in a corrupted String.

When you see a diamond with a question mark inside on the web, or gibberish characters (like a weird A with a 3/4 symbol), you’re seeing a mistaken attempt to combine binary data encoded differently into a single String.

What’s Unicode?

Unicode is an effort to map every known character (to a point) to a number. Unicode does not define an encoding (how to represent those numbers in bytes). It simply provides a unique number for each known character.

Unicode tries to unify characters from different encodings that represent the same character. For instance, the A in ASCII, the A in ISO-8859-1, and the A in the Japanese encoding SHIFT-JIS all map to the same Unicode character.

Unicode also takes pains to ensure round-tripping between existing encodings and Unicode. Theoretically, this should mean that it’s possible to take some data encoded using any known encoding, use Unicode tables to map the characters to Unicode numbers, and then use the reverse versions of those tables to map the Unicode numbers back into the original encoding.

Unfortunately, both of these characteristics cause some problems for Asian character sets. First, there have been some historical errors in the process of unification, which requires the Unicode committee to properly identify which characters in different existing Chinese, Japanese and Korean (CJK) character sets actually represent the same character.

In Japanese, personal names use slight variants of the non-personal-name version of the same character. This would be equivalent to the difference (in English) between “Cate” and “Kate”. Many of these characters (sometimes called Gaiji) cannot be represented in Unicode at all.

Second, there are still hundreds of characters in some Japanese encodings (such as Microsoft’s variant of SHIFT-JIS, called CP932 or Windows-31J) that simply do not round-trip through Unicode.

To make matters worse, Java and MySQL use a different mapping table than the standard Unicode mapping tables (making “This costs ¥5” come out in Unicode as “This costs \5”). The standard Unicode mapping tables handle this particular case correctly (but cannot fully solve the round-tripping problem), but these quirks only serve to further raise doubts about Unicode in the minds of Japanese developers.

For a lot more information on these issues, check out the XML Japanese Profile document created by the W3C to explain how to deal with some of these problems in XML documents.

In the Western world, the encodings in common use do not have these problems. For instance, it is trivial to take a String encoded as ISO-8859-1, convert it into Unicode, and then convert it back into ISO-8859-1 when needed.

This means that for most of the Western world, it is a good idea to use Unicode as the “one true character set” inside programming languages, so that programmers can treat Strings as simple sequences of Unicode code points (several code points may add up to a single character, such as the ¨ code point, which can be applied to other code points to form characters like ü). In the Asian world, while this can sometimes be a good strategy, it is often significantly simpler to use the original encoding and handle merging Strings in different encodings together manually (when an appropriate decision about the tradeoffs around fidelity can be made).

Before I continue, I would note that the above is a vast simplification of the issues surrounding Unicode and Japanese. I believe it to be a fair characterization, but die-hard Unicode folks and die-hard anti-Unicode folks would possibly disagree with some elements of it. If I have made any factual errors, please let me know.

A Digression: UTF-*

Until now, I have talked only about “Unicode”, which simply maps characters to numbers (code points). Because Unicode uses the counting numbers, it can accommodate as many code points as it needs.

However, it is not an encoding. In other words, it does not specify how to store the numbers on disk. The most obvious solution would be to use a few bytes for each character. This is the solution that UTF-32 uses, specifying that each Unicode character be stored as four bytes (accommodating over 4 billion characters). While this has the advantage of being simple, it also uses huge amounts of memory and disk space compared to the original encodings (like ASCII, ISO-8859-1 and SHIFT-JIS) that it is replacing.

On the other side of the spectrum is UTF-8. UTF-8 uses a single byte for English characters, using the exact same mapping as ASCII. This means that a UTF-8 string that contains only characters found in ASCII will have the identical bytes as a String stored in ASCII.

It then uses bytes with the high bit set (values 128-255) to begin multi-byte sequences that specify characters outside ASCII. This means that Strings using Western characters use relatively few bytes (often comparable with the original encodings Unicode replaces), because those characters sit in the low area of the Unicode space, while Strings in Asian languages use more bytes than in their native encodings, because their characters have larger Unicode numbers.

This is another reason some Asian developers resent Unicode; while it does not significantly increase the memory requirements for most Western documents, it does so for Asian documents.

For the curious, UTF-16 uses 16 bits for the most common characters (the BMP, or Basic Multilingual Plane), and 32 bits to represent characters from planes 1 through 16. This means that UTF-8 is most efficient for Strings containing mostly ASCII characters. UTF-8 and UTF-16 are approximately equivalent for Strings containing mostly characters outside ASCII but inside the BMP. For Strings containing mostly characters outside the BMP, UTF-8, UTF-16, and UTF-32 are approximately equivalent. Note that when I say “approximately equivalent”, I’m not saying that they’re exactly the same, just that the differences are small for large Strings.

Of the Unicode encodings, only UTF-8 is compatible with ASCII. By this I mean that if a String is valid ASCII, it is also valid UTF-8. UTF-16 and UTF-32 encode ASCII characters using two or four bytes, respectively.
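To make the size differences concrete, here is a small Ruby 1.9 illustration (assuming a UTF-8 source file; the strings are arbitrary examples):

# encoding: UTF-8

"hello".bytesize                      #=> 5  (the same bytes as ASCII)
"hello".encode("UTF-16BE").bytesize   #=> 10 (two bytes per character)
"hello".encode("UTF-32BE").bytesize   #=> 20 (four bytes per character)

"ü".bytesize                          #=> 2  (a two-byte sequence in UTF-8)
"ü".bytes.to_a                        #=> [195, 188]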

What Ruby 1.9 Does

Accepting that there are two very different ways of handling this problem, Ruby 1.9 has a String API that is somewhat different from most other languages, mostly influenced by the issues I described above in dealing with Japanese in Unicode.

First, Ruby does not mandate that all Strings be stored in a single internal encoding. Unfortunately, this is not possible to do reliably with common Japanese encodings (CP932, aka Windows-31J, has 300 characters that cannot round-trip through Unicode without corrupting data). It is possible that the Unicode committee will some day fully solve these problems to everyone’s satisfaction, but that day has not yet come.

Instead, Ruby 1.9 stores Strings as the original sequence of bytes, but allows a String to be tagged with its encoding. It then provides a rich API for converting Strings from one encoding to another.

string = "hello"                     # by default, string is encoded as "ASCII"
string.force_encoding("ISO-8859-1")  # this simply retags the String as ISO-8859-1
                                     # this will work since ISO-8859-1
                                     # is a superset of ASCII.
 
string.encode("UTF-8")               # this will ask Ruby to convert the bytes in
                                     # the current encoding to bytes in
                                     # the target encoding, and retag it with the
                                     # new encoding
                                     #
                                     # this is usually a lossless conversion, but
                                     # can sometimes be lossy

A more advanced example:

# encoding: UTF-8
 
# first, tell Ruby that our editor saved the file using the UTF-8 encoding.
# TextMate does this by default. If you lie to Ruby, very strange things
# will happen
 
utf8 = "hellö"
iso_8859_1 = "hellö".encode("ISO-8859-1")
 
# Because we specified an encoding for this file, Strings in here default
# to UTF-8 rather than ASCII. Note that if you didn't specify an encoding,
# characters outside of ASCII would be rejected by the parser.
 
utf8 << iso_8859_1
 
# This produces an error, because Ruby does not automatically try to
# transcode Strings from one encoding into another. In practice, this
# should rarely, if ever happen in applications that can rely on
# Unicode; you'll see why shortly
 
utf8 << iso_8859_1.encode("UTF-8")
 
# This works fine, because you first made the two encodings the same

The problems people are really having

The problem of dealing with ISO-8859-1 encoded text and UTF-8 text in the same Ruby program is real, and we’ll see soon how it is handled in Ruby. However, the problems people have been having are not of this variety.

If you examine virtually all of the bug reports involving incompatible encoding exceptions, you will find that one of the two encodings is ASCII-8BIT. In Ruby, ASCII-8BIT is the name of the BINARY encoding.
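You can see the equivalence directly (a quick irb check, nothing more; the bytes are arbitrary):

Encoding.find("BINARY")                        #=> #<Encoding:ASCII-8BIT>
"\xC3\xBC".force_encoding("BINARY").encoding   #=> #<Encoding:ASCII-8BIT>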

So what is happening is that a library somewhere in the stack is handing back raw bytes rather than properly tagged Strings. For a long time, the likely perpetrators here were database drivers, which had not been updated to properly tag the data they were getting back from the database.

There are several other potential sources of binary data, which we will discuss in due course. However, it’s important to note that a BINARY encoded String in Ruby 1.9 is the equivalent of a byte[] in Java. It is a type that cannot be reasonably concatenated onto an encoded String. In fact, it is best to think of BINARY encoded Strings as a different class with many methods in common.

In practice, as Ruby libraries continue to be updated, you should rarely ever see BINARY data inside of your application. If you do, it is because the library that handed it to you genuinely does not know the encoding, and if you want to combine it with non-BINARY String, you will need to convert it into an encoded String manually (using force_encoding).

Why this is, in practice, a rare problem

The problem of incompatible encodings is likely to happen in Western applications only when combining ISO-8859-* data with Unicode data.

In practice, most sources of data, without any further work, are already encoded as UTF-8. For instance, the default Rails MySQL connection specifies a UTF-8 client encoding, so even an ISO-8859-1 database will return UTF-8 data.

Many other data sources, such as MongoDB, only support UTF-8 data internally, so their Ruby 1.9-compatible drivers already return UTF-8 encoded data.

Your text editor (TextMate) likely defaults to saving your templates as UTF-8, so the characters in the templates are already encoded in UTF-8.

This is why Ruby 1.8 had the illusion of working. With the exception of some (unfortunately somewhat common) edge-cases, most of your data is already encoded in UTF-8, so simply treating it as BINARY data, and smashing it all together (as Ruby 1.8 does) works fairly reliably.

The only reason why this came tumbling down in Ruby 1.9 is that drivers that should have returned Strings tagged with UTF-8 were returning Strings tagged with BINARY, which Ruby rightly refused to concatenate with UTF-8 Strings. In other words, the vast majority of encoding problems to date are the result of buggy Ruby libraries.

Those libraries, almost entirely, have now been updated. This means that if you use UTF-8 data sources, which you were likely doing by accident already, everything will continue to work as it did in Ruby 1.8.

Digression: force_encoding

When people encounter this problem for the first time, they are often instructed by otherwise well-meaning people to simply call force_encoding("UTF-8") on the offending String.

This will work reliably if the original data is stored in UTF-8, which is often the case for the person who made the original suggestion. However, it will mysteriously fail to work (resulting in “�” characters appearing) if the original data is encoded in ISO-8859-1. This can cause major confusion, because some people swear up and down that it’s working and others can clearly see that it’s not.
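Here is a quick sketch of why the advice only sometimes works (the byte values are illustrative; 0xFC is “ü” in ISO-8859-1):

# encoding: UTF-8

latin1_bytes = "\xFCmlaut".force_encoding("BINARY")        # really ISO-8859-1 data

latin1_bytes.force_encoding("UTF-8").valid_encoding?        #=> false (garbage output later)
latin1_bytes.force_encoding("ISO-8859-1").encode("UTF-8")   #=> "ümlaut"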

Additionally, since ISO-8859-1 and UTF-8 are both compatible with ASCII, if the characters being force_encoded are all ASCII characters, everything will appear to work until a non-ASCII character is entered one day. This further complicates the efforts of community members who are not fluent in the general issues surrounding encodings to identify and help resolve these problems.

I’d note that this particular issue (BINARY data entering the system that is actually ISO-8859-1) would cause similar problems in Java and Python, which would either silently assume Unicode, or present a byte[], forcing you to force_encoding it into something like UTF-8.

Where it doesn’t work

Unfortunately, there are a few sources of data that are common in Rails applications that are not already encoded in UTF-8.

In order to identify these cases, we will need to identify the boundary between a Rails application and the outside world. Let’s look at a common web request.

First, the user goes to a URL. That URL is probably encoded in ASCII, but can also contain Unicode characters. The encoding for this part of the request (the URI) is not provided by the browser, but it appears safe to assume that it’s UTF-8 (which is a superset of ASCII). I have tested in various versions of Firefox, Safari, Chrome, and Internet Explorer and it seems reliable. I personally thank the Encoding gods for that.

Next, the request goes through the Rack stack, and makes its way into the Rails application. If all has gone well, the Rails application will see the parameters and other bits of the URI exposed through the request object encoded as UTF-8. At the moment (and after this post, it will probably be true for mere days), Rack actually returns BINARY Strings for these elements.

At the moment, Ruby allows BINARY Strings that contain only ASCII characters to be concatenated with any ASCII-compatible encoding (such as UTF-8). I believe this is a mistake, because it will make scenarios such as the current state of Rack work in all tested cases, and then mysteriously cause errors when the user enters a UTF-8 character in the URI. I have already reported this issue and it should be fixed in Ruby. Thankfully, this issue only relates to libraries that are mistakenly returning BINARY data, so we can cut this off at the pass by fixing Rack to return UTF-8 data here.

Next, that data will be used to make a request of the data store. Because we are likely using a UTF-8 encoded data-store, once the Rack issue is resolved, the request will go through without incident. If we were using an ISO-8859-1 data store (possible, but unlikely), this could pose issues. For instance, we could be looking up a story by a non-ASCII identifier that the database would not find because the request String is encoded in UTF-8.

Next, the data store returns the contents. Again, you are likely using a UTF-8 data store (things like CouchDB and MongoDB return Strings as UTF-8). Your template is likely encoded in UTF-8 (and Rails actually assumes that templates without any encoding specified are UTF-8), so the String from your database should merge with your template without incident.

However, there is another potential problem here. If your data source does not return UTF-8 data, Ruby will refuse to concatenate the Strings, giving you an incompatible encoding error (which will report UTF-8 as incompatible with, for instance, ISO-8859-1). In all of the encoding-related bug reports I’ve seen, I’ve only ever seen reports of BINARY data causing this problem, again, likely because your data source actually is UTF-8.

Next, you send the data back to the browser. Rails defaults to specifying a UTF-8 character set, so the browser should correctly interpret the String, if it got this far. Note that in Ruby 1.8, if you had received data as ISO-8859-1 and stored it in an ISO-8859-1 database, your users would now see “�”, because the browser cannot identify a valid Unicode character for the bytes that came back from the database.

In Ruby 1.9, in this scenario (but not in the much more common scenario where the database returns content as UTF-8, which is common because Rails specifies a UTF-8 client encoding in the default database.yml), you would receive an error rather than send corrupted data to the client.

If your page included a form, we now have another potential avenue for problems. This is especially insidious because browsers allow the user to change the “document’s character set”, and users routinely fiddle with that setting to “fix” pages that are actually encoded in ISO-8859-1, but are specifying UTF-8 as the character set.

Unfortunately, while browsers generally use the document’s character set for POSTed form data, this is both not reliable and possible for the user to manually change. To add insult to injury, the browsers with the largest problems in this area do not send a Content-Type header with the correct charset to let the server know the character set of the POSTed data.

Newer standards specify an attribute accept-charset that page authors can add to forms to tell the client what character set to send the POSTed data as, but again, the browsers with the largest issues here are also the ones with issues in implementing accept-charset properly.

The most common scenario where you can see this issue is when the user pastes in content from Microsoft Word, and it makes it into the database and back out again as gibberish.

After a lot of research, I have discovered several hacks that, together, should completely solve this problem. I am still testing the solution, but I believe we can handle it entirely within Rails. By Rails 3.0 final, Rails applications should be able to reliably assume that POSTed form data comes in as UTF-8.

Moving that data to the server presents another potential encoding problem, but again, if we can rely on the database to be using UTF-8 as the client (or internal) encoding, and the solution for POSTed form data pans out, the data should smoothly get into the database as UTF-8.

But what if we still do have non-UTF-8 data?

Even with all of this, it is still possible that some non-BINARY data sneaks over the boundary and into our Rails application from a non-UTF-8 source.

For this scenario, Ruby 1.9 provides an option called Encoding.default_internal, which allows the user to specify a preferred encoding for Strings. Ruby itself and Ruby’s standard libraries respect this option, so if, for instance, you open an IO encoded in ISO-8859-1, Ruby will hand the data to your program transcoded to the preferred encoding.
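A minimal sketch of how this behaves (the file name is hypothetical, and the file is assumed to be stored on disk as ISO-8859-1):

Encoding.default_internal = Encoding::UTF_8

# the IO system knows the external encoding is ISO-8859-1 and transcodes
# the contents to the preferred (internal) encoding on read
data = File.open("legacy.txt", "r:ISO-8859-1") { |f| f.read }
data.encoding   #=> #<Encoding:UTF-8>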

Libraries, such as database drivers, should also support this option, which means that even if the database hands back Strings in some other encoding, the driver should transparently convert those Strings to the preferred encoding before handing them to the program.

Rails can take advantage of this by setting default_internal to UTF-8, which will then ensure that Strings from non-UTF-8 sources still make their way into Rails encoded as UTF-8.

Since I started asking libraries to honor this option a week ago, do_sqlite, do_mysql, do_postgres, Nokogiri, Psych (the new YAML parser in Ruby 1.9), sqlite3, and the MongoDB driver have all added support for this option. The fix should be applied to the MySQL driver shortly, and I am still waiting on a response from the pg driver maintainer.

In short, by the time 1.9.2-final ships, I don’t see any reason why all libraries in use shouldn’t honor this setting.

I’d also add that MongoDB and Nokogiri already return only UTF-8 data, so supporting this option was primarily a matter of correctness. If a driver already deals entirely in UTF-8, it will work transparently with Rails because Rails deals only in UTF-8.

That said, we plan to robustly support scenarios where UTF-8 cannot be used in this way (because encodings are in use that cannot be transparently transcoded at the boundary without data loss), so proper support for default_internal will be essential in the long term.

TL;DR

The vast majority of encoding bugs to date have resulted from outdated drivers that returned BINARY data instead of Strings with proper encoding tags.

The pipeline that brings Strings in and out of Rails is reasonably well-understood, and simply by using UTF-8 libraries for each part of that pipeline, Ruby 1.9 will transparently work.

If you accidentally use non-UTF-8 sources in the pipeline, Ruby 1.9 will throw an error, an improvement over the Ruby 1.8 behavior of simply sending corrupted data to the client.

For this scenario, Ruby 1.9 allows you to specify a preferred encoding, which instructs the non-UTF-8 source to convert Strings in other encodings to UTF-8.

By default, Rails will set this option to UTF-8, which means that you should not see ISO-8859-1 Strings in your Rails application.

By the time Ruby 1.9.2 is released in a few months, this should be a reality, and your experience dealing with Strings in Ruby 1.9 should be superior to the 1.8 experience: things should generally just work, and libraries will have properly considered encoding issues. This means that serving misencoded data should be basically impossible.

TL;DR the TL;DR

When using Rails 3.0 with Ruby 1.9.2-final, you will generally not have to care about encodings.

Postscript

With all that said, there can be scenarios where you receive BINARY data from a source. This can happen in any language that handles encodings more transparently than Ruby, such as Java and Python.

This is because it is possible for a library to receive BINARY data and not have the necessary metadata to tag it with an encoding.

In this case, you will either need to determine the encoding yourself or treat it as raw BINARY data, and not a String. The reason this scenario is rare is that if there is a way for you to determine the encoding (such as by looking at metadata provided along with the bytes), the original library can usually do the same.

If you get into a scenario where you know the encoding, but it is not available to the machine, you will want to do something like this:

data = API.get("data")
 
data.encoding #=> ASCII-8BIT # alias for BINARY
 
data.force_encoding("Shift_JIS").encode!
 
# This first tags the data with the encoding that
# you know it is, and then re-encodes it to
# the default_internal encoding, if one was
# specified

My Common Git Workflow

A recent post that was highly ranked on Hacker News complained about common git workflows causing the author serious pain. While I won’t get into the merits of his user experience complaints, I do want to talk about his specific use-case and how I personally work with it in git.

Best I can tell, Mike Taylor (the guy in the post) either tried to figure out a standard git workflow on his own, or he followed poor instructions that tried to bootstrap someone from an svn background while intentionally leaving out important information. In any event, I’ll step through my personal workflow for his scenario, contrasting with subversion as I go.

Cloning the Repository

The very first step when working with a repository is to clone it. In subversion, this is accomplished via svn checkout svn://url/to/repo/trunk. This retrieves the most recent revision of the trunk branch of the repository.

In git, this is accomplished via git clone git://url/to/repo (the http protocol is also possible). This retrieves the entire repository, including other branches and tags.

Making the Change

In both git and subversion, you make the change using a normal text editor.

After Making the Change

In git, you make a local commit, marking the difference between the most recent pulled version (master) and the changes you made. In subversion, the normal workflow does not involve making a local commit, but apparently some people make manual diffs in order to have a local copy of the changes before updating from the remote. Here’s an example comment from the Hacker News post:

I’ll tell you what happens when I use svn and there’s been an upstream change: I never update my local tree with local modifications. Instead, I extract all my local changes into a diff, then I update my local tree, and then I merge my diff back into the updated tree and commit.

When I need three-way merging, which isn’t often – usually patch can resync simple things like line offsets – it’s handled by a file comparison tool. I have a simple script which handles this

My personal process for making the commit in git almost always involves the gitx GUI, which lets me see the changes for each individual file, select the files (or chunks in the files) to commit, and then commit the whole thing. I sometimes break up the changes into several granular commits, if appropriate.

Updating from the remote

Now that we have our local changes, the next step is to update from the remote. In subversion, you would run svn up. Here, subversion will apply a merge strategy to attempt to merge the remote changes with the local ones that you made. If a merge was unsuccessful, subversion will tell you that a conflict has occurred. If you did not manually save off a diff file, there is no way to get back to the status from before you made the change.

In git, you would run git pull. By default, git applies the “recursive” strategy, which tries to merge your current files with the remote files at the most recent revision. As with subversion, this can result in a conflict. You can also pass the --rebase flag to pull, which is how I usually work. This tells git to stash away your commits, pull the remote changes, and then reapply your changes on top one at a time.

If you use --rebase, you may get a conflict for each of your local commits, which is usually easier to handle than a bunch of conflicts all at once.

I definitely recommend using --rebase, which also provides instructions for dealing with conflicts as they arise.

In either case, in my experience, git’s merging capabilities are more advanced than subversion’s. This will result in many fewer cases where conflicts occur.

Resolving Conflicts

From here on, I am assuming you followed my advice and used git pull --rebase.

If a conflict has occurred, you will find that if you run git status, all of the non-conflicting files are already listed as “staged”, while the conflicting files are listed outside the staging area. This means that the non-conflicting files are already considered “added” to the current commit.

To resolve the conflicts, fix up the files listed outside the staging area and git add them. Again, I normally use gitx to move the resolved files into the staging area.

Once you have resolved the conflict, run git rebase --continue. This tells git to use the fixed up changes you just made instead of the original commit it was trying to put on top of the changes you got from the remote.

In subversion, if you got a conflict, subversion will create three files for you: file.mine, file.rOLD, and file.rNEW. You are responsible for fixing up the conflicts and getting back a working file. Once you are done, you run svn resolved.

NOTE: If you had not used git pull --rebase, but instead did a raw git pull, you would fix up the files, add the files using git add or gitx, and then run git commit to seal the deal.

Yikes! Something went wrong!

In git, if something goes wrong, you just run git reset --hard, which will bring you back to your last local commit.

In subversion, getting back to where you started is not always possible unless you manually stored off a diff before you began.

Pushing

Now that you’re in sync with the remote server, you push your changes. In git, you run git push. In subversion, you run svn commit.

One Glossed-Over Difference

Subversion allows you to commit changes even if you haven’t run svn up and there have been changes to the remote, as long as there are no conflicts between your local files and the remote files.

Git never allows you to push changes to the remote if there have been remote changes. I personally prefer the git behavior, but I could see why someone might prefer the subversion behavior. However, I glossed over this difference because every subversion reference I’ve found advises running svn up before a commit, and I personally always did that in my years using subversion.

Comparison

Here’s a workflow comparison between git and subversion:

Clone a repository
  git: git clone git://github.com/rails/rails.git
  svn: svn checkout http://dev.rubyonrails.org/svn/rails/trunk

Preparing changes
  git: git commit (using gitx)
  svn: nothing, or create a manual diff

Update from the remote
  git: git pull --rebase
  svn: svn up

Resolving conflicts
  git: git add, then git rebase --continue
  svn: svn resolved

Resolving conflicts without --rebase
  git: git add, then git commit
  svn: N/A

Yikes! Rolling back
  git: git reset --hard
  svn: svn up -rOLD, then apply the diff (only if you manually made a diff first)

Pushing
  git: git push
  svn: svn commit

Note that I am not attempting to provide an exhaustive guide to git here; there are many more git features that are quite useful. Additionally, I personally do a lot of local branching, and prefer to be able to think about git in terms of cheap branches, but the original poster explicitly said that he’d rather not. As a result, I didn’t address that here.

I also don’t believe that thinking of git in terms of subversion is a good idea. That said, the point of this post (and the point of the original poster) is that there are a set of high-level version control operations that you’d expect git to be able to handle in simple cases without a lot of fuss.

The How and Why of Bundler Groups

Since version 0.9, Bundler has had a feature called “groups”. The purpose of this feature is to allow you to specify groups of dependencies which may be used in certain situations, but not in others.

For instance, you may use ActiveMerchant only in production. In this case, you could say:

group :production do
  gem "activemerchant"
end

Specifying groups allows you to do two things. First, you can install the gems in your Gemfile, minus specific groups. For instance, Rails puts mysql and pg in a database group so that if you’re just working on ActionPack, you can bundle install --without db and run the ActionPack tests without having to worry about getting the gems installed.

Second, you can list specific groups to autorequire using Bundler.require. By default, Bundler.require requires all the gems in the default group (which is all the gems that have no explicit group). You can also say Bundler.require(:default, :another_group) to require specific groups.

Note the difference between these operations: bundle install is opt-out, while Bundler.require is opt-in. This is because the common usage of groups is to specify gems for different environments (such as development, test and production) and you shouldn’t need to specify that you want the “development” and “test” gems just to get up and running. On the other hand, you don’t want your test dependencies loaded in development or production.

It is also worth noting that all gems you installed (i.e. not the ones you excluded at install time with --without) will be available to require, even if Bundler.require did not autorequire them. This has no effect unless you actually require them; it means that in development mode, if you explicitly require rspec, it will work.

Rails 3 defaults to mapping groups to environment names, and explicitly autorequiring the implicit default group and the group named the same as the current environment. For example, in development mode, Rails will require the default group and the development group. The code that does this is in your application.rb:

Bundler.require(:default, Rails.env) if defined?(Bundler)

Consistency

In order to ensure consistency across all environments, bundler resolves the dependencies of your application using the gems listed in all groups, even if you specify --without. This means that while you can skip installing the gems listed in the production group by saying --without production, bundler will still download and examine the gems in order to properly resolve all dependencies.

As a result, the dependencies you install in development mode and test with will be compatible with the gems in other environments. In essence, this policy ensures that if your tests pass and run in development, your app will not fail to run in production because the dependencies resolved differently.

Multiple Inconsistent Configurations

Sometimes, especially when developing gems for wider use, you want to test your code against multiple incompatible configurations. At first glance, you might think that you could use groups for this case, but as described above, groups are designed for cases where all of the gems are compatible, but you don’t always want to have to install them in all situations.

Instead, use multiple Gemfiles, one for each incompatible configuration. When installing, do bundle install --gemfile Gemfile.rails2. This will tell Bundler to use Gemfile.rails2 rather than the default Gemfile. As in all cases in Bundler, you can also specify this option globally with an environment variable (BUNDLE_GEMFILE).
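As a hypothetical example, a Gemfile.rails2 for testing a gem against Rails 2.3 might look something like the following, installed with bundle install --gemfile Gemfile.rails2 (or with BUNDLE_GEMFILE=Gemfile.rails2 set in the environment):

# Gemfile.rails2
source "http://rubygems.org"

gem "rails", "~> 2.3.8"   # the incompatible configuration under test
gem "rspec"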

Ruby 1.9 Encodings: A Primer and the Solution for Rails

UPDATE: The DataObjects drivers, which are used in DataMapper, are now updated to honor default_internal. Let’s keep this moving.

Since Ruby 1.9 announced support for encodings, there has been a flurry of activity to make existing libraries encoding aware, and a tornado of confusion as users of Ruby and Rails have tried to make sense of it.

In this post, I will lay out the most common problems people have had, and what we can do as a community to put these issues to bed in time for Ruby 1.9.2 final.

A Quick Tour

I’m going to simplify some of this, but the broad strokes are essentially correct.

Before we begin, many of you are probably wondering what exactly an “encoding” is. Getting a handle on this was an important part of helping me understand the possible solution space.

On disk, all Strings are stored as a sequence of bytes. An encoding simply specifies how to take those bytes and convert them into “codepoints”. In some languages, such as English, a “codepoint” is exactly equivalent to “a character”. In most other languages, there is not a one-to-one correspondence. For example, a German codepoint might specify that an umlaut should be added to the preceding codepoint.

The English characters represented by the first seven bits of ASCII (characters 0 through 127 in “ASCII-7”) have the same representation in many (but not all) encodings. This means that if you only use English characters, the on-disk representation of the characters will often be exactly the same regardless of the source encoding.

However, once you start to use other characters, the bytes on disk mean different things in different encodings. Have you ever seen a page on the Internet filled with something like “FÃ¼hrer”? That is the consequence of the bytes of “Führer” stored as UTF-8 being interpreted as Latin-1.

You can trivially see this problem using Ruby 1.9’s encoding support by running this program:

# encoding: UTF-8
 
puts "hello ümlaut".force_encoding("ISO-8859-1").encode("UTF-8")
 
# Output
# hello Ã¼mlaut

First, we create a String (“hello ümlaut”) in the UTF-8 encoding. Next, we tell Ruby that the String is actually Latin-1. It’s not, so an attempt to read the characters will interpret the raw bytes of the “ü” as though they were Latin-1 bytes. We ask Ruby to give us that interpretation of the data in UTF-8 via encode and print it out.

We can see that while the bytes for “hello ” and “mlaut” were identical in both UTF-8 and Latin-1, the bytes for “ü” in UTF-8 mean “Ã¼” in Latin-1.

Note that while force_encoding simply tags the String with a different encoding, encode converts the bytes of one encoding into the equivalent bytes of the second. As a result, while force_encoding should almost never be used unless you know for sure that the bytes actually represent the characters you want in the target encoding, encode is relatively safe to use to convert a String into the encoding you want.

You’ve probably also seen the reverse problem, where bytes encoded in Latin-1 ended up inside a page encoded in UTF-8.

# encoding: ISO-8859-1
 
puts "hello ümlaut".force_encoding("UTF-8")
 
# Output
# hello ?mlaut

Here, the byte that represents “ü” in Latin-1 is not a valid sequence in UTF-8, so it was replaced with a “?”. Note that puts will always simply write out the bytes to your terminal, and the terminal’s encoding will determine how they are interpreted. The examples in this post are all output to a terminal using the UTF-8 encoding.

As you can imagine, this presents quite the issue when concatenating two Strings of different encodings. Simply smashing together the raw bytes of the two Strings can result in output that is incomprehensible in either encoding. To make matters worse, it’s not always possible to represent all of the characters of one encoding in another. For instance, the characters of the Emoji encoding cannot be represented in the ISO-8859-1 encoding (or even in a standardized way in the UTF-8 encoding).

As a result, when you attempt to concatenate two Strings of different encodings in Ruby 1.9, Ruby raises an error.

# encoding: UTF-8
 
puts "hello ümlaut".encode("ISO-8859-1") + "hello ümlaut"
 
# Output
# incompatible character encodings: ISO-8859-1 and UTF-8 (Encoding::CompatibilityError)

Because it’s extremely tricky for Ruby to be sure that it can make a lossless conversion from one encoding to another (Ruby supports almost 100 different encodings), the Ruby core team has decided to raise an exception if two Strings in different encodings are concatenated together.

There is one exception to this rule. If the bytes in one of the two Strings are all under 127 (and therefore valid characters in ASCII-7), and both encodings are compatible with ASCII-7 (meaning that the bytes of ASCII-7 represent exactly the same characters in the other encoding), Ruby will make the conversion without complaining.

# encoding: UTF-8
 
puts "hello umlat".encode("ISO-8859-1") + "hello ümlaut"
 
# Output
# hello umlauthello ümlaut

Since Ruby does not allow characters outside of the ASCII-7 range in source files without a declared encoding, this exception eliminates a large number of potential problems that Ruby’s strict concatenation rules might have introduced.

Binary Strings

By default, Strings with no encoding in Ruby are tagged with the ASCII-8BIT encoding, which is an alias for BINARY. Essentially, this is an encoding that simply means “raw bytes here”.

In general, code in Rails applications should not encounter BINARY strings, except for Strings created in source files without encodings. However, since these Strings will virtually always fall under the ASCII-7 exception, Ruby programmers should never have to deal with incompatible encoding exceptions where one of the two encodings is ASCII-8BIT (i.e. BINARY).

That said, almost all of the encoding problems reported by users in the Rails bug tracker involved ASCII-8BIT Strings. How did this happen?

There are two reasons for this.

The first reason is that early on, database drivers generally didn’t tag the Strings they retrieved from the database with the proper encoding. This involves a manual mapping from the database’s encoding names to Ruby’s encoding names. As a result, it was extremely common for database drivers to return BINARY Strings containing characters outside of the ASCII-7 range (because the original content was encoded in the database as UTF-8 or ISO-8859-1/Latin-1).

When attempting to concatenate that content onto another UTF-8 string (such as the buffer in an ERB template), Ruby would raise an incompatible encoding exception.

# encoding: UTF-8
 
puts "hello ümlaut" + "hello ümlaut".force_encoding("BINARY")
 
# Output
# incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)

This is essentially identical to the scenario many people encountered. A UTF-8 String was presented to Ruby as a BINARY String, since the database driver didn’t tag it. When attempting to concatenate it onto UTF-8, Ruby had no way to do so reliably, so it raised an exception.

One reason that many people didn’t encounter this problem was that either the contents of the template or the text from the database were entirely in the ASCII-7 subset of their character set. As a result, Ruby would not complain. This is deceptive, because if they made a small change to their template, or if a user happened to enter non-ASCII-7 data (for instance, they got their first user named José), they would suddenly start seeing an incompatible encoding exception.

When people see this incompatible encoding exception, one common reaction is to call force_encoding("UTF-8") on the BINARY data. This will work great for Strings whose bytes actually are encoded in UTF-8. However, if people whose Strings were encoded in ISO-8859-5 (Russian) followed this instruction, they would end up with scrambled output.

Additionally, it’s impossible to simply encode the data, since Ruby doesn’t actually know the source encoding. In essence, a crucial piece of information has been lost at the database driver level.

Unfortunately, this means that well-meaning people who have solved their problem by force_encoding their Strings to UTF-8 (because the bytes actually did represent UTF-8 characters) become baffled when their solution doesn’t work for someone working on a Russian website.

Thankfully, this situation is now mostly solved. There are updates for all database drivers that map the encodings from the database to a Ruby encoding, which means that UTF-8 text from the database will be UTF-8 Strings in Ruby, and Latin-1 text from the database will be ISO-8859-1 Strings in Ruby.

Unfortunately, there is a second large source of BINARY Strings in Ruby. Specifically, data received from the web in the form of URL-encoded POST bodies often arrives without any specification of the encoding of the content sent from forms.

In many cases, browsers send POST bodies in the encoding of the original document, but not always. In addition, some browsers say that they’re sending content as ISO-8859-1 but actually send it as Windows-1252. There is a long thread on the Rack tracker about this, but the bottom line is that it’s extremely difficult to determine the encoding of a POST body sent from the web.

As a result, Rack handlers send the raw bytes through as BINARY (which is reasonable, since handlers shouldn’t be in the business of trying to wade through this problem) and no middleware exists (yet) to properly tag the String with the correct encoding.

This means that if the stars align, the raw bytes are UTF-8, end up in a UTF-8 database, and end up coming back out again tagged as UTF-8. If the stars do not align, the text might actually be encoded in ISO-8859-1, get put into a UTF-8 database, and come out tagged as UTF-8 (and we know what happens when ISO-8859-1 data is mistakenly tagged as UTF-8).

In this case, because the ISO-8859-1 data is improperly tagged as UTF-8, Ruby happily concatenates it with other UTF-8 Strings, and hilarity ensues.

Because English characters have the same byte representation in all commonly used encodings, this problem is not as common as you might imagine. Unfortunately, this simply means that people who do encounter it are baffled and find it hard to get help. Additionally, this problem doesn’t manifest itself as a hard error; it can go unnoticed and be dismissed as a minor annoyance if the number of non-ASCII-7 characters is low.

In order to properly solve this problem for Ruby 1.9, we need a very good heuristic for properly determining the encoding of web-sent POST bodies. There are some promising avenues that will get it right 99.9% of the time, and we need to package them up into a middleware that will tag Strings correctly.
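To make the shape of such a middleware concrete, here is a deliberately naive sketch that simply assumes POST bodies are UTF-8. The class name is made up, and a real implementation would apply the heuristics described above instead of assuming:

require "stringio"

class NaiveUtf8Input
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["rack.input"]
      # the handler gives us raw BINARY bytes; retag (not transcode) them as UTF-8
      body = env["rack.input"].read.force_encoding("UTF-8")
      env["rack.input"] = StringIO.new(body)
    end
    @app.call(env)
  end
end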

Incompatible Encodings

If you’ve been paying attention, you’ve probably noticed that while the database drivers have solved one problem, they actually introduced another one.

Imagine that you’re using a MySQL database encoded in ISO-8859-1 (or ISO-8859-5, popular for Russian applications, or any other non-UTF-8 encoding). Now that the String coming back from the database is properly tagged as ISO-8859-1, Ruby will refuse to concatenate it onto the ERB buffer (which is encoded in UTF-8). Even if we solved this problem for ERB, it could be trivially reintroduced in other parts of the application through regular concatenation (+, concat, or even String interpolation).
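A sketch of that failure mode (the strings are invented; the point is that both sides contain non-ASCII characters):

# encoding: UTF-8

from_database = "José".encode("ISO-8859-1")   # properly tagged by the driver
buffer = "<h1>Héllo, "                        # a UTF-8 ERB buffer

buffer + from_database
# => incompatible character encodings: UTF-8 and ISO-8859-1 (Encoding::CompatibilityError)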

Again, this problem is somewhat mitigated due to the ASCII-7 subset exception, which means that as long as one of the two incompatible Strings uses only English characters, users won’t see any problems. Again, because this “solution” means that the Ruby developer in question still may not understand encodings, this simply defers the problem to some uncertain point in the future when they either add a non-ASCII-7 character to their template or the user submits a non-ASCII-7 String.

The Solution

If you got this far, you’re probably thinking “Holy shit this encoding stuff is crazy. I don’t want to have to know any of this! I just want to write my web app!”

And you’d be correct.

Other languages, such as Java and Python, solve this problem by encoding every String that enters the language as UTF-8 (or UTF-16). Theoretically, it is possible to represent the characters of every encoding in UTF-8. By doing this, programmers only ever deal with one kind of String, and concatenation happens between UTF-8 Strings.

However, this solution does not work very well for the Japanese community. For a variety of complicated reasons, Japanese encodings, such as SHIFT-JIS, are not considered to encode losslessly into UTF-8. As a result, Ruby has a policy of not attempting to simply encode any inbound String into UTF-8.

This decision is debatable, but the fact is that if Ruby transparently transcoded all content into UTF-8, a large portion of the Ruby community would see invisible lossy changes to their content. That part of the community is willing to put up with incompatible encoding exceptions because properly handling the encodings they regularly deal with is a somewhat manual process.

On the other hand, many Rails applications work mostly with encodings that trivially encode to UTF-8 (such as UTF-8 itself, ASCII, and the ISO-8859-1 family). For this rather large part of the community, having to manually encode Strings to solve incompatible encoding problems feels like a burden that belongs on the machine but has been inappropriately shifted onto Rails application developers.

But there is a solution.

By default, Ruby should continue to support Strings of many different encodings, and raise exceptions liberally when a developer attempts to concatenate Strings of different encodings. This would satisfy those with encoding concerns that require manual resolution.

Additionally, you would be able to set a preferred encoding. This would inform drivers at the boundary (such as database drivers) that you would like them to convert any Strings that they tag with an encoding to your preferred encoding immediately. By default, Rails would set this to UTF-8, so Strings that you get back from the database or other external source would always be in UTF-8.

If a String at the boundary could not be converted (for instance, if you set ISO-8859-1 as the preferred encoding, this would happen a lot), you would get an exception as soon as that String entered the system.

In practice, almost all usage of this setting would be to specify UTF-8 as a preferred encoding. From your perspective, if you were dealing in UTF-8, ISO-8859-* and ASCII (most Western developers), you would never have to care about encodings.

Even better, Ruby already has a mechanism that is mostly designed for this purpose. In Ruby 1.9, setting Encoding.default_internal tells Ruby to transcode all Strings crossing the barrier via its IO system into that preferred encoding. All we’d need, then, is for maintainers of database drivers to honor this convention as well.

It doesn’t require any changes to Ruby itself, and it places the burden squarely on the few people who already need to deal with encodings (because taking data from outside of Ruby, via C, already requires a tagging step). I have spoken with Aaron Patterson, who has been working on the SQLite3 driver, and he feels that this change is simple enough for maintainers of drivers dealing with external Strings to make it a viable option. He has already patched SQLite3 to make it respect default_internal.
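Expressed in Ruby, the contract a driver would honor looks roughly like this (the method name is hypothetical; real drivers do this work in C):

# A sketch of the boundary step a driver performs for each value it returns.
def tag_and_convert(raw_bytes, database_encoding)
  str = raw_bytes.force_encoding(database_encoding)   # tag with the known source encoding
  if Encoding.default_internal
    str.encode(Encoding.default_internal)              # honor the application's preference
  else
    str
  end
end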

However you feel about Ruby’s solution of exposing String encodings directly in the language, you should agree that since we’re stuck with it for the foreseeable future, this solution shifts the burden of dealing with it from the unwashed masses (most of whom have no idea what an encoding is) to a few maintainers of C extensions and libraries that deal in binary data. Getting this right as soon as possible will substantially ease the transition from Ruby 1.8 to Ruby 1.9.

Postscript: What Happened in 1.8!?

When people first move to 1.9 and encounter these shenanigans, they often wonder why everything seemed so simple in Ruby 1.8, and yet seemed to work.

There are a few reasons for this.

First, keep in mind that in Ruby 1.8, Strings are simple sequences of bytes. Ruby String operations just concatenate those byte sequences together without any kind of check. This means that concatenating two UTF-8 Strings together will just work, since the combined byte sequence is still valid UTF-8. As long as the client for the Ruby code (such as the browser) is told that the bytes are encoded in UTF-8, all is well. Rails does this by setting the default charset for all documents to UTF-8.

Second, Ruby 1.8 has a “UTF-8” mode that makes its regular expression engine treat all Strings as UTF-8. In this mode (which is triggered by setting $KCODE = “UTF-8”), the regular expression engine correctly matches a complete UTF-8 character for /./, for instance. Rails sets this global by default, so if you were using Rails, regular expressions respected Unicode characters, not raw bytes.

Third, very little non-English content in the wild is actually encoded in ISO-8859-1. If you were expecting to deal with content that was not English, you would probably set your MySQL database to use a UTF-8 encoding. Since Rails sets UTF-8 as the charset of outbound documents, most browsers will in fact return UTF-8 encoded data.

Fourth, the problems caused when an ISO-8859-1 String is accidentally concatenated into a UTF-8 String are not as jarring as the errors produced by Ruby 1.9. Let’s try a little experiment. First, open up a text editor, create a new file, and save it in the ISO-8859-1 encoding.

$KCODE = "UTF-8"
 
iso_8859_1 = "ümlat"
 
# the byte representation of ümlaut in UTF-8
utf8 = "\xC3\xBCmlaut"
 
puts iso_8859_1
puts utf8
 
puts iso_8859_1 + utf8
puts utf8 + iso_8859_1
 
# Output
# ?mlat
# ümlaut
# ?mlatümlaut
# ümlaut?mlat

If you somehow get ISO-8859-1 encoded content that uses characters outside of the ASCII-7 range, Ruby doesn’t puke. Instead, it simply replaces the unidentified character with a “?”, which can easily go unnoticed in a mostly English site with a few “José”s thrown into the mix. It could also easily be dismissed as a “weird bug that we don’t have time to figure out right now”.

Finally, Rails itself provides a pure-Ruby UTF-8 library that mops up a lot of the remaining issues. Specifically, it provides an alternate String class that can properly handle operations like split, truncate, index, justify and other operations that need to operate on characters, not bytes. It then uses this library internally in helpers like truncate, transparently avoiding a whole other class of issue.
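For instance, with the Rails 2.3-era multibyte proxy (a sketch; exact APIs vary slightly by version):

require "active_support"
 
"josé".mb_chars.upcase.to_s # => "JOSÉ", not "JOSé"
"josé".mb_chars.length      # => 4 characters, even though the string is 5 bytes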

In short, if you’re dealing mostly with English text, and you get unlucky enough to get ISO-8859-1 input from somewhere, the worst case is that you get a “?” instead of an “é”. If you’re dealing with a lot of non-English text, you’re probably not using ISO-8859-1 sources. In either case, English (ASCII) text is compatible with UTF-8, and Rails provides solid enough pure-Ruby UTF-8 support to get you most of the rest of the way.

That said, Rubyists dealing with encodings other than UTF-8 and ISO-8859-1 (Japanese and Russian Rubyists, for instance) were definitely not in a good place with Ruby 1.8.

Thanks

I want to personally thank Jay Freeman (aka saurik), who in addition to being a general badass, spent about 15 hours with me patiently explaining these issues and working through the Ruby 1.9 source to help fully understand the tradeoffs available.

The Web Doesn’t Suck. Browsers Are Innovating.

This week saw a flurry of back-and-forth about the future of the web platform. In the “web sucks” camp were Sachin Agarwal of Posterous (The Web Sucks. Browsers need to innovate) and Joe Hewitt (Sachin summarized some of his tweets at @joehewitt agrees with me).

Chris Blizzard responded with a few narrow examples of what Firefox has been doing lately (geolocation, orientation, WebGL).

On Hacker News, most of the comments took Sachin to task for his argument that standards don’t matter, pointing people back to the “bad old days” of the browser wars.

In my opinion, both camps kind of miss the point.

Sachin made some very pointed factual claims which are just complete, pure hokum. Close to the top of his post, he says:

Web applications don’t have threading, GPU acceleration, drag and drop, copy and paste of rich media, true offline access, or persistence. Are you kidding me?

Really? Are you kidding me. In fact, browsers have implemented, and the WHAT-WG has been busily standardizing, all of these things for quite a number of years now. Threading? Check. GPU acceleration? Check and check. Drag and drop? Check. Offline access and persistence? Check and check.

Almost universally, these features, which exist in shipping browsers today, grew out of experiments conducted years ago by browser vendors (notably Firefox and Safari, more recently Chrome), who did not wait for approval from the standards bodies to begin implementing.

In fact, large swaths of HTML5 are based on proposals from browsers, who have either already implemented the feature (CSS-based animations and transitions) or who then built the feature without waiting for final approval. And this is transparently obvious to anyone, since HTML5 has not yet been approved, and some parts are the subject of internal W3C debate, yet the browsers are implementing the parts they like and innovating in other areas.

You can see this explicitly in the features prefixed with -moz or -webkit, because they’re going ahead and implementing features that have not gotten consensus yet.

In 2010, the WHAT-WG is a functioning place to bring a proposal for further review once you’ve conducted some experiments or have a working implementation. Nobody is sitting on their hands waiting for final approval of HTML5. Don’t confuse the fact that people are submitting their ideas to the standards bodies with the misguided idea that only approved standards are being implemented.

And in fact, this is basically never how it’s worked. Apple implemented the <canvas> tag and <input type="search" /> in 2004 (here’s a 2004 rant from the editor of the then-brand-new HTML5 spec). Opera and Apple worked on the <video> tag in 2007. The new CSS flexible box model is based on XUL, which Firefox has had for over a decade. HTML5 drag and drop support is based on implementations shipped by the IE team over a decade ago. Firefox has been extending JavaScript for almost its entire history (most recently with JavaScript 1.8).

CSS transitions, transforms and animations were built by Apple for the iPhone before they were a spec, and only later ported to the desktop. Firefox did the initial experimentation on WebGL in 2006, back when they were calling it Canvas 3d (here’s a working add-on by Mozilla developer Vladimir Vukićević).

Google Gears shipped what would become the Application Cache, Web Storage, and Web Workers, proving out the concepts. It also shipped geolocation, which is now part of the HTML5 spec.

@font-face originally shipped in Internet Explorer 5.5. Apple has added new mobile events (touchstart, touchmove, touchend, onorientationchange) with accompanying JavaScript APIs (Apple developer account needed) without any spec activity that I can discern. Firefox, at the same time, added support for hardware accelerometers.

Even Microsoft bucked the standards bodies by shipping its own implementation of the W3C’s “cross-origin resource sharing” spec, called Cross domain request, in IE8.

It’s perfectly understandable that people who haven’t been following along for the past few years might have missed all of this. But the fact remains that browser vendors have been moving at a very rapid clip, implementing all kinds of interesting features, and then sharing them with other browsers for feedback, and, one hopes, consensus.

Some of these implementations suck (I’ve heard some people say not-nice things about WebGL and drag-and-drop, for instance). But that’s quite a bit different from saying that browsers are sitting on their hands waiting for the W3C or WHAT-WG to tell them what to do.

Named Gem Environments and Bundler

In the beginning, Rubygems made a decision to allow multiple versions of individual gems in the system repository of gems. This allowed people to use whatever versions of gems they needed for individual scripts, without having to partition the gems for specific purposes.

This was a nice starting place. Being able to install whatever gems you needed into one place and have scripts just work, without having to partition your gems into buckets, made Rubygems an extremely pleasant tool to work with. In my opinion, it was a good decision.

As the ecosystem has matured, and more people build gems with dependencies (especially loose dependencies, like "rack", ">= 1.0"), this original sin has introduced the dreaded activation error:

can't activate rspec-rails (= 1.3.2, runtime) for [], 
already activated rspec-rails-2.0.0.beta.6

This error occurs because of the linear way that gems are “activated”. For instance, consider the following scenario:

# On your system:
thin (1.2.7)
- daemons (>= 1.0.9)
- eventmachine (>= 0.12.6)
- rack (>= 1.0.0)

actionpack (2.3.5)
- activesupport (= 2.3.5)
- rack (~> 1.0.0)

rack (1.0.0)
rack (1.1.0)
activesupport (2.3.5)
daemons (1.0.9)
eventmachine (0.12.6)

Quickly glancing at these dependencies, you can see that these two gems are “compatible”. The gems have the rack dependency in common, and the >= 1.0.0 requirement is compatible with ~> 1.0.0.

However, there are two ways to require these gems. First, you could require actionpack first.

require "action_pack"
# will result in "activating" ActiveSupport 2.3.5 and Rack 1.0.0
 
require "thin"
# will notice that Rack 1.0.0 is already activated, and satisfies
# the >= 1.0.0 requirement, so just activates daemons and eventmachine

Second, you could require thin first.

require "thin"
# will result in "activating" Rack 1.1.0, Daemons 1.0.9, 
# and EventMachine 0.12.6
 
require "action_pack"
# will notice that Rack 1.1.0 is already activated, but that
# it is incompatible with ~> 1.0.0. Therefore, it will emit:
# ---
# can't activate rack (~> 1.0.0, runtime) for ["actionpack-2.3.5"], 
# already activated rack-1.1.0 for ["thin-1.2.7"]

In this case, because thin pulled in Rack 1.1, it becomes impossible to load in actionpack, despite the fact that a potentially valid combination exists.

This problem is fundamental to the approach of supporting different versions of the same gem in the system and activating gems linearly. In other words, because no single entity ever has the opportunity to examine the entire list of dependencies, the onus is on the user to make sure that the gem requires are ordered correctly. Sometimes, it means that the user must explicitly activate the right version of a child dependency (via Kernel#gem) just to make sure that the right versions are loaded.
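A minimal sketch of that manual workaround, using the versions from the example above (hypothetical script code):

gem "rack", "~> 1.0.0"  # pin the shared child dependency first, via Kernel#gem
 
require "thin"          # Rack 1.0.0 already satisfies thin's >= 1.0.0
require "action_pack"   # and it satisfies actionpack's ~> 1.0.0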

There are two possible solutions to this problem.

Multiple Named Environments

One solution to this problem is to catch potential conflicts at installation time and ask the user to manage multiple named environments, each with a fully consistent, non-conflicting view of the world.

The best way to implement this looks something like this (I’ll use a fictitious gemenv command to illustrate):

$ gemenv install thin
- Installing daemons (1.0.10)
- Installing eventmachine (0.12.10)
- Installing rack (1.1.0)
- Installing thin (1.2.7)

$ gemenv install actionpack -v 2.3.5
- Uninstalling rack (1.1.0)
- Installing rack (1.0.1)
- Installing actionpack (2.3.5)

$ gemenv install rack -v 1.1.0
- rack (1.1.0) conflicts with actionpack (2.3.5)
- if you want rack (1.1.0) and actionpack (2.3.5)
  you will need a new environment.

$ gemenv create rails3
$ gemenv install actionpack -v 3.0.0.beta3
- Installing abstract (1.0.0)
- Installing builder (2.1.2)
- Installing i18n (0.3.7)
- Installing memcache-client (1.8.2)
- Installing tzinfo (0.3.20)
- Installing activesupport (3.0.0.beta3)
- Installing activemodel (3.0.0.beta3)
- Installing erubis (2.6.5)
- Installing rack (1.1.0)
- Installing rack-mount (0.6.3)
- Installing rack-test (0.5.3)
- Installing actionpack (3.0.0.beta3)

$ gemenv use default
$ ruby -e "puts Rack::VERSION"
1.0.1
$ gemenv use rails3
$ ruby -e "puts Rack::VERSION"
1.1.0

Essentially, the single entity with full knowledge of all dependencies is the installer, and the user creates as many environments as he or she needs for the various non-conflicting sets of gems in use.

This works nicely, because it guarantees that once you are using an environment, all of the gems available in it are compatible. Note that the above command is fictitious, but it bears similarity to rip.

Virtual, Anonymous Environments

Another solution, the one bundler uses, is to allow multiple, conflicting versions of gems to exist in the system repository of packages, but to ensure a lack of conflicts by resolving the dependencies used by individual applications.

First, install conflicting gems. In this case, Rails 2.3 and Rails 3.0 require different, incompatible versions of Rack (1.0.x and 1.1.x).

$ gem install rails
... output ...
$ gem install rails -v 3.0.0.beta3
... output ...

Next, specify which version of Rails to use in your application’s Gemfile:

gem "rails", "2.3.5"

When you run script/server in an app using Bundler, Bundler determines that Rails needs Rack 1.0.x and pulls in Rack 1.0.1 from the system. If you had specified gem "rails", "3.0.0.beta3", bundler would have pulled in Rack 1.1.0 from the system.

In essence, we have the same kind of isolation as the fictitious command above, but instead of manually managing named environments, Bundler creates virtual isolated environments based on the list of gems used in your application.

Why Did We Use Virtual, Anonymous Environments?

When considering the tradeoffs between these two solutions, we realized that applications already typically have a list (executable or not) of their dependencies. The gem install command already works great for installing dependencies, and introducing a dependency resolution step there feels awkward and out of place.

Additionally, as an application evolves, it’s natural to continue updating its list of gems, keeping a record of the changes as you go. You could keep a separate named environment for each application, but you’d probably want to keep a list of the dependencies in the application anyway so that it’s possible to get up and running on a second machine.

In short, since a list of an application’s dependencies makes sense anyway, why burden the end-user with the need to maintain separate named environments and manually install gems?

And once we’re doing this, why not build a toolchain around installing gems and keeping records of not only the top-level installed gems, but the exact versions of all gems used at a given time? Why not build in support for “edge” gems on your local machine or in git repositories? Why not create workflows for sharing your application with other developers and deploying your application?

As application and gem developers ourselves, we wanted a tool that managed an application’s dependencies across its entire lifecycle, in the context of the application.

That said, there is absolutely room for experimentation in this space. Tools that enforce consistency at install time might be extremely appropriate for managing the gems that you use in scripts, but which you don’t share with other machines. Context matters.

Postscript

We commonly hear something to the effect of “but why add all this complexity? Without all these features, bundler could be so much smaller!”. The truth is that the bundler code itself is under 2,000 lines of code, and hasn’t grown a whole lot in the past few major revisions.

The new rpg tool, recently released by the venerable Ryan Tomayko, is also in that range. The original rip (currently in the “classic” branch) was also in that ballpark. The new rip (the current master branch), may well demonstrate that a fully-featured dependency management system for Ruby can be written in many fewer lines of code. If so, I’m excited to see the abstractions that they use to make it so.

But the bottom line is that all of the new package management solutions have done a good job of packing features into a small number of new lines of code, bundler included. By starting with good abstractions, it’s often possible to add rich new features without having to add a whole lot of new code (or even by deleting code!). We’ve certainly found that in our journey with bundler.

Ruby Require Order Problems

Bundler has inadvertently exposed a number of require order issues in existing gems. I figured I’d take the opportunity to talk about them. There are basically two kinds of gem ordering issues:

Missing Requires

Imagine a gem that uses nokogiri, but never requires it. Instead, it assumes that something that is required before it will do the requiring. This happens relatively often with Rails plugins (which were previously able to assume that they were loaded at a particular time and place).

A well-known example of this problem is gems that didn’t require “yaml” because Rubygems required it. When Rubygems removed its require in 1.3.6, a number of gems broke. In general, bundler assumes that gems require what they need. If gems do so, this class of ordering problems is eliminated.
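As a sketch, a hypothetical gem with this bug looks something like this:

# lib/my_gem.rb (hypothetical). The buggy version omits the first line and
# assumes some earlier require already pulled in nokogiri.
require "nokogiri"
 
module MyGem
  def self.parse(html)
    Nokogiri::HTML(html)
  end
end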

Constant Definition as API

This is where a gem checked for defined?(SomeConstant) to decide whether to activate some functionality.

This is a bit tricky. Essentially, the gem is saying that the API for using it is “require some other gem before me”. Personally, I consider that to be a problematic API, because it’s very implicit, and can be hard to track down exactly who did what require.
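In code, the pattern looks roughly like this (hypothetical gem code):

# somewhere inside the gem's own files
if defined?(Rails)
  require "my_gem/rails_integration" # silently skipped if Rails is required later
end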

A better solution is to provide an explicit hook for people to activate the optional functionality. For instance, Haml provides Haml.init_rails(binding), which you can run after activating Haml.

This is slightly more manual than some would like, but the API of “make sure you require the optional dependencies before me” is also manual, and more error-prone.

Even if Bundler “respected require order”, which we plan to do (in some way) in 0.10, it’s still up to the user of the gem to ensure that they listed the optional gem above the required gem. This is not ideal.

A workaround that works great in Bundler 0.9 is to simply require the order-dependent gems above Bundler.require. We do this in Rails, so that gems can test for the existence of the Rails constant to decide whether to add optional Rails dependencies.

require "rails/all" 
Bundler.require(:default, Rails.env)

In the case of shoulda and mocha, a better solution could be:

# Gemfile 
group :test do 
  gem "shoulda" 
  gem "mocha" 
  # possible other gems where order doesn't matter 
end
 
# application.rb 
Bundler.require(:default) 
Bundler.require(Rails.env) unless Rails.env.test? 
 
# test_helper.rb 
# Since the order matters, require these gems manually, in the right order 
require "shoulda" 
Bundler.require(:test)

In my opinion, if you have gems that specifically depend on other gems, it is appropriate to manually require them first before automatically requiring things using Bundler.require. You should treat Bundler.require as a shortcut for listing out a stack of requires.

Possible solutions?

One long-term solution, if we get gem metadata in Rubygems 1.4 (or optional dependencies down the line), would be to specify that a gem has optional dependencies on another gem, along with a file to run when both gems are available.

For instance, the Haml gem could say:

# gemspec 
s.integrates_with "rails", "~> 3.0.0.beta2", "haml/rails" 
 
# haml/rails.rb 
require "haml" 
require "rails" 
Haml.init_rails(binding)

Bundler would handle this in Bundler.require (or possibly Bundler.setup).

If this feature existed in Rubygems, it would work via gem activation. If the Haml gem was activated after the Rails gem, it would require “haml/rails” immediately.

If the Haml gem was activated otherwise, it wouldn’t do anything until the Rails gem was activated. When the Rails gem was activated, it would require the file.

Some of the Problems Bundler Solves

This post does not attempt to convince you to use bundler, or compare it to alternatives. Instead, I will try to articulate some of the problems that bundler tries to solve, since people have often asked. To be clear, users of bundler should not need to understand these issues, but some might be curious.

If you’re looking for information on bundler usage, check out the official Bundler site.

Dependency Resolution

This is the problem most associated with bundler. In short, by asking you to list all of your dependencies in a single manifest, bundler can determine, up front, a valid list of all of the gems and versions needed to satisfy that manifest.

Here is a simple example of this problem:

$ gem install thin
Successfully installed rack-1.1.0
Successfully installed eventmachine-0.12.10
Successfully installed daemons-1.0.10
Successfully installed thin-1.2.7
4 gems installed

$ gem install rails
Successfully installed activesupport-2.3.5
Successfully installed activerecord-2.3.5
Successfully installed rack-1.0.1
Successfully installed actionpack-2.3.5
Successfully installed actionmailer-2.3.5
Successfully installed activeresource-2.3.5
Successfully installed rails-2.3.5
7 gems installed

$ gem dependency actionpack -v 2.3.5
Gem actionpack-2.3.5
  activesupport (= 2.3.5, runtime)
  rack (~> 1.0.0, runtime)

$ gem dependency thin
Gem thin-1.2.7
  daemons (>= 1.0.9, runtime)
  eventmachine (>= 0.12.6, runtime)
  rack (>= 1.0.0, runtime)

$ irb
>> require "thin"
=> true
>> require "actionpack"
Gem::LoadError: can't activate rack (~> 1.0.0, runtime) 
for ["actionpack-2.3.5"], already activated rack-1.1.0
for ["thin-1.2.7"]

What happened here?

Thin declares that it can support any version of Rack above 1.0. ActionPack declares that it can support versions 1.0.x of Rack. When we require thin, it looks for the highest version of Rack that thin can support (1.1), and makes it available on the load path. When we require actionpack, it notes that the version of Rack already on the load path (1.1) is incompatible with actionpack (which requires 1.0.x) and throws an exception.

Thankfully, newer versions of Rubygems provide reasonable information about exactly what gem ("thin 1.2.7") put Rack 1.1.0 on the load path. Unfortunately, there is often nothing the end user can do about it.

Rails could theoretically solve this problem by loosening its Rack requirement, but that would mean that ActionPack declared compatibility with any future version of Rack, a declaration ActionPack is unwilling to make.

The user can solve this problem by carefully ordering requires, but the user is never in control of all requires, so the process of figuring out the right order to require all dependencies can get quite tricky.

It is conceptually possible in this case, but it gets extremely hard when more than a few dependencies are in play (as in Rails 3).

Groups of Dependencies

When writing applications for deployment, developers commonly want to group their dependencies. For instance, you might use SQLite in development but Postgres in production.

For most people, the most important part of the grouping problem is making it possible to install the gems in their Gemfile, except the ones in specific groups. This introduces two additional problems.

First, consider the following Gemfile:

gem "rails", "2.3.5"
 
group :production do
  gem "thin"
end

Bundler allows you to install the gems in a Gemfile minus the gems in a specific group by running bundle install --without production. In this case, since rails depends on Rack, specifying that you don’t want to include thin means no thin, no daemons and no eventmachine but yes rack. In other words, we want to exclude the gems in the group specified, and any dependencies of those gems that are not dependencies of other gems.

Second, consider the following Gemfile:

gem "soap4r", "1.5.8"
 
group :production do
  gem "dm-salesforce", "0.10.3"
end

The soap4r gem depends on httpclient >= 2.1.1, while the dm-salesforce gem depends on httpclient = 2.1.5.2. Initially, when you did bundle install --without production, we did not include gems in the production group in the dependency resolution process.

Now consider the case where both httpclient 2.1.5.2 and httpclient 2.2 exist on Rubyforge.org. In development mode, your app will use the latest version (2.2), but in production, when dm-salesforce is included, the older version will be used.

Note that this happened even though you specified only hard versions at the top level, because not all gems use hard versions as their dependencies.

To solve this problem, Bundler downloads (but does not install) all gems, including gems in groups that you exclude (via --without). This allows you to specify gems with C extensions that can only compile in production (or testing requirements that depend on OSX for compilation) while maintaining a coherent list of gems used across all of these environments.

System Gems

In 0.8 and before, bundler installed all gems in the local application. This provided a neat sandbox, but broke the normal path for running a new Rails app:

$ gem install rails
$ rails myapp
$ cd myapp
$ rails server

Instead, in 0.8, you’d have to do:

$ gem install rails
$ rails myapp
$ cd myapp
$ gem bundle
$ rails server

Note that the gem bundle command became bundle install in Bundler 0.9.

In addition, this meant that Bundler needed to download and install commonly used gems over and over again if you were working on multiple apps. Finally, every time you changed the Gemfile, you needed to run gem bundle again, adding a “build step” that broke the flow of early Rails application development.

In Bundler 0.9, we listened to this feedback, making it possible for bundler to use gems installed in the system. This meant that the ideal Rails installation steps could work, and you could share common gems between applications.

However, there were a few complications.

Since we now use gems installed in the system, Bundler resolves the dependencies in your Gemfile against your system sources at runtime, making a list of all of the gems to push onto the load path. Calling Bundler.setup kicks off this process. If you specified some gems not to install, we needed to make sure bundler did not try to find those gems in the system.
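As a rough sketch, the runtime side looks something like this in an application’s boot code (0.9-era usage):

require "rubygems"
require "bundler"
 
Bundler.setup             # resolve the Gemfile against installed gems and set up the load path
Bundler.require(:default) # then require the gems listed in the Gemfile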

In order to solve this problem, we create a .bundle directory inside your application that remembers any settings that need to persist across bundler invocations.

Unfortunately, this meant that we couldn’t simply have people run sudo bundle install because root would own your application’s .bundle directory.

On OSX, root owns all paths that are, by default, in $PATH. It also owns the default GEM_HOME. This has two consequences. First, we could not trivially install executables from bundled gems into a system path. Second, we could not trivially install gems into a place that gem list would see.

In 0.9, we solved this problem by placing gems installed by bundler into BUNDLE_PATH, which defaults to ~/.bundle/#{RUBY_ENGINE}/#{RUBY_VERSION}. rvm, which does not install executables or gems into a path owned by root, helpfully sets BUNDLE_PATH to the same location as GEM_HOME. This means that when using rvm, gems installed via bundle install appear in gem list.
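Roughly, that default works out to something like this (a sketch, not Bundler’s actual code):

engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
bundle_path = ENV["BUNDLE_PATH"] || File.expand_path("~/.bundle/#{engine}/#{RUBY_VERSION}")
# => e.g. "/Users/you/.bundle/ruby/1.8.7"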

This also means that when not using rvm, you need to use bundle exec to place the executables installed by bundler onto the path and set up the environment.

In 0.10, we plan to bump up the permissions (by shelling out to sudo) when installing gems so we can install to the default GEM_HOME and install executables to a location on the $PATH. This will make executables created by bundle install available without bundle exec and will make gems installed by bundle install available to gem list on OSX without rvm.

Another complication: because gems no longer live in your application, we needed a way to snapshot the list of all versions of all gems used at a particular time, to ensure consistent versions across machines and across deployments.

We solved this problem by introducing a new command, bundle lock, which created a file called Gemfile.lock with a serialized representation of all versions of all gems in use.

However, in order to make Gemfile.lock useful, it would need to work in development, testing, and production, even if you ran bundle install --without production in development and then ran bundle lock. Since we had already decided that we needed to download (but not install) gems even if they were excluded by --without, we could easily include all gems (including those from excluded groups) in the Gemfile.lock.

Initially, we didn’t serialize groups exactly right in the Gemfile.lock, causing inconsistencies between how groups behaved in unlocked and locked mode. Fixing this required a small change in the lock file format, which caused a small amount of frustration for users of early versions of Bundler 0.9.

Git

Very early (0.5 era) we decided that we would support prerelease “gems” that lived in git repositories.

At first, we figured we could just clone the git repositories and add the lib directory to the load path when the user ran Bundler.setup.

We abstracted away the idea of “gem source”, making it possible for gems to be found in system rubygems, remote rubygems, or git repositories. To specify that a gem was located in a git “source”, you could say:

gem "rspec-core", "2.0.0.beta.6", :git => "git://github.com/rspec/rspec-core.git"

This says: “You’ll find version 2.0.0.beta.6 in git://github.com/rspec/rspec-core.git”.

However, there were a number of issues involving git repositories.

First, if a prerelease gem had dependencies, we’d want to include those dependencies in the dependency graph. However, simply trying to run rake build was a nonstarter, as a huge number of prerelease gems have dependencies in their Rakefile that are only available to a tool like bundler once the gem is built (a chicken and egg problem). On the flip side, if another gem depended on a gem provided by a git repository, we were asking users to supply the version, an error-prone process, since the version could change in the git repository and bundler would be none the wiser.

To solve this, we asked gem authors to put a .gemspec in the root of their repository, which would allow us to see the dependencies. A lot of people were familiar with this process, since github had used it for a while for automatically generating gems from git repositories.

At first, we assumed (like github did) that we could execute the .gemspec standalone, out of the context of its original repository. This allowed us to avoid cloning the full repository simply to resolve dependencies. However, a number of gems required files that were in the repository (most commonly, they required a version file from the gem itself to avoid duplication), so we modified bundler to do a full checkout of the repository so we could execute the gemspec in its original context.

Next, we found that a number of git repositories (notably, Rails) actually contained a number of gems. To support this, we allowed any number of .gemspec files in a repository, and would evaluate each in the context of its root. This meant that a git repository was more analogous to a gem source (like Rubygems.org) than a single .gem file.

Soon enough, people started complaining that they tried to use prerelease gems like nokogiri from git and bundler wasn’t compiling C extensions. This proved tricky, because the process that Rubygems uses to compile C extensions is more than a few lines, and we wanted to reuse the logic if possible.

In most cases, we were able to solve this problem by having bundler run gem build gem_name.gemspec on the gemspec, and using Rubygems’ native C extension process to compile the gem.

In a related problem, we started receiving reports that bundler couldn’t find rake while trying to compile C extensions. It turns out that Rubygems supports a rake compile mode if you use s.extensions = %w(Rakefile) or something containing mkrf. This essentially means that Rubygems itself has an implicit dependency on Rake. Since we sort the installed gems to make sure that dependencies get installed before the gems that depend on them, we needed to make sure that Rake was installed before any gem.

For git gems, we needed to make sure that Gemfile.lock remembered the exact revision used when bundler took the snapshot. This required some more abstraction, so that sources could provide and load source-agnostic information they could use to reinstall everything exactly as it was when bundler took the snapshot.

If a git gem didn’t supply a .gemspec, we needed to create a fake .gemspec that we could use throughout the process, based on the name and version the user specified for the repository. This would allow it to participate in the dependency resolution process, even if the repository itself didn’t provide a .gemspec.
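A fake specification along those lines might look like this (a sketch, not Bundler’s actual code; the name and version are placeholders taken from the Gemfile):

spec = Gem::Specification.new do |s|
  s.name     = "gem_name" # name given in the Gemfile
  s.version  = "1.0.0"    # version given in the Gemfile
  s.platform = Gem::Platform::RUBY
  s.summary  = "Fake gemspec for #{s.name}"
end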

If a repository did provide a .gemspec, and the user supplied a version or version range, we needed to confirm that the version provided matched the version specified in the .gemspec.

We checked out the git repositories into BUNDLE_PATH (again, defaulting to ~/.bundle/#{RUBY_ENGINE}/#{RUBY_VERSION} or $GEM_HOME with rvm) using the --bare option. This allows us to share git repositories like the rails repository, and then make local checkouts of specific revisions, branches or tags as specified by individual Gemfiles.

One final problem: if your Gemfile looks like this:

source "http://rubygems.org"
 
gem "nokogiri"
gem "rails", :git => "git://github.com/rails/rails.git", :tag => "v2.3.4"

You do not expect bundler to pull in the version from Rubygems.org, even though it’s newer. Because bundler treats the git repository as a gem source, it initially pulled in the latest version of the gem, regardless of the source. To solve this problem, we added the concept of “pinned dependencies” to the dependency resolver, allowing us to ask it to skip traversing paths that got the rails dependencies from other sources.

Paths

Now that we had git repositories working, it was a hop, skip and a jump to support any path. We could use all of the same heuristics as we used for git repositories (including using gem build to handle C extensions and supporting multiple gems in a single source) on any path in the file system.

With so many sources in the mix, we started seeing cases where people had different gems with the exact same name and version in different sources. Most commonly, people would have created a gem from a local checkout of something (like Rack), and then, when the final version of the gem was released to Rubygems.org, we were still using the version installed locally.

We tried to solve this problem by forcing a lookup in Rubygems.org for a gem, but this contrasted with people who didn’t want to have to hit a remote repository when they had all the gems locally.

When we first started talking to early adopters, they were incredulous that this could happen. “If you do something stupid like that, f*** you”. One by one, those very same people fell victim to the “bug”. Unfortunately, it manifests itself as can't find active_support/core_ext/something_new, which is extremely confusing and can appear to be a generic “bundler bug”. This is especially problematic if the dependencies change in two copies of the gem with identical names and versions.

To solve this problem, we decided that if you had snapshotted the repository via bundle lock and had all of the required gems on your local machine, we would not try to hit a remote. However, if you run bundle install otherwise, we always check to see if there is a newer version in the remote.

In fact, this class of error (two different copies of a gem with the same name and version) has resulted in a fairly intricate prioritization system, which can differ between scenarios. Unfortunately, the principle of least surprise requires that we tweak these priorities for different scenarios.

While it seems that we could just say “if you rake install a gem you’re on your own”, it’s very common, and people expect things to mostly work even in this scenario. Small tweaks to these priorities have also resulted in small changes in behavior between versions of 0.9 (but only in cases where gems with the exact same name and version, in different sources, provide different code).

In fact, because of the overall complexity of the problem, and because of different ways that these features can interact, very small tweaks to different parts of the system can result in unexpected changes. We’ve gotten pretty good at seeing the likely outcome of these tweaks, but they can be baffling to users of bundler. A major goal of the lead-in to 1.0 has been to increase determinism, even in cases where we have to arbitrarily pick a “right” answer.

Conclusion

This is just a small smattering of some of the problems we’ve encountered while working on bundler. Because the problem is non-trivial (and parts are NP-complete), adding an apparently simple feature can upset the equilibrium of the entire system. More frustratingly, adding features can sometimes change “undefined” behavior that accidentally breaks a working system as a result of an upgrade.

As we head into 0.10 and 1.0, we hope to add some additional features to smooth out the typical workflows, while stabilizing some of the seeming indeterminism in Bundler today. One example is imposing a standard require order for gems in the Gemfile, which is currently “undefined”.

Thanks for listening, and getting to the end of this very long post.

Using .gemspecs as Intended

When you clone a repository containing a Unix tool (or download a tarball), there’s a standard way to install it. This is expected to work without any other dependencies, on all machines where the tool is supported.

$ autoconf
$ ./configure
$ make
$ sudo make install

This provides a standard way to download, build and install Unix tools. In Ruby, we have a similar (little-known) standard:

$ gem build gem_name.gemspec
$ gem install gem_name-version.gem

If you opt into this convention, not only will it simplify the install process for your users, but it will also make it possible for bundler (and other future automated tools) to build and install your gem (including binaries, proper load path handling and compilation of C extensions) from a local path or git repository.

What to Do

Create a .gemspec in the root of your repository and check it in.

Feel free to use dynamic code in here. When your gem is built, Rubygems will run that code and create a static representation. This means it’s fine to pull your gem’s version or other shared details out of your library itself. Do not, however, use other libraries or dependencies.

You can also use Dir[] in your .gemspec to get a list of files (and remove files you don’t want with -; see the example below).
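For example (a hypothetical file list; adjust the globs to your project):

s.files = Dir["{lib,bin}/**/*"] + %w(LICENSE README.md) - Dir["lib/**/*.tmp"]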

Here’s bundler’s .gemspec:

# -*- encoding: utf-8 -*-
lib = File.expand_path('../lib/', __FILE__)
$:.unshift lib unless $:.include?(lib)
 
require 'bundler/version'
 
Gem::Specification.new do |s|
  s.name        = "bundler"
  s.version     = Bundler::VERSION
  s.platform    = Gem::Platform::RUBY
  s.authors     = ["Carl Lerche", "Yehuda Katz", "André Arko"]
  s.email       = ["carlhuda@engineyard.com"]
  s.homepage    = "http://github.com/carlhuda/bundler"
  s.summary     = "The best way to manage your application's dependencies"
  s.description = "Bundler manages an application's dependencies through its entire life, across many machines, systematically and repeatably"
 
  s.required_rubygems_version = ">= 1.3.6"
  s.rubyforge_project         = "bundler"
 
  s.add_development_dependency "rspec"
 
  s.files        = Dir.glob("{bin,lib}/**/*") + %w(LICENSE README.md ROADMAP.md CHANGELOG.md)
  s.executables  = ['bundle']
  s.require_path = 'lib'
end

If you didn’t already know this, the DSL for gem specifications is already pretty clean and straightforward; there is no need to generate your gemspec using alternative tools.

Your gemspec should run standalone, ideally with no additional dependencies. You can assume its __FILE__ is located in the root of your project.

When it comes time to build your gem, use gem build.

$ gem build bundler.gemspec

This will spit out a .gem file properly named with a static version of the gem specification inside, having resolved things like Dir.glob calls and the version you might have pulled in from your library.

Next, you can push your gem to Rubygems.org quickly and painlessly:

$ gem push bundler-0.9.15.gem

If you’ve already provided credentials, you’ve now published your gem. If not, you will be asked for your credentials (once per machine).

You can easily automate this process using Rake:

$LOAD_PATH.unshift File.expand_path("../lib", __FILE__)
require "bundler/version"
 
task :build do
  system "gem build bundler.gemspec"
end
 
task :release => :build do
  system "gem push bundler-#{Bunder::VERSION}"
end

Using tools that are built into Ruby and Rubygems creates a more streamlined, conventional experience for all involved. Instead of trying to figure out what command to run to create a gem, expect to be able to run gem build mygem.gemspec.

A nice side-effect of this is that those who check in valid .gemspec files can take advantage of tools like bundler that allow git repositories to stand in for gems. By using the gem build convention, bundler is able to generate binaries and compile C extensions from local paths or git repositories in a conventional, repeatable way.

Try it. You’ll like it.
