Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

jQuery 1.4 and Malformed JSON

Today, we released jQuery 1.4, a mostly backward compatible release with a few minor quirks. One of these quirks is that jQuery now uses the native JSON parser where available, rejecting malformed JSON.

This probably sounds fine, except that by malformed JSON, we mostly mean:

{foo: "bar"}

Because this is valid JavaScript, people have long assumed it was valid JSON, and a lot of hand-coded JSON is written like this. Thankfully, JSON automatically generated by Rails does not have this problem, so most of you guys should not have an issue here.

If you have server-side JSON that can’t be quickly fixed in the transition to jQuery 1.4, you can use the compatibility plugin, which tries to patch a number of small incompatibilities, or you can use this workaround:

$.ajax({url: "/url", 
  dataType: "json",
  success: function(json) {
    // do something with json
  }
});
 
// becomes
 
$.ajax({url: "/url",
  dataType: "text",
  success: function(text) {
    var json = eval("(" + text + ")");
    // do something with json
  }
});

Essentially, this is what jQuery used to do to evaluate JSON, so with these small changes you can get the old behavior.

Again, though, you should probably be modifying your busted server-side code to stop sending malformed JSON!

ActiveModel: Make Any Ruby Object Feel Like ActiveRecord

Rails 2.3 has a ton of really nice functionality locked up in monolithic components. I’ve posted quite a bit about how we’ve opened up a lot of that functionality in ActionPack, making it easier to reuse the router, dispatcher, and individual parts of ActionController. ActiveModel is another way we’ve exposed useful functionality to you in Rails 3.

Before I Begin, The ActiveModel API

Before I begin, there are two major elements to ActiveModel. The first is the ActiveModel API, the interface that models must adhere to in order to gain compatibility with ActionPack’s helpers. I’ll be talking more about that soon, but for now, the important thing about the ActiveModel API is that your models can become ActiveModel compliant without using a single line of Rails code.

In order to help you ensure that your models are compliant, ActiveModel comes with a module called ActiveModel::Lint that you can include into your test cases to test compliance with the API:

class LintTest < ActiveModel::TestCase
  include ActiveModel::Lint::Tests
 
  class CompliantModel
    extend ActiveModel::Naming
 
    def to_model
      self
    end
 
    def valid?()      true end
    def new_record?() true end
    def destroyed?()  true end
 
    def errors
      obj = Object.new
      def obj.[](key)         [] end
      def obj.full_messages() [] end
      obj
    end
  end
 
  def setup
    @model = CompliantModel.new
  end
end

The ActiveModel::Lint::Tests module provides a series of tests that are run against the @model, testing for compliance.

ActiveModel Modules

The second interesting part of ActiveModel is a series of modules provided by ActiveModel that you can use to implement common model functionality on your own Ruby objects. These modules were extracted from ActiveRecord, and ActiveRecord itself now uses them.

Because we’re dogfooding these modules, you can be assured that APIs you bring in to your models will remain consistent with ActiveRecord, and that they’ll continue to be maintained in future releases of Rails.

ActiveModel also comes with internationalization baked in, providing an avenue for much better community sharing around translating error messages and the like.

The Validations System

This was perhaps the most frustrating coupling in ActiveRecord, because it meant that people writing libraries for, say, CouchDB had to choose between painstakingly copying the API over, allowing inconsistencies to creep in, or just inventing a whole new API.

Validations have a few different elements.

First, declaring the validations themselves. You’ve seen the usage before in ActiveRecord:

class Person < ActiveRecord::Base
  validates_presence_of :first_name, :last_name
end

To do the same thing for a plain old Ruby object, simply do the following:

class Person
  include ActiveModel::Validations
 
  validates_presence_of :first_name, :last_name
 
  attr_accessor :first_name, :last_name
  def initialize(first_name, last_name)
    @first_name, @last_name = first_name, last_name
  end
end

The validations system calls read_attribute_for_validation to get the attribute, but by default, it aliases that method to send, which supports the standard Ruby attribute system of attr_accessor.

To use a more custom attribute lookup, you can do:

class Person
  include ActiveModel::Validations
 
  validates_presence_of :first_name, :last_name
 
  def initialize(attributes = {})
    @attributes = attributes
  end
 
  def read_attribute_for_validation(key)
    @attributes[key]
  end
end

Let’s look at what a validator actually is. First of all, the validates_presence_of method:

def validates_presence_of(*attr_names)
  validates_with PresenceValidator, _merge_attributes(attr_names)
end

You can see that validates_presence_of is using the more primitive validates_with, passing it the validator class and merging {:attributes => attr_names} into the options passed to the validator. Next, the validator itself:

class PresenceValidator < EachValidator
  def validate(record)
    record.errors.add_on_blank(attributes, options[:message])
  end
end

The EachValidator that it inherits from validates each attribute with the validate method. In this case, it adds the error message to the record only if the attribute is blank.

The add_on_blank method does add(attribute, :blank, :default => custom_message) if value.blank? (among other things), which adds the localized :blank message to the object. If you take a look at the built-in locale/en.yml, it looks like:

en:
  errors:
    # The default format used in full error messages.
    format: "{{attribute}} {{message}}"
 
    # The values :model, :attribute and :value are always available for interpolation
    # The value :count is available when applicable. Can be used for pluralization.
    messages:
      inclusion: "is not included in the list"
      exclusion: "is reserved"
      invalid: "is invalid"
      confirmation: "doesn't match confirmation"
      accepted: "must be accepted"
      empty: "can't be empty"
      blank: "can't be blank"
      too_long: "is too long (maximum is {{count}} characters)"
      too_short: "is too short (minimum is {{count}} characters)"
      wrong_length: "is the wrong length (should be {{count}} characters)"
      not_a_number: "is not a number"
      greater_than: "must be greater than {{count}}"
      greater_than_or_equal_to: "must be greater than or equal to {{count}}"
      equal_to: "must be equal to {{count}}"
      less_than: "must be less than {{count}}"
      less_than_or_equal_to: "must be less than or equal to {{count}}"
      odd: "must be odd"
      even: "must be even"

As a result, the error message will read first_name can't be blank.

The Errors object is also a part of ActiveModel.
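If you want an errors object on a plain Ruby class outside of the validations module, you can instantiate it directly. Here is a minimal sketch (the Ticket class and its title attribute are hypothetical):

require "active_model"

class Ticket
  extend ActiveModel::Naming      # Errors uses the model name to build messages
  extend ActiveModel::Translation # provides human_attribute_name for full messages

  attr_reader :errors, :title

  def initialize(title = nil)
    @title  = title
    @errors = ActiveModel::Errors.new(self)
  end

  # Errors calls this when generating messages for an attribute
  def read_attribute_for_validation(key)
    send(key)
  end
end

ticket = Ticket.new
ticket.errors.add(:title, :blank) if ticket.title.nil?
ticket.errors.full_messages #=> ["Title can't be blank"], via the built-in en locale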

Serialization

ActiveRecord also comes with default serialization for JSON and XML, allowing you to do things like: @person.to_json(:except => :comment).

The most important part of the serialization support is general support for specifying the attributes to include across all serializers. That means that you can do @person.to_xml(:except => :comment) as well.

To add serialization support to your own model, you will need to include the serialization module and implement attributes. Check it out:

class Person
  include ActiveModel::Serialization
 
  attr_accessor :attributes
  def initialize(attributes)
    @attributes = attributes
  end
end
 
p = Person.new(:first_name => "Yukihiro", :last_name => "Matsumoto")
p.to_json #=> %|{"first_name": "Yukihiro", "last_name": "Matsumoto"}|
p.to_json(:only => :first_name) #=> %|{"first_name": "Yukihiro"}|

You can also pass in a :methods option to specify methods to call for certain attributes that are determined dynamically.
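For instance, a sketch along the lines of the example above (full_name here is a hypothetical computed method, not one of the stored attributes):

class Person
  def full_name
    "#{@attributes[:first_name]} #{@attributes[:last_name]}"
  end
end

p = Person.new(:first_name => "Yukihiro", :last_name => "Matsumoto")
p.to_json(:methods => :full_name)
#=> %|{"first_name": "Yukihiro", "last_name": "Matsumoto", "full_name": "Yukihiro Matsumoto"}|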

Here's the Person model with validations and serialization:

class Person
  include ActiveModel::Validations
  include ActiveModel::Serialization
 
  validates_presence_of :first_name, :last_name
 
  attr_accessor :attributes
  def initialize(attributes = {})
    @attributes = attributes
  end
 
  def read_attribute_for_validation(key)
    @attributes[key]
  end
end

Others

Those are just two of the modules available in ActiveModel. Some others include:

  • AttributeMethods: Makes it easy to add attributes that are set like table_name :foo
  • Callbacks: ActiveRecord-style lifecycle callbacks.
  • Dirty: Support for dirty tracking
  • Naming: Default implementations of model.model_name, which are used by ActionPack (for instance, when you do render :partial => model)
  • Observing: ActiveRecord-style observers
  • StateMachine: A simple state-machine implementation for models
  • Translation: The core translation support

This mostly reflects the first step of ActiveRecord extractions done by Josh Peek for his Google Summer of Code project last summer. Over time, I expect to see more extractions from ActiveRecord and more abstractions built up around ActiveModel.

I also expect to see a community building up around things like adding new validators, translations, serializers and more, especially now that they can be reused not only in ActiveRecord, but in MongoMapper, Cassandra Object, and other ORMs that leverage ActiveModel's built-in modules.

The Maximal Usage Doctrine for Open Source

I’ve worked on a number of open source projects over the past several years (the most prominent being Merb, Ruby on Rails and jQuery) and have begun to form some thoughts about the usage (aka adoption) of open source projects and the practical effects of license styles.

The Playing Field

There are essentially two kinds of licenses popularly used in open source[1].

The type I’ve worked with most extensively is the BSD- or MIT-style license. This license allows unlimited usage, modification, distribution, and commercialization of the source code, with two caveats. First, the copyright notice must be distributed with the source code. Second, to the extent legally enforceable, if you use my MIT-licensed source code, you may not sue me for the effects of using the source.

The other major license type is called copyleft, and essentially leverages copyright law to insist that any modifications to the original source must themselves be made public. The practical effect of this license would be impossible without government-enforced copyright laws.

The most popular copyleft license is GPL, which is used by Linux, and which also has viral characteristics. In short, if you link code covered by the GPL into your own code, you are required to open your code as well (if you distribute it). The GPL is not very clear about what this means for more dynamic languages (is requiring a Ruby file “linking”?).

A less viral version of the GPL, the LGPL, requires that any modifications you make to the software itself be released, but drops the “linking” requirement.

Because of the uncertainty surrounding the viral characteristics of the GPL, legal departments in big corporations are very hostile to the GPL license. Software covered by the LGPL license is more likely to get approval by corporate legal, and MIT/BSD licenses are the most well-liked (probably because they don’t confer any obligations on the corporation).

What I Want When I Write Open Source

When I work on a serious open source project, like Ruby on Rails or jQuery, I have relatively simple desires.

One. I want as many people as possible to use the software I am working on. This is probably a selfish, egoist desire, but it may also be a desire to see useful software actually used.

Two. I want some subset of the many users to be willing to report bugs, and for the varied situations the code is used in to help improve the quality of the software as a result. This is essentially “given enough eyeballs, all bugs are shallow”.

Three. I want people who are interested in improving the software to submit patches. Specifically, I want to receive patches from people who have thought through the problem the bug is trying to fix, and want to receive fewer patches from people who have hacked together a solution just to get it to work. In essence, a high signal-to-noise ratio of patches is more important than a high volume of patches.

Four. I want people who are interested in improving the software long term to become long-term contributors and eventually committers.

Of these concerns, numbers two and three are the highest priorities. I want to expose the code to as much real-world pressure as possible, and I want as many high-quality patches to fix those bugs as possible. If I had to choose between those two, I would pick number two: exposing my code to real-world stress is the best way to rapidly ferret out bugs and incorrect design assumptions that I know of.

Meeting Those Requirements

Starting out with the easiest, my first desire, to have my software used as much as possible, is most easily satisfied by an extremely liberal usage policy. Adding restrictions on the use of software I write reduces its adoption almost by definition.

Much more importantly, the same can be said about exposing code to real world stresses. By far the most important way to achieve this goal is to make it as easy as possible for as many people as possible to use the code.

If only 1% of all proprietary users of the source ever report bugs, that’s 1% of potentially thousands of users, as opposed to 100% of the zero proprietary users who were able to use the software under a more restrictive usage scheme. In practice, this number is much more than 1%, as proprietary users of software experience and report bugs just like open source users do.

The only real counter-argument to this is that by forcing users to contribute, some number of proprietary users will be forced to become open source users, and their contributions will outweigh the smaller contributions of proprietary users. In practice, proprietary users choose proprietary solutions instead when they are forced to choose between restrictive open source usage schemes and other proprietary software.

There is also much to be said for introducing proprietary developers to the open source ecosystem. Proprietary developers have access to things the open source community often lacks: paid time, obscure usage scenarios, and markets with low open source penetration.

The next major desire I have while working on open source is a steady stream of high-quality patches. This should be advantage copyleft, because all users of the software are forced to contribute back. However, since copyleft licenses are not used in proprietary environments anyway, the patches to open source projects from those environments under more permissive licenses are much more numerous. Again, even if only a few percent of proprietary users contribute back to the project, that is significantly more contributions than the 100% of zero proprietary users.

Also importantly, these patches come from dedicated users of the software rather than being forced contributions. I have never heard a team member on an open source project say that their permissively licensed software receives inadequate patches, and that the problem would be solved by moving to a more restrictive model.

Finally, I can only speak for jQuery and Rails (and other smaller open source projects I work on), but a large number of new long-term contributors became involved while working on corporate projects, where a permissive license made the decision to use the software in the first place feasible.

Lower Barrier to Entry Helps Meet Open Source Goals

Regardless of how much of the above argument you agree with, it is clear that copyleft licenses intentionally impose a higher barrier to entry for usage than more permissive licenses.

For projects that use more permissive licenses, the fact that many proprietary users of their software do not contribute back changes is a feature, not a bug.

That’s because we don’t focus on all of the people who use our software without contributing back. Instead, we focus on all the users, bug reports, patches, and long term contributors we have gained by keeping the barrier to entry as low as possible.

In the end, the world of ubiquitous open source is here. And we made it without having to resort to coercion or the formation of an entirely self-contained set of software. We made it by building great software and convincing real users that it was better than the alternative.

Postscript: Linux

Linux is a peculiar example because its license has not impeded its usage much. In part, that is because most users of Linux do not make changes to it, and the linking characteristics of the GPL license rarely come into play.

In cases like this, I would argue that copyleft licenses are close enough to usage maximization to get most of the benefits of more permissive licenses. However, it’s not clear to me what benefits the copyleft licenses provide in those (relatively rare) cases.

[1] There are other kinds of licenses, like the Affero license, which is even more restrictive than the GPL license. I would classify Affero as copyleft.

The Craziest F***ing Bug I’ve Ever Seen

This afternoon, I was telling a friend about one of my exploits tracking down a pretty crazy heisenbug, and he said he thought other people would be interested in hearing about it. So let me tell you about it.

Before you continue, if you’re not interested in relatively arcane technical details, feel free to skip this post. It’s here mainly because a friend said he thought people would be interested in it.

Our Story Begins

Our story begins with a small change in the way Ruby 1.9 handles implicit coercion. The most common case of implicit coercion is in Array#flatten. The basic logic is that Ruby checks to see if an element of the Array can itself be coerced into an Array, which it then does before flattening.

There are two steps to the process:

  1. Check to see if the element can be coerced into an Array
  2. If it can, call to_ary on the element, and repeat the process recursively

In Ruby 1.8, the process is essentially the following:

if obj.respond_to?(:to_ary)
  obj.__send__(:to_ary)
else
  obj
end

In Ruby 1.9, it was changed to:

begin
  obj.__send__(:to_ary)
rescue NoMethodError
  obj
end

Of course, the internal code was implemented in C, but you get the idea. This change subtly affects objects that implement method_missing:

class MyObject
  def method_missing(meth, *)
    if meth == :testing
      puts "TESTING"
    else
      puts "Calling Super"
      super
    end
  end
end

In Ruby 1.8, #flatten will first call #respond_to? which returns false and therefore doesn’t ever trigger the #method_missing. In Ruby 1.9, #method_missing will be triggered blindly, and because the call to super should raise a NoMethodError, we will essentially get the same result.

There are some subtle differences here, but for the vast majority of cases, the behavior is identical or close enough to not matter.

A Weird Quirk

When the above change landed in Ruby 1.9.2’s head, Rails started experiencing weird, intermittent behavior. As you might know, Rails overrides method_missing on NilClass to provide a guess about what object you were expecting instead of nil. This feature is called “whiny nils” and can be enabled or disabled in Rails applications:

def method_missing(method, *args, &block)
  if klass = METHOD_CLASS_MAP[method]
    raise_nil_warning_for klass, method, caller
  else
    super
  end
end

But sometimes, when nil was inside an Array and the Array was flattened, the flatten failed entirely with NameError: undefined local variable or method `to_ary' for nil:NilClass.

The bug appeared intermittently, apparently unrelated to the code that was calling it. We worked around it by catching the NameError and reraising a NoMethodError, but the existence of the bug was rather baffling.
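The workaround looked something like this (a simplified sketch of the idea, not the exact ActiveSupport patch):

begin
  array.flatten
rescue NameError => e
  # Ruby 1.9 sometimes raised NameError here instead of NoMethodError;
  # normalize it so callers can rescue consistently
  raise NoMethodError, e.message if e.message =~ /to_ary/
  raise
end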

The Bug Reappears

When working on bundler, Carl and I saw the bug again. This time, we didn’t want to let it go. In this case, we had four virtually identical tests with four identical stack traces leading to NoMethodError or NameError seemingly at random. Something didn’t add up.

After hunting bugs for a few hours (having dug deeply into Ruby’s source), I brought in Evan Phoenix (of Rubinius). At first, Evan was baffled as well, but he correctly pointed out that the key to understanding what was going on was the difference between NameError and NoMethodError in Ruby.

x = Object.new
x.no_method #=> NoMethodError
 
class Testing
  def vcall_no_method
    no_method #=> NameError
  end
 
  def call_no_method
    self.no_method #=> NoMethodError
  end
end

In essence, when Ruby sees a method call with implicit self (which might be a typo’ed local variable), it raises a NameError, rather than a NoMethodError.

However, it still didn’t add up, because the call to to_ary was internal, and shouldn’t have been randomly interpreted as a vcall (a call to a method with implicit self) or a normal call.

Tracking it Down

Evan dug deeply into the Ruby source, and found how Ruby was determining whether the call was a vcall or a regular call.

static inline VALUE
method_missing(VALUE obj, ID id, int argc, const VALUE *argv, int call_status)
{
    VALUE *nargv, result, argv_ary = 0;
    rb_thread_t *th = GET_THREAD();
 
    th->method_missing_reason = call_status;
    th->passed_block = 0;
    // ...
}

Before calling the method_missing method itself, Ruby sets a thread-local method_missing_reason (from the call_status argument) that reflects whether the original call was a vcall or a normal call.

Upon further examination, Evan discovered that the call to method_missing in the case of type coercion did not change the thread-local method_missing_reason.

As a result, Ruby raised a NoMethodError or NameError based upon whether the last call to method_missing was a vcall or regular call.

The fix was a simple one-line patch, which allowed us to remove the workaround in ActiveSupport, but it was by far the craziest f***ing bug I’ve ever seen.

Spinning up a new Rails app

So people have been attempting to get a Rails 3 app up and running recently. I also have some apps in development on Rails 3, so I’ve been experiencing some of the same problems many others have.

The other night, I worked with sferik to start porting merb-admin over to Rails. Because this process involved being on edge Rails, we honed the workflow down to a few simple, repeatable steps.

The Steps

Step 1: Check out Rails

$ git clone git://github.com/rails/rails.git

Step 2: Generate a new app

$ ruby rails/railties/bin/rails new_app
$ cd new_app

Step 3: Edit the app’s Gemfile

# Add to the top
directory "/path/to/rails", :glob => "{*/,}*.gemspec"
git "git://github.com/rails/arel.git"
git "git://github.com/rails/rack.git"

Step 4: Bundle

$ gem bundle

Done

Everything should now work: script/server, script/console, etc.

If you want to check your copy of Rails into your app, you can copy it into the app and then change your Gemfile to point to the relative location.

For instance, if you copy it into vendor/rails, you can make the first line of the Gemfile directory "vendor/rails", :glob => "{*/,}*.gemspec". You’ll want to run gem bundle again after changing the Gemfile, of course.

The Rails 3 Router: Rack it Up

In my previous post about generic actions in Rails 3, I made reference to significant improvements in the router. Some of those have been covered on other blogs, but the full scope of the improvements hasn’t yet been covered.

In this post, I’ll cover a number of the larger design decisions, as well as specific improvements that have been made. Most of these features were in the Merb router, but the Rails DSL is more fully developed, and the stronger emphasis on Rack is a marked improvement over the Merb approach.

Improved DSL

While the old map.connect DSL still works just fine, the new standard DSL is less verbose and more readable.

# old way
ActionController::Routing::Routes.draw do |map|
  map.connect "/main/:id", :controller => "main", :action => "home"
end
 
# new way
Basecamp::Application.routes do
  match "/main/:id", :to => "main#home"
end

First, the routes are attached to your application, which is now its own object and used throughout Railties. Second, we no longer need map, and the new DSL (match/to) is more expressive. Finally, we have a shortcut for controller/action pairs ("main#home" is {:controller => "main", :action => "home"}).

Another useful shortcut allows you to specify the method more simply than before:

Basecamp::Application.routes do
  post "/main/:id", :to => "main#home", :as => :homepage
end

The :as in the above example specifies a named route, and creates the homepage_url et al helpers as in Rails 2.
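Given that route, the named route helpers work as you would expect in controllers and views (the host shown here is assumed):

homepage_path(:id => 1) #=> "/main/1"
homepage_url(:id => 1)  #=> "http://example.com/main/1"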

Rack It Up

When designing the new router, we all agreed that it should be built first as a standalone piece of functionality, with Rails sugar added on top. As a result, we used rack-mount, which was built by Josh Peek as a standalone Rack router.

Internally, the router simply matches requests to a Rack endpoint, and knows nothing about controllers or controller semantics. Essentially, the router is designed to work like this:

Basecamp::Application.routes do
  match "/home", :to => HomeApp
end

This will match requests with the /home path, and dispatch them to a valid Rack application at HomeApp. This means that dispatching to a Sinatra app is trivial:

class HomeApp < Sinatra::Base
  get "/" do
    "Hello World!"
  end
end
 
Basecamp::Application.routes do
  match "/home", :to => HomeApp
end

The one small piece of the puzzle that might have you wondering at this point is that in the previous section, I showed the usage of :to => "main#home", and now I say that :to takes a Rack application.

Another improvement in Rails 3 bridges this gap. In Rails 3, PostsController.action(:index) returns a fully valid Rack application pointing at the index action of PostsController. So main#home is simply a shortcut for MainController.action(:home), and it otherwise is identical to providing a Sinatra application.
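In other words, the object returned by action is an ordinary Rack endpoint. A quick sketch (env here stands for any valid Rack environment hash):

app = PostsController.action(:index)
app.respond_to?(:call) #=> true

# the router invokes it just like the Sinatra app above
status, headers, body = app.call(env)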

As I posted before, this is also the engine behind match "/foo", :to => redirect("/bar").

Expanded Constraints

Probably the most common desired improvement to the Rails 2 router has been support for routing based on subdomains. There is currently a plugin called subdomain_routes that implements this functionality as follows:

ActionController::Routing::Routes.draw do |map|
  map.subdomain :support do |support|
    support.resources :tickets
    ...
  end
end

This solves the most common case, but the reality is that this is just one common case. In truth, it should be possible to constrain routes based not just on path segments, method, and subdomain, but also based on any element of the request.

The Rails 3 router exposes this functionality. Here is how you would constrain requests based on subdomains in Rails 3:

Basecamp::Application.routes do
  match "/foo/bar", :to => "foo#bar", :constraints => {:subdomain => "support"}
end

These constraints can include path segments as well as any method on ActionDispatch::Request. You could use a String or a regular expression, so :constraints => {:subdomain => /support\d/} would be valid as well.

Arbitrary constraints can also be specified in block form, as follows:

Basecamp::Application.routes do
  constraints(:subdomain => "support") do
    match "/foo/bar", :to => "foo#bar"
  end
end

Finally, constraints can be specified as objects:

class SupportSubdomain
  def self.matches?(request)
    request.subdomain == "support"
  end
end
 
Basecamp::Application.routes do
  constraints(SupportSubdomain) do
    match "/foo/bar", :to => "foo#bar"
  end
end

Optional Segments

In Rails 2.3 and earlier, there were some optional segments. Unfortunately, they had hardcoded names and were not controllable. Since we’re using a generic router, magical optional segment names and semantics would not do. And having exposed support for optional segments in Merb was pretty nice. So we added them.

# Rails 2.3
ActionController::Routing::Routes.draw do |map|
  # Note that :action and :id are optional, and
  # :format is implicit
  map.connect "/:controller/:action/:id"
end
 
# Rails 3
Basecamp::Application.routes do
  # equivalent
  match "/:controller(/:action(/:id))(.:format)"
end

In Rails 3, we can be explicit about the optional segments, and even nest optional segments. If we want the format to be a prefix path, we can do match "(/:format)/home" and the format is optional. We can use a similar technique to add an optional company ID prefix or a locale.
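For example, here is a sketch of an optional locale prefix (the controller and action names are assumed):

Basecamp::Application.routes do
  # matches both "/home" and "/en/home"
  match "(/:locale)/home", :to => "homepage#show"
end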

Pervasive Blocks

You may have noticed this already, but as a general rule, if you can specify something as an inline condition, you can also specify it as a block constraint.

Basecamp::Application.routes do
  controller :home do
    match "/:action"
  end
end

In the above example, we are not required to specify the controller inline, because we specified it via a block. You can use the same technique for subdomains, controller restrictions, and request methods (get, post, etc. each take a block). There is also a scope method that can be used to scope a block of routes under a top-level path:

Basecamp::Application.routes do
  scope "/home" do
    match "/:action", :to => "homepage"
  end
end

The above route would match /home/hello to homepage#hello.

Closing

There are additional (substantial) improvements around resources, which I will save for another time, assuming someone else doesn’t get to it first.

Generic Actions in Rails 3

So Django has an interesting feature called “generic views”, which essentially allow you to render a template with generic code. In Rails, the same feature would be called “generic actions” (just a terminology difference).

This was possible, but somewhat difficult in Rails 2.x, but it’s a breeze in Rails 3.

Let’s take a look at a simple generic view in Django, the “redirect_to” view:

urlpatterns = patterns('django.views.generic.simple',
    ('^foo/(?P<id>\d+)/$', 'redirect_to', {'url': '/bar/%(id)s/'}),
)

This essentially redirects "/foo/<id>/" to "/bar/<id>/". In Rails 2.3, a way to achieve equivalent behavior was to create a generic controller that handled this:

class GenericController < ApplicationController
  def redirect
    redirect_to(params[:url] % params, params[:options])
  end
end

And then you could use this in your router:

map.connect "/foo/:id", :controller => "generic", :action => "redirect", :url => "/bar/%{id}s"

This uses the new Ruby 1.9 interpolation syntax ("%{first} %{last}" % {:first => "hello", :last => "sir"} == "hello sir") that has been backported to Ruby 1.8 via ActiveSupport.

Better With Rails 3

However, this is a bit clumsy, and requires us to have a special controller to handle this (relatively simple) case. It also saddles us with the conceptual overhead of a controller in the router itself.

Here’s how you do the same thing in Rails 3:

match "/foo/:id", :to => redirect("/bar/%{id}s")

This is built into Rails 3’s router, but the way it works is actually pretty cool. The Rails 3 router is conceptually decoupled from Rails itself, and the :to key points at a Rack endpoint. For instance, the following would be a valid route in Rails 3:

match "/foo", :to => proc {|env| [200, {}, ["Hello world"]] }

The redirect method simply returns a rack endpoint that knows how to handle the redirection:

def redirect(*args, &block)
  options = args.last.is_a?(Hash) ? args.pop : {}
 
  path = args.shift || block
  path_proc = path.is_a?(Proc) ? path : proc {|params| path % params }
  status = options[:status] || 301
 
  lambda do |env|
    req = Rack::Request.new(env)
    params = path_proc.call(env["action_dispatch.request.path_parameters"])
    url = req.scheme + '://' + req.host + params
    [status, {'Location' => url, 'Content-Type' => 'text/html'}, ['Moved Permanently']]
  end
end

There’s a few things going on here, but the important part is the last few lines, where the redirect method returns a valid Rack endpoint. If you look closely at the code, you can see that the following would be valid as well:

match "/api/v1/:api", :to => 
  redirect {|params| "/api/v2/#{params[:api].pluralize}" }
 
# and
 
match "/api/v1/:api", :to => 
  redirect(:status => 302) {|params| "/api/v2/#{params[:api].pluralize}" }

Another Generic Action

Another nice generic action that Django provides is allowing you to render a template directly without needing an explicit action. It looks like this:

urlpatterns = patterns('django.views.generic.simple',
    (r'^foo/$',             'direct_to_template', {'template': 'foo_index.html'}),
    (r'^foo/(?P<id>\d+)/$', 'direct_to_template', {'template': 'foo_detail.html'}),
)

This provides a special mechanism for rendering a template directly from the Django router. Again, this could be implemented by creating a special controller in Rails 2 and used as follows:

class GenericController < ApplicationController
  def direct_to_template
    render(params[:options])
  end
end
 
# Router
map.connect "/foo", :controller => "generic", :action => "direct_to_template", :options => {:template => "foo_detail"}

A Prettier API

A nicer way to do this would be something like this:

match "/foo", :to => render("foo")

For the sake of clarity, let’s say that directly rendered templates will come out of app/views/direct unless otherwise specified. Also, let’s say that the render method should work identically to the render method used in Rails controllers themselves, so that render :template => "foo", :status => 201, :content_type => Mime::JSON et al will work as expected.

In order to make this work, we’ll use ActionController::Metal, which exposes a Rack-compatible object with access to all of the powers of a full ActionController::Base object.

class RenderDirectly < ActionController::Metal
  include ActionController::Rendering
  include ActionController::Layouts
 
  append_view_path Rails.root.join("app", "views", "direct")
  append_view_path Rails.root.join("app", "views")
 
  layout "application"
 
  def index
    render *env["generic_views.render_args"]
  end
end
 
module GenericActions
  module Render
    def render(*args)
      app = RenderDirectly.action(:index)
      lambda do |env|
        env["generic_views.render_args"] = args
        app.call(env)
      end
    end
  end
end

The trick here is that we’re subclassing ActionController::Metal and pulling in just Rendering and Layouts, which gives you full access to the normal rendering API without any of the other overhead of normal controllers. We add both the direct directory and the normal view directory to the view path, which means that any templates you place inside app/views/direct will be used first, but it’ll fall back to the normal view directory for layouts or partials. We also specify that the layout is application, which is not the default in this case, since our metal controller does not inherit from ApplicationController.

Note for the Curious

In all normal application cases, Rails will look up the inheritance chain for a named layout matching the controller name. This means that the Rails 2 behavior, which allows you to provide a layout named after the controller, still works exactly the same as before, and that ApplicationController is just another controller name, and application.html.erb is its default layout.

And then, the actual use in your application:

Rails.application.routes do
  extend GenericActions
 
  match "/foo", :to => render("foo_index")
  # match "/foo" => render("foo_index") is a valid shortcut for the simple case
  match "/foo/:id", :constraints => {:id => /\d+/}, :to => render("foo_detail")
end

Of course, because we’re using a real controller shell, you’ll be able to use any other options available on the render (like :status, :content_type, :location, :action, :layout, etc.).
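For example, a hedged sketch (the route and template name are hypothetical):

Rails.application.routes do
  extend GenericActions

  # renders app/views/direct/teapot.html.erb with a 418 status and no layout
  match "/teapot", :to => render("teapot", :status => 418, :layout => false)
end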

Metaprogramming in Ruby: It’s All About the Self

After writing my last post on Rails plugin idioms, I realized that Ruby metaprogramming, at its core, is actually quite simple.

It comes down to the fact that all Ruby code is executed code; there is no separate compile or runtime phase. In Ruby, every line of code is executed against a particular self. Consider the following five snippets:

class Person
  def self.species
    "Homo Sapien"
  end
end
 
class Person
  class << self
    def species
      "Homo Sapien"
    end
  end
end
 
class << Person
  def species
    "Homo Sapien"
  end
end
 
Person.instance_eval do
  def species
    "Homo Sapien"
  end
end
 
def Person.species
  "Homo Sapien"
end

All five of these snippets define a Person.species that returns Homo Sapien. Now consider another set of snippets:

class Person
  def name
    "Matz"
  end
end
 
Person.class_eval do
  def name
    "Matz"
  end
end

Both snippets define a method called name on the Person class, so Person.new.name will return “Matz”. For those familiar with Ruby, this isn’t news. When learning about metaprogramming, each of these snippets is presented in isolation: another mechanism for getting methods where they “belong”. In fact, however, there is a single unified reason that all of these snippets work the way they do.

First, it is important to understand how Ruby’s metaclass works. When you first learn Ruby, you learn about the concept of the class, and that each object in Ruby has one:

class Person
end
 
Person.class #=> Class
 
class Class
  def loud_name
    "#{name.upcase}!"
  end
end
 
Person.loud_name #=> "PERSON!"

Person is an instance of Class, so any methods added to Class are available on Person as well. What they don’t tell you, however, is that each object in Ruby also has its own metaclass, a Class that can have methods, but is only attached to the object itself.

matz = Object.new
def matz.speak
  "Place your burden to machine's shoulders"
end

What’s going on here is that we’re adding the speak method to matz’s metaclass, and the matz object inherits from its metaclass and then Object. The reason this is somewhat less clear than ideal is that the metaclass is invisible in Ruby:

matz = Object.new
def matz.speak
  "Place your burden to machine's shoulders"
end
 
matz.class #=> Object

In fact, matz’s “class” is its invisible metaclass. We can even get access to the metaclass:

metaclass = class << matz; self; end
metaclass.instance_methods.grep(/speak/) #=> ["speak"]

At this point in other articles on this topic, you’re probably struggling to keep all of the details in your head; it seems as though there are so many rules. And what’s this class << matz thing anyway?

It turns out that all of these weird rules collapse down into a single concept: control over the self in a given part of the code. Let’s go back and take a look at some of the snippets we looked at earlier:

class Person
  def name
    "Matz"
  end
 
  self.name #=> "Person"
end

Here, we are adding the name method to the Person class. Once we say class Person, the self until the end of the block is the Person class itself.

Person.class_eval do
  def name
    "Matz"
  end
 
  self.name #=> "Person"
end

Here, we’re doing exactly the same thing: adding the name method to instances of the Person class. In this case, class_eval is setting the self to Person until the end of the block. This is all perfectly straightforward when dealing with classes, but it’s equally straightforward when dealing with metaclasses:

def Person.species
  "Homo Sapien"
end
 
Person.name #=> "Person"

As in the matz example earlier, we are defining the species method on Person’s metaclass. We have not manipulated self, but you can see that using def with an object attaches the method to the object’s metaclass.

class Person
  def self.species
    "Homo Sapien"
  end
 
  self.name #=> "Person"
end

Here, we have opened the Person class, setting self to Person for the duration of the block, as in the example above. However, we are defining a method on Person’s metaclass here, since we’re defining the method on an object (self). Also, you can see that self.name while inside the Person class is identical to Person.name while outside it.

class << Person
  def species
    "Homo Sapien"
  end
 
  self.name #=> ""
end

Ruby provides a syntax for accessing an object’s metaclass directly. By doing class << Person, we are setting self to Person’s metaclass for the duration of the block. As a result, the species method is added to Person’s metaclass, rather than the class itself.

class Person
  class << self
    def species
      "Homo Sapien"
    end
 
    self.name #=> ""
  end
end

Here, we combine several of the techniques. First, we open Person, making self equal to the Person class. Next, we do class << self, making self equal to Person’s metaclass. When we then define the species method, it is defined on Person’s metaclass.

Person.instance_eval do
  def species
    "Homo Sapien"
  end
 
  self.name #=> "Person"
end

The last case, instance_eval, actually does something interesting. It breaks apart the self into the self that is used to execute methods and the self that is used when new methods are defined. When instance_eval is used, new methods are defined on the metaclass, but the self is the object itself.
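A quick way to see the difference is to run a def through both eval methods (hello and howdy are throwaway method names):

Person.class_eval    { def hello() "hi" end } # defines an instance method
Person.instance_eval { def howdy() "hi" end } # defines a method on the metaclass

Person.new.hello #=> "hi"
Person.howdy     #=> "hi"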

In some of these cases, the multiple ways to achieve the same thing arise naturally out of Ruby’s semantics. After this explanation, it should be clear that def Person.species, class << Person; def species, and class Person; class << self; def species aren’t three ways to achieve the same thing by design, but that they arise out of Ruby’s flexibility with regard to specifying what self is at any given point in your program.

On the other hand, class_eval is slightly different. Because it takes a block rather than acting as a keyword, it captures the local variables surrounding it. This can provide powerful DSL capabilities, in addition to controlling the self used in a code block. But other than that, it is exactly identical to the other constructs used here.

Finally, instance_eval breaks apart the self into two parts, while also giving you access to local variables defined outside of it.

In the following table, defines a new scope means that code inside the block does not have access to local variables outside of the block.

mechanism             method resolution    method definition    new scope?
class Person          Person               same                 yes
class << Person       Person’s metaclass   same                 yes
Person.class_eval     Person               same                 no
Person.instance_eval  Person               Person’s metaclass   no

Also note that class_eval is only available to Modules (note that Class inherits from Module) and is an alias for module_eval. Additionally, instance_exec, which was added to Ruby in 1.8.7, works exactly like instance_eval, except that it also allows you to send variables into the block.
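Here is a small sketch of instance_exec (the species definition is just for illustration):

label = "Homo Sapien"

Person.instance_exec(label) do |name|
  # self is Person, and name was passed in from outside the block
  define_method(:species) { name }
end

Person.new.species #=> "Homo Sapien"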

UPDATE: Thank you to Yugui of the Ruby core team for correcting the original post, which ignored the fact that self is broken into two in the case of instance_eval.

Better Ruby Idioms

Carl and I have been working on the plugins system over the past few days. As part of that process, we read through the Rails Plugin Guide. While reading through the guide, we noticed a number of idioms presented in the guide that are serious overkill for the task at hand.

I don’t blame the author of the guide; the idioms presented are roughly the same that have been used since the early days of Rails. However, looking at them brought back memories of my early days using Rails, when the code made me feel as though Ruby was full of magic incantations and ceremony to accomplish relatively simple things.

Here’s an example:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    # any method placed here will apply to classes, like Hickwall
    def acts_as_something
      send :include, InstanceMethods
    end
  end
 
  module InstanceMethods
    # any method placed here will apply to instances, like @hickwall
  end
end

To begin with, the send is completely unneeded. The acts_as_something method will be run on the Class itself, giving the method access to include, a private method.

This code is intended to be used as follows:

class ActiveRecord::Base
  include Yaffle
end
 
class Article < ActiveRecord::Base
  acts_as_yaffle
end

What the code does is:

  1. Register a hook so that when the module is included, the ClassMethods are extended onto the class
  2. In that module, define a method that includes the InstanceMethods
  3. So that you can say acts_as_something in your code

The crazy thing about all of this is that it’s completely reinventing the module system that Ruby already has. This would be exactly identical:

module Yaffle
  # any method placed here will apply to classes, like Hickwall
  def acts_as_something
    send :include, InstanceMethods
  end
 
  module InstanceMethods
    # any method placed here will apply to instances, like @hickwall
  end
end

To be used via:

class ActiveRecord::Base
  extend Yaffle
end
 
class Article < ActiveRecord::Base
  acts_as_yaffle
end

In a nutshell, there’s no point in overriding include to behave like extend when Ruby provides both!

To take this a bit further, you could do:

module Yaffle
  # any method placed here will apply to instances, like @hickwall, 
  # because that's how modules work!
end

To be used via:

class Article < ActiveRecord::Base
  include Yaffle
end

In effect, the initial code (override included hook to extend a method on, which then includes a module) is two layers of abstraction around a simple Ruby include!

Let’s look at a few more examples:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    def acts_as_yaffle(options = {})
      cattr_accessor :yaffle_text_field
      self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    end
  end
end
 
ActiveRecord::Base.send :include, Yaffle

Again, we have the idiom of overriding include to behave like extend (instead of just using extend!).

A better solution:

module Yaffle
  def acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
  end
end
 
ActiveRecord::Base.extend Yaffle

In this case, it’s appropriate to use an acts_as_yaffle, since you’re providing additional options which could not be encapsulated using the normal Ruby extend.
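Usage then looks like this (assuming ActiveRecord::Base.extend Yaffle has been run, as above):

class Article < ActiveRecord::Base
  acts_as_yaffle :yaffle_text_field => :last_tweet
end

Article.yaffle_text_field #=> "last_tweet"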

Another “more advanced” case:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    def acts_as_yaffle(options = {})
      cattr_accessor :yaffle_text_field
      self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
      send :include, InstanceMethods
    end
  end
 
  module InstanceMethods
    def squawk(string)
      write_attribute(self.class.yaffle_text_field, string.to_squawk)
    end
  end
end
 
ActiveRecord::Base.send :include, Yaffle

Again, we have the idiom of overriding include to pretend to be an extend, and a send where it is not needed. Identical functionality:

module Yaffle
  def acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    include InstanceMethods
  end
 
  module InstanceMethods
    def squawk(string)
      write_attribute(self.class.yaffle_text_field, string.to_squawk)
    end
  end
end
 
ActiveRecord::Base.extend Yaffle

Of course, it is also possible to do:

module Yaffle
  def squawk(string)
    write_attribute(self.class.yaffle_text_field, string.to_squawk)
  end
end
 
class ActiveRecord::Base
  def self.acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    include Yaffle
  end
end

Since the module is always included in ActiveRecord::Base, there is no reason that the earlier code, with its additional modules and use of extend, is superior to simply reopening the class and adding the acts_as_yaffle method directly. Now we can put the squawk method directly inside the Yaffle module, where it can be included cleanly.

It may not seem like a huge deal, but it significantly reduces the amount of apparent magic in the plugin pattern, making it more accessible for new users. Additionally, it exposes the new user to include and extend quickly, instead of making them feel as though they were magic incantations requiring the use of send and special modules named ClassMethods in order to get them to work.

To be clear, I’m not saying that these idioms aren’t sometimes needed in special, advanced cases. However, I am saying that in the most common cases, they’re huge overkill that obscures the real functionality and confuses users.

Using the New Gem Bundler Today

As you might have heard, Carl and I released a new project that allows you to bundle your gems (both pure-ruby and native) with your application. Before I get into the process for using the bundler today, I’d like to go into the design goals of the project.

  • The bundler should allow the specification of all dependencies in a separate place from the application itself. In other words, it should be possible to determine the dependencies for an application without needing to start up the application.
  • The bundler should have a built-in dependency resolving mechanism, so it can determine the required gems for an entire set of dependencies.
  • Once the dependencies are resolved, it should be possible to get the application up and running on a new system without needing to check Rubyforge (or gemcutter) again. This is especially important for compiled gems (it should be possible to get the list of required gems once and compile on remote systems as desired).
  • Above all else, the bundler should provide a reproducible installation of Ruby applications. New gem releases or downed remote servers should not be able to impact the successful installation of an application. In most cases, git clone; gem bundle should be all that is needed to get an application on a new system and up and running.
  • Finally, the bundler should not assume anything about Rails applications. While it should work flawlessly in the context of a Rails application, this should be because a Rails application is a Ruby application.

Using the Bundler Today

To use the gem bundler today in a non-Rails application, follow these steps:

  1. gem install bundler
  2. Create a Gemfile in the root of your application
  3. Add dependencies to your Gemfile. See below for more details on the sorts of things you can do in the Gemfile. At the simplest level, gem "gem_name", "version" will add a dependency of the gem and version to your application
  4. At the root, run gem bundle. The bundler should tell you that it is resolving dependencies, then downloading and installing the gems.
  5. Add vendor/gems/gems, vendor/gems/specifications, vendor/gems/doc, and vendor/gems/environment.rb to your .gitignore file
  6. Inside your application, require vendor/gems/environment.rb to add the bundled dependencies to your load paths.
  7. Use Bundler.require_env :optional_environment to actually require the files.
  8. After committing, run gem bundle in a fresh clone to re-expand the gems. Since you left the vendor/gems/cache in your source control, new machines will be guaranteed to use the same files as the original machine, requiring no remote dependencies
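For reference, a minimal Gemfile along these lines might look like this (the gems named are just examples):

gem "rack", "1.0.1"
gem "nokogiri"                     # the version is optional
gem "rack-test", :only => :testing # only needed in the :testing environment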

The bundler will also install binaries into the app’s bin directory. You can, therefore, run bin/rackup for instance, which will ensure that the local bundle, rather than the system, is used. You can also run gem exec rackup, which runs any command using the local bundle. This allows things like gem exec ruby -e "puts Nokogiri::VERSION" or the even more adventurous gem exec bash, which will open a new shell in the context of the bundle.

Gemfile

You can do any of the following in the Gemfile:

  • gem "name", "version": version may be a strict version or a version requirement like &gt;= 1.0.6. The version is optional.
  • gem "name", "version", :require_as =&gt; "file": the require_as allows you to specify which file should be required when the require_env is called. By default, it is the gem’s name
  • gem "name", "version", :only =&gt; :testing: The environment name can be anything. It is used later in your require_env call. You may specify either :only, or :except constraints
  • gem "name", "version", :git =&gt; "git://github.com/wycats/thor": Specify a git repository to be used to satisfy the dependency. You must use a hard dependency (“1.0.6″) rather than a soft dependency (“>= 1.0.6″). If a .gemspec is found in the repository, it is used for further dependency lookup. If the repository has multiple .gemspecs, each directory will a .gemspec will be considered a gem.
  • gem "name", "version", :git =&gt; "git://github.com/wycats/thor", :branch =&gt; "experimental": Further specify a branch, tag, or ref to use. All of :branch, :tag, and :ref are valid options
  • gem "name", "version", :vendored_at =&gt; "vendor/nokogiri": In the next version of bundler, this option will be changing to :path. This specifies that the dependency can be found in the local file system, rather than remotely. It is resolved relative to the location of the Gemfile
  • clear_sources: Empties the list of gem sources to search inside of.
  • source "http://gems.github.com": Adds a gem source to the list of available gem sources.
  • bundle_path "vendor/my_gems": Changes the default location of bundled gems from vendor/gems
  • bin_path "my_executables": Changes the default location of the installed executables
  • disable_system_gems: Without this command, both bundled gems and system gems will be used. You can therefore have things like ruby-debug in your system and use it. However, it also means that you may be using something in development mode that is installed on your system but not available in production. For this reason, it is best to disable_system_gems
  • disable_rubygems: This completely disables rubygems, reducing startup times considerably. However, it often doesn’t work if libraries you are using depend on features of Rubygems. In this mode, the bundler shims the features of Rubygems that we know people are using, but it’s possible that someone is using a feature we’re unaware of. You are free to try disable_rubygems first, then remove it if it doesn’t work. Note that Rails 2.3 cannot be made to work in this mode
  • only(:environment) { gem "rails" }: You can use only or except in block mode to specify a number of gems at once

Bundler process

When you run gem bundle, a few things happen. First, the bundler attempts to resolve your list of dependencies against the gems you have already bundled. If they don’t resolve, the metadata for each specified source is fetched and the gems are downloaded. Next (either way), the bundler checks to see whether the downloaded gems are expanded. For any gem that is not yet expanded, the bundler expands it. Finally, the bundler creates the environment.rb file with the new settings. This means that running gem bundle over and over again will be extremely fast, because after the first time, all gems are downloaded and expanded. If you change settings, like disable_rubygems, running gem bundle again will simply regenerate the environment.rb.

Rails 2.3

To get this working with Rails 2.3, you need to create a preinitializer.rb and insert the following:

require "#{File.dirname(__FILE__)}/../vendor/bundler_gems/environment"
 
class Rails::Boot
  def run
    load_initializer
    extend_environment
    Rails::Initializer.run(:set_load_path)
  end
 
  def extend_environment
    Rails::Initializer.class_eval do
      old_load = instance_method(:load_environment)
      define_method(:load_environment) do
        Bundler.require_env RAILS_ENV
        old_load.bind(self).call
      end
    end
  end
end

It’s a bit ugly, but you can copy and paste that code and forget it. Astute readers will notice that we’re using vendor/bundler_gems/environment.rb. This is because Rails 2.3 attaches special, irrevocable meaning to vendor/gems. As a result, make sure to do the following in your Gemfile: bundle_path "vendor/bundler_gems".

Gemcutter uses this setup and it’s working great for them.

Bundler 0.7

We’re going to be releasing Bundler 0.7 tomorrow. It has some new features:

  • List outdated gems by passing --outdated-gems. Bundler conservatively does not update your gems simply because a new version came out that satisfies the requirement. This is so that you can be sure that the versions running on your local machine will make it safely to production. This will allow you to check for outdated gems so you can decide whether to update your gems with --update. Hat tip to manfred, who submitted this patch as part of his submission to the Rumble at Ruby en Rails
  • Specify the build requirements for gems in a YAML file that you specify with --build-options. The file looks something like this:
    mysql:
      config: /path/to/mysql_config

    This is equivalent to --with-mysql-config=/path/to/mysql_config

  • Specify :bundle => false to indicate that you want to use system gems for a particular dependency. This will ensure that it gets resolved correctly during dependency resolution but does not need to be included in the bundle
  • Support for multiple bundles containing multiple platforms. This is especially useful for people moving back and forth between Ruby 1.8 and 1.9 and don’t want to constantly have to nuke and rebundle
  • Deprecate :vendored_at and replace with :path
  • A new directory DSL method in the Gemfile:
    directory "vendor/rails" do
      gem "activesupport", :path =&gt; "activesupport" # :path is optional if it's identical to the gem name
                                                    # the version is optional if it can be determined from
                                                    # a gemspec
    end
  • You can do the same with the git DSL method
    git "git://github.com/rails/rails.git" do # :branch, :tag, or :ref work here
      gem "activesupport", :path =&gt; "activesupport" # same rules as directory, except that the files are
                                                    # first downloaded from git.
    end
  • Fix some bugs in resolving prerelease dependencies
