Yehuda Katz is a member of the Ruby on Rails core team, and lead developer of the Merb project. He is a member of the jQuery Core Team, and a core contributor to DataMapper. He contributes to many open source projects, like Rubinius and Johnson, and works on some he created himself, like Thor.


Archive for the ‘Ruby on Rails’ Category

Threads (in Ruby): Enough Already

For a while now, the Ruby community has been enamored of the latest new hotness: evented programming and Node.js. It’s gone so far that I’ve heard a number of prominent Rubyists say that JavaScript and Node.js are the only sane way to handle a number of concurrent users.

I should start by saying that I personally love writing evented JavaScript in the browser, and have been giving talks (for years) about using evented JavaScript to sanely organize client-side code. I think that for the browser environment, events are where it’s at. Further, I don’t have any major problem with Node.js or other ways of writing server-side evented code. For instance, if I needed to write a chat server, I would almost certainly write it using Node.js or EventMachine.

However, I’m pretty tired of hearing that threads (and especially Ruby threads) are completely useless, and that if you don’t use evented code, you may as well be using a single process per concurrent user. To be fair, this was more or less the party line of the Rails team years ago, but Rails has been threadsafe since Rails 2.2, and Rails users have been taking advantage of it for some time.

Before I start, I should be clear that this post is about requests that spend a non-trivial amount of their time using the CPU (normal web requests), even if they also spend a fair amount of time in blocking operations (disk IO, database). I am decidedly not talking about situations, like chat servers, where requests sit idle for huge amounts of time with tiny amounts of intermittent CPU usage.

Threads and IO Blocking

I’ve heard a common misperception that Ruby inherently “blocks” when doing disk IO or making database queries. In reality, Ruby switches to another thread whenever it needs to block for IO. In other words, if a thread needs to wait, but isn’t using any CPU, Ruby’s built-in methods allow another waiting thread to use the CPU while the original thread waits.

If every one of your web requests uses the CPU for 30% of the time, and waits for IO for the rest of the time, you should be able to serve three requests in parallel, coming close to maxing out your CPU.
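A rough way to see this for yourself is the sketch below. It assumes that sleep is a fair stand-in for a blocking database or disk call (the simulated_io helper and the 0.2 second wait are made up for illustration). It starts three threads that each wait once; because Ruby lets other threads run while one is blocked, the three waits should overlap instead of adding up.

```ruby
# Small timing helper so the sketch needs nothing outside core Ruby.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical stand-in for a blocking operation (database query, disk read).
# sleep lets other threads run while this one waits, as built-in IO does.
def simulated_io(seconds)
  sleep(seconds)
end

# Serve `count` fake requests, one thread each, and collect their results.
def serve_requests(count, io_seconds)
  threads = Array.new(count) do |i|
    Thread.new do
      simulated_io(io_seconds)
      "request #{i} done"
    end
  end
  threads.map(&:value) # value joins the thread and returns the block's result
end

elapsed = timed { serve_requests(3, 0.2) }
puts format("3 requests took %.2fs (about 0.60s if the waits were serialized)", elapsed)
```

The numbers will vary from machine to machine; the point is only that the total stays close to one wait rather than three.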

Here’s a couple of diagrams. The first shows how people imagine requests work in Ruby, even in threadsafe mode. The second is how an optimal Ruby environment will actually operate. This example is extremely simplified, showing only a few parts of the request, and assuming equal time spent in areas that are not necessarily equal.


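[Diagram 1: how people imagine requests work in Ruby, even in threadsafe mode]


[Diagram 2: how an optimal Ruby environment actually operates]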


I should be clear that Ruby 1.8 spends too much time context-switching between its green threads. However, if you’re not switching between threads extremely often, even Ruby 1.8’s overhead will amount to a small fraction of the total time needed to serve a request. A lot of the threading benchmarks you’ll see test pathological cases involving huge numbers of threads, which is not very similar to the profile of a web server.

(if you’re thinking that there are caveats to my “optimal Ruby environment”, keep reading)

“Threads are just HARD”

Another common gripe that pushes people to evented programming is that working with threads is just too hard. Working hard to avoid sharing state and using locks where necessary is just too tricky for the average web developer, the argument goes.

I agree with this argument in the general case. Web development, on the other hand, has an extremely clean concurrency primitive: the request. In a threadsafe Rails application, the framework manages threads and uses an environment hash (one per request) to store state. When you work inside a Rails controller, you’re working inside an object that is inherently unshared. When you instantiate a new instance of an ActiveRecord model inside the controller, it is rooted to that controller, and is therefore not used between live threads.

It is, of course, possible to use global state, but the vast majority of normal, day-to-day Rails programming (and for that matter, programming in any web framework in any language with a request model) is inherently threadsafe. This means that Ruby will transparently handle switching back and forth between active requests when you do something blocking (file, database, or memcache access, for instance), and you don’t need to personally manage the problems that arise when doing concurrent programming.

This is significantly less true about applications, like chat servers, that keep open a huge number of requests. In those cases, a lot of the application logic happens outside the individual request, so you need to personally manage shared state.

Historical Ruby Issues

What I’ve been talking about so far is how stock Ruby ought to operate. Unfortunately, a group of things have historically conspired to make Ruby’s concurrency story look much worse than it actually ought to be.

Most obviously, early versions of Rails were not threadsafe. As a result, all Rails users were operating with a mutex around the entire request, forcing Rails to behave like the first “Imagined” diagram above. Annoyingly, Mongrel, the most common Ruby web server for a few years, hardcoded this mutex into its Rails handler. As a result, if you spun up Rails in “threadsafe” mode a year ago using Mongrel, you would have gotten exactly zero concurrency. Also, even in threadsafe mode (when not using the built-in Rails support) Mongrel spins up a new thread for every request, not exactly optimal.
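To make the cost of that mutex concrete, here is a small sketch. It is not Mongrel’s code; REQUEST_LOCK, handle_request, and run_requests are hypothetical names, and sleep again stands in for blocking work. The same three threaded requests are run once inside a single global lock and once without it.

```ruby
REQUEST_LOCK = Mutex.new # plays the role of the hardcoded per-request mutex

# Small timing helper so the sketch needs nothing outside core Ruby.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical request handler that spends all of its time blocked.
def handle_request
  sleep(0.1)
end

# Run `count` requests on their own threads, optionally inside the global lock.
def run_requests(count, locked)
  threads = Array.new(count) do
    Thread.new do
      if locked
        REQUEST_LOCK.synchronize { handle_request }
      else
        handle_request
      end
    end
  end
  threads.each(&:join)
end

locked_time   = timed { run_requests(3, true) }
unlocked_time = timed { run_requests(3, false) }
puts format("with lock: %.2fs, without lock: %.2fs", locked_time, unlocked_time)
```

With the lock in place the threads still exist, but only one of them can be inside a request at a time, so the waits add up.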

Second, the most common database driver, mysql, is a very poorly behaved C extension. While built-in I/O (file or pipe access) correctly alerts Ruby to switch to another thread when it hits a blocking region, other C extensions don’t always do so. For safety, Ruby does not allow a context switch while in C code unless the C code explicitly tells the VM that it’s ok to do so.

All of the Data Objects drivers, which we built for DataMapper, correctly cause a context switch when entering a blocking area of their C code. The mysqlplus gem, released in March 2009, was designed to be a drop-in replacement for the mysql gem that fixes this problem. The new mysql2 gem, written by Brian Lopez, is a drop-in replacement for the old gem that also correctly handles encodings in Ruby 1.9, and it is the new default MySQL driver in Rails.

Because Rails shipped with the (broken) mysql gem by default, even people running on working web servers (i.e. not mongrel) in threadsafe mode would have seen a large amount of their potential concurrency eaten away because their database driver wasn’t alerting Ruby that concurrent operation was possible. With mysql2 as the default, people should see real gains on threadsafe Rails applications.

A lot of people talk about the GIL (global interpreter lock) in Ruby 1.9 as a death knell for concurrency. For the uninitiated, the GIL disallows multiple CPU cores from running Ruby code simultaneously. That does mean that you’ll need one Ruby process (or thereabouts) per CPU core, but it also means that if your multithreaded code is running correctly, you should need only one process per CPU core. I’ve heard tales of six or more processes per core. Since it’s possible to fully utilize a CPU with a single process (even in Ruby 1.8), these applications could get a 4-6x improvement in RAM usage (depending on context-switching overhead) by switching to threadsafe mode and using modern drivers for blocking operations.

JRuby, Ruby 1.9 and Rubinius, and the Future

Finally, JRuby already runs without a global interpreter lock, allowing your code to run in true parallel, and to fully utilize all available CPUs with a single JRuby process. A future version of Rubinius will likely ship without a GIL (the work has already begun), also opening the door to utilizing all CPUs with a single Ruby process.

And all modern Ruby VMs that run Rails (Ruby 1.9’s YARV, Rubinius, and JRuby) use native threads, eliminating the annoying tax that you need to pay for using threads in Ruby 1.8. Again, though, since that tax is small relative to the time for your requests, you’d likely see a non-trivial improvement in latency in applications that spend time in the database layer.

To be honest, a big part of the reason for the poor practical concurrency story in Ruby has been that the Rails project didn’t take it seriously, which made it difficult to get traction for efforts to fix a part of the problem (like the mysql driver).

We took concurrency very seriously in the Merb project, leading to the development of proper database drivers for DataMapper (Merb’s ORM), and a top-to-bottom understanding of parts of the stack that could run in parallel (even on Ruby 1.8), but which weren’t. Rails 3 doesn’t bring anything new to the threadsafety of Rails itself (Rails 2.3 was threadsafe too), but by making the mysql2 driver the default, we have eliminated a large barrier to Rails applications performing well in threadsafe mode without any additional research.

UPDATE: It’s worth pointing to Charlie Nutter’s 2008 threadsafety post, where he talked about how he expected threadsafe Rails would impact the landscape. Unfortunately, the blocking MySQL driver held back some of the promise of the improvement for the vast majority of Rails users.


Spinning up a new Rails app

So people have been attempting to get a Rails app up and running recently. I also have some apps in development on Rails 3, so I’ve been experiencing some of the same problems many others have.

The other night, I worked with sferik to start porting merb-admin over to Rails. Because this involved being on edge Rails, we honed the setup into a very simple, small, repeatable process.

The Steps

Step 1: Check out Rails

$ git clone git://github.com/rails/rails.git

Step 2: Generate a new app

$ ruby rails/railties/bin/rails new_app
$ cd new_app

Step 3: Edit the app’s Gemfile

# Add to the top
directory "/path/to/rails", :glob => "{*/,}*.gemspec"
git "git://github.com/rails/arel.git"
git "git://github.com/rails/rack.git"

Step 4: Bundle

$ gem bundle

Done

Everything should now work: script/server, script/console, etc.

If you want to check your copy of Rails into your app, you can copy it into the app and then change your Gemfile to point to the relative location.

For instance, if you copy it into vendor/rails, you can make the first line of the Gemfile directory "vendor/rails", :glob => "{*/,}*.gemspec". You’ll want to run gem bundle again after changing the Gemfile, of course.


The Rails 3 Router: Rack it Up

In my previous post about generic actions in Rails 3, I made reference to significant improvements in the router. Some of those have been covered on other blogs, but the full scope of the improvements hasn’t yet been covered.

In this post, I’ll cover a number of the larger design decisions, as well as specific improvements that have been made. Most of these features were in the Merb router, but the Rails DSL is more fully developed, and the fuller emphasis on Rack is a strong improvement from the Merb approach.

Improved DSL

While the old map.connect DSL still works just fine, the new standard DSL is less verbose and more readable.

# old way
ActionController::Routing::Routes.draw do |map|
  map.connect "/main/:id", :controller => "main", :action => "home"
end
 
# new way
Basecamp::Application.routes do
  match "/main/:id", :to => "main#home"
end

First, the routes are attached to your application, which is now its own object and used throughout Railties. Second, we no longer need map, and the new DSL (match/to) is more expressive. Finally, we have a shortcut for controller/action pairs ("main#home" is {:controller => "main", :action => "home"}).

Another useful shortcut allows you to specify the method more simply than before:

Basecamp::Application.routes do
  post "/main/:id", :to => "main#home", :as => :homepage
end

The :as in the above example specifies a named route, and creates the homepage_url et al helpers as in Rails 2.

Rack It Up

When designing the new router, we all agreed that it should be built first as a standalone piece of functionality, with Rails sugar added on top. As a result, we used rack-mount, which was built by Josh Peek as a standalone Rack router.

Internally, the router simply matches requests to a rack endpoint, and knows nothing about controllers or controller semantics. Essentially, the router is designed to work like this:

Basecamp::Application.routes do
  match "/home", :to => HomeApp
end

This will match requests with the /home path and dispatch them to a valid Rack application at HomeApp. This means that dispatching to a Sinatra app is trivial:

class HomeApp < Sinatra::Base
  get "/" do
    "Hello World!"
  end
end
 
Basecamp::Application.routes do
  match "/home", :to => HomeApp
end

The one small piece of the puzzle that might have you wondering at this point is that in the previous section, I showed the usage of :to => "main#home", and now I say that :to takes a Rack application.

Another improvement in Rails 3 bridges this gap. In Rails 3, PostsController.action(:index) returns a fully valid Rack application pointing at the index action of PostsController. So main#home is simply a shortcut for MainController.action(:home), and it otherwise is identical to providing a Sinatra application.
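If it helps to see the idea without Rails loaded, here is a miniature sketch. The MainController below is a hypothetical stand-in, not ActionController, and endpoint_for is an invented helper; the only thing the sketch relies on is the Rack convention that an endpoint responds to call(env) and returns a status, headers, and body.

```ruby
# Hypothetical miniature controller; NOT ActionController.
class MainController
  # Returns a Rack-style endpoint (anything that responds to #call(env))
  # bound to a single action.
  def self.action(name)
    lambda do |env|
      body = new.public_send(name, env)
      [200, { "Content-Type" => "text/plain" }, [body]]
    end
  end

  # Unlike a real Rails action, this one takes env directly, to keep the sketch small.
  def home(env)
    "home for #{env["PATH_INFO"]}"
  end
end

# Hypothetical helper: accept either a callable endpoint or the
# "controller#action" shortcut, and always hand back a callable.
def endpoint_for(to)
  return to if to.respond_to?(:call)

  controller, action = to.split("#")
  Object.const_get("#{controller.capitalize}Controller").action(action.to_sym)
end

status, _headers, body = endpoint_for("main#home").call({ "PATH_INFO" => "/main/1" })
puts status
puts body.first
```

Once the shortcut has been expanded, the router itself only ever sees something it can call with an env hash.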

As I posted before, this is also the engine behind match "/foo", :to => redirect("/bar").

Expanded Constraints

Probably the most common desired improvement to the Rails 2 router has been support for routing based on subdomains. There is currently a plugin called subdomain_routes that implements this functionality as follows:

ActionController::Routing::Routes.draw do |map|
  map.subdomain :support do |support|
    support.resources :tickets
    ...
  end
end

This solves the most common case, but the reality is that this is just one common case. In truth, it should be possible to constrain routes based not just on path segments, method, and subdomain, but also based on any element of the request.

The Rails 3 router exposes this functionality. Here is how you would constrain requests based on subdomains in Rails 3:

Basecamp::Application.routes do
  match "/foo/bar", :to => "foo#bar", :constraints => {:subdomain => "support"}
end

These constraints can include path segments as well as any method on ActionDispatch::Request. You could use a String or a regular expression, so :constraints => {:subdomain => /support\d/} would be valid as well.

Arbitrary constraints can also be specified in block form, as follows:

Basecamp::Application.routes do
  constraints(:subdomain => "support") do
    match "/foo/bar", :to => "foo#bar"
  end
end

Finally, constraints can be specified as objects:

class SupportSubdomain
  def self.matches?(request)
    request.subdomain == "support"
  end
end
 
Basecamp::Application.routes do
  constraints(SupportSubdomain) do
    match "/foo/bar", :to => "foo#bar"
  end
end
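For a feel of how the hash and object forms can be evaluated against a request, here is a minimal sketch. It is not the router’s implementation: FakeRequest and constraint_satisfied? are hypothetical, and the sketch assumes that comparing with === is enough to cover both Strings and regular expressions.

```ruby
# Hypothetical stand-in for ActionDispatch::Request with just two readers.
FakeRequest = Struct.new(:subdomain, :path)

# Same shape as the object constraint above: anything responding to matches?.
class SupportSubdomain
  def self.matches?(request)
    request.subdomain == "support"
  end
end

# Hypothetical helper: check a constraint given either as an object that
# responds to matches? or as a hash of request method name => String/Regexp.
def constraint_satisfied?(constraint, request)
  if constraint.respond_to?(:matches?)
    constraint.matches?(request)
  else
    constraint.all? do |name, expected|
      # String#=== is equality, Regexp#=== is a pattern match
      expected === request.public_send(name)
    end
  end
end

request = FakeRequest.new("support1", "/foo/bar")
puts constraint_satisfied?({ :subdomain => /support\d/ }, request)
puts constraint_satisfied?({ :subdomain => "support" }, request)
puts constraint_satisfied?(SupportSubdomain, request)
```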

Optional Segments

In Rails 2.3 and earlier, there were some optional segments. Unfortunately, they were hardcoded names and not controllable. Since we’re using a generic router, magical optional segment names and semantics would not do. And having exposed support for optional segments in Merb was pretty nice. So we added them.

# Rails 2.3
ActionController::Routing::Routes.draw do |map|
  # Note that :action and :id are optional, and
  # :format is implicit
  map.connect "/:controller/:action/:id"
end
 
# Rails 3
Basecamp::Application.routes do
  # equivalent
  match "/:controller(/:action(/:id))(.:format)"
end

In Rails 3, we can be explicit about the optional segments, and even nest optional segments. If we want the format to be a prefix path, we can do match "(/:format)/home" and the format is optional. We can use a similar technique to add an optional company ID prefix or a locale.
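One way to picture what the parentheses mean is to translate a pattern into a regular expression by hand. The sketch below does that with two hypothetical helpers, compile_route and recognize. It is only an illustration of the semantics, assuming that a segment is any run of characters other than "/" and "."; it is not how rack-mount compiles routes.

```ruby
# Hypothetical helper: turn a route pattern with optional segments into a Regexp.
def compile_route(pattern)
  source = pattern.gsub(/\(|\)|\.|:(\w+)/) do |token|
    case token
    when "(" then "(?:"     # open an optional, non-capturing group
    when ")" then ")?"      # close the group and mark it optional
    when "." then "\\."     # a literal dot, as in ".json"
    else "(?<#{$1}>[^/.]+)" # :name becomes a named capture
    end
  end
  Regexp.new("\\A#{source}\\z")
end

# Match a path and return the captured segments, or nil when nothing matches.
def recognize(pattern, path)
  match = compile_route(pattern).match(path)
  return nil unless match

  match.names.each_with_object({}) do |name, params|
    params[name.to_sym] = match[name] if match[name]
  end
end

ROUTE = "/:controller(/:action(/:id))(.:format)"
p recognize(ROUTE, "/posts")
p recognize(ROUTE, "/posts/show/1.json")
p recognize("(/:format)/home", "/json/home")
```

Segments that were left out of the path simply do not appear in the resulting hash.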

Pervasive Blocks

You may have noticed this already, but as a general rule, if you can specify something as an inline condition, you can also specify it as a block constraint.

Basecamp::Application.routes do
  controller :home do
    match "/:action"
  end
end

In the above example, we are not required to specify the controller inline, because we specified it via a block. You can use this for subdomains, controller restrictions, request method (get etc. take a block). There is also a scope method that can be used to scope a block of routes under a top-level path:

Basecamp::Application.routes do
  scope "/home" do
    match "/:action", :to => "homepage"
  end
end

The above route would match /home/foo to homepage#foo.

Closing

There are additional (substantial) improvements around resources, which I will save for another time, assuming someone else doesn’t get to it first.


Generic Actions in Rails 3

So Django has an interesting feature called “generic views”, which essentially allow you to render a template with generic code. In Rails, the same feature would be called “generic actions” (just a terminology difference).

This was possible, but somewhat difficult in Rails 2.x, but it’s a breeze in Rails 3.

Let’s take a look at a simple generic view in Django, the “redirect_to” view:

urlpatterns = patterns('django.views.generic.simple',
    ('^foo/(?P<id>\d+)/$', 'redirect_to', {'url': '/bar/%(id)s/'}),
)

This essentially redirects "/foo/<id>" to "/bar/<id>s/". In Rails 2.3, a way to achieve equivalent behavior was to create a generic controller that handled this:

class GenericController < ApplicationController
  def redirect
    redirect_to(params[:url] % params, params[:options])
  end
end

And then you could use this in your router:

map.connect "/foo/:id", :controller => "generic", :action => "redirect", :url => "/bar/%{id}s"

This uses the new Ruby 1.9 interpolation syntax ("%{first} %{last}" % {:first => "hello", :last => "sir"} == "hello sir") that has been backported to Ruby 1.8 via ActiveSupport.

Better With Rails 3

However, this is a bit clumsy, and requires us to have a special controller to handle this (relatively simple) case. It also saddles us with the conceptual overhead of a controller in the router itself.

Here’s how you do the same thing in Rails 3:

match "/foo/:id", :to => redirect("/bar/%{id}s")

This is built into Rails 3’s router, but the way it works is actually pretty cool. The Rails 3 router is conceptually decoupled from Rails itself, and the :to key points at a Rack endpoint. For instance, the following would be a valid route in Rails 3:

match "/foo", :to => proc {|env| [200, {}, ["Hello world"]] }

The redirect method simply returns a rack endpoint that knows how to handle the redirection:

def redirect(*args, &block)
  options = args.last.is_a?(Hash) ? args.pop : {}
 
  path = args.shift || block
  path_proc = path.is_a?(Proc) ? path : proc {|params| path % params }
  status = options[:status] || 301
 
  lambda do |env|
    req = Rack::Request.new(env)
    params = path_proc.call(env["action_dispatch.request.path_parameters"])
    url = req.scheme + '://' + req.host + params
    [status, {'Location' => url, 'Content-Type' => 'text/html'}, ['Moved Permanently']]
  end
end

There’s a few things going on here, but the important part is the last few lines, where the redirect method returns a valid Rack endpoint. If you look closely at the code, you can see that the following would be valid as well:

match "/api/v1/:api", :to => 
  redirect {|params| "/api/v2/#{params[:api].pluralize}" }
 
# and
 
match "/api/v1/:api", :to => 
  redirect(:status => 302) {|params| "/api/v2/#{params[:api].pluralize}" }
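To try the idea outside of Rails, here is a stripped-down sketch that runs on plain Ruby. It is not the Rails implementation: simple_redirect and the "sketch.path_parameters" env key are hypothetical, the host and scheme handling is left out, and it uses keyword arguments from later Ruby versions. It keeps the two essential pieces, a path given either as a format string or as a block, and a returned lambda that acts as the endpoint.

```ruby
# Hypothetical, framework-free version of a redirect endpoint builder.
def simple_redirect(path = nil, status: 301, &block)
  # Use the block if one was given; otherwise interpolate params into the string.
  path_proc = block || proc { |params| path % params }

  lambda do |env|
    # Assumed env key for this sketch only; a real router sets its own key.
    params = env.fetch("sketch.path_parameters", {})
    [status, { "Location" => path_proc.call(params) }, []]
  end
end

by_string = simple_redirect("/bar/%{id}")
by_block  = simple_redirect(status: 302) { |params| "/api/v2/#{params[:api]}s" }

p by_string.call({ "sketch.path_parameters" => { :id => "7" } })
p by_block.call({ "sketch.path_parameters" => { :api => "post" } })
```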

Another Generic Action

Another nice generic action that Django provides is allowing you to render a template directly without needing an explicit action. It looks like this:

urlpatterns = patterns('django.views.generic.simple',
    (r'^foo/$',             'direct_to_template', {'template': 'foo_index.html'}),
    (r'^foo/(?P<id>\d+)/$', 'direct_to_template', {'template': 'foo_detail.html'}),
)

This provides a special mechanism for rendering a template directly from the Django router. Again, this could be implemented by creating a special controller in Rails 2 and used as follows:

class GenericController < ApplicationController
  def direct_to_template
    render(params[:options])
  end
end
 
# Router
map.connect "/foo", :controller => "generic", :action => "direct_to_template", :options => {:template => "foo_detail"}

A Prettier API

A nicer way to do this would be something like this:

match "/foo", :to => render("foo")

For the sake of clarity, let’s say that directly rendered templates will come out of app/views/direct unless otherwise specified. Also, let’s say that the render method should work identically to the render method used in Rails controllers themselves, so that render :template => "foo", :status => 201, :content_type => Mime::JSON et al will work as expected.

In order to make this work, we’ll use ActionController::Metal, which exposes a Rack-compatible object with access to all of the powers of a full ActionController::Base object.

class RenderDirectly < ActionController::Metal
  include ActionController::Rendering
  include ActionController::Layouts
 
  append_view_path Rails.root.join("app", "views", "direct")
  append_view_path Rails.root.join("app", "views")
 
  layout "application"
 
  def index
    render *env["generic_views.render_args"]
  end
end
 
module GenericActions
  module Render
    def render(*args)
      app = RenderDirectly.action(:index)
      lambda do |env|
        env["generic_views.render_args"] = args
        app.call(env)
      end
    end
  end
end

The trick here is that we’re subclassing ActionController::Metal and pulling in just Rendering and Layouts, which gives you full access to the normal rendering API without any of the other overhead of normal controllers. We add both the direct directory and the normal view directory to the view path, which means that any templates you place inside app/views/direct will be used first, but it’ll fall back to the normal view directory for layouts or partials. We also specify that the layout is application, which is not the default in Rails 3 in this case since our metal controller does not inherit from ApplicationController.

Note for the Curious

In all normal application cases, Rails will look up the inheritance chain for a named layout matching the controller name. This means that the Rails 2 behavior, which allows you to provide a layout named after the controller, still works exactly the same as before, and that ApplicationController is just another controller name, and application.html.erb is its default layout.

And then, the actual use in your application:

Rails.application.routes do
  extend GenericActions
 
  match "/foo", :to => render("foo_index")
  # match "/foo" => render("foo_index") is a valid shortcut for the simple case
  match "/foo/:id", :constraints => {:id => /\d+/}, :to => render("foo_detail")
end

Of course, because we’re using a real controller shell, you’ll be able to use any other options available on the render (like :status, :content_type, :location, :action, :layout, etc.).


Better Ruby Idioms

Carl and I have been working on the plugins system over the past few days. As part of that process, we read through the Rails Plugin Guide. While reading through the guide, we noticed a number of idioms presented in the guide that are serious overkill for the task at hand.

I don’t blame the author of the guide; the idioms presented are roughly the same that have been used since the early days of Rails. However, looking at them brought back memories of my early days using Rails, when the code made me feel as though Ruby was full of magic incantations and ceremony to accomplish relatively simple things.

Here’s an example:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    # any method placed here will apply to classes, like Hickwall
    def acts_as_something
      send :include, InstanceMethods
    end
  end
 
  module InstanceMethods
    # any method placed here will apply to instances, like @hickwall
  end
end

To begin with, the send is completely unneeded. The acts_as_something method will be run on the Class itself, giving the method access to include, a private method.

This code is intended to be used as follows:

class ActiveRecord::Base
  include Yaffle
end
 
class Article < ActiveRecord::Base
  acts_as_something
end

What the code does is:

  1. Register a hook so that when the module is included, the ClassMethods are extended onto the class
  2. In that module, define a method that includes the InstanceMethods
  3. So that you can say acts_as_something in your code

The crazy thing about all of this is that it’s completely reinventing the module system that Ruby already has. This would be exactly identical:

module Yaffle
  # any method placed here will apply to classes, like Hickwall
  def acts_as_something
    send :include, InstanceMethods
  end
 
  module InstanceMethods
    # any method placed here will apply to instances, like @hickwall
  end
end

To be used via:

class ActiveRecord::Base
  extend Yaffle
end
 
class Article < ActiveRecord::Base
  acts_as_something
end

In a nutshell, there’s no point in overriding include to behave like extend when Ruby provides both!
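If the difference between the two is fuzzy, this tiny plain-Ruby sketch may help. The Greeter module and the two classes are made-up names; the same module is mixed in once with include and once with extend.

```ruby
module Greeter
  def greet
    "hello from #{self.class}"
  end
end

class WithInclude
  include Greeter # methods become instance methods
end

class WithExtend
  extend Greeter  # methods become methods on the class object itself
end

puts WithInclude.new.greet                # instance of WithInclude
puts WithExtend.greet                     # the class itself is the receiver
puts WithInclude.respond_to?(:greet)      # false: include did not touch the class object
puts WithExtend.new.respond_to?(:greet)   # false: extend did not touch instances
```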

To take this a bit further, you could do:

module Yaffle
  # any method placed here will apply to instances, like @hickwall, 
  # because that's how modules work!
end

To be used via:

class Article < ActiveRecord::Base
  include Yaffle
end

In effect, the initial code (override included hook to extend a method on, which then includes a module) is two layers of abstraction around a simple Ruby include!

Let’s look at a few more examples:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    def acts_as_yaffle(options = {})
      cattr_accessor :yaffle_text_field
      self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    end
  end
end
 
ActiveRecord::Base.send :include, Yaffle

Again, we have the idiom of overriding include to behave like extend (instead of just using extend!).

A better solution:

module Yaffle
  def acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
  end
end
 
ActiveRecord::Base.extend Yaffle

In this case, it’s appropriate to use an acts_as_yaffle, since you’re providing additional options which could not be encapsulated using the normal Ruby extend.

Another “more advanced” case:

module Yaffle
  def self.included(base)
    base.send :extend, ClassMethods
  end
 
  module ClassMethods
    def acts_as_yaffle(options = {})
      cattr_accessor :yaffle_text_field
      self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
      send :include, InstanceMethods
    end
  end
 
  module InstanceMethods
    def squawk(string)
      write_attribute(self.class.yaffle_text_field, string.to_squawk)
    end
  end
end
 
ActiveRecord::Base.send :include, Yaffle

Again, we have the idiom of overriding include to pretend to be an extend, and a send where it is not needed. Identical functionality:

module Yaffle
  def acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    include InstanceMethods
  end
 
  module InstanceMethods
    def squawk(string)
      write_attribute(self.class.yaffle_text_field, string.to_squawk)
    end
  end
end
 
ActiveRecord::Base.extend Yaffle

Of course, it is also possible to do:

module Yaffle
  def squawk(string)
    write_attribute(self.class.yaffle_text_field, string.to_squawk)
  end
end
 
class ActiveRecord::Base
  def self.acts_as_yaffle(options = {})
    cattr_accessor :yaffle_text_field
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    include Yaffle
  end
end

Since the module is always included in ActiveRecord::Base, there is no reason that the earlier code, with its additional modules and use of extend, is superior to simply reopening the class and adding the acts_as_yaffle method directly. Now we can put the squawk method directly inside the Yaffle module, where it can be included cleanly.
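Here is a version of that last pattern that runs without Rails, in case you want to poke at it. FakeBase is a hypothetical stand-in for ActiveRecord::Base, squawk just returns a string instead of writing an attribute, and a plain attr_accessor on the singleton class replaces cattr_accessor (which comes from ActiveSupport and stores its value differently).

```ruby
module Yaffle
  # Instance-level behavior lives directly in the module.
  def squawk(string)
    "#{self.class.yaffle_text_field}: #{string}"
  end
end

# Hypothetical stand-in for ActiveRecord::Base.
class FakeBase
  def self.acts_as_yaffle(options = {})
    # Plain-Ruby substitute for cattr_accessor: a reader/writer on this class.
    singleton_class.class_eval { attr_accessor :yaffle_text_field }
    self.yaffle_text_field = (options[:yaffle_text_field] || :last_squawk).to_s
    include Yaffle # self is the class here, so no send is needed
  end
end

class Article < FakeBase
  acts_as_yaffle
end

class Comment < FakeBase
  acts_as_yaffle :yaffle_text_field => :body
end

puts Article.new.squawk("hi")
puts Comment.new.squawk("hi")
```

Classes that never call acts_as_yaffle are left untouched, which is the whole reason for keeping the macro around.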

It may not seem like a huge deal, but it significantly reduces the amount of apparent magic in the plugin pattern, making it more accessible for new users. Additionally, it exposes the new user to include and extend quickly, instead of making them feel as though they were magic incantations requiring the use of send and special modules named ClassMethods in order to get them to work.

To be clear, I’m not saying that these idioms aren’t sometimes needed in special, advanced cases. However, I am saying that in the most common cases, they’re huge overkill that obscures the real functionality and confuses users.


Using the New Gem Bundler Today

As you might have heard, Carl and I released a new project that allows you to bundle your gems (both pure-ruby and native) with your application. Before I get into the process for using the bundler today, I’d like to go into the design goals of the project.

  • The bundler should allow the specification of all dependencies in a separate place from the application itself. In other words, it should be possible to determine the dependencies for an application without needing to start up the application.
  • The bundler should have a built-in dependency resolving mechanism, so it can determine the required gems for an entire set of dependencies.
  • Once the dependencies are resolved, it should be possible to get the application up and running on a new system without needing to check Rubyforge (or gemcutter) again. This is especially important for compiled gems (it should be possible to get the list of required gems once and compile on remote systems as desired).
  • Above all else, the bundler should provide a reproducible installation of Ruby applications. New gem releases or down remote servers should not be able to impact the successful installation of an application. In most cases, git clone; gem bundle should be all that is needed to get an application on a new system and up and running.
  • Finally, the bundler should not assume anything about Rails applications. While it should work flawlessly in the context of a Rails application, this should be because a Rails application is a Ruby application.

Using the Bundler Today

To use the gem bundler today in a non-Rails application, follow these steps:

  1. gem install bundler
  2. Create a Gemfile in the root of your application
  3. Add dependencies to your Gemfile. See below for more details on the sorts of things you can do in the Gemfile. At the simplest level, gem "gem_name", "version" will add a dependency of the gem and version to your application
  4. At the root, run gem bundle. The bundler should tell you that it is resolving dependencies, then downloading and installing the gems.
  5. Add vendor/gems/gems, vendor/gems/specifications, vendor/gems/doc, and vendor/gems/environment.rb to your .gitignore file
  6. Inside your application, require vendor/gems/environment.rb to add the bundled dependencies to your load paths.
  7. Use Bundler.require_env :optional_environment to actually require the files.
  8. After committing, run gem bundle in a fresh clone to re-expand the gems. Since you left the vendor/gems/cache in your source control, new machines will be guaranteed to use the same files as the original machine, requiring no remote dependencies

The bundler will also install binaries into the app’s bin directory. You can, therefore, run bin/rackup for instance, which will ensure that the local bundle, rather than the system, is used. You can also run gem exec rackup, which runs any command using the local bundle. This allows things like gem exec ruby -e "puts Nokogiri::VERSION" or the even more adventurous gem exec bash, which will open a new shell in the context of the bundle.

Gemfile

You can do any of the following in the Gemfile:

  • gem "name", "version": version may be a strict version or a version requirement like >= 1.0.6. The version is optional.
  • gem "name", "version", :require_as => "file": the require_as option allows you to specify which file should be required when require_env is called. By default, it is the gem’s name
  • gem "name", "version", :only => :testing: The environment name can be anything. It is used later in your require_env call. You may specify either :only or :except constraints
  • gem "name", "version", :git => "git://github.com/wycats/thor": Specify a git repository to be used to satisfy the dependency. You must use a hard dependency ("1.0.6") rather than a soft dependency (">= 1.0.6"). If a .gemspec is found in the repository, it is used for further dependency lookup. If the repository has multiple .gemspecs, each directory with a .gemspec will be considered a gem.
  • gem "name", "version", :git => "git://github.com/wycats/thor", :branch => "experimental": Further specify a branch, tag, or ref to use. All of :branch, :tag, and :ref are valid options
  • gem "name", "version", :vendored_at => "vendor/nokogiri": In the next version of bundler, this option will be changing to :path. This specifies that the dependency can be found in the local file system, rather than remotely. It is resolved relative to the location of the Gemfile
  • clear_sources: Empties the list of gem sources to search inside of.
  • source "http://gems.github.com": Adds a gem source to the list of available gem sources.
  • bundle_path "vendor/my_gems": Changes the default location of bundled gems from vendor/gems
  • bin_path "my_executables": Changes the default location of the installed executables
  • disable_system_gems: Without this command, both bundled gems and system gems will be used. You can therefore have things like ruby-debug in your system and use it. However, it also means that you may be using something in development mode that is installed on your system but not available in production. For this reason, it is best to disable_system_gems
  • disable_rubygems: This completely disables rubygems, reducing startup times considerably. However, it often doesn’t work if libraries you are using depend on features of Rubygems. In this mode, the bundler shims the features of Rubygems that we know people are using, but it’s possible that someone is using a feature we’re unaware of. You are free to try disable_rubygems first, then remove it if it doesn’t work. Note that Rails 2.3 cannot be made to work in this mode
  • only :environment do gem "rails" end: You can use only or except in block mode to specify a number of gems at once
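
As a rough sketch, a Gemfile combining several of the options above might look like the following. It only uses the DSL methods listed in this post; the gem names and version numbers are placeholders chosen for illustration, not recommendations.

```ruby
# Gemfile (bundler DSL as described in this post; gems and versions are placeholders)
source "http://gems.github.com"       # add a gem source to the search list
bundle_path "vendor/bundler_gems"     # keep the bundle somewhere other than vendor/gems
disable_system_gems                   # only bundled gems will be visible to the app

gem "rack", ">= 1.0.0"                                # soft version requirement
gem "nokogiri", "1.4.0"                               # strict version
gem "ruby-openid", "2.1.7", :require_as => "openid"   # file to require differs from the gem name
gem "thor", "0.11.5", :git => "git://github.com/wycats/thor"  # :git needs a hard version

# Gems that are only required when the :testing environment is requested
only :testing do
  gem "rspec"
  gem "webrat"
end
```

With a Gemfile like this, the application would require vendor/bundler_gems/environment.rb and then call Bundler.require_env :testing to pull in the testing-only gems as well.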

Bundler process

When you run gem bundle, a few things happen. First, the bundler attempts to resolve your list of dependencies against the gems you have already bundled. If they don’t resolve, the metadata for each specified source is fetched and the gems are downloaded. Next (either way), the bundler checks to see whether the downloaded gems are expanded. For any gem that is not yet expanded, the bundler expands it. Finally, the bundler creates the environment.rb file with the new settings. This means that running gem bundle over and over again will be extremely fast, because after the first time, all gems are downloaded and expanded. If you change settings, like disable_rubygems, running gem bundle again will simply regenerate the environment.rb.

Rails 2.3

To get this working with Rails 2.3, you need to create a preinitializer.rb and insert the following:

require "#{File.dirname(__FILE__)}/../vendor/bundler_gems/environment"
 
class Rails::Boot
  def run
    load_initializer
    extend_environment
    Rails::Initializer.run(:set_load_path)
  end
 
  def extend_environment
    Rails::Initializer.class_eval do
      old_load = instance_method(:load_environment)
      define_method(:load_environment) do
        Bundler.require_env RAILS_ENV
        old_load.bind(self).call
      end
    end
  end
end

It’s a bit ugly, but you can copy and paste that code and forget it. Astute readers will notice that we’re using vendor/bundler_gems/environment.rb. This is because Rails 2.3 attaches special, irrevocable meaning to vendor/gems. As a result, make sure to do the following in your Gemfile: bundle_path "vendor/bundler_gems".

Gemcutter uses this setup and it’s working great for them.

Bundler 0.7

We’re going to be releasing Bundler 0.7 tomorrow. It has some new features:

  • List outdated gems by passing --outdated-gems. Bundler conservatively does not update your gems simply because a new version came out that satisfies the requirement. This is so that you can be sure that the versions running on your local machine will make it safely to production. The new flag lets you check for outdated gems so you can decide whether to update them with --update. Hat tip to manfred, who submitted this patch as part of his submission to the Rumble at Ruby en Rails
  • Specify the build requirements for gems in a YAML file that you specify with --build-options. The file looks something like this:
    mysql:
      config: /path/to/mysql_config

    This is equivalent to --with-mysql-config=/path/to/mysql_config

  • Specify :bundle => false to indicate that you want to use system gems for a particular dependency. This will ensure that it gets resolved correctly during dependency resolution but does not need to be included in the bundle
  • Support for multiple bundles containing multiple platforms. This is especially useful for people who move back and forth between Ruby 1.8 and 1.9 and don’t want to constantly nuke and rebundle
  • Deprecate :vendored_at and replace with :path
  • A new directory DSL method in the Gemfile:
    directory "vendor/rails" do
      gem "activesupport", :path => "activesupport" # :path is optional if it's identical to the gem name
                                                    # the version is optional if it can be determined from
                                                    # a gemspec
    end
  • You can do the same with the git DSL method
    git "git://github.com/rails/rails.git" do # :branch, :tag, or :ref work here
      gem "activesupport", :path => "activesupport" # same rules as directory, except that the files are
                                                    # first downloaded from git.
    end
  • Fix some bugs in resolving prerelease dependencies

Simplifying Rails Block Helpers (With a Side of Rubinius)

We all know that <%= string %> emits a String in ERB. And <% string %> runs Ruby code, but does not emit a String. When you start working with Rails, you almost expect the syntax for block helpers to be:

<%= content_tag(:div) do %>
  The content
<% end %>

Why doesn’t it work that way?

It has to do with how the ERB parser works, looking at each line individually. When it sees <% %>, it evaluates the code as a line of Ruby. When it sees <%= %>, it evaluates the inside of the ERB tag, and calls to_s on it.

This:

<% form_for(@object) do %>
Stuff
<% end %>

gets effectively converted to:

form_for(@object) do
_buf << ("Stuff").to_s
end

On the other hand, this:

<%= form_for(@object) do %>
Stuff
<% end %>

gets converted to:

_buf << (form_for(@object) do).to_s
_buf << ("Stuff").to_s
end

which isn’t valid Ruby. So we use the first approach, and then let the helper itself, rather than ERB, be responsible for concatenating to the buffer. Sadly, it leads to significantly more complex helpers.
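
To make the difference concrete, here is a toy line-by-line compiler. It is not the real ERB implementation, just a sketch of the translation described above; toy_compile and valid_ruby? are hypothetical helpers invented for this illustration.

```ruby
# Toy, line-oriented template compiler. NOT the real ERB implementation:
# it only mimics the translation described in the text.
def toy_compile(template)
  template.each_line.map do |line|
    line = line.chomp
    case line
    when /\A<%=\s*(.*?)\s*%>\z/ then "_buf << (#{$1}).to_s"      # output tag
    when /\A<%\s*(.*?)\s*%>\z/  then $1                          # plain Ruby
    else "_buf << (#{line.inspect}).to_s"                        # literal text
    end
  end.join("\n")
end

# Checks whether the generated source parses, without running it:
# the code is wrapped in a proc that is defined but never called.
def valid_ruby?(source)
  eval("proc do\n_buf = String.new\n#{source}\nend")
  true
rescue SyntaxError
  false
end

statement_form  = toy_compile("<% form_for(@object) do %>\nStuff\n<% end %>")
expression_form = toy_compile("<%= form_for(@object) do %>\nStuff\n<% end %>")

puts statement_form
puts "parses: #{valid_ruby?(statement_form)}"
puts expression_form
puts "parses: #{valid_ruby?(expression_form)}"
```

The statement form parses, while the expression form fails because the closing parenthesis arrives before the block's end.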

Let’s take a look at the implementation of content_tag.

def content_tag(name, content_or_options_with_block = nil, options = nil, escape = true, &block)
  if block_given?
    options = content_or_options_with_block if content_or_options_with_block.is_a?(Hash)
    content_tag = content_tag_string(name, capture(&block), options, escape)
 
    if block_called_from_erb?(block)
      concat(content_tag)
    else
      content_tag
    end
  else
    content_tag_string(name, content_or_options_with_block, options, escape)
  end
end

The important chunk here is the middle, inside of the if block_given? section. The first few lines just get the actual contents, using the capture helper to pull out the contents of the block. But then you get this:

if block_called_from_erb?(block)
  concat(content_tag)
else
  content_tag
end

This is actually a requirement for writing a block helper of any kind in Rails. First, Rails checks to see if the block is being called from ERB. If so, it takes care of concatenating to the buffer. Otherwise, the caller simply wants a String back, so it returns it.

Worse, here’s the implementation of block_called_from_erb?:

BLOCK_CALLED_FROM_ERB = 'defined? __in_erb_template'
 
# Check whether we're called from an erb template.
# We'd return a string in any other case, but erb <%= ... %>
# can't take an <% end %> later on, so we have to use <% ... %>
# and implicitly concat.
def block_called_from_erb?(block)
  block && eval(BLOCK_CALLED_FROM_ERB, block)
end

So every time you use a block helper in Rails, or use a helper which uses a block helper, Rails is forced to eval into the block to determine what the context is.

In Merb, we solved this problem by using this syntax:

<%= form_for(@object) do %>
Stuff
<% end =%>

And while everyone agrees that the opening <%= is a reasonable change, the closing =%> is a bit grating. However, it allows us to compile the above code into:

_buf << (form_for(@object) do
_buf << ("Stuff").to_s
end).to_s

That’s because we tag the end with a special ERB tag that allows us to attach a ).to_s to the end. We use Erubis, which lets us control the compilation process more finely, to hook into this process.

Rails 3 will use Erubis regardless of this problem to implement on-by-default XSS protection, but I needed a solution that didn’t require the closing =%> (ideally).

Evan (lead on Rubinius) hit upon a rather ingenious idea: use Ruby operator precedence to get around the need to know where the end was. Effectively, compile into the following:

_buf << capture_obj << form_for(@object) do
_buf << ("Stuff").to_s
end

where capture_obj is:

class CaptureObject
  def <<(obj)
    @object = obj
    self
  end
 
  def to_str
    @object.to_s
  end
 
  def to_s
    @object.to_s
  end
end

Unfortunately, with one hand Ruby operator precedence giveth, and with the other it taketh away. In order to test this, I tried using a helper that returned an object, rather than a String (valid in ERB). In ERB, this would call to_s on the object. When I tried to run this code with the CaptureObject, I got:

template template:1:in `<<': can't convert Object into String (TypeError)
   from template template:1:in `template'
   from helper_spike.rb:48

Evan and I were both a bit baffled by this (although in retrospect we probably shouldn’t have been), and we hit on the idea to try running the code through Rubinius and look at its backtrace:

An exception occurred running helper_spike.rb
    Coercion error: #<Object:0x60a>.to_str => String failed:
(No method 'to_str' on an instance of Object.) (TypeError)
 
Backtrace:
                       Type.coerce_to at kernel/common/type.rb:22
           Kernel(String)#StringValue at kernel/common/kernel.rb:82
                            String#<< at kernel/common/string.rb:93
                   MyContext#template at template template:1
                      main.__script__ at helper_spike.rb:48

By looking at Rubinius’ backtrace, we quickly realized that the order of operations was wrong, and to_str was getting called on the return value from the helper, rather than the CaptureObject. As I tweeted immediately thereafter, the information available in Rubinius’ backtrace is just phenomenal, exposing enough information to really see what’s going on. Because the internals of Rubinius are written in Ruby, the Ruby backtrace goes all the way through to the Type.coerce_to method.

After realizing that, we changed the implementation of CaptureObject to take the buffer in its initializer, and have it handle concatenating to the buffer. The compiled code now looks like:

capture_obj << form_for(@object) do
_buf << ("Stuff").to_s
end

and the CaptureObject looks like:

class CaptureObject
  def initialize(buf)
    @buf = buf
  end
 
  def <<(obj)
    @buf << obj.to_s
  end
end

Now, Ruby’s operator precedence will bind the do to the form_for, and the return value of form_for will be converted with to_s and concatenated to the buffer.
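
The following self-contained sketch shows the final approach working outside of Rails. ToyView, its capture method, and the simplified content_tag are stand-ins I made up for this illustration; only CaptureObject follows the code above.

```ruby
class CaptureObject
  def initialize(buf)
    @buf = buf
  end

  # Whatever the helper returns is converted with to_s and appended.
  def <<(obj)
    @buf << obj.to_s
  end
end

# Hypothetical stand-in for a compiled template plus its helpers.
class ToyView
  def initialize
    @buf = String.new
  end

  # Simplified capture: run the block against a temporary buffer and
  # return what it wrote, restoring the real buffer afterwards.
  def capture
    old_buf, @buf = @buf, String.new
    yield
    @buf
  ensure
    @buf = old_buf
  end

  # The helper just returns a String; it never touches the buffer itself.
  def content_tag(name, &block)
    "<#{name}>#{capture(&block)}</#{name}>"
  end

  # Roughly what "<%= content_tag(:div) do %>Stuff<% end %>" would compile to.
  def render
    capture_obj = CaptureObject.new(@buf)
    capture_obj << content_tag(:div) do
      @buf << ("Stuff").to_s
    end
    @buf
  end
end

puts ToyView.new.render
```

Because the do block binds to content_tag rather than to <<, the helper sees its block, and the CaptureObject only ever receives the helper's return value.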

And the best thing is the implementation of content_tag once that’s done:

def content_tag(name, content = nil, options = nil, escape = true, &block)
  if block_given?
    options = content if content.is_a?(Hash)
    content = capture(&block)
  end
  content_tag_string(name, content, options, escape)
end

We can simply return a String and ERB handles the concatenation work. That’s the important part: helper writers should be able to think of block helpers the same way they think about traditional helpers. Somewhat less importantly, we’ll be able to eliminate evaling into untold numbers of blocks at runtime.

Although this was only an experiment, and the specific details still need to be worked out (how do we do this without breaking untold numbers of existing applications?), I’m very happy with this solution, which provides the simplicity and performance enhancement of the Merb solution without the ugly =%>.

Rubygems Good Practice

Rubygems provides two things for the Ruby community.

  1. A remote repository/packaging format and installer
  2. A runtime dependency manager

The key to good rubygems practice is to treat these two elements of Rubygems as separate from each other. Someone might use the Rubygems packaging format and the Rubygems distribution but not want to use the Rubygems runtime.

And why should they? The Rubygems runtime is mainly responsible for setting up the appropriate load-paths, and if you are able to get the load paths set up correctly, why should you care about Rubygems at all?

In other words, you should write your libraries so that their only requirement is being in the load path. Users might then use Rubygems to get your library in the load path, or they might check it out of git and add it themselves.

It sounds pretty straightforward, but there are a few common pitfalls:

Using gem inside your gems

It’s reasonably common to see code like this inside of a gem:

gem "extlib", ">= 1.0.8"
require "extlib"

This should be entirely unnecessary. While using Kernel.gem in an application makes perfect sense, gems themselves should use their gem specification to provide dependent versions. When used with Rubygems, Rubygems will automatically add the appropriate dependencies to the load path. When not using Rubygems, the users can add the dependencies themselves.
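
For example, the extlib requirement above could live in the gem specification instead of in library code. This is a minimal sketch; the gem name, version, author, and file list are placeholders.

```ruby
require "rubygems"

# Hypothetical gemspec: every field except the extlib dependency is a placeholder.
SPEC = Gem::Specification.new do |s|
  s.name    = "my_library"
  s.version = "0.1.0"
  s.summary = "Example library that depends on extlib"
  s.authors = ["Example Author"]
  s.files   = ["lib/my_library.rb"]

  # The version constraint is declared here, not with Kernel#gem in lib/.
  s.add_dependency "extlib", ">= 1.0.8"
end

SPEC.dependencies.each do |dep|
  puts "#{dep.name} #{dep.requirement}"
end
```

The library itself then needs nothing more than require "extlib".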

Keep in mind that whether or not you use Rubygems, you can use require and it will do the right thing. If the file is in the load path (because you put it there or because Rubygems put it there), it will just work. If it’s not in the load path, Rubygems will look for a matching gem to add to the load path (by overriding require).

Rescuing from Gem::LoadError

This idiom is also reasonably common:

begin
  gem "my_gem", ">= 1.0.6"
  require "my_gem"
rescue Gem::LoadError
  # handle the error somehow
end

The right solution here is to avoid the gem call, as I said above, and rescue from plain LoadError. The Rubygems runtime sometimes raises Gem::LoadError, but that inherits from regular LoadError, so you’re free to rescue from that and catch cases with and without the rubygems runtime.
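
A minimal sketch of that pattern follows. The try_require helper and the second library name are made up for this example.

```ruby
# Returns true if the library could be loaded, false otherwise.
# Rescuing plain LoadError also covers Gem::LoadError, which is a subclass.
def try_require(name)
  require name
  true
rescue LoadError
  false
end

puts try_require("time")                              # part of the standard library
puts try_require("no_such_library_for_this_example")  # hypothetical missing dependency
puts Gem::LoadError.ancestors.include?(LoadError) if defined?(Gem)
```

The same code behaves identically whether or not the Rubygems runtime is loaded.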

Conclusion

Declare your gem version dependencies in your gem specification and use simple requires in your library. If you need to catch the case where the dependency could not be found, rescue from LoadError.

And that’s all there is to it. Your library will work fine with or without the Rubygems runtime :)

Rails 3: The Great Decoupling

In working on Rails 3 over the past 6 months, I have focused rather extensively on decoupling components from each other.

Why should ActionController care whether it’s talking to ActionView or just something that duck-types like ActionView? Of course, the key to making this work well is to keep the interfaces between components as small as possible, so that implementing an ActionView lookalike is a matter of implementing just a few methods, not dozens.

While I was preparing for my talk at RubyKaigi, I was trying to find the smallest possible examples that demonstrate some of this stuff. It went really well, but I noticed a few areas that could be improved even further, producing an even more compelling demonstration.

This weekend, I focused on cleaning up those interfaces, so we have small and clearly documented mechanisms for interfacing with Rails components. I want to focus on ActionView in this post, which I’ll demonstrate with an example.

$:.push "rails/activesupport/lib"
$:.push "rails/actionpack/lib"
 
require "action_controller"
 
class Kaigi < ActionController::Http
  include AbstractController::Callbacks
  include ActionController::RackConvenience
  include ActionController::Renderer
  include ActionController::Layouts
  include ActionView::Context
 
  before_filter :set_name
  append_view_path "views"
 
  def _action_view
    self
  end
 
  def controller
    self
  end
 
  DEFAULT_LAYOUT = Object.new.tap {|l| def l.render(*) yield end }
 
  def _render_template_from_controller(template, layout = DEFAULT_LAYOUT, options = {}, partial = false)
    ret = template.render(self, {})
    layout.render(self, {}) { ret }
  end
 
  def index
    render :template => "template"
  end
 
  def alt
    render :template => "template", :layout => "alt"
  end
 
  private
  def set_name
    @name = params[:name]
  end
end
 
app = Rack::Builder.new do
  map("/kaigi") {  run Kaigi.action(:index) }
  map("/kaigi/alt") { run Kaigi.action(:alt) }
end.to_app
 
Rack::Handler::Mongrel.run app, :Port => 3000

There’s a bunch going on here, but the important thing is that you can run this file with just ruby, and it’ll serve up /kaigi and /kaigi/alt. It will serve templates from the local “/views” directory and handle before filters correctly.

Let’s look at this a piece at a time:

$:.push "rails/activesupport/lib"
$:.push "rails/actionpack/lib"
 
require "action_controller"

This is just boilerplate. I symlinked rails to a directory under this file and required action_controller. Note that simply requiring ActionController is extremely cheap: no features have been used yet.

class Kaigi < ActionController::Http
  include AbstractController::Callbacks
  include ActionController::RackConvenience
  include ActionController::Renderer
  include ActionController::Layouts
  include ActionView::Context
end

I inherited my class from ActionController::Http. I then included a number of features, including Rack convenience methods (request/response), the Renderer, and Layouts. I also made the controller itself the view context. I will discuss this more in just a moment.

  before_filter :set_name

This is the normal Rails before_filter. I didn’t need to do anything else to get this functionality other than include AbstractController::Callbacks.

  append_view_path "views"

Because we’re not in a Rails app, our view paths haven’t been pre-populated. No problem: it’s just a one-liner to set them ourselves.

The next part is the interesting part. In Rails 3, while ActionView::Base remains the default view context, the interface between ActionController and ActionView is extremely well defined. Specifically:

  • A view context must include ActionView::Context. This just adds the compiled templates, so they can be called from the context
  • A view context must provide a _render_template_from_controller method, which takes a template object, a layout, and additional options
  • A view context may optionally also provide a _render_partial_from_controller, to handle render :partial => @some_object
  • In order to use ActionView::Helpers, a view context must have a pointer back to its original controller

That’s it! That’s the entire ActionController<=>ActionView interface.

  def _action_view
    self
  end
 
  def controller
    self
  end

Here, we specify that the view context is just self, and define controller, required by view contexts. Effectively, we have merged the controller and view context (mainly just to see if it could be done ;) )

  DEFAULT_LAYOUT = Object.new.tap {|l| def l.render(*) yield end }

Next, we make a default layout. This is just a simple object that provides a render method that yields to the block. It will simplify the next method:

  def _render_template_from_controller(template, layout = DEFAULT_LAYOUT, options = {}, partial = false)
    ret = template.render(self, {})
    layout.render(self, {}) { ret }
  end

Here, we supply the required _render_template_from_controller. The template object that is passed in is a standard Rails Template which has a render method on it. That method takes the view context and any locals. For this example, we pass in self as the view context, and do not provide any locals. Next, we call render on the layout, passing in the return value of template.render. The reason we created a default is to make the case of a layout identical to the case without.
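
The default-layout trick can be tried in isolation. In this sketch, ToyLayout and render_with are hypothetical stand-ins for a real layout template and for _render_template_from_controller; only DEFAULT_LAYOUT is taken from the example above.

```ruby
# A layout that adds nothing: render just yields to the block.
DEFAULT_LAYOUT = Object.new.tap { |l| def l.render(*) yield end }

# Hypothetical stand-in for a real layout template.
class ToyLayout
  def render(_context, _locals)
    "<html>#{yield}</html>"
  end
end

# Mirrors the shape of _render_template_from_controller: the caller never
# needs to ask whether a layout was given.
def render_with(body, layout = DEFAULT_LAYOUT)
  layout.render(self, {}) { body }
end

puts render_with("Hello")
puts render_with("Hello", ToyLayout.new)
```

Since both objects respond to render in the same way, the rendering method has a single code path.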

  def index
    render :template => "template"
  end
 
  def alt
    render :template => "template", :layout => "alt"
  end
 
  private
  def set_name
    @name = params[:name]
  end

This is a standard Rails controller.

app = Rack::Builder.new do
  map("/kaigi") {  run Kaigi.action(:index) }
  map("/kaigi/alt") { run Kaigi.action(:alt) }
end.to_app
 
Rack::Handler::Mongrel.run app, :Port => 3000

Finally, rather than use the Rails router, we just wire the controller up directly using Rack. In Rails 3, ControllerName.action(:action_name) returns a rack-compatible endpoint, so we can wire them up directly.
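
For readers unfamiliar with the term, a rack-compatible endpoint is any object that responds to call(env) and returns a status, a headers hash, and a body that responds to each. The sketch below uses plain lambdas and a hypothetical dispatch method in place of Kaigi.action and Rack::Builder; it matches paths exactly, which is simpler than what Rack's map does.

```ruby
# Toy endpoints standing in for Kaigi.action(:index) and Kaigi.action(:alt).
INDEX_ENDPOINT = lambda { |env| [200, { "Content-Type" => "text/plain" }, ["index"]] }
ALT_ENDPOINT   = lambda { |env| [200, { "Content-Type" => "text/plain" }, ["alt"]] }

ROUTES = {
  "/kaigi"     => INDEX_ENDPOINT,
  "/kaigi/alt" => ALT_ENDPOINT
}

# Hypothetical dispatcher: look up the endpoint by exact path and call it.
def dispatch(routes, env)
  endpoint = routes[env["PATH_INFO"]]
  return [404, { "Content-Type" => "text/plain" }, ["Not Found"]] unless endpoint
  endpoint.call(env)
end

status, _headers, body = dispatch(ROUTES, { "PATH_INFO" => "/kaigi/alt" })
puts "#{status} #{body.join}"
```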

And that’s all there is to it!

Note: I’m not sure if I still need to say this, but stuff like this is purely a demonstration of the power of the internals, and does not reflect changes to the public API or the way people use Rails by default. Everyone on the Rails team is strongly committed to retaining the same excellent startup experience and set of good conventional defaults. That will not be changing in 3.0.

What do we need to get on Ruby 1.9?

A year ago, I was very skeptical of Ruby 1.9. There were a lot of changes in it, and it seemed like it was going to be a mammoth job to get things running on it. The benefits did not seem to outweigh the costs of switching, especially since Ruby 1.9 was not yet adequately stable to justify the big switch.

At this point, however, it seems as though Ruby 1.9 has stabilized (with 1.9.2 on the horizon), and there are some benefits that seem to obviously justify a switch (such as fast, integrated I18n, better performance in general, blocks that can have default arguments and take blocks, etc.).
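
As a small illustration of the last item, a block in Ruby 1.9 can declare a default argument and accept a block of its own. The SCALE constant is just a name for this example.

```ruby
# Ruby 1.9+: block parameters may have defaults, and a block can take a block.
SCALE = proc do |value, factor = 2, &formatter|
  result = value * factor
  formatter ? formatter.call(result) : result
end

puts SCALE.call(3)                       # uses the default factor
puts SCALE.call(3, 10) { |n| "n=#{n}" }  # passes a block to the block
```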

Perhaps more importantly though, Ruby’s language implementors have shifted their focus to Ruby 1.9. It has become increasingly difficult to get enhancements in Ruby 1.8, because it is no longer trunk Ruby. Getting community momentum behind Ruby 1.9 would enable us to make productive suggestions to Matz and the other language implementors. Instead, we seem to get a new monthly patch fixing Ruby 1.8.

So my question is: what do we as a community need to shift momentum to 1.9? I don’t want a generic answer, like “we need to feel good about it”. I’m asking you what is stopping you today from using Ruby 1.9 for your next project. Is there a library that doesn’t work? Is there a new language feature that causes so much disruption to your existing programming patterns that it makes a switch untenable?

I suspect that we are all just comfortable in Ruby 1.8, but would actually be mostly fine upgrading to Ruby 1.9. I also suspect that there are small issues I’m not personally aware of, but which have blocked some of you from upgrading. Rails 2.3 and 3.0 (edge) work fine on Ruby 1.9, and I’d like to see what we can do to make Ruby 1.9 a good recommended option for new projects.

Thoughts?
