The Building Blocks of Ruby

When showing off cool features of Ruby to the uninitiated (or to a language sparring partner), the excited Rubyist often shows off Ruby's "powerful block syntax". Unfortunately, the Rubyist uses "powerful block syntax" as shorthand for a number of features that the Pythonista or Javaist simply has no context for.

To start, we usually point at Rake, RSpec or Sinatra as examples of awesome usage of block syntax:

get "/hello" do
  "Hello World"
end

In response, Pythonistas usually point to these syntaxes as roughly equivalent:

@get('/hi')
def hello():
  return "Hello World"

def hello() -> "/hi":
  return "Hello World"

While the Python versions may not be quite as pretty, nothing about them screams "Ruby has much stronger capabilities here". Instead, by using examples like Sinatra, Rubyists trade in an argument about great semantic power for one about superficial beauty.

Rubyists, Pythonistas and others working on web development share a common language in JavaScript. When describing blocks to "outsiders" who share a common knowledge of JavaScript, we tend to point at JavaScript functions as a close analogue. Unfortunately, this only furthers the confusion.

On the Ruby side, when PHP or Java announces that they're "adding closures", many of us don't stop to ask "what kind of closures?"

Cut to the Chase

Let's cut to the chase and use a better example of the utility of Ruby blocks.

require "pathname"
require "yaml"

def append(location, data)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

  File.open(path, "a") do |file|
    file.puts YAML.dump(data)
  end

  return data
end

Here, the File.open method takes a block. It opens the file (in "append" mode) and yields the open file to the block. When the block completes, Ruby closes the file. In fact, Ruby does more than that: it guarantees that the file will be closed even if executing the block results in a raise. Let's take a look at the implementation of File.open in Rubinius:

def self.open(*args)
  io = new *args

  return io unless block_given?

  begin
    yield io
  ensure
    begin
      io.close unless io.closed?
    rescue StandardError
      # nothing, just swallow them.
    end
  end
end

This means that you can wrap up idioms like pervasive try/catch/finally in methods.

# Without blocks
def append(location, data)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

  # Open outside the begin block: if File.open itself raises,
  # there is no file to close.
  file = File.open(path, "a")
  begin
    file.puts YAML.dump(data)
  ensure
    file.close
  end

  return data
end

Because Ruby runs ensure clauses even when the exception happened in a block, programmers can reliably ensure that Ruby executes teardown logic hidden away in abstractions.
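As a minimal sketch of that idea, here is a home-grown method that hides setup and teardown behind a block. The names with_teardown and EVENTS are hypothetical and exist only for this illustration; they are not part of any library.

```ruby
# Hypothetical helper: records setup and teardown around whatever the block does.
EVENTS = []

def with_teardown(name)
  EVENTS << "setup #{name}"
  yield
ensure
  # Runs whether the block finishes normally or raises.
  EVENTS << "teardown #{name}"
end

# Normal completion.
with_teardown("ok") { EVENTS << "work" }

# The block raises; teardown is recorded before the exception reaches us.
begin
  with_teardown("boom") { raise ArgumentError, "failed inside the block" }
rescue ArgumentError => e
  EVENTS << "rescued: #{e.message}"
end

puts EVENTS
```

The caller never writes begin/ensure itself; the teardown travels with the method.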

This example only demonstrates the power of well-designed lambdas. With one small additional feature, Ruby's blocks become something altogether different.

def write(location, data)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

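  # "r+" opens the file for reading and writing; "w" would truncate it
  # on open and would not allow file.read.
  File.open(path, "r+") do |file|
    return false if Digest::MD5.hexdigest(file.read) == data.hash
    file.rewind       # go back to the start and drop the old contents
    file.truncate(0)
    file.puts YAML.dump(data)
  end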

  return true
end

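In the above case, imagine that writing the data to disk is quite expensive, and that we can skip the write if the MD5 hash of the file's contents matches the value returned by a hash method on the data. (Assume for this example that data.hash returns such a digest; the built-in Object#hash returns an Integer instead.) The method returns false if it did not write to disk, and true if it did.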

Ruby's blocks support non-local-return, which means that a return from the block behaves identically to returning from the block's original context. In this case, returning from inside the block returns from the write method, but Ruby will still run the ensure block closing the file.
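A small self-contained sketch of the same behavior, without touching the file system. The method names with_cleanup and first_negative and the TRACE constant are made up for the example.

```ruby
TRACE = []

# Hypothetical block-taking method with teardown in an ensure clause.
def with_cleanup
  yield
  TRACE << "after yield"   # skipped when the block returns early
ensure
  TRACE << "cleanup"       # runs either way
end

def first_negative(numbers)
  with_cleanup do
    numbers.each do |n|
      # Returns from first_negative itself, unwinding through
      # both each and with_cleanup.
      return n if n < 0
    end
  end
  nil
end

found   = first_negative([3, -7, 5, -1])
missing = first_negative([1, 2])
```

The first call returns from two blocks deep, and only "cleanup" is recorded; the second call falls through, so "after yield" is recorded as well.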

You can think of non-local-return as behaving something like:

def write(location, data)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

  File.open(path, "r+") do |file|
    raise Return.new(false) if Digest::MD5.hexdigest(file.read) == data.hash
    file.rewind
    file.truncate(0)
    file.puts YAML.dump(data)
  end

  return true
rescue Return => e
  return e.object
end

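where Return is a small exception class that carries the returned value (a plain Struct cannot be raised or rescued):

class Return < StandardError
  attr_reader :object

  def initialize(object)
    super()
    @object = object
  end
end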

Of course, any reasonable lambda implementation will support this, but Ruby's version has the benefit of feeling just like a normal return, and requiring much less chrome to achieve it. It also behaves well in scenarios that already use rescue or ensure, avoiding mind-warping combinations.
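Ruby itself offers both behaviors, which makes the difference easy to see side by side. In this sketch (method names are invented for the illustration), a return inside a lambda only leaves the lambda, while a return inside a block leaves the enclosing method.

```ruby
def return_from_lambda
  callable = lambda { return :from_lambda }
  callable.call            # return only exits the lambda
  :after_lambda            # so execution continues here
end

def return_from_block
  [1, 2, 3].each { return :from_block }  # return exits return_from_block
  :after_block                           # never reached
end

lambda_result = return_from_lambda
block_result  = return_from_block
```

Here lambda_result is :after_lambda and block_result is :from_block.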

Further, Ruby also supports super inside of blocks. Imagine the write method was defined on a subclass of a simpler class whose write method took the raw data from the file and printed it to a log.

def write(location, data)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

  File.open(path, "r+") do |file|
    file_data = file.read
    super(location, file_data)
    return false if Digest::MD5.hexdigest(file_data) == data.hash
    file.rewind
    file.truncate(0)
    file.puts YAML.dump(data)
  end

  return true
end

In a purer lambda scenario, we would need to store a reference to self, then use that reference inside the lambda:

def write(location, data)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

  this = self
  File.open(path, "r+") do |file|
    file_data = file.read

    # imaginary Ruby construct that would be needed without
    # non-local-super
    this.super.write(location, file_data)
    raise Return.new(false) if Digest::MD5.hexdigest(file_data) == data.hash
    file.rewind
    file.truncate(0)
    file.puts YAML.dump(data)
  end

  return true
rescue Return => e
  return e.object
end
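For comparison, the real Ruby behavior (super working inside a block) can be tried with a runnable sketch that avoids the file system. BaseWriter and LoggingWriter are hypothetical classes made up for this example.

```ruby
class BaseWriter
  def write(data)
    "base wrote #{data}"
  end
end

class LoggingWriter < BaseWriter
  def write(data)
    results = []
    [data.upcase].each do |item|
      # Inside the block, super still means BaseWriter#write.
      results << super(item)
    end
    results
  end
end

written = LoggingWriter.new.write("hi")
```

No stored reference to self is needed; written ends up as ["base wrote HI"].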

You can also yield to a method's block inside a block. Imagine that the write method is called with a block that chooses the correct data to use based on whether the file is executable:

def write(location)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

  File.open(path, "r+") do |file|
    file_data = file.read
    super(location)
    data = yield file
    return false if Digest::MD5.hexdigest(file_data) == data.hash
    file.rewind
    file.truncate(0)
    file.puts YAML.dump(data)
  end

  return true
end

This would be called via:

write("/path/to/file") do |file|
  if file.stat.executable?
    "#!/usr/bin/env ruby\nputs 'Hello World!'"
  else
    "Hello World!"
  end
end

In a pure-lambda language, we would take the block in as a normal argument to the function, then call it inside the closure:

def write(location, block)
  path = Pathname.new(location)
  raise "Location does not exist" unless path.exist?

  this = self
  File.open(path, "r+") do |file|
    file_data = file.read

    # imaginary Ruby construct that would be needed without
    # non-local-super; the block has to be passed along by hand
    this.super.write(location, block)
    data = block.call(file)
    raise Return.new(false) if Digest::MD5.hexdigest(file_data) == data.hash
    file.rewind
    file.truncate(0)
    file.puts YAML.dump(data)
  end

  return true
rescue Return => e
  return e.object
end
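The yield-inside-a-block behavior is easy to check in isolation. The following sketch uses a temporary file from Ruby's standard tempfile library; each_stripped_line is a hypothetical helper written for this example.

```ruby
require "tempfile"

# Hypothetical helper: opens the file and hands each line to the caller's block.
def each_stripped_line(path)
  File.open(path, "r") do |file|
    file.each_line do |line|
      # This yield sits two blocks deep, yet it still invokes the block
      # that was passed to each_stripped_line.
      yield line.strip
    end
  end
end

lines = []
Tempfile.create("yield-demo") do |tmp|
  tmp.write("  first \n  second \n")
  tmp.flush
  each_stripped_line(tmp.path) { |line| lines << line }
end
```

After this runs, lines holds "first" and "second"; the method never had to name its block or pass it along explicitly.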

The real benefit of Ruby's approach comes from the fact that the code inside the block would be identical if the method did not take a block. Consider the identical method, except taking a File instead of a location:

def write(file)
  file_data = file.read
  super(file)
  data = yield file
  return false if Digest::MD5.hexdigest(file_data) == data.hash
  file.puts YAML.dump(data)
  return true
end

Without the block, the Ruby code looks exactly the same. This means that Ruby programmers can more easily abstract out repeated patterns into methods that take blocks without having to rewrite a bunch of code. It also means that using a block does not interrupt the normal flow of code, and it's possible to create new "control flow" constructs that behave almost identically to built-in control flow constructs like if and while.
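As a sketch of such a home-made control flow construct, consider a hypothetical until_truthy method (the name and behavior are invented here). Inside its block, next and return behave the way they would inside a built-in loop.

```ruby
# Hypothetical construct: calls the block with 0, 1, 2, ... and stops
# at the first truthy result, giving up after max_tries attempts.
def until_truthy(max_tries)
  max_tries.times do |attempt|
    result = yield attempt
    return result if result
  end
  nil
end

def find_square_above(limit)
  until_truthy(10) do |n|
    next if n.odd?                  # skip this attempt, as in a while loop
    return n * n if n * n > limit   # leave find_square_above entirely
  end
  :not_found
end

small = find_square_above(10)
large = find_square_above(1000)
plain = until_truthy(5) { |n| n if n >= 3 }
```

Here small is 16 (the first even square above 10), large is :not_found because no attempt succeeds within ten tries, and plain is 3.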

Rails uses this to good effect with respond_to, which provides convenient syntax for declaring content negotiation:

def index
  @people = Person.find(:all)

  respond_to do |format|
    format.html # default action is render
    format.xml { render :xml => @people.to_xml }
  end
end

Because of the way Ruby blocks work, you can also return from any of the format blocks:

def index
  @people = Person.find(:all)

  respond_to do |format|
    format.html { redirect_to(person_path(@people.first)) and return }
    format.xml  { render :xml => @people.to_xml }
    format.json { render :json => @people.to_json }
  end

  session[:web_service] = true
end

Here, we returned from the HTML format after redirecting, allowing us to take additional action (setting a :web_service key on the session) for other cases (XML and JSON MIME types).
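To see why the early return works, here is a toy stand-in for the pattern. This is not how Rails implements respond_to; ToyFormat and ToyController are hypothetical classes that only mimic the shape of the API so the control flow can be run outside Rails.

```ruby
# Toy format object: runs a format's block only if that format was requested.
class ToyFormat
  def initialize(requested)
    @requested = requested
  end

  %i[html xml json].each do |name|
    define_method(name) do |&block|
      block.call if block && name == @requested
    end
  end
end

class ToyController
  attr_reader :log

  def initialize(requested)
    @requested = requested
    @log = []
  end

  def respond_to
    yield ToyFormat.new(@requested)
  end

  def index
    respond_to do |format|
      format.html { @log << :redirect; return }  # returns from index
      format.xml  { @log << :render_xml }
    end
    @log << :web_service   # reached only for non-HTML requests
  end
end

html = ToyController.new(:html)
html.index
xml = ToyController.new(:xml)
xml.index
```

The HTML controller's log contains only :redirect, while the XML controller's log contains :render_xml followed by :web_service.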

Keep in mind that the code above is a demonstration of a number of features of Ruby's blocks. It's very rare to see return, yield and super all used in a single block. That said, Ruby programmers commonly use one or more of these constructs inside blocks, because their usage is seamless.

So Why Are Ruby's Blocks Better?

If you made it this far, let's take a look at another use of blocks in Ruby: mutex synchronization.

Java supports synchronization via a special synchronized keyword:

class Example {
  final Object lock = new Object();

  void example() {
    synchronized(lock) {
      // do dangerous stuff here
    }
  }
}

Essentially, Java provides a special construct for expressing the idea that only one thread at a time may run a block of code for a given instance of the synchronization object. Because Java provides a special construct, you can return from inside the synchronization block, and the Java runtime does the appropriate things.

Similarly, Python required the use of try/finally until Python 2.5, which added a special language feature (the with statement) to handle the try/finally idiom:

import threading

lock = threading.Lock()

class Example:
  # old
  def example(self):
    lock.acquire()
    try:
      pass  # ... access shared resource
    finally:
      lock.release() # release lock, no matter what

  # new
  def example(self):
    with lock:
      pass  # ... access shared resource

In Python 2.5's case, the object passed to with must implement a special protocol (including __enter__ and __exit__ methods), so the with statement cannot be used like Ruby's general-purpose, lightweight blocks.

Ruby represents the same concept using a method that takes a block:

class Example
  @@lock = Mutex.new

  def example
    @@lock.synchronize do
      # do dangerous stuff here
    end
  end
end

Importantly, synchronize is a normal Ruby method. The original version, written in pure Ruby, looks like this:

def synchronize
  lock
  begin
    yield
  ensure
    unlock
  end
end

It has all the hallmarks of what we've discussed so far. It locks, yields to the block, and ensures that the lock will be released. This means that if a Ruby programmer returns from inside the block, synchronize will behave correctly.

This example demonstrates the key power of Ruby's blocks: they can easily replace language constructs. In this case, a Ruby programmer can take unsafe code, plop it inside a synchronization block, and it will continue to work.
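A quick way to convince yourself is to return from inside a synchronize block and then check the lock. In this sketch LOCK and read_guarded are hypothetical names; Mutex#synchronize and Mutex#locked? are the standard Ruby methods.

```ruby
LOCK = Mutex.new

def read_guarded(values)
  LOCK.synchronize do
    # Non-local return from inside the critical section.
    return :empty if values.empty?
    values.first
  end
end

early  = read_guarded([])
normal = read_guarded([1, 2, 3])
still_locked = LOCK.locked?
```

Both the early return and the normal path leave the mutex unlocked, so still_locked is false.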

Postscript

I've historically written my posts without very many links, mostly out of a fear of links going out of date. I've received increasing requests for more annotations in my posts, so I'll start doing that. Let me know if you think my annotations in this post were useful, and feel free to give me any suggestions on that front that you find useful.