Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

Archive for January, 2012

JavaScript Needs Blocks

While reading Hacker News posts about JavaScript, I often come across the misconception that Ruby’s blocks are essentially equivalent to JavaScript’s “first class functions”. Because the ability to pass functions around, especially when you can create them anonymously, is extremely powerful, the fact that both JavaScript and Ruby have a mechanism to do so makes it natural to assume equivalence.

In fact, when people talk about why Ruby’s blocks are different from Python’s functions, they usually talk about anonymity, something that Ruby and JavaScript share, but Python does not have. At first glance, a Ruby block is an “anonymous function” (or colloquially, a “closure”) just as a JavaScript function is one.

This impression, which I admittedly shared in my early days as a Ruby/JavaScript developer, misses an important subtlety that turns out to have large implications. This subtlety is often referred to as “Tennent’s Correspondence Principle”. In short, Tennent’s Correspondence Principle says:

“For a given expression expr, lambda expr should be equivalent.”

This is also known as the principle of abstraction, because it means that it is easy to refactor common code into methods that take a block. For instance, consider the common case of file resource management. Imagine that the block form of File.open didn’t exist in Ruby, and you saw a lot of the following in your code:

begin
  f = File.open(filename, "r")
  # do something with f
ensure
  f.close
end

In general, when you see some code that has the same beginning and end, but a different middle, it is natural to refactor it into a method that takes a block. You would write a method like this:

def read_file(filename)
  f = File.open(filename, "r")
  yield f
ensure
  f.close
end

And you’d refactor instances of the pattern in your code with:

read_file(filename) do |f|
  # do something with f
end

In order for this strategy to work, it’s important that the code inside the block look the same after refactoring as before. We can restate the correspondence principle in this case as:

# do something with f

should be equivalent to:

do
  # do something with f
end

At first glance, it looks like this is true in Ruby and JavaScript. For instance, let’s say that what you’re doing with the file is printing its mtime. You can easily refactor the equivalent in JavaScript:

try {
  // imaginary JS file API
  var f = File.open(filename, "r");
  sys.print(f.mtime);
} finally {
  f.close();
}

Into this:

read_file(filename, function(f) {
  sys.print(f.mtime);
});

Cases like this are genuinely elegant, and they give people the mistaken impression that Ruby and JavaScript have a roughly equivalent ability to refactor common functionality into anonymous functions.

However, consider a slightly more complicated example, first in Ruby. We’ll write a simple class that calculates a File’s mtime and retrieves its body:

class FileInfo
  def initialize(filename)
    @name = filename
  end
 
  # calculate the File's +mtime+
  def mtime
    f = File.open(@name, "r")
    mtime = mtime_for(f)
    return "too old" if mtime < (Time.now - 1000)
    puts "recent!"
    mtime
  ensure
    f.close
  end
 
  # retrieve that file's +body+
  def body
    f = File.open(@name, "r")
    f.read
  ensure
    f.close
  end
 
  # a helper method to retrieve the mtime of a file
  def mtime_for(f)
    File.mtime(f)
  end
end

We can easily refactor this code using blocks:

class FileInfo
  def initialize(filename)
    @name = filename
  end
 
  # refactor the common file management code into a method
  # that takes a block
  def mtime
    with_file do |f|
      mtime = mtime_for(f)
      return "too old" if mtime < (Time.now - 1000)
      puts "recent!"
      mtime
    end
  end
 
  def body
    with_file { |f| f.read }
  end
 
  def mtime_for(f)
    File.mtime(f)
  end
 
private
  # this method opens a file, calls a block with it, and
  # ensures that the file is closed once the block has
  # finished executing.
  def with_file
    f = File.open(@name, "r")
    yield f
  ensure
    f.close
  end
end

Again, the important thing to note here is that we could move the code into a block without changing it. Unfortunately, this same case does not work in JavaScript. Let’s first write the equivalent FileInfo class in JavaScript.

// constructor for the FileInfo class
FileInfo = function(filename) {
  this.name = filename;
};
 
FileInfo.prototype = {
  // retrieve the file's mtime
  mtime: function() {
    try {
      var f = File.open(this.name, "r");
      var mtime = this.mtimeFor(f);
      if (mtime < new Date() - 1000) {
        return "too old";
      }
      sys.print(mtime);
    } finally {
      f.close();
    }
  },
 
  // retrieve the file's body
  body: function() {
    try {
      var f = File.open(this.name, "r");
      return f.read();
    } finally {
      f.close();
    }
  },
 
  // a helper method to retrieve the mtime of a file
  mtimeFor: function(f) {
    return File.mtime(f);
  }
};

If we try to convert the repeated code into a method that takes a function, the mtime method will look something like:

function() {
  // refactor the common file management code into a method
  // that takes a block
  this.withFile(function(f) {
    var mtime = this.mtimeFor(f);
    if (mtime < new Date() - 1000) {
      return "too old";
    }
    sys.print(mtime);
  });
}

There are two very common problems here. First, this has changed context: inside the nested function it no longer refers to the FileInfo instance. We can fix that by having withFile accept a binding as a second parameter, but then every time we refactor to a lambda we have to remember to accept a binding parameter and pass it in. The var self = this pattern emerged in JavaScript primarily because of this lack of correspondence.
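To make the first problem concrete, here is a minimal sketch that runs without any file API. Resource, withResource, labelFor and the fake handle object are hypothetical stand-ins for the imaginary File API used in this post; only the shape of the code matters. It shows the failure and then the two usual fixes, the var self = this pattern and an explicit binding parameter.

```javascript
// Hypothetical stand-in for FileInfo; no real file API is involved.
function Resource(name) {
  this.name = name;
}

Resource.prototype = {
  // Creates a fake handle, calls fn with it, and always "closes" it.
  // `binding` is the optional second parameter discussed above.
  withResource: function(fn, binding) {
    var handle = { name: this.name, closed: false };
    try {
      return fn.call(binding, handle);
    } finally {
      handle.closed = true;
    }
  },

  labelFor: function(handle) {
    return "resource " + handle.name;
  },

  // Broken: inside the nested function, `this` is the global object
  // (or undefined in strict mode), not the Resource instance.
  broken: function() {
    return this.withResource(function(handle) {
      return this.labelFor(handle);
    });
  },

  // Fix 1: capture the outer `this` in a variable.
  withSelf: function() {
    var self = this;
    return this.withResource(function(handle) {
      return self.labelFor(handle);
    });
  },

  // Fix 2: pass the binding explicitly as the second argument.
  withBinding: function() {
    return this.withResource(function(handle) {
      return this.labelFor(handle);
    }, this);
  }
};
```

Calling broken() on an instance throws a TypeError, while withSelf() and withBinding() both return the label. Both fixes work, but each one is something the author of the nested function has to remember to do.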

This is annoying, but not deadly. More problematic is the fact that return has changed meaning. Instead of returning from the outer function, it returns from the inner one.

This is the right time for JavaScript lovers (and I write this as a sometimes JavaScript lover myself) to argue that return behaves exactly as intended, and this behavior is simpler and more elegant than the Ruby behavior. That may be true, but it doesn’t alter the fact that this behavior breaks the correspondence principle, with very real consequences.

Instead of effortlessly refactoring code with the same start and end into a function taking a function, JavaScript library authors need to consider the fact that consumers of their APIs will often need to perform some gymnastics when dealing with nested functions. In my experience as an author and consumer of JavaScript libraries, this leads to many cases where it’s just too much bother to provide a nice block-based API.
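The following sketch illustrates both the changed meaning of return and the kind of gymnastics it forces on callers. withHandle and the fake handle with its ageInSeconds field are hypothetical, invented here so that the example runs on its own; they are not part of any real API.

```javascript
// Hypothetical helper: calls fn with a fake handle and always "closes" it.
function withHandle(fn) {
  var handle = { ageInSeconds: 5000, closed: false };
  try {
    fn(handle);
  } finally {
    handle.closed = true;
  }
}

// Naive refactoring: `return` only leaves the inner function, so the
// outer function falls off the end and the caller gets undefined.
function naiveStatus() {
  withHandle(function(handle) {
    if (handle.ageInSeconds > 1000) {
      return "too old";
    }
  });
}

// The usual gymnastics: carry the value out through an outer variable
// and return it once the helper has finished.
function workaroundStatus() {
  var result;
  withHandle(function(handle) {
    if (handle.ageInSeconds > 1000) {
      result = "too old";
      return; // leaves only the inner function
    }
    result = "recent";
  });
  return result;
}
```

Here naiveStatus() evaluates to undefined, while workaroundStatus() evaluates to "too old". Another option is to have the helper return whatever the function returns, but either way the code inside the function has to change when it is moved.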

In order to have a language with return (and possibly super and other similar keywords) that satisfies the correspondence principle, the language must, like Ruby and Smalltalk before it, have a function lambda and a block lambda. Keywords like return always return from the function lambda, even inside of block lambdas nested inside it. At first glance, this appears a bit inelegant, and language partisans often accuse Ruby of unnecessarily having two types of “callables”. In my experience as an author of large libraries in both Ruby and JavaScript, however, it results in more elegant abstractions in the end.

Iterators and Callbacks

It’s worth noting that block lambdas only make sense for functions that take functions and invoke them immediately. In this context, keywords like return, super and Ruby’s yield make sense. These cases include iterators, mutex synchronization and resource management (like the block form of File.open).

In contrast, when functions are used as callbacks, those keywords no longer make sense. What does it mean to return from a function that has already returned? In these cases, typically involving callbacks, function lambdas make a lot of sense. In my view, this explains why JavaScript feels so elegant for evented code that involves a lot of callbacks, but somewhat clunky for the iterator case, and Ruby feels so elegant for the iterator case and somewhat more clunky for the evented case. In Ruby’s case (again, in my opinion), this clunkiness comes more from the massively pervasive use of blocks for synchronous code than from a real deficiency in its structures.
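The callback case can be sketched with a tiny hand-rolled queue. The names pending, later, runPending and register are hypothetical and exist only to make the timing visible without a real event loop.

```javascript
// Hypothetical, minimal event queue: callbacks are stored now and run later.
var pending = [];

function later(callback) {
  pending.push(callback);
}

// Runs every stored callback in order and collects their return values.
function runPending() {
  var results = [];
  while (pending.length > 0) {
    results.push(pending.shift()());
  }
  return results;
}

function register() {
  later(function() {
    // register() has already returned by the time this runs, so this
    // `return` can only mean "return from the callback itself".
    return "from callback";
  });
  return "from register";
}
```

In this sketch register() finishes with "from register" before the callback has run at all, and the callback's own value only surfaces later, when runPending() is called. A return that tried to exit register() from inside the callback would have nothing left to return from.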

Because of these concerns, the ECMA working group responsible for ECMAScript, TC39, is considering adding block lambdas to the language. This would mean that the above example could be refactored to:

FileInfo = function(name) {
  this.name = name;
};
 
FileInfo.prototype = {
  mtime: function() {
    // use the proposed block syntax, `{ |args| }`.
    this.withFile { |f|
      // in block lambdas, +this+ is unchanged
      var mtime = this.mtimeFor(f);
      if (mtime < new Date() - 1000) {
        // block lambdas return from their nearest function
        return "too old";
      }
      sys.print(mtime);
    }
  },
 
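  body: function() {
    // return the block's value (the file contents) to the caller
    return this.withFile { |f| f.read(); }
  },
 
  mtimeFor: function(f) {
    return File.mtime(f);
  },
 
  withFile: function(block) {
    try {
      var f = File.open(this.name, "r");
      // pass the block's result back to the caller
      return block(f);
    } finally {
      f.close();
    }
  }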
};

Note that a parallel proposal, which replaces function-scoped var with block-scoped let, will almost certainly be accepted by TC39, which would slightly, but not substantively, change this example. Also note that block lambdas automatically return the value of their last statement.

Our experience with Smalltalk and Ruby shows that people do not need to understand the SCARY correspondence principle for a language that satisfies it to yield the desired results. I love the fact that the concept of “iterator” is not built into the language, but is instead a consequence of natural block semantics. This gives Ruby a rich, broadly useful set of built-in iterators, and language users commonly build custom ones. As a JavaScript practitioner, I often run into situations where using a for loop is significantly more straightforward than using forEach, always because of the lack of correspondence between the code inside a built-in for loop and the code inside the function passed to forEach.
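A minimal sketch of the for versus forEach mismatch, using a made-up task: find the first negative number in an array. Array.prototype.forEach is the standard built-in; the two function names are hypothetical.

```javascript
// With a for loop, `return` exits the enclosing function immediately.
function firstNegativeWithFor(numbers) {
  for (var i = 0; i < numbers.length; i++) {
    if (numbers[i] < 0) {
      return numbers[i];
    }
  }
  return null;
}

// The same body moved into forEach no longer works: `return` leaves only
// the callback, and forEach ignores the callback's return value.
function firstNegativeNaive(numbers) {
  numbers.forEach(function(n) {
    if (n < 0) {
      return n;
    }
  });
  return null; // always reached
}
```

For the input [3, -1, -2], the for version returns -1 and the forEach version returns null, even though the code inside the loop body is identical.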

For the reasons described above, I strongly approve of the block lambda proposal and hope it is adopted.
