Yehuda Katz is a member of the Ruby on Rails core team, and lead developer of the Merb project. He is a member of the jQuery Core Team, and a core contributor to DataMapper. He contributes to many open source projects, like Rubinius and Johnson, and works on some he created himself, like Thor.
Tokaido: My Hopes and Dreams
April 13th, 2012
A few weeks ago, I started a kickstarter project to fund work on a project to make a long-term, sustainable binary build of Ruby. The outpouring of support was great, and I have far exceeded my original funding goal. First, I’d like to thank everyone in the community who contributed large and small donations. This kickstarter couldn’t have been as successful as it has been without the hundreds (650 at latest count!) of individual donations by interested Rubyists.
In this post, I want to talk about what my goals are for this project, and why I think it will be a tool that everyone, myself included, will use.
What is Tokaido
The name “Tokaido” (東海道 in Japanese) comes from the Tōkaidō Shinkansen bullet train line in Japan.
Precompiled, Static Ruby
At its core, Tokaido is a binary distribution of Ruby without any external dependencies on your system. This means that Ruby itself, as well as all of the compiled elements of the standard library, will come in a self-contained directory with no additional requirements.
The binary will also have no hardcoded paths, so it will be possible to move the directory anywhere on the file system and have it continue to work as long as its bin directory is on the $PATH. This is an important goal, as Tokaido will ship as a downloadable .app which should work wherever it is dropped on the file system.
Precompiled Binary Gems
Tokaido will come with all of the necessary gems to use Rails and common Rails extensions. Because some of these gems have external dependencies (for example, nokogori depends on libxml2), Tokaido will come with precompiled versions of these gems built for OSX that do not depend on external libraries (they will be statically compiled, not dynamically linked).
As part of this project, I plan to work with people who ship common native extensions to help them build and ship binary versions of their gems for OSX without external dependencies. Luis Lavena has been doing amazing work with rake-compiler to make this process easy on gem developers, and Wayne E. Seguin and Michał Papis have been doing great work with sm, which makes it easy to precompile the needed dependencies for inclusion in the precompiled gems. These tools will be essential in the effort to make dependency-free precompiled gems a standard part of the OSX Ruby ecosystem.
I anticipate that gem authors will, in general, start distributing precompiled binary versions of their gems. If, by the time I ship the first version of Tokaido, some important gems do not yet ship as precompiled binaries, Tokaido will bootstrap the process by including the binaries.
Terminal-Based Workflow
Tokaido does not aim to replace the Terminal as the main way to work with a Rails app. It will ship an isolated environment with no external dependencies designed for use in the Terminal. The application UI will supplement, rather than replace, the normal application workflow. This is a crucial part of the overall goal to make Tokaido an excellent tool for both experienced Rails developers, intermediate Rails developers, and totally new Rails developers.
Because many gems come with executables, and Tokaido couldn’t abstract every possible executable even if it wanted to, it is essential that new developers get used to using the Terminal as early as possible in their Rails experience, but in a way that minimizes unnecessary errors.
Extras: Code and App Health
A number of really great tools exist to help Rails applications remain healthy:
bundle outdatedlets you know when new versions of gems are availablerails_best_practiceshelps find common mistakes in Rails applicationsreek,flog,flay,roodiand other metrics tools help identify Ruby smells and prioritizes areas where you can reduce your technical debtsimplecovdoes coverage analysis on your test suite to identify areas lacking good coveragebulletidentifies N+1 queries and unused eager loadingActiveSupport::Notificationsprovides introspection information about Rails applications
The Tokaido UI will attempt to centralize this useful information into a health check, and help you drill in if you want to do more with the information. Because they are so useful, it will encourage you to use these tools in new application, and provide immediate rewards if you do.
Extras: yourapp.dev
The pow dev server and PassengerPane make it possible to avoid needing to manually load a Rails server and gives you a friendly alias for your application at yourapp.dev instead of having to type in a port on localhost.
Tokaido will come with integrated support for this, so every app you manage through Tokaido will have a aliased development server out of the box.
Extras: rails:// (or ruby://) Protocol Handler
Tokaido will come with a protocol handler that will allow .gem files hosted on rubygems.org or other locations to be added to an app managed by Tokaido. It may also allow web sites to host installation templates that could execute against a maintained application. This will require coordination with rubygems.org, and the specifics of it may change over time, but the basic idea is to enable communication between web sites and the Tokaido app.
This idea came from a number of community members, and the specifics (especially around security and the protocol specification) definitely need fleshing out, but it seems extremely promising.
Extras: Ruby Toolbox Integration?
Ruby Toolbox has emerged as an amazing directory of Ruby libraries, categorized by type. It also has up-to-date information on Github activity and other useful indicators when evaluating tools. Several people have asked for integration with Ruby Toolbox, and assuming the author is willing, Tokaido will make it easy to use the information in Ruby Toolbox to bootstrap an app and add new functionality as your app grows.
Finding the right gem for the job is easy if you know what you’re looking for, but even experienced developers find the Ruby Toolbox useful when researching tools in an unfamiliar area.
Goals of Tokaido
Eliminate Failure Scenarios
The primary goal of Tokaido is to build a distribution of Ruby that eliminates the main failure scenarios that people encounter when trying to install Ruby. In general, things fail because of conflicts with the existing system environment (or the user’s environment). These failures do not happen to everyone, but when they happen, they are extremely difficult to recover from. This has happened to me personally and to people I have tried to get started with Ruby.
This sort of thing does not necessarily happen on every installation, but once you start going down the rabbit hole, it can be very difficult to find your way out. The environment difficulties can be caused by anything from an old MacPorts installation to a mistaken attempt to install something to the system once upon a time to something failing during a previous step in the installation process.
It also may not be enough to install a precompiled Ruby into the system, because the system environment itself may be corrupted with things like erroneous executables on the $PATH or bad dynamic libraries that get linked during native compilation. Also, later installations may do damage to the system-installed Ruby. Tokaido is a standalone environment that is loaded on top of an existing session, and therefore minimizes the possible damage that load order or subsequent installs can do.
Precompile Everything
In order to eliminate a certain class of failure scenarios, Tokaido will ship with a precompiled Ruby, which will eliminate the possibility of compilation errors when installing Ruby. This precompiled Ruby will also come with all of the dependencies it needs, like zlib, yaml and others, instead of allowing the system to try to find them at runtime. This will eliminate the possibility that changes to the system environment will cause a normally working version of Ruby to fail in some scenarios.
As above, Tokaido will also use this technique for commonly used native gems. For example, Tokaido will ship with a precompiled version of Nokogiri that comes with libxml, instead of relying on the system’s copy of libxml (incidentally, the system’s libxml has occasionally been subtly broken, necessitating installation via homebrew). I expect that this will happen because gem authors will start shipping precompiled versions of their gems for OSX. If there are still a few common gems straggling by the time Tokaido ships, we’ll bootstrap the process by shipping with binary gems we compile ourselves.
Use Tokaido Myself
Since Tokaido does not fundamentally alter a developer’s relationship with Ruby, I expect to be able to start using it for day-to-day Rails development. Some of the additional extras, like app health and built-in server support, are things I already do manually and will enjoy having as part of a larger tool. I’m also extremely interested in ideas for other extras that can add value for experienced developers without altering how people work with Ruby today.
Integration Testing
One of the coolest, unheralded things about the Rails project is the integration suite for Rails put together by Sam Ruby. It essentially transcribes the Agile Web Development With Rails book into a series of very complete integration tests for Rails. When refactoring Rails for Rails 3, this suite was extremely useful, and helped keep the number of unintentional changes to Rails to a tiny number. It also kept us honest about intentional changes.
This suite is perfect for Tokaido, as it tests every aspect of Rails, which is itself touches a wide swath of Ruby itself. To start, Tokaido will use this suite for integration testing.
Collaboration
There are several other projects, notably rvm, trying to solve problems in a similar space. Especially when it comes to precompilation, there is no reason not to work together on the core infrastructure that powers these efforts. I have already started to work with Michał Papis, who also works on sm and rvm on shared work that can be useful for both projects. He has been teaching me all about the work that folks associated with rvm have done to simplify things like downloading and compiling libz.a, a prerequisite for precompiled Rubies.
I will also work with authors of native gems to start shipping precompiled binary gems for OSX. I have already started working with Aaron Patterson, who works on the sqlite and nokogiri gems, and have started reaching out to several others.
I am very interested in working together with anyone working on a similar area so that the tools we are all working on can be shared between projects. This is a core mission of the Tokaido project, and something that the extra funding I got will allow me to prioritize.
Migration to “System”
Also through Michał Papis, I learned about an awesome new (still experimental) feature in rvm called mount that will mount an external Ruby into the rvm system. I will make sure that the Tokaido ruby can be added to an rvm installation. This will mean that if someone wants to migrate from Tokaido to a more advanced rvm setup, they can take their Ruby environment with them with no trouble.
I would be open to other approaches to migrating a Tokaido environment to the system as well. The goal would be to seamlessly migrate an environment to the system without introducing opportunities for failure during that process.
Looking Forward
I’m really excited to the work I’ve been doing to prepare for this project, and looking forward to shipping a great environment for OSX Ruby users. I have also really enjoyed reaching out to others working in similar areas and the prospect of collaborating with so many smart people on a shared goal.
Thanks so much!
JavaScript Needs Blocks
January 10th, 2012
While reading Hacker News posts about JavaScript, I often come across the misconception that Ruby’s blocks are essentially equivalent to JavaScript’s “first class functions”. Because the ability to pass functions around, especially when you can create them anonymously, is extremely powerful, the fact that both JavaScript and Ruby have a mechanism to do so makes it natural to assume equivalence.
In fact, when people talk about why Ruby’s blocks are different from Python‘s functions, they usually talk about anonymity, something that Ruby and JavaScript share, but Python does not have. At first glance, a Ruby block is an “anonymous function” (or colloquially, a “closure”) just as a JavaScript function is one.
This impression, which I admittedly shared in my early days as a Ruby/JavaScript developer, misses an important subtlety that turns out to have large implications. This subtlety is often referred to as “Tennent’s Correspondence Principle”. In short, Tennent’s Correspondence Principle says:
“For a given expression
expr,lambda exprshould be equivalent.”
This is also known as the principle of abstraction, because it means that it is easy to refactor common code into methods that take a block. For instance, consider the common case of file resource management. Imagine that the block form of File.open didn’t exist in Ruby, and you saw a lot of the following in your code:
begin f = File.open(filename, "r") # do something with f ensure f.close end
In general, when you see some code that has the same beginning and end, but a different middle, it is natural to refactor it into a method that takes a block. You would write a method like this:
def read_file(filename) f = File.open(filename, "r") yield f ensure f.close end
And you’d refactor instances of the pattern in your code with:
read_file(filename) do |f| # do something with f end
In order for this strategy to work, it’s important that the code inside the block look the same after refactoring as before. We can restate the correspondence principle in this case as:
# do something with fshould be equivalent to:
do # do something with end
At first glance, it looks like this is true in Ruby and JavaScript. For instance, let’s say that what you’re doing with the file is printing its mtime. You can easily refactor the equivalent in JavaScript:
try { // imaginary JS file API var f = File.open(filename, "r"); sys.print(f.mtime); } finally { f.close(); }
Into this:
read_file(function(f) { sys.print(f.mtime); });
In fact, cases like this, which are in fact quite elegant, give people the mistaken impression that Ruby and JavaScript have a roughly equivalent ability to refactor common functionality into anonymous functions.
However, consider a slightly more complicated example, first in Ruby. We’ll write a simple class that calculates a File’s mtime and retrieves its body:
class FileInfo def initialize(filename) @name = filename end # calculate the File's +mtime+ def mtime f = File.open(@name, "r") mtime = mtime_for(f) return "too old" if mtime < (Time.now - 1000) puts "recent!" mtime ensure f.close end # retrieve that file's +body+ def body f = File.open(@name, "r") f.read ensure f.close end # a helper method to retrieve the mtime of a file def mtime_for(f) File.mtime(f) end end
We can easily refactor this code using blocks:
class FileInfo def initialize(filename) @name = filename end # refactor the common file management code into a method # that takes a block def mtime with_file do |f| mtime = mtime_for(f) return "too old" if mtime < (Time.now - 1000) puts "recent!" mtime end end def body with_file { |f| f.read } end def mtime_for(f) File.mtime(f) end private # this method opens a file, calls a block with it, and # ensures that the file is closed once the block has # finished executing. def with_file f = File.open(@name, "r") yield f ensure f.close end end
Again, the important thing to note here is that we could move the code into a block without changing it. Unfortunately, this same case does not work in JavaScript. Let’s first write the equivalent FileInfo class in JavaScript.
// constructor for the FileInfo class FileInfo = function(filename) { this.name = filename; }; FileInfo.prototype = { // retrieve the file's mtime mtime: function() { try { var f = File.open(this.name, "r"); var mtime = this.mtimeFor(f); if (mtime < new Date() - 1000) { return "too old"; } sys.print(mtime); } finally { f.close(); } }, // retrieve the file's body body: function() { try { var f = File.open(this.name, "r"); return f.read(); } finally { f.close(); } }, // a helper method to retrieve the mtime of a file mtimeFor: function(f) { return File.mtime(f); } };
If we try to convert the repeated code into a method that takes a function, the mtime method will look something like:
function() { // refactor the common file management code into a method // that takes a block this.withFile(function(f) { var mtime = this.mtimeFor(f); if (mtime < new Date() - 1000) { return "too old"; } sys.print(mtime); }); }
There are two very common problems here. First, this has changed contexts. We can fix this by allowing a binding as a second parameter, but it means that we need to make sure that every time we refactor to a lambda we make sure to accept a binding parameter and pass it in. The var self = this pattern emerged in JavaScript primarily because of the lack of correspondence.
This is annoying, but not deadly. More problematic is the fact that return has changed meaning. Instead of returning from the outer function, it returns from the inner one.
This is the right time for JavaScript lovers (and I write this as a sometimes JavaScript lover myself) to argue that return behaves exactly as intended, and this behavior is simpler and more elegant than the Ruby behavior. That may be true, but it doesn’t alter the fact that this behavior breaks the correspondence principle, with very real consequences.
Instead of effortlessly refactoring code with the same start and end into a function taking a function, JavaScript library authors need to consider the fact that consumers of their APIs will often need to perform some gymnastics when dealing with nested functions. In my experience as an author and consumer of JavaScript libraries, this leads to many cases where it’s just too much bother to provide a nice block-based API.
In order to have a language with return (and possibly super and other similar keywords) that satisfies the correspondence principle, the language must, like Ruby and Smalltalk before it, have a function lambda and a block lambda. Keywords like return always return from the function lambda, even inside of block lambdas nested inside. At first glance, this appears a bit inelegant, and language partisans often accuse Ruby of unnecessarily having two types of “callables”, in my experience as an author of large libraries in both Ruby and JavaScript, it results in more elegant abstractions in the end.
Iterators and Callbacks
It’s worth noting that block lambdas only make sense for functions that take functions and invoke them immediately. In this context, keywords like return, super and Ruby’s yield make sense. These cases include iterators, mutex synchronization and resource management (like the block form of File.open).
In contrast, when functions are used as callbacks, those keywords no longer make sense. What does it mean to return from a function that has already returned? In these cases, typically involving callbacks, function lambdas make a lot of sense. In my view, this explains why JavaScript feels so elegant for evented code that involves a lot of callbacks, but somewhat clunky for the iterator case, and Ruby feels so elegant for the iterator case and somewhat more clunky for the evented case. In Ruby’s case, (again in my opinion), this clunkiness is more from the massively pervasive use of blocks for synchronous code than a real deficiency in its structures.
Because of these concerns, the ECMA working group responsible for ECMAScript, TC39, is considering adding block lambdas to the language. This would mean that the above example could be refactored to:
FileInfo = function(name) { this.name = name; }; FileInfo.prototype = { mtime: function() { // use the proposed block syntax, `{ |args| }`. this.withFile { |f| // in block lambdas, +this+ is unchanged var mtime = this.mtimeFor(f); if (mtime < new Date() - 1000) { // block lambdas return from their nearest function return "too old"; } sys.print(mtime); } }, body: function() { this.withFile { |f| f.read(); } }, mtimeFor: function(f) { return File.mtime(f); }, withFile: function(block) { try { var f = File.open(this.name, "r"); block(f); } finally { f.close(); } } };
Note that a parallel proposal, which replaces function-scoped var with block-scoped let, will almost certainly be accepted by TC39, which would slightly, but not substantively, change this example. Also note block lambdas automatically return their last statement.
Our experience with Smalltalk and Ruby show that people do not need to understand the SCARY correspondence principle for a language that satisfies it to yield the desired results. I love the fact that the concept of “iterator” is not built into the language, but is instead a consequence of natural block semantics. This gives Ruby a rich, broadly useful set of built-in iterators, and language users commonly build custom ones. As a JavaScript practitioner, I often run into situations where using a for loop is significantly more straight-forward than using forEach, always because of the lack of correspondence between the code inside a built-in for loop and the code inside the function passed to forEach.
For the reasons described above, I strongly approve of the block lambda proposal and hope it is adopted.
Amber.js (formerly SproutCore 2.0) is now Ember.js
December 12th, 2011
After we announced Amber.js last week, a number of people brought Amber Smalltalk, a Smalltalk implementation written in JavaScript, to our attention. After some communication with the folks behind Amber Smalltalk, we started a discussion on Hacker News about what we should do.
Most people told us to stick with Amber.js, but a sizable minority told us to come up with a different name. After thinking about it, we didn’t feel good about the conflict and decided to choose a new name.
Henceforth, the project formerly known as SproutCore 2.0 will be known as Ember.js. Our new website is up at http://www.emberjs.com
(and yes, we know this is pretty ridiculous)
Announcing Amber.js
December 8th, 2011
A little over a year ago, I got my first serious glimpse at SproutCore, the JavaScript framework Apple used to build MobileMe (now iCloud). At the time, I had worked extensively with jQuery and Rails on client-side projects, and I had never found the arguments for the “solutions for big apps” very compelling. At the time, most of the arguments (at least within the jQuery community) focused on bringing more object orientation to JavaScript, but I never felt that they offered the layers of abstraction you really want to manage complexity.
When I first started to play with SproutCore, I realized that the bindings and computed properties were what gave it its real power. Bindings and computed properties provide a clean mechanism for building the layers of abstractions that improve the structure of large applications.
But even before I got involved in SproutCore, I had an epiphany one day when playing with Mustache.js. Because Mustache.js was a declarative way of describing a translation from a piece of JSON to HTML, it seemed to me that there was enough information in the template to also update the template when the underlying data changed. Unfortunately, Mustache.js itself lacked the power to implement this idea, and I was still lacking a robust enough observer library.
Not wanting to build an observer library in isolation (and believing that jQuery’s data support would work in a pinch), I started working on the first problem: building a template engine powerful enough to build automatically updating templates. The kernel of the idea for Handlebars (helpers and block helpers as the core primitives) came out of a discussion with Carl Lerche back when we were still at Engine Yard, and I got to work.
When I met SproutCore, I realized that it provided a more powerful observer library than anything I was considering at the time for the data-binding aspect of Handlebars, and that SproutCore’s biggest weakness was the lack of a good templating solution in its view layer. I also rapidly became convinced that bindings and computed properties were a significantly better abstraction, and allowed for hiding much more complexity, than manually binding observers.
After some months of retooling SproutCore with Tom Dale to take advantage of an auto-updating templating solution that fit cleanly into SproutCore’s binding model, we reached a crossroads. SproutCore itself was built from the ground up to provide a desktop-like experience on desktop browsers, and our ultimate plan had started to diverge from the widget-centric focus of many existing users of SproutCore. After a lot of soul-searching, we decided to start from scratch with SproutCore 2.0, taking with us the best, core ideas of SproutCore, but leaving the large, somewhat sprawling codebase behind.
Since early this year, we have worked with several large companies, including ZenDesk, BazaarVoice and LivingSocial, to iterate on the core ideas that we started from to build a powerful framework for building ambitious applications.
Throughout this time, though, we became increasingly convinced that calling what we were building “SproutCore 2.0″ was causing a lot of confusion, because SproutCore 1.x was primarily a native-style widget library, while SproutCore 2.0 was a framework for building web-based applications using HTML and CSS for the presentation layer. This lack of overlap causes serious confusion in the IRC room, mailing list, blog, when searching on Google, etc.
To clear things up, we have decided to name the SproutCore-inspired framework we have been building (so far called “SproutCore 2.0″) “Amber.js”. Amber brings a proven MVC architecture to web applications, as well as features that eliminate common boilerplate. If you played with SproutCore and liked the concepts but felt like it was too heavy, give Amber a try. And if you’re a Backbone fan, I think you’ll love how little code you need to write with Amber.
In the next few days, we’ll be launching a new website with examples, documentation, and download links. Stay tuned for further updates soon.
UPDATE: The code for Amber.js is still, as of December 8, hosted at the SproutCore organization. It will be moved and re-namespaced within a few days.
How to Marshal Procs Using Rubinius
November 19th, 2011
The primary reason I enjoy working with Rubinius is that it exposes, to Ruby, much of the internal machinery that controls the runtime semantics of the language. Further, it exposes that machinery primarily in order to enable user-facing semantics that are typically implemented in the host language (C for MRI, C and C++ for MacRuby, Java for JRuby) to be implemented in Ruby itself.
There is, of course, quite a bit of low-level functionality in Rubinius implemented in C++, but a surprising number of things are implemented in pure Ruby.
One example is the Binding object. To create a new binding in Rubinius, you call Binding.setup:
def self.setup(variables, code, static_scope, recv=nil) bind = allocate() bind.self = recv || variables.self bind.variables = variables bind.code = code bind.static_scope = static_scope return bind end
This method takes a number of more primitive constructs, which I will explain as this article progresses, but we can describe the constructs that make up the high-level Ruby Binding in pure Ruby.
In fact, Rubinius implements Kernel#binding itself in terms of Binding.setup.
def binding return Binding.setup( Rubinius::VariableScope.of_sender, Rubinius::CompiledMethod.of_sender, Rubinius::StaticScope.of_sender, self) end
Yes, you’re reading that right. Rubinius exposes the ability to extract the constructs that make up a binding, one at a time, from a caller’s scope. And this is not just a hack (like Binding.of_caller for a short time in MRI). It’s core to how Rubinius manages eval, which of course makes heavy use of bindings.
Marshalling Procs
For a while, I have wanted the ability to Marshal.dump a proc in Ruby. MRI has historically disallowed it, but there’s nothing conceptually impossible about it. A proc itself is a blob of executable code, a local variable scope (which is just a bunch of pointers to other objects), and a constant lookup scope. Rubinius exposes each of these constructs to Ruby, so Marshaling a proc simply means figuring out how to Marshal each of these constructs.
Let’s take a quick detour to learn about the constructs in question.
Rubinius::StaticScope
Rubinius represents Ruby’s constant lookup scope as a Rubinius::StaticScope object. Perhaps the easiest way to understand it would be to look at Ruby’s built-in Module.nesting function.
module Foo p Module.nesting module Bar p Module.nesting end end module Foo::Bar p Module.nesting end # Output: # [Foo] # [Foo::Bar, Foo] # [Foo::Bar]
Every execution context in Rubinius has a Rubinius::StaticScope, which may optionally have a parent scope. In general, the top static scope (the static scope with no parent) in any execution context is Object.
Because Rubinius allows us to get the static scope of a calling method, we can implement Module.nesting in Rubinius:
def nesting scope = Rubinius::StaticScope.of_sender nesting = [] while scope and scope.module != Object nesting << scope.module scope = scope.parent end nesting end
A static scope also has an addition property called current_module, which is used during class_eval to define which module the runtime should add new methods to.
Adding Marshal.dump support to a static scope is therefore quite easy:
class Rubinius::StaticScope def marshal_dump [@module, @current_module, @parent] end def marshal_load(array) @module, @current_module, @parent = array end end
These three instance variables are defined as Rubinius slots, which means that they are fully accessible to Ruby as instance variables, but don’t show up in the instance_variables list. As a result, we need to explicitly dump the instance variables that we care about and reload them later.
Rubinius::CompiledMethod
A compiled method holds the information necessary to execute a blob of Ruby code. Some important parts of a compiled method are its instruction sequence (a list of the compiled instructions for the code), a list of any literals it has access to, names of local variables, its method signature, and a number of other important characteristics.
It’s actually quite a complex structure, but Rubinius has already knows how to convert an in-memory CompiledMethod into a String, as it dumps compiled Ruby files into compiled files as part of its normal operation. There is one small caveat: this String form that Rubinius uses for its compiled method does not include its static scope, so we will need to include the static scope separately in the marshaled form. Since we already told Rubinius how to marshal a static scope, this is easy.
class Rubinius::CompiledMethod def _dump(depth) Marshal.dump([@scope, Rubinius::CompiledFile::Marshal.new.marshal(self)]) end def self._load(string) scope, dump = Marshal.load(string) cm = Rubinius::CompiledFile::Marshal.new.unmarshal(dump) cm.scope = scope cm end end
Rubinius::VariableScope
A variable scope represents the state of the current execution context. It contains all of the local variables in the current scope, the execution context currently in scope, the current self, and several other characteristics.
I wrote about the variable scope before. It’s one of my favorite Rubinius constructs, because it provides a ton of useful runtime information to Ruby that is usually locked away inside the native implementation.
Dumping and loading the VariableScope is also easy:
class VariableScope def _dump(depth) Marshal.dump([@method, @module, @parent, @self, nil, locals]) end def self._load(string) VariableScope.synthesize *Marshal.load(string) end end
The synthesize method is new to Rubinius master; getting a new variable scope previously required synthesizing its locals using class_eval, and the new method is better.
Rubinius::BlockEnvironment
A Proc is basically nothing but a wrapper around a Rubinius::BlockEnvironment, which wraps up all of the objects we’ve been working with so far. Its scope attribute is a VariableScope and its code attribute is a CompiledMethod.
Dumping it should be quite familiar by now.
class BlockEnvironment def marshal_dump [@scope, @code] end def marshal_load(array) scope, code = *array under_context scope, code end end
The only thing new here is the under_context method, which gives a BlockEnvironment its variable scope and compiled method. Note that we dumped the static scope along with the compiled method above.
Proc
Finally, a Proc is just a wrapper around a BlockEnvironment, so dumping it is easy:
class Proc def _dump(depth) Marshal.dump(@block) end def self._load(string) block = Marshal.load(string) self.__from_block__(block) end end
The __from_block__ method constructs a new Proc from a BlockEnvironment.
So there you have it. Dumping and reloading Proc objects in pure Ruby using Rubinius! (the full source is at https://gist.github.com/1378518).
A Proposal for ES.next Proposals
September 13th, 2011
Over the past few years, I have occasionally expressed frustration (in public and private) about the process for approving new features to the next edition of ECMAScript. In short, the process is extremely academic in nature, and is peppered with inside baseball terms that make it nearly impossible for lay developers to provide feedback about proposed new features. In general, this frustration was met with a general assumption that the current process works the way it does for good reason, and that academic descriptions of the new features was the correct (and only) way to properly discuss them.
I have nothing against new features being described in the language of implementors, but I would like to propose some additions to the current process that would make it significantly easier for language consumers (especially framework and library implementors) to provide timely feedback about the proposal in the hope of making an impact before it’s too late.
I would like proposals for new features to have the following elements, in addition to whatever elements are already normally included (such as BNF for any new syntax).
What Problem Are We Solving?
At a high level, what can language users do now that they could not do before. In some cases, proposals may provide simpler or more convenient ways to achieve already-possible goals. These kinds of proposals are often just as important. For example, there is a current proposal to provide a new syntax for class creation. In this case, the new class syntax significantly improves the experience of building a common JavaScript construct.
Provide Non-Trivial Example Code Showing the Problem
If the proposal is solving a problem that exists in the wild, it should be possible to identify non-trivial examples of the problem rearing its head. At the very least, the process of identifying or synthesizing these examples will help language users understand what real-world problems the proposal is attempting to solve. At best, finding real-world examples will help refine the proposal at an early stage.
Show How the Example Code is Improved With the Proposal
After identifying or synthesizing example code to illustrate the problem, show how the problem would be improved if the proposal was accepted. In some cases, the problem is as simple as “needing a large library to perform this operation” and the solution is “building common functionality into the language”. In an example from a related field, the DOM library, the problem addressed by querySelectorAll was “many incompatible implementations of a CSS3 selector library”. The solution was to build the functionality into the browser.
In this case, a mistake in the querySelectorAll specification, which was resolved by the addition of queryScopedSelectorAll in the Selectors API Level 2 could have been addressed ahead of time by evaluating real-life code using selector engine libraries. Of course, the DOM library is not the same as the language specification, so the example is merely an analogy.
What are Alternate ES3 (or ES5) Solutions to the Same Problem?
Simply enumerate the ways that existing JavaScript developers have attempted to resolve the problem without language support. In small language changes, this overlaps considerably with the previous questions. By having proposal authors enumerate existing solutions to the problem, it will be easier for language users to identify the scope of the solution.
This will allow language users to provide feedback about how the solution on offer stacks up compared to existing pure-JS solutions.
Are there any restrictions that do not exist in original pure-JS solutions?
Are there any restrictions in the proposal that limit its utility as a solution to the problem in question, especially if those restrictions do not apply to solutions currently used by language users.
If the Proposal is For New Syntax, What if Anything Does it Desugar To?
Also, if the proposal desugars, why choose this particular desugaring as opposed to alternatives?
If New Syntax, Can it Be Compiled to ES3 (or ES5)?
If the proposal can desugar to an older version of the specification, can a source-to-source translator be written? If so, is there a reference implementation of a source-to-source translator written in that version?
By writing such a source-to-source translator, existing language users can experiment with the new syntax easily in a browser environment without requiring a separate compilation pass. This also allows users to build an in-browser translation UI (similar to try CoffeeScript), which can improve general understanding of the new syntax and produce important feedback.
To be more specific, what I would like to see here is a general-purpose source-to-source translation engine written in ES3 with a mechanism for plugging in translation passes for specific features. If new features come with translation passes, it would be trivial for language users to try out new features incrementally in production applications (with a nice development-time workflow). This would provide usability feedback at an early enough stage for it to be useful.
If a New Standard Library, Can it Be Polyfilled to ES3 (or ES5)?
If the proposal is for a new library whose syntax is valid in an earlier version of the specification, can it be implemented in terms of the existing primitives available in that version. If necessary, primitives not defined by the language, but provided historically by web browsers, can be used instead. The goal is to provide shims for older browsers so that a much broader group of people can experiment with the APIs and provide feedback.
In general, new libraries that parse in older versions of the specifications should also come with a well-defined mechanism to detect whether the feature is present, to make it easy for library and framework vendors, as well as the general public, to opt their users into the new features where appropriate.
Even if a fully backwards-compatible shim cannot be provided, it is still useful to provide a partial implementation together with a feature detection mechanism. At the very least, error-checking shims can be useful, so language users can easily understand the interface to the proposed library.
Does the Proposal Change any Existing Semantics?
In some cases, the proposal unavoidably changes existing semantics. For example, ES5 changed the semantics of an indirect call to eval (a call to an alias to eval, such as x = eval; x('some code') to use the global environment for the evaluated code. In ES3, indirect calls to eval behaved the same as direct calls to eval.
These cases are rare, and in most cases, require an explicit opt-in (such as the directive "use strict";).
When such changes are made, especially when they do not require an opt-in, they should be explicitly called out in the proposal to gather feedback about their likely impact on existing code. Even when they require an opt-in, information about the frequency of their use could be useful to assess the difficulty of opting in.
Since these opt-ins often enable new features as well as changing existing semantics, understanding the impact of the opt-in on existing code would help language users assess their overall utility and timeframe for adoption. This information could also help drive these decisions.
New Hope for The Ruby Specification
September 5th, 2011
For a few years, a group of Japanese academics have been working on formalizing the Ruby programming language into a specification they hoped would be accepted by ISO. From time to time, I have read through it, and I had one major concern.
Because Ruby 1.9 was still in a lot of flux when they were drafting the specification, the authors left a lot of details out. Unfortunately, some of these details are extremely important. Here’s one example, from the specification of String#[]:
Behavior:
a) If the length of args is 0 or larger than 2, raise a
direct instance of the class ArgumentError.
b) Let P be the first element of args. Let n be the length
of the receiver.
c) If P is an instance of the class Integer, let b be the
value of P.
1) If the length of args is 1:
i) If b is smaller than 0, increment b by n. If b is
still smaller than 0, return nil.
ii) If b >= n, return nil.
iii) Create an instance of the class Object which
represents the bth character of
the receiver and return this instance.The important bit here is c(1)(iii), which says to create “an instance of the class Object which represents the btw character of the receiver”. The reason for this ambiguity, as best as I can determine, is that Ruby 1.8 and Ruby 1.9 differ on the behavior:
1.8 >> "hello"[1] => 101 1.9 >> "hello"[1] => "e"
Of course, neither of these results in a direct instance of the class Object, but since Fixnums and Strings are both “instances of the class Object”, this is technically true. Unfortunately, any real-life Ruby code will need to know what actual object this method will return.
Another very common reason for unspecified behaviors is a failure to specify Ruby’s coercion protocol, so String#+ is unspecified if the other is not a String, even though all Ruby implementations will call to_str on the other to attempt to coerce it. The coercion protocol has been underspecified for a long time, and it’s understandable that the group punted on it, but because it is heavily relied on by real-life code, it is important that we actually describe the behavior.
This week, I am in Matsue in Japan for RubyWorld, and I was glad to learn that the group working on the ISO specification sees the current work as a first step that will continue with a more rigid specification of currently “unspecified” behavior based on Ruby 1.9.
The word “unspecified” appears 170 times in the current draft of the Ruby specification. I hope that the next version will eliminate most if not all of these unspecified behaviors in favor of explicit behavior or explicitly requiring an exception to be thrown. In cases that actually differ between implementations (for instance, Rubinius allows Class to be subclassed), I would hope that these unspecified behaviors be the subject of some discussion at the implementor level.
In any event, I am thrilled at the news that the Ruby specification will become less ambiguous in the future!
Understanding “Prototypes” in JavaScript
August 12th, 2011
For the purposes of this post, I will be talking about JavaScript objects using syntax defined in ECMAScript 5.1. The basic semantics existed in Edition 3, but they were not well exposed.
A Whole New Object
In JavaScript, objects are pairs of keys and values (in Ruby, this structure is called a Hash; in Python, it’s called a dictionary). For example, if I wanted to describe my name, I could have an object with two keys: `firstName` would point to “Yehuda” and `lastName` would point to “Katz”. Keys in a JavaScript object are Strings.
To create the simplest new object in JavaScript, you can use Object.create:
var person = Object.create(null); // this creates an empty objects
Why didn’t we just use var person = {};? Stick with me! To look up a value in the object by key, use bracket notation. If there is no value for the key in question, JavaScript will return `undefined`.
person['name'] // undefined
If the String is a valid identifier[1], you can use the dot form:
person.name // undefined
[1] in general, an identifier starts with a unicode letter, $, _, followed by any of the starting characters or numbers. A valid identifier must also not be a reserved word. There are other allowed characters, such as unicode combining marks, unicode connecting punctuation, and unicode escape sequences. Check out the spec for the full details
Adding values
So now you have an empty object. Not that useful, eh? Before we can add some properties, we need to understand what a property (what the spec calls a “named data property”) looks like in JavaScript.
Obviously, a property has a name and a value. In addition, a property can be enumerable, configurable and writable. If a value is enumerable, it will show up when enumerating over an object using a for(prop in obj) loop. If a property is writable, you can replace it. If a property is configurable, you can delete it or change its other attributes.
In general, when we create a new property, we will want it to be enumerable, configurable, and writable. In fact, prior to ECMAScript 5, that was the only kind of property a user could create directly.
We can add a property to an object using Object.defineProperty. Let’s add a first name and last name to our empty object:
var person = Object.create(null); Object.defineProperty(person, 'firstName', { value: "Yehuda", writable: true, enumerable: true, configurable: true }); Object.defineProperty(person, 'lastName', { value: "Katz", writable: true, enumerable: true, configurable: true });
Obviously, this is extremely verbose. We can make it a bit less verbose by eliminating the common defaults:
var config = { writable: true, enumerable: true, configurable: true }; var defineProperty = function(obj, name, value) { config.value = value; Object.defineProperty(obj, name, config); } var person = Object.create(null); defineProperty(person, 'firstName', "Yehuda"); defineProperty(person, 'lastName', "Katz");
Still, this is pretty ugly to create a simple property list. Before we can get to a prettier solution, we will need to add another weapon to our JavaScript object arsenal.
Prototypes
So far, we’ve talked about objects as simple pairs of keys and values. In fact, JavaScript objects also have one additional attribute: a pointer to another object. We call this pointer the object’s prototype. If you try to look up a key on an object and it is not found, JavaScript will look for it in the prototype. It will follow the “prototype chain” until it sees a null value. In that case, it returns undefined.
You’ll recall that we created a new object by invoking Object.create(null). The parameter tells JavaScript what it should set as the Object’s prototype. You can look up an object’s prototype by using Object.getPrototypeOf:
var man = Object.create(null); defineProperty(man, 'sex', "male"); var yehuda = Object.create(man); defineProperty(yehuda, 'firstName', "Yehuda"); defineProperty(yehuda, 'lastName', "Katz"); yehuda.sex // "male" yehuda.firstName // "Yehuda" yehuda.lastName // "Katz" Object.getPrototypeOf(yehuda) // returns the man object
We can also add functions that we share across many objects this way:
var person = Object.create(null); defineProperty(person, 'fullName', function() { return this.firstName + ' ' + this.lastName; }); // this time, let's make man's prototype person, so all // men share the fullName function var man = Object.create(person); defineProperty(man, 'sex', "male"); var yehuda = Object.create(man); defineProperty(yehuda, 'firstName', "Yehuda"); defineProperty(yehuda, 'lastName', "Katz"); yehuda.sex // "male" yehuda.fullName() // "Yehuda Katz"
Setting Properties
Since creating a new writable, configurable, enumerable property is pretty common, JavaScript makes it easy to do so using assignment syntax. Let’s update the previous example using assignment instead of defineProperty:
var person = Object.create(null); // instead of using defineProperty and specifying writable, // configurable, and enumerable, we can just assign the // value directly and JavaScript will take care of the rest person['fullName'] = function() { return this.firstName + ' ' + this.lastName; }); // this time, let's make man's prototype person, so all // men share the fullName function var man = Object.create(person); man['sex'] = "male"; var yehuda = Object.create(man); yehuda['firstName'] = "Yehuda"; yehuda['lastName'] = "Katz"; yehuda.sex // "male" yehuda.fullName() // "Yehuda Katz"
Just like when looking up properties, if the property you are defining is an identifier, you can use dot syntax instead of bracket syntax. For instance, you could say man.sex = "male" in the example above.
Object Literals
Still, having to set a number of properties every time can get annoying. JavaScript provides a literal syntax for creating an object and assigning properties to it at one time.
var person = { firstName: "Paul", lastName: "Irish" }
This syntax is approximately sugar for:
var person = Object.create(Object.prototype); person.firstName = "Paul"; person.lastName = "Irish";
The most important thing about the expanded form is that object literals always set the newly created object’s prototype to an object located at Object.prototype. Internally, the object literal looks like this:

The default Object.prototype dictionary comes with a number of the methods we have come to expect objects to contain, and through the magic of the prototype chain, all new objects created as object literal will contain these properties. Of course, objects can happily override them by defining the properties directly. Most commonly, developers will override the toString method:
var alex = { firstName: "Alex", lastName: "Russell" }; person.toString() // "[object Object]" var brendan = { firstName: "Brendan", lastName: "Eich", toString: function() { return "Brendan Eich"; } }; brendan.toString() // "Brendan Eich"
This is especially useful because a number of internal coercion operations use a supplied toString method.
Unfortunately, this literal syntax only works if we are willing to make the new object’s prototype Object.prototype. This eliminates the benefits we saw earlier of sharing properties using the prototype. In many cases, the convenience of the simple object literal outweighs this loss. In other cases, you will want a simple way to create a new object with a particular prototype. I’ll explain it right afterward:
var fromPrototype = function(prototype, object) { var newObject = Object.create(prototype); for (var prop in object) { if (object.hasOwnProperty(prop)) { newObject[prop] = object[prop]; } } return newObject; }; var person = { toString: function() { return this.firstName + ' ' + this.lastName; } }; var man = fromPrototype(person, { sex: "male" }); var jeremy = fromPrototype(man, { firstName: "Jeremy", lastName: "Ashkenas" }); jeremy.sex // "male" jeremy.toString() // "Jeremy Ashkenas"
Let’s deconstruct the fromPrototype method. The goal of this method is to create a new object with a set of properties, but with a particular prototype. First, we will use Object.create() to create a new empty object, and assign the prototype we specify. Next, we will enumerate all of the properties in the object that we supplied, and copy them over to the new object.
Remember that when you create an object literal, like the ones we are passing in to fromPrototype, it will always have Object.prototype as its prototype. By default, the properties that JavaScript includes on Object.prototype are not enumerable, so we don’t have to worry about valueOf showing up in our loop. However, since Object.prototype is an object like any other object, anyone can define a new property on it, which may (and probably would) be marked enumerable.
As a result, while we are looping through the properties on the object we passed in, we want to restrict our copying to properties that were defined on the object itself, and not found on the prototype. JavaScript includes a method called hasOwnProperty on Object.prototype to check whether a property was defined on the object itself. Since object literals will always have Object.prototype as their prototype, you can use it in this manner.
The object we created in the example above looks like this:

Native Object Orientation
At this point, it should be obvious that prototypes can be used to inherit functionality, much like traditional object oriented languages. To facilitate using it in this manner, JavaScript provides a new operator.
In order to facilitate object oriented programming, JavaScript allows you to use a Function object as a combination of a prototype to use for the new object and a constructor function to invoke:
var Person = function(firstName, lastName) { this.firstName = firstName; this.lastName = lastName; } Person.prototype = { toString: function() { return this.firstName + ' ' + this.lastName; } }
Here, we have a single Function object that is both a constructor function and an object to use as the prototype of new objects. Let’s implement a function that will create new instances from this Person object:
function newObject(func) { // get an Array of all the arguments except the first one var args = Array.prototype.slice.call(arguments, 1); // create a new object with its prototype assigned to func.prototype var object = Object.create(func.prototype); // invoke the constructor, passing the new object as 'this' // and the rest of the arguments as the arguments func.apply(object, args); // return the new object return object; } var brendan = newObject(Person, "Brendan", "Eich"); brendan.toString() // "Brendan Eich"
The new operator in JavaScript essentially does this work, providing a syntax familiar to those comfortable with traditional object oriented languages:
var mark = new Person("Mark", "Miller"); mark.toString() // "Mark Miller"
In essence, a JavaScript “class” is just a Function object that serves as a constructor plus an attached prototype object. I mentioned before that earlier versions of JavaScript did not have Object.create. Since it is so useful, people often created something like it using the new operator:
var createObject = function (o) { // we only want the prototype part of the `new` // behavior, so make an empty constructor function F() {} // set the function's `prototype` property to the // object that we want the new object's prototype // to be. F.prototype = o; // use the `new` operator. We will get a new // object whose prototype is o, and we will // invoke the empty function, which does nothing. return new F(); };
I really love that ECMAScript 5 and newer versions have begun to expose internal aspects of the implementation, like allowing you to directly define non-enumerable properties or define objects directly using the prototype chain.
Understanding JavaScript Function Invocation and “this”
August 11th, 2011
Over the years, I’ve seen a lot of confusion about JavaScript function invocation. In particular, a lot of people have complained that the semantics of `this` in function invocations is confusing.
In my opinion, a lot of this confusion is cleared up by understanding the core function invocation primitive, and then looking at all other ways of invoking a function as sugar on top of that primitive. In fact, this is exactly how the ECMAScript spec thinks about it. In some areas, this post is a simplification of the spec, but the basic idea is the same.
The Core Primitive
First, let’s look at the core function invocation primitive, a Function’s call method[1]. The call method is relatively straight forward.
- Make an argument list (
argList) out of parameters 1 through the end - The first parameter is
thisValue - Invoke the function with
thisset tothisValueand theargListas its argument list
For example:
function hello(thing) { console.log(this + " says hello " + thing); } hello.call("Yehuda", "world") //=> Yehuda says hello world
As you can see, we invoked the hello method with this set to "Yehuda" and a single argument "world". This is the core primitive of JavaScript function invocation. You can think of all other function calls as desugaring to this primitive. (to “desugar” is to take a convenient syntax and describe it in terms of a more basic core primitive).
[1] In the ES5 spec, the call method is described in terms of another, more low level primitive, but it’s a very thin wrapper on top of that primitive, so I’m simplifying a bit here. See the end of this post for more information.
Simple Function Invocation
Obviously, invoking functions with call all the time would be pretty annoying. JavaScript allows us to invoke functions directly using the parens syntax (hello("world"). When we do that, the invocation desugars:
function hello(thing) { console.log("Hello " + thing); } // this: hello("world") // desugars to: hello.call(window, "world");
This behavior has changed in ECMAScript 5 only when using strict mode[2]:
// this: hello("world") // desugars to: hello.call(undefined, "world");
The short version is: a function invocation like fn(...args) is the same as fn.call(window [ES5-strict: undefined], ...args).
Note that this is also true about functions declared inline: (function() {})() is the same as (function() {}).call(window [ES5-strict: undefined).
[2] Actually, I lied a bit. The ECMAScript 5 spec says that undefined is (almost) always passed, but that the function being called should change its thisValue to the global object when not in strict mode. This allows strict mode callers to avoid breaking existing non-strict-mode libraries.
Member Functions
The next very common way to invoke a method is as a member of an object (person.hello()). In this case, the invocation desugars:
var person = { name: "Brendan Eich", hello: function(thing) { console.log(this + " says hello " + thing); } } // this: person.hello("world") // desugars to this: person.hello.call(person, "world");
Note that it doesn't matter how the hello method becomes attached to the object in this form. Remember that we previously defined hello as a standalone function. Let's see what happens if we attach is to the object dynamically:
function hello(thing) { console.log(this + " says hello " + thing); } person = { name: "Brendan Eich" } person.hello = hello; person.hello("world") // still desugars to person.hello.call(person, "world") hello("world") // "[object DOMWindow]world"
Notice that the function doesn't have a persistent notion of its 'this'. It is always set at call time based upon the way it was invoked by its caller.
Using Function.prototype.bind
Because it can sometimes be convenient to have a reference to a function with a persistent this value, people have historically used a simple closure trick to convert a function into one with an unchanging this:
var person = { name: "Brendan Eich", hello: function(thing) { console.log(this.name + " says hello " + thing); } } var boundHello = function(thing) { return person.hello.call(person, thing); } boundHello("world");
Even though our boundHello call still desugars to boundHello.call(window, "world"), we turn right around and use our primitive call method to change the this value back to what we want it to be.
We can make this trick general-purpose with a few tweaks:
var bind = function(func, thisValue) { return function() { return func.apply(thisValue, arguments); } } var boundHello = bind(person.hello, person); boundHello("world") // "Brendan Eich says hello world"
In order to understand this, you just need two more pieces of information. First, arguments is an Array-like object that represents all of the arguments passed into a function. Second, the apply method works exactly like the call primitive, except that it takes an Array-like object instead of listing the arguments out one at a time.
Our bind method simply returns a new function. When it is invoked, our new function simply invokes the original function that was passed in, setting the original value as this. It also passes through the arguments.
Because this was a somewhat common idiom, ES5 introduced a new method bind on all Function objects that implements this behavior:
var boundHello = person.hello.bind(person); boundHello("world") // "Brendan Eich says hello world"
This is most useful when you need a raw function to pass as a callback:
var person = { name: "Alex Russell", hello: function() { console.log(this.name + " says hello world"); } } $("#some-div").click(person.hello.bind(person)); // when the div is clicked, "Alex Russell says hello world" is printed
This is, of course, somewhat clunky, and TC39 (the committee that works on the next version(s) of ECMAScript) continues to work on a more elegant, still-backwards-compatible solution.
On jQuery
Because jQuery makes such heavy use of anonymous callback functions, it uses the call method internally to set the this value of those callbacks to a more useful value. For instance, instead of receiving window as this in all event handlers (as you would without special intervention), jQuery invokes call on the callback with the element that set up the event handler as its first parameter.
This is extremely useful, because the default value of this in anonymous callbacks is not particularly useful, but it can give beginners to JavaScript the impression that this is, in general a strange, often mutated concept that is hard to reason about.
If you understand the basic rules for converting a sugary function call into a desugared func.call(thisValue, ...args), you should be able to navigate the not so treacherous waters of the JavaScript this value.

PS: I Cheated
In several places, I simplified the reality a bit from the exact wording of the specification. Probably the most important cheat is the way I called func.call a "primitive". In reality, the spec has a primitive (internally referred to as [[Call]]) that both func.call and [obj.]func() use.
However, take a look at the definition of func.call:
- If IsCallable(func) is false, then throw a TypeError exception.
- Let argList be an empty List.
- If this method was called with more than one argument then in left to right order starting with arg1 append each argument as the last element of argList
- Return the result of calling the [[Call]] internal method of func, providing thisArg as the this value and argList as the list of arguments.
As you can see, this definition is essentially a very simple JavaScript language binding to the primitive [[Call]] operation.
If you look at the definition of invoking a function, the first seven steps set up thisValue and argList, and the last step is: "Return the result of calling the [[Call]] internal method on func, providing thisValue as the this value and providing the list argList as the argument values."
It's essentially identical wording, once the argList and thisValue have been determined.
I cheated a bit in calling call a primitive, but the meaning is essentially the same as had I pulled out the spec at the beginning of this article and quoted chapter and verse.
There are also some additional cases (most notably involving with) that I didn't cover here.
What’s Up With All These Changes in Rails?
June 14th, 2011
Yesterday, there was a blog post entitled “What the Hell is Happening to Rails” that stayed at the number one spot on Hacker News for quite a while. The post and many (but not most) the comments on the post reflect deep-seated concern about the recent direction of Rails. Others have addressed the core question about change in the framework, but I’d like to address questions about specific changes that came up in the post and comments.
The intent of this post is not to nitpick the specific arguments that were made, or to address the larger question of how much change is appropriate in the framework, but rather to provide some background on some of the changes that have been made since Rails 2.3 and to explain why we made them.
Block Helpers
I too get a feeling of “change for the sake of change” from Rails at times. That’s obviously not something they’re doing, as all the changes have some motivation, but at times it feels a bit like churn.
At one point in time, you did forms like <%= form …. and then they switched to <% form …. do and now they’ve switched back to <%= form … do again.
Also, the upgrade to Rails 3 is not an easy one. Yeah, you get some nice stuff, but because it’s so painful, it’s not happening for a lot of people, which is causing more problems.
Prior to Rails 3.0, Rails never used <= form_for because it was technically very difficult to make it work. I wrote a post about it in Summer 2009 that walked through the technical problems. The short version is that every ERB template is compiled into a Ruby method, and reliably converting <%= with blocks proved to be extremely complicated.
However, knowing when to use <% and when to use <= caused major issues for new developers, and it was a major 3.0 priority to make this work. In addition, because of the backwards compatibility issue, we went to extreme Ruby-exploiting lengths to enable deprecation warnings about the change, so that we could fix the issue for new apps, but also not break old apps.
The night José and I figured out how to do this (the Engine Yard party at Mountain West RubyConf 2009), we were pretty close to coming to the conclusion that it couldn’t be done short of using a full language parser in ERB, which should give you some sense of how vexing a problem it was for us.
Performance
The general disregard for improving performance, inherited from Ruby, is also something endemic from a while back.
Yes, there have been some performance regressions in Rails 3.0. However, the idea that the Rails core team doesn’t care about performance, and neither does the Ruby team doesn’t pass the smell test.
Aaron Patterson, the newest core team member, worked full time for almost a year to get the new ActiveRecord backend into decent performance shape. Totally new code often comes with some performance regressions, and the changes to ActiveRecord were important and a long-time coming. Many of us didn’t see the magnitude of the initial problem until it was too late for 3.0, but we take the problem extremely seriously.
Ruby core (“MRI”) itself has sunk an enormous amount of time into performance improvements in Ruby 1.9, going so far as to completely rewrite the core VM from scratch. This resulted in significant performance improvements, and the Ruby team continues to work on improvements to performance and to core systems like the garbage collector.
The Ruby C API poses some long-term problems for the upper-bound of Ruby performance improvements, but the JRuby and Rubinius projects are showing how you can use Ruby C extensions inside a state-of-the-art virtual machine. Indeed, the JRuby and Rubinius projects show that the Ruby community both cares about, and is willing to invest significant energy into improving the performance of Ruby.
Heroku and the Asset Pipeline
The assets pipeline feels horrible, it’s really slow. I upgraded to Rails 3.1rc, realized it fights with Heroku unless upgrading to the Cedar stack
The problem with Heroku is that the default Gemfile that comes with Rails 3.1 currently requires a JavaScript runtime to boot, even in production. This is clearly wrong and will be fixed posthaste. Requiring Rails apps to have node or a compiled v8 in production is an unacceptable burden.
On the flip side, the execjs gem, which manages compiling CoffeeScript (and more importantly minifying your JavaScript), is actually a pretty smart piece of work. It turns out that both Mac OSX and Windows ship with usable JavaScript binaries, so in development, most Rails users will already have a JavaScript library ready to use.
It’s worth noting that a JavaScript engine is needed to run “uglify.js”, the most popular and most effective JavaScript minifier. It is best practice to minify your JavaScript before deployment, so you can feel free to format and comment your code as you like without effecting the payload. You can learn more about minification in this excellent post by Steve Souders from a few years back. Rails adding minification by default is an unambiguous improvement in the workflow of Rails applications, because it makes it easy (almost invisible) to do something that everyone should be doing, but which has previously been something of a pain.
Again, making node a dependency in production is clearly wrong, and will be removed before the final release.
Change for Change’s Sake
The problem with Rails is not the pace of change so much as the wild changes of direction it takes, sometimes introducing serious performance degradations into official releases. Sometimes it’s hard to see a guiding core philosophy other than the fact they want to be on the shiny edge
When Rails shipped, it came with a number of defaults:
- ActiveRecord
- Test::Unit
- ERB
- Prototype
Since 2004, alternatives to all of those options have been created, and in many cases (Rspec, jQuery, Haml) have become somewhat popular. Rails 3.0 made no changes to the defaults, despite much clamoring for a change from Prototype to jQuery. Rails 3.1 changed to jQuery only when it became overwhelmingly clear that jQuery was the de facto standard on the web.
As someone who has sat in on many discussions about changing defaults, I can tell you that defaults in Rails are not changed lightly, and certainly not “to be on the shiny edge.” In fact, I think that Rails is more conservative than most would expect in changing defaults.
Right, but the problem is that there doesn’t seem to be a ‘right’ way, That’s the problem.
We were all prototype a few years ago, now it jquery … we (well I) hadn’t heard of coffeescript till a few months ago and now its a default option in rails, The way we were constructing ActiveRecord finders had been set all through Rails 2, now we’ve changed it, the way we dealt with gems was set all through rails 2 now its changed completely in Rails 3.
I like change, I like staying on the cutting edge of web technologies, but I don’t want to learn something, only to discard it and re-do it completely to bring it up to date with a new way of doing things all the time.
First, jQuery has become the de facto standard on the web. As I said earlier, Rails resisted making this change in 3.0, despite a lot of popular demand, and the change to jQuery is actually an example of the stability of Rails’ default choices over time, rather than the opposite.
Changes to gem handling evolved as the Rails community evolved to use gems more. In Rails 1, all extensions were implemented as plugins that got pulled out of svn and dumped into your project. This didn’t allow for versioning or dependencies, so Rails 2 introduced first-class support for Rails plugins as gems.
During the Rails 2 series, Rails added a dependency on Rack, which caused serious problems when Rails was used with other gems that rely on Rack, due to the way that raw Rubygems handles dependency activation. Because Rails 3 uses gem dependencies more extensively, we spent a year building bundler, which adds per-application dependency resolution to Rubygems. This was simply the natural evolution of the way that Rails has used dependencies over time.
The addition of CoffeeScript is interesting, because it’s a pretty young technology, but it’s also not really different from shipping a new template handler. When you create a new Rails app, the JavaScript file is a regular JS file, and the asset compilation and dependency support does not require CoffeeScript. Shipping CoffeeScript is basically like shipping Builder: should you want it, it’s there for you. Since we want to support minification out of the box anyway, CoffeeScript doesn’t add any new requirements. And since it’s just there in the default Gemfile, as opposed to included in Rails proper like Builder, turning it off (if you really want to) is as simple as removing a line in your Gemfile. Nothing scary here.
ActiveRecord Changes
The way we were constructing ActiveRecord finders had been set all through Rails 2, now we’ve changed it
…
The Rails core team does seem to treat the project as if it’s a personal playground
One of the biggest problems with ActiveRecord was the way it internally used Strings to represent queries. This meant that changes to queries often required gsubing String to make simple changes. Internally, it was a mess, and it expressed itself in the public API in how conditions were generated, and more importantly how named scopes were created.
One goal of the improvements in Rails 3 was to get rid of the ad-hoc query generation in Rails 2 and replace it with something better. ActiveRelation, this library, was literally multiple years in the making, and a large amount of the energy in the Rails 3.0 process was spent on integrating ActiveRelation.
From a user-facing perspective, we wanted to unify all of the different ways that queries were made. This means that scopes, class methods, and one-time-use queries all use the same API. As in the <= case, a significant amount of effort was spent on backwards compatibility with Rails 2.3. In fact, we decided to hold onto support for the old API as least as long as Rails 3.2, in order to soften the community transition to the new API.
In general, we were quite careful about backwards compatibility in Rails 3.0, and while a project as complex as Rails was not going to be perfect in this regard, characterizing the way we handled this as “irresponsible” or “playground” disregards the tremendous amount of work and gymnastics that the core team and community contributors put into supporting Rails 2.3 APIs across the entire codebase when changes were made.
