Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; he spends his daytime hours at the startup he founded, Tilde Inc. Yehuda is co-author of the best-selling jQuery in Action and Rails 3 in Action. He spends most of his time hacking on open source (his main projects include Thor, Handlebars and Janus) or traveling the world doing evangelism work. He can be found on Twitter as @wycats and on GitHub.
New Hope for The Ruby Specification
September 5th, 2011
For a few years, a group of Japanese academics have been working on formalizing the Ruby programming language into a specification they hoped would be accepted by ISO. From time to time, I have read through it, and I had one major concern.
Because Ruby 1.9 was still in a lot of flux when they were drafting the specification, the authors left a lot of details out. Unfortunately, some of these details are extremely important. Here’s one example, from the specification of String#[]:
Behavior:
a) If the length of args is 0 or larger than 2, raise a direct instance of the class ArgumentError.
b) Let P be the first element of args. Let n be the length of the receiver.
c) If P is an instance of the class Integer, let b be the value of P.
   1) If the length of args is 1:
      i) If b is smaller than 0, increment b by n. If b is still smaller than 0, return nil.
      ii) If b >= n, return nil.
      iii) Create an instance of the class Object which represents the bth character of the receiver and return this instance.
The important bit here is c(1)(iii), which says to create “an instance of the class Object which represents the bth character of the receiver”. The reason for this ambiguity, as best as I can determine, is that Ruby 1.8 and Ruby 1.9 differ on the behavior:
1.8 >> "hello"[1]
=> 101

1.9 >> "hello"[1]
=> "e"
Of course, neither of these results in a direct instance of the class Object, but since Fixnums and Strings are both “instances of the class Object”, this is technically true. Unfortunately, any real-life Ruby code will need to know what actual object this method will return.
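To make the difference concrete, here is a small sketch of how single-integer indexing behaves on Ruby 1.9 and later, alongside one way to recover the integer that 1.8 returned. The helper name char_code_at is hypothetical and exists only for illustration; it is not part of the draft specification or of Ruby itself.

```ruby
# Hypothetical helper: returns the integer code of the character at
# `index`, roughly what Ruby 1.8 returned from String#[] directly.
# Returns nil when the index is out of range, mirroring String#[].
def char_code_at(str, index)
  char = str[index]   # a one-character String on Ruby 1.9+, or nil
  char && char.ord
end

puts "hello"[1].inspect          # "e"
puts char_code_at("hello", 1)    # 101
puts "hello"[-1].inspect         # "o"  (negative index counts from the end)
puts "hello"[5].inspect          # nil  (b >= n)
puts "hello"[-6].inspect         # nil  (still negative after adding n)
```

Code that has to produce the same answer regardless of which behavior an implementation picks cannot rely on the specification as drafted; it has to ask for the String or the integer explicitly.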
Another very common reason for unspecified behaviors is a failure to specify Ruby’s coercion protocol, so String#+ is unspecified if the other is not a String, even though all Ruby implementations will call to_str on the other to attempt to coerce it. The coercion protocol has been underspecified for a long time, and it’s understandable that the group punted on it, but because it is heavily relied on by real-life code, it is important that we actually describe the behavior.
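As a rough illustration of the coercion that real-life code relies on, the sketch below passes a non-String argument to String#+. The FileName class and the concat_error_class helper are hypothetical names invented for this example; only the to_str hook itself is the behavior under discussion.

```ruby
# Hypothetical class that opts in to implicit String coercion
# by defining to_str.
class FileName
  def initialize(name)
    @name = name
  end

  def to_str
    @name
  end
end

# Hypothetical helper: returns the class of the exception raised by
# "a" + other, or nil if the concatenation succeeds.
def concat_error_class(other)
  "a" + other
  nil
rescue StandardError => e
  e.class
end

puts "dir/" + FileName.new("spec.rb")      # dir/spec.rb
puts concat_error_class(1).inspect          # TypeError (no to_str on Integer)
puts concat_error_class(Object.new).inspect # TypeError (to_s alone is not enough)
```

Note that defining to_s is not sufficient: the implicit conversion goes through to_str, which is exactly the kind of detail a specification would need to pin down.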
This week, I am in Matsue in Japan for RubyWorld, and I was glad to learn that the group working on the ISO specification sees the current work as a first step that will continue with a more rigid specification of currently “unspecified” behavior based on Ruby 1.9.
The word “unspecified” appears 170 times in the current draft of the Ruby specification. I hope that the next version will eliminate most if not all of these unspecified behaviors in favor of explicit behavior or explicitly requiring an exception to be raised. In cases that actually differ between implementations (for instance, Rubinius allows Class to be subclassed), I would hope that these unspecified behaviors become the subject of some discussion at the implementor level.
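A quick way to see which side of this particular difference an implementation falls on is to try it. The probe below is only a sketch, and class_subclassable? is a hypothetical helper name; on MRI the attempt raises a TypeError, while the result on other implementations is precisely the unspecified part.

```ruby
# Hypothetical probe: reports whether this Ruby implementation
# allows Class itself to be subclassed.
def class_subclassable?
  Class.new(Class)   # MRI raises TypeError here
  true
rescue TypeError
  false
end

puts class_subclassable?
```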
In any event, I am thrilled at the news that the Ruby specification will become less ambiguous in the future!