Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; he spends his daytime hours at the startup he founded, Tilde Inc.. Yehuda is co-author of best-selling jQuery in Action and Rails 3 in Action. He spends most of his time hacking on open source—his main projects, like Thor, Handlebars and Janus—or traveling the world doing evangelism work. He can be found on Twitter as @wycats and on Github.

New Hope for The Ruby Specification

For a few years, a group of Japanese academics have been working on formalizing the Ruby programming language into a specification they hoped would be accepted by ISO. From time to time, I have read through it, and I had one major concern.

Because Ruby 1.9 was still in a lot of flux when they were drafting the specification, the authors left a lot of details out. Unfortunately, some of these details are extremely important. Here’s one example, from the specification of String#[]:

Behavior:
 
a) If the length of args is 0 or larger than 2, raise a
   direct instance of the class ArgumentError.
b) Let P be the first element of args. Let n be the length
   of the receiver.
c) If P is an instance of the class Integer, let b be the
   value of P.
   1) If the length of args is 1:
      i)   If b is smaller than 0, increment b by n. If b is
           still smaller than 0, return nil.
      ii)  If b >= n, return nil.
      iii) Create an instance of the class Object which
           represents the bth character of
           the receiver and return this instance.

The important bit here is c(1)(iii), which says to create “an instance of the class Object which represents the btw character of the receiver”. The reason for this ambiguity, as best as I can determine, is that Ruby 1.8 and Ruby 1.9 differ on the behavior:

1.8 >> "hello"[1]
=> 101
 
1.9 >> "hello"[1]
=> "e"

Of course, neither of these results in a direct instance of the class Object, but since Fixnums and Strings are both “instances of the class Object”, this is technically true. Unfortunately, any real-life Ruby code will need to know what actual object this method will return.

Another very common reason for unspecified behaviors is a failure to specify Ruby’s coercion protocol, so String#+ is unspecified if the other is not a String, even though all Ruby implementations will call to_str on the other to attempt to coerce it. The coercion protocol has been underspecified for a long time, and it’s understandable that the group punted on it, but because it is heavily relied on by real-life code, it is important that we actually describe the behavior.

This week, I am in Matsue in Japan for RubyWorld, and I was glad to learn that the group working on the ISO specification sees the current work as a first step that will continue with a more rigid specification of currently “unspecified” behavior based on Ruby 1.9.

The word “unspecified” appears 170 times in the current draft of the Ruby specification. I hope that the next version will eliminate most if not all of these unspecified behaviors in favor of explicit behavior or explicitly requiring an exception to be thrown. In cases that actually differ between implementations (for instance, Rubinius allows Class to be subclassed), I would hope that these unspecified behaviors be the subject of some discussion at the implementor level.

In any event, I am thrilled at the news that the Ruby specification will become less ambiguous in the future!

2 Responses to “New Hope for The Ruby Specification”

what exactly are the benefits of ISO acceptance in this case?
or is having a proper spec the main deal?

Thanks for sharing. I think that the string encoding in Ruby 1.9.3 makes Ruby a lot less flexible as there are many databases out there using Ruby 1.8.6/7 that have all kinds of encoded strings in their tables. Forcing the maintainer of the database to decide on one encoding is a bit of a headache and not very flexible.

Best
Zeno

Leave a Reply

Archives

Categories

Meta