2 min read

New Hope for The Ruby Specification

For a few years, a group of Japanese academics have been working on formalizing the Ruby programming language into a specification they hoped would be accepted by ISO. From time to time, I have read through it, and I had one major concern.

Because Ruby 1.9 was still in a lot of flux when they were drafting the specification, the authors left a lot of details out. Unfortunately, some of these details are extremely important. Here's one example, from the specification of String#[]:

Behavior:

a) If the length of args is 0 or larger than 2, raise a
   direct instance of the class ArgumentError.
b) Let P be the first element of args. Let n be the length
   of the receiver.
c) If P is an instance of the class Integer, let b be the
   value of P.
   1) If the length of args is 1:
      i)   If b is smaller than 0, increment b by n. If b is
           still smaller than 0, return nil.
      ii)  If b >= n, return nil.
      iii) Create an instance of the class Object which
           represents the bth character of
           the receiver and return this instance.

The important bit here is c(1)(iii), which says to create "an instance of the class Object which represents the btw character of the receiver". The reason for this ambiguity, as best as I can determine, is that Ruby 1.8 and Ruby 1.9 differ on the behavior:

1.8 >> "hello"[1]
=> 101

1.9 >> "hello"[1]
=> "e"

Of course, neither of these results in a direct instance of the class Object, but since Fixnums and Strings are both "instances of the class Object", this is technically true. Unfortunately, any real-life Ruby code will need to know what actual object this method will return.

Another very common reason for unspecified behaviors is a failure to specify Ruby's coercion protocol, so String#+ is unspecified if the other is not a String, even though all Ruby implementations will call to_str on the other to attempt to coerce it. The coercion protocol has been underspecified for a long time, and it's understandable that the group punted on it, but because it is heavily relied on by real-life code, it is important that we actually describe the behavior.

This week, I am in Matsue in Japan for RubyWorld, and I was glad to learn that the group working on the ISO specification sees the current work as a first step that will continue with a more rigid specification of currently "unspecified" behavior based on Ruby 1.9.

The word "unspecified" appears 170 times in the current draft of the Ruby specification. I hope that the next version will eliminate most if not all of these unspecified behaviors in favor of explicit behavior or explicitly requiring an exception to be thrown. In cases that actually differ between implementations (for instance, Rubinius allows Class to be subclassed), I would hope that these unspecified behaviors be the subject of some discussion at the implementor level.

In any event, I am thrilled at the news that the Ruby specification will become less ambiguous in the future!