
Tokaido Status Update: Implementation Details

Hey guys!

Since my last update, Tokaido has been fully funded, and I've been hard at work planning, researching and building it.

So far, we have a working binary build of Ruby, but no setup chrome yet. Because the binary build already exists, Terence Lee was able to experiment with it at a recent Rails Girls event, with great success.

Many thanks to Terence for putting together a simple installer script that we could use to test whether the core build worked on a wide variety of OSX systems.

One thing that I mentioned in my original proposal was a desire to work closely with others working on related projects. Very soon after my project was announced, I teamed up with Michal Papis of the rvm team to make the core statically compiled distribution something that would work outside of the GUI part of Tokaido.

We decided to use the sm scripting framework to build Tokaido, to make it easy to share code between rvm2, Tokaido, and the Unix Rails Installer. The majority of the work I have done so far has been in researching how to properly build a portable Ruby, and working with Michal to build the solution in terms of the sm framework. The rest of this blog post discusses the results of that research, for those interested.

The discussion in this blog post is specific to Mac OSX.

Portable Build

The hardest technical part of the project is creating a portable binary build of Ruby that can be moved around to various machines. What do I mean by that?

When you compile Ruby using the normal ./configure && make, the resulting binary is not portable between machines for a number of reasons.

Hard-Coded Paths

By default, the compiled Ruby comes with a binary .bundle file for each compiled part of the standard library. For example, Aaron Patterson wrote the psych library as a C library. When you compile Ruby, you get psych.bundle as part of the distribution. When a Ruby program invokes require "psych", the system's dynamic loader will load in psych.bundle.

By default, Ruby hard-codes the path to the Ruby dynamic library (libruby.1.9.1.dylib) into psych.bundle. Since Psych uses C functions from Ruby, the dynamic loader uses this path to make sure that the required Ruby library is available. We can use a tool called otool to see all of the dependencies of a particular library:

$ otool -L psych.bundle 
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin11.3.0/psych.bundle:
	/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/libruby.1.9.1.dylib (compatibility version 1.9.1, current version 1.9.1)
	/Users/wycats/.rvm/usr/lib/libyaml-0.2.dylib (compatibility version 3.0.0, current version 3.2.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

The second line in the output references libruby.1.9.1.dylib using an absolute path on my local machine. If I take these binaries and give them to you, the dynamic loader won't be able to find libruby and won't load Psych.
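Checking this output by eye gets tedious once a build contains dozens of .bundle files. The sketch below reads otool -L output on stdin and prints only the dependencies that live outside the system directories. The function name nonportable_deps is hypothetical, and treating /usr/lib and /System/Library as "ships with OSX" is a simplifying assumption.

```sh
# nonportable_deps (hypothetical helper): read `otool -L` output on stdin
# and print every dependency outside the OS X system directories.
# Assumption: /usr/lib/ and /System/Library/ ship with the OS; any other
# path (an rvm prefix, /usr/local/lib, and so on) may be missing elsewhere.
#
# Pipeline steps:
#   1. keep only dependency lines (they start with a tab; the header line
#      naming the inspected file does not)
#   2. take the first field, the library path (paths with spaces are not
#      handled by this sketch)
#   3. drop paths under the system directories
nonportable_deps() {
  grep '^[[:space:]]' |
    awk '{ print $1 }' |
    grep -v -e '^/usr/lib/' -e '^/System/Library/'
}
```

For the psych.bundle output above, otool -L psych.bundle | nonportable_deps would print just the libruby and libyaml paths.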

In addition to the problem with the compiled .bundle files, the build also hard-codes the path to libruby in the Ruby binary itself:

$ otool -L `which ruby`
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/bin/ruby:
	/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/libruby.1.9.1.dylib (compatibility version 1.9.1, current version 1.9.1)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

Finally, the location of the standard library is hard-coded into libruby, which the Ruby binary loads:

$ strings /Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/libruby.1.9.1.dylib | grep rvm
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/x86_64-darwin11.3.0
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/vendor_ruby/1.9.1
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/vendor_ruby/1.9.1/x86_64-darwin11.3.0
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/vendor_ruby
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1
/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin11.3.0
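The strings | grep check above can be turned into a pass/fail test to run against a finished build. A minimal sketch, assuming only grep's POSIX -q and -F options; the function name embeds_prefix is hypothetical.

```sh
# embeds_prefix (hypothetical helper): succeed if FILE still contains the
# absolute PREFIX string, i.e. the build would break when moved.
# Usage: embeds_prefix FILE PREFIX
embeds_prefix() {
  # -F matches the prefix literally; -q only sets the exit status.
  # `--` protects against a prefix that begins with a dash.
  grep -q -F -- "$2" "$1"
}
```

For example, embeds_prefix build/bin/ruby "$HOME/.rvm" && echo "still hard-coded" flags a binary that remembers where it was built.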

Fortunately, the C Ruby folks know about this problem, and include an (undocumented?) flag that you can pass to ./configure, --enable-load-relative. This flag fixes the problems with hardcoded paths:

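Instead of creating a separate libruby.1.9.1.dylib that the ruby executable links to, this flag includes the compiled libruby code inside the ruby executable. It also makes Ruby resolve its load paths at run time, relative to the location of the executable, instead of using absolute paths baked in at build time.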

$ ./configure --enable-load-relative --prefix=/Users/wycats/Code/ruby/build
... snip ...
$ make && make install
... snip ...
$ otool -L build/bin/ruby                                                  
build/bin/ruby:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

You can see that Ruby still links against a few system dynamic libraries. These dynamic libraries are extremely stable in OSX, and aren't a problem for binary distributions.
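To make the idea of relative load paths concrete, here is a shell sketch of the lookup that --enable-load-relative enables: the library directory is derived from the executable's own location at run time. Ruby does this in C; the function below (relative_stdlib_dir, a hypothetical name) only mimics the idea, using the lib/ruby/1.9.1 layout shown throughout this post.

```sh
# relative_stdlib_dir (hypothetical helper): derive the standard library
# directory from where the executable lives, not from a build-time prefix.
# Usage: relative_stdlib_dir PATH_TO_RUBY
relative_stdlib_dir() {
  # Absolute, symlink-free path of the directory holding the executable (bin/).
  bindir=$(cd "$(dirname "$1")" && pwd -P) || return 1
  # Strip the trailing /bin component and append the library layout.
  printf '%s\n' "${bindir%/*}/lib/ruby/1.9.1"
}
```

If you move the whole build directory, the result follows it, which is exactly the property a portable build needs.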

To still allow native extensions to be compiled, this build of Ruby ships a static archive (.a) instead of a dylib. As we will see later, the OSX linker knows how to handle this automatically.

This flag also affects psych.bundle:

$ otool -L build/lib/ruby/1.9.1/x86_64-darwin11.4.0/psych.bundle                      
build/lib/ruby/1.9.1/x86_64-darwin11.4.0/psych.bundle:
	/usr/local/lib/libyaml-0.2.dylib (compatibility version 3.0.0, current version 3.2.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

External Dependencies

In addition to the general problem of hardcoded paths, there's another issue lurking in the above otool output for Psych. After eliminating the hardcoded path to a local Ruby, we are still left with a link to /usr/local/lib/libyaml-0.2.dylib. Unfortunately, libyaml doesn't come with OSX, so if I take this distribution of Ruby and hand it off to a fresh system, Psych will fail to find libyaml at runtime and fail to load.

A number of the .bundle files that Ruby ships with have similar external dependencies. In general, these dependencies ship with OSX, but some, like openssl, may not last more than another release or two. In addition, the specific versions of these dependencies shipped with OSX may change over time, possibly resulting in different behavior on different systems.

In general, we can eliminate these problems by building the libraries we need directly into the .bundle files, instead of asking the operating system's dynamic loader to find them at runtime.

The OSX linker's (ld) behavior in this respect is interesting:

  • The linker starts with a list of directories to search for libraries.
  • When linking a program, it may need a particular dependency (psych needs libyaml).
  • It searches those directories, in order, for that library. Either libyaml.dylib or libyaml.a will suffice.
  • If the linker finds a .dylib first, it will dynamically link that dependency.
  • If the linker finds a .a first, it will statically link that dependency. By statically linking, we mean that it simply copies the library's binary code into the output file.
  • If a single directory contains both a .a and a .dylib, the linker picks the .dylib and dynamically links the dependency.
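The search order in the list above can be modeled in a few lines of shell. This is a simplified sketch for illustration, not how ld is implemented: it ignores .tbd stubs, frameworks and linker options, and find_library is a hypothetical name.

```sh
# find_library (hypothetical model of the search described above).
# Usage: find_library NAME DIR...
# Prints "dynamic PATH" or "static PATH" for the file -lNAME would pick,
# or fails if no directory contains the library.
find_library() {
  name=$1
  shift
  for dir in "$@"; do
    # Within one directory, a .dylib wins over a .a.
    if [ -e "$dir/lib$name.dylib" ]; then
      printf 'dynamic %s\n' "$dir/lib$name.dylib"
      return 0
    fi
    if [ -e "$dir/lib$name.a" ]; then
      printf 'static %s\n' "$dir/lib$name.a"
      return 0
    fi
  done
  return 1
}
```

With a deps directory holding only libyaml.a placed ahead of /usr/local/lib, find_library yaml deps /usr/local/lib reports the static archive; with the order reversed, the .dylib wins. That is why the directory containing the .a has to come first.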

In this case, our goal is to get the linker to statically link libyaml. In order to do this, we will need to build a libyaml.a and get the directory containing that file to the front of the linker's path.

In the case of libyaml, getting a .a looks like this:

$ wget http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
... snip ...
$ tar -xzf yaml-0.1.4.tar.gz
$ cd yaml-0.1.4 
$ ./configure --disable-shared
... snip ...
$ make
... snip ...
$ otool -L src/.libs/libyaml.a
Archive : src/.libs/libyaml.a
src/.libs/libyaml.a(api.o):
src/.libs/libyaml.a(reader.o):
src/.libs/libyaml.a(scanner.o):
src/.libs/libyaml.a(parser.o):
src/.libs/libyaml.a(loader.o):
src/.libs/libyaml.a(writer.o):
src/.libs/libyaml.a(emitter.o):
src/.libs/libyaml.a(dumper.o):

We now have a libyaml.a. Note that the configure flag for getting a .a for a given library is not particularly standardized. Three popular ones: --static, --enable-static, --disable-shared.
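Because the flag name varies between projects, it is worth checking that a build really produced an archive. Static archives begin with the magic line !<arch>, which is how otool recognizes the "Archive" above. A minimal check, with a hypothetical function name:

```sh
# is_static_archive (hypothetical helper): succeed if FILE is an ar
# archive, i.e. its first line is the magic string "!<arch>".
is_static_archive() {
  [ -f "$1" ] || return 1
  # read fails if the file has no complete first line.
  IFS= read -r magic < "$1" || return 1
  [ "$magic" = '!<arch>' ]
}
```

For example, is_static_archive src/.libs/libyaml.a && echo static confirms the libyaml build above.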

Next, we need to move libyaml.a into a directory together with any other .a files we want to use, and pass that directory to the build through LDFLAGS:

$ LDFLAGS="-L/Users/wycats/Code/ruby/deps" ./configure --enable-load-relative --prefix=/Users/wycats/Code/ruby/build
... snip ...
$ make && make install
... snip ...
$ otool -L build/lib/ruby/1.9.1/x86_64-darwin11.4.0/psych.bundle
build/lib/ruby/1.9.1/x86_64-darwin11.4.0/psych.bundle:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

And voila! We now have a psych.bundle that does not depend on libyaml.dylib. Instead, libyaml is now included in psych.bundle itself. This moves us a step closer to having a portable Ruby build.
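Since this step has to be repeated for every library with an external dependency, it can be wrapped in a small helper. This sketch (stage_static_libs is a hypothetical name) copies the given .a files into one directory and prints the matching -L flag for LDFLAGS.

```sh
# stage_static_libs (hypothetical helper).
# Usage: stage_static_libs DEPS_DIR ARCHIVE.a...
# Copies each archive into DEPS_DIR and prints "-LDEPS_DIR" for LDFLAGS.
stage_static_libs() {
  deps_dir=$1
  shift
  mkdir -p "$deps_dir" || return 1
  for archive in "$@"; do
    case $archive in
      *.a) cp "$archive" "$deps_dir/" || return 1 ;;
      *) printf 'skipping non-archive %s\n' "$archive" >&2 ;;
    esac
  done
  printf '%s\n' "-L$deps_dir"
}
```

Used with the configure command above, this might look like:

$ LDFLAGS="$(stage_static_libs "$PWD/deps" yaml-0.1.4/src/.libs/libyaml.a)" ./configure --enable-load-relative --prefix=/Users/wycats/Code/ruby/build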

We will want to repeat this process for every part of the Ruby standard library with external dependencies (openssl, readline, and zlib are some others). Even though libyaml is the only library that does not ship with OSX, eliminating external dependencies on the operating system insulates our build from changes that Apple makes in the future. Of the dependencies, OpenSSL is the most problematic, as it has already been deprecated in Lion.

The sm Framework

This is where the sm (scripting management) framework comes into play. The goal of the sm framework is to encapsulate solutions to these concerns into reusable libraries. In particular, it abstracts the idea of downloading and compiling a package, and common requirements, like static compilation.

For example, let's take a look at the libyaml library.

The first important file here is config/defaults:

version=0.1.4
base_url=http://pyyaml.org/download/libyaml
configure_flag_static=--disable-shared

This specifies the current version, the URL to download the tarball from, and importantly for us, the configure flag that libyaml expects in order to build a .a file. We added that third line because we needed it for Tokaido. This satisfies one of the major goals of the project: to get as much of the code as possible into shared code instead of code that is specific to Tokaido.
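As an illustration of how a key=value defaults file like this can drive a download, the sketch below sources it in a subshell and prints the tarball URL. It is hypothetical and not sm's actual loader; because libyaml's tarball is named yaml-*, the file prefix is passed in rather than derived from the package name.

```sh
# tarball_url (hypothetical helper, not sm's loader).
# Usage: tarball_url DEFAULTS_FILE FILE_PREFIX
tarball_url() {
  defaults_file=$1
  file_prefix=$2
  # `.` searches $PATH for names without a slash, so force a path.
  case $defaults_file in
    */*) ;;
    *) defaults_file=./$defaults_file ;;
  esac
  (
    # Source in a subshell so version, base_url, etc. don't leak out.
    . "$defaults_file" || exit 1
    printf '%s/%s-%s.tar.gz\n' "$base_url" "$file_prefix" "$version"
  )
}
```

Run against the config/defaults shown above with the prefix yaml, it prints the same URL as the earlier wget command.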

The other important file in the libyaml library is shell/functions:

#!/bin/sh

libyaml_prefetch()
{
  package define \
    file "yaml-${package_version}.${archive_format}" \
    dir "yaml-${package_version}"
}

libyaml_preconfigure()
{
  os is darwin || autoreconf -is --force > autoreconf.log 2>&1 ||
    __sm.package.error "Autoreconf of ${package_name} ${package_version} failed! " "$PWD/autoreconf.log"
}

The sm framework defines a series of steps that a package install goes through:

# preinstall
#
#   prefetch
# fetch
#   postfetch
#   preextract
# extract
#   prepatch
# patch
#   preconfigure
# configure
#   postconfigure
#   prebuild
# build
#   preinstall
# install
#   preactivate
# activate
#   postactivate
#
# postinstall

The indented steps above are hooks that a package may define. Steps like fetch and configure are implemented by sm itself.
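To show how optional hooks like these can be wired up, here is a hypothetical dispatcher: for each hook it calls <package>_<hook> only if the package defined a function with that name. This is a sketch of the naming convention, not sm's implementation, and sm's own steps (fetch, extract, configure and so on) are left out.

```sh
# run_hook (hypothetical): $1 is the package name, $2 the hook name,
# e.g. "libyaml" + "prefetch" calls libyaml_prefetch if it exists.
run_hook() {
  if command -v "${1}_${2}" >/dev/null 2>&1; then
    "${1}_${2}"
  fi
}

# run_hooks (hypothetical): try every hook from the list above, in order.
run_hooks() {
  for hook in prefetch postfetch preextract prepatch preconfigure \
              postconfigure prebuild preinstall preactivate postactivate; do
    run_hook "$1" "$hook"
  done
}
```

With the libyaml functions shown earlier loaded, run_hooks libyaml would call only libyaml_prefetch and libyaml_preconfigure and skip the rest.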

In our case, the libyaml library defines two of those steps: prefetch and preconfigure. The prefetch hook runs before sm's fetch step and lets the package provide extra information to it; in particular, it can override package_file, whose default is ${package_file:="${package_name}-${package_version}.${archive_format}"}. We need this because, even though the package name is libyaml, the file we want to download is yaml-0.1.4.tar.gz. The preconfigure hook runs autoreconf on every OS except darwin and reports an error pointing at autoreconf.log if it fails.
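The := expansion quoted above is what makes the override work: it assigns the default only when package_file is unset or empty. A small sketch isolating it (default_package_file is a hypothetical name; the variable names are the ones quoted above):

```sh
# default_package_file (hypothetical): same expansion as sm's default.
# If package_file is already set (e.g. by a prefetch hook), keep it;
# otherwise build it from package_name, package_version and archive_format.
default_package_file() {
  : "${package_file:=${package_name}-${package_version}.${archive_format}}"
  printf '%s\n' "$package_file"
}
```

With package_name=libyaml, package_version=0.1.4 and archive_format=tar.gz it yields libyaml-0.1.4.tar.gz; if package_file is already yaml-0.1.4.tar.gz, that value is kept.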

The openssl library is somewhat more complicated. As with libyaml, we needed to teach sm how to install openssl statically. You can check out the commit to see how easy that was.

The great thing about getting this stuff into sm is that there is now a shared resource to answer questions like "how do you statically build openssl?". The work I did with Michal to improve the information for the libraries that Ruby depends on can now be used by anyone else trying to build Ruby (or anything else with those dependencies, for that matter).

Tokaido is an sm library

Tokaido itself is an sm library! This means that the core Tokaido build will work on Linux, so it can be used to create a standalone distribution of Ruby for Linux, and maybe even the core of a Tokaido for Linux!

The Tokaido package implements a lot of the sm hooks, so check it out to learn more about what you can do using sm's package API.

In my next post, I'll talk about the architecture of the Tokaido UI component.