Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at Tilde Inc., the startup he founded. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

New Rails Isolation Testing

A little while ago, Carl and I started digging into Rails’ initializer. We’ve already made a number of improvements, such as adding the ability to insert a new initializer at any step in the process and making it possible to have multiple initializers in a single process. The second improvement is the first step toward running multiple Rails apps in a single process, which requires moving all global Rails state into instances of objects so that each application can have its own contained configuration in its own object. More on this in the next few weeks.

As I detailed on the Engine Yard blog this week, when moving into a new area to refactor, it’s important to make sure that there are very good tests. Although the Rails initializer tests covered a fair amount of ground, getting them to pass did not guarantee that Rails actually booted. Thankfully, Sam Ruby’s tests were comprehensive enough to get us through the initial hump.

After making the initial change, we went back to see what we could do to improve the test suite. The biggest problem was one we’d already encountered in Merb: you can’t uninitialize Rails. Once you’ve run through the initialization process, many of its effects are permanent.

Our solution, which we committed to master today, is a new test mixin that runs each test case in its own process. Getting it working on OS X wasn’t trivial, but it was pretty elegant once we got down to it. All we did was override the run method on TestCase to fork before actually running the test. The child then runs the test (making whatever invasive changes it needs to) and reports back to the parent any methods that were called on the Test::Unit result object.
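
Here is a minimal sketch of the fork-then-report idea, not the actual Rails module. The hypothetical `run_in_fork` helper runs a block in a forked child, lets it mutate global state, and ships the block’s return value back to the parent over an `IO.pipe` using `Marshal`. It assumes a platform where `fork` is available (MRI on Linux or OS X).

```ruby
# Hypothetical helper (not the Rails code): run a block in a forked child
# and return its marshalable result to the parent.
def run_in_fork
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    result = yield                        # invasive work happens only here
    writer.write(Marshal.dump(result))
    writer.close
    exit!(0)                              # skip at_exit hooks (e.g. test autorunners)
  end
  writer.close                            # parent only reads
  data = reader.read                      # blocks until the child closes its end
  reader.close
  Process.wait(pid)
  Marshal.load(data)
end

$global_flag = :pristine
child_saw = run_in_fork do
  $global_flag = :mutated                 # global change, confined to the child
  $global_flag
end

p child_saw    # => :mutated
p $global_flag # => :pristine
```

Because the child exits after reporting, the parent never sees the child’s changes to global state.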

The parent then replays those method calls, so as far as the parent is concerned, all of the cases are part of a single suite, even though each one runs in a separate process. Figuring out which parts of Test::Unit to hook into took all of yesterday afternoon, but once we were done, it was only about 40 lines of code.
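
The record-and-replay step can be sketched as follows. Everything here is hypothetical: `FakeResult` stands in for Test::Unit’s result object, and its `add_run`/`add_assertion`/`add_failure` methods are illustrative names, not a claim about Test::Unit’s exact API. The child records each call on a `CallRecorder`, marshals the list of calls back through a pipe, and the parent replays them with `public_send`.

```ruby
# Hypothetical stand-in for Test::Unit's result object.
class FakeResult
  attr_reader :log
  def initialize; @log = []; end
  def add_run; @log << :run; end
  def add_assertion; @log << :assertion; end
  def add_failure(msg); @log << [:failure, msg]; end
end

# Records every unknown method call as [name, args] so it can be marshaled.
class CallRecorder
  attr_reader :calls
  def initialize; @calls = []; end

  def method_missing(name, *args)
    @calls << [name, args]
    nil
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Child records calls; parent replays them on the real result object.
def run_isolated(result)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    recorder = CallRecorder.new
    yield recorder
    writer.write(Marshal.dump(recorder.calls))
    writer.close
    exit!(0)
  end
  writer.close
  calls = Marshal.load(reader.read)
  reader.close
  Process.wait(pid)
  calls.each { |name, args| result.public_send(name, *args) }
  result
end

result = FakeResult.new
run_isolated(result) do |r|
  r.add_run
  r.add_assertion
  r.add_failure("expected 1, got 2")
end
p result.log # => [:run, :assertion, [:failure, "expected 1, got 2"]]
```

The parent’s result object ends up in the same state it would have reached had the test run in-process, which is why the rest of the suite doesn’t need to know about the fork.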

Today, we tackled getting the same module to work in environments that don’t support forking, like JRuby and Windows. Unfortunately, these environments will run these tests significantly more slowly, because they have to boot a full process for each test case, whereas the forking version can simply reuse the setup already done in the parent process (which makes it almost as fast as running all the tests in a single process).
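
Choosing between the two strategies has to happen at runtime. A small sketch, assuming MRI reports `Process.respond_to?(:fork)` as false where `fork` is unimplemented (as on Windows), with JRuby excluded explicitly via `RUBY_ENGINE` to be safe; `forking_supported?` is a hypothetical name.

```ruby
# Hypothetical check: fork only where the platform actually implements it.
# JRuby is excluded explicitly as a precaution.
def forking_supported?
  Process.respond_to?(:fork) && RUBY_ENGINE != "jruby"
end

strategy = forking_supported? ? :fork : :subprocess
puts "isolation strategy: #{strategy}"
```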

The solution was to emulate forking by shelling out to a new process that was identical to the one that was just launched, but with an extra constraint on the test name (basically, booting up the test suite multiple times, but each run only runs a single test). The subprocess then communicates back to the launching process using the same protocol as in the forking model, which means that we only had to change the code that ran the tests in isolation; everything else remains the same.
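
The subprocess fallback can be sketched with `IO.popen`. In this hypothetical version, the parent relaunches Ruby (`RbConfig.ruby`) with `ISOLATION_TEST` set in the child’s environment, and the child writes its marshaled list of recorded calls to stdout. Two simplifications to note: the real module re-runs the same test file, whereas here the child’s “test file” is inlined with `-e`; and this sketch assumes the child writes nothing else to stdout, so a real version would need to keep the tests’ own output off that channel.

```ruby
require "rbconfig"

# Hypothetical child script standing in for the re-launched test file.
CHILD_SCRIPT = <<~'RUBY'
  # Two fake "tests", each with the calls it would make on the result object.
  tests = {
    "test_boots"   => [[:add_run, []], [:add_assertion, []]],
    "test_failing" => [[:add_run, []], [:add_failure, ["boom"]]]
  }
  # Run only the test the parent selected via the environment.
  calls = tests.fetch(ENV.fetch("ISOLATION_TEST"))
  $stdout.binmode
  $stdout.write(Marshal.dump(calls))
RUBY

# Launch a fresh Ruby process for one test and read back its recorded calls.
def run_in_subprocess(test_name)
  env = { "ISOLATION_TEST" => test_name }
  data = IO.popen(env, [RbConfig.ruby, "-e", CHILD_SCRIPT], "rb", &:read)
  raise "child failed for #{test_name}" unless $?.success?
  Marshal.load(data)
end

calls = run_in_subprocess("test_failing")
p calls # => [[:add_run, []], [:add_failure, ["boom"]]]
```

The parent can then replay `calls` exactly as in the forking model; only the way the child is started differs.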

There was one final caveat, however. It turns out that in Test::Unit, using -t to specify the test case class together with -n to specify the test name doesn’t work, because Test::Unit includes any test for which ANY of the given filters match. I’m not proud of this, but we added a small monkey-patch to the Test::Unit collector, applied in the subprocess only, that does the right thing:

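# Only in the subprocess, for Windows / JRuby.
if ENV['ISOLATION_TEST']
  require "test/unit/collector/objectspace"
  class Test::Unit::Collector::ObjectSpace
    # Keep Test::Unit's own filtering (super), but additionally require
    # the method name to match the single test this subprocess should run.
    def include?(test)
      super && test.method_name == ENV['ISOLATION_TEST']
    end
  end
end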
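
To see why combining -t and -n still selected too many tests, here is a toy model of the two filter semantics. The test names and lambdas are hypothetical stand-ins, not Test::Unit’s collector API. Under ANY semantics, the class filter and the name filter each pull in extra tests; isolation needs ALL semantics, which is what the `&&` in the patch above provides.

```ruby
# Hypothetical tests, identified as [test case class, test method name].
TESTS = [
  ["InitializerTest", "test_boots"],
  ["InitializerTest", "test_loads_plugins"],
  ["RoutingTest",     "test_boots"]
]

# Two filters, roughly what -t and -n express (modeled as lambdas).
by_class = ->(test) { test[0] == "InitializerTest" }
by_name  = ->(test) { test[1] == "test_boots" }
filters  = [by_class, by_name]

# ANY semantics: a test runs if any filter matches.
any_match = TESTS.select { |t| filters.any? { |f| f.call(t) } }
# ALL semantics: a test runs only if every filter matches.
all_match = TESTS.select { |t| filters.all? { |f| f.call(t) } }

p any_match.size # => 3
p all_match      # => [["InitializerTest", "test_boots"]]
```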

Not great, but all in all, not much code (the entire module, including both the forking and subprocess methods, is just 98 lines).

A crazy couple of days yielding a pretty epic hack, but it works!

10 Responses to “New Rails Isolation Testing”

Thanks for a nice write-up!

I also noticed that you bumped the Rails version to 3.0pre.
Does that mean something like a beta will be out in the coming months?

It just means that Carl and I have started actively using edge for projects, and don’t want to see “Starting Rails 2.3” every time we boot our apps ;)

The best way to gauge how close we are to a public beta is to keep watching these posts (and posts we’re going to start doing on the RoR blog).

Does this slow things down any more? One of the biggest headaches we have is slow individual test startup time. In a Rails project that has many plugins/gems/dependencies and a complex environment, running a *single* test can take up to 15+ seconds – all environment initialization.

This completely kills the TDD red-green-refactor flow of a pair.

There are several ways people try to get around this. Some people use tools that keep the environment spun up (e.g. spec server or autotest) – this is a pain, because you always have situations where tests are failing falsely due to stale state, and you have to manually restart. Other people go to great lengths to lazily initialize and defer everything in their environment, so less stuff gets loaded during most tests. This also takes a lot of careful thought and work.

Does your approach of new process creation for every test exacerbate this problem? What other thoughts do you have on this topic?

Thanks,
– Chad

I’ve done something similar with forking to isolate tests. I’m curious: how did you handle the communication back to the parent process?

I know that Merb also forks for tests in one of its suites, which is what forced us to support test isolation for Devver. It wasn’t a hard feature to add, but it sure threw me for a loop when I dug in to see why the tests behaved so differently locally than on Devver. It’s kind of nice to keep things isolated, though.

@chad on the contrary! Because we’re isolating just the running of the test inside the fork, we can do far more invasive things inside the test itself without having to start up the full environment each time. Unfortunately, this approach doesn’t work on Windows, so Windows tests would indeed be slower, but that shouldn’t impact your scenario, which is development-time testing.

In effect, you get roughly the same speed as running the tests in a single process (with the extra overhead of forking, which is quite small), but all the benefits of running tests in isolation.

@dan interestingly, the Merb tests were much more primitive than what we’ve done here. In this case, the parent literally doesn’t know that the tests are being run in forks; everything that happens in the child (in terms of collecting stats) is simply replayed in the parent. In Merb, we massively subverted the test runner ;)

What about debugging? Will this mess with call stacks, breakpoints, or stepping through test code or environment code during tests? That is, do most current debuggers (ruby-debug, RubyMine, etc.) handle forking OK?

Wouldn’t this be susceptible to race conditions at the database level? If one test has an assert_difference and expects X records to be created, but simultaneously another test runs that creates records in the same table, you’d have a false positive. Or is each test inherently wrapped in a transaction (in which case parallelizing them would seem to provide a negligible benefit)? Thanks.

@Ted, this isn’t about parallelization. It’s just that some tests modify global state which can be hard/impossible to undo without restarting the VM. This isolation allows those things to occur, be examined, and then be completely discarded at the OS level.
