Best Things Come in Threes

I was at the Dallas Tech Fest this past weekend, and had the opportunity to meet up with some cool guys in the DataMapper community (Sam, Adam, Ben and Bryan). While I was there, I ended up hacking out a few unrelated tools that I thought I'd share with the community.


Benchwarmer is an improved DSL for doing benchmarks (hat tip for the name to br0nette). It provides options for grouping, and produces output like:

Running the benchmarks 100000 times each...

                         Option 1 |   TWO | Option 3 |
 Squeezing with #squeeze     0.15 |  0.15 |     0.14 |
              with #gsub     0.38 |  0.35 |     0.36 |
 Spliting    with #split     0.43 |  0.51 |     0.61 |
             with #match     0.29 |  0.35 |     0.38 |

You get that output by doing:

  Benchmark.warmer(TIMES) do
    columns :one, :two, :three
    titles :one => "Option 1", :three => "Option 3"

    group("Squeezing") do
      report "with #squeeze" do
        one { "abc//def//ghi//jkl".squeeze("/") }
        two { "abc///def///ghi///jkl".squeeze("/") }
        three { "abc////def////ghi////jkl".squeeze("/") }
      end
      report "with #gsub" do
        one { "abc//def//ghi//jkl".gsub(/\/+/, "/") }
        two { "abc///def///ghi///jkl".gsub(/\/+/, "/") }
        three { "abc////def////ghi////jkl".gsub(/\/+/, "/") }
      end
    end

    group("Spliting") do
      report "with #split" do
        one { "aaa/aaa/aaa.bbb.ccc.ddd".split(".") }
        two { "aaa//aaa//aaa.bbb.ccc.ddd.eee".split(".") }
        three { "aaa///aaa///aaa.bbb.ccc.ddd.eee.fff".split(".") }
      end
      report "with #match" do
        one { "aaa/aaa/aaa.bbb.ccc.ddd".match(/\.([^\.]*)$/) }
        two { "aaa//aaa//aaa.bbb.ccc.ddd.eee".match(/\.([^\.]*)$/) }
        three { "aaa///aaa///aaa.bbb.ccc.ddd.eee.fff".match(/\.([^\.]*)$/) }
      end
    end
  end

Most of that is optional; you can get usable benchmarks by stripping the DSL down to:

  Benchmark.warmer(TIMES) do
    report "squeezing with #squeeze" do
      "abc//def//ghi//jkl".squeeze("/")
    end
    report "squeezing with #gsub" do
      "abc//def//ghi//jkl".gsub(/\/+/, "/")
    end
  end

which produces:

                         Results |
 squeezing with #squeeze    0.15 |
    squeezing with #gsub    0.34 |
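
If you want to sanity-check numbers like these without installing anything, roughly the same comparison can be run with Ruby's standard Benchmark module. This is only a sketch and not Benchwarmer itself; the iteration count is an assumption borrowed from the "100000 times each" header in the earlier output, and the output format is made up.

```ruby
require "benchmark"

iterations = 100_000 # assumed; borrowed from the "100000 times each" header

# Time each variant with the standard library and collect the results.
results = {
  "squeezing with #squeeze" => Benchmark.realtime do
    iterations.times { "abc//def//ghi//jkl".squeeze("/") }
  end,
  "squeezing with #gsub" => Benchmark.realtime do
    iterations.times { "abc//def//ghi//jkl".gsub(/\/+/, "/") }
  end
}

# Print one row per label, loosely imitating the table above.
results.each { |label, seconds| puts format("%25s %7.2f |", label, seconds) }
```

The absolute timings will differ from machine to machine; only the ratio between the two rows is interesting.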

It is available on GitHub. I will add the appropriate packaging so you can do gem install wycats-benchwarmer. For now, you can just check out the git repo and run rake install.

I extracted this out of the benchmarks I was doing as I was building...

10x Faster Rails and Merb Inflector

On the plane over to Dallas Tech Fest, I thought it would be nice to try and improve the performance of the Rails Inflector, which is currently pretty slow (albeit probably not a bottleneck). Merb already uses the Facets English Inflector, which is 2x or so faster, but I was pretty sure I could do even better.

When I analyzed the English Inflector, I noticed a few things:

  • They were doing something similar to Rails: looping over a list of regexen and picking the correct ones
  • Without exception, the correct choice was the longest string that matched the end of the word.
  • Matching a string (like fooses) against a regex with (foo|foos|fooses) anchored to the end of the word will always match the longest string
  • Neither Rails nor English was caching the resulting words, which don't change and, in the case of inflecting Rails models and controllers, come from a small universe of total words
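
The longest-match observation depends on the pattern being tied to the end of the word. Here is a small illustration using made-up suffixes rather than the actual English rule list:

```ruby
# Hypothetical suffixes, deliberately listed shortest first.
suffixes = %w[s es ses]

# With the alternation anchored to the end of the word, the engine tries
# each starting position from left to right, so the first successful match
# is also the longest suffix, whatever order the alternatives are in.
anchored = /(#{suffixes.join("|")})$/
longest  = "fooses"[anchored, 1]   # => "ses"

# Without the anchor, Ruby takes the first alternative that matches at the
# leftmost position, which is not necessarily the longest one.
unanchored = /(foo|foos|fooses)/
first      = "fooses"[unanchored, 1] # => "foo"
```

This is why the sort-by-longest step becomes unnecessary once everything lives in one end-anchored pattern.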

As a result, I did two optimizations:

  • I packed all of the regexen into a single regex, and got rid of the sort-by-longest-string code. I then did a simple sub! against the word, pulling the results out of the rules Hash (that already existed in English)
  • English already cached irregular words (its first step was to look in the irregular words hash for the word in question), so I simply extended this cache to include any word already found.
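
To make the two optimizations concrete, here is a minimal sketch of the idea. It is not the actual patch: the variable names are hypothetical, the rule table is a tiny illustrative subset, and falling back to appending "s" is my simplification.

```ruby
# Suffix => replacement. A tiny, illustrative subset of rules.
rules = { "rf" => "rves", "ero" => "eroes", "fe" => "ves" }

# The cache starts out holding the irregular words and grows as words are seen.
cache = { "person" => "people" }

# One end-anchored alternation instead of a loop over many regexen.
pattern = /(#{rules.keys.map { |suffix| Regexp.escape(suffix) }.join("|")})$/

pluralize = lambda do |word|
  # Compute each word once, then serve it straight from the hash.
  cache[word] ||= begin
    copy = word.dup
    # sub! returns nil when nothing matched; fall back to a plain "s".
    copy.sub!(pattern) { rules[$1] } || copy << "s"
  end
end

pluralize.call("dwarf")   # => "dwarves"
pluralize.call("hero")    # => "heroes"
pluralize.call("person")  # => "people"
pluralize.call("account") # => "accounts"
```

After the first call for a given word, every later call is a single hash lookup, which is where most of the speedup for repeated model and controller names would come from.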

Between these two optimizations, I was able to get around 10x over Rails, and got a huge boost for simple pluralization words (not so much for things like "person" => "people"). Rails also caught a bunch of cases in their tests that were not supported by English, so I added support for things like capital versions ("Person" => "People") and partial words ("foochild" => "foochildren").

Here are the benchwarmer results:

                                         OLD  | NEW   | RAILS
Simple: account => accounts    Singular  0.16 |  0.02 |  0.25  
                                 Plural  0.14 |  0.03 |  0.24
Simple: American => Americans  Singular  0.16 |  0.02 |  0.27  
                                 Plural  0.16 |  0.31 |  0.27
Abnormal: dwarf => dwarves     Singular  0.07 |  0.04 |  0.17  
                                 Plural  0.06 |  0.03 |  0.17
Abnormal: hero => heroes       Singular  0.05 |  0.03 |  0.20  
                                 Plural  0.06 |  0.02 |  0.21
One Way: cactus => cactuses    Singular  0.11 |  0.02 |  0.26  
                                 Plural  0.07 |  0.02 |  0.26
One Way: wife => wives         Singular  0.11 |  0.02 |  0.16  
                                 Plural  0.13 |  0.03 |  0.14
Uncountable: fish => fish      Singular  0.03 |  0.02 |  0.03  
                                 Plural  0.02 |  0.02 |  0.03
Exception: person => people    Singular  0.02 |  0.03 |  0.09  
                                 Plural  0.03 |  0.02 |  0.11

It does break back-compat a bit with Rails, as the mechanism for adding new rules is simpler in order to be compatible with English's faster inflector algorithm. I also had to remove Inflector#clear, which I wasn't sure anyone was actually using (it allowed the clearing of very specific types of rules, which can no longer be supported as irregular words and regular singularization/pluralization rules are dumped in the same list for efficiency).

The new way of defining rules is:

Inflector.inflections do  
  # One argument means singular and plural are the same.

  word 'equipment'
  word 'information'
  word 'money'
... snip ...
  word 'Swiss'     , 'Swiss'
  word 'virus'     , 'viri'
  word 'octopus'   , 'octopi'
... snip ...
  rule 'person' , 'people', true
  rule 'shoe'   , 'shoes', true
  rule 'hive'   , 'hives', true
  rule 'man'    , 'men', true    
  rule 'rf'     , 'rves'
  rule 'ero'    , 'eroes'
... snip ...
  singular_rule 'of' , 'ofs' # proof
  singular_rule 'o'  , 'oes' # hero, heroes
... snip ...
  plural_rule 's'   , 'ses'
  plural_rule 'ive' , 'ives' # don't want to snag wife
  plural_rule 'fe'  , 'ves'  # don't want to snag perspectives
end

All of the inflector tests still pass in Rails, with the exception of the #clear tests, which were too coupled to the old implementation to salvage (they actually introspected into specific ivars).

The Rails modifications are available on my Rails branch on GitHub. I hope to be able to push the changes upstream if the Rails core folks are amenable :).

OSX Window Resizing Tools

I was going crazy trying to figure out a way to cordon off a section of my screen for screencasts (the center 800x600 for instance), and ended up in crazy AppleScript-land. I ended up with two useful scripts:

  • center, which takes any window, resizes it to a provided size, and centers it on the screen
  • maximize, which takes any window and maximizes it to fill the screen (even for crazy-zoom-windows like Safari)

Both scripts take drawers into consideration, so you can resize a window like TextMate to the correct size.
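
The centering arithmetic itself is simple. The sketch below is in Ruby rather than AppleScript and is not taken from the scripts; the helper name and the 1440x900 screen size are assumptions, and it ignores drawers and the menu bar.

```ruby
# Hypothetical helper: bounds for a window of the given size, centered on a
# screen of the given size. Returned as [left, top, right, bottom].
centered_bounds = lambda do |screen_w, screen_h, win_w, win_h|
  left = (screen_w - win_w) / 2
  top  = (screen_h - win_h) / 2
  [left, top, left + win_w, top + win_h]
end

# An 800x600 window on an assumed 1440x900 screen.
bounds = centered_bounds.call(1440, 900, 800, 600)
# => [320, 150, 1120, 750]
```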

They are available on GitHub and come with a convenient Rake task that will compile and copy the scripts into the appropriate folders. There is also a Rake task that will install FastScripts Lite, which will allow you to bind keys to these scripts (see the README for full details).