Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc.. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

Best Things Come in Threes

I was at the Dallas Tech Fest this past weekend, and had the opportunity to meet up with some cool guys in the DataMapper community (Sam, Adam, Ben and Bryan). While I was there, I ended up hacking out a few unrelated tools that I thought I’d share with the community.

benchwarmer

Benchwarmer is an improved DSL for doing benchmarks (hat tip for the name to br0nette). It provides options for grouping, and produces output like:

Running the benchmarks 100000 times each...
 
                         Option 1 |   TWO | Option 3 |
 -----------------------------------------------------
 Squeezing with #squeeze     0.15 |  0.15 |     0.14 |
              with #gsub     0.38 |  0.35 |     0.36 |
 -----------------------------------------------------
 Spliting    with #split     0.43 |  0.51 |     0.61 |
             with #match     0.29 |  0.35 |     0.38 |
 -----------------------------------------------------  

You get that output by doing:

  Benchmark.warmer(TIMES) do                              
    columns :one, :two, :three                            
    titles :one => "Option 1", :three => "Option 3"      
    
    group("Squeezing") do                                
      report "with #squeeze" do                          
        one { "abc//def//ghi//jkl".squeeze("/") }        
        two { "abc///def///ghi///jkl".squeeze("/") }
        three { "abc////def////ghi////jkl".squeeze("/") }
      end
      report "with #gsub" do
        one { "abc//def//ghi//jkl".gsub(/\/+/, "/") }
        two { "abc///def///ghi///jkl".gsub(/\/+/, "/") }
        three { "abc////def////ghi////jkl".gsub(/\/+/, "/") }
      end
    end
    
    group("Spliting") do
      report "with #split" do
        one { "aaa/aaa/aaa.bbb.ccc.ddd".split(".") }
        two { "aaa//aaa//aaa.bbb.ccc.ddd.eee".split(".") }
        three { "aaa///aaa///aaa.bbb.ccc.ddd.eee.fff".split(".") }
      end
      report "with #match" do
        one { "aaa/aaa/aaa.bbb.ccc.ddd".match(/\.([^\.]*)$/) }
        two { "aaa//aaa//aaa.bbb.ccc.ddd.eee".match(/\.([^\.]*)$/) }
        three { "aaa///aaa///aaa.bbb.ccc.ddd.eee.fff".match(/\.([^\.]*)$/) }
      end
    end
  end

Most of that is optional; you can get usable benchmarks by stripping the DSL down to:

  Benchmark.warmer(TIMES) do
    report "squeezing with #squeeze" do
      "abc//def//ghi//jkl".squeeze("/")
    end
    report "squeezing with #gsub" do
      "abc//def//ghi//jkl".gsub(/\/+/, "/")
    end
  end

which produces:


                         Results |
----------------------------------
 squeezing with #squeeze    0.15 |
    squeezing with #gsub    0.34 |
----------------------------------

It is available at github. I will add the appropriate stuff so you can do
gem install wycats-benchwarmer. For now, you can just check out the git repo and do rake install.

I extracted this out of the benchmarks I was doing as I was building…

10x Faster Rails and Merb Inflector

On the plane over to Dallas Tech Fest, I thought it would be nice to try and improve the performance of the Rails Inflector, which is currently pretty slow (albeit probably not a bottleneck). Merb already uses the Facets English Inflector, which is 2x or so faster, but I was pretty sure I could do even better.

When I analyzed the English Inflector, I noticed a few things:

  • They were doing something similar to Rails looping over a list of regexen and picking the correct ones
  • Without exception, the correct choice was the longest string that matched the end of the word.
  • Matching a string (like fooses) against a regex with (foo|foos|fooses) will always match the longest string
  • Neither Rails nor English were caching the resulting words, which don’t change, and in the case of Inflecting Rails models and controllers, are a small universe of total words

As a result, I did two optimizations:

  • I packed all of the regexen into a single regex, and got rid of the sort-by-longest-string code. I then did a simple sub! against the word, pulling the results out of the rules Hash (that already existed in English)
  • English already cached irregular words (its first step was to look in the irregular words hash for the word in question), so I simply extended this cache to include any word already found.

Between these two optimizations, I was able to get around 10x over Rails, and got a huge boost for simple pluralization words (not so much for things like “person” => “people”). Rails also caught a bunch of cases in their tests that were not supported by English, so I added support for things like capital versions (“Person” => “People”) and partial words (“foo_child” => “foo_children”).

Here are the benchwarmer results:

                                         OLD  | NEW   | RAILS
Simple: account => accounts    Singular  0.16 |  0.02 |  0.25
                                 Plural  0.14 |  0.03 |  0.24
-------------------------------------------------------------
Simple: American => Americans  Singular  0.16 |  0.02 |  0.27
                                 Plural  0.16 |  0.31 |  0.27
-------------------------------------------------------------
Abnormal: dwarf => dwarves     Singular  0.07 |  0.04 |  0.17
                                 Plural  0.06 |  0.03 |  0.17
-------------------------------------------------------------
Abnormal: hero => heroes       Singular  0.05 |  0.03 |  0.20
                                 Plural  0.06 |  0.02 |  0.21
-------------------------------------------------------------
One Way: cactus => cactuses    Singular  0.11 |  0.02 |  0.26
                                 Plural  0.07 |  0.02 |  0.26
-------------------------------------------------------------
One Way: wife => wives         Singular  0.11 |  0.02 |  0.16
                                 Plural  0.13 |  0.03 |  0.14
-------------------------------------------------------------
Uncountable: fish => fish      Singular  0.03 |  0.02 |  0.03
                                 Plural  0.02 |  0.02 |  0.03
-------------------------------------------------------------
Exception: person => people    Singular  0.02 |  0.03 |  0.09
                                 Plural  0.03 |  0.02 |  0.11
-------------------------------------------------------------

It does break back-compat a bit with Rails, as the mechanism for adding new rules is simpler in order to be compatible with English’s faster inflector algorithm. I also had to remove Inflector#clear, which I wasn’t sure anyone was actually using (it allowed the clearing of very specific types of rules, which can no longer be supported as irregular words and regular singularization/pluralization rules are dumped in the same list for efficiency).

The new way of defining rules is:

Inflector.inflections do
  # One argument means singular and plural are the same.

  word 'equipment'
  word 'information'
  word 'money'
... snip ...
  word 'Swiss'     , 'Swiss'
  word 'virus'     , 'viri'
  word 'octopus'   , 'octopi'
... snip ...
  rule 'person' , 'people', true
  rule 'shoe'   , 'shoes', true
  rule 'hive'   , 'hives', true
  rule 'man'    , 'men', true    
  rule 'rf'     , 'rves'
  rule 'ero'    , 'eroes'
... snip ...
  singular_rule 'of' , 'ofs' # proof
  singular_rule 'o'  , 'oes' # hero, heroes
... snip ...
  plural_rule 's'   , 'ses'
  plural_rule 'ive' , 'ives' # don't want to snag wife
  plural_rule 'fe'  , 'ves'  # don't want to snag perspectives  

All of the inflector tests still pass in Rails, with the exception of the #clear tests, which were too coupled to the old implementation to salvage (they actually introspected into specific ivars).

The Rails modifications are available on my Rails branch on github. I hope to be able to push
the changes upstream if the Rails core folks are amenable :).

OSX Window Resizing Tools

I was going crazy trying to figure out a way to cordon off a section of my screen for screencasts (the center 800×600 for instance), and ended up in crazy AppleScript-land. I ended up with two useful scripts:

  • center, which takes any window, resizes it to a provided size, and centers it on the screen
  • maximize, which takes any window and maximizes it to fill the screen (even for crazy-zoom-windows like Safari)

Both scripts take drawers into consideration, so you can resize a window like TextMate to the correct size.

It is available at github and comes with a convenient raketask for installing
that will compile and copy your scripts into the appropriate folders. It also comes with a raketask that will install FastScripts Lite, which will
allow you to bind keys to these scripts (see the README for full details).

4 Responses to “Best Things Come in Threes”

Looks pretty to me :)

Could the caching possibly cause a “memory leak” if you were using inflections on user inputed/dynamic text?

Dictionary.find(123).word.pluralize

Stupid example, and I know I’ve never used it that way, but who knows what other devs might do :)

Or does the caching work different than I’m thinking?

Also, did you push this patch upstream to the english lib maintainers?

gem install wycats-benchwarmer doesn’t work any more.

Leave a Reply

Archives

Categories

Meta