Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

SafeBuffers and Rails 3.0

As you may have read, Rails adds XSS protection by default in Rails 3. This means that you no longer have to manually escape user input with the h helper, because Rails will automatically escape it for you.

However, it’s not as simple as all that. Consider the following:

Hello <strong>friends</strong>!
 
<%= tag(:p, some_text) %>
<%= some_text %>

In the above example, we have a few different scenarios involving HTML tags. First off, Rails should not escape the strong tag surrounding “friends”, because it is unambiguously not user input. Second, Rails should escape some_text in the <p> tag, but not the <p> tag itself. Finally, the some_text in the final <%= %> tag should be escaped.

If some_text is <script>evil_js</script>, the above should output:

Hello <strong>friends</strong>!
 
<p>&lt;script&gt;evil_js&lt;/script&gt;</p>
&lt;script&gt;evil_js&lt;/script&gt;

In order to make this happen, we have introduced a new pervasive concept called html_safe into Rails applications. If a String is html_safe (which Rails determines by calling html_safe? on the String), ERB may insert it unaltered into the output. If it is not safe, ERB must first escape it before inserting it into the output.

def tag(name, options = nil, open = false, escape = true)
  "<#{name}#{tag_options(options, escape) if options}#{open ? ">" : " />"}".html_safe
end

Here, Rails creates the tag, telling tag_options to escape the contents, and then marks the entire body as safe. As a result, the <p> and </p> will emerge unaltered, while Rails will escape the user-supplied content.
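The html_safe? rule described above can be sketched in plain Ruby, without Rails. This is only an illustration of the decision made for each value: SafeString and render_value are hypothetical names invented for the sketch, and the escaping comes from Ruby's standard ERB::Util.html_escape rather than from Rails.

```ruby
require "erb"

# Hypothetical stand-in for a String that has been marked as safe.
class SafeString < String
  def html_safe?
    true
  end
end

# Hypothetical helper: the decision applied to each value in a <%= %>.
def render_value(value)
  if value.respond_to?(:html_safe?) && value.html_safe?
    value.to_s                         # trusted: insert unaltered
  else
    ERB::Util.html_escape(value.to_s)  # untrusted: escape first
  end
end

some_text = "<script>evil_js</script>"
puts render_value(some_text)
# prints: &lt;script&gt;evil_js&lt;/script&gt;
puts render_value(SafeString.new("<strong>friends</strong>"))
# prints: <strong>friends</strong>
```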

The first implementation of this, in Koz’s rails-xss plugin, accomplished the above requirements by adding a new flag to all Strings. Rails, or Rails applications, could mark any String as safe, and Rails overrode + and << to mark the resulting String appropriately based on the input Strings.

However, during my last performance pass of Rails, I noticed that overriding every String concatenation resulted in quite a bit of performance overhead. Worse, the overhead grew linearly with the number of <%= %> in a template, so larger templates didn’t absorb the cost (as they would if the problem were once-per-template).

Thinking about the problem more, I realized (and confirmed with Koz, Jeremy, and Evan Phoenix of Rubinius) that we could implement roughly the same feature-set in a more performant way with a smaller API impact on Ruby. Because the problem itself is reasonably complex, I won’t go into a lot of detail about the old implementation, but will explain how you should use the XSS protection with the new implementation. If you already used Koz’s plugin or are working with the prereleases of Rails, you’ll notice that today’s commit changes very little.

SafeBuffer

In Rails 3, the ERB buffer is an instance of ActiveSupport::SafeBuffer. SafeBuffer inherits from String, overriding +, concat and << so that:

  • If the other String is safe (another SafeBuffer), the buffer concatenates it directly
  • If the other String is unsafe (a plain String), the buffer escapes it first, then concatenates it

Calling html_safe on a plain String returns a SafeBuffer wrapper. Because SafeBuffer inherits from String, Ruby creates this wrapper extremely efficiently (just sharing the internal char * storage).
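To make the two rules concrete, here is a toy buffer in plain Ruby. It is a sketch under assumptions, not the ActiveSupport source: ToySafeBuffer and mark_html_safe are hypothetical names (the latter stands in for String#html_safe so that the sketch does not reopen String), and the escaping comes from the standard library's ERB::Util.html_escape.

```ruby
require "erb"

# Toy stand-in for ActiveSupport::SafeBuffer; not the Rails source.
class ToySafeBuffer < String
  def html_safe?
    true
  end

  # Escape anything that is not already marked safe, then append in place.
  def concat(other)
    super(escape_unless_safe(other))
  end
  alias_method :<<, :concat

  # Non-destructive variant: returns a new buffer.
  def +(other)
    dup.concat(other)
  end

  private

  def escape_unless_safe(other)
    return other if other.respond_to?(:html_safe?) && other.html_safe?
    ERB::Util.html_escape(other)
  end
end

# Toy version of String#html_safe, written as a plain method.
def mark_html_safe(string)
  ToySafeBuffer.new(string)
end

buffer = ToySafeBuffer.new("")
buffer << mark_html_safe("<p>")        # safe: appended directly
buffer << "<script>evil_js</script>"   # plain String: escaped first
buffer << mark_html_safe("</p>")
puts buffer
# prints: <p>&lt;script&gt;evil_js&lt;/script&gt;</p>
```

Because the escaping logic lives only on the buffer subclass, ordinary String concatenation elsewhere in the program is left untouched.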

As a result of this implementation, I was starting to see a lot of the following idiom in the codebase:

buffer << other_string.html_safe

Here, Rails is creating a new SafeBuffer for the other_string, then passing it to the << method of the original SafeBuffer, which then checks to see if it is safe. For cases like this, I created a new safe_concat method on the buffer which uses the original, native concat method, skipping both the need to create a new SafeBuffer and the need to check it.
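A rough sketch of the idea, again using a hypothetical ToySafeBuffer rather than the real class: the subclass keeps a handle on String's native concat (under the made-up name original_concat) before overriding it, and safe_concat calls that handle directly.

```ruby
require "erb"

# Toy buffer illustrating safe_concat; not the ActiveSupport source.
class ToySafeBuffer < String
  # Keep a handle on String#concat before it is overridden below.
  alias_method :original_concat, :concat

  def html_safe?
    true
  end

  # Checked path: escape unless the other string is marked safe.
  def concat(other)
    if other.respond_to?(:html_safe?) && other.html_safe?
      original_concat(other)
    else
      original_concat(ERB::Util.html_escape(other))
    end
  end
  alias_method :<<, :concat

  # Unchecked path: the caller vouches for the content, so there is
  # no wrapper object to allocate and no html_safe? call to make.
  def safe_concat(other)
    original_concat(other)
  end
end

buffer = ToySafeBuffer.new("")
buffer.safe_concat("<li>")   # known-good markup, appended as-is
buffer << "Tom & Jerry"      # plain String, escaped
buffer.safe_concat("</li>")
puts buffer
# prints: <li>Tom &amp; Jerry</li>
```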

Similarly, concat and safe_concat in ActionView proxy to the concat and safe_concat on the buffer itself, so you can use safe_concat in a helper if you have some HTML you want to concatenate to the buffer with no checks and without escaping.

ERB uses safe_concat internally on the parts of the template outside of <% %> tags, which means that with the changes I pushed today, the XSS protection code adds no performance impact to those cases (basically, all of the plain text in your templates).

Finally, ERB can now detect the raw helper at compile time, so if you do something like <%= raw some_stuff %>, ERB will use safe_concat internally, skipping the runtime creation of a SafeBuffer and checks for html_safety.
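One way to picture the compile-time distinction is to write out by hand what a compiled template could look like. The sketch below is an assumption for illustration only: it is not the code that the Rails ERB handler actually generates, and ToyOutputBuffer, append and render_toy_template are hypothetical names.

```ruby
require "erb"

# Hypothetical output buffer with a checked and an unchecked append path.
class ToyOutputBuffer < String
  # Unchecked path: String's native concat, no html_safe? check.
  alias_method :safe_concat, :concat

  def html_safe?
    true
  end

  # Checked path, used for an ordinary <%= %>.
  def append(value)
    safe = value.respond_to?(:html_safe?) && value.html_safe?
    safe_concat(safe ? value : ERB::Util.html_escape(value))
  end
end

# Hand-written equivalent of compiling this template:
#   Hello <strong>friends</strong>!
#   <%= some_text %>
#   <%= raw some_text %>
def render_toy_template(some_text)
  out = ToyOutputBuffer.new("")
  out.safe_concat("Hello <strong>friends</strong>!\n") # static text
  out.append(some_text)                                 # <%= some_text %>
  out.safe_concat("\n")                                 # static text
  out.safe_concat(some_text.to_s)                       # <%= raw some_text %>
  out
end

puts render_toy_template("<b>hi</b>")
# prints:
# Hello <strong>friends</strong>!
# &lt;b&gt;hi&lt;/b&gt;
# <b>hi</b>
```

Only the middle line goes through the html_safe? guard; the static text and the raw line take the unchecked path, which is the behavior the paragraphs above describe.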

Summary

In summary, the XSS protection has the following characteristics:

  • If a plain String is passed into a <%= %>, Rails always escapes it
  • If a SafeBuffer is passed into a <%= %>, Rails does not escape it. To get a SafeBuffer from a String, call html_safe on it. The XSS system has a very small performance impact on this case, limited to a guard calling the html_safe? method
  • If you use the raw helper in a <%= %>, Rails detects it at compile-time of the template, resulting in zero performance impact from the XSS system on that concatenation
  • Rails does not escape any part of a template that is not in an ERB tag. Because Rails handles this at template compile-time, this results in zero performance impact from the XSS system on these concatenations

In comparison, the initial implementation of the XSS protection added overhead to every String concatenation or +, and it did so even when the app used the raw helper and even for plain Strings in templates.

That said, I want to extend personal thanks to Koz for getting the first draft out the door. It worked, demonstrated the concept, and let the community test it out. All in all, an excellent first pass.

18 Responses to “SafeBuffers and Rails 3.0”

Awesome news. Do these changes affect the performance information you posted on the Engine Yard blog? If so, how does it fare now?

What about HAML? :)

I’ve spoken with nex3; Haml already has support for the old style and they should be able to support the new style without problems.

Actually the commit that implemented this broke Haml, but it seems like nex3 is on the case. http://github.com/nex3/haml/issues#issue/83

In the example when you do:

shouldn’t have been …

Sorry, i was trying to paste example code …

In the example when you do:

tag(:p, some_text)

shouldn’t have been …

content_tag(:p, some_text)

Santiago Pastorino: I think it should be content_tag too. But content_tag does not escape its input. So, I’ve created a ticket: https://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/3883-content_tag-does-not-escape-its-input.

I have been looking at something very similar to this. Quick question, how do you decide what kind of escaping to do?

For example if you were outputting the following in a page

var some_var = ;

how does rails know to escape the text using javascript escaping vs. html escaping?

ahhh, sorry, my example got eaten by some xss protection you have on your site:)

The point I was getting at is that the escaping mechanism depends on the context that you are outputting the variable in. i.e. if you are in a javascript block it needs to be escaped in a different manner to that in an html block. How does rails get to understand the context?

Thanks, and apologies for the double post.

Yehuda,

I know it is a bit too late to reply on this post, but I hope you get some time to read it.

A question that doesn’t leave my mind is why you didn’t use Ruby’s internal “taint/untaint” mechanism, which was designed exactly for a use like that.

Besides not having any performance impact, it would also “propagate” on all Ruby’s internal methods that return strings, not only + and << (ie. gsubing a html_safe string now yields a non html_safe string).

Another great positive side effect would be being able to mark as “unsafe” (or tainted) only strings that came from external sources (DB and params). Now whenever you enter a string in a helper you need to make sure to mark it as html_safe (even strings that don’t have html code inside, since they could later on be added to another string that does). This is a hassle and is going to break several helpers in existing apps being ported to rails 3.

Yehuda, have you encountered any evil related to the safebuffer implementation leaking into yaml dumps? What I have going on is taking some params that came in a request, and storing them as yaml on my object (i.e. object.something = params.to_yaml). When I do this the yaml turns to complete crud, something like this:

foo: !str
  str: bar
  "@_rails_html_safe": false

Previously, this looked quite normal, like this:

foo: bar

This makes the yaml damn near unusable by other processes reading it, not to mention just ugly :). I’ve googled all over the place but can’t seem to find any references to dealing with this. Right now I have to iterate the entire params hash and wrap each value with String.new() to avoid getting SafeBuffer junk in my yaml. If there are nested params it gets uglier from there, as you can imagine.

This is on Rails 2.3…but I assume same problem happens in Rails 3.

Any insight on this would be greatly appreciated. I read through the safebuffer source and can’t say I see a way to easily turn it off…I don’t want to have to litter my code with calls to methods that clean up parameters by wrapping with strings…I could hack into the rails request processing but that doesn’t seem right.

I think either I’ve got something really wrong, or this usecase (storing params as yaml) was maybe never considered by rails core team, and the damages to the yaml were not considered by the safebuffer code…

Having same problem described by Yan. Calling to_yaml on a simple Hash in a controller results in an overly-complex YAML dump that cannot be unserialized by simple YAML parsers.

Yup, same problem as Yan and Nick.
Specifically, when a form with :multipart => true gets posted and attributes are being serialized in my model, the resulting string contains _rails_html_safe, which screws up deserializing.

I was experiencing the same problem as Yan, Nick, and bartzon. Rails 2.3.7 just happened to come out today so I updated. This fixed the problem with ugly YAML, and my app worked once I added the rails_xss plugin.

The difference between raw and html_safe isn’t completely clear to me. Should you use “raw” if possible and “html_safe” only if you are concatenating safe and unsafe strings?

There appears to be an inconsistency in html_safe concatenation in Rails 2.3.8:

$ script/console
Loading development environment (Rails 2.3.8)
>> ('a' + 'b'.html_safe).html_safe?
=> nil
>> ('a'.html_safe + 'b').html_safe?
=> true

The rails XSS safety stuff has changed the meaning of html_escape (and thus h) from “escape this” to “escape this if it’s not safe”. So neither of them always DWYM and to work around this you have to do something like “CGI::escape_html mystr”.

IMHO the escape_html and h should be changed back to their original meaning and the automatic escaping done by XSS should use their own functions.
