Yehuda Katz is a member of the Ember.js, Ruby on Rails and jQuery Core Teams; his 9-to-5 home is at the startup he founded, Tilde Inc.. There he works on Skylight, the smart profiler for Rails, and does Ember.js consulting. He is best known for his open source work, which also includes Thor and Handlebars. He travels the world doing open source evangelism and web standards work.

Archive for August, 2011

Understanding “Prototypes” in JavaScript

For the purposes of this post, I will be talking about JavaScript objects using syntax defined in ECMAScript 5.1. The basic semantics existed in Edition 3, but they were not well exposed.

A Whole New Object

In JavaScript, objects are pairs of keys and values (in Ruby, this structure is called a Hash; in Python, it’s called a dictionary). For example, if I wanted to describe my name, I could have an object with two keys: `firstName` would point to “Yehuda” and `lastName` would point to “Katz”. Keys in a JavaScript object are Strings.

To create the simplest new object in JavaScript, you can use Object.create:

var person = Object.create(null); // this creates an empty objects

Why didn’t we just use var person = {};? Stick with me! To look up a value in the object by key, use bracket notation. If there is no value for the key in question, JavaScript will return `undefined`.

person['name'] // undefined

If the String is a valid identifier[1], you can use the dot form:

person.name // undefined

[1] in general, an identifier starts with a unicode letter, $, _, followed by any of the starting characters or numbers. A valid identifier must also not be a reserved word. There are other allowed characters, such as unicode combining marks, unicode connecting punctuation, and unicode escape sequences. Check out the spec for the full details

Adding values

So now you have an empty object. Not that useful, eh? Before we can add some properties, we need to understand what a property (what the spec calls a “named data property”) looks like in JavaScript.

Obviously, a property has a name and a value. In addition, a property can be enumerable, configurable and writable. If a value is enumerable, it will show up when enumerating over an object using a for(prop in obj) loop. If a property is writable, you can replace it. If a property is configurable, you can delete it or change its other attributes.

In general, when we create a new property, we will want it to be enumerable, configurable, and writable. In fact, prior to ECMAScript 5, that was the only kind of property a user could create directly.

We can add a property to an object using Object.defineProperty. Let’s add a first name and last name to our empty object:

var person = Object.create(null);
Object.defineProperty(person, 'firstName', {
  value: "Yehuda",
  writable: true,
  enumerable: true,
  configurable: true
});
 
Object.defineProperty(person, 'lastName', {
  value: "Katz",
  writable: true,
  enumerable: true,
  configurable: true
});

Obviously, this is extremely verbose. We can make it a bit less verbose by eliminating the common defaults:

var config = {
  writable: true,
  enumerable: true,
  configurable: true
};
 
var defineProperty = function(obj, name, value) {
  config.value = value;
  Object.defineProperty(obj, name, config);
}
 
var person = Object.create(null);
defineProperty(person, 'firstName', "Yehuda");
defineProperty(person, 'lastName',   "Katz");

Still, this is pretty ugly to create a simple property list. Before we can get to a prettier solution, we will need to add another weapon to our JavaScript object arsenal.

Prototypes

So far, we’ve talked about objects as simple pairs of keys and values. In fact, JavaScript objects also have one additional attribute: a pointer to another object. We call this pointer the object’s prototype. If you try to look up a key on an object and it is not found, JavaScript will look for it in the prototype. It will follow the “prototype chain” until it sees a null value. In that case, it returns undefined.

You’ll recall that we created a new object by invoking Object.create(null). The parameter tells JavaScript what it should set as the Object’s prototype. You can look up an object’s prototype by using Object.getPrototypeOf:

var man = Object.create(null);
defineProperty(man, 'sex', "male");
 
var yehuda = Object.create(man);
defineProperty(yehuda, 'firstName', "Yehuda");
defineProperty(yehuda, 'lastName', "Katz");
 
yehuda.sex       // "male"
yehuda.firstName // "Yehuda"
yehuda.lastName  // "Katz"
 
Object.getPrototypeOf(yehuda) // returns the man object

We can also add functions that we share across many objects this way:

var person = Object.create(null);
defineProperty(person, 'fullName', function() {
  return this.firstName + ' ' + this.lastName;
});
 
// this time, let's make man's prototype person, so all
// men share the fullName function
var man = Object.create(person);
defineProperty(man, 'sex', "male");
 
var yehuda = Object.create(man);
defineProperty(yehuda, 'firstName', "Yehuda");
defineProperty(yehuda, 'lastName', "Katz");
 
yehuda.sex        // "male"
yehuda.fullName() // "Yehuda Katz"

Setting Properties

Since creating a new writable, configurable, enumerable property is pretty common, JavaScript makes it easy to do so using assignment syntax. Let’s update the previous example using assignment instead of defineProperty:

var person = Object.create(null);
 
// instead of using defineProperty and specifying writable,
// configurable, and enumerable, we can just assign the
// value directly and JavaScript will take care of the rest
person['fullName'] = function() {
  return this.firstName + ' ' + this.lastName;
};
 
// this time, let's make man's prototype person, so all
// men share the fullName function
var man = Object.create(person);
man['sex'] = "male";
 
var yehuda = Object.create(man);
yehuda['firstName'] = "Yehuda";
yehuda['lastName'] = "Katz";
 
yehuda.sex        // "male"
yehuda.fullName() // "Yehuda Katz"

Just like when looking up properties, if the property you are defining is an identifier, you can use dot syntax instead of bracket syntax. For instance, you could say man.sex = "male" in the example above.

Object Literals

Still, having to set a number of properties every time can get annoying. JavaScript provides a literal syntax for creating an object and assigning properties to it at one time.

var person = { firstName: "Paul", lastName: "Irish" }

This syntax is approximately sugar for:

var person = Object.create(Object.prototype);
person.firstName = "Paul";
person.lastName  = "Irish";

The most important thing about the expanded form is that object literals always set the newly created object’s prototype to an object located at Object.prototype. Internally, the object literal looks like this:

prototype-chain.png

The default Object.prototype dictionary comes with a number of the methods we have come to expect objects to contain, and through the magic of the prototype chain, all new objects created as object literal will contain these properties. Of course, objects can happily override them by defining the properties directly. Most commonly, developers will override the toString method:

var alex = { firstName: "Alex", lastName: "Russell" };
 
alex.toString() // "[object Object]"
 
var brendan = {
  firstName: "Brendan",
  lastName: "Eich",
  toString: function() { return "Brendan Eich"; }
};
 
brendan.toString() // "Brendan Eich"

This is especially useful because a number of internal coercion operations use a supplied toString method.

Unfortunately, this literal syntax only works if we are willing to make the new object’s prototype Object.prototype. This eliminates the benefits we saw earlier of sharing properties using the prototype. In many cases, the convenience of the simple object literal outweighs this loss. In other cases, you will want a simple way to create a new object with a particular prototype. I’ll explain it right afterward:

var fromPrototype = function(prototype, object) {
  var newObject = Object.create(prototype);
 
  for (var prop in object) {
    if (object.hasOwnProperty(prop)) {
      newObject[prop] = object[prop];      
    }
  }
 
  return newObject;
};
 
var person = {
  toString: function() {
    return this.firstName + ' ' + this.lastName;
  }
};
 
var man = fromPrototype(person, {
  sex: "male"
});
 
var jeremy = fromPrototype(man, {
  firstName: "Jeremy",
  lastName:  "Ashkenas"
});
 
jeremy.sex        // "male"
jeremy.toString() // "Jeremy Ashkenas"

Let’s deconstruct the fromPrototype method. The goal of this method is to create a new object with a set of properties, but with a particular prototype. First, we will use Object.create() to create a new empty object, and assign the prototype we specify. Next, we will enumerate all of the properties in the object that we supplied, and copy them over to the new object.

Remember that when you create an object literal, like the ones we are passing in to fromPrototype, it will always have Object.prototype as its prototype. By default, the properties that JavaScript includes on Object.prototype are not enumerable, so we don’t have to worry about valueOf showing up in our loop. However, since Object.prototype is an object like any other object, anyone can define a new property on it, which may (and probably would) be marked enumerable.

As a result, while we are looping through the properties on the object we passed in, we want to restrict our copying to properties that were defined on the object itself, and not found on the prototype. JavaScript includes a method called hasOwnProperty on Object.prototype to check whether a property was defined on the object itself. Since object literals will always have Object.prototype as their prototype, you can use it in this manner.

The object we created in the example above looks like this:

prototype-chain-2.png

Native Object Orientation

At this point, it should be obvious that prototypes can be used to inherit functionality, much like traditional object oriented languages. To facilitate using it in this manner, JavaScript provides a new operator.

In order to facilitate object oriented programming, JavaScript allows you to use a Function object as a combination of a prototype to use for the new object and a constructor function to invoke:

var Person = function(firstName, lastName) {
  this.firstName = firstName;
  this.lastName = lastName;
}
 
Person.prototype = {
  toString: function() { return this.firstName + ' ' + this.lastName; }
}

Here, we have a single Function object that is both a constructor function and an object to use as the prototype of new objects. Let’s implement a function that will create new instances from this Person object:

function newObject(func) {
  // get an Array of all the arguments except the first one
  var args = Array.prototype.slice.call(arguments, 1);
 
  // create a new object with its prototype assigned to func.prototype
  var object = Object.create(func.prototype);
 
  // invoke the constructor, passing the new object as 'this'
  // and the rest of the arguments as the arguments
  func.apply(object, args);
 
  // return the new object
  return object;
}
 
var brendan = newObject(Person, "Brendan", "Eich");
brendan.toString() // "Brendan Eich"

The new operator in JavaScript essentially does this work, providing a syntax familiar to those comfortable with traditional object oriented languages:

var mark = new Person("Mark", "Miller");
mark.toString() // "Mark Miller"

In essence, a JavaScript “class” is just a Function object that serves as a constructor plus an attached prototype object. I mentioned before that earlier versions of JavaScript did not have Object.create. Since it is so useful, people often created something like it using the new operator:

var createObject = function (o) {
  // we only want the prototype part of the `new`
  // behavior, so make an empty constructor
  function F() {}
 
  // set the function's `prototype` property to the
  // object that we want the new object's prototype
  // to be.
  F.prototype = o;
 
  // use the `new` operator. We will get a new
  // object whose prototype is o, and we will
  // invoke the empty function, which does nothing.
  return new F();
};

I really love that ECMAScript 5 and newer versions have begun to expose internal aspects of the implementation, like allowing you to directly define non-enumerable properties or define objects directly using the prototype chain.

Understanding JavaScript Function Invocation and “this”

Over the years, I’ve seen a lot of confusion about JavaScript function invocation. In particular, a lot of people have complained that the semantics of `this` in function invocations is confusing.

In my opinion, a lot of this confusion is cleared up by understanding the core function invocation primitive, and then looking at all other ways of invoking a function as sugar on top of that primitive. In fact, this is exactly how the ECMAScript spec thinks about it. In some areas, this post is a simplification of the spec, but the basic idea is the same.

The Core Primitive

First, let’s look at the core function invocation primitive, a Function’s call method[1]. The call method is relatively straight forward.

  1. Make an argument list (argList) out of parameters 1 through the end
  2. The first parameter is thisValue
  3. Invoke the function with this set to thisValue and the argList as its argument list

For example:

function hello(thing) {
  console.log(this + " says hello " + thing);
}
 
hello.call("Yehuda", "world") //=> Yehuda says hello world

As you can see, we invoked the hello method with this set to "Yehuda" and a single argument "world". This is the core primitive of JavaScript function invocation. You can think of all other function calls as desugaring to this primitive. (to “desugar” is to take a convenient syntax and describe it in terms of a more basic core primitive).

[1] In the ES5 spec, the call method is described in terms of another, more low level primitive, but it’s a very thin wrapper on top of that primitive, so I’m simplifying a bit here. See the end of this post for more information.

Simple Function Invocation

Obviously, invoking functions with call all the time would be pretty annoying. JavaScript allows us to invoke functions directly using the parens syntax (hello("world"). When we do that, the invocation desugars:

function hello(thing) {
  console.log("Hello " + thing);
}
 
// this:
hello("world")
 
// desugars to:
hello.call(window, "world");

This behavior has changed in ECMAScript 5 only when using strict mode[2]:

// this:
hello("world")
 
// desugars to:
hello.call(undefined, "world");

The short version is: a function invocation like fn(...args) is the same as fn.call(window [ES5-strict: undefined], ...args).

Note that this is also true about functions declared inline: (function() {})() is the same as (function() {}).call(window [ES5-strict: undefined).

[2] Actually, I lied a bit. The ECMAScript 5 spec says that undefined is (almost) always passed, but that the function being called should change its thisValue to the global object when not in strict mode. This allows strict mode callers to avoid breaking existing non-strict-mode libraries.

Member Functions

The next very common way to invoke a method is as a member of an object (person.hello()). In this case, the invocation desugars:

var person = {
  name: "Brendan Eich",
  hello: function(thing) {
    console.log(this + " says hello " + thing);
  }
}
 
// this:
person.hello("world")
 
// desugars to this:
person.hello.call(person, "world");

Note that it doesn't matter how the hello method becomes attached to the object in this form. Remember that we previously defined hello as a standalone function. Let's see what happens if we attach is to the object dynamically:

function hello(thing) {
  console.log(this + " says hello " + thing);
}
 
person = { name: "Brendan Eich" }
person.hello = hello;
 
person.hello("world") // still desugars to person.hello.call(person, "world")
 
hello("world") // "[object DOMWindow]world"

Notice that the function doesn't have a persistent notion of its 'this'. It is always set at call time based upon the way it was invoked by its caller.

Using Function.prototype.bind

Because it can sometimes be convenient to have a reference to a function with a persistent this value, people have historically used a simple closure trick to convert a function into one with an unchanging this:

var person = {
  name: "Brendan Eich",
  hello: function(thing) {
    console.log(this.name + " says hello " + thing);
  }
}
 
var boundHello = function(thing) { return person.hello.call(person, thing); }
 
boundHello("world");

Even though our boundHello call still desugars to boundHello.call(window, "world"), we turn right around and use our primitive call method to change the this value back to what we want it to be.

We can make this trick general-purpose with a few tweaks:

var bind = function(func, thisValue) {
  return function() {
    return func.apply(thisValue, arguments);
  }
}
 
var boundHello = bind(person.hello, person);
boundHello("world") // "Brendan Eich says hello world"

In order to understand this, you just need two more pieces of information. First, arguments is an Array-like object that represents all of the arguments passed into a function. Second, the apply method works exactly like the call primitive, except that it takes an Array-like object instead of listing the arguments out one at a time.

Our bind method simply returns a new function. When it is invoked, our new function simply invokes the original function that was passed in, setting the original value as this. It also passes through the arguments.

Because this was a somewhat common idiom, ES5 introduced a new method bind on all Function objects that implements this behavior:

var boundHello = person.hello.bind(person);
boundHello("world") // "Brendan Eich says hello world"

This is most useful when you need a raw function to pass as a callback:

var person = {
  name: "Alex Russell",
  hello: function() { console.log(this.name + " says hello world"); }
}
 
$("#some-div").click(person.hello.bind(person));
 
// when the div is clicked, "Alex Russell says hello world" is printed

This is, of course, somewhat clunky, and TC39 (the committee that works on the next version(s) of ECMAScript) continues to work on a more elegant, still-backwards-compatible solution.

On jQuery

Because jQuery makes such heavy use of anonymous callback functions, it uses the call method internally to set the this value of those callbacks to a more useful value. For instance, instead of receiving window as this in all event handlers (as you would without special intervention), jQuery invokes call on the callback with the element that set up the event handler as its first parameter.

This is extremely useful, because the default value of this in anonymous callbacks is not particularly useful, but it can give beginners to JavaScript the impression that this is, in general a strange, often mutated concept that is hard to reason about.

If you understand the basic rules for converting a sugary function call into a desugared func.call(thisValue, ...args), you should be able to navigate the not so treacherous waters of the JavaScript this value.

this-table.png

PS: I Cheated

In several places, I simplified the reality a bit from the exact wording of the specification. Probably the most important cheat is the way I called func.call a "primitive". In reality, the spec has a primitive (internally referred to as [[Call]]) that both func.call and [obj.]func() use.

However, take a look at the definition of func.call:

  1. If IsCallable(func) is false, then throw a TypeError exception.
  2. Let argList be an empty List.
  3. If this method was called with more than one argument then in left to right order starting with arg1 append each argument as the last element of argList
  4. Return the result of calling the [[Call]] internal method of func, providing thisArg as the this value and argList as the list of arguments.

As you can see, this definition is essentially a very simple JavaScript language binding to the primitive [[Call]] operation.

If you look at the definition of invoking a function, the first seven steps set up thisValue and argList, and the last step is: "Return the result of calling the [[Call]] internal method on func, providing thisValue as the this value and providing the list argList as the argument values."

It's essentially identical wording, once the argList and thisValue have been determined.

I cheated a bit in calling call a primitive, but the meaning is essentially the same as had I pulled out the spec at the beginning of this article and quoted chapter and verse.

There are also some additional cases (most notably involving with) that I didn't cover here.

Archives

Categories

Meta