How To Make GoogleBot Crawl Your Backbone.js Website

If you create single-page applications with Backbone.js or similar, you will end up with websites that require JavaScript. This is not a big problem for regular user agents, but Google will not be able to see your content. We solve this issue by using PhantomJS.

One of our websites, Godt.no, is built as a single-page application with Backbone.js as MVC and Router. It talks to PHP backends which are “reduced” to APIs. The front-end is built and compressed using RequireJS + Compass, and Capistrano ensures a correct deployment every time.

Rendering is done entirely client-side with Underscore templates. The backend server creates only a skeleton index-page which is used for including stylesheets and initializing our Backbone.Router instance.

The website is built by our talented teams of developers in Kraków, through Schibsted Tech Polska, a sister company in Schibsted Media Group.

Motivation for client-side rendering

There are several benefits to building websites like this. It results in extremely fast navigation (or at least navigation that is perceived as such), as we can start preparing data server-side asynchronously and start updating the front-end immediately. In other words, we give the user visual feedback instantly when performing an action, even if the action requires some time to complete server-side. This is what gives us the top position on New Relic’s App Speed Index list for CMS pages.

Another benefit is that it’s much easier to compose front-end components according to the user agent’s capabilities. We can detect the available viewport and load just the amount of content we can display, which greatly reduces the amount of data transferred between server and client.

The single-page approach has one major problem. GoogleBot does not support JavaScript and is thus not capable of seeing anything but a blank index-page with no content. The most common solution is to set up server-side rendering, either with separate server-side templates or by sharing templates through a template engine like Mustache. Both approaches increase not only the cost of initial development but also, and most importantly, maintenance costs.

Use JavaScript to render JavaScript templates!

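Instead of letting PHP know anything about how to render, we have set up a solution that combines PhantomJS, Varnish and GoogleBot’s escaped fragment. All navigation uses the hashbang (#!), which tells Google that we support the escaped fragment: for every URL containing #!, the crawler requests the same URL with the fragment moved into an _escaped_fragment_ query parameter, since the fragment itself is never sent to the server.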

We let Varnish detect the requests that use escaped fragments through a short piece of VCL.
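Such a rule can be very small. What follows is a minimal sketch, not necessarily the exact rule used in production. It assumes Varnish 3 syntax and a backend named vg_render_01 that is declared elsewhere in the VCL (its host and port are not shown):

```vcl
sub vcl_recv {
    # Crawler requests carry the fragment in an _escaped_fragment_ query
    # parameter, so matching on the URL is enough to spot them.
    if (req.url ~ "_escaped_fragment_") {
        # Varnish 3 syntax; Varnish 4 and later use req.backend_hint instead.
        set req.backend = vg_render_01;
    }
}
```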

This will basically send all requests containing “escaped_fragment” to our “vg_render_01” backend, which is a PhantomJS server. It will receive the request, construct the URL with a hashbang instead of _escaped_fragment_ and start a PhantomJS process to render the HTML content.
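The URL rewrite in that step can be sketched in a few lines of JavaScript. The function name toHashbangUrl and the example.com URLs are made up for illustration. The sketch assumes that _escaped_fragment_ is the last query parameter and that its value is percent-encoded, which is how Google's AJAX crawling scheme describes it.

```javascript
// Convert a crawler URL that uses _escaped_fragment_ back into its hashbang form.
// Hypothetical helper for illustration; not taken from the production code.
function toHashbangUrl(requestUrl) {
  var marker = "_escaped_fragment_=";
  var index = requestUrl.indexOf(marker);
  if (index === -1) {
    return requestUrl; // not a crawler request; leave it untouched
  }
  // Everything before the marker, minus the "?" or "&" that introduced it.
  var base = requestUrl.slice(0, index - 1);
  // The crawler percent-encodes the fragment, so decode it again.
  var fragment = decodeURIComponent(requestUrl.slice(index + marker.length));
  return base + "#!" + fragment;
}
```

For example, toHashbangUrl("http://example.com/?_escaped_fragment_=/recipes/42") returns "http://example.com/#!/recipes/42", and any other query parameters in front of the marker are kept as they are.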

Our PhantomJS solution is roughly 300 lines of code, including verbose logging and error handling. It will receive the request and render it asynchronously, which makes it capable of handling a lot of traffic. A typical full round-trip request takes about 150 ms.
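The core of such a renderer is small. The following is a simplified sketch rather than the production code: it is a one-shot script instead of a server, the names renderPage and render.js are hypothetical, and the fixed settle delay is an assumption about how to wait for client-side rendering to finish. It uses PhantomJS's webpage module (create, open, content, close) and phantom.exit.

```javascript
// Render one URL with PhantomJS and hand the resulting HTML to a callback.
// `webpage` is PhantomJS's "webpage" module; it is passed in so the function
// can also be exercised with a stub outside PhantomJS.
function renderPage(webpage, url, settleMs, callback) {
  var page = webpage.create();
  page.open(url, function (status) {
    if (status !== "success") {
      page.close();
      callback(new Error("Failed to load " + url));
      return;
    }
    // Assumption: give the client-side templates a fixed amount of time to
    // finish rendering before taking the snapshot.
    setTimeout(function () {
      var html = page.content; // serialized DOM after JavaScript has run
      page.close();
      callback(null, html);
    }, settleMs);
  });
}

// Only run as a script when executed by PhantomJS itself, for example:
//   phantomjs render.js "http://example.com/#!/some/page"
if (typeof phantom !== "undefined") {
  var system = require("system");
  renderPage(require("webpage"), system.args[1], 200, function (err, html) {
    if (err) {
      console.error(err.message);
      phantom.exit(1);
    } else {
      console.log(html);
      phantom.exit(0);
    }
  });
}
```

Because page.open is callback-based, nothing blocks while a page loads, which is what makes it possible to handle many render requests asynchronously.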

After PhantomJS has finished rendering, the HTML content is sent back to Varnish. Since the URL will be unique for this content we allow Varnish to cache the content.

GoogleBot has now received the exact same content that any other JavaScript-enabled user agent would produce, and we have achieved this with only a few lines of code. We do not need to maintain two sets of templates, and GoogleBot will always see new content.
