Last month Google added website speed to its site ranking algorithm, as reported in It’s Official: Google Now Counts Site Speed As A Ranking Factor.

According to the article there are two primary ways Google will measure page speed:

  1. How a page responds to Googlebot
  2. Load time as measured by the Google Toolbar

A slow site can translate into a poor user experience and a lower conversion rate -- which in itself can lose you money. But this new development can cause you further harm by lowering your Google search ranking, as was recently reported by this user: Google Rankings Drop! Where to Check Site Response Time?

I noticed some of my sites drop significantly for many of the keywords that I rank for in Google. I rank in the top 10 for several keywords (on several different sites) and my traffic came to a screeching halt today. My rankings are down to the 7th or 8th page or lower!

After doing some digging around, I decided to login into my Google Webmaster tools. Sure enough I found a warning next the sitemaps for all my affected sites.
The warning was: "Some URLs in the Sitemap have a high response time.
This may indicate a problem with your server or with the content of the page."

Understanding why Web Speed Matters

The rationale behind this move by Google is fairly straightforward:

Slow web sites lead to a poor user experience, and therefore should not appear at the top of the search list even if they contain relevant content.

This is not something Google made up -- the effects of slow speed, or latency, are well documented.

Matt McGee quotes interesting statistics on the impact of latency on traffic behavior:

[Image: Google web speed statistics]

Amazon found that every 100ms of latency cost them 1% in sales. Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%. James Hamilton (Amazon) provides more references in his post The Cost of Latency.

How to Improve Web Speed

What makes web-speed improvement difficult is that it involves multiple layers, some of which you don’t really control. It depends on the browser and how fast it processes CSS, JavaScript, and so on, as well as on the size of your pages and images, where your site is physically located, and the actual server-side architecture.

There is plenty of information on how to achieve better web speed, from your browser to your server. Steve Souders' book High Performance Web Sites lists 14 rules for performance optimization:

  • Make Fewer HTTP Requests
  • Use a Content Delivery Network
  • Add an Expires Header
  • Gzip Components
  • Put Stylesheets at the Top
  • Put Scripts at the Bottom
  • Avoid CSS Expressions
  • Make JavaScript and CSS External
  • Reduce DNS Lookups
  • Minify JavaScript
  • Avoid Redirects
  • Remove Duplicate Scripts
  • Configure ETags
  • Make Ajax Cacheable
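
Two of these rules, "Add an Expires Header" and "Gzip Components", are easy to illustrate in a few lines. The following is only a minimal sketch using the Python standard library; the function names `far_future_headers` and `gzip_body` and the 30-day lifetime are assumptions made for illustration, not something taken from the book. In practice your web server or CDN would normally set these headers for you.

```python
import gzip
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_headers(days=365, now=None):
    """Build caching headers for a static asset (rule: Add an Expires Header)."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=days)
    return {
        # HTTP dates must be expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"public, max-age={days * 24 * 3600}",
    }


def gzip_body(body: bytes):
    """Compress a response body (rule: Gzip Components)."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return compressed, headers


# Repetitive markup, typical of generated HTML, compresses very well.
page = b"<li class='item'>product</li>\n" * 500
compressed, gzip_headers = gzip_body(page)
headers = {**far_future_headers(days=30), **gzip_headers}
```

The compressed body is what goes over the wire; the browser decompresses it transparently when it sees the `Content-Encoding: gzip` header.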

Web Speed at Large Scale 

Steve's 14 rules apply to any site of any size. However, large-scale sites require additional special treatment. Large-scale web sites depend more heavily on the architecture. Non-scalable architecture can lead to devastating results under load. In this section I will discuss how to control web speed at large scale from an application architecture perspective.

Brief History

In Dec 2007 I summarized the main lessons from Google, Amazon, and LinkedIn large-scale web site architecture in this post. On December 8th, 2008, I wrote a response to Todd Hoff's post on highscalability.com, Latency is Everywhere and it Costs You Sales - How to Crush it - My Take. In those articles I tried to provide architecture guidelines on how to control latency in large-scale environments. Most of those lessons still hold true today. In this post I want to update some of those lessons based on recent experience with social networking and the emergence of NoSQL alternatives.

The Emergence of Read/Write Web

During the past two years, social networking has significantly changed the web experience. Today’s web sites deal with viral traffic behavior, as can be seen in the Twitter traffic sample below. In addition, most of the content on these sites is now written by external users rather than by the site owner.

[Image: Twitter traffic sample]

These differences in behavior led to a demand for read/write scaling as opposed to read-mostly scaling to deal with continuous scaling demands. This later led to the emergence of the NoSQL alternatives which started to pick up during the first quarter of 2010.

Last week I had the honor of a visit by Cees de Groot and members of his team from Marktplaats/eBay. Cees has designed large-scale web applications for years, specifically in the eCommerce area. He was recently in charge of the design of an Adword Service (see reference here) and is now working on moving their entire site from a database-centric PHP architecture to a scale-out architecture in Java.

Here are some of the main takeaways from our discussions on how to improve web performance in this new age of read/write scaling demand:

  • Improve Data Speed and Scaling

Data access is probably the most notable area of contention in many sites. In a large-scale system, data contention means that concurrent user access to the same table or data item is serialized due to locking. This makes one user request dependent on another, and therefore, as the number of concurrent users grows, contention has an increasingly significant impact on latency.

In many e-commerce sites, product inventory, product catalog, and user profiles tend to be typical areas where this sort of contention happens. In online gaming sites this would apply to the user profile and also to the gaming table. The important thing to note is that not all of our data is exposed to this level of contention. Understanding where the contention happens is the first step in solving the problem.

There are various options to reduce the contention points, depending on the access pattern:

Read Mostly

In a read-mostly scenario, many users try to fetch the same content at the same time. Only a few users actually update the content, and even fewer share the content and try to update it concurrently. A large portion of large-scale web sites use memcache today to handle their read-mostly scaling scenarios. Memcache is extremely simple and exposes a key/value store API. At the same time, memcache is fairly limited: it doesn't provide consistency or high availability, and therefore cannot be used as a system of record. This means that update operations still need to go through the database, making memcache suitable for read-mostly scenarios but not for write-scaling scenarios.
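
To make the read-mostly pattern concrete, here is a minimal sketch of how a cache typically sits in front of the database. It is an illustration only: plain dictionaries stand in for both the memcache client and the database, and the class name `ReadThroughCache` is hypothetical. Note how every update still has to go through the database, which is exactly why this setup doesn't help with write scaling.

```python
class ReadThroughCache:
    """Serve reads from a cache and fall back to the database on a miss."""

    def __init__(self, database):
        self.database = database  # system of record (stand-in for MySQL)
        self.cache = {}           # stand-in for a memcache client
        self.db_reads = 0         # counts how often a read hits the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: the database is not touched
        self.db_reads += 1
        value = self.database[key]      # cache miss: read from the database
        self.cache[key] = value
        return value

    def update(self, key, value):
        # Writes always go to the database; the cached copy is invalidated
        # so that the next read picks up the new value.
        self.database[key] = value
        self.cache.pop(key, None)


catalog = ReadThroughCache({"p1": "Blue kettle"})
for _ in range(1000):
    catalog.get("p1")   # only the first of these reads reaches the database
```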

Write-Intensive with High Latency

A write-intensive scenario means that the insert/update rate reaches fairly high levels (compared with read-mostly scenarios). It doesn't necessarily mean that the write rate is higher than the read rate, but rather that it's high enough to hit the limit of the database. Many social networking sites fall under this category, as most of the site content is driven by users and not by the site owner.

NoSQL alternatives such as Cassandra can manage write scalability, but in most cases at the expense of consistency. With NoSQL alternatives, getting consistency between write/read comes at the expense of read latency (to ensure read consistency you need to read multiple copies of the same data from all the replicas). Furthermore, write latency is still bound to disk. So with NoSQL we can remove the scaling overhead of read/write but we don't come close to the latency that a memory-based solution such as memcache provides. A good reference to the type of performance that you could expect from some of the file-based NoSQL alternatives is provided here and here.
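
The trade-off between read consistency and read latency can be shown with a toy model. The sketch below is not how Cassandra or any other NoSQL store is implemented; `ReplicatedStore` is a hypothetical in-memory stand-in in which a write is acknowledged by only some replicas, so a read that consults a single replica may return stale data, while a read that consults all of them always sees the newest version at the cost of more round trips.

```python
class ReplicatedStore:
    """Toy model of a value replicated across several nodes."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.version = 0

    def write(self, key, value, acks=1):
        """Write to the first `acks` replicas only; the others lag behind."""
        self.version += 1
        for replica in self.replicas[:acks]:
            replica[key] = (self.version, value)

    def read(self, key, replica_ids):
        """Read from the given replicas and return the newest value seen."""
        copies = [self.replicas[i][key] for i in replica_ids
                  if key in self.replicas[i]]
        if not copies:
            return None
        # Tuples compare by version first, so max() picks the latest write.
        return max(copies)[1]


store = ReplicatedStore(replica_count=3)
store.write("bio", "v1", acks=3)   # every replica has v1
store.write("bio", "v2", acks=1)   # only replica 0 has v2 so far

stale = store.read("bio", replica_ids=[2])        # fast, but may be stale
fresh = store.read("bio", replica_ids=[0, 1, 2])  # consistent, but slower
```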

Read/Write-Intensive with Low latency

In this scenario it is not enough to manage the scaling of our write/read operation -- we need to be able to reduce the time it takes to perform the actual read/write operation. A good example is Twitter. With Twitter, read latency provided by NoSQL alternatives could be too slow to meet overall performance goals.

The solution would be very similar to the one provided by file-based NoSQL alternatives, except that it would be based entirely on memory.

The emergence of large memory devices such as Cisco UCS makes it possible to store terabytes of data purely in-memory (see Memory is the New Disk). Unlike memcache, an in-memory data grid such as the one provided by GigaSpaces turns memory devices into a transactional data store that can act as the system of record. This makes them suitable for handling both read and write scaling at extremely low latency.

You can read a more detailed description on how an in-memory data grid can be used for read/write scaling of existing databases in my Scaling Out MySQL post.

How Can I Manage Read/Write Scaling if I'm Already Using Memcache?

Write scaling pressure seems to be pushing sites like Twitter, Digg and others toward NoSQL as a replacement for memcache + MySQL, as noted in this article: MYSQL AND MEMCACHED: END OF AN ERA?

Having said that, many of these sites are already heavily invested in memcache, so such a transition implies fairly significant rewrites.

One way to avoid this rewrite exercise would be to turn memcache into a transactional data store, just like its close relatives, the in-memory data grids.

Because memcache is basically just a client/server protocol, we can easily add memcache support to an existing data-grid. In this way we can use memcache as a system of record that can manage both read & write scaling. As a matter of fact, we're just about to announce our first memcache support for GigaSpaces for this exact purpose. Other data-grid providers are expected to announce their support for memcache as well.

  • Improve Dynamic Page Load Time

Web page content is derived from many sources. Calling those services asynchronously makes it possible to parallelize page rendering, significantly reducing the time it takes to build the entire page. The diagrams below contrast sequential and parallel page part fetching:


[Image: Sequential Page Part Fetching]

[Image: Parallel Page Part Fetching]

While all this makes sense, writing the actual code to do this sort of parallel fetching might not be as trivial as it seems. One of the things that can make this work much simpler is the use of a Future. The following snippet taken from GigaSpaces documentation illustrates what this API looks like:

[Image: code snippet from the GigaSpaces documentation]

In the Sync mode you call an executor to fetch content from a remote service and then use the Future handle to poll for the result of that call at some later point. Even though the caller invokes the execute method sequentially, it is not blocked while each call runs, and therefore the actual execution happens in parallel.

In the A-Sync mode we do not poll for the result; instead we get a callback with the actual results. This could be an ideal way to combine async execution with Ajax, where the callback method could be used to update our page asynchronously after the user has already loaded the page, giving the user a fairly low-latency experience.
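
The same two modes can be sketched with a generic futures API. The example below uses Python's standard `concurrent.futures` module rather than the GigaSpaces API; `fetch_part` is a hypothetical stand-in for a remote service call, and the 50 ms delay is an assumption that simulates network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_part(name, delay=0.05):
    """Hypothetical remote call that returns one part of a page."""
    time.sleep(delay)  # simulate network latency
    return f"<div>{name}</div>"


def render_page(parts, executor):
    """Sync mode: submit every call, then collect results from the futures."""
    # submit() returns immediately with a Future, so the calls overlap.
    futures = [executor.submit(fetch_part, name) for name in parts]
    # result() blocks only until that particular part is ready.
    return "".join(future.result() for future in futures)


def render_page_async(parts, executor, on_part_ready):
    """Async mode: no polling, each result is delivered to a callback."""
    futures = []
    for name in parts:
        future = executor.submit(fetch_part, name)
        future.add_done_callback(lambda done: on_part_ready(done.result()))
        futures.append(future)
    return futures


with ThreadPoolExecutor(max_workers=4) as pool:
    page = render_page(["header", "catalog", "ads"], pool)

    received = []  # filled in by the callback, in completion order
    render_page_async(["header", "ads"], pool, received.append)
# Leaving the with-block waits for all outstanding calls to finish.
```

With three parts of 50 ms each, the sync version takes roughly as long as the slowest part rather than the sum of all three. In the async version the callback order depends on which call completes first.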

  • Use On-Demand Scaling to Ensure Latency Under Load

Viral traffic behavior means that we can’t predict the load. Or to be more precise, we need to be ready to change our site sizing more frequently than we were previously accustomed to. The current practice of over-provisioning based on the busiest hour of the busiest day doesn't hold up anymore: 1) you end up with a huge investment to meet the peak load traffic, and 2) provisioning for peak load leads to low average utilization during regular hours. To be able to handle this type of traffic behavior we need to design our site for on-demand scaling. On-demand scaling involves the following steps:

Monitor the current traffic

If latency grows beyond a certain threshold, add another web container and update the load balancer with the IP address of that container. You can read more on how you can take an existing Java web application and add dynamic scaling without changing a single line of code here.
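
The monitoring rule above can be sketched as a small decision function. Everything in this example is an assumption made for illustration: the `LoadBalancer` class, the `start_container` callable, and the 200 ms threshold are hypothetical and do not correspond to any specific product API.

```python
from statistics import mean


class LoadBalancer:
    """Hypothetical load balancer that only tracks container addresses."""

    def __init__(self, addresses):
        self.addresses = list(addresses)

    def register(self, address):
        self.addresses.append(address)


def scale_if_needed(recent_latencies_ms, balancer, start_container,
                    threshold_ms=200.0):
    """Add one web container when average latency exceeds the threshold.

    `start_container` is a callable that launches a container and returns
    its IP address. Returns True if a container was added.
    """
    if not recent_latencies_ms or mean(recent_latencies_ms) <= threshold_ms:
        return False
    balancer.register(start_container())
    return True


balancer = LoadBalancer(["10.0.0.1"])
# Average latency of 275 ms is above the threshold, so a container is added.
scaled = scale_if_needed([250, 300], balancer, lambda: "10.0.0.2")
```

A real deployment would run this check periodically and would also scale back down when latency drops, which is omitted here to keep the sketch short.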

Control your user traffic

Whether or not you managed to add dynamic scaling, you are always going to be bound by the amount of physical resources you currently have. One of the worst things that could happen is that your site crashes as a result of an unpredictable peak load. If the Google search bot happens to visit your site while it is down, Google will “punish” your site severely and remove it completely from search results for quite some time (this is one of the SEO experts' worst nightmares; I've witnessed this on several occasions, as my wife is becoming an SEO expert herself). It is therefore considered a best practice to control your user traffic and put a limit on how users can access your site. For example, Twitter limits the size of your tweets and the number of tweets each user can post in an hour. This is also a good practice to protect your site from malicious attacks. In other words, it is better to deny service to some of your users than to lose them all.
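
A per-user hourly limit of the kind described above can be sketched as a sliding-window counter. This is a minimal single-process illustration, not how Twitter implements its limits; the class name `HourlyRateLimiter` and the injectable `clock` parameter are assumptions, the latter added only so the behavior can be demonstrated without waiting an hour.

```python
import time
from collections import defaultdict, deque


class HourlyRateLimiter:
    """Allow each user at most `max_per_hour` actions in any 60-minute window."""

    WINDOW_SECONDS = 3600

    def __init__(self, max_per_hour, clock=time.monotonic):
        self.max_per_hour = max_per_hour
        self.clock = clock
        self.history = defaultdict(deque)  # user id -> timestamps of actions

    def allow(self, user_id):
        now = self.clock()
        timestamps = self.history[user_id]
        # Forget actions that have fallen out of the one-hour window.
        while timestamps and now - timestamps[0] >= self.WINDOW_SECONDS:
            timestamps.popleft()
        if len(timestamps) >= self.max_per_hour:
            return False  # over the limit: reject this request
        timestamps.append(now)
        return True


fake_time = [0.0]
limiter = HourlyRateLimiter(max_per_hour=2, clock=lambda: fake_time[0])
decisions = [limiter.allow("alice") for _ in range(3)]  # third one is rejected
```

Rejected requests cost almost nothing to serve, which is what keeps the rest of the site responsive for users who stay within the limit.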

Final Notes

In brick and mortar stores we've known for a long time that slow customer service will turn customers away. The retail industry has put a lot of effort into improving customer service and reducing the time customers spend waiting in queues or not being answered. Not surprisingly, we're seeing analogous developments with web site traffic. Slow sites turn customers away. Various measurements, such as the ones presented by Google, show a direct correlation between web site latency and user traffic behavior. Lower user traffic translates immediately to fewer purchases on your site and therefore loss of potential revenue.

Google made the right move by adding web speed to the search engine ranking. For users, this will help ensure that we get better service from sites that want to be at the top of the search list. For site owners, this makes site performance a much higher priority. For us techies, this is also good news. It will make it easier to justify a better architecture, as we can easily measure how our work translates to real business value.


