How to Optimize Drupal with Caching
In a previous blog post, we covered three simple things at the Admin level for optimizing Drupal websites. This blog takes a deeper look at ways to use caching for performance improvement.
As always, before digging into Drupal-specific fixes, remember that your website is part of a total webserver environment. It’s a good idea to take a step back first and determine whether your performance issues are an infrastructure problem that could be resolved by troubleshooting your server, bandwidth, or database.
Drupal and dynamic content
The way Drupal builds a web page affects your caching strategy, so it’s worthwhile taking a minute to walk through this.
Static pages deliver quickly because, apart from the file lookup, a webserver can send a static page to the browser immediately.
Fig. 1: Static web pages
Drupal’s default is dynamic content. The moment a user (human or app) clicks on a URL, Drupal begins pulling together all the building blocks of content that need to be rendered independently, then bundles them all together into a page for the browser. What this means is that instead of serving up static HTML pages, Drupal executes a series of tasks: loading up settings and modules, connecting to the database, initializing the user session, running the business logic in each module, and a myriad of little steps for every page request.
Fig. 2: With dynamic content, Drupal must process each page request
If static pages are at one end of the resource spectrum and dynamic content at the other, then the goal of caching is to reduce resource consumption by minimizing the work Drupal has to do to render pages.
Fortunately Drupal can cache pages and even parts of pages, so depending on the situation, you can deploy caching to reuse an entire page or part of a page. By default Drupal caches directly to the database. Therefore as more and more items get cached, you need to pay more attention to database interactions to improve performance. You may want Drupal to cache to an external system or backend.
NOTE: For more background and a step-by-step description of how Drupal builds a page of dynamic content, read the blog titled A High Level Look at How Drupal Works, especially the section ‘How Drupal Renders a Page’.
Formulate a Caching Strategy
Content can be cached in different ways, and there are tools designed especially for Drupal caching. However, one of the worst mistakes you can make is to jump into caching without first understanding how your pages and traffic behave.
If you have a lot of anonymous traffic, a good caching strategy can be as simple as caching entire pages. If you are serving a high volume of authenticated traffic with pages personalized by user (“Hello Jane Doe”) or by session (“Welcome back, Jane Doe”), you may need to get more granular and look at individual building blocks as well as which pages are visited most often.
You also need to understand how the various caching tools work so that you can take advantage of the features most relevant to your situation.
Page caching for anonymous users
If your website contains a lot of pages that might as well be static because user context (anonymous or authenticated) doesn’t matter, then you can apply simple page caching. Each time there is a page request, Drupal checks whether a copy of that page, or the data to build that page, already exists in the cache. If it does, Drupal can bypass all the work of pulling together a page.
Fig. 3: Cached copies speed up getting a page to the browser
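To make the lookup concrete, here is a minimal sketch of the idea in Python. It is not Drupal's actual implementation: the `PageCache` class, the `render_page` function, and the in-memory dictionary are all hypothetical stand-ins. It only shows the pattern of checking the cache first and doing the expensive build on a miss, for anonymous requests only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PageCache:
    """Toy page cache keyed by URL path; only anonymous requests use it."""
    render: Callable[[str, Optional[str]], str]  # the expensive page build
    store: Dict[str, str] = field(default_factory=dict)
    builds: int = 0  # how many times the full build actually ran

    def get_page(self, path: str, user: Optional[str] = None) -> str:
        # Anonymous request with a cached copy: skip the build entirely.
        if user is None and path in self.store:
            return self.store[path]
        self.builds += 1
        html = self.render(path, user)
        # Only anonymous pages are safe to reuse for the next visitor.
        if user is None:
            self.store[path] = html
        return html


def render_page(path: str, user: Optional[str]) -> str:
    """Hypothetical stand-in for everything Drupal does to build a page."""
    greeting = f"Hello {user}" if user else "Hello visitor"
    return f"<html><body>{greeting}, you requested {path}</body></html>"


cache = PageCache(render=render_page)
first = cache.get_page("/about")                      # miss: page is built
second = cache.get_page("/about")                     # hit: served from cache
personal = cache.get_page("/about", user="Jane Doe")  # bypasses the cache
```

In this sketch the second anonymous request never reaches `render_page`, while the personalized request always does.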
Drupal normal caching
Drupal has built-in caching, which is turned off by default. To enable it in Drupal 7, go to ‘admin/config/development/performance’. Once you enable page caching for anonymous users, Drupal stops regenerating each requested page and instead gives the next anonymous user the cached version.
Fig. 4: Enabling anonymous page caching in Drupal 7
Boost
The Boost module lets Drupal serve requests from static disk files. Whenever a page is accessed, Boost generates an HTML page and saves it to disk. On the next request, Drupal checks whether there’s a static HTML file it can serve up before trying to generate a new one. Boost includes a crawler that runs on cron to regenerate expired content in a timely manner, which keeps page loads fast.
Varnish
Varnish is a reverse proxy HTTP accelerator that is extremely popular in the Drupal world. Think of it as an external cache that sits in front of the webserver: it retrieves resources from the webserver, caches them in memory (saving to disk when needed), and serves up the cached pages. Because it’s dedicated to caching, it can deliver a cached page much faster than the webserver can generate one.
If it’s a first request for the page, Varnish passes it on to the web server, and caches a copy for next time. If the page is already cached in Varnish, the web server never has to deal with the request.
Fig. 5: A first-time page request
Fig. 6: Subsequent page requests
Partial caching
A Drupal page can consist of blocks, entities, listings, regions, and many other objects. In many situations, a page is mostly static content. Even so, if just one piece of content on the page is dynamic, every time that piece changes the cache clears and the entire page has to be regenerated.
Partial caching lets you regenerate just the parts that change. When it comes to partial caching, you really need to know how your pages are built in order to determine whether or not it’s worth the effort, so involve a developer. Some choices are:
Block caching: Enable block caching under ‘admin/config/development/performance’. See Figure 4 (above).
Entity Cache: In Drupal, entities are content. An entity can be a node, a user profile, a comment, or a taxonomy term. By storing a fully-loaded object into cache, Entity Cache lets Drupal bypass SQL queries and other resource-intensive work associated with loading up content. Drupal can just retrieve the whole object out of cache.
Views Content Cache: This is best described as content-aware caching for Views, the most popular module in Drupal. As a bonus, if you use Views to generate a block, it opens up more granular Drupal block caching: per page, per role, per role per page, per user, and per user per page, as well as global caching.
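One way to picture the granularity options (global, per page, per role, per user) is as different cache keys for the same block. The sketch below is only an illustration in Python, not how Drupal stores block caches; `block_cache_key`, `BlockCache`, and the sample blocks are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Context = Dict[str, str]  # e.g. {"page": "/news", "role": "editor", "user": "jane"}


def block_cache_key(block: str, vary_by: FrozenSet[str], context: Context) -> Tuple[str, ...]:
    """Build a cache key that includes only the dimensions the block varies by."""
    return (block, *(f"{dim}={context[dim]}" for dim in sorted(vary_by)))


class BlockCache:
    def __init__(self) -> None:
        self.store: Dict[Tuple[str, ...], str] = {}
        self.renders = 0  # how many times a block was actually rebuilt

    def get(self, block: str, vary_by: FrozenSet[str], context: Context,
            render: Callable[[Context], str]) -> str:
        key = block_cache_key(block, vary_by, context)
        if key not in self.store:
            self.renders += 1
            self.store[key] = render(context)
        return self.store[key]


PER_ROLE = frozenset({"role"})
PER_USER = frozenset({"user"})

jane = {"page": "/news", "role": "editor", "user": "jane"}
john = {"page": "/blog", "role": "editor", "user": "john"}

cache = BlockCache()
# A menu that depends only on role: both editors share one cached copy.
cache.get("menu", PER_ROLE, jane, lambda ctx: f"menu for {ctx['role']}")
cache.get("menu", PER_ROLE, john, lambda ctx: f"menu for {ctx['role']}")
renders_after_menu = cache.renders
# An account block that depends on the user: one cached copy per user.
cache.get("account", PER_USER, jane, lambda ctx: f"account of {ctx['user']}")
cache.get("account", PER_USER, john, lambda ctx: f"account of {ctx['user']}")
```

The coarser the key, the more often a cached copy can be reused; the finer the key, the more copies you store and the more often a block has to be rebuilt.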
Page caching for authenticated users
Although you can apply page caching for authenticated users via a single module, Authcache, you still need to implement this caching mechanism with great care. Work with a developer to implement this one. You’ll need a solid understanding of how your pages are built.
This is because the concept behind authenticated user caching is an automated form of partial caching. It involves identifying parts of a page that change for each user (the customized parts) and providing a placeholder for them. When the page gets cached, the caching mechanism ignores (or has no effect on) the placeholder (customized) content. This means for each page you want to cache, you need to identify each element that will be customized and set an authcache policy for that element.
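The placeholder idea can be sketched in a few lines of Python. This is an illustration of the concept only and does not use Authcache's real API: `PlaceholderPageCache` and the `$greeting` placeholder are hypothetical. The shared page shell is built and cached once, and only the personalized fragment is filled in per request.

```python
from string import Template
from typing import Dict


class PlaceholderPageCache:
    """Caches a page shell once and fills in the per-user part on every request."""

    def __init__(self) -> None:
        self.shells: Dict[str, Template] = {}
        self.shell_builds = 0  # how many times the expensive shell was built

    def _build_shell(self, path: str) -> Template:
        # Stand-in for the expensive page build; $greeting marks the
        # customized part that must NOT be stored in the cache.
        self.shell_builds += 1
        return Template(
            f"<h1>{path}</h1>"
            "<p>$greeting</p>"
            "<p>Article body shared by every user</p>"
        )

    def serve(self, path: str, user: str) -> str:
        if path not in self.shells:
            self.shells[path] = self._build_shell(path)
        # Cheap per-request step: substitute the personalized fragment.
        return self.shells[path].substitute(greeting=f"Welcome back, {user}")


cache = PlaceholderPageCache()
jane_page = cache.serve("/news", "Jane Doe")
john_page = cache.serve("/news", "John Roe")
```

Both users get their own greeting, but the shell is built only once. The hard part in practice is the step this sketch takes for granted: knowing which elements of each page are personalized.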
Authcache works with whatever caching mechanism is already in place, such as Varnish or Boost.
Other caching mechanisms
APC: Alternative PHP Cache is useful because Drupal loads a huge number of PHP files on every page load. Reading and compiling these for each page request uses up CPU and disk I/O resources. For files that don’t change between page loads, APC keeps the compiled code in RAM to speed up retrieval. It is most useful when the web server spends a lot of time executing PHP code.
Memcache: Caches database objects in memory. Use when your database server is working too hard because of too many connections and long query load times.
Redis: A “key-value store” that boosts database performance. Redis runs entirely in-memory for fast access. Drupal professionals with more complex requirements have been switching from Memcache to Redis because it supports richer data structures such as hashes, lists, and sets.
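The common thread with in-memory object caches is that a loaded object is kept under a key so the next request can skip the database query. Here is a rough Python sketch of that pattern, under loud assumptions: a plain dictionary stands in for Memcache or Redis, an in-memory SQLite table stands in for Drupal's database, and `load_node` and `save_node` are hypothetical helpers rather than Drupal functions.

```python
import sqlite3
from typing import Dict, Optional, Tuple

# Stand-in database with a single "node" table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO node (nid, title) VALUES (1, 'Caching in Drupal')")

object_cache: Dict[str, Tuple[int, str]] = {}  # stand-in for Memcache/Redis
queries_run = 0  # how many times we actually hit the database


def load_node(nid: int) -> Optional[Tuple[int, str]]:
    """Return the node from the cache if present, otherwise query and cache it."""
    global queries_run
    key = f"node:{nid}"
    if key in object_cache:
        return object_cache[key]
    queries_run += 1
    row = db.execute("SELECT nid, title FROM node WHERE nid = ?", (nid,)).fetchone()
    if row is not None:
        object_cache[key] = row
    return row


def save_node(nid: int, title: str) -> None:
    """Update the node and drop its cached copy so readers never see stale data."""
    db.execute("UPDATE node SET title = ? WHERE nid = ?", (title, nid))
    object_cache.pop(f"node:{nid}", None)


first_load = load_node(1)    # miss: runs the query
second_load = load_node(1)   # hit: no query
queries_before_edit = queries_run
save_node(1, "Caching in Drupal, revised")
after_edit = load_node(1)    # miss again because the edit invalidated the entry
missing = load_node(99)      # no such row, nothing is cached
```

Note that the edit invalidates the cached entry, which mirrors the way a node edit clears the relevant cache.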
Clear your cache, but not too often
When Drupal caches, it creates copies of data. Over time the copies can get out of sync with the live data. By clearing the cache, you start again with fresh data. ‘Clear all cache’ is something Drupal administrators run using cron.php.
Remember, however, that after you clear the cache, pages need to be generated again from scratch, and that takes time. A ‘warm cache’ delivers the fastest response time. For example, if you clear your cache every hour, then every hour you’re back to a cold cache, which undercuts the benefits of caching: the first time a user requests a given page that hour, that page has to regenerate. Remember too that the cache clears during node (page) edits and when comments are posted, so you may not need to clear your cache as often as you think. Start by clearing cache once a day and monitor. Most small-to-medium business websites don’t need to do this more than once every 6-8 hours.
Two other settings relevant to caching are:
Minimum cache lifetime applies not only to cached pages but to all cached objects. If you set this to 5 minutes, for example, you’re telling Drupal that if anything new gets created within this 5-minute interval, it’s OK if users don’t see it. It also means that any cache-clearing or update action can run only after the cached content is at least 5 minutes old. If your site does not get heavy traffic, just leave this value as ‘none.’
Expiration of cached pages lets you configure settings that work with external caching systems. If you set the ‘max-age’ value to 30 minutes, for example, the external caching system will serve up the page from its own cache for 30 minutes before checking back with the Drupal server to see whether the content on that page has changed.
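The ‘max-age’ behavior can be sketched as a small Python class. This is a simplified model, not Varnish or any real proxy: `ExternalCache`, the fake clock, and the `origin` callback are all hypothetical, and on expiry the sketch simply refetches the page rather than asking the server whether it has changed.

```python
from typing import Callable, Dict, Tuple


class ExternalCache:
    """Toy external cache that reuses a page until it is older than max_age seconds."""

    def __init__(self, origin: Callable[[str], str], max_age: float,
                 clock: Callable[[], float]) -> None:
        self.origin = origin    # stand-in for a request to the Drupal server
        self.max_age = max_age
        self.clock = clock      # injectable so the example needs no real waiting
        self.entries: Dict[str, Tuple[float, str]] = {}
        self.origin_hits = 0

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self.entries.get(path)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]    # still fresh: the origin is not contacted
        self.origin_hits += 1
        body = self.origin(path)
        self.entries[path] = (now, body)
        return body


now = [0.0]  # fake clock, in seconds
cache = ExternalCache(
    origin=lambda path: f"page {path} built at minute {now[0] / 60:.0f}",
    max_age=30 * 60,
    clock=lambda: now[0],
)

at_0 = cache.get("/news")    # first request goes to the origin
now[0] = 29 * 60
at_29 = cache.get("/news")   # within max-age: served from the external cache
now[0] = 31 * 60
at_31 = cache.get("/news")   # past max-age: goes back to the origin
```

With a 30-minute max-age, the request at minute 29 still returns the copy built at minute 0, which is exactly the staleness you are agreeing to when you choose the value.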
There are no hard-and-fast answers to caching. There are as many tools and tricks as there are issues for every business website. For any performance problem, the best approach is to monitor and analyze before leaping in. Is there a particular page or a View that’s the culprit? Is the entire website sluggish? Are any queries taking an unusually long time? As Drupal.org says, maximizing performance for Drupal requires understanding all the spokes in the wheel that makes it run.