With increasing amounts of data and server-side processing, caching systems are as important as ever. There are so many varieties of these systems that it can be confusing if you don't understand the nuances between them. Worse yet, if you pick the wrong caching system for your needs or fail to tune it correctly, the results could be disastrous.
There are two broad types of cache when it comes to web sites and web applications: client side and server side.
Client side caching is what your web browser does. It saves the HTML and other content of a web page on your hard disk. This saves you from downloading the same resources from the server over and over again.
Server side caching is when the web server or web application saves these resources. This prevents the server from having to go through the processing required to build these documents. The rest of this article will be about different server side caching tools.
My goal in this article is to give you a basic overview of caching systems for web-based applications, with a particular focus on PHP and Drupal. The three systems I'll cover are APC, Memcached, and Varnish. I'll also talk about Drupal's application cache and how that can tie in with APC or Memcached. Let's dive in.
Alternative PHP Cache (APC)
APC provides two caching mechanisms.
- APC caches the opcode generated during the PHP execution cycle and then later will be able to avoid loading and parsing PHP files repeatedly.
- APC also provides a key-value, object cache for PHP applications in shared memory.
The opcode cache can speed up PHP applications by as much as 3 times. We find it particularly helpful in speeding up Drupal. That's because Drupal code is distributed over many different modules and consequently many different files. APC cuts down on the disk I/O by caching the bytecode from the PHP compiler. The next time these files are needed, they won't have to be loaded from the disk or parsed by the PHP compiler. This is great but an important thing to keep in mind is that APC's opcode cache only caches the PHP bytecode. It still has to execute that PHP, which will run database queries and build the HTML output.
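To make that distinction concrete, here is a small Python analogy of what an opcode cache does. It is only a sketch, not how APC is implemented, and the `OpcodeCache` class and its method names are made up for illustration. The source file is read and compiled once, the compiled bytecode is reused on later requests, and the code itself still runs every time.

```python
import os
import tempfile


class OpcodeCache:
    """Toy bytecode cache keyed by file path (hypothetical, for illustration)."""

    def __init__(self):
        self._cache = {}        # path -> (mtime, compiled code object)
        self.compile_count = 0  # how many times we had to read and compile

    def load(self, path):
        # Check the file's modification time so edits invalidate the entry.
        mtime = os.path.getmtime(path)
        entry = self._cache.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]     # cache hit: skip reading and compiling
        with open(path, "r", encoding="utf-8") as handle:
            source = handle.read()
        code = compile(source, path, "exec")
        self.compile_count += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path):
        # Execution is never skipped; only the read-and-compile step is cached.
        namespace = {}
        exec(self.load(path), namespace)
        return namespace


def demo():
    cache = OpcodeCache()
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "page.py")
        with open(script, "w", encoding="utf-8") as handle:
            handle.write("result = sum(range(10))\n")
        results = [cache.run(script)["result"] for _ in range(3)]
    return results, cache.compile_count
```

Calling `demo()` runs the script three times but compiles it only once, which is the saving an opcode cache gives you. Anything the script does at run time, such as database queries, would still happen on every call.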
The object cache simply allows your application to cache arbitrary pieces of data in shared memory and retrieve them later. For example, if you have to query some tables to build a data object, you can cache the data object with APC and fetch it later by a key that you define when you store it. You can then keep retrieving it from shared memory until it expires, at which point you rebuild the data object and cache it again.
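The store, fetch, expire, and rebuild cycle described above can be sketched in a few lines of Python. This is a minimal illustration that uses a plain dictionary in place of shared memory; `ObjectCache`, `get_or_build`, and the injectable `clock` parameter are assumptions made for the example, not part of APC.

```python
import time


class ObjectCache:
    """Toy key-value cache with per-entry expiry (stand-in for a real store)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}     # key -> (expires_at, value)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def store(self, key, value, ttl):
        self._store[key] = (self._clock() + ttl, value)

    def fetch(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value


def get_or_build(cache, key, ttl, build):
    """Return the cached value, or build it, cache it, and return it."""
    value = cache.fetch(key)
    if value is None:
        value = build()  # the expensive part, e.g. querying several tables
        cache.store(key, value, ttl)
    return value
```

A call such as `get_or_build(cache, "user:7", 60, load_user)` would run the hypothetical `load_user` at most once per 60 seconds. Note that this sketch treats a stored `None` as a miss, so it cannot cache `None` itself.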
Memcached
Memcached is a distributed, in-memory, key-value object cache. This is similar to the object cache provided by APC, but there are some important differences. Memcached runs as a separate daemon with its own memory allocation, and your application talks to it over a socket, while APC's object cache lives in shared memory on the same machine as PHP. On a single server this usually makes APC lookups faster, because there is no network round trip. The other major difference is that Memcached is distributed. This means the cache can be spread across multiple servers and shared by all of them, which matters if, for example, you're load-balancing your application across several web servers. Generally, the need for a distributed object cache is why you would use Memcached.
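With Memcached, the distribution is handled by the client: the Memcached servers do not talk to each other, and the client library decides which server holds a given key by hashing it. Here is a minimal Python sketch of that idea. The server addresses are placeholders, and the simple modulo scheme is an assumption for illustration; many real clients use consistent hashing instead, so that adding a server moves fewer keys.

```python
import hashlib

# Hypothetical server pool; every web server must use the same list and order.
SERVERS = [
    "cache1.example:11211",
    "cache2.example:11211",
    "cache3.example:11211",
]


def pick_server(key, servers=SERVERS):
    """Map a cache key to one server so every client agrees on its location."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]
```

Because the mapping depends only on the key and the server list, any web server behind the load balancer will look for `"user:7"` on the same Memcached node.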
Websites we currently know of using Memcached:
- Wikipedia
- Flickr
- YouTube
- Digg
- WordPress.com
- Craigslist
Varnish (Web Application Accelerator)
Varnish is a caching HTTP reverse proxy. The reverse proxy part means that it sits between your application and the outside world. Visiting your domain will actually connect to Varnish. Varnish will then make the corresponding request to your application and deliver the response to the client. It will cache the results of these requests based on a configuration file you can write in the Varnish Configuration Language.
Varnish is neither an opcode cache like APC's opcode cache component, nor an object cache like those in APC or Memcached. Varnish operates outside of your application and caches the entire HTTP response such as the whole HTML document returned by your application.
Varnish is extremely effective because it will cache the final resulting HTML document after all PHP or other server side processing is done. This means that when Varnish delivers a page from cache, it avoids running the PHP code at all and consequently any database queries involved in generating the document. Delivering a cached page from Varnish is like delivering a static HTML file.
Varnish is built for dynamic and content heavy sites so it allows you a good deal of flexibility in controlling how pages are cached using the Varnish Configuration Language. It also lets you dynamically expire cache entries from your application.
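The behaviour described above can be modelled in Python as a thin layer in front of the application. This is only a conceptual sketch and has nothing to do with how Varnish is written or configured; the `ReverseProxyCache` class, its `handle` and `purge` methods, and the policy of caching only GET requests are assumptions chosen for the example.

```python
import time


class ReverseProxyCache:
    """Toy full-page cache sitting in front of a backend application."""

    def __init__(self, backend, ttl, clock=time.monotonic):
        self._backend = backend  # callable(method, url) -> response body
        self._ttl = ttl          # seconds a cached page stays valid
        self._clock = clock
        self._pages = {}         # url -> (expires_at, body)

    def handle(self, method, url):
        # Only GET requests are cached here; everything else goes straight through.
        if method != "GET":
            return self._backend(method, url)
        entry = self._pages.get(url)
        if entry is not None and self._clock() < entry[0]:
            return entry[1]      # cache hit: the application is never called
        body = self._backend(method, url)
        self._pages[url] = (self._clock() + self._ttl, body)
        return body

    def purge(self, url):
        """Let the application expire a page, e.g. after content is edited."""
        self._pages.pop(url, None)
```

On a hit, `handle` returns the stored page without touching the backend at all, which is why a cached page costs about as much as a static file. The `purge` method stands in for the application expiring an entry when content changes.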
This flexibility and speed has made it very popular and effective with Drupal, though it can be tricky to configure.
Websites we currently know of using Varnish:
- BBC
- Wired
- Vimeo
- Zappos
- Morningstar
- Business Insider
- ThinkGeek
Application Cache (e.g. Drupal's caching system)
An in-application cache like Drupal's built-in caching system is used within the application to save calculated values or generated output for later use. Drupal is filled with operations that can be cached. When you put content into an input field, it's run through various input filters. This output can be cached so the same content doesn't have to be filtered again. Similarly, when executing a hook, Drupal has to determine all the modules that implement that hook. This is information that it caches after it scans all the modules. It will re-scan when you enable a new module or if you clear the cache. The resulting cache data can be stored in different places, but it is typically stored in the database and in the file system. It can also use the APC or Memcached object stores to hold this data. This is usually faster because the data is read straight from memory instead of being fetched with database queries.
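The hook example can be sketched in Python to show how an application cache sits on top of a swappable back-end. This is not Drupal's code or API; `DictBackend`, `HookRegistry`, and the `module_hook` naming check are simplified assumptions. The dictionary back-end stands in for whichever store you choose, whether that is the database, APC, or Memcached.

```python
class DictBackend:
    """Stand-in for a database, APC, or Memcached store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def clear(self):
        self._data.clear()


class HookRegistry:
    """Finds which modules implement a hook and caches the answer."""

    def __init__(self, backend):
        self._backend = backend
        self._modules = {}   # module name -> {function name: callable}
        self.scan_count = 0  # how many full scans were needed

    def enable_module(self, name, functions):
        self._modules[name] = functions
        self._backend.clear()  # a new module may implement any hook

    def implementations(self, hook):
        key = "hook:" + hook
        cached = self._backend.get(key)
        if cached is not None:
            return cached
        # Cache miss: scan every module for a function named module_hook.
        self.scan_count += 1
        found = sorted(
            module
            for module, functions in self._modules.items()
            if module + "_" + hook in functions
        )
        self._backend.set(key, found)
        return found
```

Swapping the back-end only means passing a different object with the same `get`, `set`, and `clear` methods, so the calling code does not change when the store moves from the database into memory.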
Conclusion
These caching systems will cache data at different stages of the application life cycle, so they can certainly be used together but you should be mindful of how they interact, and what each one does. Let's break it down with a quick re-cap.
- APC opcode cache will cache your PHP opcode. There's really no reason not to use this with PHP. It will simply make your PHP run faster.
- APC object cache can be used as a shared memory back-end for your application cache.
- Memcached can be used as an in-memory, distributed back-end for your application cache.
- Varnish can be used as a reverse proxy to externally cache HTTP responses.
You would choose between APC and Memcached as a back-end for your application cache, but otherwise you could use the APC opcode cache, an object cache back-end, and Varnish all together.
It's good to use multiple caching strategies and have separate caches because you can cache different aspects of your web application and employ a different strategy for clearing each cache. The more pieces of cached data that are valid, the faster your application will load. A good caching structure means that on any request to your server, only a small portion of the cached data needs to be updated, and you can still provide fresh, dynamic content for your users.