John Gruber of Daring Fireball fame wondered “[…] why WP Super Cache (or some other caching solution) isn’t part of the default WordPress installation” in a recent blog post, so I thought I’d write a blog post explaining why.
First though, a bit of background. When you visit a WordPress-powered website, WordPress asks the database software for some details such as the post contents, details about the authors of the posts to be displayed, the comments for that post (if you’re in a single post view), and so forth. The database is really good at this kind of thing, but if you ask too much of it (too many queries), it will start slowing down and eventually start not replying in time before the webserver software (Apache, etc.) gives up on it. For the vast, vast majority of WordPress sites, this never happens. The database chugs happily along responding to queries and you never have any problems.
However if you get linked to from a popular site such as Daring Fireball, Digg, or Reddit, the high number of page requests will soon overload your database software. This is because most people host their sites on what’s called shared webhosting where you and a bunch of other people share a server. The server doesn’t have much spare CPU power as hosts want to make the most out of the servers. It’s a case of you get what you pay for.
To help avoid this, WordPress does actually have a built-in cache called the object cache. It was introduced way back in 2005 and it basically caches database query results. When displaying a post, for example, WordPress usually asks the database for information about the author of that post (name, etc.). When another post by that same author is displayed further down the page, instead of asking the database for the information again, WordPress fetches this data out of its internal cache. This simple little feature greatly cuts down on the number of queries required to generate a webpage for a visitor.
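To make the idea concrete, here is a minimal sketch in Python of a cache that only lives for one page request. It is not WordPress code: the names `RequestObjectCache`, `FakeDatabase`, and `render_page` are made up for illustration, and the "database" is just a counter that records how many queries were run.

```python
class FakeDatabase:
    """Hypothetical stand-in for the real database; counts queries."""

    def __init__(self):
        self.queries = 0

    def fetch_author(self, author_id):
        self.queries += 1
        return {"id": author_id, "name": f"Author {author_id}"}


class RequestObjectCache:
    """In-memory cache that is created for one request and then thrown away."""

    def __init__(self):
        self._store = {}

    def get_or_load(self, group, key, loader):
        cache_key = (group, key)
        if cache_key not in self._store:
            # First time this item is needed on this page: ask the database.
            self._store[cache_key] = loader(key)
        return self._store[cache_key]


def render_page(db, post_author_ids):
    """Return the author name for each post shown on one page."""
    cache = RequestObjectCache()  # fresh cache per request
    return [
        cache.get_or_load("users", author_id, db.fetch_author)["name"]
        for author_id in post_author_ids
    ]
```

Rendering a page with four posts by two distinct authors runs only two author queries in this sketch, because the repeated lookups are answered from the in-memory cache.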
However, as soon as the page is done being generated, that object cache is discarded. This wasn’t always the case: initially the object cache saved these little chunks of data to the filesystem so that they could be reused on subsequent pageviews. While great in theory, the concept turned out to be terrible in practice. You see, WordPress has to be able to run on countless different types of server configurations. One common enough configuration that had trouble with this file-based type of caching was where all of your files were network based. Rather than your website’s files being stored on the server that was generating the webpage for the visitor, they were stored on a different server elsewhere in the datacenter. Having to fetch all of these little cache files over the network was a slow and resource-intensive exercise. And that’s only one example of many configurations that the file-based caching had trouble with.
What’s worse is that many sites were actually slower with a persistent object cache enabled. Databases are actually quite fast (as they’re designed to store lots of little chunks of data), often faster than fetching that data from the filesystem and decoding it. So for the vast majority of sites, the ones that get relatively low traffic, having a persistent cache was a lot more trouble than it was worth. Just having a temporary object cache is enough.
The same issues don’t really apply to WP Super Cache and other full-page caching plugins, because they cache each page to a single file instead of multiple little files. But as with a persistent object cache, full-page caching simply isn’t needed for most sites due to lack of traffic. More importantly though, these plugins require setup and there’s a tradeoff, as your blog is no longer fully dynamic. Things like “latest comments” will be out of date, for example.
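The staleness tradeoff is easy to see in a small sketch. The Python below is only an illustration, not how WP Super Cache is implemented: `PageCache`, its `ttl_seconds` parameter, and the injectable `clock` are all assumptions, and pages are kept in a dictionary rather than in files to keep the example short.

```python
import time


class PageCache:
    """Caches whole rendered pages for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._pages = {}  # url -> (expires_at, html)

    def get(self, url, render):
        now = self.clock()
        entry = self._pages.get(url)
        if entry is not None and entry[0] > now:
            # Still fresh enough: serve the stored copy, no database work.
            return entry[1]
        # Missing or expired: build the page once and store the result.
        html = render(url)
        self._pages[url] = (now + self.ttl, html)
        return html
```

If a new comment arrives while the cached copy is still valid, visitors keep seeing the old “latest comments” until the entry expires or is cleared. That is the price paid for skipping the page generation.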
So the WordPress developers have instead decided that the best course of action is to make it really easy to use a caching plugin, both behind the scenes and from the user perspective. You can install a plugin in just a couple of clicks directly from the admin interface.
In fact, this is how WordPress development is done as a whole: include things that the majority of users will want and make it easy to install a plugin to do the rest. The core of WordPress should stay as slim as possible. Bloat is bad!
I hope that clears up why persistent caching isn’t included by default in WordPress. 🙂
TL;DR: Persistent caching isn’t needed for most sites and is a pain to get working in all hosting environments. Instead WordPress makes it really easy to install a caching plugin if you need it.
I agree in general that caching lives outside the core functionality.
Another reason worth mentioning is that the vast majority of sites running WP (though Automattic, through wordpress.com stats probably knows better than I) have fewer than 30 visits a day.
Easy implementation and installation is a key part of WP’s success – many who are big users today (like us) just tried it out once on a cheap host they had lying round. In my case the rather dreadful Yahoo! hosting offer about six years ago.
What would be nice would be to shift the underlying architecture of WP to use the database in a cleverer way – the configuration of tables, and the way many items (such as options) are stored, could lead to better performance. Using memory tables (as we did for our Spectacu.la Discussion plugin) can make for stunning performance, for example.
I also believe that on large, busy sites, the built-in search is something of a liability, as well as providing poor performance. We fixed that too (see http://wordpress.org/extend/plugins/spectacula-advanced-search/ ), but although it can recognise the different database capabilities, there are some hosts with clustered MySQL implementations that won’t tolerate large indexes. So whilst I would love to suggest our code for core (albeit in modified form), I’m not sure it would ever be accepted due to the potential for making it harder to implement the core product.
It’s a tricky one to get right. I come from a heavy iron background, so my tendency is to assume more control. That works well when working for a large newspaper or publishing group, less well when working for a first-time blogger.
I look at WordPress more as a CMS framework than a do everything CMS. It gets you going and has the basics of what you need to make a blog or normal website built-in. More importantly however is that it also has the capability to allow you to do so much more without forcing that on users who don’t need it.
I tried Joomla, for example, a few years back and it was overwhelming. So many options and various other things.
Anyway, as you said — most blogs don’t get the traffic to need caching or any type of complex optimization. The ones that do can easily install solutions. 🙂
A nice side benefit: With dynamic publishing, you don’t need to publish your site to see changes. I’m in the process of redesigning a forum using Movable Type, with the database being migrated from something else (not sure what, that’s not my part), but it goes back to like 1999 or 2000. The small sample of content that we’re using as test data took 17 minutes to publish the other day. I’ve heard that the full database dump takes four hours. It’s probably an extreme example, but even with a really small bit of dummy content, that takes about a minute or so. That minute adds up over the course of a day (especially with copying and pasting into MT). I love the save and hit reload nature of WordPress.
Great explanation, Alex, and I also agree caching should be left as a plugin for those users who need it (or think they need it!).
And let’s not forget: MySQL database servers are often configured with their own query cache, which will persist across requests without any help from WordPress. In many cases, you may already be getting some cache benefits from that, even if it’s harder to quantify. You still have to make a call to the database, but if the information is already cached, it doesn’t have to actually do any work, just return the results from the previous time you asked!
It really depends on the traffic and the infrastructure.
On my side, I cache some parts of the HTML in files (CACHE_LITE) so that some pages need no MySQL connection at all. This helps lower the number of open connections. (Max connections plus memory consumed was always an issue: “too many connections…” and swapping.)
This is a very surprising decision, given the number of WordPress sites that I’ve seen go down (including my own, and others I’ve run) when under heavy traffic. MySQL *can* be very efficient, but it’s pretty easy to max out connections when Digg links to you.
Also, the inefficiencies in re-generating the same page over and over again are part of the reason that shared hosting can be so terrible. With no options for built-in caching from such a popular platform, WordPress is taking part in a decision to doom datacenters everywhere. Is it really that hard to implement a few different options for caching, whether it be file-based, or integration with memcached, where available?
Did you read my post? I explained why non-full page file-based caching is terrible and I don’t know of any shared hosts that have memcache available. I also explained the disadvantages of full page caching.
Plus what about the majority of blogs that never need caching? The code would be bloat to them.
WordPress strives to serve the majority while still making it easy for the minority via plugins. It takes only a few moments to install a caching plugin. 🙂
Well, I understand why a feature that is only useful for a small percentage of users has to be scrapped from the plans and let the plugins handle that. It makes sense to think like that.
However, I do not agree with your opinion. The reason is that most caching plugins do not work properly:
– They cannot work well together with mobile plugins (wp-touch pro, to name one). Mobile pages have to be served non cached, excluding those UAs from the cache plugin configuration. This is a pity because more and more we’re getting a higher share of traffic from mobile devices. We can use a responsive theme to deal with mobile devices but that brings other issues (for example same page weight as desktop devices).
– As you mention in your article, there are issues like not being able to show fresh content.
– There are many other things in WordPress “basic” installation (sans plugins) that are only used by a minority of users. Still, they were included.
Caching is hard. We all know that. But I would like WordPress to also be aimed at users who earn their living thanks to this awesome software. If you only think about all the users who open a blog using it and then get 30 visits a month, I think we’re missing the mark this very good CMS was created for. Just 5% of users might seem to be a small percentage, but we might be talking about tens of thousands of people here, because WordPress is being used by many, many people these days. So even if tens of thousands of people is a minority, I think it is still a lot of people. On top of that, if these people stop using WordPress because they cannot handle the server load properly, the most popular websites would no longer be running this CMS, which is not a situation Automattic would like to be in, I presume.
WordPress has changed the lives of many people for the better, and it is great, but I am sure it can be greater. One of the most important and necessary features it lacks is a proper page (not only db) caching system that plays well with plugins and at least the most common LAMP environment. I am sure Automattic can pull this off.
Yes they can. WP Super Cache has an option to enable caching pages served to mobile users separately from desktop users. Batcache (used on WordPress.com) has a similar configuration ability.
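As a rough illustration of what “caching separately” can mean, the Python sketch below makes the device class part of the cache key, so mobile and desktop visitors never share a cached copy. This is an assumption about the general technique, not the actual WP Super Cache or Batcache code; `MOBILE_MARKERS`, `device_class`, and `cached_page` are hypothetical names, and the user agent check is deliberately crude.

```python
# Hypothetical, far from exhaustive list of substrings that hint at a phone.
MOBILE_MARKERS = ("iphone", "android", "mobile")


def device_class(user_agent):
    """Classify a request as 'mobile' or 'desktop' from its User-Agent."""
    ua = user_agent.lower()
    return "mobile" if any(marker in ua for marker in MOBILE_MARKERS) else "desktop"


def cached_page(cache, url, user_agent, render):
    """Serve a page from `cache` (a dict), keyed by device class and URL."""
    kind = device_class(user_agent)
    key = (kind, url)
    if key not in cache:
        # Render the variant for this device class once, then reuse it.
        cache[key] = render(url, kind)
    return cache[key]
```

With this arrangement each URL is stored at most twice, once per device class, and a mobile theme can still be served from cache.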
Any type of caching is going to have this problem. It’s just how caching works.
Anyway, WordPress is not meant to be a one-size fits all kitchen sink solution. It would be incredibly bloated if it tried to be everything to everyone.
Instead it provides the framework for you to get most of the way there and then makes it easy for you to tailor it to exactly fit your needs.
For example, the caching API is right in WordPress: http://core.trac.wordpress.org/browser/trunk/src/wp-includes/cache.php
And then with a single file dropped in your wp-content folder, it can meet the needs of your use case.
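The drop-in idea boils down to “use the replacement file if it exists, otherwise fall back to the built-in one”. The Python sketch below shows just that selection step. It is an illustration rather than WordPress’s own loader: the function name `pick_cache_file` is made up, and it assumes the drop-in is named `object-cache.php` and that the built-in implementation lives at `wp-includes/cache.php`.

```python
from pathlib import Path


def pick_cache_file(site_root):
    """Return the cache implementation file a site would load."""
    root = Path(site_root)
    dropin = root / "wp-content" / "object-cache.php"  # assumed drop-in name
    builtin = root / "wp-includes" / "cache.php"       # assumed default location
    # A drop-in, when present, replaces the default implementation entirely.
    return dropin if dropin.is_file() else builtin
```

The appeal of this design is that sites that need memcached or another backend can swap it in with one file, while everyone else carries no extra code.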
Regardless, most shared hosting servers don’t have memcached, Xcache, or any other non-file or non-DB caching solution available so even if WordPress supported those out of the box, they couldn’t be used.
Automattic doesn’t own and isn’t in charge of WordPress (the software). It’s run by an independent non-profit. 🙂
Great help, thanks
You should have the tl;dr; at the top 🙂