
Overview of caching in eZ Publish new stack

by Ivo Lukač - October 15, 2014

Summer is over and we are returning to our usual duties: from working on projects to sharing our experience. As we have started releasing sites based on eZ Publish 5 and Symfony, it is time to share what we have learned so far, at least some of the more interesting parts. And what better topic than caching :) This post was written while preparing a presentation for the Zagreb PHP Conference earlier this month (slides can be found here).

A caching system is one of the more important parts of any complex web site. As our projects grew in size and complexity, it became vital to learn how to configure the cache optimally. With 10 years of experience in eZ Publish 3 and 4 (now called legacy) we had mastered all its caching layers. We knew its capabilities, its limits and how to get the most out of it.

With the emergence of eZ Publish 5, caching was one of the first parts of the CMS to change a lot, so we needed to learn it from scratch. We are still learning. This post shares what we have learned so far.

The most important thing to remember is that the new stack uses Symfony's HTTP Cache implementation and its HMVC capabilities to cache the output. For comparison, in legacy eZ Publish we used Content Cache (enhanced with Smart View Cache) and Template Block Cache to cache the output of page parts.

Legacy

Content Cache was used to cache the output of the content module as an HTML chunk in a file on disk. The CMS took care of invalidation natively, so using it was really simple: enable or disable it in the settings. In slightly more complex cases, Smart View Cache rules needed to be set to automate the invalidation of related locations (nodes).

As Content Cache was caching only the "inner stuff" generated by the content module (rendering was strictly divided between module and layout), it was important to use cache blocks in the main layout template so that things like the header, main menu and footer got cached as files too. With a good caching strategy a content page could be executed with just a few database queries (if the session was handled natively in PHP, the front page could be served without any query).

All in all this was a good caching system, but it was not performant enough for heavier traffic and it was hard to scale. The biggest problem was that the cache was stored on disk. With several web servers, handling those files was hard. eZ had several approaches; the last was the eZ DFS cluster, which utilised NAS and some smart syncing scripts.

There were a few other solutions:

using SAN storage for cache: quite expensive for some decent storage throughput
using Static Cache: basically dumping all pages to disk and using Apache rewrites to bypass PHP completely and serve the whole HTML page. This was very fast but very inflexible: no possibility of page variance (per URL parameter, per user, etc.)
using Varnish reverse proxy to cache pages: more flexible than static cache and more performant overall, but the integration with the CMS was not so tight

So bigger projects were usually done with all possible caches enabled plus Varnish in front, but with a low TTL on HTML pages so that content changes propagated in time without editors freaking out. That caused frequent cache invalidation and could hurt the backend. In edge cases the database would become a bottleneck, especially if MySQL was configured poorly, without enough query cache and so on. There was no way to cache database query output on the web server side to ease the database load.

New stack

One of the first good traits of eZ's new stack is the new caching system. It is based on solid Symfony grounds and on HTTP standards. And it solves some of the issues from legacy mentioned above.

Important concepts to grasp:

1. Symfony has a native HTTP Cache implementation for output caching. This basically means that output caching is not done by the web application itself, but left to intermediate caches like reverse proxies, gateway caches and CDNs. As the protocol is well defined in the HTTP specification, it should work with any system that supports it, be it Varnish, Squid, Akamai or something else. There is a native reverse proxy within Symfony written in PHP, but it is meant only for development and testing, not for production use.

2. HTTP Cache defines two main caching models: Expiration and Validation. At this point eZ 5 supports the Expiration model with additional invalidation triggers (explained later). The Validation model (ETag) is not completely supported at the moment, due to some scenarios where it does not behave very well. Anyway, the Expiration model should be enough if configured well.

3. The Expiration model is simple. A Cache-Control header is set on the response with a few parameters. If the page should be cached only on the client, the first parameter should be set to private; if it can also be cached on intermediate caches, it should be set to public. The second parameter is the TTL in seconds: max-age. If the TTL for intermediate caches should be different, s-maxage should be used. That is basically it.
Much more info about the topic can be found on the Things Caches Do page.

In Symfony/eZ5 control is simple: when dealing with the Response object in the controller, it is possible to set the headers:

use Symfony\Component\HttpFoundation\Response;
$response = new Response();

// mark the response as either public or private
$response->setPublic(); 
$response->setPrivate();

// set the private or shared max age
$response->setMaxAge(600);
$response->setSharedMaxAge(7200);
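
To make the private/public and max-age/s-maxage rules more concrete, here is a minimal sketch of how an intermediate cache could derive its TTL from a Cache-Control value. The function name sharedCacheTtl is hypothetical and not part of Symfony or eZ Publish, and the logic is deliberately simplified: it ignores directives such as no-cache and no-store.

```php
<?php
// Hypothetical helper, not a Symfony or eZ Publish API: returns how many seconds
// a shared cache (e.g. Varnish) may keep a response, or null if it should not.
function sharedCacheTtl(string $cacheControl): ?int
{
    // Split "public, max-age=600" into ['public' => true, 'max-age' => '600'].
    $directives = [];
    foreach (explode(',', strtolower($cacheControl)) as $part) {
        $pair = explode('=', trim($part), 2);
        $directives[$pair[0]] = $pair[1] ?? true;
    }

    // "private" means only the client may cache the response.
    if (isset($directives['private'])) {
        return null;
    }
    // For shared caches s-maxage takes precedence over max-age.
    if (isset($directives['s-maxage'])) {
        return (int) $directives['s-maxage'];
    }
    if (isset($directives['max-age'])) {
        return (int) $directives['max-age'];
    }
    return null; // no explicit lifetime given
}

var_dump(sharedCacheTtl('public, max-age=600, s-maxage=7200')); // int(7200)
var_dump(sharedCacheTtl('private, max-age=600'));               // NULL
```

With the values from the snippet above (setMaxAge(600) and setSharedMaxAge(7200) on a public response), the client would keep the page for 10 minutes and the intermediate cache for 2 hours.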

4. It is very important to note that HTTP Cache is implemented at the Response object level. As Symfony supports hierarchical MVC, it is possible to call a second controller action (make a sub request) from within a first controller action or view (Twig template). Moreover, sub requests can be exposed as fragment URLs and cached on intermediate caches like any other page. This will prove useful later.

Enabling fragments is also simple in ezpublish/config/config.yml:

framework:
  fragments: { path: /_fragment }

5. By default in Symfony there is an integration with the reverse proxy (Varnish in particular) that invalidates intermediate caches by sending a PURGE request. Varnish needs to be configured to support the PURGE call, in the vcl file:

sub vcl_recv {
  if (req.request == "PURGE") { return(lookup); }
}
sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged";
  }
}
sub vcl_miss {
  if (req.request == "PURGE") { error 404 "Not purged"; }
}

Expiration by itself would not be very efficient without some kind of automatic invalidation. To support this, the eZ new stack adds one custom header to the Response: X-Location-Id. The default ViewController sets its value to the location id of the page.

How eZ new stack caching works

If output rendering is done by legacy, all configured legacy caching will be executed. For instance, if the full template is overridden in legacy and the content module is called to render it, the legacy content cache is also executed. If some of the rendered legacy templates have a cache-block statement and the appropriate setting is enabled, the template block cache will be used.
The default eZ view controller sets Cache-Control to: public, s-maxage=60. That means that Varnish (or any other intermediate cache) should cache the page for 60 seconds by default. There is no caching on the client as no max-age is set. The controller also sets the X-Location-Id header to the locationId of the content viewed. Additionally, it sets the Vary header to the custom X-User-Hash header so that intermediate caches differentiate content for different users and cache it accordingly. Note that no cache is generated locally. The response is only tagged with cache information so that Varnish (and the client) know how long the URL should be cached.
If there is a change in the backend (some content is republished) and invalidation of the content cache is triggered for a certain location, e.g. 123, either directly or via smart view cache rules, eZ will send a PURGE request to Varnish to invalidate the cache for all URLs whose response carries the custom header X-Location-Id: 123. With this feature it should be safe to set the default content TTL to a much larger value than the default 60 seconds.
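
The interplay of TTL, X-User-Hash and X-Location-Id can be sketched with a toy in-memory cache. This is only an illustration under simplifying assumptions, not how Varnish or eZ Publish implement it: the class LocationAwareCache, its method names and the sample URLs are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```php
<?php
// Toy in-memory stand-in for an intermediate cache; all names are hypothetical.
final class LocationAwareCache
{
    /** @var array<string, array{body: string, locationId: int, expires: int}> */
    private array $entries = [];

    // The cache key combines user hash and URL, mimicking "Vary: X-User-Hash".
    private function key(string $url, string $userHash): string
    {
        return $userHash . '|' . $url;
    }

    public function store(string $url, string $userHash, string $body, int $locationId, int $ttl, int $now): void
    {
        $this->entries[$this->key($url, $userHash)] = [
            'body' => $body,
            'locationId' => $locationId, // value of the X-Location-Id header
            'expires' => $now + $ttl,    // derived from s-maxage
        ];
    }

    public function fetch(string $url, string $userHash, int $now): ?string
    {
        $entry = $this->entries[$this->key($url, $userHash)] ?? null;
        if ($entry === null || $entry['expires'] <= $now) {
            return null; // miss or expired: the backend would be asked
        }
        return $entry['body'];
    }

    // Counterpart of a PURGE for one location: drop every entry tagged with it.
    public function purgeLocation(int $locationId): int
    {
        $before = count($this->entries);
        $this->entries = array_filter(
            $this->entries,
            fn(array $entry): bool => $entry['locationId'] !== $locationId
        );
        return $before - count($this->entries);
    }
}

$cache = new LocationAwareCache();
$cache->store('/news/article', 'anonymous', '<html>v1</html>', 123, 86400, 1000);
$cache->store('/news/article', 'editor', '<html>v1 + edit link</html>', 123, 86400, 1000);
$cache->store('/about', 'anonymous', '<html>about</html>', 456, 86400, 1000);

// Content at location 123 is republished: both user variants are dropped at once.
$purged = $cache->purgeLocation(123);
```

Even with a one-day TTL, the republished article is regenerated on the next request, while the unrelated /about page stays cached.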

Now we have learned what replaces Content Cache in the new stack. Keep in mind that there are some significant differences. Besides the fact that the cache is not stored locally, it is important to note that the full view of a content page is cached as a whole. Remember, the content cache in legacy was just the "inner part".

All in all, by utilising HTTP Cache, content caching just became much more scalable :)

What about template cache blocks

Yes, what about them? Do we need them at all, now that the page is cached completely in Varnish? They might not be necessary in some cases (e.g. sites with not so many pages), but on bigger sites it is not very wise to regenerate the main menu every time a whole page is generated. The main menu usually has the same markup on every page (though it might have a few variations for selected item states) and it changes very rarely. It makes sense to cache it independently of the page.

As I already mentioned, Symfony supports sub requests and fragment URLs. It is very simple to create a sub request for the main menu and give it a different caching strategy, presumably with a longer TTL and no X-Location-Id header. But how to integrate this with Varnish?

Edge Side Includes (ESI) are the answer. The specification is fairly large, but for this use case we just need the basic stuff, which is well supported by both Varnish and Symfony. On one side, ESI needs to be enabled in Symfony (it is not enabled by default) in ezpublish/config/config.yml:

framework:
  esi: { enabled: true }

On the Varnish side, a special request header needs to be set to tell the backend to send the ESI markup:

sub vcl_recv { 
  // Add a header to announce ESI support.
  set req.http.Surrogate-Capability = "abc=ESI/1.0";
}

If the sub request is then called with render_esi (instead of the default render) in your twig template:

{{ render_esi(controller('headerController:mainMenuAction', { 'selectedLocationId': '123' })) }}

Symfony will generate the following markup within the returned HTML:

<esi:include src="/_fragment?........."/>

Before Varnish returns the requested page, it will search for such markup and replace it with the HTML from the fragment URL. Whether that HTML is served from cache depends on the caching strategy of that URL. For a main menu with a very long TTL it will probably be cached. Yippee!
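
The replacement step can be illustrated with a few lines of PHP. This is a simplified stand-in for what an ESI-capable proxy does, not Varnish's actual logic: the function assembleEsi, the callback $fetchFragment and the fragment URL /_fragment?menu are hypothetical, and only the self-closing esi:include form is handled.

```php
<?php
// Simplified illustration of ESI assembly; not how Varnish is implemented.
// $fetchFragment is a hypothetical callback returning the HTML for a fragment URL
// (from the proxy's own cache if still fresh, otherwise from the backend).
function assembleEsi(string $html, callable $fetchFragment): string
{
    return preg_replace_callback(
        '#<esi:include\s+src="([^"]+)"\s*/>#',
        fn(array $match): string => $fetchFragment($match[1]),
        $html
    );
}

// Pretend the main menu fragment is already cached with a long TTL.
$fragments = ['/_fragment?menu' => '<ul><li>Home</li></ul>'];
$page = '<body><esi:include src="/_fragment?menu"/><p>Article</p></body>';

echo assembleEsi($page, fn(string $url): string => $fragments[$url] ?? '');
// <body><ul><li>Home</li></ul><p>Article</p></body>
```

The page and the menu fragment are separate cache entries, so each can expire or be purged without affecting the other.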

So with HTTP cache and ESI we replace legacy template block cache :)

Caution

In legacy it was good practice not to overload the layout template with too many cache blocks, as searching for many cache files on disk is not very performant. Similar caution applies to sub requests, but for different reasons. Each sub request adds overhead as it invokes the whole Symfony kernel from scratch, increases the memory footprint and so on. With more than 5-10 (rule of thumb!) sub requests in one request, it might be wise to split and nest the sub requests with different caching strategies, so that fewer sub requests are executed at the same time.

Sweet stuff for the end

So far we have covered output caching with the eZ new stack. There are other types of caches in use, e.g. Twig templates are compiled just like legacy templates, and so on. Those caching levels are not very interesting as they are more or less the same as in legacy.

But there is one completely new cache level, introduced last year. It is the SPI cache and it caches output from the database. Remember, we missed this in legacy :). Anyway, this cache is implemented using TedivmStashBundle, which means that it can be configured to be stored on the local filesystem, but also on systems like Memcache. That should be easier to scale, right? :)

Bear in mind this was an overview post; there are many more details to learn before mastering caching in the new eZ stack. More details can be found here:

Symfony ezpublish
