Dealing with the complexity of eZ Platform architecture
When we started developing web solutions with eZ Publish in 2004, the architecture was simple. We had a Linux operating system, an Apache web server, a MySQL database, and PHP runtime as an Apache module. You know this architecture by its LAMP acronym.
Over the years, things got a bit more complicated, both for performance reasons and because other tools were better suited to specific features. The situation exploded with the new stack and eZ Platform. We are now on LNMpPf7VHMS as our default architecture, with the possibility of adding a few more letters in the near future.
Think of it like this: you started to build a simple church, then turned it into a more complicated building like the Sagrada Família in Barcelona. :)
We are still running on Linux, but the architecture got more complicated due to various emerging virtualization and container technologies like VirtualBox, Vagrant, Docker, etc. It’s no longer just a command shell and the vi editor – it’s more about Ansible, Fabric, Kubernetes, etc.
By the way, it is also possible to base the stack on Windows, but we have always found it too awkward to even consider.
Nginx is increasingly replacing Apache, as it proved better for performance reasons. The letter here could still be "A" for some people.
It’s still MySQL in some variation, like MariaDB or Percona. In our case, it’s "p" for Percona. For other cases, it could be "P" for Postgres, etc.
It’s still PHP, but with an FPM flavor, hence the "f". And, of course, "7" means we are using version 7 by default. The language and the runtime have transformed immensely in the last 12 years. Although it might not be the most popular language out there, it still has the biggest footprint in the web sphere.
In the old days, the application, in our case eZ, was handling its own caching. In some use cases, you could put a reverse proxy in front to cache the output of the web server, but this layer was optional and not integrated with eZ. It would basically just cache whole pages for X minutes and that is it.
With eZ Platform and Symfony, the whole front caching is now handled by HTTP Cache, which means you have to have a reverse proxy in production – in our case, Varnish. It is very tightly integrated with eZ via FOSHttpCacheBundle and is now a standard part of the architecture.
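On the Varnish side, the invalidation endpoint the application talks to boils down to a few lines of VCL. This is only a sketch under assumptions: the ACL entries and the header name are illustrative, not an actual production config.

```
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";  # the web server / php-fpm backend
}

acl invalidators {
    "127.0.0.1";  # only the application may invalidate the cache
}

sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Method not allowed"));
        }
        # Ban every cached object whose URL matches the given regex
        ban("req.url ~ " + req.http.X-Ban-Url);
        return (synth(200, "Banned"));
    }
}
```

The point is that cache invalidation becomes an HTTP conversation between the application and the proxy, which is exactly why Varnish can no longer be treated as an optional layer.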
We also integrated Netgen Layouts with Varnish in version 0.8 to make cache management even easier.
Varnish comes with some issues which are not easily solvable without yet another layer in front. For example, Varnish has no SSL termination capability by default – there is an open source project, Hitch, for that.
We use HAProxy in front by default, not only to handle the SSL termination but also to help with security and redirection management and for better load balancing. Some people would use yet another Nginx instance or rely on CDN infrastructure for this task.
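To make the HAProxy role concrete, a stripped-down config covering SSL termination and the HTTP-to-HTTPS redirect might look like this (certificate path, addresses, and timeouts are hypothetical):

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind :80
    bind :443 ssl crt /etc/haproxy/certs/example.com.pem
    # Redirect plain HTTP to HTTPS before anything reaches Varnish
    redirect scheme https code 301 if !{ ssl_fc }
    default_backend varnish

backend varnish
    balance roundrobin
    server varnish1 127.0.0.1:6081 check
```

Load balancing then becomes a matter of adding more `server` lines to the backend.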
For any serious application, you need object caching so your database doesn't get hit with too many requests. For smaller sites, a filesystem-based cache will do. However, in most of our use cases, it's just not good enough, especially performance-wise when invalidating caches. Therefore, we use Memcached for object cache. For some people, it could be "R" for Redis, etc.
Many years ago, we had difficulties with eZ projects that had a lot of content because the database queries were too complex for the database to handle. It was inherently a problem with the database model of eZ, which implemented a meta content model (EAV) on an SQL engine. That concept is better suited to NoSQL storage engines, but back then NoSQL was not really a thing yet.
So, in order to make those complex queries faster, especially when full-text search was involved, we used Solr as a better tool for the job. Content from the database got indexed in Solr and was delivered much faster. With eZ Platform, Solr is fully integrated into the core, so the public API used by developers to fetch content can be transparently configured to use Solr. This brings some other problems which we are trying to solve in the Site API, but the performance gains are potentially huge.
For some people, the letter here could be "E" for Elasticsearch, confirming the dominance of Lucene-based search engines.
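To make the "transparent" part concrete: a content fetch through the public API looks the same regardless of which search engine is configured. Roughly like this, assuming an injected SearchService (the search term and limit are just examples):

```php
<?php
// The same query runs against the Legacy (SQL) or the Solr search
// engine; which one handles it is purely a matter of configuration.
use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion;

$query = new Query();
$query->query = new Criterion\FullText('sailboat');
$query->limit = 10;

// $searchService is an injected eZ\Publish\API\Repository\SearchService
$searchResult = $searchService->findContent($query);
foreach ($searchResult->searchHits as $hit) {
    echo $hit->valueObject->contentInfo->name . PHP_EOL;
}
```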
More parts mean more integration issues
From 4 big parts of the architecture twelve years ago, we got to at least 8 nowadays, with a tendency to grow. For example, we could add to the list a queue/message system like RabbitMQ, monitoring systems, file storage systems, image storage systems, or CDNs. With each new system, the chance of an integration issue grows. The new developer norm is to be aware of all these parts during development and maintenance. For instance, caching in Varnish is now an integral part of the project, not simply a layer that sysadmins are managing.
Handling all these parts is not easy
Orchestrating this architecture for all phases, from development to production, requires know-how of even more tools. And it's not just about the ability to set up the architecture: the process of setting it up needs to be simple and fast. Developers need to be able to quickly instantiate another project on their machine to help their colleagues. When a new feature is implemented, a new UAT instance needs to be created. New nodes in cluster environments need to be created quickly as well, etc.
To be able to do this, you need to have orchestration recipes and provisioning tools and maintain them so they are compatible with the application you develop.
Scaling over multiple projects, clients, and datacenters is even harder
Doing all this for one project is a lot of work, but unifying it across all (or most) of your projects and clients is nearly impossible. Sometimes we manage the complete physical server, sometimes we don't have any access to the production cluster and it takes several days to deploy something, sometimes the project is super small and there is no budget for advanced developer operations or maintenance, sometimes we have continuous integration running nightly. With these huge differences, having the same architecture and the same procedures just doesn't make sense.
Variety of languages isn't helping either
Besides the services themselves, a typical project now mixes PHP on the backend, JavaScript on the frontend, SQL, shell scripts, and tool-specific formats like VCL and YAML, so hardly anyone on the team is fluent in the whole stack.
How we cope with the issues
Here are a few things I think are important from our experience:
1. Be aware of the complexity of the architecture
Everyone, from developers to project managers and even clients, needs to understand it. A developer needs to make sure that the application operates well, and that includes all the parts I mentioned earlier: for example, taking care of cache strategies in the Response objects and making sure that only scalar parameters are used in ESI requests. Others involved need to understand the development process, from writing code on the developer’s local machine to deploying that code to production.
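As a sketch of what "taking care of cache strategies in the Response objects" means in a Symfony controller (the TTL, the Vary header, and the `$html` variable are just examples, not prescriptions):

```php
<?php
// Declare the cache strategy on the Response itself, so the reverse
// proxy (not the application) decides how long to keep serving it.
use Symfony\Component\HttpFoundation\Response;

$response = new Response($html);
$response->setPublic();            // cacheable by shared caches like Varnish
$response->setSharedMaxAge(3600);  // s-maxage: a one-hour TTL in the proxy
$response->setVary('X-User-Hash'); // vary on an explicit, cheap header
return $response;
```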
2. Empower people from the team to start improving the process, step by step, no matter how small the steps are.
There are surely some team members who are more eager to help improve the process – find them and support them. If there is no dedicated DevOps engineer, appoint a DevOps champion from among your existing senior engineers.
3. Take control of the code.
If you are still not using some sort of version control, you will have a hard time doing any DevOps. In web development, Git is the king of source control, although there are a few competitors you might consider. One of the most important things Git offers is simple and fast branching. You don't have to use more complex workflows like Gitflow. In my opinion, even just using feature branches for bigger tasks is very valuable.
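A minimal feature-branch flow looks like this, sketched here in a throwaway repository (paths, branch names, and file contents are all hypothetical):

```shell
# Create a disposable repository just for this demonstration
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
echo "v1" > app.txt
git add app.txt
git commit -qm "Initial commit"

# Branch off for a bigger task...
git checkout -qb feature/search-improvements
echo "v2" > app.txt
git commit -qam "Improve search"

# ...then merge back with an explicit merge commit for readable history
git checkout -q -
git merge --no-ff -m "Merge feature/search-improvements" feature/search-improvements
```

The `--no-ff` merge keeps the feature visible as a unit in the history, which is most of the value of feature branches on small teams.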
4. Standardize the system level.
Even if there is a chance that technologies like Docker will one day make the underlying OS choice irrelevant, I still think it is good to standardize it to some degree. Our current development environments are either macOS with Homebrew or Ubuntu/Debian. On production, it is mostly Debian by default, or Ubuntu. Using Windows would give us a headache; it’s not worth the effort. On some projects, we use Vagrant for VirtualBox provisioning. It’s valuable in cases where the architecture of the project application is more complex and it's just too much work to have everyone set it up on their local machines.
It's also useful to keep the Vagrant configuration on older projects for maintenance reasons, so that the development environment stays similar to the production environment. A typical case is when you have an older PHP version on production, like 5.3, which nobody has on their local machine anymore. For simpler projects, it is optimal to work on the local host (mostly because of better performance), and a multi-PHP setup helps with that.
5. Standardize your service level.
This is related to the long acronym from the beginning of this blog post. It helps if you use the same services for all projects. However, sometimes that is just not possible because some clients will enforce exceptions. But at least we can try to maintain the default setup. If you standardize your default setup, you will get fewer integration problems over time. Most importantly, you will be able to provide a provisioning script which automates some steps. We use Ansible to provision all needed services, but there are other tools that could be used as well.
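As an illustration, a stripped-down Ansible play for the default service stack could start like this. This is a sketch, not our actual recipe; the host group and package names are distribution-specific assumptions:

```yaml
# provision.yml - install the default services on Debian/Ubuntu hosts
- hosts: webservers
  become: yes
  tasks:
    - name: Install the default service stack
      apt:
        name: "{{ item }}"
        state: present
        update_cache: yes
      with_items:
        - nginx
        - php7.0-fpm
        - memcached
        - varnish
```

A real playbook would add templated configuration files, users, and firewall rules, but the principle stays the same: the default setup becomes a file you can review and rerun instead of a checklist in someone's head.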
6. Automate provisioning
Now we are getting to more interesting efforts. Once you are done with the previous steps, you can start thinking about automating regular developer operations. It helps in the following situations:
- creating new staging/testing/production instances, saving the time you would otherwise lose setting up a bunch of configurations
- setting up a developer environment, which tends to happen often if developers work on multiple projects and help each other regularly
- maintaining an older project on a non-regular basis, so you don't keep the instance alive all the time
- creating a UAT instance for a feature branch
As you can see, the chances are that you need to provision more often than you think. That is why the process needs to be simple and fast.
This is achievable with the above-mentioned tools, like Vagrant and Ansible, but there are also other options available, like Docker. Cloud platforms that simplify this even further have started to emerge recently. Solutions like eZ Platform Cloud (which is based on Platform.sh), ContinuousPipe, and ContinuousPHP provide managed infrastructure and promise to make the process painless. We are in the process of investigating if and which solutions we could use.
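For illustration, a Docker-based development setup for the stack described earlier might start from a compose file like this. The image tags are assumptions, and Varnish/HAProxy are omitted for brevity:

```yaml
# docker-compose.yml - a minimal sketch of the development service stack
version: '2'
services:
  web:
    image: nginx:stable
    ports:
      - "8080:80"
    depends_on:
      - php
  php:
    image: php:7.1-fpm
  db:
    image: percona:5.7
    environment:
      MYSQL_ROOT_PASSWORD: example
  memcached:
    image: memcached:alpine
  solr:
    image: solr:6
```

A single `docker-compose up` then gives every developer the same services, which is exactly the "simple and fast" property the provisioning process needs.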
7. Take control of the data
While the code is managed by Git, the data created by the users is not. In our case, the data is the database and the file storage. In some other cases, it’s also the Solr index (when a full reindex takes too long and simply copying the index data is a better solution). This means that, after the application goes live, there needs to be a way (ad hoc or regular) to get the data from production back to developer instances. Rsync and mysqldump are sometimes enough, but less adequate for projects with a huge amount of data.
Some tools, like Platform.sh, could help in such situations. Platform.sh provides fast cloning of production data and makes it available to other instances. If you don't use such a system, you still need to take care of this in some way, either with a defined manual process or in an automated way.
It might also help to use additional systems to handle the data and do the heavy lifting. For example, Cloudinary handles media file management and delivery, and Netgen integrates it with eZ Platform via our Remote Media Bundle. If all images are on Cloudinary, you don't need to manage them locally, which really helps when there are hundreds of GB of images.
8. Take control of the configuration
The environment-specific configuration causes a special type of headache. You know: connections to databases, secrets, passwords, etc. – all the things you don't want stored in Git, but rather set during the initial deployment based on the environment. It is good practice to keep this configuration to a minimum. It is usually handled by the parameters.yml file, which is generated from parameters.yml.dist by Composer.
In Symfony 3 (and eZ Platform v2), the new standard is to get the environment-specific configuration from the environment itself via ENV variables. Of course, it will still fall back to parameters.yml, or to a dotenv configuration for development environments.
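In practice, the two approaches look something like this (parameter names and values are illustrative only):

```yaml
# app/config/parameters.yml.dist - the template committed to Git;
# Composer prompts for the real values when generating parameters.yml
parameters:
    database_host: 127.0.0.1
    database_user: ezplatform
    database_password: ~
    secret: ChangeThisSecret

# The Symfony 3.2+ alternative: resolve the value from an ENV variable
# at runtime, e.g. somewhere in config.yml:
#
# doctrine:
#     dbal:
#         password: '%env(DATABASE_PASSWORD)%'
```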
9. Take control of application logs
Once a solution goes to testing, staging, and production instances, you are less aware of the problems. That's why errors need to be logged for later examination. The trouble is that going through logs is not easy, and you might lose older logs to file rotation. Sometimes you don't have access to the production instance and somehow need to arrange the transfer of the logs to you. All in all, it's a pain.
Recently, we came across a really nice tool called Sentry which gathers production errors into one interface. The great thing is that it gathers much more data than what you have in a single log line. It is much easier to triage the errors, figure out the problem, and focus on solving it. There is a Symfony bundle for Sentry that is very simple to install. Sentry also supports many other languages and frameworks, including frontend technologies like React.
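Assuming the sentry-symfony bundle is installed and registered, the setup is essentially a single configuration key (the DSN comes from your Sentry project and is best supplied via the environment, as in the previous section):

```yaml
# app/config/config.yml - point the bundle at your Sentry project
sentry:
    dsn: '%env(SENTRY_DSN)%'
```

From then on, uncaught exceptions are reported automatically, with request context attached.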
Finally, you need to monitor all of this so you know what is happening on all levels at any given time. This includes system-level monitoring of resources, as well as application monitoring with tools like New Relic and Tideways.
There are many ways to deal with the complexity of eZ Platform architecture. Tools like Ansible, Docker, Kubernetes, Platform.sh, etc. emerged to solve some of the problems. We need to choose which ones are the most appropriate for our use case before implementing them, slowly, in our daily routine. Tough job.
What is your experience in handling architecture complexity?