In this article we will describe a modern setup for a backend built of micro-services, deployed in Docker containers on a Cloud infrastructure. The article gives an overview of the available components and technologies. It is not intended to recap or summarize existing documentation on this topic, but to give a comprehensive overview and to explain the pros and cons of different approaches and strategies which have proven successful for 3SS. We will explain the setup based on our own product development activities in order to provide practical examples and experiences.
When setting out to design and build the architecture for the backend infrastructure and system for our product, we had to keep in mind a few facts about our potential customers:
Peaks in usage at prime-time
VOD and TV services usually see high usage-peaks at certain hours of the day and days in the week, while the usage drops in the intervening periods. With a classic setup, the infrastructure needs to be dimensioned to handle the big peaks and then be less used or unused for the remaining 80-90% of the time between the peaks.
Constantly growing user-base
New services typically start with lower subscriber numbers and constantly gain new users. Therefore, there is a need for an infrastructure which scales with growth, but which can be maintained and upgraded with minimal effort.
We decided to move away from a monolithic backend architecture to a solution which allows us to scale with our customers and does not restrict us in any way as regards future technology choices. Moreover, we wanted to make sure that we can maintain, upgrade and develop individual services independently while limiting the potential impact of errors when making production updates.
Although right now all our services are written in node.js this does not mean that, in the future, we will not decide that Python, Go or any other language might be better suited for a certain task.
As the title of this article suggests, the infrastructure which we are describing has the following main components:
Cloud Hosting
We use a Cloud-based infrastructure to host our cluster and containers. The same setup can be replicated locally or on a dedicated hosting environment, although, of course, the advantage of scalability is missing in that case. Cloud hosting does not only make it possible to rent CPU power or virtual machines from providers such as AWS or Google Cloud; it also offers a large set of services and products which make deployment and hosting more convenient.
Docker Containers and Cluster
The container system we are going to explain and describe is Docker. The containers are organized in clusters using an orchestration tool to manage the nodes.
A simplified view of this setup is a set of Docker containers grouped into a cluster of Cloud hosts and managed by an orchestration tool. We will provide further details in the following sections.
We will not dive into this topic too deeply, as it is a huge subject in its own right. Micro-Service-based architecture has emerged in recent years with the shift towards web-based applications and the wide use of third-party services. The main principle behind it is often summarized as "Do one thing and do it well" rather than attempting to be a master of all. This applies very well to API development, which is a crucial part of web-based applications.
From the development perspective, one of the biggest benefits of Micro-Service-based architecture is that it enables the smallest possible teams (down to individual developers) to own, develop and maintain individual services. When delivering new builds, the increments and scope are small enough to allow timely review and assessment with minimal effort, allowing faster roll-outs with less risk.
Micro-Services are usually characterized as:
Stateless and stupid – They are built to serve very specific purposes. They are unaware of and neutral to any state or session. They should act as a straight-forward “input – process – output”.
For our own product development, backend services perform a lot of communication and mediation between various other web-services and backends, while holding only a very limited amount of inherent functionality of their own. This also makes Micro-Service-based architecture a very good fit to achieve high performance.
A container image is an executable package which contains everything required to run an OS and applications. Containers isolate applications and OS in separate environments; therefore, they can ensure that there are no conflicts between versions of packages and dependencies on the same host. Containers are portable and can be executed on any host which runs the corresponding engine.
Container Architecture vs. Virtual Machine Infrastructure
One of the most common misconceptions about Docker is that it is a fully virtualized system like VirtualBox or Vagrant. In fact, Docker (and other container solutions) uses OS-level APIs to share and consume resources with the host system, but it does not replicate a complete machine.
A Docker-based infrastructure consists of four building blocks:
Docker (Server and Client)
The Docker engine which executes the Docker containers is the "bridge" to the host system. The Docker client is a command-line tool which communicates with the daemon and is used to execute the actual commands.
Image + Container
The images contain the OS, libraries and application. They are built using a "Dockerfile" and are then executed as containers by the Docker engine.
Registry
The registry holds the Docker images in a central repository. There are public registries such as Dockerhub. For our deployment we decided to use our own registry with Nexus OSS.
Docker itself uses a client-server architecture. The host system which should run the containers needs to have the Docker server/daemon (the Docker engine) installed, while the client sends the commands to it.
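For illustration, a minimal sequence of client commands, each of which is handled by the daemon on the host (the image, container name and ports are example values of our own):

# pull an image from the registry
docker pull nginx:latest
# start a container from the image, mapping host port 8080 to port 80 in the container
docker run -d --name example-web -p 8080:80 nginx:latest
# list the containers currently managed by the daemon
docker ps
# inspect the output of the container, then stop and remove it again
docker logs example-web
docker stop example-web
docker rm example-web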
Using Docker containers rather than another approach (e.g. virtual machines, virtual hosts) brings us some big advantages:
1. Using containers, all environments from development to production are identical for the application. One of the biggest issues when developing backend applications is that, typically, the environments are not identical. When deploying builds, dependency issues arise ("Oh, this is version 10.2.1.2 – on staging we had 10.2.1.1 and it worked fine"). Environments for development/staging/production are not the same; they are configured differently, and a test done by QA on any of these does not guarantee there will not be an issue after deployment. In the past, replicating a full setup was a major challenge and took huge effort. Using containers, we can now run exactly the same environment anywhere.
2. Compared to virtual machines, our hosts require less overhead and deliver better performance. At one point, we experimented with using Vagrant as a solution for the problem of different environments. While the issue could be solved with this, using a full virtual environment requires a lot more resources from the host to run them.
3. We do not deploy source-code or builds, but full environments. When using containers, deployment no longer requires updating the builds on the target environments. Instead, we deploy the environment as well. This reduces the problem of builds breaking because a step of the build-process fails or is not working as expected. Since we deploy a container which is provisioned to be ready-to-run, we know that what we put on the target environment is actually running and contains a proper, functioning build. This is part of our CI-pipeline.
4. We can easily maintain containers, also in a collaborative way. Docker containers have version management with "layers", which is comparable to classic source control management. When making changes to the configuration or to the container itself, only the difference between the last version and the new version is stored. This makes updates to Docker containers fast and reduces the amount of data that needs to be transferred.
Docker containers are created from images which can be either just an operating system with the basic packages, or a complete application stack which is launched together with the container. Each action taken on the Docker container adds a layer on top of the previous one. These steps are defined in a "Dockerfile":
Dockerfile example:
#
# This is a simple Dockerfile
#
FROM ubuntu:latest
MAINTAINER John Doe "john.doe@3ss.tv"
RUN apt-get update
RUN apt-get install -y python python-pip wget
RUN pip install Flask
ADD hello.py /home/hello.py
WORKDIR /home
# command executed when the container starts
CMD ["python", "hello.py"]
This Dockerfile takes the latest ubuntu image available (on Dockerhub) and runs 3 commands to update the package lists and install Python and Flask. Afterwards, it copies the application file hello.py into the image, sets the working directory and defines the command which starts the application when the container runs. Similar to the example above, basically any commands and actions can be executed through a Dockerfile. This is useful in order to create, for example, images based on a pre-configuration and then extend them with additional steps which need to be adapted or extended over time, e.g. the installation of additional packages or further steps to configure and run a service.
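To actually use this image, only two further commands are needed. The image tag hello-flask and the port mapping below are example values of our own, and we assume that hello.py starts a Flask application listening on port 5000:

# build the image from the Dockerfile in the current directory
docker build -t hello-flask .
# start a container from the image and expose the Flask port
docker run -d --name hello -p 5000:5000 hello-flask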
5. We can scale our environments without any impact on the application itself
Using a cluster and orchestration, we can run as many instances as we need of any container, based on very simple rules for load-balancing. Since we use individual containers for each service, we can also scale up just the one specific entry-point which sees a lot of usage.
Cloud-based architecture is clearly a big topic on its own. Using containers makes it easier in some ways, because in the simplest setup only the actual computing power of the Cloud is required: hosts are allocated to run the Docker engines. When moving from our experimental setup to a more stable and production-ready setup, we still encountered some pitfalls that need to be considered when choosing the provider and the details of the setup:
Pricing - Understanding the pricing models and their impact on how you build and deploy the system is not as easy as it seems. Besides the cost calculation based on resources (CPU, RAM etc.), important factors are the additional services or tools from the cloud provider which are used for load-balancing, management etc.
Since using such an infrastructure means that you will have a lot of running “machines”, it is crucial to be able to monitor the performance and health of your instances and the applications.
Finding a good setup here was especially challenging because of the amount and variety of data and data sources. Gathering the data itself is complex, since a large number of individual containers multiplies the amount of sys-logs, service-logs and application logs that need to be collected and structured.
We decided to use the ELK stack to achieve this. ELK is the acronym for Elasticsearch, Logstash and Kibana, a set of 3 tools used in combination for retrieving (Logstash), storing and indexing (Elasticsearch) and displaying (Kibana) the data. To read and deliver the log files from the machines, we use Filebeat, which is able to read logs from stdin and from files. It keeps track of the state of each log and makes sure that every log entry is delivered successfully at least once.
In order to collect and structure the logs we applied the following:
Each of our services writes its logs in JSON format, containing attributes based on the environment parameters of the container and identifying the service by a service_name. The pods write these logs into log files which are stored in different folders based on environment/customer/service. On every service host, Filebeat runs as a DaemonSet (one pod running on every host whose label matches the node selector of the pod). Filebeat collects the logs and sends them to Logstash. Logstash performs filtering based on the document type (which is added by Filebeat for every kind of service log) and saves the entries into the service-log indexes of Elasticsearch. Based on the service names and service types, we are able to filter and search the logs to see the logs of individual services and/or environments etc.
As we use nginx as the entry-point to all our services, we also collect the nginx logs, including request time, upstream time and the X-Forwarded-For entries, and enrich them with geolocation information. This way we can monitor not only service health but also the performance of our APIs, and correlate the information.
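As a sketch of how this filtering can be used outside of Kibana, the logs of one service on one environment can also be queried directly through the Elasticsearch search API. The host, index pattern, service and environment names below are hypothetical examples:

# fetch the latest log entries of one service on the staging environment
curl -s 'http://elasticsearch:9200/service-logs-*/_search?q=service_name:auth-service+AND+environment:staging&size=10&pretty'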
Even though Docker containers are not full "virtual machines", they still act as completely independent entities which require an OS and a full configuration. For example, to run a simple container which hosts an nginx webserver, one needs to provide an OS which runs nginx and configure it according to one's needs.
Typical use-cases for Docker containers involve running web-services, in most cases isolated Micro-Services. This, and the fact that Docker containers are, as described, not complete virtual machines, allows the use of very slim operating systems. This is especially important since one of the factors for performance and cost management is the size of Docker images: the smaller the image, the less space needs to be allocated and the faster the container will start. Sticking with the example of an nginx webserver, there is no need to set up a full Debian OS with all libraries and dependencies. This is why container-based architecture has led to the increasing popularity of minimal Linux distributions similar to BusyBox. These distributions are designed to be very lightweight and to contain only the absolute minimum of packages required to run the desired applications. Distributions such as CoreOS or RancherOS are specifically designed to run containers (or, more specifically, Docker containers) and have corresponding configurations and modifications.
Usually a single Docker container will not be the setup that you are looking for. The fun with Docker containers starts when you are able to spawn and remove containers on-the-fly as your need for resources changes – ideally without any manual effort. For this, an orchestration tool is needed. The Docker engine already provides some of this out of the box, as you can start, stop and create Docker containers with simple command-line commands. However, those capabilities are too limited for an automated setup, for which you will need another engine which monitors resource consumption and the usage and health of your containers, and which is capable of managing both the instances and the traffic.
There are different solutions available which cover different needs:
Docker Swarm is the native clustering system by Docker. It is part of Docker and available by default, and it uses the standard Docker APIs. Naturally, it is the closest to the way Docker works and, therefore, the easiest to understand.
Kubernetes is an orchestration and clustering system originally developed by Google which deviates in some ways from "native Docker" operation. Yet, Kubernetes is the most popular system when it comes to larger-scale production environments which need to be capable of adapting to different resource requirements based on load and traffic with high reliability.
Kubernetes and Google Cloud make a good combination, with a few notable benefits for our infrastructure:
Upgrading Kubernetes clusters to a new version works automatically, with a simple click.
The master of each Kubernetes cluster is maintained by the Google infrastructure, so there is no need to allocate a server for it.
Fast support and bug fixing are ensured. New bugs are usually fixed within a few days.
The biggest advantage of using Kubernetes is the script-based automatic deployment of Micro-Services.
The setup of the load-balancer does not require much knowledge or experience. With one command, Google Cloud reserves the static IP and sets up the load-balancer configuration. The setup for filtering and firewalling is very straightforward.
With just a few commands to select the machine types and the number of nodes for a group, a complete cluster can be set up, including a dashboard and monitoring through Grafana and Heapster.
Upgrading services, or even changing properties such as the ENV parameters of a deployment, is easy and can be automated through Bash scripts, which greatly helps with the maintenance of the services; the commands involved are sketched below.
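With hypothetical cluster, deployment, registry and image names, the commands behind both the cluster setup and such maintenance scripts could look like this:

# create a cluster with a chosen machine type and number of nodes
gcloud container clusters create demo-cluster --num-nodes=3 --machine-type=n1-standard-2
# roll out a new image version for one service
kubectl set image deployment/user-service user-service=registry.example.com/user-service:1.4.2
# change an ENV parameter of the deployment
kubectl set env deployment/user-service LOG_LEVEL=debug
# scale the service up for the prime-time peak and watch the rolling update
kubectl scale deployment/user-service --replicas=6
kubectl rollout status deployment/user-service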
Using kubeadm, it is also possible to set up a cluster on your own Linux servers. The latest version seems to be stable and makes it easy to scale the cluster by adding a new host and labelling it for a service. When doing so, the pods are migrated and scaled automatically to the new host.
However, during the upgrade to a new version of Kubernetes we faced some issues and crashes, so we would still not recommend a kubeadm-based setup for production.
Rancher is appealing, as it offers a very user-friendly and extensive UI which makes management of containers and clusters easy. As with any user-friendly UI, this comes at the price of "hiding" the actual Docker internals, which can make it very hard to track down and solve issues.
Rancher is very useful for development. It is easy to set up, and it is easy to deploy services and configure them using the Rancher UI, which we always recommend for the fast development of small projects running a smaller number of containers. It also makes a lot of sense for setting up the whole Micro-Service-based application from the development and configuration point of view: any DevOps-skilled developer can do this, while the final production system can still be based on Kubernetes or any other setup.
In production environments, Rancher should be used with care. We found that, sometimes, services and pods stopped responding, did not get IP addresses, and similar issues occurred. These problems can be fixed easily in development environments and the fixes then applied to production, but they still require time and attention.
Conclusion: For a production setup which meets the expectations of scalability, automation and reliability, at least in the non-enterprise, open-source area, Kubernetes is the best solution, even though it has the steepest learning curve.
As our infrastructure is based on Google Cloud and Kubernetes, we decided to also use the load-balancing solution provided by Google Cloud.
New services (in our case nginx) are published using an Ingress deployment which references the backend service name, a Secret (created beforehand from the SSL key and certificate) and the public port. When this Ingress is deployed, the Google Cloud load-balancer is automatically created, a static IP is allocated, the forwarding rule is applied and the backend servers are added.
All the services communicate through port 80; SSL termination is handled by the Google Cloud load-balancers.
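A minimal sketch of such an Ingress deployment, assuming a backend service called nginx on port 80, a previously reserved static IP named web-static-ip and certificate files on disk (all names are example values):

# create the Secret from the SSL key and certificate
kubectl create secret tls web-tls --cert=tls.crt --key=tls.key
# deploy the Ingress, which triggers the creation of the Google Cloud load-balancer
kubectl apply -f - <<EOF
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: web-static-ip
spec:
  tls:
  - secretName: web-tls
  backend:
    serviceName: nginx
    servicePort: 80
EOF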
In Google Cloud, auto-scaling can be activated for node groups. When the load requires it, Google Cloud adds another host to the cluster, distributes pods to that server and automatically configures the load-balancer to point to this server as well.
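For example, node auto-scaling can be switched on for an existing node group with a single command (cluster and node-pool names are example values):

# let Google Cloud add or remove hosts between 3 and 10 nodes depending on load
gcloud container clusters update demo-cluster --enable-autoscaling --min-nodes=3 --max-nodes=10 --node-pool=default-pool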
From the development point of view, one of the biggest advantages of containerization is the fact that development, building, testing and deployment can be separated very clearly through well-integrated processes. Moreover, after the initial set-up, DevOps-skilled developers can manage all of these parts themselves, and this can often even be automated. Using containers also enables a much cleaner handling of inter-dependencies and hence opens up many more possibilities in the technology stack, including the possibility to move towards more component-based, modular development. The fact that building a Docker image is merely running a set of commands is already a huge step towards the automation of checking, building, testing and deploying the Micro-Services, which saves a lot of effort and pain when working on medium or large projects. An important note here: we must not forget that modularization and CI-based development add a lot of project, task and documentation management overhead. Therefore, the decision to go with this approach in a project needs to be weighed thoroughly.
As described earlier in this article, the basic Docker infrastructure consists of the Docker engine, container, image and repository. When integrating a CI-pipeline for Docker, the main task after a successful build is to publish images to the repository from where they can be pulled by the target environments and executed. In a nutshell, this is also the CI-pipeline, although there are more steps required for a fully automated setup:
Building Docker images out of GitLab: the "docker build" command creates a new image based on the base image and the Dockerfile, which can then be deployed.
Pushing Docker images to the repository using the "docker push" command.
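Put together, the core of such a pipeline job is just a handful of commands. The registry URL, image name and tag below are placeholders for our own values:

# build the image of the service from its Dockerfile
docker build -t registry.example.com/my-service:1.0.42 .
# log in to the private registry (Nexus OSS in our case) and push the image
docker login registry.example.com
docker push registry.example.com/my-service:1.0.42
# let the cluster pull the new image and roll it out
kubectl set image deployment/my-service my-service=registry.example.com/my-service:1.0.42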
Getting the right setup and architecture up and running is not as simple as it seems at first sight. When we started our development, we also had to face the issue that a lot of the software we used was still at a very early stage of its development. Often, new releases introduced breaking changes, while other things simply did not yet work as expected. This situation has improved, but challenges remain.
Moreover, building the CI pipeline proved to be quite complicated. Building containers often failed, and finding out why involved a lot of trial and error.
The last step, setting up the orchestration, required a lot of testing and research, as there were not many reference implementations or adequate documentation on how to do it correctly.
Once these challenges are overcome, the benefits are huge:
Deployment is easy and fast; it works automatically from your CI process and you can be sure that what you deploy is actually working.
Onboarding new developers, starting new services, testing and building have all become far more efficient and easy.
As with any solution, there are topics that need close attention. A few in this case would be:
Generating the release notes for a service
Automating the testing of the generated build
Scalable and easy centrally manageable configuration of the services
The container landscape is developing very rapidly right now. New container-optimized OS, orchestration and management tools are being developed and published and some of our approaches might need to be revised to keep up with developments.
by Andrei Oneț, Raul Boldea, Paul Bodean, Eugen Meltis and Dan Sabadis