Building an AI solution, beyond any software and techniques, is about the time spent obtaining meaningful results, and time translates into hardware requirements and management. These requirements vary depending on the task at hand. In some cases a model can be trained on a single CPU/GPU machine, while in other cases you will need a highly scalable infrastructure to cope with the load, which can be driven by the amount of data or by the techniques used to train the model. Having an infrastructure that can quickly adapt to the load is essential for getting results, and this is precisely what Kubernetes is all about.
To explain what Kubernetes is and why it is relevant for AI, let's look at what building an AI solution requires:
training the model: it may take many hours (even days) and it may require specialized hardware
managing the models: usually more than one model needs to be trained, tested and deployed. Deploying a new model must be done in a non-disruptive way, and you must be able to test, monitor and, if needed, revert your changes easily.
using the model: predictions need to be made quickly, using a multitude of models
availability and scalability: a good model is useless if it's not easily accessible by users and applications
efficiency and cost: you need to balance all the performance requirements with the budget constraints
There is a lot to say about Kubernetes, about what it is and what it is not. In this article I'll walk you through general aspects of Kubernetes and how it is used to build fast, scalable and reliable solutions. This is by no means meant to be a Kubernetes tutorial, but rather to give you a high-level view of why it is a platform well suited for AI.
All things have a start, and this start is the need for containerization. What is containerization and why do we need it? Well, those who remember the "good" old days of building and deploying applications will also remember the pains of managing and scaling those applications. One fundamental problem was that the processes we deployed had access to all system resources, with few mechanisms to constrain them. By resources I mean things like CPU, RAM, disk, network and so on. Why does this matter? It turns out to be critical, because you want to use data centers in the most efficient and profitable way. Even though data centers use powerful machines (server-class machines) with lots of CPUs, RAM, disk, GPUs, TPUs and FPGAs, these resources are neither infinite nor free.
What about using commodity hardware? These are less capable machines, but they are considerably cheaper and you could use lots of them. This is not very practical, however, because it leads to physical space and power consumption problems, which in turn mean higher costs and a larger carbon footprint. This approach simply does not scale, and practice has shown it.
Returning to the need for containers and resource management: the very notion of public clouds would not exist today without solutions for resource management and security constraints.
We also cannot talk about containerization without talking about virtual machines (VMs). VMs have been around for a long, long time, while containers are somewhat newer. VMs allow partitioning the physical machine's resources in a way that lets different operating systems (OS) run on the same physical machine (aka bare-metal).
Fig 1: VM vs Container
As shown in the picture above, each VM comes with its own OS. You can have multiple VMs with the same or different OS on the same physical machine. On the other hand, we see that containers share the same OS. Yes, a single OS manages multiple containers, so in this case you are bound to the same OS. Linux is most common in the container world, but Windows also supports containers.
Much like a VM, each container is bound to the system resources allocated to it, but unlike VMs, containers are cheaper to manage and significantly faster to start. This is why containers quickly became so popular. In practice, VMs still play a crucial role for cluster-level separation, as one can create clusters using different OS distributions and versions. This means that consumers are still insulated from the bare-metal machines.
But what are containers exactly? We still haven't answered this question. Without getting into details, Linux containers are built on Linux kernel features such as cgroups, which limit the system resources a group of processes can use, and namespaces, which limit what those processes can "see". However, almost nobody uses these kernel features directly, as there are better technologies that build on them, like Docker. Furthermore, there is the Open Container Initiative (OCI), a specification for containerization which, in turn, makes Docker just one implementation. In practice, all cloud providers predominantly support and use Docker. You may also have seen articles saying that Kubernetes is deprecating support for Docker. This is because, thanks to OCI, Docker (itself built on top of containerd) can easily be replaced by other container runtimes such as CRI-O. The Docker images that everyone is used to follow the OCI image specification, so there is no impact on existing applications. While we are on this topic, Kata Containers are also very interesting to look at.
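For instance, resource constraints can be attached to a container right when it is started; a minimal illustration with Docker (the image and the limit values are arbitrary), where the limits are enforced through cgroups under the hood:

docker run --rm --cpus=1.5 --memory=512m python:3.11 python -c "print('hello from a resource-constrained container')"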
So, how does Kubernetes relate to all this? Isn't Docker enough if we want to deploy containers and manage resources? The short answer is no. Many modern architectures rely on microservices concepts, or the solution is composed of various types of components. These components need to communicate with each other, handle failover, roll out updates, control security and so on. Managing all these aspects at the container level is difficult. Kubernetes allows us to define higher-level constructs to better manage deployment, scaling and failure handling. Concepts such as PODs, Services, Jobs, Deployments and more allow us to better modularize and package software components and workloads.
There are plenty of materials and online tutorials about Kubernetes (k8s), so describing everything here would be redundant. However, a brief overview cannot hurt anyone :)
Jobs provide a way to define one or more PODs that perform a certain task to completion; once the task is done, the cluster resources are released. Examples: ETL jobs, machine learning training jobs etc. (a sketch of such a Job spec is shown after this overview).
Yes, Kubernetes allows you to define your own objects/resources (via a yaml file) by first defining a CustomResourceDefinition (CRD) for your resource and then creating multiple CR resources of that kind. CRDs are to CRs what types are to values in programming languages :) These concepts represent the very foundation of the Kubernetes Operator pattern, which I'd urge you to look into if you want to dive deeper into Kubernetes.
There are many other types of Kubernetes resources like ReplicaSets, StatefulSets, DaemonSets, Secrets, ConfigMaps, Volumes, PersistentVolumeClaims etc that are beyond the scope of this article.
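To make the Job concept concrete, here is a minimal sketch of what such a spec might look like; the name, image and command are hypothetical placeholders for a real training job:

apiVersion: batch/v1
kind: Job
metadata:
  name: train-model                        # hypothetical Job name
spec:
  backoffLimit: 2                          # retry a failed POD at most twice
  template:
    spec:
      restartPolicy: Never                 # the POD runs to completion and is not restarted
      containers:
      - name: trainer
        image: my-registry/trainer:latest  # hypothetical training image
        command: ["python", "train.py"]    # hypothetical entry point
        resources:
          requests:
            cpu: "4"
            memory: 8Gi

Once the task in the POD completes, Kubernetes marks the Job as finished and the resources are freed.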
How do you create and then manage all these things? There are some simple ways:
Use the kubectl tool - a command line tool that you can use to communicate with your Kubernetes cluster. For instance, to create a POD you need:
A .yaml file that contains the POD specification
Example: kubectl apply -f ./my-pod.yaml (a sketch of what such a file might contain is shown after this list)
Use a programmatic way to communicate with the Kubernetes cluster. This can be done in various ways:
Use the Kube REST APIs
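As an illustration of the kubectl route, a minimal my-pod.yaml could look like the sketch below; the names, image and resource values are hypothetical, and the resources section is where the CPU/RAM constraints discussed earlier are expressed:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod                     # hypothetical POD name
spec:
  containers:
  - name: app
    image: my-registry/app:1.0     # hypothetical container image
    resources:
      requests:                    # what the scheduler reserves for this container
        cpu: "500m"
        memory: 256Mi
      limits:                      # hard caps enforced at runtime
        cpu: "1"
        memory: 512Mi

For the programmatic route, one option is the official Kubernetes Python client (the kubernetes package); a minimal sketch, assuming a kubeconfig is available locally:

from kubernetes import client, config

# load credentials from ~/.kube/config (inside a POD you would use load_incluster_config)
config.load_kube_config()

# list the PODs in the default namespace and print their status
v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name, pod.status.phase)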
Now we are better prepared to answer how all this plays a role in AI. No matter what algorithms and techniques you are using for machine learning and AI, you still need to make them operational. This means a few things:
You need computing resources to perform expensive ML/DL training, hyperparameter optimization, model selection and so on. These are extremely compute-intensive tasks that require multiple CPUs, lots of RAM and, very often, GPU or TPU resources. With such a high demand for resources, you need an efficient way to schedule these workloads and run them in a controlled and secure way.
In short, to make AI operational you need a good way to allocate hardware and software resources, as well as manage security in a coherent and flexible manner.
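As a concrete illustration, a training container can declare the hardware it needs and the Kubernetes scheduler will only place it on a node that can satisfy the request. A minimal sketch of the resources section of such a container spec (GPU scheduling assumes the cluster exposes GPUs through a device plugin; nvidia.com/gpu is the usual resource name for NVIDIA cards):

resources:
  requests:
    cpu: "8"
    memory: 32Gi
  limits:
    nvidia.com/gpu: 1    # hypothetical: one GPU per training POD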
Almost all the big cloud providers (Google, Amazon, IBM, Microsoft Azure) offer models as a service. The idea is simple: deploy a model and use an API (REST/gRPC) to make predictions in near real time. A simplistic architectural view of such a service could be:
Fig 2
We highlight 2 cases here:
Multiple models co-located in the same container, so several models are kept and managed in memory at once
The proxy has the role of routing requests to the correct runtime, based on the metadata it keeps (in many cases in an etcd database). It can also perform tasks like authentication (e.g. OpenID Connect token validation) and/or authorization.
We can also observe that there are multiple runtimes, materialized as multiple PODs. This means we can easily mix Python-based, native (C/C++, Rust), JVM-based, R and other runtimes. This is important because the landscape of ML frameworks is quite vast: various programming languages are used, and they come with different dependencies.
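From the consumer's point of view, making a prediction is just an HTTP call to the serving endpoint. A minimal sketch using Python's requests package (the URL, payload schema and token are hypothetical and depend on the actual serving runtime):

import requests

# hypothetical endpoint exposed by the model-serving proxy
url = "https://models.example.com/v1/models/churn/predict"
payload = {"instances": [{"age": 42, "plan": "premium"}]}    # hypothetical feature payload
headers = {"Authorization": "Bearer <token>"}                # e.g. an OpenID Connect access token

response = requests.post(url, json=payload, headers=headers, timeout=10)
print(response.json())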
Things seem simpler in the batch scoring case, because the idea is to:
Read data from a database
Predict each record
Once the job is finalized, the cluster resources are released.
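A minimal sketch of such a batch scoring job in Python (the database, table, schema and predict function are hypothetical placeholders; in practice the script would run inside a Kubernetes Job like the one sketched earlier):

import sqlite3

def predict(features):
    # placeholder for a real model call (e.g. a scikit-learn or TensorFlow model loaded at startup)
    return 1 if sum(features) > 1.0 else 0

# hypothetical database and table
conn = sqlite3.connect("scoring.db")
rows = conn.execute("SELECT id, feature_a, feature_b FROM records").fetchall()

# score every record; a real job would write the scores back to a store
for record_id, feature_a, feature_b in rows:
    print(record_id, predict((feature_a, feature_b)))

conn.close()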
The flow here is similar to the batch case:
Listen for incoming events from a stream of data (e.g. Kafka, AMQP)
Predict each record
Unlike a batch job, the process does not end by itself; it needs to be managed, as this is a continuous process. Therefore, in a way, this is also similar to the "models as a service" case.
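A minimal sketch of such a streaming scorer, assuming the kafka-python package, a topic named events and JSON-encoded messages (all of these, plus the predict function, are hypothetical):

import json
from kafka import KafkaConsumer

def predict(features):
    # placeholder for a real model call
    return 1 if features.get("amount", 0) > 100 else 0

# hypothetical broker address and topic
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda raw: json.loads(raw),
)

# continuous loop: unlike a batch job, this only stops when the POD is stopped
for message in consumer:
    print(predict(message.value))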
AI is much more than model training and model inference/scoring. AI applications use various machine learning techniques to solve problems. The users of those applications don't see, and usually don't care, what is behind the scenes; they only care about the decisions made by the application, decisions that look remarkably similar to those a human would make. The Kubernetes platform is becoming increasingly important for public/private clouds and enterprises from the CI/CD, scaling, monitoring and operational cost perspectives - essentially making solutions available to consumers faster and more reliably.