High-performance Messaging Systems - Apache Kafka

Tiberiu Nagy - Senior developer

With the transition to event-based architectures, messaging systems have become the central component of the enterprise architecture. But as enterprises process more and more data, the performance of the messaging system becomes ever more important, requiring fast, scalable solutions. Apache Kafka is a relatively new player in the messaging systems field, but it has already proven itself as one of the best-performing messaging solutions out there. It has been benchmarked to handle up to 1 million messages per second on a 3-node cluster of commodity hardware.

Kafka was created at LinkedIn, during a period when LinkedIn was transitioning from a large monolithic database to a set of specialised distributed systems, each with its own data store. One of the challenges they faced was shipping access logs from their front-end web servers to their analytics service. They needed a system which could deliver very large data volumes to multiple destinations. None of the existing messaging solutions met their performance requirements, so they started designing their own solution under the name Kafka. The project was later open sourced and donated to the Apache Software Foundation. It has since been successfully used at various companies that needed high-throughput messaging.

Speed vs Features

The main design goal behind Kafka was to make it as fast as possible. In order to achieve high throughput, Kafka takes a novel approach to messaging, sacrificing some of the traditional messaging features in the interest of speed.

One of the most important simplifications is in the way messages are retained: producers publish messages to the cluster, where they become available for consumers to process, but consumers do not acknowledge the messages they have processed. Instead, Kafka retains every message for a configurable amount of time, and consumers are free to process any message that is still available in the cluster. While this might seem sub-optimal, it brings a number of advantages:

Because messages do not need to be selectively retained, Kafka can use a very simple and efficient storage model: the commit log. The commit log is an immutable sequence of messages that is continually appended to, but never modified. Because writes always happen at the end of the log, this structure is well suited to conventional hard disks: the disk is written sequentially as new messages arrive, avoiding costly seeks. If the consumers can keep up with the producers, Kafka can even serve messages out of the operating system's page cache, bypassing the disk completely.
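
To make the idea concrete, the toy Java class below appends length-prefixed messages to the end of a file and never touches earlier bytes. This is only a sketch of the general technique, not Kafka's actual on-disk format, which is more elaborate (segmented log files with accompanying indexes):

    import java.io.DataOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Toy append-only log: records are length-prefixed and written strictly
    // at the end of the file, so the disk never has to seek backwards.
    public class AppendOnlyLog {
        private final DataOutputStream out;

        public AppendOnlyLog(String path) throws IOException {
            // The second constructor argument opens the file in append mode.
            this.out = new DataOutputStream(new FileOutputStream(path, true));
        }

        public void append(byte[] message) throws IOException {
            out.writeInt(message.length); // length prefix
            out.write(message);           // payload
            out.flush();
        }
    }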

Another important architectural simplification is in terms of messaging patterns. Traditionally, messaging systems have offered two messaging patterns: queue and publish-subscribe. In the queue pattern, a set of consumers may read from a server, but each message is delivered to only one of them. In the publish-subscribe pattern, each message is dispatched to every consumer. Kafka provides a single abstraction over the two modes: the consumer group. Each consumer must be part of a group (even if it is the only member of that group). Within a group, only one consumer receives a given message; a message is, however, dispatched to all consumer groups. This simplification allows Kafka to use a single grouping abstraction for messages: the topic.

The Kafka messaging model can then be seen as a publish-subscribe model in which consumer groups, rather than individual consumers, subscribe to a topic. If all of the consumers belong to the same group, the topic acts like a queue in the traditional sense: only one consumer receives each message. If, on the other hand, each consumer has its own group, the topic acts as a traditional publish-subscribe mechanism: every consumer receives every message. In practice, the number of consumer groups will be small, each group usually corresponding to a service wishing to consume the messages from Kafka. Within each group there will be multiple consumers, usually one for each host running the service. Since all consumer groups receive every message, each service receives the full message stream, while message processing is load balanced between the hosts running the service.
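
As an illustration, here is how a consumer joins a group using the newer org.apache.kafka.clients Java consumer (a sketch; the broker address, topic name and group name are made-up examples). Whether the topic behaves as a queue or as publish-subscribe is decided solely by the group.id values the consumers use:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupSemanticsExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // Consumers sharing this group.id split the messages between them
            // (queue semantics); a consumer with a group.id of its own would
            // receive every message (publish-subscribe semantics).
            props.put("group.id", "analytics-service");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("access-logs"));
                while (true) {
                    // Note the absence of per-message acknowledgments: the
                    // consumer merely advances its offset in each partition.
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }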

[Figure: Message consumption with 3 producers and 2 consumer groups; m1-m5 are messages sent by the producers]

Architectural overview

From a bird's eye view, a Kafka installation consists of a set of Kafka broker nodes and a Zookeeper cluster. Kafka uses Zookeeper for coordination and cluster management, while the brokers receive, store and serve messages.

The data stored for a topic might exceed the capacity of a single broker, so Kafka further subdivides topics into partitions. Each partition is a commit log and must reside entirely on a single broker. The partitions belonging to a topic are, however, evenly distributed among the brokers, so that each broker stores an approximately equal number of partitions.


[Figure: Distribution of 4 partitions across 2 brokers]

When a producer wishes to publish a message to a topic, it requests the topology of the cluster from a broker, determines which partition to publish to (based on the contents of the message, randomly, or in a round-robin fashion), and sends the message to the broker on which that partition resides.

Things are a bit more complicated on the consumer side. The partitions of a topic are evenly distributed between the consumers belonging to a group, and each consumer processes messages only from the partitions assigned to it. This guarantees that each message is received by a single consumer within the group. The offset of the last consumed message of each partition is retained in Zookeeper, so that if a consumer goes away, its partitions can be reassigned to other consumers, which then pick up processing from where the departing consumer left off.
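
The publishing side of this flow looks roughly as follows with the newer Java producer client (topic, key and value are invented for the example). Giving a record a key makes the client hash the key to choose the partition, so all messages with the same key land in the same partition; with a null key the client spreads messages across partitions instead:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class ProducerExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key ("host-42") is hashed to pick the partition, so all
                // log lines from the same host end up in the same partition.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("access-logs", "host-42", "GET /index.html");
                RecordMetadata meta = producer.send(record).get();
                System.out.printf("written to partition %d at offset %d%n",
                        meta.partition(), meta.offset());
            }
        }
    }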

Partitioning nicely distributes the load across brokers, and thus increases throughput. But what if one of the brokers fails or becomes inaccessible? The partitions on that broker would then be unavailable or, depending on the type of failure, lost forever. To protect against this, Kafka introduces the concept of replica partitions. Each partition (from now on called the leader) has a number of replica partitions, always stored on brokers different from the one hosting the leader. A producer can never publish to a replica partition; instead, the broker holding the leader partition automatically forwards any messages it receives to the leader's replicas, so the replicas always contain the same data as the leader. If the broker holding the leader partition fails, a replica is automatically promoted to leader, and the cluster continues to operate normally. When the failed broker comes back up, it re-syncs its partitions from the current leaders, after which an election is held and partition leadership is redistributed among the brokers.
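
Both the partition count and the number of replicas are fixed when a topic is created. Here is a sketch using the Java AdminClient, which was introduced in later Kafka releases (early versions shipped a kafka-topics shell script for the same purpose); the topic name is again a made-up example:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // 4 partitions spread the load across the brokers; a
                // replication factor of 2 keeps a copy of every partition on
                // a second broker, so a single broker failure loses no data.
                NewTopic topic = new NewTopic("access-logs", 4, (short) 2);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }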

Conclusions

Kafka can be a good solution for applications that require a high-throughput, low-latency messaging solution. Its speed, simple design and flexible messaging semantics make it an ideal fit for use cases such as log and metrics aggregation, stream processing and event sourcing.

Like any technology, Kafka has its set of limitations. The inability to selectively re-consume individual messages can be a major drawback for certain types of applications. Another limitation is in terms of tooling and support: Kafka has only been around for a few years, so it does not have as rich an ecosystem as other messaging solutions such as ActiveMQ or RabbitMQ. Kafka's reliance on Zookeeper can also be a financial or administrative disadvantage, as it increases the number of machines that must be provisioned and maintained. Hopefully, some of these limitations will go away as the technology matures and gains wider adoption.