Message brokering is an important part of most modern applications, and picking the right software has consequences for the flexibility and features of your application. I will be breaking down the differences between Apache Kafka and RabbitMQ and when each should be used. Each of these technologies has its own strengths, intended use cases, and developer libraries, which help indicate whether it is right for your situation.
Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation and was initially released in 2011. It is built as a streaming platform with three key capabilities:
- Publish and subscribe to streams of records
- Store streams of records
- Process streams of records as they occur
Now that you know its capabilities, here are the two classes of applications it is generally used for:
- Real-time streaming data pipelines that get data between applications
- Real-time streaming applications that react to the streams of data
A key selling point is its ability to process streaming data under heavy load, which makes it highly valuable for enterprise infrastructure. It can also connect to external systems via Kafka Connect, and it provides Kafka Streams, a library for building stream-processing applications.
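To make the three capabilities concrete, here is a minimal in-memory toy, assuming a topic can be pictured as an append-only list of records with a read position (offset) tracked per consumer group. `ToyTopic`, `poll`, and `process` are hypothetical names for illustration; this is not the Kafka client API and ignores partitions, replication, and durability.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical stand-in for a Kafka topic: an append-only list of records."""

    def __init__(self):
        self.log = []                    # stored records; reading never deletes them
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, record):
        """Append a record and return its offset (its position in the log)."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group):
        """Return the records this group has not read yet, then advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:]
        self.offsets[group] = len(self.log)
        return batch


def process(records):
    """Toy stream-processing step: keep only orders of 100 or more."""
    return [r for r in records if r["amount"] >= 100]


topic = ToyTopic()
topic.publish({"order": 1, "amount": 50})
topic.publish({"order": 2, "amount": 250})

print(process(topic.poll("billing")))  # [{'order': 2, 'amount': 250}]
print(topic.poll("billing"))           # []: nothing new for this group
print(len(topic.poll("analytics")))    # 2: another group reads the same stored records
```

Publishing and subscribing map to `publish` and `poll`, storage is the retained `log`, and `process` stands in for reacting to records as they arrive. Because each group keeps its own offset, independent consumers can read the same stored stream.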
RabbitMQ is open-source message broker software, first released in 2007, that originally implemented the Advanced Message Queuing Protocol (AMQP). It has since gained support for STOMP and MQTT. RabbitMQ is written in Erlang and has client libraries available for all major programming languages.
It is also the self-proclaimed "most popular open source message broker."
Performance is an important consideration when choosing a message broker, and Kafka is built to shine here. One widely cited benchmark showed it performing 2 million writes per second on only "three cheap machines," which is a crucial driver for choosing Kafka.
RabbitMQ, by comparison, can push 1 million messages per second using about 30 machines. Each RabbitMQ queue is backed by a single Erlang process that is scheduled on a pool of native OS threads. Because of this, a single queue cannot do more work than the CPU cycles that one process receives, and it can become a bottleneck.
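A quick back-of-the-envelope normalization of the two figures quoted above is sketched below. It assumes load is spread evenly across machines and treats Kafka "writes" and RabbitMQ "messages" as comparable units, which the two benchmarks do not guarantee, so the ratio is only a rough indication.

```python
# Figures quoted in the text; these are benchmark claims, not measurements of your system.
kafka_writes_per_sec = 2_000_000
kafka_machines = 3

rabbit_msgs_per_sec = 1_000_000
rabbit_machines = 30  # "about 30"


def per_machine(total_per_sec, machines):
    """Average throughput per machine, assuming the load is spread evenly."""
    return total_per_sec / machines


kafka_each = per_machine(kafka_writes_per_sec, kafka_machines)
rabbit_each = per_machine(rabbit_msgs_per_sec, rabbit_machines)

print(f"Kafka:    ~{kafka_each:,.0f} writes/s per machine")
print(f"RabbitMQ: ~{rabbit_each:,.0f} messages/s per machine")
print(f"Ratio:    ~{kafka_each / rabbit_each:.0f}x")
```

This prints roughly 666,667 writes/s per Kafka machine against 33,333 messages/s per RabbitMQ machine, a ratio of about 20x under these assumptions.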
That said, messages per second is tricky to measure, because many of the variables that affect it change from one application to the next, including your environment, the workload, and the configured delivery guarantees. In general, though, Kafka does seem to offer substantial performance gains compared to RabbitMQ.
Persistence is a good option to have when you're handling messages. Kafka relies heavily on the filesystem for storing and caching messages. Its developers believe the filesystem can be very fast if used correctly, so Kafka uses it for all persistence. Because messages are retained on disk, consumers can periodically load data in batches into an offline system such as Hadoop or a data warehouse.
Unlike Kafka, RabbitMQ tries not to rely on the filesystem. It distinguishes between persistent and transient messages, both of which can be written to disk. Persistent messages are written to disk as soon as they reach the queue, while transient messages are written to disk only under memory pressure. Persistent messages are also kept in memory and are removed from memory only under memory pressure.
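The persistence rules just described can be modeled with a small toy queue. This is a simplified sketch, not RabbitMQ's actual storage engine: `ToyRabbitQueue` and `memory_limit` are hypothetical, memory pressure is approximated as "more than `memory_limit` messages in RAM", and the oldest message is assumed to be evicted first.

```python
from collections import deque


class ToyRabbitQueue:
    """Toy model of RabbitMQ's persistent vs transient message handling."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # hypothetical cap on messages held in RAM
        self.in_memory = deque()          # (msg_id, persistent) pairs currently in RAM
        self.on_disk = set()              # msg_ids that have been written to disk

    def publish(self, msg_id, persistent):
        if persistent:
            # Persistent messages are written to disk as soon as they reach the queue...
            self.on_disk.add(msg_id)
        # ...and every message, persistent or transient, is also kept in memory.
        self.in_memory.append((msg_id, persistent))
        self._relieve_memory_pressure()

    def _relieve_memory_pressure(self):
        while len(self.in_memory) > self.memory_limit:
            msg_id, persistent = self.in_memory.popleft()  # evict the oldest message
            if not persistent:
                # Transient messages reach disk only now, under memory pressure.
                self.on_disk.add(msg_id)
            # Persistent messages are already on disk, so they are simply dropped from RAM.


q = ToyRabbitQueue(memory_limit=2)
q.publish("a", persistent=True)
q.publish("b", persistent=False)
print(sorted(q.on_disk))  # ['a']: only the persistent message is on disk so far
q.publish("c", persistent=False)
print(sorted(q.on_disk))  # ['a']: 'a' was evicted from RAM but was already on disk
q.publish("d", persistent=False)
print(sorted(q.on_disk))  # ['a', 'b']: transient 'b' is paged to disk when evicted
```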
The main difference between the two is Kafka's ability to replay messages, whereas in RabbitMQ a message is deleted once it has been consumed. In most situations that is not an issue, but if you're using event sourcing or want to keep a history of the message queue, Kafka is the right choice for you.
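The replay difference matters most for event sourcing, where current state is rebuilt from the full history of events. The toy below contrasts a Kafka-style log that a consumer can re-read from any offset with a RabbitMQ-style queue that discards messages once consumed. Both classes and the `balance` helper are hypothetical illustrations; acknowledgements, multiple consumers, and retention limits are deliberately left out.

```python
from collections import deque


class LogTopic:
    """Kafka-style toy: records stay in the log, so any offset can be re-read."""

    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)

    def read_from(self, offset):
        # Reading never deletes anything, so earlier records can be replayed.
        return self.log[offset:]


class WorkQueue:
    """RabbitMQ-style toy: a message is removed once it has been consumed."""

    def __init__(self):
        self.messages = deque()

    def append(self, message):
        self.messages.append(message)

    def consume_all(self):
        consumed = list(self.messages)
        self.messages.clear()  # consumed messages are gone for good
        return consumed


def balance(events):
    """Event sourcing: rebuild an account balance from its event history."""
    total = 0
    for event in events:
        kind, amount = event.split()
        total += int(amount) if kind == "deposit" else -int(amount)
    return total


history = ["deposit 100", "withdraw 30", "deposit 5"]

events = LogTopic()
queue = WorkQueue()
for e in history:
    events.append(e)
    queue.append(e)

print(balance(events.read_from(0)))  # 75
print(balance(events.read_from(0)))  # 75 again: the full history can be replayed
print(balance(queue.consume_all()))  # 75
print(balance(queue.consume_all()))  # 0: the history is gone after the first read
```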
Scalability is critical when picking a message broker, and you want the ability to scale horizontally across multiple nodes. Both Kafka and RabbitMQ can do this, although Kafka is designed around clustering, whereas RabbitMQ is often run on a single node.
Kafka clusters typically consist of multiple brokers, which use ZooKeeper to maintain their cluster state. A single Kafka broker can generally handle hundreds of thousands of reads and writes per second. Kafka also has a master node (the controller), which is chosen through leader election coordinated by ZooKeeper.
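In ZooKeeper-based Kafka deployments, controller election works roughly like a race: each broker tries to create an ephemeral `/controller` node, the first one to succeed becomes the controller, and if that broker's session dies the node disappears and the election runs again. The sketch below is a simplified toy model of that idea under those assumptions; `ToyZooKeeper` and `elect_controller` are hypothetical and are not ZooKeeper's or Kafka's real API.

```python
class ToyZooKeeper:
    """Hypothetical stand-in for ZooKeeper: ephemeral nodes keyed by path."""

    def __init__(self):
        self.nodes = {}  # path -> id of the broker session that owns the node

    def create_ephemeral(self, path, owner):
        """Create the node if it is free; return True on success (first writer wins)."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def expire_session(self, owner):
        """When a broker's session dies, its ephemeral nodes disappear."""
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


def elect_controller(zk, live_brokers):
    """Every live broker races to claim /controller; whoever holds it is the controller."""
    for broker in live_brokers:
        zk.create_ephemeral("/controller", broker)
    return zk.nodes["/controller"]


zk = ToyZooKeeper()
brokers = [1, 2, 3]
print(elect_controller(zk, brokers))  # 1: the first broker to register wins

zk.expire_session(1)                  # broker 1 crashes
brokers.remove(1)
print(elect_controller(zk, brokers))  # 2: the survivors re-run the election
```

The real system also notifies brokers through watches and bumps a controller epoch to fence off stale controllers; the toy keeps only the "first to create the node wins" rule.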
Unlike Kafka, which has leader and follower nodes, RabbitMQ generally consists of equal peers, so there are no special (leader) nodes. This becomes more complicated when queue mirroring and plugins are taken into consideration, but in the vast majority of situations all cluster nodes are equal.
Client libraries are essential to a messaging system, and a clear, easy-to-understand library can make a big difference when comparing message brokers. Both Kafka and RabbitMQ cover many languages, including all of the major ones. Below is a rundown of the available languages as of May 2018.
- Groovy and Grails
Both of these messaging systems are good and mature, and large companies use both in large-scale production environments. RabbitMQ is easier to set up for small applications, but you cannot beat the speed, persistence, and ease of clustering that Kafka provides. I will be using Kafka for the foreseeable future because I believe it outshines RabbitMQ in all the important categories described above.
Let me know what you think about Apache Kafka vs. RabbitMQ, and why you would choose one over the other.