
Tim Berglund

VP Developer Relations


Gilles Philippart

Software Practice Lead

Kafka Brokers

In Kafka, brokers are the servers that store data and handle all data streaming requests. When you run Kafka, each instance of the Kafka server process is a broker. These brokers can be deployed on physical machines, cloud instances, or even Raspberry Pis. Each broker stores partitions of topics, allowing Kafka to distribute storage and processing across multiple servers for scalability and reliability.

A group of brokers forms a Kafka cluster. In this cluster, each broker is responsible for handling read and write requests from clients. When data is written to a topic, it's actually written to a specific partition on one of the brokers. Similarly, when consumers read from a topic, they pull data directly from the partition on the broker where it's stored. This distribution across multiple brokers is what enables Kafka to scale massively while maintaining high throughput.
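The idea that a record with a given key always lands on the same partition can be sketched in a few lines. Note this is an illustrative stand-in: Kafka's default partitioner hashes the key bytes with murmur2, while this sketch uses MD5 purely to show the deterministic key-to-partition mapping.

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Deterministically map a record key to a partition number.

    Illustrative only: Kafka's default partitioner uses murmur2,
    not MD5, but the principle is the same -- hash the key, then
    take it modulo the topic's partition count.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, so all events
# for "user-42" stay in order on one broker's partition.
assert pick_partition(b"user-42", 3) == pick_partition(b"user-42", 3)
```

Because the mapping is stable, all events with the same key preserve their relative order, while different keys spread the load across brokers.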

Historically, Kafka used Apache ZooKeeper to manage metadata and coordinate the brokers. That role has been taken over by KRaft, a built-in metadata management system based on the Raft consensus protocol, and as of Kafka 4.0, ZooKeeper has been removed entirely. Brokers now handle their own metadata synchronization, simplifying the architecture and improving efficiency.
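A minimal KRaft broker configuration looks roughly like this. The property names come from Kafka's KRaft documentation; the values are illustrative for a single-node setup where one process acts as both broker and controller, not a production config:

```properties
# server.properties sketch for a single-node KRaft cluster (no ZooKeeper)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kraft-combined-logs
```

In a multi-node cluster you would list every controller in `controller.quorum.voters`, and typically separate the broker and controller roles onto different nodes.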

If you're using a cloud-native service like Confluent Cloud, the details of brokers are mostly abstracted away—you focus on topics and streams, while the service manages the brokers for you. But when running Kafka on your own, understanding brokers and how they manage partitions is crucial for optimizing performance and ensuring reliability.

In summary, brokers are the backbone of Kafka's distributed storage and processing, managing partitions, handling client requests, and now, as of Kafka 4.0, coordinating metadata directly through KRaft.


Brokers

This module of our introduction to Apache Kafka is about brokers. Now, we've talked about events, topics, and partitions, but I have not formally introduced you to the actual computers that are doing the work here. From a physical infrastructure standpoint, Kafka is composed of a network of machines called brokers, each one of them running the Kafka server process. If you download an Apache Kafka tarball and uncompress it, you'll find a script that spins up the JVM process that is Kafka. Or you can do that in Docker: grab a standard Docker image, set it up in Docker Compose with a group of containers that can all see each other, and those are Kafka brokers.
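The Docker Compose route can be sketched roughly like this. The image name and environment variables follow the `apache/kafka` image's conventions; treat the values as an illustrative single-node KRaft setup, not a production deployment:

```yaml
# docker-compose.yml sketch: one broker, KRaft mode, no ZooKeeper
services:
  broker:
    image: apache/kafka:latest
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:9093
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://broker:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```

Running `docker compose up` with a file like this gives you a local broker to experiment against on `localhost:9092`.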

So they can be physical servers that you can see with blinky lights in your infrastructure. They can be instances in the cloud. They can be Raspberry Pi compute modules. You have options here. What matters is that, historically, brokers have had access to local storage: tightly coupled storage right next to the processor, usually SSDs. It's important to note that I'm talking about Apache Kafka here; that's what this course is about. Even if you never intend to operate Kafka on your own and plan to go straight to a fully managed service like Confluent Cloud, it's still important to know the basics: the abstractions, tools, and terms that Kafka is built on.

If you do run Kafka in a fully managed service like Confluent Cloud, you're almost certainly never going to think about anything like a broker. In a modern cloud-native, serverless implementation, brokers are abstracted away. They're behind the curtain, and you're just thinking about topics, messages, connectors, and all the stuff that matters to you. Brokers matter when you're getting started, when you're running locally, and when you're operating Apache Kafka yourself, because that's when you need to know how to think about them.

Collectively, these brokers form a Kafka cluster; picture a cluster of three of them. Each broker hosts Kafka partitions. We talked about partitions in the previous module, and in that example, our topic had three partitions, spread out over the three brokers, one partition each. This is how Kafka scales. Of course, you're probably not gonna have a cluster with just one topic. Picture a second topic with only two partitions; since there are only two, they're hosted on just two of the brokers. Topics can have different numbers of partitions depending on what you think their scale needs are gonna be.
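To make the different-partition-counts point concrete, here's roughly how you'd create those two topics with the `kafka-topics.sh` script that ships with Kafka. This assumes a broker is already running at `localhost:9092`; the topic names are made up for illustration:

```shell
# A topic with three partitions, spread over up to three brokers
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 3 --replication-factor 1

# A second topic with only two partitions
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic audit-log --partitions 2 --replication-factor 1

# Inspect which broker hosts each partition
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
```

The `--describe` output shows, for each partition, which broker is its leader, which is how you can see partitions distributed across the cluster.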

Brokers are responsible for handling incoming requests to write new messages to those partitions and to read messages out of them. So those client applications on the outside there, they're writing into topics and reading from topics. And brokers are the ones that, if you will, broker those requests and do the actual IO on the partition logs. Brokers are also responsible for managing replication, which we're gonna take a look at in the next module.

Now, if you are not completely brand new to Kafka (maybe you've seen a previous version of this course or read older docs), you may have heard of a project called Apache ZooKeeper, which sat alongside the brokers and helped maintain a consistent view of metadata. Well, bear in mind, you don't need that anymore. As of Apache Kafka 4.0, Kafka no longer includes, or even has the option of using, ZooKeeper. That consistent metadata-database functionality is now provided by the brokers themselves, using their own implementation of the Raft protocol, creatively named KRaft.

So as of 4.0, the brokers store that metadata and provide a consistent view of it for themselves using KRaft. You don't need to worry about ZooKeeper anymore.
