Tim Berglund, VP Developer Relations
It would not do to store each partition on only one broker. Whether brokers are bare metal servers or managed containers, they and their underlying storage are susceptible to failure, so we need to copy partition data to several other brokers to keep it safe. Those copies are called follower replicas, while the main partition is called the leader replica; every replicated partition has one leader and N-1 followers. In general, reading and writing are done against the leader: when you produce data, it goes to the leader first, and the leader and the followers then work together to replicate each new write to the followers.
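The leader/follower relationship described above can be sketched as a toy model. This is an illustration only, not the actual broker implementation: one leader replica accepts all writes, and the N-1 followers copy them.

```python
# Toy model of Kafka-style partition replication (illustration only):
# one leader replica accepts writes, N-1 followers copy them.

class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []  # ordered record log held by this replica

class Partition:
    def __init__(self, replication_factor=3):
        replicas = [Replica(i) for i in range(replication_factor)]
        self.leader = replicas[0]      # reads and writes go here
        self.followers = replicas[1:]  # the N-1 follower replicas

    def produce(self, record):
        # Writes always land on the leader first...
        self.leader.log.append(record)
        # ...and are then replicated to every follower.
        for follower in self.followers:
            follower.log.append(record)

p = Partition(replication_factor=3)
p.produce("event-1")
p.produce("event-2")

# Every replica now holds an identical copy of the data.
assert all(f.log == p.leader.log for f in p.followers)
```

With a replication factor of 3, each record ends up on three brokers, so losing any single broker loses no data.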
This happens automatically, and while you can tune some settings in the producer to achieve varying levels of durability guarantees, this is not usually a process you have to think about as a developer building systems on Kafka. All you really need to know as a developer is that your data is safe, and that if one node in the cluster dies, another will take over its role.
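For reference, the main durability knobs look something like the fragment below. `acks` and `retries` are real producer configuration keys; `min.insync.replicas` is actually a broker/topic setting, shown alongside here because it works together with `acks=all`. The specific values are illustrative, not recommendations for any particular workload.

```properties
# Producer: wait for the leader and all in-sync replicas to
# acknowledge a write before considering it successful.
acks=all
# Producer: retry transient failures instead of dropping records.
retries=2147483647

# Broker/topic: with acks=all, reject writes unless at least
# this many replicas are in sync.
min.insync.replicas=2
```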
Hey, Tim Berglund here to talk about Kafka replication. Now, it would be no good at all if we stored each partition on only one broker. Whether brokers are bare metal servers or managed containers or whatever, they and their underlying storage are susceptible to failure, so we need to copy partition data to several other brokers to keep it safe. Those copies are called follower replicas, whereas the main partition is called the leader replica, and there is a difference between those kinds of replicas. Every partition that gets replicated has one leader and then N-1 followers.

When you produce data to the partition, you're really producing it to the leader. In general, when you're writing data to a partition or reading data from a partition, you're talking to the leader. That's a good simplifying assumption. In fact, it's always true that when you're writing, you write to the leader, and after that the leader and the followers work together to get replication done, to make those new writes move to the followers and not just sit on the leader.

Now, the nice thing is you don't need to think about this when you're using Kafka as a developer, when you're writing code against the producer and consumer APIs and all the things that you do on a daily basis. You don't need to think about replication; most of the time, you just need to know it's there. If you're administering a Kafka cluster, you might turn knobs having to do with replication a lot, but ideally somebody else is doing that, or you're using a fully managed service like Confluent Cloud. This is a thing you should know happens without being too worried about the details. And the nice thing is, it's an automatic process: as long as you've got replication turned on, which is the default, you write to a leader, the followers come along and get that new data automatically, and it just happens for you.
And there are some settings on the write path that can be tuned to provide varying levels of durability guarantees, but it's not usually a process you have to think about as a developer building systems on Kafka. All you really need to know is that your data is safe, and if one node in the cluster dies, another one will take over its role and the messages in the topic will still be there.
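That "another one will take over its role" step can also be sketched in a few lines. This is a deliberately simplified illustration: in real Kafka the cluster controller picks the new leader from the in-sync replica set, but the essential idea, promoting a caught-up follower so no records are lost, looks like this.

```python
# Toy sketch of leader failover (illustration only): when the broker
# hosting the leader dies, a caught-up follower is promoted, and the
# records survive because followers hold full copies of the log.

def fail_over(leader_log, follower_logs):
    # Promote the first follower whose log is fully caught up.
    for log in follower_logs:
        if log == leader_log:
            return log
    raise RuntimeError("no in-sync follower available")

leader = ["event-1", "event-2"]
followers = [["event-1", "event-2"], ["event-1", "event-2"]]

new_leader = fail_over(leader, followers)
assert new_leader == leader  # nothing was lost in the failover
```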