What is replication in Apache Kafka?
Replication in Kafka means keeping multiple copies of your data on different brokers. The replication factor is the configuration parameter that controls how many copies there should be, which ensures data durability and availability across your cluster.
How does replication work in Apache Kafka?
Configured at the topic level, the replication factor defines how many copies of each partition the Kafka brokers will maintain across the Kafka cluster. The primary copy of a partition is called the leader partition; the other copies are called follower partitions.
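To make the leader/follower layout concrete, here is a minimal sketch of how copies of each partition could be spread over brokers. The round-robin rule, the function name assign_replicas, and the broker IDs are assumptions for illustration; Kafka's real placement logic is more involved.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Toy round-robin placement: one leader plus followers per partition."""
    if replication_factor > len(brokers):
        # Each copy of a partition must live on a different broker.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Pick replication_factor distinct brokers, starting at a rotating offset.
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


# Hypothetical cluster: three brokers, a topic with three partitions, two copies each.
layout = assign_replicas(num_partitions=3, replication_factor=2, brokers=[101, 102, 103])
for partition, replicas in layout.items():
    print(partition, replicas)
```

In this toy layout every broker leads one partition and follows another, so no single broker holds the only copy of any partition.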
How does replication impact durability of data in Kafka?
With replication enabled, Kafka producers first write events to the leader partition. The data is then copied to the follower partitions hosted on other nodes in the Kafka cluster. The acks configuration parameter determines whether the producer waits for that copy to complete before treating the write as successful (synchronous) or does not wait (asynchronous).
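The difference between the two modes can be sketched with a toy in-memory model. The Partition class and its methods are hypothetical and are not Kafka APIs; the sketch also simplifies by treating every follower as required for acks="all", whereas Kafka only waits for the replicas it currently considers in sync.

```python
class Partition:
    """Toy model of one partition: a leader log and a set of follower logs."""

    def __init__(self, follower_count):
        self.leader_log = []
        self.follower_logs = [[] for _ in range(follower_count)]

    def replicate(self):
        # Each follower copies whatever it is still missing from the leader.
        for log in self.follower_logs:
            log.extend(self.leader_log[len(log):])

    def produce(self, event, acks):
        self.leader_log.append(event)
        if acks == "all":
            # Synchronous: followers catch up before the write is acknowledged.
            self.replicate()
        # With acks="1" the write is acknowledged once the leader has it;
        # followers catch up later, when replicate() next runs.
        return "acknowledged"


partition = Partition(follower_count=2)

partition.produce("event-a", acks="1")
lag_after_acks_1 = [len(log) for log in partition.follower_logs]

partition.produce("event-b", acks="all")
lag_after_acks_all = [len(log) for log in partition.follower_logs]

print(lag_after_acks_1, lag_after_acks_all)
```

After the acks="1" write the followers are still empty, so losing the leader at that moment would lose the event. After the acks="all" write both followers hold every event the leader holds.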
How does replication impact availability of data in Kafka?
If the broker hosting a leader partition goes down, up-to-date copies of the data still exist on other nodes in the cluster. Kafka will elect a new leader partition from among those copies, and Kafka producers and consumers can resume writes and reads as usual.
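The failover step can be sketched as a small selection function. This is an illustrative model rather than Kafka's controller code: elect_leader and the broker IDs are hypothetical, and an eligible copy is taken to be one that was fully caught up with the old leader (Kafka calls these in-sync replicas).

```python
def elect_leader(replicas, in_sync, failed_broker):
    """Return the first surviving in-sync replica, or None if none remains."""
    for broker in replicas:
        if broker != failed_broker and broker in in_sync:
            return broker
    return None


# Hypothetical partition with copies on three brokers; 101 is the current leader.
replicas = [101, 102, 103]
in_sync = {101, 103}  # broker 102 has fallen behind and is not eligible

new_leader = elect_leader(replicas, in_sync, failed_broker=101)
print(new_leader)
```

Here broker 103 takes over, because 102 is not caught up. If no caught-up copy survives, the function returns None, which corresponds to the partition being unavailable until a suitable broker comes back.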
What do I need to do to set replication?
When creating a new Kafka topic, set the replication factor to a value greater than 1. Then, to enable synchronous replication, set acks=all in your producer configuration.
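As a rough sketch, the two settings might be written down like this. The topic name, partition count, and broker address are placeholders, and the dictionaries are plain Python data rather than calls into any Kafka client; the commented lines assume the confluent-kafka package is installed and show where the producer settings would be passed.

```python
# Hypothetical topic settings: a replication factor of 3 means three copies
# of every partition, so the cluster needs at least three brokers.
topic_spec = {
    "name": "orders",            # placeholder topic name
    "num_partitions": 3,         # placeholder partition count
    "replication_factor": 3,     # must be greater than 1 to get replication
}

# Producer settings: acks=all makes the producer wait for replication
# before treating a write as successful.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "acks": "all",
}

# Assuming the confluent-kafka package is available, the producer settings
# could be used like this:
# from confluent_kafka import Producer
# producer = Producer(producer_config)

print(topic_spec["replication_factor"], producer_config["acks"])
```

The replication factor belongs to the topic and acks belongs to each producer, so both have to be set to get synchronous replication.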