What is a partition in Apache Kafka?
Why are Kafka topics broken down into partitions?
When events are written to a Kafka topic, they are actually stored on one of the many partitions within that topic. It's a form of sharding. By breaking that topic down into smaller partitions, write capacity can increase, and the overall cluster throughput can be higher.
This is especially true when data is read out from a Kafka topic because the unit of processing parallelism for Kafka is the partition. Kafka consumers can join together as a part of a consumer group and Kafka will load-balance to distribute the topic's partitions amongst the consumers.
As an example, if there are 10 partitions in a topic and 5 Kafka consumers in a group consuming from that topic, then each consumer will receive 2 partitions to consume from to process those events.
How many partitions should I choose for my topic?
The number of partitions you chose depends on a number of factors, but there are some general guidelines to consider.