Q: How does Kafka work?

Kafka runs as a cluster of one or more servers, known as brokers . Brokers store event data on disk and provide redundancy through replication. Kafka scales horizontally, which gives resilience along with performance benefits, and allows clients to process data in parallel. Clients interact with the cluster using APIs defined in binary protocols , communicated over TCP/IP. Clients produce or consume event data, typically using language-based libraries. Client libraries are available in many languages and frameworks, including Java , C/C++ , C# , Python , Go , Node.js , and Spring Boot . There is also a REST API for interacting with Kafka.

Q: What is a Kafka broker?

Kafka runs in a cluster of nodes, called brokers . Brokers are designed to be simple. They do not process data themselves and they only view event data as opaque arrays of bytes. A broker's main job is to provide the core functionality of storing, replicating, and serving data to clients. Brokers replicate data amongst other brokers in the cluster for the purposes of resilience and availability. A producer application connects to a set of initial brokers in the cluster, and requests metadata information about the cluster. Using this information, the producer determines which broker to send events to, based on a partitioning strategy. Consumer applications often work in coordination, called a consumer group . One of the brokers in a cluster has the additional responsibility of coordinating applications that are part of a consumer group. For a full introduction to Kafka design, see this free Kafka 101 course .

Q: What is a Kafka consumer group?

A consumer group in Kafka is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. When a single consumer cannot keep up with the throughput of messages that Kafka is providing, you can horizontally scale that consumer by creating additional instances of it. The partitions of all the topics are divided among the consumers in the group. As new group members arrive and old members leave, the partitions are reassigned so that each member receives its proportional share of partitions. This is known as rebalancing the group. Kafka keeps track of the members of a consumer group and allocates data to them. To utilize all of the consumers in a consumer group, there must be as many partitions as consumer group members. If there are fewer partitions than consumers in the group, then there will be idle consumers. If there are more partitions than consumers in the group, then consumers will read from more than one partition. You can learn more about consumers in this free Apache Kafka 101 course .

Q: What is lag in Kafka?

Lag in Kafka can refer to two things: How far behind a consumer is in reading the available messages. Depending on the consumer’s purpose, it may need to read and process messages at a greater rate. A consumer group provides for stateless horizontal scalability. How far behind a follower partition is in replicating the messages from the leader partition . If a follower gets too far behind then it becomes an out-of-sync replica .

Question 1

How does Kafka work?

Accepted Answer

Kafka runs as a cluster of one or more servers, known as brokers.

Brokers store event data on disk and provide redundancy through replication. Kafka scales horizontally, which gives resilience along with performance benefits, and allows clients to process data in parallel.

Clients interact with the cluster using APIs defined in binary protocols, communicated over TCP/IP. Clients produce or consume event data, typically using language-based libraries.

Client libraries are available in many languages and frameworks, including Java, C/C++, C#, Python, Go, Node.js, and Spring Boot. There is also a REST API for interacting with Kafka.

Question 2

What is a Kafka broker?

Accepted Answer

Kafka runs in a cluster of nodes, called brokers. Brokers are designed to be simple. They do not process data themselves and they only view event data as opaque arrays of bytes. A broker's main job is to provide the core functionality of storing, replicating, and serving data to clients.

Brokers replicate data amongst other brokers in the cluster for the purposes of resilience and availability.

A producer application connects to a set of initial brokers in the cluster, and requests metadata information about the cluster. Using this information, the producer determines which broker to send events to, based on a partitioning strategy.

Consumer applications often work in coordination, called a consumer group. One of the brokers in a cluster has the additional responsibility of coordinating applications that are part of a consumer group.

For a full introduction to Kafka design, see this free Kafka 101 course.

Question 3

Where does Kafka store data?

Accepted Answer

Kafka brokers store data on local disk. Broker behavior is controlled by configuration parameters, and the directory that Kafka stores data in is configured by the log.dir value.

Moving "warm" data to other storage types yet keeping it accessible is possible using Tiered Storage, implemented in both Confluent Cloud and Confluent Platform. There are also community efforts underway as part of KIP-405.

You can read more about how Kafka stores data here.

Question 4

What is Kafka’s max message size?

Accepted Answer

In Kafka, events are produced and consumed in batches. The maximum size for a batch of events is controlled by the message.max.bytes configuration on the broker. This can be overridden on a per-topic basis by setting the max.message.bytes configuration for the topic.

Somewhat confusingly, these two parameters are indeed different on the broker (message.max.bytes) and topic (max.message.bytes)!

For more details see the configuration documentation. Be sure to check out the free Kafka 101 course on Confluent Developer.

Question 5

What is a Kafka consumer group?

Accepted Answer

A consumer group in Kafka is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience.

When a single consumer cannot keep up with the throughput of messages that Kafka is providing, you can horizontally scale that consumer by creating additional instances of it.

The partitions of all the topics are divided among the consumers in the group. As new group members arrive and old members leave, the partitions are reassigned so that each member receives its proportional share of partitions. This is known as rebalancing the group. Kafka keeps track of the members of a consumer group and allocates data to them.

To utilize all of the consumers in a consumer group, there must be as many partitions as consumer group members. If there are fewer partitions than consumers in the group, then there will be idle consumers. If there are more partitions than consumers in the group, then consumers will read from more than one partition.

You can learn more about consumers in this free Apache Kafka 101 course.

Question 6

What is Kafka rebalancing?

Accepted Answer

Kafka consumers may be subscribed to a set of topics as part of a consumer group. When consumers join or leave the group, a broker, acting as the coordinator, assigns partitions to the consumers in the group as a way of evenly spreading the partition assignments across all members. This is known as rebalancing the group. You can read more about it in this blog.

Question 7

What is offset in Kafka?

Accepted Answer

The offset in a Kafka topic is the location of any given event in a particular partition. In Kafka, events are stored in topics, which are linear and append only. Topics are further divided into partitions. Offsets are monotonically increasing and are represented by a 64-bit integer.

For more information on Kafka fundamentals, see this Kafka 101 course.

Question 8

What is Kafka ISR?

Accepted Answer

An in-sync replica (ISR) is a Kafka topic partition’s replica (known as a follower), which has enough of the data from the primary version of the partition (known as the leader), that it is said to be in sync.

The degree to which a follower can lag from the leader and still be deemed to be in sync is configured by the broker setting replica.lag.time.max.ms.

For more details see this deep-dive blog and the documentation.

Question 9

What is lag in Kafka?

Accepted Answer

Lag in Kafka can refer to two things:

How far behind a consumer is in reading the available messages. Depending on the consumer’s purpose, it may need to read and process messages at a greater rate. A consumer group provides for stateless horizontal scalability.
How far behind a follower partition is in replicating the messages from the leader partition. If a follower gets too far behind then it becomes an out-of-sync replica.

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Language Guides

Tutorials

Demos

Language Guides

Tutorials

Demos

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog