Kafka Storage & Processing Fundamentals

What is a stream? A topic? A partition? Learn the core fundamentals of Kafka's storage and processing layers, and how they are related.

Kafka is an event streaming platform designed from the ground up for data in motion. But what's an event? What's a stream, a table? And how can we put them to good use when building the real-time applications and microservices that businesses need in a digital, always-on world?

Figure: Aggregating a stream of events into a continuously updated table

In the blog series below, we'll explore the fundamentals of Kafka to answer these questions. We start with a primer on events, streams, and tables, and then walk through Kafka’s storage layer all the way up to Kafka’s processing layer, covering tools like the streaming database ksqlDB and the Kafka Streams library that allow us to build applications that work with our data in Kafka. Finally, we learn how these applications are made elastic, scalable, and fault tolerant. By the end of the series, you should have a solid understanding of event streaming with Kafka, and of what to watch out for when putting events, streams, and tables into practice in your own applications and use cases.

If you have any questions or suggestions along the way, please head over to the Confluent Community Forum.

A Primer on Events, Streams, Tables

In part 1, we cover the basic elements of an event streaming platform: events, streams, and tables. We also introduce the stream-table duality and learn why it is a crucial concept for an event streaming platform like Kafka. We start with a definition and examples of what an event actually is.
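
To make the stream-table duality a little more concrete, here is a minimal sketch in plain Python with no Kafka involved. The event data and the names stream_to_table and changelog_to_table are made up for illustration. The sketch aggregates a stream of events into a table, records every table update as a changelog stream, and then rebuilds the same table by replaying that changelog.

```python
# Hypothetical stream of events: (user, location) pairs in arrival order.
EVENTS = [
    ("alice", "Paris"),
    ("bob", "Sydney"),
    ("alice", "Rome"),
    ("alice", "Berlin"),
]


def stream_to_table(events):
    """Aggregate a stream into a table of visit counts per user.

    Also returns the changelog: one (key, new_value) record per table update.
    """
    table = {}
    changelog = []
    for user, _location in events:
        table[user] = table.get(user, 0) + 1
        changelog.append((user, table[user]))
    return table, changelog


def changelog_to_table(changelog):
    """Rebuild the table by replaying its changelog; the last value per key wins."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


table, changelog = stream_to_table(EVENTS)
print(table)      # {'alice': 3, 'bob': 1}
print(changelog)  # [('alice', 1), ('bob', 1), ('alice', 2), ('alice', 3)]
assert changelog_to_table(changelog) == table
```

The table is a snapshot of the stream at a point in time, and the changelog is the table seen as a stream. Either one can be derived from the other.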

Read part 1 of the blog series

The Storage Layer: Topics, Partitions, and More

In part 2, we explore topics, partitions, brokers, and storage formats, along with practical advice such as how to best partition your data. A solid understanding of partitions is essential because their importance goes well beyond the storage layer: they enable Kafka’s scalability, elasticity, and fault tolerance across both the storage and processing layers. We start this deep dive with the most basic storage question: how do I store data in Kafka?
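
As a rough illustration of key-based partitioning, the sketch below maps each event key to a partition with a hash modulo the partition count. It is a simplification under stated assumptions: zlib.crc32 is a stand-in hash chosen for this example, not the function Kafka's built-in partitioner uses, and the partition count and events are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with hash(key) mod num_partitions.

    zlib.crc32 is a stand-in chosen for this sketch; it is NOT the hash
    function Kafka's built-in partitioner uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6  # hypothetical partition count for the topic
events = [("alice", "Paris"), ("bob", "Sydney"), ("alice", "Rome")]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# Both "alice" events land in the same partition, in their original order.
for p, records in partitions.items():
    if records:
        print(p, records)
```

Events with the same key always end up in the same partition, which keeps them in order relative to each other. With a scheme like this, changing the number of partitions also changes which partition a key maps to.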

Read part 2 of the blog series

The Processing Layer: Distributed Applications, Parallelism with Partitions, Data Contracts

In part 3, we move beyond storing events to processing events by looking at Kafka’s distributed processing architecture. How can we build real-time applications and microservices that process all the interesting data we have stored in Kafka? We explore streams and tables in more detail, along with data contracts and consumer groups, and how all this enables us to implement distributed applications and microservices that process data in parallel at scale. We start with how events stored in Kafka topics are made accessible for processing by turning them into streams and tables.
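
The link between partitions and parallel processing can be sketched in a few lines. The example below is a simplified stand-in for a consumer group's assignment step, not Kafka's actual assignment logic. The function assign_partitions and the instance names are hypothetical. It spreads a topic's partitions over the members of a group so that each partition is processed by exactly one member.

```python
def assign_partitions(partitions, members):
    """Spread partitions over group members in round-robin order.

    Simplified stand-in for a consumer group's assignment step: each
    partition goes to exactly one member of the group.
    """
    assignment = {member: [] for member in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = list(range(6))  # hypothetical topic with 6 partitions

print(assign_partitions(partitions, ["app-1", "app-2"]))
# {'app-1': [0, 2, 4], 'app-2': [1, 3, 5]}

print(assign_partitions(partitions, ["app-1", "app-2", "app-3"]))
# {'app-1': [0, 3], 'app-2': [1, 4], 'app-3': [2, 5]}
```

Adding a third instance reduces the share of work per instance. A member beyond the partition count would receive nothing, so the number of partitions caps how far the group's parallelism can go.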

Read part 3 of the blog series

Elasticity, Fault Tolerance, and Other Advanced Concepts

In part 4, we continue learning about building applications and microservices with Kafka. We dive into how elastic scaling and fault tolerance are architected and implemented—including a return to the stream-table duality discussed in part 1, which turns out to underpin many of these capabilities. We start with how fault-tolerant processing of streams and tables is achieved, after which we explore how Kafka provides elasticity for our applications. We’ll see that these subjects are actually two sides of the same coin. Along the way, we share practical advice such as how to deal with data skew, the impact of topic compaction, and recommendations for production deployments.
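
One way to picture how the stream-table duality supports fault tolerance is to treat a table's changelog as its backup. The sketch below is plain Python with hypothetical data and function names. It assumes that a table can be restored by replaying its changelog and that compaction keeps only the latest record per key. It ignores details such as delete markers.

```python
def compact(changelog):
    """Keep only the latest record per key, preserving the order of the survivors."""
    last_index = {key: i for i, (key, _value) in enumerate(changelog)}
    return [rec for i, rec in enumerate(changelog) if last_index[rec[0]] == i]


def restore(changelog):
    """Rebuild table state by replaying a changelog; the last value per key wins."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


changelog = [("alice", 1), ("bob", 1), ("alice", 2), ("alice", 3)]
compacted = compact(changelog)
print(compacted)  # [('bob', 1), ('alice', 3)]

# Restoring from the compacted changelog yields the same table with fewer records to replay.
assert restore(compacted) == restore(changelog)
```

In this simplified picture, compaction shortens the changelog without changing the table it restores to, which is why it affects how quickly state can be rebuilt after a failure or when work moves to another instance.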

Read part 4 of the blog series