Q: How does stream processing work?

Stream processing, also known as event-stream processing (ESP), real-time data streaming, and complex event processing (CEP), is the continuous processing of real-time data, directly as it is produced or received. Both Kafka Streams and ksqlDB allow you to build applications that leverage stream processing. Learn more about Kafka Streams in this free course or get started with ksqlDB by taking its free course.

Question 1

What is Kafka Streams?

Accepted Answer

Kafka Streams is a Java client library for building applications and microservices whose input and output data are stored in Kafka topics. It provides stream processing capabilities native to Apache Kafka.

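import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.KeyValueStore;
// Count how often each word appears in TextLinesTopic (assumes String default key/value serdes are configured)
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> textLines = builder.stream("TextLinesTopic");
KTable<String, Long> wordCounts = textLines
    .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
    .groupBy((key, word) -> word)
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));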
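To make the behavior of the word-count topology above more concrete, here is a plain-Java model of what it computes, with no Kafka dependency. It is a sketch under stated assumptions, not the Kafka Streams API: the class and method names (`WordCountModel`, `process`) are hypothetical, and it ignores partitioning, repartitioning, and fault tolerance. Each incoming line updates a running count per word and yields the new count, which roughly mirrors the updates that `wordCounts.toStream()` forwards to `WordsWithCountsTopic`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the word-count topology; no Kafka involved.
class WordCountModel {
    // Stands in for the "counts-store" state store: current count per word
    private final Map<String, Long> counts = new HashMap<>();

    // Handles one incoming line and returns the (word, newCount) updates it produces, in order
    List<Map.Entry<String, Long>> process(String textLine) {
        List<Map.Entry<String, Long>> updates = new ArrayList<>();
        // Same tokenization as the flatMapValues step
        for (String word : Arrays.asList(textLine.toLowerCase().split("\\W+"))) {
            long newCount = counts.merge(word, 1L, Long::sum);
            updates.add(Map.entry(word, newCount));
        }
        return updates;
    }

    public static void main(String[] args) {
        WordCountModel model = new WordCountModel();
        System.out.println(model.process("Hello Kafka"));   // [hello=1, kafka=1]
        System.out.println(model.process("hello streams")); // [hello=2, streams=1]
    }
}
```

Unlike this model, Kafka Streams caches state-store updates by default, so consecutive updates to the same word may be combined before they reach the output topic.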

Here are some of the things you can do with Kafka Streams:

Transformations
Filtering
Aggregations
Joining
Merging and splitting streams
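Joining is perhaps the least intuitive item in that list. As a rough illustration, the plain-Java sketch below models a stream-table (KStream-KTable) inner join: each stream record is enriched with the table's current value for the same key, records without a matching key are dropped, and table updates change state without emitting results themselves. The names `StreamTableJoinSketch` and `Event`, and the sample data, are hypothetical; the sketch ignores timestamps, partitioning, and state stores. In Kafka Streams itself you would express this with `KStream#join(KTable, ValueJoiner)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a KStream-KTable inner join; not the Kafka Streams API.
class StreamTableJoinSketch {
    // An event is either an update to the table or a record on the stream (hypothetical type)
    record Event(boolean isTableUpdate, String key, String value) {}

    static List<String> run(List<Event> events) {
        Map<String, String> table = new HashMap<>(); // latest value per key, like a KTable
        List<String> output = new ArrayList<>();
        for (Event e : events) {
            if (e.isTableUpdate()) {
                // Table updates change state but do not emit join results
                table.put(e.key(), e.value());
            } else {
                String tableValue = table.get(e.key());
                // Inner join: stream records without a matching table entry are dropped
                if (tableValue != null) {
                    output.add(e.value() + " by " + tableValue);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(true, "u1", "Alice"),
            new Event(false, "u1", "click"),
            new Event(false, "u2", "view"),      // dropped: no table entry for u2
            new Event(true, "u1", "Alicia"),
            new Event(false, "u1", "purchase"));
        System.out.println(run(events)); // [click by Alice, purchase by Alicia]
    }
}
```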

Kafka Streams applications can be stateful, can provide exactly-once processing semantics, and can be deployed and scaled horizontally just like any other Java application.
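Exactly-once processing and horizontal scaling are both driven by ordinary configuration. The sketch below only builds a `java.util.Properties` object, so it runs without Kafka on the classpath; in a real application you would pass these properties, together with a topology, to the `KafkaStreams` constructor (usually writing the keys via `StreamsConfig` constants). The application ID, broker address, and thread count are placeholder values, and `exactly_once_v2` assumes Kafka Streams 3.0 or later. The default serdes are included because a topology that reads a topic without explicit serdes, like the word count above, depends on them.

```java
import java.util.Properties;

// Sketch of a Kafka Streams configuration using plain java.util.Properties (placeholder values).
class StreamsConfigSketch {
    static Properties buildConfig(String bootstrapServers) {
        Properties props = new Properties();
        // All instances started with the same application.id share the input partitions,
        // which is what makes horizontal scaling work. "wordcount-app" is a placeholder.
        props.put("application.id", "wordcount-app");
        props.put("bootstrap.servers", bootstrapServers);
        // Exactly-once processing (Kafka Streams 3.0+); the default is "at_least_once"
        props.put("processing.guarantee", "exactly_once_v2");
        // Serdes used when a topology does not specify them explicitly
        props.put("default.key.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        props.put("default.value.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        // Number of processing threads inside this one instance
        props.put("num.stream.threads", "2");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig("localhost:9092");
        System.out.println(props.getProperty("processing.guarantee")); // exactly_once_v2
        // In a real app (requires the kafka-streams dependency):
        // new KafkaStreams(builder.build(), props).start();
    }
}
```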

Learn more about Kafka Streams in this free course.

Question 2

Why use Kafka Streams?

Accepted Answer

Because Kafka Streams is part of Apache Kafka, it integrates tightly with Kafka itself. For example, it supports exactly-once processing semantics, and it uses the same security features (such as TLS encryption and SASL authentication) as any other Kafka client.

With Kafka Streams you use your existing development, testing, and deployment tools and processes. You don’t need to deploy and manage a separate stream processing cluster.

Learn more about Kafka Streams in this free course.

Question 3

How does stream processing work?

Accepted Answer

Stream processing, also known as event-stream processing (ESP), real-time data streaming, and complex event processing (CEP), is the continuous processing of real-time data—directly as it is produced or received.

Both Kafka Streams and ksqlDB allow you to build applications that leverage stream processing.
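As a rough illustration of processing data as it is produced, the plain-Java sketch below handles each event the moment it is taken from a queue, instead of collecting events into a batch first. The queue stands in for a Kafka topic; the class name, the temperature readings, the threshold, and the end-of-stream marker are all hypothetical. Real stream processors such as Kafka Streams or ksqlDB add partitioning, state management, and fault tolerance on top of this basic loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: each reading is handled as soon as it arrives, not in a later batch.
class ContinuousProcessingSketch {
    // Hypothetical end-of-stream marker so the demo terminates; real streams are unbounded
    static final double END_OF_STREAM = Double.NaN;

    static List<String> run(List<Double> readings, double threshold) {
        BlockingQueue<Double> topic = new LinkedBlockingQueue<>(); // stands in for a Kafka topic
        List<String> alerts = new ArrayList<>();

        // Processor: waits for the next event, then processes it immediately
        Thread processor = new Thread(() -> {
            try {
                while (true) {
                    double reading = topic.take();
                    if (Double.isNaN(reading)) {
                        break;
                    }
                    if (reading > threshold) {
                        alerts.add("ALERT: " + reading);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        processor.start();

        // Producer: events are appended one at a time, like records written to a topic
        for (double r : readings) {
            topic.add(r);
        }
        topic.add(END_OF_STREAM);

        try {
            processor.join(); // wait until the processor has seen the end-of-stream marker
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the processor", e);
        }
        return alerts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(21.0, 31.5, 19.0, 35.0), 30.0)); // [ALERT: 31.5, ALERT: 35.0]
    }
}
```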

Learn more about Kafka Streams in this free course or get started with ksqlDB by taking its free course.

Question 4

How does Kafka Streams compare to other stream-processing frameworks?

Accepted Answer

Kafka Streams is a distributed processing framework, similar to Apache Flink or Spark Streaming, but it offers some distinct advantages over these other stream-processing frameworks:

A Kafka Streams application is simply a Java app. You write your application, build a JAR file, and start it. No dedicated processing cluster is needed!
Kafka Streams can scale dynamically. For more processing power, you just start another instance of the application; to scale down, you stop one or more instances. In either case, Kafka Streams automatically redistributes the input topic partitions across the running instances and keeps processing.
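Scaling works because Kafka Streams divides the work into tasks, roughly one per input topic partition, and spreads those tasks across all running instances that share the same application ID. The plain-Java sketch below uses a simple round-robin assignment to show the effect of adding instances. It is an illustration only, with hypothetical instance names and partition counts; the real Kafka Streams assignor also considers stickiness, local state, and standby replicas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of spreading partitions over instances; not Kafka's actual assignor.
class ScalingSketch {
    static Map<String, List<Integer>> assign(int partitions, int instances) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (int i = 0; i < instances; i++) {
            assignment.put("instance-" + i, new ArrayList<>());
        }
        // Round-robin: partition p goes to instance p % instances
        for (int p = 0; p < partitions; p++) {
            assignment.get("instance-" + (p % instances)).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(6, 2)); // {instance-0=[0, 2, 4], instance-1=[1, 3, 5]}
        System.out.println(assign(6, 3)); // {instance-0=[0, 3], instance-1=[1, 4], instance-2=[2, 5]}
    }
}
```

One consequence the sketch makes visible: with six input partitions, a seventh instance would receive no work, so the number of input partitions caps how far a single application can usefully scale out.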

Question 5

How do you split a stream?

Accepted Answer

To split a stream in Kafka Streams, use the KStream#split method, which returns a BranchedKStream.

The BranchedKStream allows you to create different branches based on predicates. For example:

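import org.apache.kafka.streams.kstream.Branched;
import org.apache.kafka.streams.kstream.KStream;
// Appearance is a hypothetical value type with a getGenre() method
KStream<String, Appearance> myStream = builder.stream(inputTopic);
// Each record is sent to the first branch whose predicate matches
myStream.split()
    .branch((key, appearance) -> "drama".equals(appearance.getGenre()),
            Branched.withConsumer(ks -> ks.to("drama-topic")))
    .branch((key, appearance) -> "fantasy".equals(appearance.getGenre()),
            Branched.withConsumer(ks -> ks.to("fantasy-topic")))
    // Records that match neither predicate go to the default branch
    .defaultBranch(Branched.withConsumer(ks -> ks.to("default-topic")));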
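In the split example above, each record is routed to exactly one branch: the first one whose predicate returns true, or the catch-all branch if none of the earlier predicates match. The plain-Java sketch below models that first-match routing over an in-memory list so the semantics are easy to test. The `Appearance` record, the branch names, and the sample titles are hypothetical, and the sketch does not use the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of first-match branch routing; not the Kafka Streams API.
class SplitSketch {
    record Appearance(String title, String genre) {}

    // branches must be an ordered map (e.g. LinkedHashMap): predicates are tried in that order
    static Map<String, List<String>> split(List<Appearance> records,
                                           Map<String, Predicate<Appearance>> branches) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        branches.keySet().forEach(name -> out.put(name, new ArrayList<>()));
        out.put("default", new ArrayList<>());
        for (Appearance a : records) {
            String target = "default";
            // The first matching predicate wins; later branches never see this record
            for (Map.Entry<String, Predicate<Appearance>> b : branches.entrySet()) {
                if (b.getValue().test(a)) {
                    target = b.getKey();
                    break;
                }
            }
            out.get(target).add(a.title());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Predicate<Appearance>> branches = new LinkedHashMap<>();
        branches.put("drama", a -> "drama".equals(a.genre()));
        branches.put("fantasy", a -> "fantasy".equals(a.genre()));
        List<Appearance> records = List.of(
            new Appearance("Title A", "drama"),
            new Appearance("Title B", "fantasy"),
            new Appearance("Title C", "comedy"));
        System.out.println(split(records, branches)); // {drama=[Title A], fantasy=[Title B], default=[Title C]}
    }
}
```

Because routing stops at the first match, the order of the `branch` calls matters whenever predicates overlap.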


Question 6

Does Kafka Streams run on Apache Kafka brokers?

Accepted Answer

No, Kafka Streams applications do not run inside the Kafka brokers.

Kafka Streams applications are normal Java applications that happen to use the Kafka Streams library. You would run these applications on client machines at the perimeter of a Kafka cluster, where they connect to the brokers over the network like any other Kafka producer or consumer. In short, they are client-side applications, separate from the brokers (servers) that make up the Kafka cluster.
