
How to compute the minimum or maximum value of a field with Kafka Streams

An aggregation in Kafka Streams is a stateful operation used to perform a "clustering" or "grouping" of values with the same key, and it may return a different type than the input value. In this example the input value is a MovieTicketSales object, but the result is a YearlyMovieFigures object used to keep track of the minimum and maximum total ticket sales by release year. You can also use windowing with aggregations to get discrete results per segment of time (a windowed sketch appears at the end of this tutorial).

       builder.stream(INPUT_TOPIC, Consumed.with(Serdes.String(), movieSalesSerde))
              // Re-key the stream by release year; this changes the key type to Integer
              .groupBy((k, v) -> v.releaseYear(),
                      Grouped.with(Serdes.Integer(), movieSalesSerde))
              // Track the running min and max of total sales for each release year
              .aggregate(() -> new YearlyMovieFigures(0, Integer.MAX_VALUE, Integer.MIN_VALUE),
                      ((key, value, aggregate) ->
                              new YearlyMovieFigures(key,
                                      Math.min(value.totalSales(), aggregate.minTotalSales()),
                                      Math.max(value.totalSales(), aggregate.maxTotalSales()))),
                      Materialized.with(Serdes.Integer(), yearlySalesSerde))
              .toStream()
              .peek((key, value) -> LOG.info("Aggregation min-max results key[{}] value[{}]", key, value))
              .to(OUTPUT_TOPIC, Produced.with(Serdes.Integer(), yearlySalesSerde));
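
The snippet assumes MovieTicketSales and YearlyMovieFigures are defined elsewhere in the application. Judging by the accessors used above, they could be modeled as Java records along these lines (the title field is purely illustrative; only releaseYear and totalSales appear in the snippet):

       // Hypothetical shapes inferred from the accessors used in the topology
       public record MovieTicketSales(String title, int releaseYear, int totalSales) {}
       public record YearlyMovieFigures(int releaseYear, int minTotalSales, int maxTotalSales) {}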

Let's review the key points in this example.

   .groupBy((k, v) -> v.releaseYear(),

Aggregations must group records by key. Since the records in the source topic don't have keys, the code uses a groupBy operation that re-keys the stream on the releaseYear field of the MovieTicketSales value object.
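
For comparison, here's a minimal sketch of an equivalent formulation that re-keys explicitly with selectKey before grouping (same serde assumptions as above); both versions trigger the repartitioning described below:

       builder.stream(INPUT_TOPIC, Consumed.with(Serdes.String(), movieSalesSerde))
              // selectKey sets the new key; groupByKey then groups on it
              .selectKey((k, v) -> v.releaseYear())
              .groupByKey(Grouped.with(Serdes.Integer(), movieSalesSerde))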

        .groupBy((k, v) -> v.releaseYear(), Grouped.with(Serdes.Integer(), movieSalesSerde))

Since you've changed the key, under the covers Kafka Streams performs a repartition immediately before it performs the grouping.
Repartitioning is simply producing records to an internal topic and consuming them back into the application; producing the records ensures the updated keys land on the correct partitions. Additionally, since the key type has changed, you need to provide updated Serde objects via the Grouped configuration object so Kafka Streams can (de)serialize the records during the repartitioning.
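
By default, Kafka Streams generates a name for that internal repartition topic. If you'd like a stable, readable name instead, Grouped also accepts one; a minimal sketch, where sales-by-release-year is an illustrative name that Kafka Streams incorporates into the repartition topic name:

        .groupBy((k, v) -> v.releaseYear(),
                Grouped.with("sales-by-release-year", Serdes.Integer(), movieSalesSerde))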

.aggregate(() -> new YearlyMovieFigures(0, Integer.MAX_VALUE, Integer.MIN_VALUE),
                      ((key, value, aggregate) ->
                              new YearlyMovieFigures(key,
                                      Math.min(value.totalSales(), aggregate.minTotalSales()),
                                      Math.max(value.totalSales(), aggregate.maxTotalSales()))),
                      Materialized.with(Serdes.Integer(), yearlySalesSerde))

This aggregation keeps a running minimum and maximum of the total ticket sales per release year. The initializer starts the minimum at Integer.MAX_VALUE and the maximum at Integer.MIN_VALUE so that the first value seen always replaces both. The aggregate operator takes 3 parameters (there are overloads that accept 2 and 4 parameters):

  1. An initializer for the default value; in this case, a new instance of the YearlyMovieFigures object, a Java POJO containing the current min and max sales.
  2. An Aggregator instance which performs the aggregation action. Here the code uses a Java lambda expression instead of a concrete object instance.
  3. A Materialized object describing how the underlying StateStore is materialized (see the sketch after this list).
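
Materialized.with sets only the serdes and leaves the state store with a generated name. If you want to reference the store later, for example with interactive queries, you can name it explicitly. A minimal sketch, where min-max-sales-store is an illustrative name (KeyValueStore and Bytes come from the Kafka Streams state and common utils packages):

       Materialized.<Integer, YearlyMovieFigures, KeyValueStore<Bytes, byte[]>>as("min-max-sales-store")
               .withKeySerde(Serdes.Integer())
               .withValueSerde(yearlySalesSerde)

The tutorial's Materialized.with(keySerde, valueSerde) is the more compact option when you don't need to query the store by name.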

       .toStream()
       .to(OUTPUT_TOPIC, Produced.with(Serdes.Integer(), yearlySalesSerde));

Aggregations in Kafka Streams return a KTable instance, so it's converted to a KStream with the toStream operator. Then the results are produced to an output topic via the to DSL operator.
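
Finally, as mentioned at the top, you can combine windowing with this aggregation to get discrete min-max results per segment of time. Here's a minimal sketch of a tumbling-window variant, reusing the same serdes and value objects; the one-hour window size is purely illustrative, and the sketch assumes imports for TimeWindows, KeyValue, and java.time.Duration:

       builder.stream(INPUT_TOPIC, Consumed.with(Serdes.String(), movieSalesSerde))
              .groupBy((k, v) -> v.releaseYear(),
                      Grouped.with(Serdes.Integer(), movieSalesSerde))
              // Aggregate each release year independently within one-hour tumbling windows
              .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
              .aggregate(() -> new YearlyMovieFigures(0, Integer.MAX_VALUE, Integer.MIN_VALUE),
                      (key, value, aggregate) ->
                              new YearlyMovieFigures(key,
                                      Math.min(value.totalSales(), aggregate.minTotalSales()),
                                      Math.max(value.totalSales(), aggregate.maxTotalSales())),
                      Materialized.with(Serdes.Integer(), yearlySalesSerde))
              .toStream()
              // The key is now Windowed<Integer>; unwrap it before producing downstream
              .map((windowedKey, value) -> KeyValue.pair(windowedKey.key(), value))
              .to(OUTPUT_TOPIC, Produced.with(Serdes.Integer(), yearlySalesSerde));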