ksqlDB can filter, join, and enrich events as they happen. But you may wish to look beyond individual events to a composite view. This is where aggregation comes in.
You can use ksqlDB to build a materialized view of state, driven by the events in an Apache Kafka topic. This is done using SQL aggregation functions, such as
In the example of order data, you can look at how many orders were placed in a given timeframe, their average value, or the total value. These materialized views are stored within ksqlDB in an embedded RocksDB instance, and are also backed by Kafka topics. This means that you can query them directly using ksqlDB, and you can also use ksqlDB to precalculate aggregates that you want to consume from the topic and use in another system, such as for analytics.
Aggregates can be calculated over the entire stream of data, or more usefully, over a windowed period. ksqlDB supports a number of different window types, including hopping, tumbling, and session. The data returned from an aggregation is always a table, with the key being the field or fields in the
GROUP BY clause. You can wrap your query in a
CREATE TABLE AS statement to make it persistent.
Here’s an example of an aggregation in ksqlDB:
CREATE TABLE ORDERS_PER_HOUR_BY_MAKE AS SELECT MAKE, COUNT(*) AS ORDER_COUNT, CAST(SUM(TOTAL_ORDER_VALUE) AS DECIMAL(9,2)) AS TOTAL_ORDER_VALUE FROM ORDERS_ENRICHED WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY MAKE EMIT CHANGES;
In this example, aggregates will be bucketed by hour and will be updated every time a new event arrives.
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.