Get Started Free
‹ Back to courses
course: ksqlDB 101

Stateful Aggregations (Materialized Views)

2 min
Allison

Allison Walther

Integration Architect (Presenter)

Stateful Aggregations (Materialized Views)

ksqlDB can filter, join, and enrich events as they happen. But you may wish to look beyond individual events to a composite view. This is where aggregation comes in.

You can use ksqlDB to build a materialized view of state, driven by the events in an Apache Kafka topic. This is done using SQL aggregation functions, such as COUNT and SUM.

In the example of order data, you can look at how many orders were placed in a given timeframe, their average value, or the total value. These materialized views are stored within ksqlDB in an embedded RocksDB instance, and are also backed by Kafka topics. This means that you can query them directly using ksqlDB, and you can also use ksqlDB to precalculate aggregates that you want to consume from the topic and use in another system, such as for analytics.

Aggregates can be calculated over the entire stream of data, or more usefully, over a windowed period. ksqlDB supports a number of different window types, including hopping, tumbling, and session. The data returned from an aggregation is always a table, with the key being the field or fields in the GROUP BY clause. You can wrap your query in a CREATE TABLE AS statement to make it persistent.

Here’s an example of an aggregation in ksqlDB:

CREATE TABLE ORDERS_PER_HOUR_BY_MAKE AS
  SELECT MAKE,
         COUNT(*) AS ORDER_COUNT,
         CAST(SUM(TOTAL_ORDER_VALUE) AS DECIMAL(9,2)) AS TOTAL_ORDER_VALUE
    FROM ORDERS_ENRICHED
          WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY MAKE
  EMIT CHANGES;

In this example, aggregates will be bucketed by hour and will be updated every time a new event arrives.

Use the promo code KSQLDB101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Stateful Aggregations (Materialized Views)

Hi, I'm Allison Walther with Confluent. Processing streams of data is just one of the things that ksqlDB can do. It's very powerful to be able to filter, join, and enrich events as they happen, but we often want to look beyond individual events and at the bigger picture too. That's where aggregates come in. We can answer questions like how many times has someone moved? What's the total value of orders placed in the last hour? KsqlDB supports those types of stateful aggregations. We call them stateful because even if a ksqlDB node is removed, the aggregation can be rebuilt on another node and retain its accurate state. With ksqlDB, we're not limited to the data in individual events. We can also perform aggregations with functions like count, count distinct, sum, average, min, max, and many more. In our examples, we can see that the total number of location changes a person has made by using count star. We can also find the number of different locations they have visited with count distinct location. The data returned by an aggregation is always a table with the key being the field or fields in the group by clause. So when we have the query fine tune to get the results we are looking for, we can then use a query in a create table as statement to make it persistent. Now that new table will always have the latest aggregated data for our events. That's it for this lesson. There are many ways to aggregate data and we'll go through some of those in our exercise.

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.