David Anderson

Software Practice Lead

OVER aggregations

For windows relative to the time-of-day clock (rather than each event), see the module on window aggregations.

For a guided exploration into how both window and over aggregations work, see the accompanying hands-on exercise.

The big idea, as illustrated with an example

The idea behind OVER windows in Flink SQL is that a window ends with the current row and stretches backward through the history of the stream for a specific interval, measured either in time or in a number of rows.

For example, this query computes a running average of the price for each product, averaged over the prices paid for the last 3 orders for that product. The count of orders in each window is also included so you can see what happens during the warm-up phase, when there haven't yet been 3 orders.

SELECT 
  product_id,
  price,
  COUNT(price) OVER w AS num_prices,
  AVG(price) OVER w AS avg_price
FROM `examples`.`marketplace`.`orders`
WINDOW w AS (
  PARTITION BY product_id
  ORDER BY $rowtime
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
);

If you want to experiment with this query, I suggest narrowing the scope to only a few products, so that it's easier to observe the results:

SELECT 
  product_id,
  price,
  COUNT(price) OVER w AS num_prices,
  AVG(price) OVER w AS avg_price
FROM `examples`.`marketplace`.`orders`
WHERE product_id < 1010
WINDOW w AS (
  PARTITION BY product_id
  ORDER BY $rowtime
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
);

The results are updated for every input row. This means that product_id and price in the output refer to the values of those columns from the latest order, while num_prices and avg_price reflect the values aggregated over the window w as defined relative to that latest order.

In this case the windows are partitioned by product_id, since we want to compute the running average separately for each product. You could leave out the PARTITION BY clause if you wanted to compute global aggregates, instead of breaking down the results on a per-product basis.
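For example, dropping the PARTITION BY clause turns the query above into a global running average across all products (a sketch, using the same example table):

```sql
SELECT 
  product_id,
  price,
  AVG(price) OVER w AS avg_price_all_products
FROM `examples`.`marketplace`.`orders`
WINDOW w AS (
  ORDER BY $rowtime
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
);
```

Here each output row still carries the product_id and price of the latest order, but avg_price_all_products is computed over the last 3 orders regardless of product.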

Sorting the input with ORDER BY $rowtime is necessary so that the window's interval is well-defined. Otherwise, if the input stream were even somewhat out of order, it wouldn't be possible to produce consistent, meaningful results. OVER windows only support sorting on time attributes, and only in ascending order.
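Because the ordering is on a time attribute, the window's extent can also be expressed as a time interval rather than a row count. As a sketch (the 10-minute interval is an arbitrary choice for illustration), a RANGE-based variant of the earlier query averages over all of a product's orders from the last 10 minutes:

```sql
SELECT 
  product_id,
  price,
  COUNT(price) OVER w AS num_prices,
  AVG(price) OVER w AS avg_price
FROM `examples`.`marketplace`.`orders`
WINDOW w AS (
  PARTITION BY product_id
  ORDER BY $rowtime
  RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND CURRENT ROW
);
```

Unlike the ROWS version, the number of orders in each window can now vary freely, depending on how many orders for that product arrived within the interval.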

Important special cases

Flink SQL also supports deduplication, top-n, and windowed top-n queries as special cases of OVER windows. For more on this, see the exercises and the documentation.
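For instance, deduplication is expressed by numbering the rows in each partition with ROW_NUMBER() over such a window and keeping only the first one. This sketch keeps the first order seen for each product, following the pattern described in the Flink documentation:

```sql
SELECT product_id, price, $rowtime
FROM (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY product_id
      ORDER BY $rowtime ASC
    ) AS row_num
  FROM `examples`.`marketplace`.`orders`
)
WHERE row_num = 1;
```

Flink recognizes this `row_num = 1` pattern and optimizes it, so the row numbers themselves are never materialized.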

