Combining multiple events into a single encompassing event, for example to compute totals or averages, is a common task in event streaming and streaming analytics.
How can multiple related events be aggregated to produce a new event?
We use an Event Grouper followed by an Event Aggregator. The grouper prepares the input events as needed for the subsequent aggregation step, e.g., by grouping the events based on the data field by which the aggregation is computed (such as a customer ID) and/or by grouping the events into time windows (such as 5-minute windows). The aggregator then computes the desired aggregation for each group of events, e.g., the average or sum of the events in each 5-minute window.
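As a sketch of how these two steps map onto Flink SQL, here is a windowed aggregation that groups payment events by customer over 5-minute windows; the payments table and its amount column are illustrative assumptions, not part of the worked example that follows:

-- Step 1 (Event Grouper): bucket events into 5-minute tumbling windows,
-- grouped by customer_id.
-- Step 2 (Event Aggregator): compute the per-customer sum for each window.
SELECT customer_id,
       SUM(amount) AS total_amount,
       window_start,
       window_end
FROM TABLE(TUMBLE(TABLE payments, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
GROUP BY customer_id, window_start, window_end;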
For example, we can use Apache Flink® SQL and Apache Kafka® to perform an aggregation.
Assuming that we have a Flink SQL table called orders based on an existing Kafka topic:
CREATE TABLE orders (
    order_id INT,
    item_id INT,
    total_units INT,
    ts TIMESTAMP(3),
    WATERMARK FOR ts AS ts  -- declare ts as the event-time attribute (no out-of-order allowance)
);
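Whether this DDL is sufficient depends on the deployment: managed Flink services that map Kafka topics to tables automatically need nothing more, while a self-managed Apache Flink cluster requires an explicit connector configuration. A minimal sketch for the latter case, in which the topic name, broker address, and JSON format are all assumptions:

CREATE TABLE orders (
    order_id INT,
    item_id INT,
    total_units INT,
    ts TIMESTAMP(3),
    WATERMARK FOR ts AS ts
) WITH (
    'connector' = 'kafka',                              -- Flink's Kafka connector
    'topic' = 'orders',                                 -- assumed topic name
    'properties.bootstrap.servers' = 'localhost:9092',  -- assumed broker address
    'format' = 'json',                                  -- assumed value format
    'scan.startup.mode' = 'earliest-offset'             -- read the topic from the beginning
);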
Then we'll create a derived table containing the aggregated events from that stream. In this case, we create a table called item_stats that represents per-item order statistics over 1-hour windows:
CREATE TABLE item_stats AS
  SELECT item_id,
         COUNT(*) AS total_orders,
         AVG(total_units) AS avg_units,
         window_start,
         window_end
  FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' HOURS))
  GROUP BY item_id, window_start, window_end;
This table will be continuously updated whenever new events arrive in the orders table.
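Since item_stats is itself a regular table, downstream applications can query it like any other. For instance, to look up the hourly statistics for a single item (the item ID 42 is purely illustrative):

SELECT window_start,
       window_end,
       total_orders,
       avg_units
FROM item_stats
WHERE item_id = 42;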