Combining multiple events into a single encompassing event, for example to compute totals or averages, is a common task in event streaming and streaming analytics.
How can multiple related events be aggregated to produce a new event?
We use an Event Grouper followed by an Event Aggregator. The grouper prepares the input events as needed for the subsequent aggregation step, e.g., by grouping the events based on the data field by which the aggregation is computed (such as a customer ID) and/or by grouping the events into time windows (such as 5-minute windows). The aggregator then computes the desired aggregation for each group of events, e.g., the average or sum of the events in each 5-minute window.
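As a sketch of how these two steps map onto Flink SQL, here is a windowed aggregation that groups payment events by customer over 5-minute windows; the payments table and its amount column are illustrative assumptions, not part of the worked example that follows:

-- Step 1 (Event Grouper): bucket events into 5-minute tumbling windows,
-- grouped by customer_id.
-- Step 2 (Event Aggregator): compute the per-customer sum for each window.
SELECT customer_id,
       SUM(amount) AS total_amount,
       window_start,
       window_end
FROM TABLE(TUMBLE(TABLE payments, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
GROUP BY customer_id, window_start, window_end;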
For example, we can use Apache Flink® SQL and Apache Kafka® to perform an aggregation.
Assuming that we have a Flink SQL table called orders based on an existing Kafka topic:
CREATE TABLE orders (
    order_id INT,
    item_id INT,
    total_units INT,
    ts TIMESTAMP(3),
    WATERMARK FOR ts AS ts  -- declare ts as the event-time attribute (no out-of-order allowance)
);
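Whether this DDL is sufficient depends on the deployment: managed Flink services that map Kafka topics to tables automatically need nothing more, while a self-managed Apache Flink cluster requires an explicit connector configuration. A minimal sketch for the latter case, in which the topic name, broker address, and JSON format are all assumptions:

CREATE TABLE orders (
    order_id INT,
    item_id INT,
    total_units INT,
    ts TIMESTAMP(3),
    WATERMARK FOR ts AS ts
) WITH (
    'connector' = 'kafka',                              -- Flink's Kafka connector
    'topic' = 'orders',                                 -- assumed topic name
    'properties.bootstrap.servers' = 'localhost:9092',  -- assumed broker address
    'format' = 'json',                                  -- assumed value format
    'scan.startup.mode' = 'earliest-offset'             -- read the topic from the beginning
);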
Then we'll create a derived table containing the aggregated events from that stream. In this case, we create a table called item_stats that represents per-item order statistics over 1-hour windows:
CREATE TABLE item_stats AS
  SELECT item_id,
         COUNT(*) AS total_orders,
         AVG(total_units) AS avg_units,
         window_start,
         window_end
  FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' HOURS))
  GROUP BY item_id, window_start, window_end;
This table will be continuously updated whenever new events arrive in the orders table.
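Since item_stats is itself a regular table, downstream applications can query it like any other. For instance, to look up the hourly statistics for a single item (the item ID 42 is purely illustrative):

SELECT window_start,
       window_end,
       total_orders,
       avg_units
FROM item_stats
WHERE item_id = 42;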