Get Started Free

Event Grouper

An event grouper is a specialized form of an Event Processor that groups events together by a common field, such as a customer ID, and/or by event timestamps (often called windowing or time-based windowing).

Problem

How can we group individual but related events from the same Event Stream or Table, so that they can subsequently be processed as a whole?

Solution

event-grouper

For time-based grouping a.k.a. time-based windowing, we use an Event Processor that groups the related events into windows based on their event timestamps. Most window types have a pre-defined window size, such as 10 minutes or 24 hours. An exception is session windows, where the size of each window varies depending on the time characteristics of the grouped events.

For field-based grouping, we use an Event Processor that groups events by one or more data fields, irrespective of the event timestamps.

The two grouping approaches are orthogonal and can be composed. For example, to compute 7-day averages for every customer in a stream of payments, we first group the events in the stream by customer ID and by 7-day windows, and then compute the respective averages for each customer+window grouping.

Implementation

As an example, Apache Flink® SQL provides the capability to group related events by a column and group them into windows where all the related events have a timestamp within the defined time-window.

SELECT product_name, SUM(price) as total
FROM TABLE(TUMBLE(TABLE purchases, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
GROUP BY product_name, window_start, window_end;

Considerations

When grouping events into time windows, there are various types of groupings possible.

  • Hopping Windows are based on time intervals. They model fixed-sized, possibly overlapping windows. A hopping window is defined by two properties: the window's duration and its advance or "hop", interval.
  • Tumbling Windows are a special case of hopping windows. Like hopping windows, tumbling windows are based on time intervals. They model fixed-size, non-overlapping, gap-less windows. A tumbling window is defined by a single property: the window's duration.
  • Session Windows aggregate events into a session, which represents a period of activity separated by a specified gap of inactivity, or "idleness". Any records with timestamps that occur within the inactivity gap of existing sessions are merged into the existing session. If a record's timestamp occurs outside of the session gap, a new session is created.

See the Flink SQL windowing table-valued functions and the Kafka Streams supported window types for details and diagrams explaining window types.

References

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free