Consistent time semantics are of particular importance in stream processing. Many operations in an Event Processor are dependent on time, such as joins, aggregations when computed over a window of time (e.g., five-minute averages), and handling out-of-order and "late" data. In many systems, developers have a choice between different variants of time for an Event:
Depending on the use case, developers need to pick one variant over the others.
How can Events from an Event Source be processed irrespective of the timestamps from when they were originally created by the Event Source?
Depending on the use case, Event Processors may use the time when the Event was originally created by its Event Source, the time when it was received on the Event Stream in the Event Streaming Platform, or a time derived from one or more data fields provided by the Event itself (i.e., from the Event payload).
As an example, Apache Flink® SQL exposes the wall-clock processing time as a computed column using the system PROCTIME() function.
CREATE TABLE device_readings (
device_id INT,
temperature DOUBLE,
wallclock_time AS PROCTIME()
);
SELECT device_id,
COUNT(*) AS reading_count,
window_start,
window_end
FROM TABLE(TUMBLE(TABLE device_readings, DESCRIPTOR(wallclock_time), INTERVAL '5' MINUTES))
GROUP BY device_id, window_start, window_end;