Once our data (for example, financial transactions, tracking information for shipments, or IoT sensor measurements) is set in motion as streams of events on an Event Streaming Platform, we want to put it to use and create value from it. How do we do this?
How can we process Events in an Event Streaming Platform?
We build an Event Processor, which is a component that reads Events and processes them, possibly writing new Events as the result of its processing. An Event Processor may act as an Event Source and/or Event Sink; in practice, both are common. An Event Processor can be distributed, which means that it has multiple instances running across different machines. In this case, the processing of Events happens concurrently across these instances.
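To make the idea concrete, here is a minimal sketch in plain Java. It is an illustration, not the API of any particular platform: the `EventProcessor` interface, the `ProcessorSketch` class, and the `PARSE_MEASUREMENT` processor are hypothetical names, and in-memory lists stand in for the input and output Event Streams. The processor reads raw sensor measurements, drops malformed ones, and writes the parsed values as new Events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical contract: consume one input Event, emit zero or one output Event.
interface EventProcessor<I, O> {
    Optional<O> process(I event);
}

final class ProcessorSketch {
    // Example processor: parse a raw measurement; emit nothing if it is malformed.
    static final EventProcessor<String, Integer> PARSE_MEASUREMENT = raw -> {
        try {
            return Optional.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    };

    // Reads every Event from the input (the processor acts as an Event Sink)
    // and appends any resulting Events to the output (it acts as an Event Source).
    static <I, O> List<O> run(List<I> input, EventProcessor<I, O> processor) {
        List<O> output = new ArrayList<>();
        for (I event : input) {
            processor.process(event).ifPresent(output::add);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("21", "oops", " 19 ");
        System.out.println(run(raw, PARSE_MEASUREMENT)); // prints [21, 19]
    }
}
```

In a real deployment the lists would be unbounded Event Streams, and several instances of the same processor could each handle a share of the Events.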
An important characteristic of an Event Processor is that it should allow for composition with other Event Processors. In practice, we rarely use a single Event Processor in isolation. Instead, we compose and connect one or more Event Processors (via Event Streams) inside an Event Processing Application that fully implements one particular use case end-to-end, or that implements a subset of the overall business logic limited to the bounded context of a particular domain (for example, in a microservices architecture).
An Event Processor performs a specific task within the Event Processing Application. Think of the Event Processor as one processing node, or processing step, of a larger processing topology. Examples are mapping an Event type to a domain object, filtering an Event Stream to keep only the important Events, enriching an Event Stream with additional data by joining it to another stream or database table, triggering alerts, or creating new Events for consumption by other applications.
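The following sketch shows how three such steps (mapping, filtering, and enriching) might be chained into one small topology. It uses `java.util.stream` over an in-memory list as a stand-in for real Event Streams; the `SensorEvent` type, the `"sensor,value"` input format, the threshold of 100, and the site lookup table are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class TopologySketch {
    // Hypothetical domain object produced by the mapping step.
    static final class SensorEvent {
        final String sensor;
        final double value;
        final String site; // null until the enrichment step fills it in

        SensorEvent(String sensor, double value, String site) {
            this.sensor = sensor;
            this.value = value;
            this.site = site;
        }
    }

    // Step 1 (mapping): turn a raw "sensor,value" Event into a domain object.
    static final Function<String, SensorEvent> PARSE = raw -> {
        String[] parts = raw.split(",");
        return new SensorEvent(parts[0].trim(), Double.parseDouble(parts[1].trim()), null);
    };

    // Step 2 (filtering): keep only Events above an assumed threshold.
    static final Predicate<SensorEvent> IMPORTANT = e -> e.value > 100.0;

    // Step 3 (enriching): look up each sensor's site in a table.
    static Function<SensorEvent, SensorEvent> enrichWith(Map<String, String> sites) {
        return e -> new SensorEvent(e.sensor, e.value, sites.getOrDefault(e.sensor, "unknown"));
    }

    // The topology: each step consumes the output of the previous one.
    static List<SensorEvent> process(List<String> rawEvents, Map<String, String> sites) {
        return rawEvents.stream()
                .map(PARSE)
                .filter(IMPORTANT)
                .map(enrichWith(sites))
                .collect(Collectors.toList());
    }
}
```

Because each step only depends on the shape of the Events it receives, a step can be replaced or reordered without touching the others, which is the property that makes composition practical.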
There are multiple ways to create an Event Processing Application using Event Processors. We will look at two: Flink SQL and the Kafka Streams DSL.
Flink SQL provides a familiar standard SQL syntax for creating Event Processing Applications. Flink SQL parses SQL commands and constructs and manages the Event Processors that we define as part of an Event Processing Application.
In the following example, we create a Flink SQL query that reads data from the readings Event Stream and "cleans" the Event values. The query publishes the clean readings to a new table called clean_readings. Here, this query acts as an Event Processing Application comprising multiple interconnected Event Processors.
CREATE TABLE clean_readings AS
    SELECT sensor,
           reading,
           UPPER(location) AS location
    FROM readings;
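With Flink SQL, we can view each section of the command as the construction of a different Event Processor. FROM readings reads Events from the readings Event Stream, the SELECT list with UPPER(location) transforms each Event, and CREATE TABLE clean_readings AS writes the results to the new clean_readings table.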
The Kafka Streams DSL provides abstractions for Event Streams and Tables, as well as stateful and stateless transformation functions (map, filter, and others). Each of these functions can act as an Event Processor in the larger Event Processing Application that we build with the Kafka Streams library.
builder
    // Assumes String keys and Reading values, with matching Serdes configured.
    .<String, Reading>stream("readings")
    .mapValues((key, value) ->
        new Reading(value.sensor, value.reading, value.location.toUpperCase()))
    .to("clean");
In the above example, we use the Kafka Streams StreamsBuilder to construct the stream processing topology.