Once our data -- such as financial transactions, tracking information for shipments, IoT sensor measurements, etc. -- is set in motion as streams of events on an Event Streaming Platform, we want to put it to use and create value from it. How do we do this?
How can we process Events in an Event Streaming Platform?
We build an Event Processor, which is a component that reads Events and processes them, possibly writing new Events as the result of its processing. An Event Processor may act as an Event Source and/or Event Sink; in practice, both are common. An Event Processor can be distributed, which means that it has multiple instances running across different machines. In this case, the processing of Events happens concurrently across these instances.
An important characteristic of an Event Processor is that it should allow for composition with other Event Processors. In practice, we rarely use a single Event Processor in isolation. Instead, we compose and connect one or more Event Processors (via Event Streams) inside an Event Processing Application that fully implements one particular use case end-to-end, or that implements a subset of the overall business logic limited to the bounded context of a particular domain (for example, in a microservices architecture).
An Event Processor performs a specific task within the Event Processing Application. Think of the Event Processor as one processing node, or processing step, of a larger processing topology. Examples include mapping an event type to a domain object, filtering only the important Events out of an Event Stream, enriching an Event Stream with additional data by joining it against another stream or a database table, triggering alerts, or creating new Events for consumption by other applications.
There are multiple ways to create an Event Processing Application using Event Processors. We will look at two: ksqlDB and the Kafka Streams DSL.
The streaming database ksqlDB provides a familiar SQL syntax, which we can use to create Event Processing Applications. ksqlDB parses SQL commands and constructs and manages the Event Processors that we define as part of an Event Processing Application.
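Suppose the raw sensor readings arrive on an Event Stream named readings, backed by a Kafka topic. As a minimal sketch, such a stream might be declared in ksqlDB as follows; the column types, key column, and serialization settings shown here are assumptions rather than part of the example that follows:

CREATE STREAM readings (
    sensor VARCHAR KEY,
    reading DOUBLE,
    location VARCHAR
) WITH (
    KAFKA_TOPIC = 'readings',
    VALUE_FORMAT = 'JSON',
    PARTITIONS = 3
);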
In the following example, we create a ksqlDB query that reads data from the readings Event Stream and "cleans" the Event values. The query publishes the cleaned readings to a new stream called clean_readings. Here, the query acts as an Event Processing Application comprising multiple interconnected Event Processors.
CREATE STREAM clean_readings AS
    SELECT sensor,
           reading,
           UCASE(location) AS location
    FROM readings
    EMIT CHANGES;
With ksqlDB, we can view each section of the command as the construction of a different Event Processor:
- CREATE STREAM defines the new output Event Stream to which this application will produce Events.
- SELECT ... is a mapping function, taking each input Event and "cleaning" it as defined. In this example, cleaning simply means uppercasing the location field in each input reading.
- FROM ... is a source Event Processor that defines the input Event Stream for the overall application.
- EMIT CHANGES is ksqlDB syntax that defines our query as continuously running, and specifies that incremental changes will be produced as the query runs perpetually.
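Because clean_readings is itself an Event Stream, further Event Processors can be composed on top of it, just as described above. As an illustrative sketch, a second query might consume the cleaned readings and filter them down to a single location; the berlin_readings stream name and the location literal below are hypothetical:

CREATE STREAM berlin_readings AS
    SELECT sensor,
           reading
    FROM clean_readings
    WHERE location = 'BERLIN'
    EMIT CHANGES;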
The Kafka Streams DSL provides abstractions for Event Streams and Tables, as well as stateful and stateless transformation functions (map, filter, and others). Each of these functions can act as an Event Processor in the larger Event Processing Application that we build with the Kafka Streams library.
builder
    .stream("readings")
    .mapValues((key, value) ->
        new Reading(value.sensor, value.reading, value.location.toUpperCase()))
    .to("clean");
In the above example, we use the Kafka Streams StreamsBuilder to construct the stream processing topology:

- stream function. This creates an Event Stream from the designated Apache Kafka® topic.
- mapValues function. This function accepts an Event and returns a new Event with any desired transformations applied to the original Event's values.
- to function. This function terminates our stream processing topology by writing the resulting Events to the designated output topic, here named clean.
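For completeness, the snippet above can be embedded in a small, runnable application. The following is a minimal sketch rather than a definitive implementation: it assumes a local Kafka broker at localhost:9092, a simple Reading class with accessible fields, and that suitable Serdes for the Reading values are configured elsewhere; the class name CleanReadingsApp and the configuration values are illustrative.

import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class CleanReadingsApp {

    // Simple domain object for a sensor reading; the field names mirror the
    // columns used in the ksqlDB example.
    static class Reading {
        final String sensor;
        final double reading;
        final String location;

        Reading(String sensor, double reading, String location) {
            this.sensor = sensor;
            this.reading = reading;
            this.location = location;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clean-readings-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A real application would also configure default key/value Serdes
        // (or pass Consumed/Produced instances) so that Reading objects can
        // be serialized and deserialized; that setup is omitted here.

        StreamsBuilder builder = new StreamsBuilder();

        // Source Event Processor: read the input Event Stream from the "readings" topic.
        KStream<String, Reading> readings = builder.stream("readings");

        // Mapping Event Processor: uppercase the location field of each reading.
        KStream<String, Reading> cleaned = readings.mapValues((key, value) ->
            new Reading(value.sensor, value.reading, value.location.toUpperCase()));

        // Sink Event Processor: write the cleaned Events to the "clean" topic.
        cleaned.to("clean");

        // Build and start the topology, closing it cleanly on shutdown.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}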