Once our data -- such as financial transactions, tracking information for shipments, IoT sensor measurements, etc. -- is set in motion as streams of events on an Event Streaming Platform, we want to put that data to use and create value from it. Event Processors are the building blocks for achieving this, but they solve only a specific part or step of a use case.
How can we build a full-fledged application for data in motion, an application that creates, reads, processes, and/or queries Event Streams to solve a use case end-to-end?
We build an Event Processing Application by composing one or more Event Processors into an interconnected processing topology for Event Streams and Tables. Here, the continuous output streams of one processor are the continuous input streams to one or more downstream processors. The combined functionality of the application then covers our use case end-to-end, or at least covers as much of the use case as we want. (The question of how many applications should implement a use case is an important design decision, which we are not covering here.) The event processors -- which make up the larger Event Processing Application -- are typically distributed, running across multiple instances, to allow for elastic, parallel, fault-tolerant processing of data in motion at scale.
For example, an application can read a stream of customer payments from an Event Store in an Event Streaming Platform, then filter payments for certain customers, and then aggregate those payments per country and per week. The processing mode is stream processing; that is, data is continuously processed 24/7. As soon as new Events are available, they are processed and propagated through the topology of Event Processors.
Apache Kafka® is the most popular Event Streaming Platform. There are several options for building Event Processing Applications when using Kafka. We'll cover two here.
ksqlDB is a streaming database with which we can build Event Processing Applications using SQL syntax. It has first-class support for Streams and Tables.
When we create Tables and Streams in ksqlDB, Kafka topics are used as the storage layer behind the scenes. In the example below, the ksqlDB table movies
is backed by a Kafka topic of the same name.
CREATE TABLE movies (ID INT PRIMARY KEY, title VARCHAR, release_year INT)
WITH (kafka_topic='movies', partitions=1, value_format='avro');
CREATE STREAM ratings (MOVIE_ID INT KEY, rating DOUBLE)
WITH (kafka_topic='ratings', partitions=1, value_format='avro');
As we would expect, we can add new Events using INSERT
:
INSERT INTO movies (id, title, release_year) VALUES (294, 'Die Hard', 1998);
INSERT INTO ratings (movie_id, rating) VALUES (294, 8.2);
We can also perform stream processing using ksqlDB's SQL syntax. In the following example, the command CREATE STREAM .. AS SELECT ..
continuously joins the ratings
stream and the movies
table to create a new stream of enriched ratings.
CREATE STREAM rated_movies
WITH (kafka_topic='rated_movies',
value_format='avro') AS
SELECT ratings.movie_id as id, title, rating
FROM ratings
LEFT JOIN movies ON ratings.movie_id = movies.id;
With the Kafka Streams client library of Apache Kafka, we can implement an event processing application in Java, Scala, or other JVM languages. Here is a Kafka Streams example similar to the ksqlDB example above:
KStream<Integer, Rating> ratings = builder.table(<blabla>);
KTable<Integer, Movie> movies = builder.stream(<blabla>);
MovieRatingJoiner joiner = new MovieRatingJoiner();
KStream <Integer, EnrichedRating> enrichedRatings = ratings.join(movies, joiner);
See the tutorial How to join a stream and a lookup table for a full example of using Kafka Streams to build an Event Processing Application.