Once our data -- such as financial transactions, shipment tracking information, or IoT sensor measurements -- is set in motion as streams of events on an Event Streaming Platform, we want to put that data to use and create value from it. Event Processors are the building blocks for achieving this, but each solves only a specific part or step of a use case.
How can we build a full-fledged application for data in motion, an application that creates, reads, processes, and/or queries Event Streams to solve a use case end-to-end?
We build an Event Processing Application by composing one or more Event Processors into an interconnected processing topology of Event Streams and Tables. Here, the continuous output streams of one processor are the continuous input streams of one or more downstream processors. The combined functionality of the application then covers our use case end-to-end, or at least as much of it as we want. (How many applications should implement a given use case is an important design decision, which we are not covering here.) The Event Processors that make up the larger Event Processing Application are typically distributed across multiple instances, allowing for elastic, parallel, fault-tolerant processing of data in motion at scale.
For example, an application can read a stream of customer payments from an Event Store in an Event Streaming Platform, then filter payments for certain customers, and then aggregate those payments per country and per week. The processing mode is stream processing; that is, data is continuously processed 24/7. As soon as new Events are available, they are processed and propagated through the topology of Event Processors.
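To make this concrete, here is a minimal sketch of such a topology in Flink SQL (one of the technologies covered below). It assumes a payments table with customer_id, country, and amount columns plus a watermarked event-time column ts; the table name, column names, and customer IDs are all illustrative.
-- Filter payments for certain customers, then aggregate them
-- per country and per week (7-day tumbling windows over event time).
CREATE TABLE weekly_payments AS
SELECT country,
       window_start,
       window_end,
       SUM(amount) AS total_amount
FROM TABLE(
    TUMBLE(TABLE payments, DESCRIPTOR(ts), INTERVAL '7' DAY))
WHERE customer_id IN (1001, 1002, 1003)
GROUP BY country, window_start, window_end;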
Apache Kafka® is the most popular Event Streaming Platform. There are several options for building Event Processing Applications when using Kafka. We'll cover two here.
Apache Flink enables developers to build Event Processing Applications using SQL syntax (Flink SQL).
In the example below, the movies and ratings tables are backed by Kafka topics.
CREATE TABLE movies (
    movie_id INT NOT NULL,
    title STRING,
    release_year INT
);

CREATE TABLE ratings (
    movie_id INT NOT NULL,
    rating FLOAT
);
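Exactly how these tables are wired to Kafka depends on the environment: on Confluent Cloud, for example, Flink tables map to Kafka topics automatically, whereas self-managed Flink makes the mapping explicit through connector options in the DDL. A sketch of the latter, with illustrative connection settings:
CREATE TABLE movies (
    movie_id INT NOT NULL,
    title STRING,
    release_year INT
) WITH (
    'connector' = 'kafka',
    'topic' = 'movies',
    'properties.bootstrap.servers' = 'localhost:9092',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);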
As we would expect, we can add new Events using INSERT:
INSERT INTO movies VALUES (928, 'Dune: Part Two', 2024);
INSERT INTO ratings VALUES (928, 9.6);
We can also perform stream processing using SQL syntax. In the following example, the statement CREATE TABLE ... AS SELECT ... continuously joins the ratings and movies tables to populate a new table of ratings enriched with metadata about the rated movie.
CREATE TABLE rated_movies AS
SELECT ratings.movie_id AS id, title, rating
FROM ratings
LEFT JOIN movies ON ratings.movie_id = movies.movie_id;
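Because the join runs continuously, rated_movies stays up to date as new Events arrive and can be queried like any other table. Given the two rows inserted above, querying it would return the enriched rating (expected output shown as comments):
SELECT * FROM rated_movies;
-- id   title           rating
-- 928  Dune: Part Two  9.6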
With the Kafka Streams client library of Apache Kafka, we can implement an Event Processing Application in Java, Scala, or other JVM languages. Here is a Kafka Streams example similar to the Flink example above:
// Read the ratings topic as a stream of events, and the movies topic as a
// continuously updated table of movie metadata (topic names are illustrative).
KStream<Integer, Rating> ratings = builder.stream("ratings");
KTable<Integer, Movie> movies = builder.table("movies");
// MovieRatingJoiner implements ValueJoiner<Rating, Movie, EnrichedRating>,
// combining each rating with the metadata of the movie it refers to.
MovieRatingJoiner joiner = new MovieRatingJoiner();
KStream<Integer, EnrichedRating> enrichedRatings = ratings.join(movies, joiner);
See the tutorial How to join a stream and a lookup table in Kafka Streams for a full example of using Kafka Streams to build an Event Processing Application.