Event Joiner

Event Streams may need to be joined (i.e. enriched) with a Table or another Event Stream in order to provide more comprehensive details about their Events.

Problem

How can I enrich an Event Stream or Table with additional context?

Solution

We can combine the Events in a stream with a Table or another Event Stream by joining the two on a key that both sides share. When we join two Event Streams, matching Events rarely arrive at the same moment, so we can buffer Events in a timestamp-based window and produce a join result once both sides have an Event with the same key inside that window. When we join an Event Stream with a Table that holds more static data, each Event is matched against the current state of the Table, and the result is an enriched Event Stream.
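As a rough sketch of the windowed case, a stream-stream join in ksqlDB (the streaming database used in the Implementation section) takes a WITHIN clause that bounds how far apart in time two matching Events may be. The orders and shipments streams, their columns, the JSON value format, and the one-hour window are all hypothetical, chosen only for illustration.

```sql
-- Hypothetical streams over assumed existing topics, for illustration only.
CREATE STREAM orders (order_id INT KEY, item VARCHAR)
    WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');

CREATE STREAM shipments (shipment_id INT KEY, order_id INT, warehouse VARCHAR)
    WITH (KAFKA_TOPIC='shipments', VALUE_FORMAT='JSON');

-- Pair each order with shipments that share its order_id and whose
-- timestamps fall within one hour of the order's timestamp.
SELECT o.order_id AS order_id, o.item, s.warehouse
    FROM orders o
    INNER JOIN shipments s
        WITHIN 1 HOUR
        ON o.order_id = s.order_id
    EMIT CHANGES;
```

A wider window tolerates later-arriving Events at the cost of keeping more Events buffered, so the window size is a trade-off to tune for each use case.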

Implementation

With the streaming database ksqlDB, we can create a stream of Events from an existing Kafka topic. In this example, the stream plays a role similar to a fact table in a data warehouse:

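-- Assumes a default value format is configured on the server;
-- otherwise add VALUE_FORMAT (for example 'JSON') to the WITH clause.
CREATE STREAM ratings (movie_id INT KEY, rating DOUBLE)
    WITH (KAFKA_TOPIC='ratings');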

We can then create a Table from another existing Kafka topic that changes less frequently. This Table serves as our reference data (similar to dimension tables in data warehouses).

CREATE TABLE movies (ID INT PRIMARY KEY, title VARCHAR, release_year INT)
    WITH (KAFKA_TOPIC='movies');

To create a stream of enriched Events, we perform a join between the Event Stream and the Table.

SELECT ratings.movie_id AS ID, title, release_year, rating
    FROM ratings
    LEFT JOIN movies ON ratings.movie_id = movies.id
    EMIT CHANGES;
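The query above returns its results to the client that issued it. If the enriched Events should instead be kept as a new Event Stream backed by its own Kafka topic, one possible sketch wraps the same join in a CREATE STREAM AS SELECT statement. The stream name rated_movies is a hypothetical name, not part of the original example.

```sql
-- rated_movies is a hypothetical name for the derived, enriched stream.
CREATE STREAM rated_movies AS
    SELECT ratings.movie_id AS id, title, release_year, rating
    FROM ratings
    LEFT JOIN movies ON ratings.movie_id = movies.id
    EMIT CHANGES;
```

Downstream applications can then consume rated_movies like any other Event Stream, without repeating the join.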

Considerations

  • In ksqlDB, a join between an Event Stream and a Table is driven by the Event Stream side. An update to the Table only changes the state of the Table; only a new Event in the Event Stream produces a new join result. For example, if we join an Event Stream of orders to a Table of customers, each new order is enriched with the matching customer record, if one exists in the Table. Adding a new customer to the Table, however, does not produce a join result. The ksqlDB documentation contains more information about stream-table join semantics.
  • We can perform an inner or left-outer join between an Event Stream and a Table.
  • Joins are also useful for initiating subsequent processing when two or more corresponding Events arrive on different Event Streams or Tables.
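The stream-driven behavior described in the first consideration can be explored with a few INSERT statements against the ratings stream and movies table from the Implementation section. This is only a sketch: the sample rows are invented, and the comments describe the results we would expect from the semantics above while the join query is running, not captured output. Actual results can also depend on record timestamps and processing order.

```sql
-- Sample rows are invented for illustration.

-- Table update only: changes the state of movies; no join result expected.
INSERT INTO movies (id, title, release_year) VALUES (1, 'Example Movie', 1999);

-- New stream Event with a matching Table row: expected to yield one
-- enriched result carrying the title and release year.
INSERT INTO ratings (movie_id, rating) VALUES (1, 8.5);

-- New stream Event with no matching Table row: a LEFT JOIN still emits a
-- result, with NULL title and release_year; an INNER JOIN emits nothing.
INSERT INTO ratings (movie_id, rating) VALUES (2, 6.0);

-- Adding the missing movie afterwards does not re-emit the earlier rating.
INSERT INTO movies (id, title, release_year) VALUES (2, 'Another Example', 2005);
```

If unmatched Events should be dropped rather than passed through with NULL columns, replacing LEFT JOIN with INNER JOIN in the join query is the usual choice.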

References