Course: Inside ksqlDB

How Streaming Joins Work

3 min
Michael DrogalisPrincipal Product Manager

How Streaming Joins Work

One of the powerful capabilities of ksqlDB is combining rows from one stream (or table) with those of another. This can be used to enrich events with reference information (denormalize), to join between event streams, or to join between tables.

Stream-Table Joins: Inner

A stream-table join keeps just one internal buffer, one for the table (the stream is of a theoretically infinite size, so it has no buffer). When a row from a table is read, that row goes into the internal buffer to be joined against a row from the stream, which continuously sends data. Rows are sent based on timestamp: the lowest timestamp in “stream-time” is sent first.

In this example, the join type is INNER, which means that if there’s a match between the stream and the table, a new row is created. If there is no match, the row is dropped. Note that updates to the table don’t cause the join to fire; only updates to the stream do so.

CREATE STREAM enriched_readings AS
    SELECT r.reading, 
           r.area, 
           b.brand_name
    FROM readings r
    INNER JOIN brands b,
    on  b.sensor = r.sensor
    EMIT CHANGES;

Stream-Table Joins: Left Outer

Now consider the same join but with a left outer clause, which will always fire a join on the stream side:

CREATE STREAM enriched_readings AS
    SELECT r.reading, 
           r.area, 
           b.brand_name
    FROM readings r
    LEFT OUTER JOIN brands b,
    on  b.sensor = r.sensor
    EMIT CHANGES;

So whenever a record appears in the stream, it will always get passed through. If there was no match in the table, the resulting row simply has a null column.

Preloading Table Data

If you’re joining against a table of reference data, it’s a good practice to preload data, so that timestamps from your table are always smaller than timestamps from your stream. This makes ksqlDB load the largest amount of data right at the start, and it’s useful if the joined data is known ahead of time and won’t change very often.

Table-Table Joins: Inner

An inner table-table join produces a table, unlike the stream-table joins you have seen so far. It has two buffers—one for each table—and it fires when there is an update on either side of the expression.

Use the promo code KSQLDB101 to get $101 of free Confluent Cloud usage

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.