How Streaming Joins Work

3 min

Michael Drogalis

Principal Product Manager (Presenter)

How Streaming Joins Work

One of the powerful capabilities of ksqlDB is combining rows from one stream (or table) with those of another. This can be used to enrich events with reference information (denormalize), to join between event streams, or to join between tables.

Stream-Table Joins: Inner

A stream-table join keeps just one internal buffer, one for the table (the stream is of a theoretically infinite size, so it has no buffer). When a row from a table is read, that row goes into the internal buffer to be joined against a row from the stream, which continuously sends data. Rows are sent based on timestamp: the lowest timestamp in “stream-time” is sent first.

In this example, the join type is INNER, which means that if there’s a match between the stream and the table, a new row is created. If there is no match, the row is dropped. Note that updates to the table don’t cause the join to fire; only updates to the stream do so.

CREATE STREAM enriched_readings AS
    SELECT r.reading, 
           r.area, 
           b.brand_name
    FROM readings r
    INNER JOIN brands b,
    on  b.sensor = r.sensor
    EMIT CHANGES;

Stream-Table Joins: Left Outer

Now consider the same join but with a left outer clause, which will always fire a join on the stream side:

CREATE STREAM enriched_readings AS
    SELECT r.reading, 
           r.area, 
           b.brand_name
    FROM readings r
    LEFT OUTER JOIN brands b,
    on  b.sensor = r.sensor
    EMIT CHANGES;

So whenever a record appears in the stream, it will always get passed through. If there was no match in the table, the resulting row simply has a null column.

Preloading Table Data

If you’re joining against a table of reference data, it’s a good practice to preload data, so that timestamps from your table are always smaller than timestamps from your stream. This makes ksqlDB load the largest amount of data right at the start, and it’s useful if the joined data is known ahead of time and won’t change very often.

Table-Table Joins: Inner

An inner table-table join produces a table, unlike the stream-table joins you have seen so far. It has two buffers—one for each table—and it fires when there is an update on either side of the expression.

Use the promo code KSQLDB101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Get Started

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.

How Streaming Joins Work

Hi, I'm Michael Drogalis with Confluent and in this module, we're gonna be looking at how streaming joins work in ksqlDB. Joins are an essential ingredient in stream processing. When you have many independent streams of data, it's almost certainly the case that you'll wanna combine them, but how do you do that when the data is both in motion and at rest? KsqlDB helps you join streams-to-streams, streams-to-tables and tables-to-tables. In this module, we'll look at stream-table joins and table-table joins. The first one that we're gonna look at is a stream-table join. In this depiction we'll see just the change log for the table. It's just easier to read since it takes up less space. Here I have an inner join. Just like a SQL database, an inner join will fire when both the left and the right side of the expression match. When the persistent query runs, it pulls in rows as usual, but how does ksqlDB choose which row to pull in next? It selects the row across all partitions with the earliest timestamp. The lowest timestamp is called stream time. A stream-table join keeps just one internal buffer, one for the table. When a row from a table is red, it's populated into that internal buffer to be joined against. That is why the row fades away in this animation. When a row for the stream is red, it tries to match against the buffer. If there's a match, the join fires and it produces a new row. If it doesn't match, the row is dropped. Note that updates to the table don't cause the join to fire. There's no buffer for the stream since the stream is of infinite size. Next, let's look at the same join, but now with a left outer clause. A left outer clause will always fire a join on the stream side. You can see how whenever a record shows up in the stream side, it's always passed through. If there was no match in the table, the resulting row simply has a null column. Now, if you're joining against a table of reference data, it's common to preload that data to make sure that the timestamps of the table side are always smaller than that of the stream side. That effectively makes ksqlDB load a batch of data once at the start, which is useful if the join data is known ahead of time and probably won't change very much. Lastly, let's look at table-table joins. Here again, this is an inner join. A table-table join has two buffers, one for each table. What's unique about a table-table join is that it fires when there's an update on either side of the expression. By contrast to a stream-table join, a table-table join produces a table, not a stream. So that's stream-table joins and table-table joins in ksqlDB.

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWMastering Production Data Streaming Systems with Apache Kafka®

Kafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWMastering Production Data Streaming Systems with Apache Kafka®

Kafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog

NEWLearn More

Articles

Patterns

FAQs

Blog

NEWLearn More

Language Guides

Tutorials

Demos

Language Guides

Tutorials

Demos

Meetups & Events

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

Meetups & Events

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWMastering Production Data Streaming Systems with Apache Kafka®

Kafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog

NEWLearn More

Language Guides

Modules: Start from lesson 1
Total 7