Principal Product Manager (Presenter)
One of the powerful capabilities of ksqlDB is combining rows from one stream (or table) with those of another. This can be used to enrich events with reference information (denormalize), to join between event streams, or to join between tables.
A stream-table join keeps just one internal buffer, one for the table (the stream is of a theoretically infinite size, so it has no buffer). When a row from a table is read, that row goes into the internal buffer to be joined against a row from the stream, which continuously sends data. Rows are sent based on timestamp: the lowest timestamp in “stream-time” is sent first.
In this example, the join type is INNER, which means that if there’s a match between the stream and the table, a new row is created. If there is no match, the row is dropped. Note that updates to the table don’t cause the join to fire; only updates to the stream do so.
CREATE STREAM enriched_readings AS
SELECT r.reading,
r.area,
b.brand_name
FROM readings r
INNER JOIN brands b,
on b.sensor = r.sensor
EMIT CHANGES;
Now consider the same join but with a left outer clause, which will always fire a join on the stream side:
CREATE STREAM enriched_readings AS
SELECT r.reading,
r.area,
b.brand_name
FROM readings r
LEFT OUTER JOIN brands b,
on b.sensor = r.sensor
EMIT CHANGES;
So whenever a record appears in the stream, it will always get passed through. If there was no match in the table, the resulting row simply has a null column.
If you’re joining against a table of reference data, it’s a good practice to preload data, so that timestamps from your table are always smaller than timestamps from your stream. This makes ksqlDB load the largest amount of data right at the start, and it’s useful if the joined data is known ahead of time and won’t change very often.
An inner table-table join produces a table, unlike the stream-table joins you have seen so far. It has two buffers—one for each table—and it fires when there is an update on either side of the expression.
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.
Hi, I'm Michael Drogalis with Confluent and in this module, we're gonna be looking at how streaming joins work in ksqlDB. Joins are an essential ingredient in stream processing. When you have many independent streams of data, it's almost certainly the case that you'll wanna combine them, but how do you do that when the data is both in motion and at rest? KsqlDB helps you join streams-to-streams, streams-to-tables and tables-to-tables. In this module, we'll look at stream-table joins and table-table joins. The first one that we're gonna look at is a stream-table join. In this depiction we'll see just the change log for the table. It's just easier to read since it takes up less space. Here I have an inner join. Just like a SQL database, an inner join will fire when both the left and the right side of the expression match. When the persistent query runs, it pulls in rows as usual, but how does ksqlDB choose which row to pull in next? It selects the row across all partitions with the earliest timestamp. The lowest timestamp is called stream time. A stream-table join keeps just one internal buffer, one for the table. When a row from a table is red, it's populated into that internal buffer to be joined against. That is why the row fades away in this animation. When a row for the stream is red, it tries to match against the buffer. If there's a match, the join fires and it produces a new row. If it doesn't match, the row is dropped. Note that updates to the table don't cause the join to fire. There's no buffer for the stream since the stream is of infinite size. Next, let's look at the same join, but now with a left outer clause. A left outer clause will always fire a join on the stream side. You can see how whenever a record shows up in the stream side, it's always passed through. If there was no match in the table, the resulting row simply has a null column. Now, if you're joining against a table of reference data, it's common to preload that data to make sure that the timestamps of the table side are always smaller than that of the stream side. That effectively makes ksqlDB load a batch of data once at the start, which is useful if the join data is known ahead of time and probably won't change very much. Lastly, let's look at table-table joins. Here again, this is an inner join. A table-table join has two buffers, one for each table. What's unique about a table-table join is that it fires when there's an update on either side of the expression. By contrast to a stream-table join, a table-table join produces a table, not a stream. So that's stream-table joins and table-table joins in ksqlDB.