Integration Architect (Presenter)
When processing data, a common requirement is to enrich it with other data. These lookups can be done in ksqlDB using the SQL JOIN syntax. Joins can be between streams of events, between streams and tables, or between tables and tables.
Instead of making calls out to external systems to enrich the data with lookups (which introduces complexity, and can be inefficient), the data is ingested, using Kafka Connect, from the external system into an Apache Kafka topic, from which the join can be performed natively using ksqlDB.
Here’s an example of joining order events to a table that contains information about the item being ordered:
CREATE STREAM ORDERS_ENRICHED AS
SELECT O.*,
I.*,
O.ORDERUNITS * I.UNIT_COST AS TOTAL_ORDER_VALUE,
FROM ORDERS O
LEFT OUTER JOIN ITEMS I
ON O.ITEMID = I.ID ;
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.
Hi, I'm Allison Walther with Confluent. Let's talk about lookups and joins in ksqlDB. Let's say we've got an inbound stream of raw order events. Each one has a code that denotes information about the item that was bought. In relational terms, this is the foreign key. To make the stream of events useful, we want to add in the information about the item to each order as the event is received in Kafka. The item data includes the primary key and several attributes of the item. This information is in a separate topic and could be populated directly by a producer, but more likely it's been ingested into a Kafka topic using Kafka Connect, which we will cover in more detail in another course. Connect brings in the information from an external system, such as a relational database. We can use ksqlDB to do a join between the two topics in Kafka. The orders are treated as a stream whilst the items are a table, both are Kafka topics. Here, we're using a left outer join, which means that any order that's received with an item code that isn't found on the item data will still be processed, but written with no values for the item information. The enriched order information is written to a new Kafka topic. ksqlDB can process every existing event in the orders topic, and then every single new order event that arrives in near real-time. That's it for lookups and joins. We're gonna move on to an exercise so you can further learn this information.