When producing an event stream, it’s important to consider how your consumers will resolve relationships represented in the events. In an event stream with a normalized model, you may find that the data your service needs is spread across several different events in different event streams. You would then need to resolve the relationships by performing a series of stream-based joins or queries to external data stores to obtain the data you need for your system.
In contrast, denormalized event streams tend to be easier for consumers to use, since they do not have to query other streams or state stores to get the data they need. Denormalized streams typically come at the expense of additional preprocessing, performed either by the producer of the event or by a purpose-built process that joins the data together after the fact.
In either case, ksqlDB provides the capabilities for you to resolve any stream joins you need to make for your applications. In this hands-on exercise, you will use ksqlDB to denormalize several event streams that mirror the upstream relational model of an event source by resolving the foreign-key joins.
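As a rough sketch of the difference (the stream and table names below are hypothetical and not part of this exercise), a consumer of a normalized model has to resolve the relationship itself, while a consumer of a denormalized stream simply reads the already-enriched events:
-- Normalized model: the consumer must join the item events with the brand
-- reference data before it can use them (hypothetical names).
CREATE STREAM consumer_item_view AS
SELECT i.id AS id,
i.name AS item_name,
b.name AS brand_name
FROM item_events i
JOIN brand_lookup b ON i.brand_id = b.id
EMIT CHANGES;
-- Denormalized model: the join was already resolved upstream, so the
-- consumer just reads the enriched events as they arrive.
SELECT * FROM enriched_item_events EMIT CHANGES;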
Create and Use a ksqlDB Table of Item Facts
Let’s start by creating the brands, tax_status, and items_dim2 tables and populating them with events. Note the _dim2 suffix, which keeps this data separate from the dimension 1 data.
Open https://confluent.cloud and log in to the Confluent Cloud console.
Navigate to the default environment, the event-streams cluster, and the Editor for the event-streams-ksqlDB cluster.
Create the brands table:
CREATE TABLE brands (
id BIGINT PRIMARY KEY,
name STRING
) WITH (
KAFKA_TOPIC = 'brands',
VALUE_FORMAT = 'AVRO',
PARTITIONS = 6
);
Create the tax_status table:
CREATE TABLE tax_status (
id BIGINT PRIMARY KEY,
state_tax DECIMAL(3, 2),
country_tax DECIMAL(3, 2)
) WITH (
KAFKA_TOPIC = 'tax_status',
VALUE_FORMAT = 'AVRO',
PARTITIONS = 6
);
Create an items_dim2 table:
CREATE TABLE items_dim2 (
id BIGINT PRIMARY KEY,
price DECIMAL(10, 2),
name STRING,
description STRING,
brand_id BIGINT,
tax_status_id BIGINT
) WITH (
KAFKA_TOPIC = 'items_dim2',
VALUE_FORMAT = 'AVRO',
PARTITIONS = 6
);
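Optionally, before moving on, you can sanity-check that the three tables and their backing topics were registered (these are standard ksqlDB statements you can run in the same Editor):
-- List the tables and topics known to ksqlDB
SHOW TABLES;
SHOW TOPICS;
-- Inspect the schema of one of the new tables
DESCRIBE items_dim2;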
Next, you will join the items_dim2, brands, and tax_status tables together to create a new enriched_items table. You will need to do this as two separate joins, since ksqlDB requires foreign-key table joins to be resolved one at a time.
First, let’s join items_dim2 with brands, creating items_and_brands.
CREATE TABLE items_and_brands AS
SELECT items_dim2.id AS id,
items_dim2.price,
items_dim2.name AS name,
items_dim2.description,
brands.name AS brand_name,
items_dim2.tax_status_id
FROM items_dim2
JOIN brands ON items_dim2.brand_id = brands.id
EMIT CHANGES;
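Optionally, you can inspect the schema of the join result before building on it (the exact generated column names can vary slightly between ksqlDB versions):
-- Confirm the columns produced by the foreign-key join
DESCRIBE items_and_brands;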
Now let’s join the new items_and_brands table with tax_status.
Create a new enriched_items table by joining the items_and_brands table with the tax_status table.
CREATE TABLE enriched_items AS
SELECT items_and_brands.id AS id,
items_and_brands.price,
items_and_brands.name,
items_and_brands.description,
items_and_brands.brand_name,
tax_status.state_tax,
tax_status.country_tax
FROM items_and_brands
JOIN tax_status ON items_and_brands.tax_status_id = tax_status.id
EMIT CHANGES;
Insert sample data in the brands and tax_status tables:
INSERT INTO brands (id, name) VALUES (400, 'ACME');
INSERT INTO tax_status (id, state_tax, country_tax) VALUES (777, 0.05, 0.10);
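If you want to confirm that these rows landed before checking the join output, you can optionally run push queries against the source tables (one at a time, also starting from EARLIEST):
-- Run each query separately in the Editor
SELECT * FROM brands EMIT CHANGES;
SELECT * FROM tax_status EMIT CHANGES;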
Check the contents of the enriched_items table (make sure you start from EARLIEST):
SELECT * FROM enriched_items EMIT CHANGES;
You’ll find that it is still empty. You need to create a single item that matches both the brand and the tax status to have an enriched event come through.
Insert a matching item into the items_dim2 table:
INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
VALUES (321, 29.99, 'Anvil', 'Sturdy Iron Anvil, 400 lbs', 400, 777);
You will now see an enriched join result in the enriched_items table:
SELECT * FROM enriched_items EMIT CHANGES;
Let’s create three more items of the same brand with the same tax status.
INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
VALUES (9000, 19.99, 'Ball', 'Rubber Ball', 400, 777);
INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
VALUES (9001, 39.99, 'Baseball Glove', 'Leather Baseball Glove', 400, 777);
INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
VALUES (9002, 2.99, 'Tennis Ball', 'Green Tennis Ball', 400, 777);
Now let’s see what happens when you update all items of the same brand at the same time. Let’s say that ACME has decided to rebrand under a new name. We’re going to upsert a new brand record to overwrite the old brand name.
Change the brand from ACME to ACME Sports:
INSERT INTO brands (id, name) VALUES (400, 'ACME Sports');
Verify that all items were updated with the new brand:
SELECT * FROM enriched_items EMIT CHANGES;
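Because enriched_items is materialized by a persistent query, you should also be able to spot-check a single item by its key with a pull query (support for pull queries depends on your ksqlDB version and configuration):
-- Look up the anvil directly by its key
SELECT * FROM enriched_items WHERE id = 321;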
Hi, I'm Adam from Confluent. In this module we're gonna try our hand at denormalizing some event streams. When producing an event stream, it's important to consider how your consumers will resolve relationships within the events. In a normalized model, you may find that data your service needs is spread across different events in different event streams. You would then need to resolve the relationships by performing a series of stream-based joins to obtain the data you need for your system. In contrast, denormalized event streams tend to be easier to use for consumers. They do not have to query other streams or state stores to get the data they need. Denormalized streams typically come at the expense of additional pre-processing, by either the producer of the event or by creating a purpose-built process to join the data together after the fact. In either case, ksqlDB provides the capabilities for you to resolve any stream joins you need to make for your application. In this hands-on exercise, you will use ksqlDB to denormalize several event streams that mirror the upstream relational model of an event source by resolving the foreign-key joins. You will start by creating the brands, tax_status, and items_dim2 tables, and populate them with events. Next comes the process to denormalize the tables. You will first join the items table with the brands table. You will then join the new items_and_brands table with the tax_status table. The exercise concludes with steps to verify the behavior of the new enriched_items denormalized table. First, you will insert sample data into the items, brands, and tax_status tables, and verify that the data makes its way to the enriched_items table. Then you will update a row in the brands table and verify that the updated brand name appears in the enriched_items table. And there you have it. We've denormalized items, brands, and tax statuses together into a single enriched item event.