Get Started Free
‹ Back to courses
course: Designing Events and Event Streams

Hands-On: Modeling as Normalized vs. Denormalized

2 min
bellemare-headshot-zoomed

Adam Bellemare

Staff Technologist, Office of the CTO (Presenter)

When producing an event stream, it’s important to consider how your consumers will resolve relationships represented in the events. In an event stream with a normalized model, you may find that the data your service needs is spread across several different events in different event streams. You would then need to resolve the relationships by performing a series of stream-based joins or queries to external data stores to obtain the data you need for your system.

In contrast, denormalized event streams tend to be easier to use for consumers—they do not have to query other streams or state stores to get the data they need. Denormalized streams typically come at the expense of additional preprocessing by either the producer of the event, or by creating a purpose-built process to join the data together after the fact.

In either case, ksqlDB provides the capabilities for you to resolve any stream joins you need to make for your applications. In this hands-on exercise, you will use ksqlDB to denormalize several event streams that mirror the upstream relational model of an event source by resolving the foreign-key joins.

Create and Use a ksqlDB Table of Item Facts

Let’s start by creating the brands, tax_status, and item_dim2 tables, and populate them with events. Note the dim2 qualifier to keep our data separate from the Dimension 1.

item-table

  1. Open URL https://confluent.cloud and log in to the Confluent Cloud console.

  2. Navigate to the default environment, the event-streams cluster, and the Editor for the event-streams-ksqDB cluster.

  3. Create the brands table:

    CREATE TABLE brands (
      id BIGINT PRIMARY KEY,
      name STRING
    ) WITH (
      KAFKA_TOPIC = 'brands',
      VALUE_FORMAT = 'AVRO',
     PARTITIONS = 6
    );
  4. Create the tax_status table:

    CREATE TABLE tax_status (
      id BIGINT PRIMARY KEY,
      state_tax DECIMAL(3, 2),
      country_tax DECIMAL(3, 2)
    ) WITH (
      KAFKA_TOPIC = 'tax_status',
      VALUE_FORMAT = 'AVRO',
      PARTITIONS = 6
    );
  5. Create an items_dim2 table:

    CREATE TABLE items_dim2 (
      id BIGINT PRIMARY KEY,
      price DECIMAL(10, 2),
      name STRING,
      description STRING,
      brand_id BIGINT,
      tax_status_id BIGINT
    ) WITH (
      KAFKA_TOPIC = 'items_dim2',
      VALUE_FORMAT = 'AVRO',
      PARTITIONS = 6
    );

Next, you will join the items_dim2, brands, and tax_status tables together to create a new enriched_items table. You will need to do this using separate joins since ksqlDB requires this when doing foreign-key joins.

First, let’s join items_dim2 with brands creating items_brands.

join-items

  1. Create a new items_and_brands table by joining the items_dim2 table with the brands table.
    CREATE TABLE items_and_brands AS
    SELECT items_dim2.id AS id,
    items_dim2.price,
    items_dim2.name AS name,
    items_dim2.description,
    brands.name AS brand_name,
    items_dim2.tax_status_id
    FROM items_dim2
    JOIN brands ON brand_id = brands.id
    EMIT CHANGES;

Now let’s join the new items_and_brands table with tax_status.

join-items-2

  1. Create a new enriched_items table by joining the items_and_brands table with the tax_status table.

    CREATE TABLE enriched_items AS
    SELECT items_and_brands.id AS id,
    items_and_brands.price,
    items_and_brands.name,
    items_and_brands.description,
    items_and_brands.brand_name,
    tax_status.state_tax,
    tax_status.country_tax
    FROM items_and_brands
    JOIN tax_status ON items_and_brands.tax_status_id = tax_status.id
    EMIT CHANGES;
  2. Insert sample data in the brands and tax_status tables:

    INSERT INTO brands (id, name) VALUES (400, 'ACME');
    INSERT INTO tax_status (id, state_tax, country_tax) VALUES (777, 0.05, 0.10);
  3. Check the contents of the enriched_items table (make sure you start from EARLIEST):

    select * from ENRICHED_ITEMS EMIT CHANGES;

You’ll find that it is still empty. You need to create a single item that matches both the brand and the tax status to have an enriched event come through.

  1. Insert a matching item into the items_dim2 table:

    INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
    VALUES (321, 29.99, 'Anvil', 'Sturdy Iron Anvil, 400 lbs', 400, 777);
  2. You will now see an enriched join result in the enriched_items table:

    SELECT * FROM enriched_items EMIT CHANGES;

Let’s create three more items of the same brand with the same tax status.

  1. Create three items of the ACME brand:
    INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
    VALUES (9000, 19.99, 'Ball', 'Rubber Ball', 400, 777);
    INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
    VALUES (9001, 39.99, 'Baseball Glove', 'Leather Baseball Glove', 400, 777);
    INSERT INTO items_dim2 (id, price, name, description, brand_id, tax_status_id)
    VALUES (9002, 2.99, 'Tennis Ball', 'Green Tennis Ball', 400, 777);

Now let’s see what happens when you update items of the same brand at the same time. Let’s say that ACME has decided to rebrand to a new name. We’re going to upsert to overwrite the old brand name.

  1. Change the brand from ACME to ACME Sports:

    INSERT INTO brands (id, name) VALUES (400, 'ACME Sports');
  2. Verify that all items were updated with the new brand:

    SELECT * FROM enriched_items EMIT CHANGES;

Use the promo code EVENTDESIGN101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Hands-On: Modeling as Normalized vs. Denormalized

Hi, I'm Adam from Confluent. In this module we're gonna try our hand at denormalizing some event streams. When producing an event stream it's important to consider how your consumers will resolve relationships within the events. In a normalized model, you may find that data your service needs is spread across different events in different events streams. You would then need to resolve the relationships by performing a series of stream-based joins to obtain the data you need for your system. In contrast, denormalized event streams tend to be easier to use for consumers. They do not have to query other streams or state stores to get the data they need. Denormalized streams typically come at the expense of additional pre-processing, by either the producer of the event or by creating a purpose built process to join the data together after the fact. In either case, ksqlDB provides the capabilities for you to resolve any stream joins you need to make for your application. In this hands-on exercise, you will use ksqlDB to denormalize several event streams that mirror the upstream relational model of an event source by resolving the foreign key joins. You will start by creating the brands, tax_status and items_dim2 tables, and populate them with events. Next comes the process to denormalize the tables. You will first join the items table with the brands table. You will then join the new items_and_brands table with the tax_status table. The exercise concludes with steps to verify the behavior of the new enriched_items denormalized table. First, you will insert sample data into the items, brands and tax status tables, and verify that the data makes its way to the enriched_items table. Then you will update a row in the brands table and verify that the updated brand name appears in the enriched_items table. And there you have it. We've denormalized items, brand and tax statuses together into a single enriched item event.

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.