Course: ksqlDB 101

Introduction to ksqlDB

Allison Walther, Integration Architect (Course Presenter)
Robin Moffatt, Staff Developer Advocate (Course Author)

ksqlDB Introduction

ksqlDB allows you to build stream processing applications on top of Apache Kafka with the ease of building traditional applications on a relational database. Using SQL to describe what you want to do rather than how, it makes it easy to build Kafka-native applications for processing streams of real-time data. Some key ksqlDB use cases include:

  • Materialized caches
  • Streaming ETL pipelines
  • Event-driven microservices

ksqlDB is designed around the principle of simplicity. While many streaming architectures require a grab bag of components pieced together from many projects, ksqlDB provides a single platform on which you can build streaming ETL pipelines and streaming applications, with just one dependency: Apache Kafka.


Stream Processing

Regardless of how we choose to store it later, much of the data that we work with in our systems begins life as part of an unbounded event stream. Consider some common examples of event streams:

  • Customer interactions with a shopping cart on an e-commerce site
  • Network packets on a router
  • IoT device sensor readings
  • Manufacturing telemetry

We want to use this data for various things. Businesses drive processes in reaction to events, and they also want to analyze that data retrospectively.

Stream Processing in Action

Consider the example of a manufacturing plant, and the kind of data stream emitted by a production line:

{
    "reading_ts": "2021-06-24T09:30:00-05:00",
    "sensor_id": "aa-101",
    "production_line": "w01",
    "widget_type": "acme94",
    "temp_celcius": 23,
    "widget_weight_g": 100
}

Using this data, we may want to do multiple things:
  • Alert if the line produces an item that is over a threshold weight
  • Monitor the rate of item production
  • Detect anomalies in the equipment
  • Store the data for analytics dashboards and ad hoc querying

Stream processing is a method of processing events in an event stream, as they are happening. It is distinct from batch processing, in which events are processed at some delayed point in time, after they are created. With stream processing, we can react to events happening in the real world when they occur, rather than afterwards.

On a production line, for example, we want an alert as soon as the equipment shows abnormal temperatures. If the monitoring system waited to process the readings in a batch, the equipment might already be damaged or production disrupted by the time anyone noticed.

With ksqlDB, we can apply this concept of stream processing using SQL. The ability to use SQL, unlike other options that require complex coding to perform even simple tasks, makes stream processing accessible to many more developers and analysts.

  • Alert if the line produces an item that is over a threshold weight

       SELECT *
       FROM WIDGETS
       WHERE WIDGET_WEIGHT_G > 120
       EMIT CHANGES;
  • Monitor the rate of item production

       SELECT PRODUCTION_LINE, COUNT(*) AS WIDGET_COUNT
       FROM WIDGETS
       WINDOW TUMBLING (SIZE 1 HOUR)
       GROUP BY PRODUCTION_LINE
       EMIT CHANGES;
  • Detect anomalies in the equipment

       SELECT SENSOR_ID, AVG(TEMP_CELCIUS) AS TEMP
       FROM WIDGETS
       WINDOW TUMBLING (SIZE 5 MINUTES)
       GROUP BY SENSOR_ID
       HAVING AVG(TEMP_CELCIUS) > 20
       EMIT CHANGES;
  • Store the data for analytics dashboards and ad hoc querying

       CREATE SINK CONNECTOR dw WITH (
         'connector.class' = 'S3Connector',
         'topics'          = 'widgets',
         [...]
       );
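
The queries above all read from a stream named WIDGETS, which has to be declared over a Kafka topic before it can be queried. A minimal sketch of that declaration follows, assuming the events shown earlier arrive as JSON on an existing topic named widgets. The topic name, the column types, the choice to mirror the JSON field names as column names, and the use of reading_ts as the event timestamp are all assumptions for illustration, not something the course states.

```sql
-- Sketch only: declares a stream over an existing Kafka topic.
-- Topic name, column types, and timestamp handling are assumptions.
CREATE STREAM WIDGETS (
    READING_TS      VARCHAR,   -- ISO 8601 timestamp with offset, kept as text
    SENSOR_ID       VARCHAR,
    PRODUCTION_LINE VARCHAR,
    WIDGET_TYPE     VARCHAR,
    TEMP_CELCIUS    DOUBLE,    -- spelling follows the source event payload
    WIDGET_WEIGHT_G DOUBLE
) WITH (
    KAFKA_TOPIC      = 'widgets',
    VALUE_FORMAT     = 'JSON',
    -- Use the reading time, not the Kafka record time, for windowing.
    TIMESTAMP        = 'READING_TS',
    TIMESTAMP_FORMAT = 'yyyy-MM-dd''T''HH:mm:ssXXX'
);
```

Declaring the stream does not copy or move any data. It only tells ksqlDB how to interpret the records already in the topic, so windowed queries such as the ones above would group events by the time the reading was taken.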

How does ksqlDB work?

ksqlDB separates its distributed compute layer from its distributed storage layer, for which it uses Apache Kafka.


ksqlDB allows us to read, filter, transform, or otherwise process streams and tables of events, which are backed by Kafka topics. We can also join streams and/or tables to meet the needs of our application. And we can do all of this using familiar SQL syntax.
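
As a rough illustration of filtering, transforming, and joining in a single statement, the sketch below enriches widget readings with sensor metadata. It assumes a WIDGETS stream with the columns used in the queries earlier, and it introduces a hypothetical SENSORS table, a hypothetical sensors topic, and a hypothetical LOCATION column that do not appear in the course.

```sql
-- Hypothetical lookup table of sensor metadata, keyed by sensor ID.
CREATE TABLE SENSORS (
    SENSOR_ID VARCHAR PRIMARY KEY,
    LOCATION  VARCHAR
) WITH (
    KAFKA_TOPIC  = 'sensors',
    VALUE_FORMAT = 'JSON'
);

-- Persistent query: filter warm readings, convert the temperature,
-- and enrich each event with the sensor's location.
-- The result is a new stream backed by its own Kafka topic.
CREATE STREAM HOT_READINGS AS
    SELECT W.SENSOR_ID                 AS SENSOR_ID,
           W.PRODUCTION_LINE           AS PRODUCTION_LINE,
           W.TEMP_CELCIUS * 9 / 5 + 32 AS TEMP_FAHRENHEIT,
           S.LOCATION                  AS LOCATION
    FROM WIDGETS W
    LEFT JOIN SENSORS S ON W.SENSOR_ID = S.SENSOR_ID
    WHERE W.TEMP_CELCIUS > 20
    EMIT CHANGES;
```

A LEFT JOIN is used here so that readings from sensors with no metadata still flow through, with a null LOCATION, rather than being dropped.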


ksqlDB can also build stateful aggregations on event streams. How many orders have been placed in the last hour? Errors in the last five minutes? Current balance on an account? These aggregates are held in a state store within ksqlDB, and external applications can query the store directly.
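
A stateful aggregation of this kind could be sketched as a table built from a stream, followed by a lookup against its current state. The example assumes a WIDGETS stream with a PRODUCTION_LINE column, as in the queries earlier. The table name WIDGETS_PER_LINE and the column name WIDGET_COUNT are hypothetical.

```sql
-- Persistent query: maintain a per-line count for each one-hour window.
-- ksqlDB keeps the running result in a state store.
CREATE TABLE WIDGETS_PER_LINE AS
    SELECT PRODUCTION_LINE,
           COUNT(*) AS WIDGET_COUNT
    FROM WIDGETS
    WINDOW TUMBLING (SIZE 1 HOUR)
    GROUP BY PRODUCTION_LINE
    EMIT CHANGES;

-- Pull query: look up the current state for one production line.
-- It returns the matching rows and then terminates, like a
-- point lookup against a regular database table.
SELECT PRODUCTION_LINE, WINDOWSTART, WINDOWEND, WIDGET_COUNT
FROM WIDGETS_PER_LINE
WHERE PRODUCTION_LINE = 'w01';
```

The first statement runs continuously and updates the table as new events arrive. The second, which has no EMIT CHANGES clause, is the kind of request an external application would issue to read the aggregate on demand.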
