Integration Architect (Presenter)
Principal Developer Advocate (Author)
ksqlDB allows you to build stream processing applications on top of Apache Kafka with the ease of building traditional applications on a relational database. Using SQL to describe what you want to do rather than how, it makes it easy to build Kafka-native applications for processing streams of real-time data. Some key ksqlDB use cases include:
ksqlDB is designed from the principle of simplicity. While many streaming architectures require a grab-bag of components pieced together from many projects, ksqlDB provides a single platform on which you can build streaming ETL and streaming applications, and all with just a single dependency—Apache Kafka.
Regardless of how we choose to store it later, much of the data that we work with in our systems begins life as part of an unbounded event stream. Consider some common examples of event streams:
We want to use this data for various things. Businesses drive processes in reaction to events, and they also want to analyze that data retrospectively.
Consider the example of a manufacturing plant, and the kind of data stream emitted by a production line:
{
"reading_ts": "2021-06-24T09:30:00-05:00",
"sensor_id": "aa-101",
"production_line": "w01",
"widget_type": "acme94",
"temp_celcius": 23,
"widget_weight_g": 100
}
Stream processing is a method of processing events in an event stream, as they are happening. It is distinct from batch processing, in which events are processed at some delayed point in time, after they are created. With stream processing, we can react to events happening in the real world when they occur, rather than afterwards.
In the example of a production line, it is more desirable to alert as soon as the equipment shows abnormal temperatures. If the monitoring system waited to batch process the readings, the equipment may already be damaged or the production process impacted.
With ksqlDB, we can apply this concept of stream processing using SQL. The ability to use SQL, unlike other options that require complex coding to perform even simple tasks, makes stream processing accessible to many more developers and analysts.
Alert if the line produces an item that is over a threshold weight
SELECT *
FROM WIDGETS
WHERE WEIGHT_G > 120
Monitor the rate of item production
SELECT COUNT(*)
FROM WIDGETS
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY PRODUCTION_LINE
Detect anomalies in the equipment
SELECT AVG(TEMP_CELCIUS) AS TEMP
FROM WIDGETS
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY SENSOR_ID
HAVING TEMP>20
Store the data for analytics dashboards and ad-hoc querying
CREATE SINK CONNECTOR dw WITH (
connector.class = S3Connector,
topics = widgets
[…]
);
ksqlDB separates its distributed compute layer from its distributed storage layer, for which it uses Apache Kafka.
ksqlDB allows us to read, filter, transform, or otherwise process streams and tables of events, which are backed by Kafka topics. We can also join streams and/or tables to meet the needs of our application. And we can do all of this using familiar SQL syntax.
ksqlDB can also build stateful aggregations on event streams. How many orders have been placed in the last hour? Errors in the last five minutes? Current balance on an account? These aggregates are held in a state store within ksqlDB, and external applications can query the store directly.
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.