Integration Architect (Presenter)
Principal Developer Advocate (Author)
ksqlDB allows you to build stream processing applications on top of Apache Kafka with the ease of building traditional applications on a relational database. Because you use SQL to describe what you want to do rather than how to do it, ksqlDB makes it easy to build Kafka-native applications that process streams of real-time data.
ksqlDB is designed around the principle of simplicity. While many streaming architectures require a grab-bag of components pieced together from many projects, ksqlDB provides a single platform on which you can build streaming ETL and streaming applications, all with a single dependency: Apache Kafka.
Regardless of how we choose to store it later, much of the data that we work with in our systems begins life as part of an unbounded event stream.
We want to use this data in two ways: businesses drive processes in reaction to events as they happen, and they also want to analyze the same data retrospectively.
Consider the example of a manufacturing plant, and the kind of data stream emitted by a production line:
{
  "reading_ts": "2021-06-24T09:30:00-05:00",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Stream processing is a method of processing events in an event stream, as they are happening. It is distinct from batch processing, in which events are processed at some delayed point in time, after they are created. With stream processing, we can react to events happening in the real world when they occur, rather than afterwards.
In the example of a production line, we want an alert as soon as the equipment shows abnormal temperatures. If the monitoring system waited to process the readings in a batch, the equipment might already be damaged or the production process disrupted.
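To make the contrast with batch processing concrete, here is a minimal plain-Python sketch (not ksqlDB itself) that checks each reading as it arrives instead of waiting for a complete batch. The `MAX_TEMP_C` threshold and the sample readings are assumptions made for illustration; the field names follow the sample event above.

```python
from typing import Iterable, Iterator

# Hypothetical threshold: the text does not say what counts as "abnormal".
MAX_TEMP_C = 30


def alerts(readings: Iterable[dict]) -> Iterator[str]:
    """Yield an alert as soon as an abnormal reading is seen."""
    for reading in readings:  # `readings` may be an unbounded stream
        if reading["temp_celcius"] > MAX_TEMP_C:
            yield f"ALERT {reading['sensor_id']}: {reading['temp_celcius']} C"


# Made-up sample readings, in arrival order.
sample = [
    {"sensor_id": "aa-101", "temp_celcius": 23},
    {"sensor_id": "aa-101", "temp_celcius": 41},
    {"sensor_id": "aa-102", "temp_celcius": 22},
]

stream = alerts(iter(sample))
first_alert = next(stream)  # produced before the third reading is even read
print(first_alert)
```

Because `alerts` is a generator, the first alert is available after only two readings have been consumed, which is the property that matters when the input never ends.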
With ksqlDB, we can apply this concept of stream processing using SQL. The ability to use SQL, unlike other options that require complex coding to perform even simple tasks, makes stream processing accessible to many more developers and analysts.
Alert if the line produces an item that is over a threshold weight
SELECT *
FROM WIDGETS
WHERE WIDGET_WEIGHT_G > 120
EMIT CHANGES;
Monitor the rate of item production
SELECT PRODUCTION_LINE, COUNT(*) AS WIDGET_COUNT
FROM WIDGETS
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY PRODUCTION_LINE
EMIT CHANGES;
Detect anomalies in the equipment
SELECT SENSOR_ID, AVG(TEMP_CELCIUS) AS TEMP
FROM WIDGETS
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY SENSOR_ID
HAVING AVG(TEMP_CELCIUS) > 20
EMIT CHANGES;
Store the data for analytics dashboards and ad-hoc querying
CREATE SINK CONNECTOR dw WITH (
  'connector.class' = 'S3Connector',
  'topics' = 'widgets',
  […]
);
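The windowed queries above rely on tumbling windows: fixed-size, non-overlapping time buckets. As a rough illustration of what the hourly production-rate query computes, the following plain-Python sketch (a simulation, not how ksqlDB is implemented) assigns each event to exactly one window by rounding its timestamp down to a multiple of the window size. The sample events are invented; the field names follow the sample reading shown earlier.

```python
from collections import Counter
from datetime import datetime


def tumbling_counts(events, size_seconds=3600):
    """Count events per (window start, production line).

    Each event falls into exactly one window: the one whose start is the
    event timestamp rounded down to a multiple of the window size.
    """
    counts = Counter()
    for event in events:
        ts = datetime.fromisoformat(event["reading_ts"]).timestamp()
        window_start = int(ts // size_seconds) * size_seconds
        counts[(window_start, event["production_line"])] += 1
    return counts


# Made-up events: three on line w01 (two within the same hour), one on w02.
sample_events = [
    {"reading_ts": "2021-06-24T09:30:00-05:00", "production_line": "w01"},
    {"reading_ts": "2021-06-24T09:45:00-05:00", "production_line": "w01"},
    {"reading_ts": "2021-06-24T10:05:00-05:00", "production_line": "w01"},
    {"reading_ts": "2021-06-24T09:50:00-05:00", "production_line": "w02"},
]

counts = tumbling_counts(sample_events)
for (window_start, line), n in sorted(counts.items()):
    print(window_start, line, n)
```

Changing `size_seconds` to 300 gives the five-minute buckets used by the temperature query; the grouping logic stays the same.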
ksqlDB separates its distributed compute layer from its distributed storage layer, for which it uses Apache Kafka.
ksqlDB allows us to read, filter, transform, or otherwise process streams and tables of events, which are backed by Kafka topics. We can also join streams and/or tables to meet the needs of our application. And we can do all of this using familiar SQL syntax.
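As a rough mental model of joining a stream to a table, the plain-Python sketch below enriches each event with the table row that shares its key, keeping events that have no match (left-join behavior). It only simulates the idea and is not ksqlDB code; the `widget_types` lookup table and its `widget_name` column are hypothetical.

```python
def left_join(stream, table, key):
    """Enrich each stream event with the table row that shares its key."""
    for event in stream:
        row = table.get(event[key]) or {}  # empty when the table has no match
        yield {**event, **row}


# Hypothetical reference table, keyed by widget type.
widget_types = {"acme94": {"widget_name": "Anvil"}}

# Made-up stream events.
events = [
    {"sensor_id": "aa-101", "widget_type": "acme94"},
    {"sensor_id": "aa-102", "widget_type": "acme95"},
]

enriched = list(left_join(events, widget_types, "widget_type"))
for record in enriched:
    print(record)
```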
ksqlDB can also build stateful aggregations on event streams. How many orders have been placed in the last hour? Errors in the last five minutes? Current balance on an account? These aggregates are held in a state store within ksqlDB, and external applications can query the store directly.
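The account-balance question can be sketched in plain Python to show what a stateful aggregation keeps around: a per-key running value that is updated on every event and can be looked up at any time. This is only an analogy for a state store, not ksqlDB's implementation; the `BalanceStore` class, the event fields, and the amounts are all made up for illustration.

```python
from collections import defaultdict


class BalanceStore:
    """Toy state store: keeps the current balance per account."""

    def __init__(self):
        self._balances = defaultdict(int)

    def apply(self, event):
        """Fold one event into the running aggregate."""
        self._balances[event["account"]] += event["amount"]

    def get(self, account):
        """Point lookup of the current value; None if the key is unknown."""
        return self._balances.get(account)


store = BalanceStore()
for event in [
    {"account": "a1", "amount": 100},
    {"account": "a2", "amount": 50},
    {"account": "a1", "amount": -30},
]:
    store.apply(event)

print(store.get("a1"))
```

The lookup reads the already-computed aggregate rather than replaying the event history, which is why external applications can query such a store cheaply.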
Hi, I'm Allison Walther with Confluent. Welcome to the ksqlDB course. ksqlDB is a database purpose-built for real-time applications that leverage stream processing. Its architecture separates its distributed compute layer from its distributed storage layer, for which it uses and tightly integrates with Apache Kafka. Using a lightweight SQL syntax, ksqlDB provides everything that a developer needs to quickly and efficiently create a complete real-time application, enabling you to unlock real-time business insights and rich customer experiences.
Let's discuss a bit more about what ksqlDB offers. Let's say you have a stream, or Kafka topic, that captures when a widget is manufactured by your company, and each event in the stream also captures the color of the widget created. Now, perhaps you want a separate stream that contains only the events for blue widgets. ksqlDB would allow you to do that in real time. Perhaps you also want to merge two topics together, which is common for many stream processing use cases. ksqlDB can easily join topics to one another, not all that dissimilar to how you could join two tables in a relational database. In this example, we can join that original topic to another topic that focuses on green and yellow widgets, and then filter out the messages that we care less about.
ksqlDB can be used to aggregate streams into tables and capture summary statistics over a window of time within the stream. These tables can then be joined with other tables and streams. In this example, we're using stream processing to get a count of the number of widgets by color within the topic. Again, these examples are very elementary, and stream processing is of course capable of much more advanced logic, but this gives you a high-level idea of how ksqlDB can help you build applications that act immediately on new events happening within your business. Let's take a look at what this might look like in action.
With ksqlDB, developers can use a lightweight SQL syntax to build a complete real-time application. This includes the ability to enrich input streams, to derive new event streams, and to create aggregations of event streams that are continuously updated as new events occur and that can serve both push and pull queries to applications. We will dive more into the various constructs that make up ksqlDB in a bit. By bringing the same feel as traditional application development on a relational database, ksqlDB opens up stream processing and real-time application development to many more developers within an organization than if only Kafka Streams were used.
Here's a look at the ksqlDB UI in Confluent Cloud. We have a graphical editor that allows us to quickly try out new queries or to create streams and tables. There are also pages for viewing details about streams, tables, and the persistent queries that make them up. One very powerful feature is the flow page. With the flow page, we get an at-a-glance view of the components of our application and how they are interacting.
And if you're building these event streaming applications, it's likely that you're going to want to work with data that exists in a variety of other data stores. Confluent provides 100-plus pre-built connectors to easily start moving your data in and out of Kafka, but using them still requires deploying a separate Kafka Connect cluster, which can take time. That's why ksqlDB supports running connectors directly on its servers. Rather than needing to run a separate Kafka Connect cluster for capturing events, ksqlDB can run pre-built connectors in embedded mode. Best of all, you can do this entirely with SQL syntax, making it much easier to work with data throughout your organization.
Let's get started, but first here's a promo code for $101 of free Confluent Cloud usage. Head over there and get signed up. Then meet me back here to continue onward. Now let's get you signed up on Confluent Cloud.
You'll land on the initial page and enter things like your name and your email. Select the Agree and the Email Me checkboxes. And now you can click the Submit button. You should check your email for confirmation. It should pop up within about a second. We'll be prompted next about what type of cluster we want. Let's use standard for now. Then we'll be asked which cloud provider we would like to launch our cluster in. For this course, let's use Google Cloud US West 4 in a single AZ. Now we'll navigate our way to the credit card page, where you fill in your own information. You can go ahead and expand the promo code field because we do have a promo code for you today. It's 101KSQLDB. Review the information that you've entered, go ahead and click Launch, and look, a new cluster has appeared.