Course: Apache Kafka® 101

ksqlDB

Tim Berglund, Sr. Director, Developer Advocacy (Course Presenter)


Kafka Streams works very well as a Java-based stream processing API. You can use it to build scalable, standalone stream processing applications, and you can also use it to enrich existing Java applications with stream processing that complements their other functions. But what if you don't have an existing commitment to Java? Or what if, for architectural or operational reasons, you would rather deploy a pure stream processing job that has no web interface or API of its own for exposing results to the front end? This is where ksqlDB comes in.

ksqlDB is a highly specialized kind of database that is optimized for stream processing applications. It runs on its own scalable, fault-tolerant cluster and exposes a REST interface. Through that interface, applications can submit new stream processing jobs and query their results. Both the jobs and the queries are written in SQL. Because ksqlDB offers both REST and command-line interfaces, it doesn't matter what language you use to build your applications. Getting started is easy in development mode, either in Docker or on a single node running natively on a development machine.
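Because the interface is plain HTTP, any language with an HTTP client can drive ksqlDB. The following minimal Python sketch uses only the standard library to package SQL for the REST API. It rests on these assumptions, which you should check against the REST API reference for your ksqlDB version:

- a development server at `http://localhost:8088` (ksqlDB's default listener)
- the `/ksql` endpoint for statements such as `CREATE TABLE ... AS SELECT`
- the `/query` endpoint for queries
- the `application/vnd.ksql.v1+json` media type

The helper names `build_request`, `build_statement_request`, `build_query_request`, and `send` are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local development server on ksqlDB's default port.
KSQLDB_URL = "http://localhost:8088"

# Assumed media type for ksqlDB REST request and response bodies.
KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json; charset=utf-8"


def build_request(endpoint: str, sql: str, properties: Optional[dict] = None,
                  base_url: str = KSQLDB_URL) -> urllib.request.Request:
    """Build, but do not send, a POST request carrying SQL for ksqlDB (hypothetical helper)."""
    body = {"ksql": sql, "streamsProperties": properties or {}}
    return urllib.request.Request(
        url=base_url + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": KSQL_CONTENT_TYPE, "Accept": KSQL_CONTENT_TYPE},
        method="POST",
    )


def build_statement_request(sql: str) -> urllib.request.Request:
    """Statements such as CREATE TABLE ... AS SELECT are sent to /ksql."""
    return build_request("/ksql", sql)


def build_query_request(sql: str) -> urllib.request.Request:
    """Queries are sent to /query, asking ksqlDB to read input topics from the start."""
    return build_request("/query", sql,
                         {"ksql.streams.auto.offset.reset": "earliest"})


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a request and decode the whole JSON reply.

    Only suitable for statements and pull queries: a push query
    (EMIT CHANGES) keeps streaming rows until the client disconnects.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(build_statement_request("SHOW STREAMS;"))` returns the server's JSON reply. A push query needs a client that reads the response incrementally rather than with a single `read()`.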

Here's some example ksqlDB code that does substantially the same thing as the Kafka Streams code we looked at earlier:

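CREATE TABLE rated_movies AS
   SELECT  title,
           release_year,
           -- average rating, computed from a running sum and count
           sum(rating) / count(rating) AS avg_rating
   FROM ratings
   -- enrich each rating event with the details of its movie
   INNER JOIN movies ON ratings.movie_id = movies.movie_id
   -- one output row per (title, release_year) pair
   GROUP BY title,
            release_year;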

This query produces a table whose key is the composite of movie title and release year and whose value is the average rating for that movie. ksqlDB keeps the table up to date as new ratings arrive and provides query access to it over its REST API. ksqlDB also integrates with Kafka Connect, so you can connect to external data sources from within the ksqlDB interface. Connect can run either embedded in the ksqlDB cluster or as its own standalone cluster.
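To make the result table concrete, here is a small in-memory Python model of what the query above computes. It is an illustration of the semantics, not a description of how ksqlDB works internally.

- Each rating event is joined against a hypothetical `movies` dictionary keyed by `movie_id`.
- A rating with no matching movie is dropped, as with an INNER JOIN.
- A running sum and count per `(title, release_year)` key produce the updated average.

The class name `RatedMovies` and the sample data are hypothetical. The sketch also ignores what a real cluster handles for you: partitioning, fault-tolerant state, and updates to the movies table itself.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]  # (title, release_year): the GROUP BY columns


class RatedMovies:
    """Toy model of the rated_movies table, updated one rating at a time."""

    def __init__(self, movies: Dict[int, dict]):
        self.movies = movies               # table side of the join, keyed by movie_id
        self.sums = defaultdict(float)     # running sum(rating) per key
        self.counts = defaultdict(int)     # running count(rating) per key

    def on_rating(self, movie_id: int, rating: float) -> Optional[Tuple[Key, float]]:
        """Process one rating event and return the updated (key, avg_rating) row."""
        movie = self.movies.get(movie_id)
        if movie is None:
            return None                    # INNER JOIN: no matching movie, no output row
        key = (movie["title"], movie["release_year"])
        self.sums[key] += rating
        self.counts[key] += 1
        return key, self.sums[key] / self.counts[key]

    def snapshot(self) -> Dict[Key, float]:
        """Current contents of the table: avg_rating for every key seen so far."""
        return {key: self.sums[key] / self.counts[key] for key in self.counts}


# Hypothetical sample data for illustration only.
movies = {
    1: {"title": "Film A", "release_year": 2001},
    2: {"title": "Film B", "release_year": 2010},
}
table = RatedMovies(movies)
for movie_id, rating in [(1, 8.0), (2, 6.0), (1, 9.0), (3, 7.0)]:
    print(table.on_rating(movie_id, rating))
print(table.snapshot())
```

Each call to `on_rating` returns only the row that changed, which is roughly what a push query against the table would emit. `snapshot` returns the table's current contents, which is closer to what a pull query would return.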

Overall, you can think of ksqlDB as a standalone, SQL-powered stream processing engine that performs continuous processing of event streams and exposes the results to applications in a database-like way. It aims to provide one mental model for most Kafka-based stream processing application workloads.
