Under the Covers of ksqlDB

2 min

Allison Walther

Integration Architect (Presenter)

You have learned about ksqlDB's syntax and functionality, but it can be beneficial to get a sense of its internals.

When multiple ksqlDB servers are organized as a cluster, partitions are the basis for scaling the workload. Just like Apache Kafka consumers or Kafka Streams, ksqlDB can parallelize throughput by allocating work on partitions to different nodes. In Confluent Cloud, both your ksqlDB applications and Kafka clusters are fully managed. You can also run ksqlDB in self-managed mode, for example to run a custom user-defined function. In that case, you can still use Confluent Cloud to manage the Kafka cluster that your ksqlDB applications rely on.
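The idea of spreading partitions across nodes can be illustrated with a minimal sketch. This is not how ksqlDB or Kafka actually assigns work (that is handled by Kafka's consumer group protocol); the function name `assign_partitions` and the round-robin policy are assumptions chosen only to show why partitions set the upper bound on parallelism.

```python
def assign_partitions(num_partitions, servers):
    """Spread partition numbers over servers round-robin (illustrative only)."""
    assignment = {server: [] for server in servers}
    for partition in range(num_partitions):
        owner = servers[partition % len(servers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions over three hypothetical servers: each server owns two.
three_nodes = assign_partitions(6, ["ksql-1", "ksql-2", "ksql-3"])

# Two partitions over three servers: one server has nothing to do,
# so adding nodes beyond the partition count adds no parallelism.
too_many_nodes = assign_partitions(2, ["ksql-1", "ksql-2", "ksql-3"])
```

In this sketch, scaling out helps only while there are at least as many partitions as servers.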

ksqlDB consumes from and produces to Kafka topics, using SQL declarations to build and execute Kafka Streams topologies.
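As a rough illustration of what a SQL declaration might look like when submitted to a self-managed server, the sketch below builds (but does not send) a request for the ksqlDB REST API's `/ksql` endpoint. The server address, the `orders` stream, its `customer_id` column, and the table name `orders_by_customer` are all hypothetical and not taken from this lesson.

```python
import json
import urllib.request

# Assumption: a self-managed ksqlDB server listening on its default port.
KSQLDB_URL = "http://localhost:8088/ksql"

# Hypothetical persistent query: ksqlDB would turn this declaration into a
# Kafka Streams topology that reads `orders` and maintains a table of counts.
STATEMENT = (
    "CREATE TABLE orders_by_customer AS "
    "SELECT customer_id, COUNT(*) AS order_count "
    "FROM orders GROUP BY customer_id EMIT CHANGES;"
)


def build_ksql_request(statement, url=KSQLDB_URL):
    """Build a POST request carrying one SQL statement; does not send it."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    headers = {"Content-Type": "application/vnd.ksql.v1+json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_ksql_request(STATEMENT)
# To submit it, pass `request` to urllib.request.urlopen() against a running server.
```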

RocksDB is used internally to store state for tables and aggregations, with Kafka topics providing a changelog for RocksDB.
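The relationship between a local state store and its changelog can be sketched in a few lines. This is a simplified model, not ksqlDB's implementation: a Python dict stands in for RocksDB, a list stands in for the changelog topic, and the class name `CountingTable` is hypothetical.

```python
class CountingTable:
    """Toy count-per-key aggregation backed by a replayable changelog."""

    def __init__(self, changelog=None):
        self.store = {}  # stand-in for the local RocksDB state store
        self.changelog = changelog if changelog is not None else []
        # Rebuild local state by replaying the changelog from the start;
        # later entries for a key overwrite earlier ones.
        for key, value in self.changelog:
            self.store[key] = value

    def process(self, key):
        """Count one event for `key`, recording the update in the changelog."""
        new_count = self.store.get(key, 0) + 1
        self.store[key] = new_count
        self.changelog.append((key, new_count))
        return new_count


table = CountingTable()
for event_key in ["alice", "bob", "alice"]:
    table.process(event_key)

# A replacement node with an empty local store recovers the same state
# purely from a copy of the changelog.
restored = CountingTable(list(table.changelog))
```

Because every state change is also written to the changelog, losing the local store does not lose the aggregation: in this model, another node can rebuild it by replay.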

Multiple ksqlDB servers can be grouped together to form a cluster, for parallel processing and resilience.

Organizations will usually deploy multiple clusters of ksqlDB servers, forming a topology much more akin to application deployments than to a behemoth RDBMS.

Read more in the ksqlDB documentation and in the Confluent Developer course Inside ksqlDB. You can also find deep dives on ksqlDB functionality on the Confluent blog.

Under the Covers of ksqlDB

Hi, I'm Allison Walther with Confluent. Let's take just a peek under the covers of ksqlDB to understand a bit about how it runs. We're not going to go into a deep dive here. There's lots of good deep-dive material about ksqlDB over on the Confluent Blog.

ksqlDB runs separately from the Kafka brokers, which, by the way, all of your Kafka applications should be doing. Kafka brokers are just for Kafka brokers. ksqlDB is built on top of Kafka Streams and, like Kafka Streams, it uses RocksDB as its state store.

Just like Kafka, ksqlDB is a distributed system that uses partitions as its basis for scaling its workload. We can have a single node, or we can scale out the nodes and they form a cluster. Just like Kafka consumers or Kafka Streams, ksqlDB can parallelize throughput by allocating work on partitions to different nodes. But don't think of it as a single huge database cluster. ksqlDB queries are continuously running and processing events; they're applications. So when it comes to deploying ksqlDB, think in terms of deploying applications.

In Confluent Cloud, both your Kafka clusters and ksqlDB applications are fully managed. You only need to think about your queries and topics. But sometimes you may need to run ksqlDB in self-managed mode, for example, in order to run a custom user-defined function. In this case, you can still take advantage of Confluent Cloud to manage the Kafka cluster that your ksqlDB applications are hitting, again allowing you to put more focus on solving business problems.

If you wanna know more about ksqlDB internals, we have a course called Advanced ksqlDB. Go check it out.
