Integration Architect (Presenter)
You have learned about ksqlDB's syntax and functionality, but it can be beneficial to get a sense of its internals.
When multiple ksqlDB servers are organized as a cluster, partitions are used as the basis for scaling the workload. Just like Apache Kafka consumers or Kafka Streams, ksqlDB can parallelize throughput by allocating work on partitions to different nodes. In Confluent Cloud, both your ksqlDB applications and Kafka clusters are fully managed. But you can also run ksqlDB in self-managed mode to run a custom user-defined function, or for other reasons. In this case, you can still take advantage of Confluent Cloud to manage the Kafka cluster used by your ksqlDB applications.
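To make the scaling idea concrete, here is a minimal sketch of how partitions might be spread over the servers in a cluster. It is a toy round-robin model, not the actual assignment logic used by Kafka Streams or ksqlDB, and the function name `assign_partitions` and the node names are hypothetical.

```python
def assign_partitions(num_partitions, nodes):
    """Spread partition ids over nodes round-robin (illustrative only)."""
    assignment = {node: [] for node in nodes}
    for partition in range(num_partitions):
        # Each partition is owned by exactly one node at a time.
        node = nodes[partition % len(nodes)]
        assignment[node].append(partition)
    return assignment


# Six partitions over three nodes: every node gets a share of the work.
three_nodes = assign_partitions(6, ["ksql-1", "ksql-2", "ksql-3"])

# Six partitions over eight nodes: some nodes have nothing to process.
eight_nodes = assign_partitions(6, [f"ksql-{i}" for i in range(1, 9)])
idle_nodes = [node for node, parts in eight_nodes.items() if not parts]

print(three_nodes)
print(idle_nodes)
```

In this model the partition count caps the useful parallelism: adding nodes beyond the number of partitions leaves the extra nodes idle.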
ksqlDB consumes from and produces to Kafka topics, using SQL declarations to build and execute Kafka Streams topologies.
RocksDB is used internally to store state for tables and aggregations, with Kafka topics providing a changelog for RocksDB.
Multiple ksqlDB servers can be grouped together to form a cluster, for parallel processing and resilience.
Organizations will usually deploy multiple clusters of ksqlDB servers, forming a topology much more akin to application deployments than to a behemoth RDBMS.
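The relationship between a local state store and its changelog can be sketched as follows. This is a simplified model under stated assumptions: a plain dictionary stands in for RocksDB, a list stands in for the changelog topic, and the helper names `apply_event` and `restore` are hypothetical. It ignores details such as deletes and log compaction.

```python
def apply_event(store, changelog, key, amount):
    """Update a running total and record the new value in the changelog."""
    store[key] = store.get(key, 0) + amount
    # Every state change is also appended to the changelog.
    changelog.append((key, store[key]))


def restore(changelog):
    """Rebuild a store by replaying the changelog; the latest value per key wins."""
    store = {}
    for key, value in changelog:
        store[key] = value
    return store


local_store = {}
changelog = []
for key, amount in [("alice", 5), ("bob", 2), ("alice", 3)]:
    apply_event(local_store, changelog, key, amount)

# Simulate a replacement node that starts with no local state.
restored_store = restore(changelog)
print(restored_store)
```

Because the changelog holds every update, a node that loses its local store, or a new node that takes over a partition, can rebuild the same state by replaying it.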
Read more in the ksqlDB documentation and in the Confluent Developer course Inside ksqlDB. You can also find some deep dives on ksqlDB functionality on the Confluent blog.
Hi, I'm Allison Walther with Confluent. Let's take just a peek under the covers of ksqlDB to understand a bit about how it runs. We're not going to go into a deep dive here. There's lots of good deep-dive material about ksqlDB over on the Confluent Blog. ksqlDB runs separately from the Kafka brokers, which, by the way, is what all of your Kafka applications should be doing. Kafka brokers are just for Kafka brokers. ksqlDB is built on top of Kafka Streams, and like Kafka Streams, it uses RocksDB as its state store. Just like Kafka, ksqlDB is a distributed system that uses partitions as the basis for scaling its workload. We can have a single node, or we can scale out the nodes so that they form a cluster. Just like Kafka consumers or Kafka Streams, ksqlDB can parallelize throughput by allocating work on partitions to different nodes. But don't think of it as a single huge database cluster. ksqlDB queries are continuously running and processing events; they're applications. So when it comes to deploying ksqlDB, think in terms of deploying applications. In Confluent Cloud, both your Kafka clusters and ksqlDB applications are fully managed. You only need to think about your queries and topics. But sometimes you may need to run ksqlDB in self-managed mode, for example, in order to run a custom user-defined function. In this case, you can still take advantage of Confluent Cloud to manage the Kafka cluster that your ksqlDB applications are hitting, again allowing you to put more focus on solving business problems. If you want to know more about ksqlDB internals, we have a course called Advanced ksqlDB. Go check it out.