course: Inside ksqlDB

ksqlDB's Architecture

4 min
Michael Drogalis

Principal Product Manager (Presenter)

A typical streaming pipeline consists of source databases and apps creating events that feed into Apache Kafka via connectors, a stream processor that filters and aggregates the events, and finally storage facilities that store the events for access by analytics engines.

ksqldb-architecture

This is a standard system design, and it works well, but it has many moving parts that can each break or fail, which makes it difficult to scale, secure, monitor, debug, and operate the system as a whole.

An architecture built on ksqlDB is much simpler, because many of those external parts have been consolidated into the tool itself: ksqlDB has primitives for connectors, and it performs stream processing. It also features materialized views, so your data can be queried just like a database table directly in ksqlDB, without first being sent to an external data store:

ksqldb-database
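
To make the idea of a materialized view concrete, here is a toy model in plain Python. It is not ksqlDB code and uses no ksqlDB API; the event shape, the `ViewsPerUser` class, and the `run` helper are all hypothetical names chosen for illustration. It only shows the general principle: a result that is updated incrementally as each event arrives, so a lookup reads current state instead of rescanning past events.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (user_id, page). Not a ksqlDB type.
Event = Tuple[str, str]


class ViewsPerUser:
    """Toy materialized view: a per-user count kept up to date as events arrive."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = defaultdict(int)

    def apply(self, event: Event) -> None:
        # Incremental update: only the row for this user changes.
        user_id, _page = event
        self._counts[user_id] += 1

    def lookup(self, user_id: str) -> int:
        # Point query against current state; no scan over past events.
        return self._counts.get(user_id, 0)


def run(events: Iterable[Event]) -> ViewsPerUser:
    """Feed a sequence of events through a fresh view and return it."""
    view = ViewsPerUser()
    for event in events:
        view.apply(event)
    return view


if __name__ == "__main__":
    view = run([("alice", "/home"), ("bob", "/home"), ("alice", "/pricing")])
    print(view.lookup("alice"))  # 2
```

In ksqlDB itself the equivalent would be declared in SQL and maintained by the server; the sketch above only mirrors the behavior an application would observe when querying such a view.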

In fact, ksqlDB tucks away complexity in a manner similar to classic relational databases such as PostgreSQL, which hides query execution, indexing, concurrency control, and crash recovery behind an intuitive interface.

Because ksqlDB relies on Kafka to maintain state, you ultimately end up with a simple two-tier architecture: ksqlDB handles compute and Kafka handles storage. Each tier can be scaled elastically, independently of the other.
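
The split between compute and storage can be illustrated with a small Python sketch. This is a simplified analogy, not how ksqlDB is implemented: the `Log` list stands in for a Kafka topic, and `ComputeNode` is a hypothetical class. The point it tries to show is that when every state change is written to durable storage, a compute node holds only a rebuildable copy, so nodes can be added, removed, or replaced without losing data.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Kafka topic: an append-only list of (key, value) records.
Log = List[Tuple[str, int]]


class ComputeNode:
    """Toy compute tier: keeps state in memory but records every change in the log."""

    def __init__(self, log: Log) -> None:
        self._log = log
        self.state: Dict[str, int] = {}
        # On startup, rebuild local state by replaying the durable log in order.
        for key, value in log:
            self.state[key] = value

    def increment(self, key: str) -> None:
        new_value = self.state.get(key, 0) + 1
        self._log.append((key, new_value))  # write to the storage tier first
        self.state[key] = new_value         # the local copy can always be rebuilt


if __name__ == "__main__":
    log: Log = []
    node = ComputeNode(log)
    node.increment("alice")
    node.increment("alice")
    node.increment("bob")

    del node                         # simulate losing the compute node
    replacement = ComputeNode(log)   # a new node recovers state from storage
    print(replacement.state)         # {'alice': 2, 'bob': 1}
```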

You can interact with a single ksqlDB server or a cluster of servers through a client-server CLI, as you would with a relational database, or via a full-featured web UI, a REST API, or a client for a programming language such as Java.
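
As a rough sketch of the REST route, the Python snippet below builds (but does not send) an HTTP request carrying a SQL statement. Several details are assumptions rather than anything stated in this lesson: the server address `http://localhost:8088`, the `/ksql` endpoint path, the `application/vnd.ksql.v1+json` content type, and the helper name `build_statement_request`. Check them against the ksqlDB REST API documentation for your version before relying on them.

```python
import json
import urllib.request

# Assumption: a ksqlDB server listening locally on port 8088.
SERVER_URL = "http://localhost:8088"


def build_statement_request(
    statement: str, server_url: str = SERVER_URL
) -> urllib.request.Request:
    """Build, without sending, a POST that submits one SQL statement to the server."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server_url}/ksql",  # assumed statement endpoint
        data=body,
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
            "Accept": "application/vnd.ksql.v1+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_statement_request("SHOW STREAMS;")
    print(request.get_method(), request.full_url)
    # To send it to a running server, uncomment:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Building the request separately from sending it keeps the example runnable without a server; the CLI and the language clients wrap the same kind of client-server exchange.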

By far the easiest way to try out ksqlDB is Confluent Cloud, and you can use the promo code KSQLDB101 for $25 of free usage.

Errata

  • The Confluent Cloud signup process illustrated in this video includes a step to enter payment details. This requirement has been eliminated since the video was recorded. You can now sign up without entering any payment information.

Use the promo code KSQLDB101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

ksqlDB's Architecture

Hi, I'm Michael Drogalis of Confluent. And in this module, we're gonna be looking at ksqlDB's architecture. The best way to understand ksqlDB's architecture is to look at what it replaces. Here I have a typical architecture for working with data in motion. There are some databases and applications on the left, and they're creating events. There are connectors that are harvesting those events and moving them into Kafka for storage, and there's stream processing for continuously transforming and aggregating that event data as it arrives. And finally, there's a set of sink connectors that move that aggregated data into a database that my applications can run queries over. This works, but it's rather complicated. There are a lot of moving parts. It's hard to scale, secure, monitor, debug, and operate all of these components as one.

What if this approach could be replaced by a simpler architecture? In fact, what if it looked something like Postgres? Now this might sound like a strange question. What does Postgres have to do with stream processing? Quite a lot, actually. Relational databases, and in particular SQL, have been an incredibly effective abstraction for creating simple architectures. SQL hides all of the underlying complexity of query execution, indexing, concurrency control, crash recovery, replication, and so on. Stream processing, it turns out, needs to implement many of the same pieces of technology to be a complete solution. That is why ksqlDB builds on this timeless abstraction and tucks away the complexity we just looked at behind the interface of a database. Inside you have primitives for stream processing, but you also have primitives for connectors, which make it easy to transport data to and from the outside world. And lastly, you have support for materialized views, which create data sets that can be queried directly by your application, just like a database table.

Because ksqlDB relies on Kafka to maintain all durable state, you get a simple two-tier architecture: compute and storage. ksqlDB handles compute and Kafka handles storage. Both can be elastically scaled independently from one another. The way you interact with this architecture is with a simple client-server deployment, just like a relational database. You have a CLI that you issue commands from, and that CLI sends those commands to one or more servers that can cluster together. Those servers execute those commands and communicate results back to the clients as needed. And beyond the CLI you can also use a full-featured UI, a REST API, or a programming language client like Java.

The easiest place to try out ksqlDB is Confluent Cloud. As we work through the lessons in this course, you should sign up and do just that. Here's a promo code to get you started for free. To get started, go to the URL on the screen and click the Try Free button. Then enter your name, email, and password. This email and password will be used to log into Confluent Cloud later, so be sure to remember them. Click the Try Free button and watch your inbox for a confirmation email to continue. The link in your confirmation email will lead you to the next step, where you can choose between a Basic, Standard, or Dedicated cluster. The associated costs are listed, but the start-up amount freely provided to you will more than cover everything you need for this course. Click Begin Configuration to choose your preferred cloud provider, region, and availability zone. Costs will vary with these choices, but they're clearly shown on the bottom of the screen. Continue to set up billing information. Here you'll see that you receive $200 of free usage each month for your first three months. Also, by entering the promo code KSQLDB101 you receive an additional $101 to give you plenty of room to try out the things that we'll be talking about. Click Review to get one last look at the choices you've made, then launch your new cluster. While your cluster is provisioning, join me in the next module of this course.