Get Started Free
‹ Back to courses
course: Inside ksqlDB

High Availability

3 min
Michael Drogalis

Michael Drogalis

Principal Product Manager (Presenter)

High Availability

Up until this point, the architectures you have seen in this course share a potential problem scenario: ksqlDB rebooting from a cold state. ksqlDB does this by playing in changelogs, and the process can be made fast, but it isn’t immediate, so there will be inevitable downtime.

You can avoid this scenario, however, by adding more servers and configuring them for high availability. This means that if you lose servers, the remaining working nodes will be able to continue serving queries.

Consider two servers, a and b, where a has a replica of b, and b has a replica of a (replicas are on the right and their names begin with r):

The scenario depicted is a stateful operation where data is read from a stream and state is reflected in a table, which is backed by a changelog. But instead of the changelog being the end of the pipeline, a replica is constantly, aggressively, and optimistically replaying the changelog into its local stores. This is actually an identical action to replaying the changelog for a cold start, but it’s happening constantly and proactively, as opposed to all at once after an emergency. This means that when a node goes down, the replica can be substituted immediately.

This is an elegant solution: it’s easy to understand, and it builds upon existing ksqlDB functionality. It’s an eventually consistent system, meaning that the values eventually propagate from the primary nodes to the replicas. But you are able to control the staleness of pull queries from the replicas. So, for example, you can instruct ksqlDB to fail any request to a replica when its staleness is above a specific threshold.

Use the promo code KSQLDB101 to get $25 of free Confluent Cloud usage

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.

High Availability

Hi, I'm Michael Drogalis with Confluent and in this module, we're gonna be looking at how high availability works in ksqlDB. You might have noticed so far that one tricky thing with this architecture is what happens when ksqlDB has a completely cold reboot. It plays in these change logs, which can be made fast, but it's not immediate. There will be some downtime while a server recovers. What you really want is high availability. If you have a query and you have many servers and you lose a few, you want the remaining nodes to be able to continue to serve queries uninterrupted. You can do this by adding more servers and configuring them with high availability. Here I have two servers, A and B, and you'll notice that A has a replica of B and B has a replica of A, which is what the R indicates. This is going to depict a stateful operation where I'm reading data from a stream. Let's follow what happens as the data comes in. I have sensor two with a reading of 64. And just as before, it's doing a stateful operation. The data moves into the change log, as usual. But what happens when I have high availability configured is that the replica server is proactively playing in the change log to its local stores. It's doing this aggressively and optimistically so that it can keep up. This is combining what we just learned about change logs with its recovery mechanism. What it's really doing is the cold recovery mechanism, but it's doing it constantly and proactively. Playing this back again, notice that sensor two has an average of 64. We want this to be reflected in the replica as fast as possible. The data moves into the change log. Then it moves into the replica. So we have 64 over here. And by the time I look back at the primary server, the key has actually been updated, but you see the value up here in the replica. This is how ksqlDB replicates data for high availability. It's elegant because it's easy to understand and it builds on concepts that it already has. And it's able to be made very efficient. Because this is an eventually consistent system, the values eventually replicate to the other servers. But you are able to bound the staleness of pull queries against replica servers. So imagine if I stop the animation and I have a slightly still value, I can instruct ksqlDB to fail any requests to the replica when it's staleness is beyond a point that's just too much. So that's a quick view into high availability in ksqlDB.