High Availability

3 min

Michael Drogalis

Principal Product Manager (Presenter)

High Availability

Up until this point, the architectures you have seen in this course share a potential problem scenario: ksqlDB rebooting from a cold state. ksqlDB does this by playing in changelogs, and the process can be made fast, but it isn’t immediate, so there will be inevitable downtime.

You can avoid this scenario, however, by adding more servers and configuring them for high availability. This means that if you lose servers, the remaining working nodes will be able to continue serving queries.

Consider two servers, a and b, where a has a replica of b, and b has a replica of a (replicas are on the right and their names begin with r):

The scenario depicted is a stateful operation where data is read from a stream and state is reflected in a table, which is backed by a changelog. But instead of the changelog being the end of the pipeline, a replica is constantly, aggressively, and optimistically replaying the changelog into its local stores. This is actually an identical action to replaying the changelog for a cold start, but it’s happening constantly and proactively, as opposed to all at once after an emergency. This means that when a node goes down, the replica can be substituted immediately.

This is an elegant solution: it’s easy to understand, and it builds upon existing ksqlDB functionality. It’s an eventually consistent system, meaning that the values eventually propagate from the primary nodes to the replicas. But you are able to control the staleness of pull queries from the replicas. So, for example, you can instruct ksqlDB to fail any request to a replica when its staleness is above a specific threshold.

Do you have questions or comments? Join us in the #confluent-developer community Slack channel to engage in discussions with the creators of this content.

Use the promo code KSQLDB101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Get Started

High Availability

Hi, I'm Michael Drogalis with Confluent and in this module, we're gonna be looking at how high availability works in ksqlDB. You might have noticed so far that one tricky thing with this architecture is what happens when ksqlDB has a completely cold reboot. It plays in these change logs, which can be made fast, but it's not immediate. There will be some downtime while a server recovers. What you really want is high availability. If you have a query and you have many servers and you lose a few, you want the remaining nodes to be able to continue to serve queries uninterrupted. You can do this by adding more servers and configuring them with high availability. Here I have two servers, A and B, and you'll notice that A has a replica of B and B has a replica of A, which is what the R indicates. This is going to depict a stateful operation where I'm reading data from a stream. Let's follow what happens as the data comes in. I have sensor two with a reading of 64. And just as before, it's doing a stateful operation. The data moves into the change log, as usual. But what happens when I have high availability configured is that the replica server is proactively playing in the change log to its local stores. It's doing this aggressively and optimistically so that it can keep up. This is combining what we just learned about change logs with its recovery mechanism. What it's really doing is the cold recovery mechanism, but it's doing it constantly and proactively. Playing this back again, notice that sensor two has an average of 64. We want this to be reflected in the replica as fast as possible. The data moves into the change log. Then it moves into the replica. So we have 64 over here. And by the time I look back at the primary server, the key has actually been updated, but you see the value up here in the replica. This is how ksqlDB replicates data for high availability. It's elegant because it's easy to understand and it builds on concepts that it already has. And it's able to be made very efficient. Because this is an eventually consistent system, the values eventually replicate to the other servers. But you are able to bound the staleness of pull queries against replica servers. So imagine if I stop the animation and I have a slightly still value, I can instruct ksqlDB to fail any requests to the replica when it's staleness is beyond a point that's just too much. So that's a quick view into high availability in ksqlDB.

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Language Guides

Tutorials

Demos

Language Guides

Tutorials

Demos

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog

Modules: Start from lesson 1
Total 7