
Scaling and Fault Tolerance

Michael Drogalis

Principal Product Manager (Presenter)

So far in this course, you’ve considered a single server and the way that it recovers when a new node takes its place. The next scenario to think about is a cluster with multiple nodes, and how the cluster scales compute while achieving fault tolerance.

Stateless Scaling

When you have a single server in a stateless scenario, it is assigned all of the partitions from the streams or tables it’s reading from. In the image below, a single server is reading from eight different partitions and working hard to process all of the data that is available.
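As a rough sketch of what such a stateless workload might look like, the statements below declare a stream over an eight-partition topic and a persistent query that only filters it. The stream, column, and topic names are hypothetical, not taken from the course; because filtering keeps no state, any server can process any of the eight partitions.

```sql
-- Hypothetical source stream backed by an eight-partition Kafka topic
CREATE STREAM readings (
  sensor VARCHAR KEY,
  reading DOUBLE
) WITH (
  KAFKA_TOPIC = 'readings',
  PARTITIONS = 8,
  VALUE_FORMAT = 'JSON'
);

-- A stateless persistent query: it only filters, so no local state is
-- needed and partitions can be assigned to any server in the cluster
CREATE STREAM high_readings AS
  SELECT sensor, reading
  FROM readings
  WHERE reading > 41;
```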

Just as ksqlDB inherits Apache Kafka® consumer guarantees, it also inherits the benefits of the Kafka consumer group protocol when you add servers to a cluster. If you add a second server to the cluster above, the work for all of the partitions is scaled out across two nodes, (a) and (b).

The work rebalances automatically, and ksqlDB effectively processes the data twice as fast as it did with a single server.
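A cluster like this is formed simply by pointing multiple ksqlDB servers at the same Kafka cluster with the same `ksql.service.id`. The snippet below is a minimal sketch of the relevant server properties; the broker address, service id, and port are placeholder values.

```properties
# ksql-server.properties for server (a) and server (b).
# Both servers point at the same Kafka cluster and share a service id,
# so they join the same ksqlDB cluster and the consumer group protocol
# splits the partitions between them. Values are placeholders.
bootstrap.servers=broker1:9092
ksql.service.id=inside_ksqldb_
listeners=http://0.0.0.0:8088
```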

The same holds true when you add more servers, up to the number of partitions. With eight servers and eight partitions, every partition has a dedicated server, which is the maximum degree of parallelism for this workload. Notice how much faster the data is processed in the eight-server image below compared to the single-server image above.

As in the prior scenarios, ksqlDB takes care of assigning the right processing work to the right servers at the right time, and it happens automatically.

Stateful Scaling

Stateful scaling in ksqlDB works similarly to stateless scaling, except that state also needs to be sharded across the servers. You learned about data locality in How Stateful Operations Work, and stateful scaling is a prime example: each server needs local access to the data that it is going to process. You can shard state by key:
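For example, a materialized view like the hypothetical one below (building on the `readings` stream sketched earlier) groups by `sensor`, so that column becomes the sharding key: ksqlDB repartitions rows by sensor, and each server materializes state only for the sensors assigned to it.

```sql
-- Hypothetical materialized view over the readings stream sketched above.
-- GROUP BY sensor is the sharding key: rows are repartitioned by sensor,
-- and each server keeps state only for the partitions it owns.
CREATE TABLE avg_readings AS
  SELECT sensor,
         AVG(reading) AS avg_reading
  FROM readings
  GROUP BY sensor;
```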

In the image above, there are four nodes, each of which runs a consumer (a-d), and the data is partitioned by sensors 1-8. ksqlDB reshuffles the data so that rows with the same key always go to the same partition, and thus to the same server in the cluster. As in the materialized view scenario, the backing state for each piece of data is written to a changelog, so if a node in the cluster is lost, a new node can step in and read the changelog to recover. For example, if server (b) went down, its partitions might be assigned to server (a) and server (c), each of which would read from the changelog to repopulate its share of the state.

ksqlDB effectively combines the client protocol, the consumer group protocol, and the changelog design to scale state in a fault-tolerant way.
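If faster failover matters, ksqlDB also forwards `ksql.streams.`-prefixed settings to its underlying Kafka Streams runtime. As one example, the optional setting sketched below keeps a warm standby copy of each server's state on another node, so less of the changelog has to be replayed after a failure; treat the value as an illustration rather than a recommendation.

```properties
# Optional: maintain one standby replica of each state store on another
# server, so a failed server's shards can resume with minimal changelog replay.
ksql.streams.num.standby.replicas=1
```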

