May 11, 2022 | Episode 214

Scaling Apache Kafka Clusters on Confluent Cloud ft. Ajit Yagaty and Aashish Kohli

  • Transcript
  • Notes

How much can Apache Kafka® scale horizontally, and how can you automatically balance, or rebalance data to ensure optimal performance?

You may require the flexibility to scale or shrink your Kafka clusters based on demand. With experience engineering cluster elasticity and capacity management features for cloud-native Kafka, Ajit Yagaty (Confluent Cloud Control Plane Engineering) and Aashish Kohli (Confluent Cloud Product Management) join Kris Jenkins in this episode to explain how the architecture of Confluent Cloud supports elasticity. 

Kris suggests that optimal elasticity is like water from a faucet—you should be able to quickly obtain as many resources as you need, but at the same time you don't want the slightest amount to go wasted. But how do you specify the amount of capacity by which to adjust, and how do you know when it's necessary?

Aashish begins by explaining how elasticity on Confluent Cloud has come a long way since the early days of scaling via support tickets. It's now self-serve and can be accomplished by dialing up or down a desired number of CKUs, or Confluent Units of Kafka. A CKU corresponds to a specific amount of Kafka resources and has been made to be consistent across all three major clouds. You can specify the number of CKUs you need via API, CLI or Confluent Cloud UI. 

Ajit explains in detail how, once your request has been made, cluster resizing is a two-step process. First, capacity is added, and then your data is rebalanced. Rebalancing data on the cluster is critical to ensuring that optimal performance is derived from the available capacity. The amount of time it takes to resize a Kafka cluster depends on the number of CKUs being added or removed, as well as the amount of data to be rebalanced. 

Of course, to request more or fewer CKUs in the first place, you have to know when it's necessary for your Kafka cluster(s). This can be challenging as clusters emit a large variety of metrics. Fortunately, there is a single composite metric that you can monitor to help you decide, as Ajit imparts on the episode.  

Other topics covered by the trio include an in-depth explanation of how Confluent Cloud achieves elasticity under the hood (separate control and data planes, along with some Kafka dogfooding), future plans for autoscaling elasticity, scenarios where elasticity is critical, and much more.

Continue Listening

Episode 215May 17, 2022 | 6 min

Apache Kafka 3.2 - New Features & Improvements

Apache Kafka 3.2 delivers new KIPs in three different areas of the Kafka ecosystem: Kafka Core, Kafka Streams, and Kafka Connect. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent), shares release highlights

Episode 216May 19, 2022 | 33 min

Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB

Apache Kafka isn’t just for day jobs according to Danica Fine (Senior Developer Advocate, Confluent). It can be used to make life easier at home, too! Building out a practical Apache Kafka® data pipeline is not always complicated—it can be simple and fun. For Danica, the idea of building a Kafka-based data pipeline sprouted with the need to monitor the water level of her plants at home. In this episode, she explains the architecture of her hardware-oriented project and discusses how she integrates, processes, and enriches data using ksqlDB and Kafka Connect, a Raspberry Pi running Confluent's Python client, and a Telegram bot. Apart from the script on the Raspberry Pi, the entire project was coded within Confluent Cloud.

Episode 217May 26, 2022 | 55 min

Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.

Got questions?

If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.

Email Us

Never miss an episode!

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free