Use CCLOUD50 to get an additional $50 of free Confluent Cloud- (details)
April 12, 2021 | Episode 152

Automated Cluster Operations in the Cloud ft. Rashmi Prabhu

  • Transcript
  • Notes

If you’ve heard the term “clusters,” then you might know it refers to Confluent components and features that we run in all three major cloud providers today, including an event streaming platform based on Apache Kafka®, ksqlDB, Kafka Connect, the Kafka API, databalancers, and Kafka API services. Rashmi Prabhu, a software engineer on the Control Plane team at Confluent, has the opportunity to help govern the data plane that comprises all these clusters and enables API-driven operations on these clusters. 

But running operations on the cloud in a scaling organization can be time consuming, error prone, and tedious. This episode addresses manual upgrades and rolling restarts of Confluent Cloud clusters during releases, fixes, experiments, and the like, and more importantly, the progress that’s been made to switch from manual operations to an almost fully automated process. You’ll get a sneak peek into what upcoming plans to make cluster operations a fully automated process using the Cluster Upgrader, a new microservice in Java built with Vertx. This service runs as part of the control plane and exposes an API to the user to submit their workflows and target a set of clusters. It performs statement management on the workflow in the backend using Postgres.

So what’s next? Looking forward, there will be the selection phase will be improved to support policy-based deployment strategies that enable you to plan ahead and choose how you want to phase your deployments (e.g., first Azure followed by part of Amazon Web Services and then Google Cloud, or maybe Confluent internal clusters on all cloud providers followed by customer clusters on Google Cloud, Azure, and finally AWS)—the possibilities are endless! 

The process will become more flexible, more configurable, and more error tolerant so that you can take measured risks and experience a standardized way of operating Cloud. In addition, expanding operation automations to internal application deployments and other kinds of fleet management operations that fit the “Select/Apply/Monitor” paradigm are in the works.

Continue Listening

Episode 1June 20, 2018 | 22 min

Ask Confluent #1: Kubernetes, Confluent Operator, Kafka and KSQL

Tim Berglund and Gwen Shapira discuss Kubernetes, Confluent Operator, Kafka, KSQL and more.

Episode 2July 2, 2018 | 24 min

Ask Confluent #2: Consumers, Culture and Support

Gwen Shapira answers your questions and interviews Sam Hecht (Head of Support, Confluent).

Episode 3July 20, 2018 | 22 min

Ask Confluent #3: Kafka Upgrades, Cloud APIs and Data Durability

Tim Berglund and Gwen Shapira answer your questions and have a discussion with Koelli Mungee (Customer Operations Lead, Confluent).

Got questions?

If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.

Email Us

Never miss an episode!

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.