Getting Apache Kafka to work across data centers or across cloud regions has always been a little bit of a thing. Now, Confluent has a really cool approach to this called cluster linking. On today's episode of Streaming Audio, I talked to Nikhil Bhatia. He's an engineer on the team that built cluster linking. We dig into a little bit about how it works. Streaming Audio is brought to you by Confluent Developer. At the time I'm recording this, we're just relaunching a brand new version of that site with all kinds of great content. You should check it out at developer.confluent.io. That's developer.confluent.io. Just keep saying it. Let's get to today's show.
Hello, and welcome to another episode of Streaming Audio. I'm your host, once again, Tim Berglund. I'm joined in the virtual studio today by my colleague, Nikhil Bhatia. Nikhil is a principal engineer who works, I believe, on the core Kafka team. Nikhil, welcome to the show.
Thank you, Tim. Thank you for the opportunity to talk to you today.
You got it. I want to talk about cluster linking. Now, that's a feature of the Confluent platform and Confluent Cloud that went into preview and early access last summer, I want to say. I recorded a video on it and 2020 just kind of blurs together, but I'm pretty sure it was late summer. Robin Moffatt and I did a demo, where we did some linking. It'll be linked in the show notes if you want to watch that video. But I want to talk about the current state of that feature and what it means and how we use it, how it works, but first, tell us about you. How did you come to be where you are? How did you come to do what you do?
Yeah. So, a brief introduction about me: I started my career when I was in India. I worked in a couple of startup companies, building network switches and network routers. I even did some kind of hardware design at that point.
Love it.
And then I ventured into trying to open my own startup, building cameras for phones when they didn't exist, but I failed. So I decided to come and do my master's in the U.S. At that point, I interned at Microsoft. I was part of the operating system division working on application compatibility, and they decided to hire me. There, I worked on features like Switchback, which allows older versions of applications to run on newer versions of the operating system. I worked on the search space in Bing, in distributed systems areas, with Bing serving almost 20% of the world's queries. And then I was part of Azure Cloud, where I worked on making sure VMs are reliable, the cloud is reliable, and things like that. And that's when I joined Confluent. I started working on the infinite storage features that we shipped in the cloud and on Confluent Platform. My latest project is on global Kafka, building this amazing cluster linking feature that we are going to talk about today.
Cool. Set up for us, if you would, the basic problem that this is solving. I think of it just broadly as multi-data center Kafka. So, am I right about that? What's the actual problem and use case, and why is it hard?
Yeah. So I think replicating topics between Kafka clusters has been such an important problem that it has seen a number of solutions, in terms of MirrorMaker and Confluent Replicator. There have also been other approaches of using a stretched cluster to [inaudible 00:04:17] MRC to provide some kinds of guarantees, but there has been no cloud-native, in-the-box Kafka solution that needs no external component and provides the seamless cross-cluster topic replication experience that we built as part of cluster linking. So what this actually means is we essentially have no external components that are needed to do this topic replication. We introduce a new concept of a mirror topic, which allows remote clusters to replicate topic data partition for partition, byte for byte, as well as important method-
Offset for offset.
Offset for offset. Exactly. And it allows scenarios such as global reads, provides APIs for disaster recovery, and lets us do the all-important migration to the cloud.
Now, if you're stumbling into this podcast and are brand new to Kafka or very new to Kafka, one quick little bit of background: you've got a topic, which is an ordered collection of messages. It's split into partitions. Each partition is really a log. Every message in a partition has a unique offset, which is just a monotonically increasing integer that identifies each message. And so, when I said offset for offset, historically, that's been a hard thing, because if I've got a topic in one place, I can read messages from it and write them to another place. There's my multi-datacenter replication, but the offsets don't necessarily come along for the ride. You don't get offset replication in that case for free, so that's kind of been a big deal.
So Kafka, as originally conceived, feels like a thing that is expected to live inside one data center. It runs in the cloud just fine. You can go spin up instances and manage your own Kafka cluster in the compute cloud of your choice, but there isn't... In Apache Kafka, from its beginnings, there hasn't been any baked-in multi-datacenter kind of thing. So that's always been a thing that people have come up with. You mentioned external components to solve the problem. Before we get into cluster linking and how it works and all that kind of stuff, what historically are the other solutions people have used for this? I know when I ask that question, some of you are like, "You mean you're going to make him say MirrorMaker 1 and you're going to act like you don't know what it is." No, of course, I know what it is and I know there's some pain here, but take us through the legacy solutions for this problem, if you will.
Yeah. I think we had a few tools, I would say, that are available. As you mentioned, we started with MirrorMaker 1 and now we have MirrorMaker 2, which is upstream in Apache Kafka, and similarly-
That's right. It's a part of open-source AK, right?
Yeah. Yeah, exactly. That's part of AK. And then there is Confluent Replicator, which is a Confluent proprietary tool that does similar external data replication between two clusters. And as I mentioned, this is for the connected cluster topology. And then there is the other topology, where you have a stretched cluster, and there Confluent provides an MRC solution, which is a multi-region cluster solution.
Yes. Just a few weeks ago, when you and I are recording this in the middle of April, Anna McDonald's springtime episode aired, where we talked about automatic observer promotion and the Confluent Platform 6.1 features for clusters of that kind. We will also link to that in the show notes because it was super important for understanding that use case. Actually, I got feedback from a number of people like, "Yes, this is the perfect explanation." So, that's good. We're not talking about that here. We're talking about something different. I just want to put stretched clusters off to the side. Finish listening to this episode, then go listen to the Anna episode from a few weeks ago and learn about stretched clusters and MRC that way. Could you, Nikhil, tell us how MirrorMaker and Confluent Replicator work, basically, architecturally? Because they're similar in approach. So, what's that solution?
Yeah. Yeah, I think these tools, as I mentioned, are external components that sit outside the two clusters where we are replicating this data from source to destination. So they use producer and consumer APIs and use the Connect framework to synchronize this data: read the data from the source cluster and write the data to the destination cluster. And then for offset preservation, they use external interceptors at the client, which allow timestamp-based offset translation. There are no strict guarantees there, and it comes with the challenges of maintaining this extra distributed system that lives external to Kafka as a separate component. It comes with the data not being exactly the same byte for byte, but it also gives... On the pro side, it gives you tools out of the box to do different kinds of things, like if you want to do SMTs on a message. So it has its own benefits of living outside of the cluster, but then it has its own challenges of maintenance, reliability, and providing that seamless solution to the customer.
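For reference, here is a minimal sketch of what this external-component approach looks like with MirrorMaker 2 from Apache Kafka. The cluster aliases, hostnames, and topic pattern are placeholders, and a real deployment would need additional settings, so treat this as illustrative rather than a complete configuration.

```bash
# Minimal MirrorMaker 2 run (sketch): an external, Connect-based process that
# consumes from the source cluster and produces to the destination cluster.
cat > mm2.properties <<'EOF'
clusters = source, destination
source.bootstrap.servers = source-kafka:9092
destination.bootstrap.servers = destination-kafka:9092

# Replicate every topic matching the pattern from source to destination.
source->destination.enabled = true
source->destination.topics = .*
EOF

# Ships with Apache Kafka; runs as its own process outside both clusters.
bin/connect-mirror-maker.sh mm2.properties
```

The point of the sketch is that this is a separate distributed system you have to run and monitor alongside both clusters, which is exactly the overhead cluster linking removes.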
Yeah. I mean, I'm talking about, in part, a piece of the Confluent Platform product here, which is Confluent Replicator, so I don't want to trash talk it because it's a viable solution that has worked and is still working, but it's always felt to me like this was a step along a journey. It's basically a connector, right? It's a connector and some interceptors to make this work, and that's elegant in that it builds this really important piece of functionality on top of stuff that's already there. As I like to say, Kafka is built out of Kafka as much as possible, and that's an example of Confluent building some IP out of Kafka and MirrorMaker 2 building Kafka out of Kafka, so that's cool.
But to your point, yeah, there's this other thing out there running. Connect is great. You should use Connect. Probably you have a need for Connect, but is that the right place for this? It just feels like when you were 10, there was that frozen pizza that your parents got you from the store and you ate it. It was good and you liked it. It got the job done. You go back to that frozen pizza as an adult and you're like, "Did I actually eat that as food? How did I do that?" It feels like a step and we're maturing. So, where is that? Where is maturity for us?
Yeah, I think I like your analogy of the frozen pizza there. So I think over the period of time, as we have seen customers use this technology, we identified key areas where we can make this just tremendously better as a next step: removing this extra component altogether and providing the same reliability, scalability, and availability of a replication protocol that lives within the cluster, the way Kafka provides reliability at the cluster level, but actually provided across clusters. So we can go into more detail of how we basically introduce [inaudible 00:12:45], for example, which is essentially a monotonically increasing number that every broker gets, and that helps us define the data lineage. So we apply the same concepts of replication, but we also apply them across the two clusters, which gives cluster linking that byte-for-byte replication and offset preservation.
Yes. Epochs are a tool in the toolbox for handling various kinds of consistency problems. There are certain consistency problems where epochs just show up and it's like, "Oh, it's better now. Okay. That's good." Yeah. Walk us through how it works. I guess to be clear, we use phrases like global Kafka, and we talked about this being a Kafka thing. The stuff we're talking about right now is not a part of the open-source Apache Kafka project. This is a feature in Confluent Cloud. It's a feature in Confluent Platform. Those things are based on Apache Kafka, but this is stuff that you get in one of those places or the other, just to be clear... I'm always delicate with the use of the name Kafka itself, so everybody understands there.
But walk us through how it works. I mean, I guess the simple case is a cluster in one region and a cluster in another region. Maybe they're the same cloud provider. Maybe they're not. Substitute anything you want. Maybe one of them is an on-prem data center, or they're both geographically diverse on-prem data centers, whatever; they are far apart. The latency is not local-area-network latency. Cluster here, cluster there, I want them to have the same things in them. How does this work?
Yeah. So I think you put it right. I just want to add one aspect: this is cloud-native, which also differentiates it from other solutions, which require an extra component. So, how does it start? Basically, we provide an API in the cloud. You say ccloud kafka topic create, so you take a topic and you then create a mirror topic on the destination. What that does is it creates this read-only topic, which is the mirror topic. As you create the link, you can define the security properties for it. You can define what source it's going to talk to and what topic it's trying to mirror. Once you create that link, it will start this asynchronous replication in the background and begin replicating the source topic.
So now, let me stop there. A couple of questions before I forget the questions, I think is really what I mean. The mirror topic, do I just create that as a topic? Is there any special setting or metadata? Do I have to manage the read-onlyness of it? What about that makes it a mirror topic other than our concept in our minds?
Yeah. So basically, we expose an API which takes care of all the details of how the topic is managed, what the properties are, and what the configuration is. All we need when you create the link is the source bootstrap server and your security credentials, so we can read the data from the source cluster.
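Roughly, the flow Nikhil describes looks like this on the command line. The link name, cluster ID, hostname, topic name, and credentials file are placeholders, and the exact command and flag names have changed across ccloud/confluent CLI versions, so read this as a sketch of the shape of the calls rather than exact syntax.

```bash
# 1. Create a cluster link on the destination cluster, pointing it at the
#    source cluster's bootstrap server plus credentials for reading from it.
#    (Flag names are approximate; check your CLI version's help output.)
ccloud kafka link create seattle-link \
  --source-cluster-id lkc-src123 \
  --source-bootstrap-server pkc-source.us-west2.gcp.confluent.cloud:9092 \
  --config-file source-credentials.properties

# 2. Create the read-only mirror topic over that link; the destination
#    works out partitions, configs, and replication on its own.
ccloud kafka topic create clicks --link seattle-link --mirror-topic clicks
```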
Nice. Okay. So, I've created this mirror topic. What's the other question I had? See, I told you I have to ask them right away or they're gone. It'll come back to me. You got the mirror topic. You got the source topic and we run. It's a Confluent ccloud command line command to establish the connection.
Yeah. So, we use the API to create the topic on the destination, and the destination cluster, which has cluster linking enabled, will automatically create this topic, create its in-sync replica set, and create it based on the number of partitions of the source topic. It will start replicating this data just like any other replica, the way leaders and followers work in the internal cluster protocol.
In other words, it's a topic, and it has partitions and it has replicas, and it works the way replication works in that destination cluster. So those mechanisms are just the same.
Yes. So I think that's exactly what gives cluster linking this offset preservation and byte-for-byte replication, and the guarantees that the replication protocol provides for the leaders and the followers.
I want to get to byte for byte and offset preservation in a minute because those are super important. The other thing you said, I just wanted to underscore, I remember what I was going to say now: asynchronous. So this destination cluster, pardon me, destination topic in the remote cluster, its messages are replicated asynchronously. I want to clarify that. As always, tell me if I'm getting it right, but when we say asynchronous, we mean with respect to the producers on the source cluster. So somebody is writing to that topic and there are acks and there's a min ISR setting. All that stuff is happening as it normally does, which defines what synchronous means for that producer. But the presence of cluster linking here, of linking to this remote cluster, does not affect the acks setting or the min ISR settings, and all of the synchronous message-producing behavior is exactly the way it is normally. Am I right?
Exactly. I think you hit the bull's eye there.
Good. It felt like that was maybe not my best explanation of that whole thing, but I think we got through it together because it is just important to define what asynchronous basically means.
Yeah. I think from the client's perspective, as you mentioned, the producers are talking to the local cluster, or what we call the source cluster. They have their own acks setting, say, acks set to all. They have, say, a replication factor of three and a min ISR of, say, two, which means that they need to replicate this data to the minimum number of replicas to be able to acknowledge that the write has happened. That does not have any effect if you link that topic to another destination cluster, because the destination manages its own reliability separately. Cluster linking is essentially replicating this data asynchronously, which means your producers are not impacted. So that's the produce side, but this also means that you can actually have consumers on the destination, which can also perform reads. So say you have a cluster in Seattle and the other one is in Tokyo; the consumers in Tokyo can consume from the mirror topic the data that has been replicated. So that's why we call this mirror topic a read-only topic, which allows you to consume the data.
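To make the produce/consume split concrete, here is a small sketch using the stock console tools. The hostnames and topic name are made up, and a real Confluent Cloud cluster would also need security properties passed via a config file, so this only illustrates which cluster each client talks to.

```bash
# Producers keep writing to the source cluster in Seattle with their usual
# durability settings; cluster linking does not change acks or min ISR behavior.
kafka-console-producer --bootstrap-server seattle-kafka:9092 \
  --topic clicks \
  --producer-property acks=all

# Consumers can read the replicated data from the mirror topic in Tokyo.
# The mirror topic is read-only: consume from it, but never produce to it.
kafka-console-consumer --bootstrap-server tokyo-kafka:9092 \
  --topic clicks --from-beginning
```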
Got it. If I want to write and I happen to be far away, at a large latency with respect to the source cluster, I still write to that one and I deal with the fact that the latency is large. I never write to the link, to the destination topic.
On the same topic. Yes.
Yeah. Got it. Okay. That makes sense. Okay. Is there a five-minute dive under the covers that you can do about how offset replication works and just what the protocol looks like?
Yeah. So there are two aspects to it. One is the offset itself. The way the Kafka log works is you have a concept of leaders and followers. The way replication works is the followers fetch the log based on an offset that they have. There is data lineage metadata available, which allows the leader to truncate the log and have the correct offset. So in terms of failures, when a broker goes down and comes back up, it's able to reliably continue replicating data at the right offset. The same concept applies to cluster linking, where the destination broker, which could be the leader of that topic, would replicate this data from the source cluster and basically use the same replication concept: being able to truncate this data at the right offsets, being able to understand if the source cluster broker has a failure, and being able to accommodate that.
Got it. Okay. So it's the same basic replication protocol as intra-cluster replication?
Yes.
I understand. It makes sense.
In addition, I think there is another concept of how offsets are preserved for the clients. So in Kafka, we have the concept of the consumer offsets topic, which allows consumers to be reliable. If they go down, they are able to continue from where they left off. That's the other thing the cluster linking API does: it basically makes sure that all your topic configs, ACLs, and your consumer offsets are synced. So when you have a disaster, you can switch your consumers to the destination and they can continue from where they left off.
Okay. So all that metadata is replicated as well. It's not just the topic data moving from source cluster to destination cluster; parts of the consumer offsets topic are replicated too, the messages associated with the consumers that are subscribed to the linked topic.
Yes.
Okay. That is nice, and that lets this actually be a valid disaster recovery kind of situation. You can actually switch over and you don't have to somehow magically bring your consumer offsets with you.
Yeah. You can move your consumer groups from your source cluster over to your destination cluster after a disaster, and they can continue where they left off.
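As a quick illustration of what "the offsets come with you" means in practice, a check like the following could be run against the destination after switching over. The group name and hostname are placeholders, and a secured cluster would need credentials passed via --command-config.

```bash
# Describe the consumer group on the destination cluster after failover.
# Because consumer offsets are synced over the link and data is preserved
# offset for offset, the committed positions should line up with the source.
kafka-consumer-groups --bootstrap-server tokyo-kafka:9092 \
  --describe --group click-processors
```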
Nice. You mentioned the recent addition of some DR APIs, or disaster recovery APIs. Now, again, this is a feature that, I think as we record this, is still in preview in Confluent Platform and limited availability in Confluent Cloud, so it's an early thing. It may or may not be the case when this episode goes live. We'll see. You should check. Maybe we'll put a note in the description if that has changed. But you mentioned that disaster recovery API potentially being a part of the GA thing. Tell us about that.
Yeah. So essentially, we provide two APIs for the mirror topic. One is promote and the other one is failover. What promote does is basically say, "Hey, I am ready to make my current topic writable on the destination." So remember, mirror topics are read-only topics, but at some point, say for migration scenarios where you want to actually move your workloads from source to destination, you can promote your topic to be writable on the destination, and then you can move your consumer groups seamlessly, which also ensures that you have replicated the data correctly and you are able to promote your consumer groups safely to the other side.
Awesome. When you say API, are those documented APIs or is this the proper user interface, the ccloud command-line command?
Yeah. These are ccloud command-line APIs that can be used on the cloud.
Nice.
And then we have the same features on the Confluent platform.
Oh, very nice. Okay. So we can do it in both places. That's pretty cool.
The second API I think we were talking about is failover, which says, hey, if I had a disaster on my source cluster, I may not have replicated the full set of data, but I want to fail over. So basically, remember that we're replicating the data as well as the consumer offsets, so there is logic to ensure that when you sync over your consumer groups from source to destination after the failover, they are able to start consistently in terms of the data and offsets available on the destination. That's the second API we provide for disaster recovery.
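Putting the two APIs side by side, the calls look roughly like this. The topic and link names are placeholders, and the exact command and flag spelling has varied between CLI releases, so this is a sketch of the shape of the operations rather than authoritative syntax.

```bash
# Planned migration: let the mirror fully catch up, then make it writable
# on the destination and move producers and consumer groups over.
ccloud kafka mirror promote clicks --link seattle-link

# Disaster recovery: stop mirroring immediately and make the topic writable,
# accepting that the last unreplicated messages from the source may be missing.
ccloud kafka mirror failover clicks --link seattle-link
```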
Excellent. Since you were working on implementing this, what was one of the hardest things about it that surprised you when you were in the process of building it, if you could talk about that?
Yeah, I think there were plenty. I will maybe highlight a couple of them. So I think global replication within Kafka is absolutely a new concept, in the sense that these clusters can be really far away, which kind of differentiates this from any other known solution. For example, in MRC, in the stretched cluster case, you don't want these clusters to be really far away, because of consistency and ZooKeeper and how your architecture is. With cluster linking, these clusters can actually be, as I mentioned, Seattle and Tokyo, and you can enable different kinds of scenarios with cluster linking. So we had to make sure that we provide those guarantees and that distant clusters work well with this cluster linking technology.
The other thing, I think, is the whole question of how the data and the configs replicate, and how we provide that consistency across these tenants, in one sense, right? Making sure that consumers are able to start correctly on the other side after failover was something that required deep analysis. Performance was another area, where we wanted to make sure that cluster linking does not affect the performance of a regular cluster, implementing the right safety nets so that local produce traffic is not impacted by the cluster linking global replication traffic.
My guest today has been Nikhil Bhatia. Nikhil, thanks for being a part of Streaming Audio.
Thank you so much, Tim. A pleasure talking to you about this technology. I really encourage everybody to come and try out cluster linking in the cloud. It's available. It's early access. Let us know how we can help.
And there you have it. Thanks for listening to this episode. Now, some important details before you go. Streaming Audio is brought to you by Confluent Developer. That's developer.confluent.io, a website dedicated to helping you learn Kafka, Confluent, and everything in the broader event streaming ecosystem. We've got free video courses, a library of event-driven architecture design patterns, and executable tutorials covering ksqlDB, Kafka Streams, and the core Kafka APIs. There's even an index of episodes of this podcast. If you take a course on Confluent Developer, you'll have the chance to use Confluent Cloud. When you sign up, use the code PODCAST100 to get an extra $100 of free Confluent Cloud usage.
Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on the YouTube video if you're watching and not just listening, or reach out in our community Slack or forum. Both are linked in the show notes. While you're at it, please subscribe to our YouTube channel and to this podcast, wherever fine podcasts are sold. If you subscribe through Apple Podcast, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So, thanks for your support and we'll see you next time.
Note: This episode was recorded when Cluster Linking was in preview mode. It’s now generally available as part of the Confluent Q3 ‘21 release on August 17, 2021.
Infrastructure needs to react in real time to support globally distributed events, such as cloud migration, IoT, edge data collection, and disaster recovery. To provide a seamless yet cloud-native, cross-cluster topic replication experience, Nikhil Bhatia (Principal Engineer I, Product Infrastructure, Confluent) and the team engineered a solution called Cluster Linking. Available on Confluent Cloud, Cluster Linking is an API that enables Apache Kafka® to work across multi-datacenters, making it possible to design globally available distributed systems.
As industries adopt multi-cloud usage and depart from on-premises and single-cluster operations, we need to rethink how clusters operate across regions in the cloud. Cluster Linking is built into Confluent Server as an inter-cluster replication layer, allowing you to connect clusters together and replicate topics asynchronously without the need for Connect.
Cluster Linking requires zero external components when moving messages from one cluster to another. It replicates data to its destination partition for partition and byte for byte, preserving offsets from the source cluster. Unlike Confluent Replicator and MirrorMaker 2, Cluster Linking simplifies failover in high availability and disaster recovery scenarios, improving overall efficiency by avoiding recompression. As a cost-effective alternative to Multi-Region Clusters, Cluster Linking reduces traffic between data centers and enables inter-cluster replication without the need to deploy and manage a separate Connect cluster.
With low recovery point objective (RPO) and recovery time objective (RTO), Cluster Linking enables scenarios such as:
- Global reads across regions
- Disaster recovery and failover
- Migration of workloads to the cloud
- Collecting data from IoT and edge clusters
Find out more about Cluster Linking architecture and set your data in motion with global Kafka.