Senior Curriculum Developer
The newest entry to data replication is Cluster Linking.
For most use cases, Cluster Linking is the best solution for data replication, as it takes the lessons learned from years of working with MirrorMaker 2 and Confluent Replicator and improves on both.
Cluster Linking allows you to directly connect clusters and perfectly mirror topics, consumer offsets, and ACLs from one cluster to another.
Since the clusters are directly connected, offsets are preserved between clusters as the data is copied byte-for-byte.
Unlike Replicator and MirrorMaker 2, Cluster Linking does not require running Kafka Connect.
Messages on the source topics are mirrored precisely on the destination cluster, with the same partitions and offsets.
A mirror topic will not contain any duplicate records relative to its source topic.
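To make the offset guarantee concrete, here is a toy in-memory model in Python. It is not Cluster Linking itself, and the function names `mirror_partition` and `reproduce_partition` are hypothetical. It contrasts copying a partition log as-is with re-producing records through a client that retries a send, which is one way duplicates and shifted offsets can arise.

```python
# Toy model: a partition log is a list; a record's offset is its list index.

def mirror_partition(source_log):
    """Copy the log as-is, so every record keeps its original offset."""
    return list(source_log)

def reproduce_partition(source_log, retried_offsets):
    """Re-produce records one by one; a retried send appends the record twice."""
    destination = []
    for offset, record in enumerate(source_log):
        destination.append(record)
        if offset in retried_offsets:
            destination.append(record)  # duplicate caused by the retry
    return destination

source = [b"a", b"b", b"c", b"d"]
mirror = mirror_partition(source)
copied = reproduce_partition(source, retried_offsets={1})

committed_offset = 2  # hypothetical offset a consumer group committed on the source
print(mirror[committed_offset])  # same record as source[2]
print(copied[committed_offset])  # a different record, because offsets shifted
```

In the mirrored log, a committed offset from the source points at the same record, so no offset translation is needed. In the re-produced log, the single duplicate shifts every later record by one.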
Cluster Linking replicates topics from one Kafka or Confluent cluster to another, providing the following capabilities:
Global Replication, or the unifying of data and applications from regions and continents worldwide.
High Availability and Disaster Recovery, allowing you to build a multi-region disaster recovery strategy that achieves low recovery times and minimal data loss by replicating topic data and metadata to another cluster.
Cloud Migration and Modernization, for when you need to migrate from an older cluster to one in a newer environment, region, or cloud, or maybe you’d like to modernize your existing architecture.
And Data Sharing and Aggregation, where you're exchanging data between different teams, lines of business, and organizations, or combining data from smaller clusters into one larger one.
All of these capabilities work across your on-premises environments, whether that is Apache Kafka or Confluent Platform, and across Confluent Cloud, whether you’re using Amazon Web Services, Google Cloud, or Azure.
Compared to other Kafka replication options, Cluster Linking offers these advantages:
Cluster Linking is built into Confluent Server and Confluent Cloud, so it does not depend on additional components, connectors, virtual machines, or custom processes.
It creates exact mirrors of topics, including offsets, to enable migration and failover without worrying about offset translation or building custom tooling.
Cluster Linking can be dynamically updated via REST APIs, the command line interface, and Kubernetes custom resource definitions.
For compressed messages, byte-for-byte replication achieves faster throughput by avoiding decompression and recompression.
No duplicate records will appear in a mirror topic relative to what the source topic contains.
Cluster Linking is designed to be resilient in the face of failures such as broker outages, partition reassignment, or network issues.
This helps ensure that your data replication remains reliable even in the face of infrastructure flakiness.
For those who are concerned with security and are familiar with Confluent Replicator, Cluster Linking allows you to originate the cluster link from the source cluster.
This means the days of trying to achieve good performance with a PrivateLink or firewall rule are over.
Cluster Linking detects new topics and partitions, automatically syncs topic configurations between clusters, and manages your downstream topic ACLs.
From a monitoring perspective, Cluster Linking provides the number of cluster links, the number of mirror topics, throughput, and mirror lag as metrics.
Last, but not least, Cluster Linking also pairs well with Schema Linking, keeping all your data and schemas in sync across clusters.
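As a rough illustration of the mirror lag metric mentioned above, the sketch below assumes lag is counted per partition as the number of records on the source that have not yet reached the mirror. Because offsets are preserved, that is just a difference of log end offsets. The function `mirror_lag` and all offset values are hypothetical; the real metric is reported by Confluent, not computed by your code.

```python
def mirror_lag(source_end_offsets, mirror_end_offsets):
    """Return per-partition lag: records on the source not yet on the mirror."""
    return {
        partition: source_end - mirror_end_offsets.get(partition, 0)
        for partition, source_end in source_end_offsets.items()
    }

# Hypothetical log end offsets for a three-partition topic.
source_end_offsets = {0: 1200, 1: 950, 2: 1010}
mirror_end_offsets = {0: 1200, 1: 900, 2: 1000}

lag = mirror_lag(source_end_offsets, mirror_end_offsets)
max_lag = max(lag.values())
print(lag)      # {0: 0, 1: 50, 2: 10}
print(max_lag)  # 50
```

A maximum lag that keeps growing would suggest the link is not keeping up with the source, which is the kind of signal you would alert on.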
Let's take a look at how Cluster Linking makes it easy to create a bridge from Confluent Platform clusters in an on-premises environment to Confluent Cloud clusters in cloud environments.
There are two steps involved in cluster linking. First, a link is defined between a source and a destination cluster.
Then, a mirror topic will be created on the destination cluster and associated with the cluster link.
In one command or API call, you can create a cluster link from one cluster to another.
A cluster link acts as a persistent bridge between the two clusters.
To mirror data across the cluster link, you create mirror topics on your destination cluster.
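The two steps can be sketched as a pair of REST requests. The Python below only builds the requests and prints them; it sends nothing. The endpoint paths and field names are assumptions modeled on the Confluent Cloud Kafka REST API v3, so verify them against the current API reference before use. The base URL, cluster IDs, link name, and bootstrap address are all hypothetical placeholders, and authentication is left out.

```python
import json

def create_link_request(base_url, dest_cluster_id, link_name,
                        source_cluster_id, source_bootstrap):
    """Step 1: describe the request that defines a link on the destination cluster."""
    return {
        "method": "POST",
        # Assumed path; check the API reference.
        "url": f"{base_url}/kafka/v3/clusters/{dest_cluster_id}/links"
               f"?link_name={link_name}",
        "body": {
            "source_cluster_id": source_cluster_id,
            "configs": [{"name": "bootstrap.servers", "value": source_bootstrap}],
        },
    }

def create_mirror_request(base_url, dest_cluster_id, link_name, topic):
    """Step 2: describe the request that creates a mirror topic over that link."""
    return {
        "method": "POST",
        # Assumed path; check the API reference.
        "url": f"{base_url}/kafka/v3/clusters/{dest_cluster_id}/links"
               f"/{link_name}/mirrors",
        "body": {"source_topic_name": topic},
    }

# Hypothetical placeholder values.
base_url = "https://destination.example.com"
link_request = create_link_request(
    base_url, "lkc-dest", "onprem-to-cloud", "lkc-source", "source.example.com:9092"
)
mirror_request = create_mirror_request(
    base_url, "lkc-dest", "onprem-to-cloud", "credit_scores"
)

print(json.dumps(link_request, indent=2))
print(json.dumps(mirror_request, indent=2))
```

Note that both requests target the destination cluster: the link is defined there, and the mirror topic is created there and tied to the link by name.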
You can also create a cluster link from the Confluent Cloud UI, REST API, Confluent for Kubernetes, and Terraform.
Mirror topics are a special kind of topic: they are read-only copies of their source topic.
Any messages produced to the source topic are mirrored “byte-for-byte,” meaning that the same messages go to the same partition and the same offset on the mirror topic.
Mirror topics can be consumed just like any other topic.
You can create a secure, seamless hybrid data bridge between your on-premises Confluent Platform cluster and your Confluent Cloud cluster using Cluster Linking.
Cluster linking is by far the easiest architecture to configure, scale, and maintain.
While we covered the replication patterns in an earlier module, I wanted to show some advanced replication patterns and how Cluster Linking handles each.
The first one is chaining. Chaining is when you have a mirror topic acting as a source.
This allows you to “chain” multiple Cluster Links and mirror topics together.
In the example on the screen, we see the topic ‘credit_scores’ being mirrored to a centralized cluster, which in turn has a cluster link to an additional cluster.
The second one is spoke-and-hub. Spoke-and-hub allows you to mirror your topics to a centralized “hub” and then create additional cluster links out to other clusters, in this case, additional locations.
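Both patterns can be pictured as a small directed graph of cluster links. The Python sketch below is only a model for reasoning about topologies, not a Cluster Linking API; the function `mirror_paths` and the cluster names are hypothetical. It walks the links from an origin cluster and reports the chain of clusters a topic such as ‘credit_scores’ would pass through to reach each destination.

```python
from collections import deque

def mirror_paths(links, origin):
    """Breadth-first walk over cluster links, returning the path to each cluster."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        cluster = queue.popleft()
        for source, destination in links:
            if source == cluster and destination not in paths:
                paths[destination] = paths[cluster] + [destination]
                queue.append(destination)
    return paths

# Each tuple is one cluster link: (source cluster, destination cluster).
links = [
    ("origin", "hub"),      # the topic is first mirrored to the hub
    ("hub", "location-a"),  # the hub's mirror topic acts as a source (chaining)
    ("hub", "location-b"),  # a second link out of the hub
]

paths = mirror_paths(links, "origin")
for cluster, path in paths.items():
    print(cluster, "<-", " -> ".join(path))
```

Any path longer than two clusters is a chain, and a cluster with several outgoing links plays the role of the hub.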
I recommend checking out the official documentation page for cluster linking for information, use cases, and additional tutorials on cluster linking.
We’ll take a look at Cluster Linking in a hands-on exercise later in this course.
I also recommend you take a look at the other hands-on sections and compare each option.
You’ll also want to check out the documentation and many blog posts detailing key use cases.
If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course, the hands-on exercises, and additional resources.