Welcome back to another Kafka internals exercise. In this exercise, we’ll explore Cluster Linking and see how to mirror a topic from a source cluster to a destination cluster.
To follow along, you’ll first need to complete a few simple steps to set up the environment used for this exercise.
If you don’t already have a Confluent Cloud account, you can create one and receive $400 of free Confluent Cloud usage. This covers the time needed to actively work through the exercises in the course, but make sure to delete your cluster when you are finished with it or won’t be using it for a day or so.
Let’s start with creating the cluster that will be used during the exercise:
NOTE: If the Get started with Confluent Cloud tutorial appears in the right side of the console, click Leave tutorial.
Now we will create the cluster that will be the source part of the Cluster Link.
Now that we have our source cluster, let’s create the topic that will be mirrored from the source cluster to the destination cluster.
The next thing to do is write events to the topic using the Datagen connector.
Expand Data Integration and click Connectors.
Enter Datagen in the filter field.
Click the Datagen Source connector.
Configure the connector.
We only need to produce a small number of events for this exercise.
We now have data in link-topic that we want to mirror to the destination cluster. We also need to create a client configuration file for the source cluster, which the confluent kafka link create command will need during the exercise.
Since we are using a Java consumer, we need to click the corresponding Java client type.
Here we see the client connection configs which we can copy using the provided button. As you can see, one of the parameters needs to be updated so that it includes a valid cluster API key and secret value. We can create these using the option that is conveniently available on this page.
As you can now see, clicking this option opens a new window containing the new API key and secret.
After creating the API key and secret, notice the client configuration file has been updated to include these values. We can now copy this config file and use it to create it on our Java client machine.
We will now create the local client configuration file containing the Confluent Cloud connection settings.
The sample client configuration file includes properties that are needed if the client is using Schema Registry as well as a couple of other properties that are needed by earlier versions of the Java client.
We won’t be using these properties so we will delete them.
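The trimmed-down file can be sketched in the shell as follows. This is a minimal sketch, assuming the file is named source.config (the name passed to the link command later in this exercise) and that these four properties are what remains once the Schema Registry and older-client properties are deleted. The angle-bracket values are placeholders, not real credentials.

```shell
# Write the source cluster's connection settings to source.config.
# Assumption: the angle-bracket values are placeholders that you replace
# with your own endpoint, API key, and API secret before using the file.
cat > source.config <<'EOF'
bootstrap.servers=<source-cluster endpoint>
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<API key>' password='<API secret>';
EOF

# The file holds a secret, so make it readable only by the current user.
chmod 600 source.config
```

Copying the generated configuration from the Confluent Cloud console and deleting the unused properties by hand gives the same result; the heredoc is only a convenience.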
Next, we will create the cluster that will be the destination part of our cluster link.
NOTE: When the destination is a Confluent Cloud cluster, it must be the dedicated cluster type.
We will use the confluent CLI during the exercise, so it needs to be installed on the machine where you plan to run the exercise.
In a browser, navigate to the Confluent CLI documentation and follow the installation instructions: https://docs.confluent.io/confluent-cli/current/install.html
This concludes the exercise setup.
In this exercise, we’ll be using the Confluent CLI to complete many of the steps. To get started with the CLI, we’ll first log into Confluent Cloud. Do this with the --save flag to store your credentials so that subsequent commands can reuse them.
confluent login --save
Enter email and password
These credentials will stay active until you run the logout command.
Alright, now let’s check out the details for our source and destination clusters. First, we’ll list out the environments available to us and use the environment ID to list out the clusters.
confluent environment list
confluent kafka cluster list --environment <environment ID>
Run the describe command to get more details for the source cluster that we made.
confluent kafka cluster describe <source-cluster ID>
It’s a basic cluster located in a Google Cloud Platform west region. Make note of the source cluster endpoint—we’ll need it in order to create the cluster link. Note that your provider and region may vary, based on selections made when creating your cluster.
Now let’s run the describe command on the destination cluster.
confluent kafka cluster describe <dest-cluster ID>
Note that it’s a dedicated cluster located in a Google Cloud Platform east region. As previously mentioned, if the cluster link destination cluster is located in Confluent Cloud, it must be a dedicated cluster.
Alright, so we now have the Confluent Cloud environment ID, as well as the cluster IDs, and the source cluster endpoint. For the linking step, we’ll also need the API key and secret from the source cluster—this can either be specified as part of the command or in a configuration file. Since we’ve already set up a configuration file with this key and secret in it, we’ll use that. If you didn’t already set up the configuration file, review the exercise setup steps before continuing.
We’ll do one more check to make sure this configuration file exists and contains all of the details we need.
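One way to run this check from the terminal is sketched below. It is a minimal sketch rather than part of the original exercise: check_config is a hypothetical helper name, and the list of required properties assumes the file holds only the four standard Confluent Cloud connection settings.

```shell
# check_config FILE (hypothetical helper): succeed only if FILE exists and
# defines every connection property listed below. Problems go to stderr.
check_config() {
  file="$1"
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }
  status=0
  for key in bootstrap.servers security.protocol sasl.mechanism sasl.jaas.config; do
    # Split each line on "=" and look for an exact match on the property name.
    if ! awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file"; then
      echo "missing property: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
# check_config source.config && echo "source.config looks complete"
```

The helper only confirms that the property names are present. It does not verify that the API key and secret are valid, so a quick visual check of the values is still worthwhile.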
Everything is there, so we’re ready to link the two clusters!
confluent kafka link create my-cluster-link \
  --source-cluster-id <source-cluster ID> \
  --source-bootstrap-server <source-cluster endpoint> \
  --config-file source.config \
  --environment <environment ID> \
  --cluster <dest-cluster ID>
With that one command, the clusters are linked. It was that easy! We can now create a mirror topic in the destination cluster based on the topic that we want to link from the source cluster. Again this is a simple command.
confluent kafka mirror create link-topic \
  --link my-cluster-link \
  --environment <environment ID> \
  --cluster <dest-cluster ID>
Now that the mirror topic has been created in the destination cluster, all data that is written to the topic in the source cluster will also be written at the same offset in the mirrored topic within the destination cluster.
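Before switching to the console, the link and mirror topic can also be inspected from the terminal. The following is a minimal sketch, not part of the original exercise: show_mirror_status is a hypothetical wrapper name, and it assumes your installed CLI version provides the confluent kafka link list and confluent kafka mirror describe commands with these flags.

```shell
# show_mirror_status ENV_ID DEST_CLUSTER_ID LINK_NAME TOPIC (hypothetical helper)
# Lists the cluster links on the destination cluster and, if that succeeds,
# describes the mirror topic that was created on the link.
show_mirror_status() {
  confluent kafka link list \
    --environment "$1" \
    --cluster "$2" &&
  confluent kafka mirror describe "$4" \
    --link "$3" \
    --environment "$1" \
    --cluster "$2"
}

# Usage (replace the placeholders with your own IDs):
# show_mirror_status <environment ID> <dest-cluster ID> my-cluster-link link-topic
```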
Let’s use the Confluent Cloud console to verify that this is the case.
As we navigate to the overview page, notice that the source topic shows production and consumption metrics—this is expected. But when we check out the overview for the mirrored topic, it will appear slightly different.
Let’s find the most recent event written to partition 1 in the topic in the source cluster.
See here that the last event in partition 1 is assigned offset 127 and has a transaction.id of 436.
With Cluster Linking, the same event in the mirrored topic should be exactly duplicated, meaning that the partition and offset should be the same across the linked topics. This is the Cluster Linking guarantee.
When we check the overview page of the mirrored topic, we’ll see “mirroring” instead of “production.” This is because the mirrored topic is read-only. Events are only being written to it by the Cluster Linking mirror process.
Let’s check the most recent event written to partition 1 of this topic.
Again, you’ll see that the last event has offset 127 and transaction.id 436. This is exactly what we saw in the source topic, which confirms the guarantee given by Confluent Cluster Linking.
When events are mirrored from a topic in a source cluster to a topic in the destination cluster, the result is an exact copy of the event. We’ll see the event written to the same partition in the mirror topic, and it will be assigned the same offset that it had in the source topic.
In this exercise, we saw just how simple it is to set up Cluster Linking.
If you run this exercise on your own, you should tear down the exercise environment by deleting the source and destination clusters. This prevents them from unnecessarily accruing cost and exhausting your promotional credit.
Let’s walk through that teardown process for this exercise environment.
In the terminal window, use the environment ID to list the clusters in the environment and their IDs:
confluent kafka cluster list \
  --environment <environment ID>
You should see the two clusters that we used for this exercise—both source and destination. Let’s delete these one by one.
Delete the source cluster:
confluent kafka cluster delete <source-cluster ID> \
  --environment <environment ID>
Delete the destination cluster:
confluent kafka cluster delete <dest-cluster ID> \
  --environment <environment ID>
As a final check, confirm that the clusters no longer exist in the environment:
confluent kafka cluster list \
  --environment <environment ID>
No clusters are listed, so the environment teardown is complete.
Confluent Cluster Linking is an awesome feature that allows you to mirror topics from source clusters to linked destination clusters. This functionality opens doors for uses such as cross-region cluster replication, hybrid cluster situations, and cluster migration.
We hope you enjoyed following along in these Kafka internals modules and exercises. We’ve learned so much together, and we can’t wait to see what you build!