course: Apache Kafka® Internal Architecture

Hands On: Cluster Linking

6 min

Danica Fine

Senior Developer Advocate (Presenter)

Welcome back to another Kafka internals exercise. In this exercise, we’ll explore Cluster Linking and see how to mirror a topic from a source cluster to a destination cluster.

In order to follow along, you’ll need to complete a few simple steps to set up your environment.

Exercise Setup

Complete the following steps to set up the environment used for this exercise.

If you don’t already have a Confluent Cloud account, you can create one and receive $400 of free Confluent Cloud usage. This covers the time needed to actively finish the exercises in the course, but make sure to delete your clusters when you are finished with them or won’t be using them for a day or so.

Let’s start by creating the cluster that will be used during the exercise:

Open URL https://confluent.cloud and log in to the Confluent Cloud console.
Navigate to the default environment.

NOTE: If the Get started with Confluent Cloud tutorial appears in the right side of the console, click Leave tutorial.

Now we will create the cluster that will be the source part of the Cluster Link.

Create a basic cluster named source-cluster.

Now that we have our source cluster, let’s create the topic that will be mirrored from the source cluster to the destination cluster.

Click Topics.
Click Create topic.
Enter link-topic in the Topic name field.
Enter 3 in the Number of partitions field.
Click Create with defaults.

The next thing to do is write events to the topic using the Datagen connector.

Expand Data Integration and click Connectors.
Enter Datagen in the filter field.
Click the Datagen Source connector.
Configure the connector.
- Enter transactions in the Name field.
- Click Generate Kafka API key and secret.
- Enter Datagen Connector as the description.
- Confirm that the API key and secret are saved (you don’t actually need to save them for this exercise) and click Continue.
- Select link-topic from Topic name dropdown list.
- Select JSON from the Output message format dropdown list.
- Select the TRANSACTIONS template.
- Enter 1 in the Tasks field.
- Click Next.
Click Launch.

We only need to produce a small number of events for this exercise.

On the Connectors page, click the transactions connector.
Refresh the browser occasionally until the Messages quantity shows a value of several hundred.
Once it does, click the delete icon in the upper right corner.
Enter transactions in the confirmation field and click Confirm.

We now have data in link-topic that we want to mirror to the destination cluster. We also need to create a client configuration file for the source cluster. The confluent kafka link create command will need this file later in the exercise.

Click Cluster overview.
Click Configure a client.

The configuration file uses the Java client properties format, so we need to click the Java client type.

Click Java.

Here we see the client connection configuration, which we can copy using the provided button. One of the parameters, sasl.jaas.config, needs to be updated so that it includes a valid cluster API key and secret. We can create these using the option that is conveniently available on this page.

Click Create Kafka cluster API key.

As you can now see, clicking this option opens a new window containing the new API key and secret.

Name the key source-cluster.
Click the Continue button.

After the API key and secret are created, notice that the client configuration has been updated to include these values. We can now copy this configuration and use it to create the configuration file on our local client machine.

Click the Copy button.
Minimize the Confluent Cloud console.

We will now create the local client configuration file containing the Confluent Cloud connection settings.

Open a terminal window.
Run command nano source.config.
Right-click the nano edit window and click Paste.

The sample client configuration file includes properties that are needed if the client uses Schema Registry. It also includes a couple of other properties that are needed by earlier versions of the Java client.

We won’t be using these properties, so we will delete them.

Delete lines 6 through 15.
Save and close source.config.
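If you prefer not to paste into nano, you can write the same trimmed file with a shell heredoc. This is a minimal sketch. It assumes the lines that remain are the four connection properties from the console's Java snippet: bootstrap.servers, security.protocol, sasl.jaas.config, and sasl.mechanism. The values in angle brackets are placeholders, not real settings. Replace them with the endpoint, API key, and secret that the console shows.

```shell
# Sketch only: the <...> values are placeholders for your own endpoint, key, and secret.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside the file body.
cat > source.config <<'EOF'
bootstrap.servers=<BOOTSTRAP_HOST:PORT>
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<API_KEY>' password='<API_SECRET>';
sasl.mechanism=PLAIN
EOF

# Print the result so you can compare it with the console's version.
cat source.config
```

The file holds the API secret, so you may also want to restrict access to it with chmod 600 source.config.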

Next, we will create the cluster that will be the destination part of our cluster link.

Restore the Confluent Cloud console window.
Create a dedicated cluster named dest-cluster.

NOTE: When the destination is a Confluent Cloud cluster, it must be the dedicated cluster type.

We will use the Confluent CLI during the exercise, so it needs to be installed on the machine where you plan to run the exercise.

In a browser, navigate to the Confluent CLI documentation and follow the installation instructions: https://docs.confluent.io/confluent-cli/current/install.html

This concludes the exercise setup.

Exercise Steps

In this exercise, we’ll be using the Confluent CLI to complete many of the steps. To get started with the CLI, we’ll first log into Confluent Cloud. Do this with the --save flag in order to save some time later by storing your credentials and using them for subsequent commands.

Run command:
```
confluent login --save 
```
Enter your email and password when prompted.

These credentials will stay active until you run the logout command.

Alright, now let’s check out the details for our source and destination clusters. First, we’ll list out the environments available to us and use the environment ID to list out the clusters.

Run commands:

```
confluent environment list
confluent kafka cluster list --environment <environment ID>
```

Run the describe command to get more details for the source cluster that we made.

Run command:
```
confluent kafka cluster describe <source-cluster ID>
```
It’s a basic cluster located in a Google Cloud Platform west region. Make note of the source cluster endpoint—we’ll need it in order to create the cluster link. Note that your provider and region may vary, based on selections made when creating your cluster.

Now let’s run the describe command on the destination cluster.

Run command:

```
confluent kafka cluster describe <dest-cluster ID>
```

Note that it’s a dedicated cluster located in a Google Cloud Platform east region. As previously mentioned, if the cluster link destination cluster is located in Confluent Cloud, it must be a dedicated cluster.

Alright, so we now have the Confluent Cloud environment ID, as well as the cluster IDs, and the source cluster endpoint. For the linking step, we’ll also need the API key and secret from the source cluster—this can either be specified as part of the command or in a configuration file. Since we’ve already set up a configuration file with this key and secret in it, we’ll use that. If you didn’t already set up the configuration file, review the exercise setup steps before continuing.

We’ll do one more check to make sure this configuration file exists and contains all of the details we need.

Run command:
```
cat source.config
```
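You can also script this check instead of doing it by eye. The function below is a hypothetical helper, not a Confluent CLI command. It assumes the file should contain the four connection properties from the console's Java snippet. It also flags leftover placeholder text such as {{ CLUSTER_API_KEY }} or <API_KEY>.

```shell
# Hypothetical helper (not a Confluent CLI command): sanity-checks a client
# config file before it is passed to `confluent kafka link create --config-file`.
check_link_config() {
  local file="${1:-source.config}"
  local key

  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi

  # Assumed set of required connection properties (from the console's Java snippet).
  for key in bootstrap.servers security.protocol sasl.mechanism sasl.jaas.config; do
    if ! grep -q "^${key}=" "$file"; then
      echo "missing property in $file: $key" >&2
      return 1
    fi
  done

  # Reject unreplaced placeholders such as {{ CLUSTER_API_KEY }} or <API_KEY>.
  if grep -qE '[{][{]|<[A-Z_:]+>' "$file"; then
    echo "unreplaced placeholder in $file" >&2
    return 1
  fi

  echo "$file looks complete"
}
```

Run it as check_link_config source.config. A non-zero exit status means the file needs another look before you create the link.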

Everything is there, so we’re ready to link the two clusters!

Run command:

```
confluent kafka link create my-cluster-link \
    --source-cluster-id <source-cluster ID> \
    --source-bootstrap-server <source-cluster endpoint> \
    --config-file source.config \
    --environment <environment ID> \
    --cluster <dest-cluster ID>
```

With that one command, the clusters are linked. It was that easy! We can now create a mirror topic in the destination cluster based on the topic that we want to link from the source cluster. Again, this is a simple command.

Run command:

```
confluent kafka mirror create link-topic \
    --link my-cluster-link \
    --environment <environment ID> \
    --cluster <dest-cluster ID>
```
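If you expect to repeat the exercise, you can wrap the link and mirror commands in a small shell function. That way the IDs are typed once, and the mirror is created only if the link succeeds. This is a sketch: link_and_mirror is a hypothetical name. It reuses the link name, topic name, and flags exactly as they appear in this exercise.

```shell
# Hypothetical helper: create my-cluster-link, then mirror link-topic over it.
# Arguments: environment ID, source cluster ID, source endpoint, destination cluster ID.
link_and_mirror() {
  if [ "$#" -ne 4 ]; then
    echo "usage: link_and_mirror <env-id> <source-id> <source-endpoint> <dest-id>" >&2
    return 2
  fi
  local env_id="$1" src_id="$2" src_endpoint="$3" dest_id="$4"

  # The link is created on the destination cluster; stop here if it fails.
  confluent kafka link create my-cluster-link \
    --source-cluster-id "$src_id" \
    --source-bootstrap-server "$src_endpoint" \
    --config-file source.config \
    --environment "$env_id" \
    --cluster "$dest_id" || return 1

  # Create the read-only mirror topic that follows link-topic on the source.
  confluent kafka mirror create link-topic \
    --link my-cluster-link \
    --environment "$env_id" \
    --cluster "$dest_id"
}
```

Call it as link_and_mirror <environment ID> <source-cluster ID> <source-cluster endpoint> <dest-cluster ID>.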

Now that the mirror topic has been created in the destination cluster, data in the source topic is also written to the mirrored topic in the destination cluster. This includes the events that were already there. Each event lands in the same partition and at the same offset.
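Besides checking in the console, you can inspect the link and the mirror topic from the terminal. This sketch assumes your CLI version includes the confluent kafka link list and confluent kafka mirror describe subcommands; confirm with confluent kafka mirror --help. check_mirror is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: show the links on the destination cluster, then the
# details of the link-topic mirror created over my-cluster-link.
check_mirror() {
  if [ "$#" -ne 2 ]; then
    echo "usage: check_mirror <environment-id> <dest-cluster-id>" >&2
    return 2
  fi
  local env_id="$1" dest_id="$2"

  confluent kafka link list --environment "$env_id" --cluster "$dest_id" || return 1

  confluent kafka mirror describe link-topic \
    --link my-cluster-link \
    --environment "$env_id" \
    --cluster "$dest_id"
}
```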

Let’s use the Confluent Cloud console to verify that this is the case.

Open the Confluent Cloud console.
Navigate to the Cluster overview page for source-cluster.
Click Topics.
Click link-topic.

As we navigate to the overview page, notice that the source topic shows production and consumption metrics—this is expected. But when we check out the overview for the mirrored topic, it will appear slightly different.

Let’s find the most recent event written to partition 1 in the topic in the source cluster.

Click the Messages tab.
In the offset field, type -1 and, when presented with the 3 partitions, select -1 / Partition: 1.

See here that the last event in partition 1 is assigned offset 127 and has a transaction.id of 436. Your offset and transaction.id values will likely differ.

With Cluster Linking, the same event in the mirrored topic should be exactly duplicated, meaning that the partition and offset should be the same across the linked topics. This is the Cluster Linking guarantee.

Open a second instance of Confluent Cloud console in another browser window.
Navigate to the Cluster overview page for dest-cluster.
Click Topics.
Click link-topic.

When we check the overview page of the mirrored topic, we’ll see “mirroring” instead of “production.” This is because the mirrored topic is read-only. Events are only being written to it by the Cluster Linking mirror process.

Let’s check the most recent event written to partition 1 of this topic.

Click the Messages tab.
In the offset field, type -1 and, when presented with the 3 partitions, select -1 / Partition: 1.
Arrange the two console windows so that the event is visible in both.

Again, you’ll see that the last event has offset 127 and transaction.id 436. This is exactly what we saw in the source topic, meaning that we’ve confirmed the guarantee given by Confluent Cluster Linking.

When events are mirrored from a topic in a source cluster to a topic in the destination cluster, the result is an exact copy of the event. We’ll see the event written to the same partition in the mirror topic, and it will be assigned the same offset that it had in the source topic.

In this exercise, we saw just how simple it is to set up Cluster Linking.

Note

If you run this exercise on your own, you should tear down the exercise environment by deleting the source and destination clusters. This prevents them from unnecessarily accruing cost and exhausting your promotional credit.

Let’s walk through that tear down process for this exercise environment.

Using the environment ID, we’ll list the clusters available to us.

In the terminal window, list the clusters in the environment and their IDs:
```
confluent kafka cluster list \
    --environment <environment ID> 
```

You should see the two clusters that we used for this exercise—both source and destination. Let’s delete these one by one.

Delete the source cluster:

```
confluent kafka cluster delete <source-cluster ID> \
    --environment <environment ID>
```

Delete the destination cluster:

```
confluent kafka cluster delete <dest-cluster ID> \
    --environment <environment ID>
```

As a final check, confirm that the clusters no longer exist.

Confirm the clusters no longer exist in the environment:

```
confluent kafka cluster list \
    --environment <environment ID>
```

No clusters are listed so the environment tear down is complete.
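If you run the exercise more than once, you can script the teardown as well. This is a sketch with a hypothetical function name. It issues the same delete and list commands used in the teardown steps. Depending on your CLI version, each delete may ask you to confirm before it proceeds.

```shell
# Hypothetical helper: delete the source and destination clusters, then list
# whatever clusters remain in the environment as a final check.
teardown_clusters() {
  if [ "$#" -ne 3 ]; then
    echo "usage: teardown_clusters <environment-id> <source-cluster-id> <dest-cluster-id>" >&2
    return 2
  fi
  local env_id="$1" src_id="$2" dest_id="$3"
  local id

  for id in "$src_id" "$dest_id"; do
    # Stop at the first failed delete so the problem is easy to spot.
    confluent kafka cluster delete "$id" --environment "$env_id" || return 1
  done

  confluent kafka cluster list --environment "$env_id"
}
```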

Confluent Cluster Linking is an awesome feature that allows you to mirror topics from source clusters to linked destination clusters. This functionality opens doors for use cases such as cross-region cluster replication, hybrid cluster scenarios, and cluster migration.

We hope you enjoyed following along in these Kafka internals modules and exercises. We’ve learned so much together, and we can’t wait to see what you build!

Do you have questions or comments? Join us in the #confluent-developer community Slack channel to engage in discussions with the creators of this content.

Use the promo code INTERNALS101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Hands On: Cluster Linking

Welcome back to another Kafka internals exercise. Today, we'll be exploring Cluster Linking, and we'll see how to mirror a topic from a source cluster to a destination cluster. In order to follow along, there are a couple of prerequisites that you need to go through to make sure your environment is properly set up. The instructions are listed below this video in the writeup, so make sure you review them before we dive in.

In this exercise, we'll be using the Confluent CLI to complete many of the steps. To get started with the CLI, we'll first log in to Confluent Cloud. Do this with the save flag in order to save some time later by storing your credentials and using them for subsequent commands. These credentials will stay active until you run the logout command.

All right, now let's check out the details of our source and destination clusters. First, we'll list out the environments available to us and use the environment ID to list out the clusters. Run the describe command to get more details for the source cluster that we made. It's a basic cluster located in a Google Cloud Platform west region. Make note of the source cluster endpoint. We'll need it in order to create the cluster link. Now let's run the describe command on the destination cluster.

Note that it's a dedicated cluster located in a Google Cloud Platform east region. As previously mentioned, if the cluster link destination cluster is located in Confluent Cloud, it must be a dedicated cluster.

All right, so we now have the Confluent Cloud environment ID, the cluster IDs, and the source cluster endpoint. For the linking step, we'll also need the API key and secret from the source cluster. These can be specified either as part of the command or in a configuration file. Since we've already set up a configuration file with the key and secret in it, we'll use that.
If you didn't already set up the configuration file, review the environment preparation steps in the writeup below before continuing. We'll do one more check to make sure this configuration file exists and contains all of the details that we need. Everything is there, so we're ready to link the two clusters. With that one command, the clusters are linked. It was that easy. We can now create a mirror topic in the destination cluster based on the topic that we want to link from the source cluster. Again, this is a simple command.

Now that the mirror topic has been created in the destination cluster, all data that is written to the topic in the source cluster will also be written at the same offset in the mirrored topic within the destination cluster. Let's use the Confluent Cloud console to verify that this is the case. As we navigate to the Overview page, notice that the source topic shows production and consumption metrics. This is expected. But when we check out the overview for the mirrored topic, it will appear slightly different.

Let's find the most recent event written to Partition 1 in the topic in the source cluster. See here that the last event in Partition 1 is assigned offset 127 and has a transaction ID of 436. With Cluster Linking, the same event in the mirrored topic should be exactly duplicated, meaning that the partition and the offset should be the same across the linked topics. This is the Cluster Linking guarantee.

When we check the Overview page of the mirrored topic, we'll see Mirroring instead of Production. This is because the mirrored topic is read-only. Events are only being written to it by the Cluster Linking mirror process. Let's check the most recent event written to Partition 1 of this topic.

Again, you'll see that the last event has offset 127 and transaction ID 436. This is exactly what we saw in the source topic, meaning that we've confirmed the guarantee given by Confluent Cluster Linking.
When events are mirrored from a topic in a source cluster to a topic in the destination cluster, the result is an exact copy of the event. We'll see the event written to the same partition in the mirrored topic, with the same offset and transaction ID that it had in the source topic. In this exercise, we saw just how simple it is to set up Cluster Linking.

If you are following along and running this exercise on your own, you should tear down the exercise environment by deleting the source and destination clusters. This will prevent them from unnecessarily accruing cost and exhausting your promotional credit. Let's walk through that tear down process for this exercise environment. We'll list the clusters available to us.

You should see the two clusters that we used for this exercise, both source and destination. Let's delete these one by one. As a final check, confirm that the clusters no longer exist. No clusters are listed, so the environment tear down is complete.

Confluent Cluster Linking is an awesome feature that allows you to mirror topics from source clusters to linked destination clusters. This functionality opens so many doors for use cases such as cross-region cluster replication, hybrid cluster situations, and cluster migration. I hope you enjoyed following along in the Kafka internals modules and exercises. We've learned so much together, and I can't wait to see what you build.
