Course: Hybrid and Multicloud Architecture with Apache Kafka

Confluent Replicator for Hybrid Clouds

4 min

Dan Weston

Senior Curriculum Developer

Confluent Replicator allows you to replicate topics from one Kafka cluster to another. This module walks you through the features of using Replicator for replication.




Confluent Replicator allows you to replicate topics from one Kafka cluster to another.

In addition to copying the messages, Replicator will create topics as needed, preserving the topic configuration from the source cluster.

This includes preserving the number of partitions, the replication factor, and any configuration overrides specified for individual topics.

Replicator is built on the Kafka Connect framework and is used to replicate topic data.

Confluent Replicator supports the following features:

Topic selection using whitelists, blacklists, and regular expressions.

Dynamic topic creation in the destination cluster with matching partition counts, replication factors, and topic configuration overrides.

Automatic resizing of topics when new partitions are added in the source cluster.

Automatic reconfiguration of topics when topic configuration changes in the source cluster.

Timestamp preservation.

At-least-once delivery, meaning Replicator guarantees that each record will be delivered to the destination topic, although there might be duplicates on the destination topic.

Natively exposed JMX metrics for monitoring, including replication lag and throughput.

Any changes you need to make to Replicator can be done via its RESTful API. A minimal configuration sketch covering several of these options follows below.
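To make the features above concrete, here is a rough sketch of a Replicator source connector configuration in properties format. The property names follow the Replicator configuration reference, but the broker addresses and topic names are placeholders, and only a small subset of options is shown; check the current documentation before using it.

# Sketch of a Replicator source connector configuration (placeholder values)
name=replicator-source
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
# Copy records as raw bytes so payloads pass through unchanged
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
# Origin and destination clusters (placeholders)
src.kafka.bootstrap.servers=origin-broker:9092
dest.kafka.bootstrap.servers=destination-broker:9092
# Select topics with a whitelist or a regular expression
topic.whitelist=orders,payments
# topic.regex=inventory.*
# Create destination topics and keep their configuration in sync with the source
topic.auto.create=true
topic.preserve.partitions=true
topic.config.sync=true

A configuration like this can be submitted to a Connect worker (or passed to the Replicator executable), and updated later through the Connect RESTful API mentioned above.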

Replicator connects to the origin cluster and replicates the data by way of the Connect framework.

While it can be deployed either near the destination cluster or near the origin, we recommend deploying near the destination cluster when possible.

For cloud deployments, such as Confluent Cloud, we recommend deploying Replicator in the same region as your destination cluster.

The only time you’d need to deploy near the origin is if the origin cluster does not support external connections.

Similar to what we saw with MirrorMaker 2, each source cluster requires a separate instance of Replicator, with each instance being self-managed.

With only a couple of instances this isn't an issue; however, once you start to scale your clusters, it quickly becomes difficult to manage each one.

There are a couple of things you should keep in mind before selecting Replicator:

Replicator only supports consumer offset sync if you are using a Java client; see the sketch after these considerations.

Our suggestion is to use Cluster Linking instead, or to start consumers from the latest or earliest offset, or rewind them to a known good point in the log.

Replicator also does not support syncing access control lists (ACLs).
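As an illustration of the Java-client limitation, consumer offset translation relies on Replicator's timestamp interceptor being added to the consumers that read from the origin cluster. The sketch below shows the relevant consumer properties; the interceptor class name is taken from the Replicator documentation, while the broker address, group ID, and other values are placeholders.

# Sketch of origin-cluster consumer properties for offset translation (Java client)
bootstrap.servers=origin-broker:9092
group.id=orders-app
# Replicator's interceptor records consumer timestamps so that offsets can be
# translated on the destination cluster after a switchover
interceptor.classes=io.confluent.connect.replicator.offsets.ConsumerTimestampsInterceptor

Because this interceptor only exists for the Java client, consumers written in other languages cannot have their offsets translated this way, which is why the alternatives above are suggested.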

Confluent Replicator has been around the longest of the three options and therefore has the most robust documentation.

If you are considering using Replicator to replicate your data, I highly recommend checking out the numerous blog posts, documentation, and tutorials.

In a lot of ways, MirrorMaker 2 and Confluent Replicator are very similar.

One of the main differences is that by going with Replicator you receive support from Confluent, as well as numerous publications to help you manage your architecture.

While we haven’t included a hands-on module for Confluent Replicator in this course, be sure to check out the Replicator documentation where you'll find multiple hands-on examples for configuring and maintaining Confluent Replicator.

In the end, Replicator is a battle-tested solution for replicating data, though it requires some engineering effort to dot the i's and cross the t's.

However, as we saw earlier in the course, Cluster Linking is by far the most robust method for most people looking to replicate their data, primarily because of how easy it is to set up, monitor, and scale.

Next, we’ll take a look at some of the primary use cases for replication and how each solution handles them.

If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course, the hands-on exercises, and additional resources.