Running Kafka Connect

5 min

Danica Fine

Senior Developer Advocate (Presenter)

Connectors

When you run Kafka Connect, instances of connector plugins provide the integration between external data systems and the Kafka Connect framework. These plugins are reusable components that define how a source connector captures data from a source system and writes it to a Kafka topic, and how a sink connector copies data from Kafka topics into a target system in a form that system recognizes. Because the plugins take care of this boilerplate logic for you, you can hit the ground running with Kafka Connect and focus on your data.
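To make the source/sink split concrete, here is a minimal sketch that builds configurations for the FileStream example connectors that ship with Apache Kafka (they are intended for demos rather than production). The connector names, file paths, and topic name are hypothetical placeholders chosen for illustration; the lesson itself does not prescribe any particular connector.

```python
import json


def file_source_config(name, path, topic):
    """Source connector: reads lines from a file and writes them to one topic."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,    # hypothetical input file
            "topic": topic,  # a source connector writes to a topic
        },
    }


def file_sink_config(name, path, topics):
    """Sink connector: reads from one or more topics and appends to a file."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
            "tasks.max": "1",
            "file": path,                 # hypothetical output file
            "topics": ",".join(topics),   # a sink connector reads from topics
        },
    }


# Hypothetical names and paths, for illustration only.
source = file_source_config("demo-file-source", "/tmp/input.txt", "demo-lines")
sink = file_sink_config("demo-file-sink", "/tmp/output.txt", ["demo-lines"])

print(json.dumps(source, indent=2))
print(json.dumps(sink, indent=2))
```

The only structural difference between the two is the direction of data flow: the source names the topic it writes to, while the sink lists the topics it reads from.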

There are hundreds of connector plugins available for a variety of data sources and sinks. There are dozens of fully managed connectors available for you to run entirely through Confluent Cloud. Plus, connectors can also be downloaded from Confluent Hub for use with self-managed Kafka Connect.

Let’s dive a little bit more into the fully managed and self-managed connectors and what those mean to you.

Confluent Cloud Managed Connectors

Confluent Cloud offers pre-built, fully managed Apache Kafka connectors that make it easy to connect to popular data sources and sinks. With a simple UI-based configuration and elastic scaling with no infrastructure to manage, Confluent Cloud connectors make moving data in and out of Kafka an effortless task, giving you more time to focus on application development.

To start, you simply select the connector and fill in a few configuration details about your source or target system. This can be done using the Confluent Cloud console, the Confluent CLI, or the Confluent Connect API.
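As a rough idea of what "a few configuration details" can look like, the sketch below assembles a JSON configuration for a fully managed Datagen source connector. The property names are modeled on the Datagen Source connector for Confluent Cloud and should be treated as assumptions to verify against the current connector documentation; the connector name, topic, and credential placeholders are hypothetical.

```python
import json


def datagen_connector_config(name, topic, api_key, api_secret):
    """Build a configuration document for a managed Datagen source connector.

    Property names are assumptions modeled on the Datagen Source connector;
    check the connector's documentation before relying on them.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "DatagenSource",
            "name": name,
            "kafka.auth.mode": "KAFKA_API_KEY",
            "kafka.api.key": api_key,
            "kafka.api.secret": api_secret,
            "kafka.topic": topic,
            "output.data.format": "JSON",
            "quickstart": "ORDERS",
            "tasks.max": "1",
        },
    }


# Placeholder values only; never hard-code real credentials.
payload = datagen_connector_config(
    "demo-datagen-source", "orders", "<API_KEY>", "<API_SECRET>"
)
print(json.dumps(payload, indent=2))
```

The same settings are what you would otherwise fill in through the Confluent Cloud console form; a JSON document like this is simply the form suited to the CLI or the API.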

From there, Confluent takes care of the rest on your behalf:

Using the configuration settings you specified, your connector instance is provisioned and run
The execution of the connector instance is monitored
Should the connector fail, you’ll have access to troubleshooting to help identify the root cause, correct the issue, and restart the connector and its tasks

All in all, you can relax knowing that all of these tasks are being handled for you.

That said, there are a few limitations regarding managed connectors:

Some self-managed connectors that are available on Confluent Hub for installation in self-managed Kafka Connect clusters are not yet available in Confluent Cloud
Some fully managed Confluent Cloud connectors are not available for all cloud providers
Some configuration settings available for self-managed connectors may not be available for Confluent managed connectors
Some single message transformations (SMTs) that are available for use in self-managed Kafka Connect clusters are not available in Confluent Cloud

Be sure to keep those things in mind as you choose which connector options are best for you.

Self-Managed Kafka Connect

[Diagram: self-managed Kafka Connect]

So long as you have access to a Kafka cluster, Kafka Connect can also be run as a self-managed Kafka Connect cluster, but as you can see from the diagram, there is a lot more involved with doing so:

Self-managed Kafka Connect consists of one or more Connect clusters, depending on your requirements
Each cluster consists of one or more Connect worker machines on which the individual connector instances run
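In distributed mode, workers that share the same `group.id` and the same internal storage topics form one Connect cluster. The sketch below renders a minimal worker properties file to illustrate this; the broker address, group name, topic names, and plugin path are hypothetical, and a real deployment would typically set more options (replication factors, REST listeners, security).

```python
def worker_properties(bootstrap_servers, group_id, plugin_path):
    """Render a minimal distributed-mode worker configuration as text."""
    props = {
        "bootstrap.servers": bootstrap_servers,
        # Workers with the same group.id join the same Connect cluster.
        "group.id": group_id,
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Internal topics shared by every worker in the cluster
        # (hypothetical names derived from the group id).
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
        # Directory where connector plugins are installed.
        "plugin.path": plugin_path,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical values, for illustration only.
properties_text = worker_properties(
    "broker-1:9092", "connect-cluster-a", "/usr/share/java"
)
print(properties_text)
```

Starting a second worker with the same file on another machine would add it to the same cluster, while a different `group.id` (with its own storage topics) would create a separate one.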

Regardless of how you choose to run Kafka Connect, it’s helpful to understand the individual Kafka Connect components and how they work together.

Kafka Connect Workers

Ultimately, Kafka Connect workers are just JVM processes that you can deploy on bare metal or in containers.

A few options present themselves:

You’re free to run a bare-metal, on-premises install of Confluent Platform
For those leveraging infrastructure as a service, you may install Confluent Platform on those resources
Terraform is an option on a couple of cloud providers
And of course, there’s Docker which you can use for both on-prem and cloud-based installations

Managing a Kafka Connect Cluster

Once your Kafka Connect cluster is up and running, there’s a bit of management that needs to be done:

Connect workers have a number of default configuration settings that you may need to alter
Depending on the needs of your systems, you might need to scale the Connect cluster up or down to suit demand changes
And of course, you’ll be monitoring for problems and fixing those that occur
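For the monitoring part, a self-managed worker exposes the Kafka Connect REST API (port 8083 by default), where `GET /connectors/{name}/status` reports the state of a connector and its tasks. The sketch below takes a status document of that shape and works out which restart endpoints would need a `POST`. The sample status is made up for illustration, and the sketch deliberately stops short of making any HTTP calls.

```python
def restart_paths(status):
    """Return the REST paths to POST to in order to restart whatever has failed."""
    name = status["name"]
    paths = []
    # Restart the connector itself if it has failed.
    if status["connector"]["state"] == "FAILED":
        paths.append(f"/connectors/{name}/restart")
    # Restart each failed task individually.
    for task in status.get("tasks", []):
        if task["state"] == "FAILED":
            paths.append(f"/connectors/{name}/tasks/{task['id']}/restart")
    return paths


# Made-up status document in the shape returned by GET /connectors/{name}/status.
sample_status = {
    "name": "demo-file-sink",
    "connector": {"state": "RUNNING", "worker_id": "worker-1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "worker-1:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "worker-2:8083"},
    ],
    "type": "sink",
}

print(restart_paths(sample_status))
# prints ['/connectors/demo-file-sink/tasks/1/restart']
```

A check like this could run on a schedule, with the root cause taken from the `trace` field that the API includes for failed tasks before anything is restarted.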

Running Kafka Connect

Hi, Danica Fine here. Let's learn how to run Kafka Connect. When running Kafka Connect, instances of connector plugins provide the integration between external systems and the Kafka Connect framework.

Connector Plugins

These connector plugins are reusable components that define how source connectors ought to capture data from the data sources to a Kafka topic, and also how sink connectors should copy data from Kafka topics to be recognized by a target system. By taking care of all of this boilerplate logic for you, the plugins allow you to hit the ground running with Kafka Connect and really focus on your data.

There are hundreds of connector plugins available for a variety of data sources and sinks. There are dozens of fully managed connectors available for you to run entirely through Confluent Cloud. Plus, connectors can also be downloaded from Confluent Hub to use with self-managed Kafka Connect. Let's dive a little bit more into the fully managed and self-managed connectors, and what those mean to you.

Fully Managed Connectors

Confluent Cloud offers pre-built, fully managed Apache Kafka connectors that make it easy to instantly connect to popular data sources and sinks. With a simple UI-based configuration, and elastic scaling with no infrastructure to manage, Confluent Cloud connectors make moving data in and out of Kafka an effortless task, giving you more time to focus on application development.

To start, you simply select the connector and fill in a few configuration details about your source or target system. This can be done using the Confluent Cloud console, the Confluent CLI, or the Confluent Connect API. From there, Confluent takes care of the rest on your behalf. Using the configuration settings you specified, your connector instance is provisioned and run. The execution of the connector instance is then monitored. And in the event that the connector fails, you'll have access to troubleshooting to help identify the root cause, correct the issue, and then restart the connector and its tasks. All in all, you can relax knowing that all of these tasks are being handled for you.

That said, there are a few limitations regarding connectors. Some connectors that are available for installation in self-managed Kafka Connect clusters are not yet available in Confluent Cloud. Some fully managed Confluent Cloud connectors are not available for all cloud providers. Some configuration settings available for self-managed connectors may not be available for Confluent managed connectors. Some single message transformations, or SMTs, that are available for use in self-managed Kafka Connect clusters are not available in Confluent Cloud. Be sure to keep those things in mind as you choose which connector options are best for you.

Self Managed Connectors

So long as you have access to a Kafka cluster, Kafka Connect can also be run as a self-managed Kafka Connect cluster, but as you see from the diagram, there's a lot more involved with doing so. Self-managed Kafka Connect consists of one or more Connect clusters depending upon your requirements. Each cluster consists of one or more Connect worker machines on which the individual connector instances then run. Regardless of how you choose to run Kafka Connect, it's helpful to understand the individual Kafka Connect components and how they all work together.

Ultimately, Kafka Connect workers are just JVM processes that you can deploy on bare metal or in containers. A few options present themselves. You're free to run a bare-metal, on-premises install of Confluent Platform. For those leveraging infrastructure as a service, you may install Confluent Platform on those resources. Terraform is an option on a couple of cloud providers, and, of course, there's Docker, which you can use for both on-prem and cloud-based installations.

Once your Kafka Connect cluster is up and running, there's a bit of management that needs to be done, though. Connect workers have a number of default configuration settings that you might need to alter, depending on what your use case is. Also, depending on the needs of your systems, you might need to scale the Connect cluster up or down to suit demand changes. And, of course, you'll be monitoring for problems and fixing them.

Now, of course, there's a lot more to learn as you dive into Kafka Connect and practice, but between the self-managed and fully managed options, you should now have everything you need to decide how to run Kafka Connect.
