Running Kafka Connect in Docker

3 min

Danica Fine

Senior Developer Advocate (Presenter)

Kafka Connect Images on Docker Hub

You can run a Kafka Connect worker directly as a JVM process on a virtual machine or bare metal, but you might prefer the convenience of running it in a container, using a technology like Kubernetes or Docker. Note that containerized Connect via Docker will be used for many of the examples in this series.

Confluent maintains its own image for Kafka Connect, cp-kafka-connect-base, which provides a basic Connect worker to which you can add your desired JAR files for sink and source connectors, single message transforms, and converters.

Adding Connectors to a Container

You can use Confluent Hub to add your desired JARs, either by installing them at runtime or by creating a new Docker image. Each option has trade-offs, so choose based on your individual needs.

Add Connectors at Runtime

Adding your dependencies at runtime means that you don’t have to create a new image, but it does increase installation time each time your container is run, and it also requires an internet connection. It’s a good option for prototyping work but probably not for a production deployment.

Your JARs should be in a location from which the Connect process can load them, and you'll need to set an environment variable that identifies that location (in production you will likely have many more environment variables than just this one). Also make sure to specify the correct version of the Connect base image. Finally, add a command that overrides the base image's default command so that you can call the Confluent Hub utility, which installs the specified connectors (in this case, the Neo4j connector) before starting the worker.

kafka-connect:
  image: confluentinc/cp-kafka-connect:7.1.0-1-ubi8
  environment:
    CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components

  command:
    - bash
    - -c
    - |
      confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:2.0.2
      /etc/confluent/docker/run
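
One way to confirm that a runtime install worked is to ask the worker which plugins it has loaded. The sketch below assumes the Connect REST API is reachable on localhost:8083 and relies on its GET /connector-plugins endpoint, which returns a JSON array of objects with a "class" field. The helper name plugin_is_installed is hypothetical, and the plain-text matching is a shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash

# plugin_is_installed (hypothetical helper): succeeds if the given connector
# class appears in the JSON read from stdin. The JSON is expected to be the
# response of the Connect REST API's GET /connector-plugins endpoint.
plugin_is_installed() {
  local connector_class="$1"
  # Strip whitespace so compact and pretty-printed JSON match the same way,
  # then look for the exact "class":"<name>" pair as a fixed string.
  tr -d '[:space:]' | grep -qF "\"class\":\"${connector_class}\""
}

# Example usage against a running worker (assumed to listen on localhost:8083).
# Replace <connector-class> with the class named in your connector's docs:
#   curl -s http://localhost:8083/connector-plugins \
#     | plugin_is_installed "<connector-class>" && echo "plugin found"
```

Reading the JSON from stdin keeps the check separate from the HTTP call, so the same function works whether the response comes from curl or from a saved file.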

Build a New Container Image

The second way to add dependencies, and the option probably most often used in production deployments, is to build a new image.

Make sure to use the correct Confluent base image version and also check the specific documentation for each of your connectors.

FROM confluentinc/cp-kafka-connect:7.1.0-1-ubi8

ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"

RUN confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:2.0.2
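
If you need several connectors in one image, the same pattern repeats with one install step per component. As a rough sketch, the shell function below prints such a Dockerfile for a given base image and list of Confluent Hub component IDs. The function name render_dockerfile and the image tag in the usage comment are hypothetical; the base image and plugin path are the ones used in this lesson.

```shell
#!/usr/bin/env bash

# render_dockerfile (hypothetical helper): print a Dockerfile that installs
# each given Confluent Hub component on top of the given base image.
#   $1      base image, e.g. confluentinc/cp-kafka-connect:7.1.0-1-ubi8
#   $2...   Confluent Hub component IDs, e.g. neo4j/kafka-connect-neo4j:2.0.2
render_dockerfile() {
  local base_image="$1"
  shift
  printf 'FROM %s\n\n' "$base_image"
  printf 'ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"\n\n'
  local component
  for component in "$@"; do
    # One RUN line per component, matching the single-connector example.
    printf 'RUN confluent-hub install --no-prompt %s\n' "$component"
  done
}

# Example usage (the tag my-connect-neo4j is an arbitrary, assumed name):
#   render_dockerfile confluentinc/cp-kafka-connect:7.1.0-1-ubi8 \
#     neo4j/kafka-connect-neo4j:2.0.2 > Dockerfile
#   docker build -t my-connect-neo4j .
```

Because the connectors are baked in at build time, containers started from the resulting image need no install step and no internet access at startup.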

Add Connector Instance at Container Launch

Typically, you will add connector instances once the worker process is running, either by manually submitting the configuration or through external automation. However, you may find, perhaps for demo purposes, that you want a self-sufficient container that also adds the connector instance when it starts. To do this, you can use a launch script that looks like this:

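# Note: $$ is Docker Compose's escape for a literal $, for use when this
# script is embedded in a Compose file. Use a single $ in a standalone script.
#
# Launch Kafka Connect in the background
/etc/confluent/docker/run &
#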
# Wait for Kafka Connect listener
echo "Waiting for Kafka Connect to start listening on localhost ⏳"
while : ; do
  curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors)
  echo -e $$(date) " Kafka Connect listener HTTP state: " $$curl_status " (waiting for 200)"
  if [ $$curl_status -eq 200 ] ; then
    break
  fi
  sleep 5 
done

echo -e "\n--\n+> Creating Data Generator source"
curl -s -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/source-datagen-01/config \
    -d '{
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "kafka.topic": "ratings",
    "max.interval":750,
    "quickstart": "ratings",
    "tasks.max": 1
}'
sleep infinity
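
After the PUT request, it can be useful to check that the connector actually started. The sketch below assumes the same worker address and connector name as the script above and uses the Connect REST API's GET /connectors/{name}/status endpoint, whose response includes a "connector" object with a "state" field. The helper name connector_state is hypothetical, and the sed-based extraction is a simplification that stands in for a proper JSON parser.

```shell
#!/usr/bin/env bash

# connector_state (hypothetical helper): read the JSON response of
# GET /connectors/<name>/status from stdin and print the connector-level
# state, for example RUNNING, PAUSED, or FAILED.
connector_state() {
  # Strip whitespace so formatting does not matter, then capture the value
  # of "state" inside the top-level "connector" object.
  tr -d '[:space:]' \
    | sed -n 's/.*"connector":{[^}]*"state":"\([A-Z_]*\)".*/\1/p'
}

# Example usage against the worker and connector from the launch script:
#   state=$(curl -s http://localhost:8083/connectors/source-datagen-01/status \
#     | connector_state)
#   echo "source-datagen-01 is ${state}"
```

Note that this reports only the connector-level state. Each entry in the response's "tasks" array carries its own state, so a connector can report RUNNING while one of its tasks has failed.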


Running Kafka Connect in Docker

Hi, Danica Fine here. Kafka Connect runs as a JVM process, so there are a ton of options for how and where we can run it, but let's see how to run Kafka Connect in Docker.

You can run a Kafka Connect worker directly as a JVM process on a virtual machine or bare metal, but you might prefer the convenience of running it in a container using a technology like Kubernetes or Docker. Note that containerized Connect via Docker will be used for many of the examples in this series. Confluent maintains its own image for Kafka Connect, cp-kafka-connect, which provides a basic Connect worker to which you can add your desired JAR files for sink and source connectors, single message transforms, and converters.

You can use Confluent Hub to add your desired JARs, either by installing them at runtime or by creating a new Docker image. Of course, there are pros and cons to either of these options, and you should choose based on your individual needs.

Adding your dependencies at runtime means that you don't have to create a new image, but it does increase installation time each time your container is run, and it also requires an internet connection. It's a good option for prototyping work but probably not for a production deployment. Your JARs should be in a location that causes them to be class loadable by the Connect process, and you'll need to add an environmental variable that identifies their location. Note that in production, you'll likely have many more environmental variables than just this one. Also, make sure to specify the correct version of the Connect base image that you need. And finally, you should add a command that overrides the base image's default command so that you can call the Confluent Hub utility, which will then install the connectors that you specified. In this case, we're using the Neo4j connector.

The second way to add dependencies, and the option probably most often used in production deployments, is to build a new image. Make sure to use the correct Confluent base image version and also check the specific documentation for each of your connectors.

Typically, you'll add connector instances once the worker process is running by manually submitting the configuration or via an external automation. However, you might find, perhaps for demo purposes, that you want a self-sufficient container that also adds the connector instance when it starts. To do this, you can use a script that launches Connect, waits for the Connect listener to respond, and adds the connector instance using the Connect REST API. And that's how you run Kafka Connect on Docker.
