You can run a Kafka Connect worker directly as a JVM process on a virtual machine or bare metal, but you might prefer the convenience of running it in a container, using a technology like Kubernetes or Docker. Note that containerized Connect via Docker will be used for many of the examples in this series.
Confluent maintains its own image for Kafka Connect, cp-kafka-connect-base, which provides a basic Connect worker to which you can add your desired JAR files for sink and source connectors, single message transforms, and converters.
You can use Confluent Hub to add your desired JARs, either by installing them at runtime or by creating a new Docker image. Of course, there are pros and cons to either of these options, and you should choose based on your individual needs.
Adding your dependencies at runtime means that you don’t have to create a new image, but it does increase installation time each time your container is run, and it also requires an internet connection. It’s a good option for prototyping work but probably not for a production deployment.
Your JARs should be placed somewhere the Connect process can load them from, and you'll need to set an environment variable, CONNECT_PLUGIN_PATH, that identifies their location (note that in production you will likely have many more environment variables than just this one). Also make sure to specify the correct version of the Connect base image. Finally, you should add a command that overrides the base image's default command so that you can call the Confluent Hub utility, which installs the specified connectors (in this case, the Neo4j connector) before starting the worker.
kafka-connect:
  image: confluentinc/cp-kafka-connect-base:7.1.0-1-ubi8
  environment:
    CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components
  command:
    - bash
    - -c
    - |
      confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:2.0.2
      /etc/confluent/docker/run
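As noted above, a real worker needs more environment variables than just the plugin path. The sketch below illustrates what a fuller service definition might look like; the Confluent Connect images translate any variable prefixed with CONNECT_ into a worker configuration property. The broker address (broker:9092), group ID, and internal topic names here are placeholder values you would replace for your own environment:

kafka-connect:
  image: confluentinc/cp-kafka-connect-base:7.1.0-1-ubi8
  ports:
    - 8083:8083
  environment:
    # Placeholder broker address; point this at your Kafka cluster
    CONNECT_BOOTSTRAP_SERVERS: broker:9092
    CONNECT_REST_PORT: 8083
    CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
    # Workers sharing a group ID form a single Connect cluster
    CONNECT_GROUP_ID: kafka-connect-01
    # Internal topics where Connect stores connector config, offsets, and status
    CONNECT_CONFIG_STORAGE_TOPIC: _connect-configs
    CONNECT_OFFSET_STORAGE_TOPIC: _connect-offsets
    CONNECT_STATUS_STORAGE_TOPIC: _connect-status
    # Replication factor 1 is only appropriate for a single-broker dev setup
    CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
    CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
    CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components
  command:
    # Same install-then-run command as above
    - bash
    - -c
    - |
      confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:2.0.2
      /etc/confluent/docker/run

With the service defined, docker-compose up -d starts the worker, and docker logs -f kafka-connect lets you watch the connector installation and worker startup.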
The second way to add dependencies, and the option probably most often used in production deployments, is to build a new image.
Make sure to use the correct Confluent base image version and also check the specific documentation for each of your connectors.
FROM confluentinc/cp-kafka-connect-base:7.1.0-1-ubi8
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"
RUN confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:2.0.2
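With the Dockerfile in place, you build the image and run it like any other container. The image name used here (my-kafka-connect) is just an example, and the worker's configuration still has to be supplied as environment variables at runtime:

# Build the custom image from the Dockerfile above
docker build -t my-kafka-connect:1.0.0 .

# Run it, exposing the Connect REST API on port 8083
# (bootstrap servers, group ID, and storage topic variables omitted here for brevity)
docker run -d --name kafka-connect -p 8083:8083 my-kafka-connect:1.0.0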
Typically, you will add connector instances once the worker process is running by manually submitting the configuration or via an external automation. However, you may find—perhaps for demo purposes—that you want a self-sufficient container that also adds the connector instance when it starts. To do this, you can use a launch script that looks like this:
# Launch Kafka Connect
/etc/confluent/docker/run &
#
# Wait for the Kafka Connect listener to be ready
echo "Waiting for Kafka Connect to start listening on localhost ⏳"
while : ; do
  curl_status=$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors)
  echo -e $(date) " Kafka Connect listener HTTP state: " $curl_status " (waiting for 200)"
  if [ $curl_status -eq 200 ] ; then
    break
  fi
  sleep 5
done
echo -e "\n--\n+> Creating Data Generator source"
curl -s -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/source-datagen-01/config \
    -d '{
        "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "kafka.topic": "ratings",
        "max.interval": 750,
        "quickstart": "ratings",
        "tasks.max": 1
    }'
sleep infinity
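Once the container is running, you can confirm that the worker came up and that the launch script created the connector by querying the Connect REST API, for example:

# List the connectors registered on this worker
curl -s http://localhost:8083/connectors

# Check the state of the connector and its tasks (expect RUNNING)
curl -s http://localhost:8083/connectors/source-datagen-01/status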