course: Kafka Connect 101

Troubleshooting Self-Managed Kafka Connect

4 min
Danica Fine

Senior Developer Advocate (Presenter)


Given that Kafka Connect is a data integration framework, troubleshooting is simply a necessary part of using it. This has nothing to do with Connect being finicky (on the contrary, it’s very stable). Rather, there are keys and secrets, hostnames, and table names to get right. Then there are the external systems you are integrating with, each of which needs to be visible to and reachable from Connect, and each of which has its own security model. If you’ve done any integration work in the past, the situation will be familiar.

Troubleshooting Scenario


Your Connect worker is running, your source connector is running—but no data is being ingested.

Because a connector’s work is carried out by its tasks, one of the first things to consider is that one or more of those tasks has failed, independently of the connector itself. To verify this, you’ll need to gather more information from the Connect REST API.

Getting Connector and Task Status


You might start by using curl to get the status of the connector instance and its tasks.
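For example, assuming a Connect worker whose REST interface listens on localhost:8083 and a connector named my-source-connector (both are placeholders for your own setup), the request and an abbreviated, illustrative response might look like this:

    curl -s http://localhost:8083/connectors/my-source-connector/status | jq

    {
      "name": "my-source-connector",
      "connector": {
        "state": "RUNNING",
        "worker_id": "connect:8083"
      },
      "tasks": [
        {
          "id": 0,
          "state": "FAILED",
          "worker_id": "connect:8083",
          "trace": "org.apache.kafka.connect.errors.ConnectException: ..."
        }
      ],
      "type": "source"
    }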

The output confirms what was represented in the previous diagram: the connector instance status is RUNNING, while the task status is FAILED.

But we need more detail in order to identify what caused the failure.

Getting Task Status


You can also use curl to request the stack trace for the task and pipe the results through jq. The illustrated command requests the stack trace for the first element in the tasks array.
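A sketch of that command, again using the placeholder connector name and worker address from above (adding jq’s -r flag renders the newlines in the trace, making it easier to read):

    curl -s http://localhost:8083/connectors/my-source-connector/status | jq -r '.tasks[0].trace'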

Next, read through the trace and look for clues. In this instance, the trace contains a ConnectException, and it also indicates that a driver is missing.

Kafka Connect Log4j Logging


In addition to the stack trace, you should read the Connect worker log. There are different ways to access the log, depending on how you are running Connect (see the examples after the list):

  • If you are just running the Confluent CLI locally, the command is confluent local services connect log
  • If you are using Docker, it’s docker logs, plus the name of the container
  • If you are running completely vanilla Connect using Apache Kafka, you can just read the log files with cat, or more likely tail (the location varies by installation)
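Putting those together, the commands might look something like the following; the container name and log file path are placeholders that will vary with your setup:

    # Confluent CLI (local development)
    confluent local services connect log

    # Docker (substitute the name of your Connect container)
    docker logs -f connect

    # Vanilla Apache Kafka Connect (log location varies by installation)
    tail -f /path/to/logs/connect.log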

Connector contexts were added to logging in Apache Kafka 2.3 with KIP-449, and they make the diagnostic process a lot easier.
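To illustrate, a worker log line for the failing task might look something like this, with the connector context shown in square brackets (the timestamp and connector name here are made up):

    [2023-01-01 12:00:00,000] ERROR [my-source-connector|task-0] WorkerSourceTask{id=my-source-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)

The [my-source-connector|task-0] prefix is the connector context, which lets you filter the log down to the connector and task you actually care about.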

Identify the Problem Cause


“Task is being killed and will not recover until manually restarted”

This is a general error: a symptom of the problem rather than its cause, and it reveals nothing about the underlying issue. When you see it in the Connect worker log, treat it as a sign that you need to search further back in the stack trace or the log for the exceptions that actually identify the problem.

At this point, you are only at the beginning of troubleshooting the problem, but at least you know where to look. With a bit of research, you would find that the documentation for the JDBC connector indicates the MySQL JDBC driver needs to be installed on the Connect worker machine when a MySQL database is part of the pipeline.
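Once the underlying cause is fixed (here, by installing the missing driver on the worker), the failed task still has to be restarted by hand, as the error message says. A minimal sketch, again using the placeholder connector name and worker address:

    curl -s -X POST http://localhost:8083/connectors/my-source-connector/tasks/0/restart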

If your research doesn’t bear fruit, you might consider posting your problem to the Confluent Community Forum, but keep in mind that a useful post will need to include more than the “Task is being killed” error alone; share the full stack trace and the relevant worker log excerpts as well.

Dynamic Log Configuration


Dynamic log configuration arrived in Apache Kafka 2.4. It means you can change the level of logging detail without having to restart the worker.

For example, perhaps there is a particular connector, such as one under the io.debezium package shown above, that you’d like to log at TRACE level while troubleshooting. Setting everything to TRACE would be overwhelming. With dynamic log configuration, you can raise the level at runtime via REST, without restarting Connect, targeting only the specific logger of interest.
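A sketch of what that might look like, assuming the worker’s REST interface is on localhost:8083 and io.debezium is the logger you want to target:

    # Raise the io.debezium logger to TRACE at runtime
    curl -s -X PUT -H "Content-Type: application/json" \
         http://localhost:8083/admin/loggers/io.debezium \
         -d '{"level": "TRACE"}'

    # Check the current logger levels
    curl -s http://localhost:8083/admin/loggers | jq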

