Senior Developer Advocate (Presenter)
Because Kafka Connect is a data integration framework, troubleshooting is a necessary part of using it. This has nothing to do with Connect being finicky (on the contrary, it’s very stable). Rather, there are keys and secrets, hostnames, and table names to get right. Then there are the external systems that you are integrating with, each of which needs to be visible to and accessible by Connect, and each of which has its own security model. Basically, if you’ve done any integration work in the past, the situation is familiar.
Your Connect worker is running, your source connector is running—but no data is being ingested.
Because connectors consist of tasks, one of the first things you should consider is whether one or more of the connector’s tasks has failed independently of the connector. To verify this, you’ll need to gather more information from the Connect API.
You might start by using curl to get the status of the connector instance and its tasks.
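As a rough sketch of that status check, the helper below wraps the Connect REST API's GET /connectors/<name>/status endpoint. The worker address (http://localhost:8083, Connect's default REST port) and the connector name jdbc-source are assumptions for illustration; substitute your own.

```shell
# Assumed worker address; 8083 is Connect's default REST port.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Build the status URL for the connector named in $1.
status_url() {
  printf '%s/connectors/%s/status\n' "$CONNECT_URL" "$1"
}

# Fetch the status of the connector instance and all of its tasks as JSON.
connector_status() {
  curl -s "$(status_url "$1")"
}

# Usage (hypothetical connector name):
#   connector_status jdbc-source
```

In the response, connector.state holds the state of the connector instance and each entry in the tasks array carries its own state, which is how a RUNNING connector can sit alongside a FAILED task.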
Here we see what was represented in the previous diagram. The connector instance status is RUNNING while the task status is FAILED.
But we need more details about this in order to identify what caused the failure.
You can also use curl to request the stack trace for the task and pipe the results through jq. The illustrated command requests the stack trace for the first element in the tasks array.
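A command along these lines could look like the sketch below. It reuses the status endpoint and lets jq pick the trace field of the first element in the tasks array. The worker address and the connector name are assumptions, not values from the original walkthrough.

```shell
# Assumed worker address; 8083 is Connect's default REST port.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# jq filter selecting the stack trace of the first element of the tasks array.
TRACE_FILTER='.tasks[0].trace'

# Print the stack trace of the first task of the connector named in $1.
# -r emits the raw string, so the trace prints with real line breaks.
first_task_trace() {
  curl -s "${CONNECT_URL}/connectors/$1/status" | jq -r "$TRACE_FILTER"
}

# Usage (hypothetical connector name):
#   first_task_trace jdbc-source
```

If the first task has not failed, the status entry has no trace field and jq prints null.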
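Next, read through the trace and look for clues. In this instance, you notice a ConnectException and also that a driver is missing.
In addition to the stack trace, you should read the log. There are different ways to access the log, depending on how you are running Connect:
With the Confluent CLI running locally, use confluent local services connect log.
With Docker, use docker logs followed by the name of the container.
With plain Apache Kafka, read the log file with cat or, more likely, tail; the location of the log varies by installation.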
Connector contexts were added to logging in Apache Kafka 2.3 with KIP-449, and they make the diagnostic process a lot easier.
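To illustrate how the connector context helps, the sketch below filters a worker log down to the lines for a single task. It assumes the context prefix has the form [connector-name|task-N], optionally followed by a scope such as |offsets, as described in KIP-449, and that your logging pattern actually includes the context (%X{connector.context}). The log path and connector name are placeholders.

```shell
# Print only the log lines that carry the context of one connector task.
# $1 = log file, $2 = connector name, $3 = task number
task_log_lines() {
  # -F treats the patterns as fixed strings, so "[" and "|" are literal.
  # Two patterns cover "[name|task-N]" and scoped forms like "[name|task-N|offsets]"
  # without also matching task-N0, task-N1, and so on.
  grep -F -e "[$2|task-$3]" -e "[$2|task-$3|" "$1"
}

# Usage (placeholder path and connector name):
#   task_log_lines /path/to/connect.log jdbc-source 0
```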
“Task is being killed and will not recover until manually restarted”
This is a general error, a symptom of the problem rather than the cause, and it doesn’t reveal any information about the underlying problem. When you see it, it is a sign that you need to search further in the stack trace or the Connect worker log. For example, if the error appears in the Connect worker log, look further in the log to the exceptions in order to identify your problem.
At this point, you are only at the beginning of troubleshooting the problem, but at least you know where to look. With a bit of research, you would find that the documentation for the JDBC connector indicates the MySQL JDBC driver needs to be installed on the Connect worker machine when a MySQL database is part of the pipeline.
If your research doesn’t bear fruit, you might consider posting your problem to the Confluent Community Forum, but just keep in mind that a useful post will include more information than the “Task is being killed” error alone.
Dynamic log configuration arrived in Apache Kafka 2.4. It means you can change the level of logging detail without having to restart the worker.
For example, perhaps there is a particular connector, such as one logging under io.debezium, that you’d like to log at TRACE level to troubleshoot. Setting everything to TRACE would be overwhelming. Using dynamic log configuration, you can instead target the specific logger of interest at runtime via REST, without restarting Connect.
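A minimal sketch of such a request, using the /admin/loggers endpoint of the Connect REST API, might look like this. The worker address is an assumption, and io.debezium simply stands in for whichever logger you want to adjust.

```shell
# Assumed worker address; 8083 is Connect's default REST port.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Build the JSON body for a log-level change, e.g. {"level": "TRACE"}.
level_body() {
  printf '{"level": "%s"}\n' "$1"
}

# Set logger $1 (e.g. io.debezium) to level $2 on the worker that receives the request.
set_log_level() {
  curl -s -X PUT -H "Content-Type: application/json" \
    -d "$(level_body "$2")" \
    "${CONNECT_URL}/admin/loggers/$1"
}

# List the loggers whose levels are currently set explicitly.
list_loggers() {
  curl -s "${CONNECT_URL}/admin/loggers"
}

# Usage:
#   set_log_level io.debezium TRACE
#   list_loggers
```

By default the change applies only to the worker that handles the request and does not survive a restart, so once you are done troubleshooting you can set the logger back to its previous level the same way.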
Hi again. Danica Fine here. If you're using Kafka Connect, you may know that quite a bit goes on under the hood. Even when you're running your own Kafka Connect clusters, it can be tough to debug. This time, let's dive into Kafka Connect troubleshooting and see how to debug self-managed connectors.
Kafka Connect is a data integration framework built on a powerful, distributed system. Troubleshooting is a necessary part of using and understanding it. Think about it. At the surface level, there are configurations to set up and get right. Keys and secrets, host names and tables. Then there are the external systems that you're integrating with. Each of which needs to be visible to and accessible by Connect, and each of which has its own security model. Basically, if you've done any integration work in the past, this situation should sound familiar.
Let's cover a common scenario. Your source connector is running, but no data is being ingested. Because connectors are composed of tasks, one of the first things you should consider is checking to see if one or more of its tasks has failed independently of the connector. To verify this, you'll need to gather more information from the Connect API. You might start by using curl to get the status of the connector instance and its tasks. Here we see what was represented in the previous diagram. The connector instance status is running, while the task status is failed. But why did it fail? We need more details about this in order to identify what caused the failure.
You can also use curl to request the stack trace for the task and pipe the results through jq, a remarkably capable JSON formatter. The illustrated command requests the stack trace for the first element in the tasks array. Next, read through the trace and look for clues. In this instance, upon reviewing, you might notice that there is a ConnectException and also that a driver is missing.
That could be problematic.
Now this should come as a surprise to no one, but in addition to the stack trace, you should also read and make use of the logs. They're the source of truth after all. Depending on how you're running Connect, there are different ways to access the log. If you're running the Confluent CLI locally, the command is "confluent local services connect log." If you're using Docker, it's "docker logs" plus the name of the container. If you're running completely vanilla Connect using Apache Kafka, you can read the log files with "cat," or more likely "tail," and the location of the log will vary by your installation. Thankfully, connector contexts were added to logging in Apache Kafka 2.3 with KIP-449, and they make the diagnostic process a lot easier.
Let's dive in. The first log line you're likely to latch onto is "Task is being killed and will not recover until manually restarted." This is a general error, a symptom of the problem rather than the cause, and it doesn't reveal any information about the underlying problem. When you see this, this is a sign that you need to search further in the stack trace or the Connect worker log for your problem. And at this point, you're only at the beginning of troubleshooting the problem, but at least you know where to look. With a bit of research, in this particular case, you would find the documentation for the JDBC connector and that it indicates the MySQL JDBC driver needs to be installed on the Connect worker machine when a MySQL database is part of the pipeline.
If your research doesn't bear fruit, you might consider posting your problem to the Confluent Community Forum. But just keep in mind that a useful post will include a bit more information than just the "task is being killed" phrase.
To make debugging even easier, dynamic log configuration arrived in Apache Kafka 2.4. It means that you can change the level of logging detail without having to restart the worker.
For example, perhaps there's a particular connector that you'd like to log at trace level to try to troubleshoot. If you set everything to trace, it would be overwhelming. Using dynamic log configuration, you can conveniently target the specific logger of interest at runtime via REST without restarting Connect. Knowing how to debug and diagnose a distributed system is half the battle. With these tools in hand, you'll be better equipped for your next Kafka Connect troubleshooting session.