Course: Kafka Connect 101

Error Handling and Dead Letter Queues

7 min
Tim Berglund, Sr. Director, Developer Advocacy (Course Presenter)
Robin Moffatt, Staff Developer Advocate (Course Author)

Error Handling in Kafka Connect

Kafka Connect supports several error-handling patterns, including fail fast, silently ignore, and dead letter queues.


The dead letter queue in Kafka Connect is another Kafka topic, which means that it is easy to examine and reprocess messages as needed.

If you’ve used Connect before, or even just Kafka with Avro a few times, you’ve probably seen the classic “Unknown magic byte!” serialization exception.

org.apache.kafka.common.errors.SerializationException:
Unknown magic byte!

In this situation, the Avro deserializer is trying to read a message from a Kafka topic, and the message either isn’t Avro at all, or it is Avro but wasn’t written with the Confluent Schema Registry serializer. Either way, the message doesn’t match the expected wire format: the Schema Registry serializer prefixes each message with a “magic byte” that identifies the format, and the deserializer fails when that byte isn’t found. The error sounds cryptic, but it simply means that the data is in a format other than the one the Avro deserializer expects.

Serialization Challenges

As you set up and try various options with Connect, it is easy to accidentally mix formats in a topic. So “Unknown magic byte!” could be triggered because some early messages in a topic aren’t Avro, even though the later ones are. Or it could be that you’re simply using the wrong converter: if you have JSON data in the source topic, then you need the JSON converter, not the Avro converter.
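A minimal sketch of the relevant connector properties for each case (the Schema Registry URL is a placeholder):

value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

or, for Avro written with the Schema Registry serializer:

value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081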


But it could also be that non-Avro messages are interspersed among your Avro messages. As mentioned, this can easily happen while you experiment with an integration, but it can also be the result of two producers writing to the same topic, one producing Avro and the other JSON.


You can address both of these scenarios with configuration for error tolerances and dead letter queues.

Error Tolerances

Fail Fast (Default)

By default, if Connect hits a serialization error like the one discussed above, the task fails, and you will have to deal with the issue and then restart it. This is safe behavior: if there’s an invalid message, Connect won’t process it, and nothing gets silently dropped.


YOLO ¯\_(ツ)_/¯

The connector configuration option errors.tolerance=all means that the Connect task simply ignores any message that it cannot deserialize. It’s silent, though, so you won’t be aware of errors as they happen. That may be acceptable in some scenarios, but in others it is a problem, because you have no indication that anything is wrong with your data. A better option is usually to designate a dead letter queue to receive the messages that could not be processed.
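If you do choose this route, here is a sketch of the configuration, with the optional errors.log.* properties enabled so that failures are at least written to the Connect worker log rather than disappearing entirely:

errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true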

Dead Letter Queue

A dead letter queue in Kafka is just another Kafka topic to which messages can be routed by Kafka Connect if they fail processing in some way. The term is employed for its familiarity; a dead letter queue as traditionally conceived is part of an enterprise messaging system and is a place where messages are sent based on some routing logic that classifies them as having nowhere to go, and as potentially needing to be processed at a later time.

In Kafka Connect, the dead letter queue isn’t configured automatically, since not every connector needs one; it is an option for sink connectors, and it is where failed messages are sent instead of being silently dropped. Once the messages are there, you can inspect their headers, which record the reason for the rejection, and you can also look at their keys and values.
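A minimal sketch of the dead letter queue settings for a sink connector (the topic name my_connector_dlq is a placeholder; the replication factor override is only needed on small development clusters):

errors.tolerance=all
errors.deadletterqueue.topic.name=my_connector_dlq
errors.deadletterqueue.topic.replication.factor=1
errors.deadletterqueue.context.headers.enable=true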

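To see why messages landed in the dead letter queue, read the topic with a consumer that prints headers. For example, with kcat (a sketch, assuming a broker on localhost:9092 and the placeholder topic name above; -J wraps each record in a JSON envelope that includes its headers):

kcat -b localhost:9092 -t my_connector_dlq -C -J -e

With context headers enabled, look for headers such as __connect.errors.exception.class.name and __connect.errors.exception.message.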

Reprocessing the Dead Letter Queue

The dilemma whereby an Avro producer and a JSON producer are sending to the same topic has a solution in the dead letter queue. You configure the main connector to route its failing messages to a dead letter queue, then reprocess them from the dead letter queue with the appropriate converter and send them on to the sink. So, for example, if Avro messages are correctly proceeding to the sink but JSON messages are erroring into the dead letter queue, you could add a second connector with a JSON converter to process them out of the dead letter queue and send them on to the sink, as sketched below. This allows you to complete the processing of the source topic, which is not possible with a single connector.
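A sketch of the second connector’s key settings; the connector class and sink connection details depend on your sink and are omitted here, and my_connector_dlq is the placeholder dead letter queue topic from above:

topics=my_connector_dlq
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false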


Manual Processing

The dead letter queue reprocessing solution is easy to reason about, and it’s simple to configure and get running. However, if that option doesn’t exist for some reason, or messages are failing for reasons that are hard to identify, you will have to try something else. You may be able to write a consumer that converts the messages if you know how to deal with a particular failure mode, but often in these situations you will end up handling the erroring messages manually, so you should have a process for a person to review them.
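One simple way to support manual review is to dump the dead letter queue to a file that a person can work through (a sketch, using the same placeholder names as above):

kcat -b localhost:9092 -t my_connector_dlq -C -J -e > dlq_records.json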

This is ultimately why dead letter queues aren’t enabled by default in Kafka Connect: you need a way of dealing with the messages on the dead letter queue, otherwise you are just producing them somewhere for no reason. Make sure you have thought through how you will process the contents of a dead letter queue before you set one up.

