Event Processing Applications may encounter invalid data as they operate over an infinite stream of events. Errors may include invalid data formats; nonsensical, missing, or corrupt values; technical failures; or other unexpected scenarios.
How can an event processing application handle processing failures without terminating or becoming stuck when a message cannot be read?
When the event processing application cannot process an event for an unrecoverable reason, the problematic event is published to a "dead letter stream". This stream stores the event, allowing it to be logged, reprocessed later, or otherwise acted upon. Additional contextual information can be included in the "dead letter event" to ease fault resolution later, such as details of why its processing failed.
Java Basic Kafka Consumer
while (keepConsuming) {
    try {
        final ConsumerRecords<K, V> records = consumer.poll(Duration.ofSeconds(1));
        try {
            eventProcessor.process(records);
        } catch (Exception ex) {
            // The events were read but could not be processed;
            // report them and keep consuming.
            deadEventReporter.report(/*Error Details*/);
        }
    } catch (SerializationException se) {
        // The event could not be deserialized at all.
        deadEventReporter.report(/*Error Details*/);
    }
}
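The deadEventReporter used in these examples is application-specific. A common implementation publishes the failed event, unchanged, to a dedicated dead letter topic and attaches the error context as record headers. The following is a minimal sketch, assuming a pre-configured KafkaProducer; the topic name "orders-dead-letter", the header keys, and the report signature are illustrative, not a prescribed API.
Java Dead Letter Reporter (sketch)
import java.nio.charset.StandardCharsets;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Headers;

public class DeadEventReporter {
    // Hypothetical topic name for the dead letter stream.
    private static final String DEAD_LETTER_TOPIC = "orders-dead-letter";

    private final KafkaProducer<byte[], byte[]> producer;

    public DeadEventReporter(KafkaProducer<byte[], byte[]> producer) {
        this.producer = producer;
    }

    public void report(byte[] key, byte[] value, String sourceTopic,
                       long offset, Exception cause) {
        // Publish the original bytes unchanged so the failed event is
        // preserved exactly as received.
        final ProducerRecord<byte[], byte[]> record =
                new ProducerRecord<>(DEAD_LETTER_TOPIC, key, value);

        // Attach contextual information as headers to ease fault resolution later.
        final Headers headers = record.headers();
        headers.add("dlq.error.message", cause.toString().getBytes(StandardCharsets.UTF_8));
        headers.add("dlq.source.topic", sourceTopic.getBytes(StandardCharsets.UTF_8));
        headers.add("dlq.source.offset", Long.toString(offset).getBytes(StandardCharsets.UTF_8));

        producer.send(record);
    }
}
Handling the payload as raw bytes is deliberate: when the failure is itself a deserialization error, re-serializing the event is not possible, so the reporter should operate below the level at which deserialization happens.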
Python Basic Kafka Consumer
while True:
    try:
        msg = consumer.poll(1000)
    except SerializerError as e:
        # The event could not be deserialized; report it to the dead letter
        # stream and keep consuming rather than terminating the application.
        dead_event_reporter.report(e)
        continue
    if msg is None:
        # No event arrived within the poll timeout.
        continue
    if msg.error():
        # The broker returned an error for this event.
        dead_event_reporter.report(msg.error())
        continue
    event_processor.process(msg)
What should real-world applications do with the events in the dead letter stream? Automatically reprocessing events will often reorder them, risking corruption of downstream systems if the stream contains events that represent changing states of the same underlying entity, such as orders being booked, processed, or shipped. Manual reprocessing can be useful, but in many real-world implementations the dead letter stream is treated less as a source of events to replay and more as an error log.