Staff Software Practice Lead
When we consume a message, we need to decide when to acknowledge that we are done with it. Depending on our choice, we can end up with either an at-most-once, at-least-once, or exactly-once (effectively-once) delivery guarantee. Although an exactly-once guarantee is technically impossible, Kafka includes transactional semantics that can achieve an effectively-once guarantee. This video will discuss each of these delivery guarantees and show how you can implement them. It will also highlight the dangers of operating with the default auto-commit settings in .NET.
Topics:
Hi, I'm Wade from Confluent. In this video, we'll take a look at the delivery guarantees provided by a Kafka consumer, and how we can implement transactions. The default behavior of a Kafka consumer is to automatically commit offsets at a given interval. The Java documentation suggests this grants an at-least-once delivery guarantee. In other words, the consumer will receive every message from a topic, but may see duplicates. However, the behavior in .NET is quite different. Let's take a look at a simple example. Here we see a basic consumer loop. When we call consume, the offset for the current message is stored in memory. A background thread periodically commits the most recently stored offset to Kafka, according to the AutoCommitInterval. If our application crashes before we commit the offsets, then we'll see the messages again once we recover. In this scenario, we see all of the messages, but we get duplicates. However, because the commit happens in a separate thread, it may occur before we process the message. In this case, if we crash while processing, the message is lost and won't be recovered. If we want to achieve at-least-once processing, we need to manually store the offsets. To do this, we start by disabling automatic offset storage. This prevents the consumer from storing offsets whenever we call consume. When the message is fully processed, we call store offset on the consumer to explicitly store the offset in memory. The auto-commit thread will still commit the offsets to Kafka according to the specified interval. In the event of a crash, the offset won't have been stored. When we recover, we'll see the message again, giving us the at-least-once guarantee we were looking for. Sometimes we want greater control over our commits. For example, if our messages are already organized into batches, we may not want to wait for the auto-commit. We can disable auto-commit by adjusting the configuration. Then we can manually call Commit on the consumer when we are ready. This gives us full control over when offsets are committed, but continues to grant an at-least-once guarantee. Just be wary, manual commits are a blocking operation and may impact the performance of the application. A less common pattern is to implement at-most-once delivery. In this case, we make a best effort to handle the message, but if we fail, we don't try again. Here, we don't have duplicates, but we may lose messages. To implement this, we commit our offsets immediately after consuming and before processing. This ensures that if the processing fails, we don't retry the message. However, it does mean that those messages are lost. The more complicated delivery guarantee is exactly-once. In this case, our consumer should receive every message once without any duplicates. This guarantee is technically impossible due to something known as the two general's problem. However, we can simulate exactly-once using a combination of at-least-once and de-duplication or idempotency. This is sometimes known as effectively-once. In order to achieve effectively-once processing, we need to enable some settings on our producer. Normally, when the producer experiences a write failure, it will retry the message. This can result in duplicates. However, if we enable idempotence, then a unique sequence number is included with each write. If the same sequence number is sent twice, then it will be de-duplicated. Idempotence requires specific values for other settings. MaxInFlight messages must be less than or equal to five. Retries must be greater than zero, and Acks must be set to all. These are all set correctly by default, so as long as we don't alter them, we should be fine. But this isn't sufficient to guarantee effectively-once delivery. We also need to ensure our application is written to support it. There are two approaches to this. The complicated route is to rely on an external offset store. In this case, we store the message offsets alongside the data we are writing rather than in Kafka. Those writes are done in an idempotent and atomic fashion, often using upsert. If we try to write data with the same offset twice, we either overwrite the first result or ignore the second one. This eliminates any duplication, giving us an effectively-once guarantee. However, this is a complicated solution because we are managing our offsets manually. When we start our consumer, we'll need to load the last offset and seek to the correct location. We also need to manage partitions and rebalances ourselves. Sometimes this is necessary, but we should try to avoid it if we can. If we are consuming data from Kafka and producing new records to Kafka with no other external storage, then we can rely on Kafka transactions. This requires us to identify the instance of our application with a TransactionId. We also need to initialize our producer with a timeout for the transactions in case they fail. When consuming a message, we start a transaction by calling BeginTransaction on the producer. We also pass the current offsets to the transaction so that it can manage them for us. Whenever we produce a message, it will be stored uncommitted, along with the offset. When we successfully complete the operation, we commit the transaction. This will commit both the messages and the offsets. If the operation fails for some reason, then we abort the transaction. Any messages and offsets that were pending in our producer will be rolled back. That way we'll be able to retry the message, but we won't create duplicates in our downstream topic. This gives us an effectively-once guarantee because each downstream message will appear once and only once. However, be aware that in some languages, the default behavior of a downstream consumer is to read all messages, even if they haven't been committed. That means the consumer might read messages from a transaction that ends up being aborted. This behavior is controlled by the setting IsolationLevel. Thankfully, in the .NET client, the default IsolationLevel is ReadCommitted. This ensures that the consumers are only reading fully committed messages. As long as we don't alter this value, our consumers should be safe. However, if you consume the messages in a language other than C# you'll wanna check this setting. Deliver guarantees can be tricky to manage. It's tempting to rely on the default auto-commit settings to manage them for us. However, as we can see, those defaults aren't sufficient for most use cases. Instead, we need to look at what our application requires and implement the solution that best fits our needs. If you aren't already on Confluent developer, head there now using the link in the video description to access the rest of this course and its hands on exercises.
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.