Now that your data is inside your Kafka cluster, how do you get it out? In this video, Dan Weston, Senior Curriculum Developer at Confluent, covers the basics of Kafka consumers: what consumers are, how they get your data flowing, and best practices for configuring consumers in a real-time data streaming system. You will also learn about offsets, consumer groups, and partition assignment.
Intro
Hi, I'm Dan Weston from Confluent. In this video we'll take a look at the Kafka consumer and the things you should keep in mind as you build your Kafka cluster.
What is a Consumer?
There are three main pieces to a Kafka-based architecture: the producers, where messages come from; the Kafka cluster itself; and the consumers. To understand consumers, it's helpful to follow a message as it moves through your Kafka cluster. A message can be any event: a sale was made on your website, a temperature sensor recorded the current temperature, or an order was shipped. Each message contains five core elements and one optional one: the key, the value, the timestamp of when the message was created, the message's offset, the partition the message comes from, and the optional headers. If message order is important for your application, you'll want to provide the key. This ensures that messages with the same key are sent to the same partition and kept in the order they were received. The key and value are serialized before being sent to Kafka and deserialized by the consumer.

Consumers subscribe to your topics and poll for messages. One of the things that sets Kafka apart from other messaging systems is that once a message has been consumed, it is not deleted. Instead, each message stays around until the retention period is met. This is one of the main reasons Kafka scales so easily: you can have any number of consumers connecting to each topic.
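To see why providing a key keeps related messages in order, here is a minimal sketch in Python of the idea behind key-based partitioning. Note this is an illustration only: Kafka's real default partitioner uses a murmur2 hash of the serialized key, and `partition_for` is a name invented here, not a Kafka API.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default partitioner: a deterministic
    hash of the key, modulo the partition count. Every message with the
    same key lands on the same partition, preserving per-key ordering.
    (Kafka's actual default uses a murmur2 hash of the serialized key.)"""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All messages keyed "order-42" go to the same partition
# of a six-partition topic, so they stay in arrival order.
p = partition_for("order-42", 6)
```

Because the mapping depends only on the key and the partition count, any producer computing it will route messages for a given key to the same partition.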
Consumer Offsets
One of the most important and often misunderstood aspects of consumers is offsets. An offset is the unique identifier of a message within a partition: an ever-increasing number assigned to each message as it is stored. Consumers can save which messages have been processed in a special Kafka topic, which allows them to pick up where they left off if something goes wrong. This operation is called committing offsets. By default, the Kafka Java consumer does this automatically for you and offers at-least-once delivery. If you aren't going to be using the Java consumer, you'll want to check the default behavior of your chosen language's client. For an example of how .NET handles consumer offsets, be sure to check out our course Apache Kafka for .NET Developers.
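To see why committed offsets give at-least-once delivery, here is a toy simulation in Python. No real Kafka is involved and the function name is made up for illustration: a consumer that crashes after processing some messages but before committing will reprocess those messages when it restarts from the last committed offset.

```python
def consume(messages, start_offset, commit_every, crash_after=None):
    """Process messages from start_offset onward, committing the offset
    every `commit_every` messages. Returns (processed, last_committed).
    If crash_after is set, stop abruptly (a simulated crash) after
    processing that many messages, possibly before the next commit."""
    processed = []
    committed = start_offset
    for i, msg in enumerate(messages[start_offset:]):
        processed.append(msg)
        next_offset = start_offset + i + 1      # offset to resume from
        if len(processed) % commit_every == 0:
            committed = next_offset             # commit progress
        if crash_after is not None and len(processed) >= crash_after:
            return processed, committed         # crash: commit is behind
    return processed, committed

msgs = list(range(10))
# First run crashes after 5 messages, having committed only offset 4.
run1, committed = consume(msgs, 0, commit_every=2, crash_after=5)
# Restart from the committed offset: message 4 is processed a second time.
run2, _ = consume(msgs, committed, commit_every=2)
```

Message 4 appears in both runs: nothing is lost, but some messages may be seen twice, which is exactly the at-least-once guarantee.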
Consumer Group
Each consumer is part of a consumer group. A group can be as small as one consumer or as large as the number of partitions in your Kafka topic. Each consumer added to a group increases parallelism, splitting the messages among the consumers. For example, say you have a topic receiving a thousand messages per second. If a single consumer can only process 500 messages per second, you would never keep up and would fall further and further behind. Add a second consumer to the group, however, and you'd be able to handle all 1,000 messages per second.
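The sizing math in the example above can be written as a quick back-of-the-envelope calculation (the rates are the example's numbers, not a real benchmark):

```python
import math

def consumers_needed(incoming_rate: float, per_consumer_rate: float) -> int:
    """Minimum number of consumers in a group needed to keep up
    with the incoming message rate."""
    return math.ceil(incoming_rate / per_consumer_rate)

# 1,000 messages/sec arriving, each consumer handles 500/sec:
needed = consumers_needed(1000, 500)   # -> 2 consumers
```

Remember the cap from the next section: parallelism only helps up to the number of partitions in the topic.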
Partition Assignment
Kafka automatically assigns partitions to consumers within a consumer group. So if you have one partition, you can have one active consumer; if you have three partitions, you can have three. When there are fewer consumers than partitions, Kafka distributes the partitions as evenly as possible. So if you have five partitions and only four consumers, each consumer is assigned a partition, with one consumer taking two. The same thing happens when a consumer fails or is added: a rebalance takes place and reassigns partitions to consumers. If you need more consumers in a group, you will also need to add more partitions to your topic. It is also common to have a consumer act as a producer. In this scenario, the application consumes the messages, performs any analysis, aggregation, or other transformation, and then sends the results to a separate Kafka topic.
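The even spread described above can be sketched as a simple round-robin assignment. This mirrors the idea, not Kafka's exact behavior: the real assignment strategy is configurable (range, round-robin, cooperative-sticky, and so on), and `assign` here is an invented helper, not a Kafka API.

```python
def assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions across consumers as evenly as possible
    by dealing them out round-robin."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 5 partitions, 4 consumers: one consumer ends up with two partitions.
a = assign(5, ["c0", "c1", "c2", "c3"])
# a == {"c0": [0, 4], "c1": [1], "c2": [2], "c3": [3]}
```

A rebalance is conceptually just running this again with the new membership list; any consumer beyond the partition count would receive an empty list and sit idle.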
Consumer Configuration
Now that we have a better picture of how a consumer works, let's look at a simplified consumer configuration. This is a sample configuration taken directly from Confluent Cloud. First, we provide properties like the bootstrap server and any certificates or API keys needed to connect to our Kafka cluster. Next, we provide the group ID for our consumers. Third, we set the offset behavior. This property accepts three options: earliest, latest, or none. If you care about all of the messages in a topic, select earliest. If you only want to see messages from when your consumer connects onward, select latest. None will throw an exception if there is no committed offset for the consumer group. This property only applies the first time a consumer group connects to a topic; after that, a reconnecting consumer picks up from the last committed offset. We then create the Kafka consumer, providing the topic we're connecting to and how often to poll the Kafka cluster for new messages, and finally we print the key and value of each message.
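A minimal version of the properties described above might look like the following. The host and credential values are placeholders, not a working configuration; a real Confluent Cloud snippet will include your cluster's actual endpoint and API key and secret.

```properties
# Connection to the cluster (placeholder endpoint)
bootstrap.servers=<BOOTSTRAP_SERVER>:9092

# Authentication (API key/secret placeholders)
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<API_KEY>' password='<API_SECRET>';

# Consumer group this application belongs to
group.id=my-consumer-group

# Where to start reading the first time this group sees the topic:
# earliest, latest, or none
auto.offset.reset=earliest
```

With these properties in place, the consumer code itself only needs to subscribe to the topic, poll in a loop, and handle each record's key and value.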
Closing
That's the basics of Kafka consumers. Let us know in the comments below if you have any other questions or content you'd like to see us cover, and be sure to like and subscribe to be notified as we publish additional videos. Until next time, have fun streaming.