
What are Kafka Producers and How do they work?

Understanding how to get data into your Kafka cluster is the first step to real-time streaming data. Producer has entered the chat. Watch this video to find out all you need to know about how a Kafka producer helps your data flow into the future.

5 min


Dan Weston

Senior Curriculum Developer


Learn more about Apache Kafka with our free courses


What are Kafka Producers and How do they work?

Intro

Hi, I’m Dan Weston from Confluent. Whether you’re new to Apache Kafka or you’ve built your entire company's infrastructure with Kafka as its central nervous system, Producers are where everything begins. If you want data flowing into Kafka, you need a producer. So, let’s get to producing.

Understanding Apache Kafka

To understand what a producer is, and how it fits into the overall picture of creating a real-time data streaming platform, let’s talk a little bit about Apache Kafka. Apache Kafka is a distributed streaming platform that provides a publish-subscribe messaging system. In Kafka, producers are responsible for publishing messages to a Kafka topic. Each topic is spread across multiple partitions. These partitions help Kafka scale by allowing you to add resources and additional partitions as your system handles more and more data. Each partition is an ordered, immutable sequence of messages that is continually appended to.

Think of a Kafka topic as a library and each partition as a different bookshelf in that library. Just as different bookshelves contain different books, each partition in a topic holds a subset of the overall data. That makes each book in the library an individual message sent to Kafka. Each of these messages contains three elements: a key, the value of the message, and a timestamp of when the message was produced.

If you provide a key with the message, the producer first computes a hash of the key. It then applies a mod operation with the total number of partitions to find the partition to produce to. This ensures that messages with the same key are written to the same partition, and therefore stay in order. If your key is null, a sticky partitioner is used. This means that messages are batched, either up to a certain size or for a period of time, and sent to the same partition; a new batch is then started and sent to a different partition. This helps drive latency down while still spreading messages out over all of the partitions.

Producers in Kafka can be implemented in just about any programming language, whether officially supported or through various community projects. No matter which language you select, the best way for producers to send messages to Kafka topics is to use the Producer API.
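The key-to-partition logic above can be sketched in a few lines. This is a simplified stand-in for illustration only: real Kafka hashes the serialized key bytes with murmur2, whereas this sketch uses Java’s built-in `hashCode`, and the class and method names are made up for the example.

```java
public class PartitionSketch {

    // Simplified sketch of Kafka's default partitioner for non-null keys.
    // Real Kafka uses murmur2 over the serialized key bytes; we use
    // String.hashCode() here purely to illustrate the hash-then-mod idea.
    static int choosePartition(String key, int numPartitions) {
        int hash = key.hashCode();
        // Mask off the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 6;
        // The same key always maps to the same partition, which is
        // exactly what preserves per-key ordering.
        System.out.println(choosePartition("user-42", partitions)
                == choosePartition("user-42", partitions)); // prints true
    }
}
```

Because the hash is deterministic, every message keyed `"user-42"` lands on the same partition for as long as the partition count stays the same, which is why adding partitions later can break per-key ordering.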

Producer Configuration

Now that we have some context and know a little bit more about how partitions work, let’s take a look at what a basic Kafka producer looks like. There are four main code sections in a producer. First, we specify some configuration details, like how to connect to the Kafka cluster and the types of the key and the value. Next, we create a producer by invoking the KafkaProducer class. We then define the shutdown behavior. And, last of all, we actually send the data. Here you create an instance of the ProducerRecord class and provide the topic, the key, and the value for your message. After you've created the ProducerRecord, you pass it to the 'send' method of KafkaProducer, where it is serialized and the partition on which it will land is determined. As mentioned earlier, if you’ve provided a key, all messages with the same key will be sent to the same partition. This is so that when you read the messages back using a consumer, you have the guarantee that they are in order. If you didn’t provide a key, messages will be batched and spread across all of your partitions.

Each partition has a “leader” node handling all its read and write requests, and “follower” nodes that replicate the data for fault tolerance and high availability. You can run a cluster without followers, but you run the risk of losing data. These partitions are spread across different brokers, so if one broker goes down, a new leader is selected from the followers.
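The four sections described above can be sketched as follows using the Java Producer API. This is a minimal sketch, not a production setup: it assumes the kafka-clients library is on your classpath and a broker is reachable at localhost:9092, and the topic and key names are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BasicProducer {
    public static void main(String[] args) {
        // 1. Configuration: how to connect, and how keys/values are serialized.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // 2. Create the producer.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // 3. Shutdown behavior: close() flushes buffered messages cleanly.
        Runtime.getRuntime().addShutdownHook(new Thread(producer::close));

        // 4. Send the data: topic, key, and value go into a ProducerRecord.
        ProducerRecord<String, String> record =
                new ProducerRecord<>("my-topic", "user-42", "hello, kafka");
        producer.send(record, (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();
            } else {
                System.out.printf("Delivered to partition %d at offset %d%n",
                        metadata.partition(), metadata.offset());
            }
        });
        producer.flush();
    }
}
```

Note that `send` is asynchronous; the callback fires once the broker acknowledges the message, which is where the acks setting discussed next comes into play.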

Producer "Acks"

There is one more crucial decision you need to make when writing a producer: how does a producer know that its messages have been delivered? For that, we have producer acknowledgments, or acks for short. In other words, how important is it that your messages arrive? There are three options to choose from: acks 0, acks 1, or acks -1. Acks 0, as the name implies, is none. The messages are sent without the producer waiting to see if they ever arrive in your Kafka cluster. This provides the lowest latency, but there is a good chance that you will lose data. Acks 1 sends the message and waits for the leader to acknowledge that it has received it. This gives you reasonable assurance that your messages are delivered without slowing delivery dramatically, though data can still be lost if the leader fails before its followers replicate the message. The last one is acks -1, also written as acks all. This is where your producer waits for the leader and its in-sync followers to acknowledge that the message has been delivered. While the slowest of the three, this also provides the greatest assurance that your messages have arrived and have been written to multiple disks.
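In the Java client, this choice is a single line of producer configuration. The snippet below is a fragment, not a full program; the surrounding `Properties` object is the same one passed to KafkaProducer in the earlier example.

```java
Properties props = new Properties();
// "0"   = fire and forget: lowest latency, possible data loss.
// "1"   = wait for the leader only: a middle ground.
// "all" = equivalent to -1: wait for the leader and all in-sync
//         replicas; strongest guarantee, highest latency.
props.put("acks", "all");
```

A common pairing is acks=all with a replication factor of 3, so a message survives the loss of any single broker.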

Closing

That’s the basics of Kafka producers. Let us know in the comments below if you have other questions or content you’d like to see us cover and be sure to like and subscribe to be notified as we publish additional videos. Until next time, have fun streaming!