
Tim Berglund

VP Developer Relations


Gilles Philippart

Software Practice Lead

Kafka Producers

In Kafka, producers are the client applications responsible for writing data to a Kafka cluster. While brokers handle storage and replication, producers are what send messages (key-value pairs) into topics. Any application that sends data to Kafka—whether it's a microservice, IoT device, or a data pipeline—is considered a producer.

The producer API is how applications write data to Kafka, and client libraries are available for multiple programming languages, including Java, Python, Go, JavaScript, and .NET. Java is Kafka's native language, so new features typically appear there first, but support across other languages is robust and growing.

To configure a producer, you provide a set of properties—key-value pairs that include important details like:

  • bootstrap.servers: A list of brokers that the producer can connect to. It only needs a few to discover the rest of the cluster.
  • acks: Defines the level of acknowledgment required from brokers before considering a message successfully sent.
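
For example, a minimal kafka.properties file carrying just these two settings might look like this (the broker hostnames here are placeholders):

bootstrap.servers=broker1.example.com:9092,broker2.example.com:9092
acks=all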

The main objects in the API are:

  1. KafkaProducer – Manages the connection to the cluster and handles message sending.
  2. ProducerRecord – Represents the message (key, value) and the topic it will be sent to. It also allows you to set optional fields like timestamp, partition, and headers.

Here's a snippet of example code:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties config = loadProperties("kafka.properties");
Producer<Long, String> producer = new KafkaProducer<>(config);

ThermostatReading reading = new ThermostatReading("kitchen", 32, Temperature.celsius(22));
String readingAsJson = objectMapper.writeValueAsString(reading);
ProducerRecord<Long, String> record = new ProducerRecord<>("thermostat_readings", 32L, readingAsJson);
producer.send(record);
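
If you want to set the optional fields mentioned above, ProducerRecord has constructor overloads that accept them. Here's a sketch using the fullest form; the header name and value are made up for illustration:

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.internals.RecordHeader;

List<Header> headers = List.of(
        new RecordHeader("source", "thermostat-gateway".getBytes(StandardCharsets.UTF_8)));

ProducerRecord<Long, String> detailed = new ProducerRecord<>(
        "thermostat_readings",       // topic
        null,                        // partition; null lets the producer choose
        System.currentTimeMillis(),  // explicit timestamp
        32L,                         // key
        readingAsJson,               // value
        headers);
producer.send(detailed);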

When sending messages, the producer library handles much of the complexity behind the scenes:

  • It chooses which partition to send the message to, either by hashing the message key or, when there is no key, by distributing messages across partitions round-robin style.
  • It manages retries and acknowledgments (a send callback, sketched below, lets you observe the outcome), and with idempotence enabled, the default in recent clients, it prevents duplicate writes.
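
send() is asynchronous: it returns immediately and the network I/O happens in the background. If you want to react to the outcome, pass a callback that fires once the broker acknowledges the write or the retries are exhausted. A minimal sketch, reusing the record from the example above:

producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        // Delivery failed even after the producer's retries.
        exception.printStackTrace();
    } else {
        System.out.printf("Wrote to %s, partition %d, offset %d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    }
});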

Data sent by producers is simply bytes to Kafka, meaning you can use any serialization format. While the example showed JSON as a string, Kafka supports built-in serializers for primitive types like Integer, Long, Double, and String. For more structured data, you can leverage the Confluent Schema Registry to seamlessly manage schemas across applications, supporting formats like JSON Schema, Avro, and Protobuf for consistent serialization and deserialization.
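
The serializers themselves are just more configuration. For the Long key and String value used in the example above, the built-in serializers would be wired up in kafka.properties like this:

key.serializer=org.apache.kafka.common.serialization.LongSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer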

Finally, the producer is where the partitioning logic happens. When you send a message, the producer library determines its partition by hashing its key; if no key is provided, it spreads messages across partitions instead (classically round-robin, though newer clients use a "sticky" strategy that fills one batch at a time). This is critical for maintaining per-key ordering and load distribution across the cluster.
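
Conceptually, key-based partition selection boils down to hashing the serialized key and mapping the hash onto the partition count. Here's a simplified sketch of the idea; Kafka's default partitioner actually uses a murmur2 hash rather than Arrays.hashCode:

// Simplified illustration of key-based partitioning: the same key always
// lands in the same partition, as long as the partition count doesn't change.
static int partitionFor(byte[] serializedKey, int numPartitions) {
    int hash = java.util.Arrays.hashCode(serializedKey);
    return (hash & 0x7fffffff) % numPartitions;  // mask the sign bit so the result is non-negative
}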

In summary, producers are how data enters Kafka. They are simple to configure but powerful, handling complex tasks like serialization, partitioning, and retries, allowing developers to send data efficiently and reliably to Kafka topics.
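
One practical detail when you try this yourself: the producer batches messages in the background, so a short-lived program should flush or close the producer before exiting, or buffered messages may never leave the client:

producer.flush();  // block until every buffered record has been sent and acknowledged
producer.close();  // flush remaining records and release network resources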

Do you have questions or comments? Join us in the #confluent-developer community Slack channel to engage in discussions with the creators of this content.


Producers

This is the introduction to Apache Kafka on producers. So for producers, let's get outside of the Kafka cluster proper, those brokers that are doing all the replication, partition management, all the stuff that they do, and think about the applications that use Kafka. This could be code that you write. Producers are the client applications that write into a cluster. And every component of the Kafka platform that's not a broker is, at the end of the day, either a producer or a consumer, or both. Everything that interacts with Kafka is either gonna put stuff in or get stuff out, or both. Producers are how we get stuff in. So let's take a look at them.

Now, the API surface area of the producer library is really pretty simple. I'm gonna show you a little bit of Java code. If you're not a Java person, don't worry. There's official support from Confluent for Python, Go, JavaScript, and the languages of the .NET platform, plus community-based drivers for really any language you could think of. If you find one that's not supported, I wanna know about it, because it's probably something pretty cool. And Java is of course the native language of Kafka. Features show up in the Java library first, and it's the one that I'll be using in this module and the next.

So here's our first producer. It's a thermostat app reading from a smart thermostat. And there is a class conveniently called KafkaProducer that we're gonna use. That's gonna connect to the cluster in our application. That's an API that we would write code against, and it handles all of the network plumbing and getting messages in and making sure they're acknowledged and the trade-offs between low latency and good throughput, batching, all of these things configurable in the producer.

So for those configuration parameters, you give the producer class a map of parameters, of key-value pairs. We're doing that with a properties file here. You see down at the bottom, that's got the bootstrap servers and the acks=all stuff down there. The bootstrap servers list is pointing the producer to the cluster. So that's a list of some of the brokers in the cluster, two or three, right? Now the cluster could have 50 brokers in it. I don't know, maybe it's huge. You don't wanna list all of those, right? Because some of them come and go, and do you really want names? That's a lot. So if you just list a few of them, that gives it a chance that it can connect to at least one of them. So like if the first one is down, but the second one's up, that's fine. It's got one connection. It gets all the metadata back from that broker about the cluster that it needs. And now the producer in this case, in general, the client library, knows what it needs to know about the cluster. It can build connection pools and manage all the network stuff appropriately. Security would be another thing that would show up in there typically.

So this is a very, very clean, nice, pretty example with only two properties. Usually, there's maybe five or ten in a practical case. But to produce, we create that KafkaProducer object. And then another class we need is called ProducerRecord. That's what you use to hold the key-value pair that you wanna send to the cluster. So messages, these events, these are key-value pairs. And so ProducerRecord is the class that we use to create one. It also needs to know the topic that we're sending that message to. That's a part of that producer record. There's more that you could do there: you can have precise control over the timestamp, the partition it gets written to if you're into that kind of thing, and headers; you can define those key-value pair headers in there.

But essentially, this is the core API surface area that you absolutely need to think about how to produce messages. And given the rest of the network plumbing, as I say, going on under there, acknowledging messages, retransmitting, dealing with enforced idempotency, all kinds of things that can be happening. This isn't really trivial code. It seems like it's a simple thing because the API is so simple, but whether you know it or not, you are very glad that someone else wrote this library for you.

And a quick note about the code you just saw: you may have noticed I created an object and then just kind of serialized it into JSON and dumped it in as a string. And that's a little sloppy. As I said earlier in the course, Kafka is agnostic with respect to data format. The key and the value are just bytes. That's all it really knows on the inside. But here at the producer level, you might care, right? You've got some kind of domain object that uses your language's type system. Now Kafka has built-in serializers for primitive types, things like integer, long, double, string. And we made this JSON just a string. So Kafka thought it was just dealing with a string.

But usually, you want something that's going to relate to the actual format of your object, the schema that you've got in use. We're not showing that right now, but we do have a module coming up in a little bit on the Confluent Schema Registry, where we can talk about how that works and a slightly more sophisticated approach to serialization and deserialization.

A little bit earlier, we were talking about partitions. And I said, you know, when you're writing, you have to decide what partition. And we talked about hashing and round robining and all of that. Well, it's the producer library that makes that decision, that does that hashing and decides on the partition to send the message to. So that's another little thing that's being done by this code in the producer.

And that really is your introduction to the Kafka producer. This might be a place where you start to get your hands dirty with some code. At some point, you're gonna need to get your fingers on the keyboard. If you are a developer watching this, really see how the API works, type things out for yourself, watch them run. That's important. But next up, we're gonna look at the other half of producing: consuming.
