In the world of data, many systems are not Apache Kafka®: they might be relational databases, SaaS applications, Elasticsearch, or even legacy file systems. To integrate these with Kafka, we use Kafka Connect, a framework for streaming data between Kafka and external systems without writing custom integration code.
Kafka Connect is both an ecosystem of connectors and a distributed service for running them. It allows data to flow:
- into Kafka from external systems, via source connectors
- out of Kafka to external systems, via sink connectors
These connectors are pluggable components that you can configure declaratively—no need to write code. Just specify connection details, topics, and basic settings, and Kafka Connect handles the rest. Connect itself runs independently of the Kafka brokers and can be deployed as a single instance or a cluster for scalability and fault tolerance.
For example, a Source Connector might read Change Data Capture (CDC) events from a Postgres database and produce them to a Kafka topic. On the other side, a Sink Connector can consume from that topic and write the events to Elasticsearch for indexing.
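To make that concrete, here's a sketch of what the source half of such a pipeline could look like as a connector configuration. This assumes the Debezium Postgres connector is installed on the Connect workers; the hostname, credentials, and database name are placeholders, and exact property names vary by connector version.

```json
{
  "name": "users-postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.example.com",
    "database.port": "5432",
    "database.user": "connect_user",
    "database.password": "secret",
    "database.dbname": "users_db",
    "topic.prefix": "users_db"
  }
}
```

Note that this is pure configuration: the connector class does all the CDC work, and the change events land in Kafka topics prefixed with users_db.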
Connectors also support lightweight, stateless transformations called Single Message Transforms (SMTs). These allow you to modify the data as it flows through Connect:
- filtering messages
- adding or renaming fields
- masking PII or modifying values
- extracting a field from the message value to use as the key
For example, a source connector reading from Postgres might add a "source_system" field to every message before writing it to Kafka. This field didn't exist in the original database, but SMTs allow it to be enriched on the fly. Importantly, these transforms are stateless—they operate on each message independently. If you need stateful processing, you'd typically use Flink (or Kafka Streams).
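In a connector configuration, that kind of enrichment could look something like the following fragment, which uses Kafka Connect's built-in InsertField transform. (A sketch: the transform alias addSource is arbitrary, source_system and users_db come from the example above, and these properties would sit inside the connector's config block.)

```json
"transforms": "addSource",
"transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addSource.static.field": "source_system",
"transforms.addSource.static.value": "users_db"
```

The $Value suffix tells Connect to apply the transform to the message value; an InsertField$Key variant exists for keys.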
One of the biggest strengths of Kafka Connect is its ecosystem of pre-built connectors. Many of the common systems you'll want to integrate with already have robust, community-tested connectors. Confluent hosts a Connect Hub where you can find:
- over a hundred pre-built connectors supported by Confluent
- community-contributed connectors covering a long tail of systems
- fully managed connectors that run in Confluent Cloud
While the quality of community connectors varies, the major use cases are well covered and widely used. For cloud users, Confluent Cloud provides fully managed connectors, eliminating the need to manage Connect clusters yourself. You can now even bring your own connector to run in Confluent Cloud's managed Connect service!
Kafka Connect is the standard way of moving data between Kafka and the outside world. It enables seamless integration with databases, cloud services, analytics engines, and more—ensuring your Kafka streams are easily accessible to any system that needs them. Whether you're running Kafka on-premises or in the cloud as a managed service, Connect provides a scalable, fault-tolerant way to manage data integration with minimal code and maximum reliability.
For a more detailed introduction to Kafka Connect, check out the Kafka Connect 101 course.
This is the introduction to Apache Kafka on Kafka Connect. It's a fact of the world in information storage and retrieval systems that some of those systems are not Kafka, the nerve of some people. Now that might be unfortunate, maybe not, I don't know. You can form your own opinion, but you have to talk to things that aren't Kafka. You have to get stuff from out there, get it into a topic, take things from the topic and put it into that thing over there.
This is the job of Kafka Connect, which is Kafka's integration API, really an integration subsystem. On the one hand, Connect is an ecosystem of pluggable connectors that you can drop in, declaratively configure, and run. On the other hand, it's a client application. It's its own little distributed system that runs these connectors, producing into Kafka and consuming from Kafka. Remember, anything that isn't a broker is a producer or a consumer, and Connect is no exception.
As a client application, let's take a look at this. We have a little Kafka cluster here, and here we have these nervy systems out there that aren't Kafka: they're relational databases, they're SaaS applications, maybe there's some Elasticsearch. There's all kinds of things going on in the world, right? But they're definitely not Kafka. Connect is this client application running outside of the brokers. Now, I'm showing one instance on either side of the cluster. It's possible to run Connect as a single instance, but often it's a cluster of several for fault tolerance and for scale. Just to illustrate here, you've got source connectors, that's reading from something and producing to a topic, and sink connectors, that's consuming from a topic and writing to some other system out there.
And so when you're being a source connector, you're gonna read from, say, that database there, that could be some Change Data Capture process, produce to a Kafka topic, and there could be a sink connector looking at that same topic that says, oh, hey, good, a message. I'm gonna read it and write it now to some other external database. So messages flow through Kafka from external systems to other external systems using Kafka Connect. And again, Connect itself is a distributed system. There could be many of these instances depending on the kind of load you have, the number of connectors that you're running, and the kinds of systems that you're talking to.
I said it was declarative configuration, and that's what it is. Now, you don't actually want to write the code that talks to Elasticsearch. Somebody has already done that, and you're not gonna make the world better if you do it yourself. So that code exists; you just have to plug in some configuration, some security parameters, tell it where the Elastic cluster is and what topic you're reading from or writing to (in the case of Elastic, it's usually a sink connector, so you're reading from a topic). So you get to write this little piece of JSON, throw it at a REST endpoint on the Connect cluster, make sure the connector class is loadable by that Connect instance, and off you go.
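For instance, a minimal Elasticsearch sink configuration might look like the JSON below, which you'd POST to the Connect cluster's REST API at the /connectors endpoint (port 8083 by default). This is a sketch: the connector class is Confluent's Elasticsearch sink, and the topic name and URL are placeholders.

```json
{
  "name": "orders-elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "orders",
    "connection.url": "http://elasticsearch.example.com:9200",
    "key.ignore": "true",
    "schema.ignore": "true"
  }
}
```

Once it's accepted, a GET to /connectors/orders-elasticsearch-sink/status will tell you whether the connector and each of its tasks are running.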
It's possible for connectors to do little tiny bits of stateless stream processing on the messages that flow through them. That's a little bit of a spicy way to put it, but single message transforms are things that you can configure in the connector configuration. You can do things like filter messages, or add fields to provide a little bit of context. You could rename a field, mask PII, modify certain values. Maybe look into a message value and extract something as a key; that's kind of on the sophisticated end of things.
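That last trick, pulling a key out of the message value, is typically done by chaining Connect's built-in ValueToKey and ExtractField transforms, roughly like this (a sketch; the field name user_id and the transform aliases are hypothetical):

```json
"transforms": "makeKey,extractKey",
"transforms.makeKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.makeKey.fields": "user_id",
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKey.field": "user_id"
```

ValueToKey copies the named field from the value into the key, and ExtractField$Key then replaces that struct key with the bare field value.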
Here we have a source connector that reads records from a Postgres database and adds a new field called source_system to every message and sets its value to users_db. So that source system is now going to be in the topic, in messages in the topic, even though that was nowhere in the database table that we are sourcing from, that we're capturing change data from. The important thing about single message transforms is that they are entirely stateless. If you want to do stateful things in single message transforms, number one, you can't. Number two, you should look to some stream processing. In other words, get the data into Kafka and then do the stream processing with something like Flink or Kafka Streams once the data is in.
One of the big advantages of Connect is the huge ecosystem of connectors that somebody else has written and probably lots of people have deployed and tested. There's a classical power law distribution here. So, you know, 5% or 10% of the possible connectors in the world are gonna account for 90% of the actual data integration work that you wanna do. So whatever it is you're connecting to, it's probably something pretty standard and that connector has probably been battle tested for a long time. So there's a huge advantage there.
And there's a long tail of these things too. It's not just the biggies. There are over 4,000 connectors on GitHub in various stages of usefulness. You know how it is with stuff in the long tail: maybe somebody wrote it and hasn't maintained it in a few years, and it worked for their use case but might not work for yours. You know, it's that kind of code. It's not free as in speech. It's not free as in beer. It's free as in puppy. But a lot of those things are covered, and the big use cases are really covered well.
To know what these are, Confluent has a thing called the Connect Hub. You can go to hub.confluent.io or confluent.io/hub, whatever you prefer, browse through here. This isn't just about the connectors that we provide as a part of our cloud product or in our on-premise product called Confluent Platform. There are a number of connectors we've got here in the hub, things that we're aware of. But we do support over a hundred pre-built connectors. And in our fully managed cloud service, we support over 80 Kafka Connect connectors that are just things you can configure. They run in our cloud infrastructure. You never worry about a Connect cluster or anything like that.
So you should know about Connect. You have to understand what Connect is, understand something about the ecosystem of connectors. You may choose to operate that on your own. You may get that as a benefit of a cloud service that you use, but it is the standard way of integrating Kafka with other systems in the world that aren't Kafka.