Course: Apache Kafka® 101

Kafka Connect

6 min

Tim Berglund

VP Developer Relations


Gilles Philippart

Software Practice Lead

Kafka Connect: Integrating Kafka With the Outside World

When building a real-time data platform, one of the biggest challenges is getting data in and out of Kafka from the many systems you already use—like databases, APIs, and SaaS tools. That’s where Kafka Connect comes in.

What You’ll Learn

Kafka Connect makes Kafka extensible. Rather than writing custom code to sync data from source to sink, you can use plug-and-play connectors that handle it for you—no need to reinvent the wheel.

In this module, you’ll explore:

  • What Kafka Connect is and how it fits into a streaming architecture
  • The role of source and sink connectors in data pipelines
  • The basics of lightweight transformations using SMTs (Single Message Transforms)
  • How Confluent makes this easier with fully managed connectors in Confluent Cloud

Why Connectors Matter

Think of connectors as adapters:

  • Source Connectors bring external data into Kafka (like capturing changes in Salesforce)
  • Sink Connectors send Kafka data to external systems (like pushing records into Elasticsearch)

With just a few configuration settings, you can stream data between Kafka and your existing tools—without writing producer or consumer code.
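
As a sketch of what that configuration looks like (options vary by connector; the connection URL, credentials, and topic prefix here are placeholders), a JDBC source connector polling a MySQL table into Kafka might be set up like this:

    {
      "name": "mysql-orders-source",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:mysql://mysql.example.com:3306/shop",
        "connection.user": "connect",
        "connection.password": "********",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "shop-"
      }
    }

That's the whole integration: configuration describing where the data lives and where it should land, with no producer code in sight.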

Transforming Data on the Fly

Kafka Connect also supports lightweight message transformations. For example, you might enrich each event with a field that wasn't in the source system (like a region or device type). These transformations are stateless and fast, ideal for simple data shaping tasks before processing begins.
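
A minimal sketch of what that looks like in a connector's configuration, using the InsertField transform that ships with Kafka Connect (the addRegion alias and the field values are made up for illustration):

    "transforms": "addRegion",
    "transforms.addRegion.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addRegion.static.field": "region",
    "transforms.addRegion.static.value": "eu-west-1"

Every record flowing through the connector now carries a region field, even though the source system never had one.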

Powered by a Vast Connector Ecosystem

Confluent maintains an extensive hub of production-ready connectors for popular platforms like MySQL, Snowflake, MongoDB, and BigQuery. In Confluent Cloud, many of these connectors are fully managed, meaning you don’t have to worry about infrastructure at all.

Why This Matters for You

If you're working on a project where data comes from multiple sources—or needs to land in different systems—Kafka Connect is the tool that makes it all work together. It’s the bridge between Kafka and the rest of your stack, allowing you to build robust, scalable, and integrated pipelines without needing to write a lot of code.

For a more detailed introduction to Kafka Connect, check out the Kafka Connect 101 course.

Do you have questions or comments? Join us in the #confluent-developer community Slack channel to engage in discussions with the creators of this content.

Use the promo codes KAFKA101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Kafka Connect

This is the Apache Kafka 101 introduction to Kafka Connect. It's a fact of the world in information storage and retrieval systems that some of those systems are not Kafka (the nerve of some people). Now that might be unfortunate, maybe not, I don't know, you can form your own opinion, but you have to talk to things that aren't Kafka. You have to get stuff from out there and get it into a topic, and take things from the topic and put them into that thing over there.

This is the job of Kafka Connect, which is Kafka's integration API, really its integration subsystem. On the one hand, Connect is an ecosystem of pluggable connectors that you can drop in, declaratively configure, and run. On the other hand, it's a client application: its own little distributed system that runs these connectors, producing into Kafka and consuming from Kafka. Remember, everything that isn't a broker is a producer or a consumer, and Connect is no exception.

As a client application, let's take a look at this. We have a little Kafka cluster here, and out there we have these nervy systems that aren't Kafka: relational databases, SaaS applications, maybe some Elasticsearch. There are all kinds of things going on in the world, right? But they're definitely not Kafka. Connect is a client application running outside of the brokers. Now, I'm showing one instance on either side of the cluster. It's possible to run Connect as a single instance, but often it's a cluster of several for fault tolerance and for scale. Just to illustrate: you've got source connectors, which read from something and produce to a topic, and sink connectors, which consume from a topic and write to some other system out there.

And so when you're a source connector, you're gonna read from, say, that database there (that could be some Change Data Capture process) and produce to a Kafka topic, and there could be a sink connector looking at that same topic that says, oh hey, good, a message, I'm gonna read it and write it to some other external database. So messages flow through Kafka from external systems to other external systems using Kafka Connect. And again, Connect itself is a distributed system; there could be many of these instances depending on the kind of load you have, the number of connectors you're running, and the kinds of systems you're talking to.

I said it was declarative configuration, and that's what it is. Now, you don't actually want to write the code that talks to Elasticsearch. Somebody has already done that, and you're not gonna make the world better if you do it yourself. So that code exists; you just have to plug in some configuration and some security parameters, tell it where the Elastic cluster is, and say what topic you're reading from or writing to (in the case of Elastic it's usually a sink connector, so you're usually reading from a topic). You get to write this little piece of JSON, throw it at a REST endpoint on the Connect cluster, make sure the connector class is loadable by that Connect instance, and off you go.
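
As a concrete sketch (the topic and connection URL here are placeholders), registering an Elasticsearch sink connector is one POST of a JSON payload like this to the /connectors endpoint of a Connect worker, which listens on port 8083 by default:

    {
      "name": "orders-elasticsearch-sink",
      "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "orders",
        "connection.url": "http://elasticsearch.example.com:9200",
        "key.ignore": "true"
      }
    }

A real deployment would also carry converter and security settings, but that's the whole shape of it: no consumer code, just configuration.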

It's possible for connectors to do little tiny bits of stateless stream processing on the messages that flow through them. That's a little bit of a spicy way to put it, but single message transforms (SMTs) are things that you can configure in the connector configuration. You can do things like filter messages, add fields to provide a little bit of context, rename a field, mask PII, or modify certain values. You could even look into a message value and extract something as a key; that's on the sophisticated end of things.
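
For instance, here's a sketch of chaining two of the transforms that ship with Kafka Connect, MaskField to blank out PII fields and ValueToKey to promote a value field into the record key (the field names are hypothetical):

    "transforms": "maskPii,setKey",
    "transforms.maskPii.type": "org.apache.kafka.connect.transforms.MaskField$Value",
    "transforms.maskPii.fields": "email,phone",
    "transforms.setKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.setKey.fields": "user_id"

Transforms run in the order they're listed, one message at a time.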

Here we have a source connector that reads records from a Postgres database, adds a new field called source_system to every message, and sets its value to users_db. So source_system is now going to be in the messages in the topic, even though it was nowhere in the database table we're capturing change data from. The important thing about single message transforms is that they are entirely stateless. If you want to do stateful things in single message transforms: number one, you can't. Number two, you should look to stream processing instead. In other words, get the data into Kafka and then do the stream processing with something like Flink or Kafka Streams once the data is in.
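
Putting that together, that connector's configuration might look roughly like this sketch, here assuming the Debezium Postgres CDC connector (hostname, credentials, and names are placeholders, and Debezium's exact keys vary a bit by version):

    {
      "name": "users-db-source",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres.example.com",
        "database.port": "5432",
        "database.user": "connect",
        "database.password": "********",
        "database.dbname": "users_db",
        "topic.prefix": "users_db",
        "transforms": "addSource",
        "transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
        "transforms.addSource.static.field": "source_system",
        "transforms.addSource.static.value": "users_db"
      }
    }

The transforms.* lines are the SMT; everything else is just telling the connector where the database lives.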

One of the big advantages of Connect is the huge ecosystem of connectors that somebody else has written and probably lots of people have deployed and tested. There's a classical power law distribution here. So, you know, 5% or 10% of the possible connectors in the world are gonna account for 90% of the actual data integration work that you wanna do. So whatever it is you're connecting to, it's probably something pretty standard and that connector has probably been battle tested for a long time. So there's a huge advantage there.

And there's a long tail of these things too; it's not just the biggies. There are over 4,000 connectors on GitHub in various stages of usefulness. You know how it is with stuff in the long tail: maybe somebody wrote it, hasn't maintained it in a few years, and it worked for their use case, but it might not work for yours. It's that kind of code. It's not free as in speech, it's not free as in beer, it's free as in puppy. But a lot of those things are covered, and the big use cases are really covered well.

To find out what these are, Confluent has a thing called Confluent Hub. You can go to hub.confluent.io or confluent.io/hub, whichever you prefer, and browse through. This isn't just about the connectors we provide as part of our cloud product or our on-premise product, Confluent Platform; there are a number of connectors in the hub, things that we're aware of. But we do support over a hundred pre-built connectors, and in our fully managed cloud service we support over 80 Kafka Connect connectors that are just things you can configure. They run in our cloud infrastructure, and you never worry about a Connect cluster or anything like that.

So you should know about Connect. You have to understand what Connect is, understand something about the ecosystem of connectors. You may choose to operate that on your own. You may get that as a benefit of a cloud service that you use, but it is the standard way of integrating Kafka with other systems in the world that aren't Kafka.
