Course: Building Data Pipelines with Apache Kafka® and Confluent

Streaming Data from Kafka to External Systems


Tim Berglund

VP Developer Relations

Robin Moffatt

Principal Developer Advocate (Author)


As this course draws to a close, we have one final piece to put into our pipeline. We've got the data in, we've processed and enriched it—now let's do something with it and write it to another cloud service or external system to drive the operational dashboard that we hypothesized about in our example.

Just as we used Kafka Connect to get the customer data in from the database earlier, we're going to use Kafka Connect again here to stream the data to the target system. There are connectors for most places you'd want to stream data to nowadays—object stores, cloud data warehouses, NoSQL stores, and so on.

We're going to use Elasticsearch, with the managed connector in Confluent Cloud. You can also run the connector yourself in a self-managed Kafka Connect cluster. To learn more about Kafka Connect itself, check out the Kafka Connect course.

The Elasticsearch connector is fully managed and just requires a few details to configure:

[Image: Configuring the Elasticsearch sink connector in Confluent Cloud]
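If you prefer to define connectors in code, ksqlDB can create them too. Here's a minimal sketch of what the equivalent self-managed configuration might look like; the connector name, connection URL, and topic name are illustrative placeholders, and the managed connector in Confluent Cloud asks for the same kind of details through the UI:

-- A sketch of an Elasticsearch sink connector declared from ksqlDB.
-- The URL, topic, and connector name below are placeholders.
CREATE SINK CONNECTOR SINK_ELASTIC_RATINGS WITH (
  'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
  'connection.url'  = 'http://elasticsearch:9200',
  'topics'          = 'ratings-enriched',
  -- Let the connector send documents without requiring a registered schema
  'schema.ignore'   = 'true',
  'key.ignore'      = 'true',
  'type.name'       = '_doc'
);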

With the data flowing into the target system, we can now build out the operational dashboard that we had in mind at the beginning of this exercise.

[Image: Kibana dashboard visualizing the enriched ratings data]

Summary

Let's recap what we've built. We started with a stream of events, each one with review rating information submitted by a customer. The data for these customers is held in a relational database table, which we also streamed into Apache Kafka:

[Image: Rating events and customer data streaming from the database into Kafka]
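As a sketch of that ingest step, a JDBC source connector declared from ksqlDB might look something like the following. The database coordinates, table, and topic prefix here are hypothetical; the connector and settings used in the course may differ:

-- A hypothetical JDBC source connector pulling the customers table into Kafka.
CREATE SOURCE CONNECTOR SOURCE_CUSTOMERS WITH (
  'connector.class'       = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'        = 'jdbc:mysql://mysql:3306/demo',  -- placeholder database
  'connection.user'       = 'connect_user',
  'connection.password'   = '<password>',
  'mode'                  = 'timestamp',        -- pick up new and updated rows
  'timestamp.column.name' = 'UPDATE_TS',        -- assumed audit column
  'table.whitelist'       = 'customers',
  'topic.prefix'          = 'db-'               -- writes to the topic db-customers
);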

We then used stream processing to enrich each rating event as it arrived with information about the customer who left the review. The resulting enriched data was written back into a new stream and then streamed to Elasticsearch, and from there we built a dashboard on the data.
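In ksqlDB terms, that enrichment step is a stream-table join: each event on the ratings stream is joined to the customers table as it arrives. A minimal sketch, with the stream, table, and column names assumed for illustration:

-- Join each rating event to the customer who submitted it.
-- RATINGS, CUSTOMERS, and the column names are illustrative.
CREATE STREAM RATINGS_ENRICHED AS
  SELECT R.RATING_ID,
         R.STARS,
         R.MESSAGE,
         C.FULL_NAME,
         C.CLUB_STATUS
  FROM RATINGS R
  LEFT JOIN CUSTOMERS C
    ON R.USER_ID = C.ID
  EMIT CHANGES;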

Use the promo code PIPELINES101 to receive $25 of free Confluent Cloud usage


Streaming Data from Kafka to External Systems

Hi, I'm Tim Berglund with Confluent. Welcome to Data Pipelines lesson six, Streaming Data from Kafka into External Systems.

As we finish up our review of data pipelines, we have one thing left to put in place. We've got data in, and we've done real-time computation on that data using ksqlDB. Now let's go put it somewhere so somebody can see it, and so it can participate in the operational dashboard that we hypothesized about in our example.

Just as we used Kafka Connect to get data in from the database earlier, we're going to use Connect again here to stream it out to the target system. There are connectors for most places you'd want to stream data to nowadays: object stores, cloud data warehouses, NoSQL stores, things like that. Really, at the time of this recording, Confluent Cloud's coverage of the managed connectors that you're probably going to want for this kind of purpose is pretty complete.

We're going to use Elasticsearch with the managed connector in Confluent Cloud. You can also run it for yourself if you'd rather, but you'd be making life a lot harder for yourself, so let's stick with Cloud for now. These connectors are fully managed: you click to say you want to add one, you fill in the web UI, and that's really all you need to do. Now, as I've said before, we're focusing on pipelines and building pipelines with Connect. If you want to learn more about Connect itself, check out the Kafka Connect course that's available elsewhere on Confluent Developer.

Once that output connection, that sink connection, is established and the data is flowing into the target system, we can build out the operational dashboard that we had in mind at the beginning of the exercise. That's not a Confluent Cloud thing at this point; this lives in the sink system. This is Kibana, and you can build this dashboard and visualize the real-time data that's flowing through our pipeline.

Let's recap what we've built. We've taken a stream of events from an external system, in this case our data simulator connector, carrying rating review information submitted by customers. We brought in a live copy of the data held in a database, records about those customer entities, using Kafka Connect to stream it from that database into Confluent Cloud. We used stream processing with ksqlDB to enrich each rating event as it arrived with information about the customer, drawn from that customer table we brought in from the database. The resulting enriched data was written back into a new stream, a new topic in Kafka, and then connected out to Elasticsearch, where we built a dashboard around it.

So that wraps it up. Those are all the components of a data pipeline. We've got lots of materials for you to dive deeper into each of these pieces: more about ksqlDB, more about Kafka Connect. They're all worth a lot of study, but this review should get all the pieces in mind, show you what we mean by a pipeline and what the parts are, and give you enough to start thinking, and hopefully to start building.