VP Developer Relations
Principal Developer Advocate (Author)
As this course draws to a close, we have one final piece to put into our pipeline. We've got the data in, we've processed and enriched it—now let's do something with it and write it to another cloud service or external system to drive the operational dashboard that we hypothesized about in our example.
Just as we used Kafka Connect to get the customer data in from the database earlier, we're going to use Kafka Connect again here to stream the data to the target system. There are connectors for most places you'd want to stream data to nowadays—object stores, cloud data warehouses, NoSQL stores, and so on.
We're going to use Elasticsearch, with the managed connector in Confluent Cloud. You can also run it for yourself in a self-managed Kafka Connect cluster. To learn more about Kafka Connect itself, check out the Kafka Connect course.
The Elasticsearch connector is fully managed and requires only a few details to configure.
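As a rough illustration of the kind of details involved, the sketch below assembles a connector configuration as a plain Python dictionary and checks that nothing required is missing. The property names are assumptions modeled on the Confluent Cloud Elasticsearch sink connector and should be verified against the current connector documentation. The connector name, topic name, URL, and credentials are hypothetical placeholders, not values from this course.

```python
import json

# Settings this sketch treats as required. The property names are assumptions
# modeled on the Confluent Cloud Elasticsearch sink connector; check the
# connector documentation before relying on them.
REQUIRED_KEYS = (
    "connector.class",
    "topics",
    "kafka.api.key",
    "kafka.api.secret",
    "connection.url",
    "connection.username",
    "connection.password",
)


def build_elasticsearch_sink_config(topic, es_url, es_user, es_password,
                                    kafka_api_key, kafka_api_secret):
    """Return a connector configuration as a dict of string settings."""
    return {
        "name": "sink-elastic-ratings",        # hypothetical connector name
        "connector.class": "ElasticsearchSink",
        "topics": topic,                       # Kafka topic to stream out
        "input.data.format": "AVRO",           # assumed serialization format
        "kafka.api.key": kafka_api_key,
        "kafka.api.secret": kafka_api_secret,
        "connection.url": es_url,
        "connection.username": es_user,
        "connection.password": es_password,
        "key.ignore": "true",
        "schema.ignore": "true",
        "tasks.max": "1",
    }


def missing_settings(config):
    """Return the sorted list of required settings that are absent or empty."""
    return sorted(key for key in REQUIRED_KEYS if not config.get(key))


if __name__ == "__main__":
    config = build_elasticsearch_sink_config(
        topic="ratings-enriched",              # hypothetical topic name
        es_url="https://elastic.example.com:9243",
        es_user="elastic",
        es_password="<elasticsearch-password>",
        kafka_api_key="<kafka-api-key>",
        kafka_api_secret="<kafka-api-secret>",
    )
    print("missing settings:", missing_settings(config))
    print(json.dumps(config, indent=2))
```

In the managed service, the same values are entered through the web UI, so a dictionary like this is mainly useful for reviewing the settings before creating the connector.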
With the data flowing into the target system, we can now build out the operational dashboard that we had in mind at the beginning of this exercise.
Let's recap what we've built. We started with a stream of events, each one with review rating information submitted by a customer. The data for these customers is held in a relational database table, which we also streamed into Apache Kafka.
We then used stream processing to enrich each rating event as it arrives with information about the customer who left the review. The resulting enriched data was written back into a new stream and then streamed to Elasticsearch, and from there we built a dashboard for the data.
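The enrichment step amounts to a lookup join between a stream of events and a table of reference data. The course does this with ksqlDB. The plain-Python sketch below only illustrates the idea and is not the ksqlDB implementation. The field names (`user_id`, `name`, `email`, `stars`) are hypothetical, and the choice to drop ratings with no matching customer (inner-join behavior) is an assumption made for this example.

```python
from typing import Dict, Iterable, Iterator


def enrich_ratings(ratings: Iterable[dict],
                   customers: Dict[int, dict]) -> Iterator[dict]:
    """Yield each rating event merged with its customer's details.

    `customers` plays the role of the table: the latest known record per
    customer id. `ratings` plays the role of the event stream.
    """
    for rating in ratings:
        customer = customers.get(rating["user_id"])
        if customer is None:
            # Assumed inner-join behavior: skip ratings with no known customer.
            continue
        yield {
            **rating,
            "customer_name": customer["name"],
            "customer_email": customer["email"],
        }


if __name__ == "__main__":
    # Hypothetical sample data standing in for the database table and stream.
    customers = {
        1: {"name": "Ada Example", "email": "ada@example.com"},
    }
    ratings = [
        {"rating_id": 10, "user_id": 1, "stars": 4},
        {"rating_id": 11, "user_id": 2, "stars": 1},  # no matching customer
    ]
    for enriched in enrich_ratings(ratings, customers):
        print(enriched)
```

A left join that keeps unmatched ratings with empty customer fields would be an equally reasonable choice. Which one fits depends on whether the dashboard should show ratings from customers that are not yet in the table.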
Hi, I'm Tim Berglund with Confluent. Welcome to Data Pipelines lesson six, Streaming Data from Kafka into External Systems. All right, as we finish up our review of data pipelines, we have one thing left to put in place. We've got data in, and we've done real-time computation on that data using ksqlDB. Now let's go put it somewhere so somebody can see it, and so it can feed the operational dashboard that we hypothesized about in our example.
So just as we used Kafka Connect to get data in from the database earlier, we're going to use Connect again here to stream it out to the target system. There are connectors for most places you'd want to stream data to nowadays: object stores, cloud data warehouses, NoSQL stores, things like that. Really, at the time of this recording, Confluent Cloud's coverage of the managed connectors that you're probably going to want for this kind of purpose is pretty complete. We're going to use Elasticsearch with the managed connector in Confluent Cloud. You can also run it for yourself if you'd rather, but you'd be making life a lot harder for yourself, so let's stick with cloud for now. These connectors are fully managed. You click and say you want to add it, you fill in the web UI, and that's really all you need to do.
Now, as I've said before, we're focusing really on pipelines and building pipelines with Connect. If you want to learn more about Connect itself, check out the Kafka Connect course that's available elsewhere on Confluent Developer.
Now, once that output connection, that sink connection, is established and that data's flowing into the target system, then we can build out the operational dashboard that we had in mind at the beginning of the exercise. And that's not a Confluent Cloud thing at this point, this is in the sink system. So this is Kibana, and you can build this dashboard and visualize the real-time data that's flowing through our pipeline. Let's recap what we've built.
So, we've taken a stream of events from an external system, in this case our data simulator connector, that had rating review information submitted by customers. We brought in a live copy of the data held in a database that held individual records about those customer entities. We brought that in using Kafka Connect from that database into Confluent Cloud. And we used stream processing with ksqlDB to enrich each rating event as it arrived with information about the customer that came from that customer table that we brought in from the database. The resulting enriched data was written back into a new stream, a new topic in Kafka, and then connected out to Elasticsearch, where we built a dashboard around it.
So that wraps it up. That's all the components of a data pipeline. And we've got lots of materials for you to deep dive into each one of these things a little bit more specifically, more about ksqlDB, more about Kafka Connect. These things are all worth a lot of study, but this review should put all the pieces in mind and really show you what this pipeline is all about, what we mean by pipeline, what the pieces are, and give you enough to start thinking and hopefully to start building.