Staff Software Practice Lead
A Flink datastream consists of multiple moving parts, including Sources, Sinks, and Operators. How these pieces fit together can be compared to a plumbing system, with data flowing through the stream much like water flowing through pipes. This analogy is especially useful for understanding the role of backpressure in the stream. This video walks through the various components in a datastream, using the plumbing analogy to explain each stage.
Hi, I'm Wade from Confluent. In this video, we'll explore the basic structure of a Flink datastream, including sources, sinks, and operators. When trying to imagine datastreams, I like to picture them as water pipes. Conceptually, there's a lot of similarity between datastreams and plumbing, and we can use that analogy to better understand how they work. If we want to push water through a plumbing system, the first thing we need is a water source.

Water Source
Typically, for a plumbing system, that source would take the form of a water pump. The job of the water pump is to draw water from a lake or reservoir and push it downstream where it can be used.

Data Source
Datastreams also have a source, but rather than drawing water into the stream, they pull data from somewhere. This might come from a database, but often it comes from a streaming source, such as a Kafka topic. Either way, their job is the same as the water pump's: they pull in the data and push it downstream.

Destination
Let's go back to our plumbing analogy. The other key thing we need is a destination for our water. This is typically going to be something like a shower or a sink in a building somewhere. When we turn on the faucet, water flows out and collects in the sink.

Sink
Datastreams also have a sink. In this case, however, the purpose of the sink is to store our data. We might consume the data and dump it into a database, or we might push it to a Kafka topic. Whatever we decide to do with it, this is the end point of our stream.

End Point
Of course, as with plumbing, it might not be the final destination. When we pour water into our sink, some of it may go down the drain, where it encounters even more plumbing, but that water has left our original plumbing system and entered something new. Similarly, a datastream might connect to a Kafka topic to be consumed somewhere else, but our datastream is now finished with it.
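The source-to-sink flow described above can be sketched in plain Java. This is a conceptual toy, not the actual Flink DataStream API: `ToySource`, `ToySink`, and the in-memory list standing in for a Kafka topic are all illustrative names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy "source": like a water pump, it pulls records from somewhere upstream
// (here, a List stands in for a Kafka topic) and pushes them downstream.
class ToySource {
    private final List<String> upstream;
    ToySource(List<String> upstream) { this.upstream = upstream; }
    void run(Consumer<String> downstream) {
        for (String record : upstream) {
            downstream.accept(record); // push each record into the "pipes"
        }
    }
}

// Toy "sink": the end point of the stream. A real sink might write to a
// database or another Kafka topic; this one just stores what arrives.
class ToySink implements Consumer<String> {
    final List<String> stored = new ArrayList<>();
    public void accept(String record) { stored.add(record); }
}

public class PipelineSketch {
    static List<String> runPipeline(List<String> input) {
        ToySink sink = new ToySink();
        new ToySource(input).run(sink);
        return sink.stored;
    }

    public static void main(String[] args) {
        // Everything the source pulls in ends up stored at the sink.
        System.out.println(runPipeline(List.of("a", "b", "c"))); // [a, b, c]
    }
}
```

The key point of the sketch is the division of labor: the source is the only piece that knows where data comes from, and the sink is the only piece that knows where it ends up.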
Pipes
The final thing we need in our plumbing system is something to carry the water from the source to our sink. This is done using pipes. The pipes can do a lot with the water flowing through them. They can change direction to get the flow to the right place, they can widen or narrow to adjust the speed and pressure of the water, or they can branch off into multiple pipes.

Operators
In a Flink datastream, this role is taken on by the operators. These operators allow us to transform our data in various ways. We can alter the data to look different from the input, we can perform operations that increase or decrease the flow of data, such as filtering out data we don't care about, and we can create branch points in our data flow that allow us to fan out to multiple streams or join streams together.

Back Pressure
One of the key similarities between our plumbing analogy and datastreams is the role of backpressure. In our plumbing analogy, when we turn off the faucet, that causes backpressure in the pipes. This pressure propagates all the way back to the water source and prevents any more water from flowing. If we didn't have this backpressure, the source would keep pushing water with nowhere for it to go. The result could be a catastrophic failure somewhere in the pipes. In our datastreams, we have a similar problem. If a portion of our stream is too slow, it can cause data to build up. If we let that happen unchecked, eventually our application would fail. Thankfully, Flink supports end-to-end backpressure. If an operator is slow, Flink will prevent upstream operators from sending more data. This backpressure can propagate through the system all the way back to the source, helping to prevent catastrophic failures. So that's a rough outline of the key components in a Flink datastream.
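Both the operator's job and the effect of backpressure can be seen in a toy two-stage pipeline in plain Java. Again, this is a conceptual sketch, not Flink's actual runtime: the bounded `ArrayBlockingQueue` merely plays the role of the bounded buffers between stages, and all class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureSketch {
    static List<Integer> runPipeline() throws InterruptedException {
        // Bounded buffer between stages: when it fills up, the upstream
        // stage blocks -- a crude stand-in for end-to-end backpressure.
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(2);
        List<Integer> sunk = new ArrayList<>();

        // Fast "source": put() blocks whenever the buffer is full, so the
        // slow downstream stage automatically throttles this one.
        Thread source = new Thread(() -> {
            try {
                for (int i = 1; i <= 6; i++) buffer.put(i);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Slow "operator + sink": filters and transforms records (the
        // operator's job), then stores the survivors (the sink's job).
        Thread downstream = new Thread(() -> {
            try {
                for (int i = 0; i < 6; i++) {
                    Thread.sleep(10);           // simulate a slow stage
                    int record = buffer.take(); // freeing a slot releases pressure
                    if (record % 2 == 0) {      // filter: drop odd records
                        sunk.add(record * 10);  // map: transform before storing
                    }
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        source.start();
        downstream.start();
        source.join();
        downstream.join();
        return sunk;
    }

    public static void main(String[] args) throws InterruptedException {
        // Despite the slow consumer, nothing is lost: the source simply waits.
        System.out.println(runPipeline()); // [20, 40, 60]
    }
}
```

Note what does not happen here: the source never overruns the buffer and no records are dropped, because the blocking `put` pushes the pressure back upstream, just as the analogy describes.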
There's a lot more going on under the hood that we haven't covered here, but this should be enough of a tour that we can start putting some of these pieces together in real applications. If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course and its hands-on exercises.