Watermark alignment is a relatively new feature in Apache Flink. This video breaks down the scenario when you need to temporally join streams of different event frequencies.
Senior Curriculum Developer
Watermark alignment is a relatively new feature in Apache Flink. It lets you cope with the problem of needing to temporally join streams with mismatched event frequencies, e.g., one stream’s updates are much more frequent than those of the stream(s) with which you need to join it. In this video we’ll break the feature down, and relate how it can help you better manage your Apache Flink integration.
Hi, I’m Dan Weston.
In this Apache Flink in Action, we’ll be talking
about a relatively new feature of Apache Flink, Watermark Alignment.
In this series of videos, we typically talk about problems that are difficult to solve,
or don’t have a clear solution.
In this video, I wanted to present a lesser-used
feature that was added to Flink one dot seventeen, watermark alignment.
This is helpful for when you are joining two streams whose
What is Watermark Alignment?
timestamps are progressively more and more out of sync.
Say you have a temporal join between two streams.
One of those streams is significantly ahead of the other, and its data is being buffered,
waiting for the watermark of the stream that's behind to advance.
If the timestamps for these streams diverge more and more,
the amount of data that needs to be buffered grows.
This will hurt performance, and can even cause operational problems,
such as checkpointing failures.
What we’d like to do is stop reading from the stream that is ahead,
so that the other stream can catch up, and not fall further behind.
In essence, we’re pausing the stream that’s ahead so the situation doesn’t get worse,
Conclusion
and we can put a cap on how much buffering is needed.
Watermark alignment allows you to specify how tightly synchronized your streams should be,
and will prevent any of the sources from getting too far ahead of the others.
If you find yourself in this situation, I recommend enabling this feature for your
streams as it can save you precious resources in your Flink instance.
If you are running Flink in Confluent Cloud this option is already enabled.
Until next time, happy streaming.