Get Started Free

Apache Flink® FAQs

Here are some of the questions that you may have about Apache Flink and its surrounding ecosystem.

If you’ve got a question that isn’t answered here then please do ask the community.

Flink applications run in distributed compute clusters that can be scaled up to hundreds or thousands of compute nodes. Flink’s performance at scale can be attributed to the following design characteristics:

  • All state is local, either in memory or a local RocksDB instance
  • Fault tolerance is guaranteed by checkpoints that are drawn asynchronously
  • Flink streams data between compute nodes using a purpose-built, optimized network stack
  • Backpressure is handled in a very natural way. Thanks to a combination of fixed-size network buffers and credit-based flow control, Flink throttles its sources rather than spilling data into buffers.

Flink is able to guarantee that the state it manages is affected once, and only once, by each event. You can expect correct results, without concern for data loss or duplication.

To understand how this works, see Exactly-Once Processing in Apache Flink.

Streaming data is processed as it becomes available, and often this means that streams are being processed out-of-order with respect to the timestamps in the events (which indicate when the events actually occurred). Time-related operations, such as windows, need to know how long to wait for out-of-order events before producing results, and how long to retain whatever state is required to handle these out-of-order events correctly.

Watermark are timestamped stream records that Flink inserts into your data streams. Each watermark marks a position in the stream with the timestamp that it carries. Time-based operations, like windows, rely on watermarks to know when to produce results, and when state they’re keeping can be safely garbage collected.

Watermark generation is typically based on an estimate of the maximum out-of-orderness that is expected for each data source. As a developer, you can control the tradeoff between latency and completeness by choosing this parameter that controls how long Flink will wait for out-of-order events.

For more on watermarks, see Event Time and Watermarks.

Learn more with these free training courses

Apache Flink® 101

Learn how Flink works, how to use it, and how to get started.

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free