Get Started Free

Apache Flink® FAQs

Here are some of the questions that you may have about Apache Flink and its surrounding ecosystem.

If you’ve got a question that isn’t answered here then please do ask the community.

Flink SQL is a declarative API used for creating Flink jobs. It is well-suited for real-time ETL, data enrichment, and event-driven applications.

Confluent Cloud provides a cloud-native, serverless service for Flink that enables simple, scalable, and secure stream processing that integrates seamlessly with Apache Kafka®. Your Kafka topics appear automatically as queryable Flink tables, with schemas and metadata attached by Confluent Cloud.

Confluent’s fully managed Flink service enables you to:

  • Easily filter, join, and enrich your data streams with Flink
  • Enable high-performance and efficient stream processing at any scale, without the complexities of managing infrastructure
  • Experience Kafka and Flink as a unified platform, with fully integrated monitoring, security, and governance

Confluent Cloud for Apache Flink is engineered to be:

  • Cloud-native: Flink is fully managed on Confluent Cloud and autoscales up and down with your workloads.
  • Complete: Flink is integrated deeply with Confluent Cloud to provide an enterprise-ready experience.
  • Everywhere: Flink is available in AWS, Azure, and Google Cloud.

The documentation includes how-to guides for several common use cases. On Confluent Developer you’ll find Flink SQL demos and tutorials.

A regular join, as in

SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id;

is poorly suited for streaming. To execute this join, the Flink SQL runtime must keep around forever, in its state, all order and customer records. These older records are needed for producing the full set of new join results that must be produced as new orders and customer records arrive. In most cases this is unreasonably expensive, and it’s rarely useful.

In most streaming applications, joins are used for event enrichment, e.g., augmenting each incoming order event with timely customer information. For stream enrichment use cases, you should use a temporal join instead of a regular join:

SELECT *
FROM orders
INNER JOIN customers FOR SYSTEM_TIME AS OF orders.order_time
ON orders.customer_id = customers.id;

To understand regular and temporal joins in more detail, see How To Use Streaming Joins with Apache Flink.

Another type of specialized, optimized join that is useful for streaming use cases is the interval join. This is what you could use, for example, to find orders that shipped within 4 hours of being received:

SELECT *
FROM orders o, shipments s
WHERE o.id = s.order_id
AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time;

Windows rely on watermarks, and some operations, like regular joins, cause watermarks to be lost. Even if both input streams/tables to a regular join have watermarks, the result can not have watermarks because there’s no a priori limit to how out-of-order the resulting stream/table might be.

If you need to apply windowing to the result of a join, you can either use a temporal or interval join, or in cases where you know that the result will happen to be (at least roughly) in order, you can write out the result of a regular join somewhere, such as a Kafka topic, and then apply suitable watermarking to that intermediate table.

Learn more with these free training courses

Apache Flink® 101

Learn how Flink works, how to use it, and how to get started.

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free