Course: Apache Flink® 101

Using Kafka with Flink

5 min
David Anderson

Software Practice Lead

Overview

Flink has first-class support for developing applications that use Kafka. This video includes a quick introduction to Kafka, and shows how Kafka can be used with Flink SQL.

Topics:

  • Apache Kafka
  • Kafka Connect, Kafka Streams, ksqlDB, Schema Registry
  • Producers and Consumers
  • Topics and Partitions
  • Kafka Records: Metadata, Header, Key, and Value
  • Using Kafka with Flink SQL

Resources

Use the promo code FLINK101 to get $25 of free Confluent Cloud usage


Using Kafka with Flink

Hi, David from Confluent here to tell you about using Kafka with Flink. Flink includes support for using Kafka as both a source and a sink for your Flink applications. Flink is also interoperable with Kafka Connect, Kafka Streams, ksqlDB, and Schema Registry. To be more precise, while Flink explicitly supports Kafka, it is actually unaware of these other tools in the Kafka ecosystem, but it turns out that doesn't matter. For example, Flink can be used to process data written to Kafka by Kafka Connect or Kafka Streams, so long as Flink can deserialize the events written by those other frameworks. That's generally not a problem, because Flink includes support for many popular formats out of the box, including JSON, Confluent Avro, Debezium, and Protobuf, and you can always supply your own serializers if you need something custom.

In case you're not familiar with Kafka, I'll give you a quick, whirlwind tour of the highlights, starting with a bird's-eye view of a Kafka cluster. Kafka producers publish events to topics, such as the orders topic in this diagram, and Kafka consumers subscribe to topics. In Flink, the Kafka producer is called the KafkaSink, and the consumer is the KafkaSource. For scalability, Kafka topics are partitioned, with each partition being handled by a different server. This diagram shows a topic with three partitions, but most production environments will use many more partitions than this.

Internally, Kafka events have some structure to them. The primary payload of each event is a key/value pair; the key and value are serialized by the producer in some format, such as JSON or Avro. Kafka also writes some metadata into each event, which includes the topic, the partition, the offset of the event within the partition, and the event's timestamp. It's also possible to include additional application-specific metadata in each event in the form of key/value pairs; these are the event headers.

Given that background, how does this all fit into Flink SQL? When you use Kafka with Flink SQL, you map tables onto Kafka topics. Flink doesn't store any of the table data itself. Instead, the Orders table shown here is backed by the orders topic in a Kafka cluster. As events are appended to that orders topic, it is as though they had been appended to the Orders table, and those records will then be processed by whatever SQL query logic you've attached to the table.

Flink needs to know the format of the data you are working with, and you can separately specify the formats used by the key and the value. In this case, both the key and the value are encoded as JSON. Sometimes the information you need isn't in the value part of the Kafka record, but is part of the metadata or the headers instead. That's fine; Flink SQL gives you access to every part of the Kafka record. A common use for this is to map the timestamps in the metadata onto a timestamp column. This involves the special METADATA FROM 'timestamp' syntax shown here, where an order_time column in the Orders table is mapped onto the Kafka timestamp in the metadata part of the Kafka record. The other columns shown here, namely order_id, price, and quantity, all come from the record's value.

In summary, Flink acts as a compute layer for Kafka, powering real-time applications and pipelines, with Kafka providing the core stream data storage layer and interoperability with the many hundreds of other services and tools that work with Kafka.
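To make the table-to-topic mapping concrete, here is a minimal sketch of a table definition like the one described above, using the CREATE TABLE syntax of the Apache Flink Kafka connector. The column names order_id, price, quantity, and order_time and the use of JSON for both key and value come from the video; the data types, the choice of order_id as the record key, and the connector properties such as the bootstrap server address and startup mode are assumptions you would adjust for your own cluster (the hands-on exercise linked below shows the exact setup to use with Confluent Cloud).

    -- Orders table backed by the 'orders' Kafka topic (a sketch: data types
    -- and connector properties are assumptions, not taken from the video).
    CREATE TABLE Orders (
        order_id   STRING,
        price      DECIMAL(10, 2),
        quantity   INT,
        -- map the Kafka record's timestamp metadata onto a regular column
        order_time TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'broker:9092',  -- assumed address
        'scan.startup.mode' = 'earliest-offset',
        'key.format' = 'json',        -- how the record key is (de)serialized
        'key.fields' = 'order_id',    -- which column(s) form the key
        'value.format' = 'json'       -- how the record value is (de)serialized
    );

Once the table is declared, any query against it reads directly from the topic as events arrive, for example:

    -- compute an order total from the value fields alongside the Kafka timestamp
    SELECT order_id, order_time, price * quantity AS order_total
    FROM Orders;

Other parts of the Kafka record, such as the partition, offset, and headers, can be exposed in the same way with additional METADATA columns.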
For more information about using Kafka with Flink, see the hands-on exercise associated with this video on Confluent Developer. There you will find more explanations, working examples, and pointers to the documentation. These exercises take advantage of Confluent Cloud; if you haven't already done so, follow the link to get started, and be sure to use the promo code in the description below. If you aren't already on Confluent Developer, head there now using the link in the video description to access other courses, hands-on exercises, and many other resources for continuing your learning journey.