In our last newsletter, we featured a blog post from Wade Waldron on solving the dual-write problem. This time, Wade’s back with a new case study video on the same topic. We’ve also got a brand-new demo of the (early access) confluent-kafka-javascript client, including the code behind the website, www.lets-settle-this.com. In addition, we’re featuring a parallel consumer for Kafka, and a summary of headless data architecture.
Our Terminal Tips section today is a mini-tutorial on computed columns with Flink dynamic tables, which we think you’ll find especially enlightening.
Enjoy!
Kafka Pyrallel Consumer: a parallel consumer for Kafka with deduplication capabilities, built on confluent_kafka
Visit Let's Settle This: Programming Style to settle the classic coding debates once and for all… merge vs. rebase, Vim vs. Emacs, and KStreams vs. Flink SQL! In this video, Lucia Cerchie walks through the relevant code, powered by confluent-kafka-javascript (early access). Head to the GitHub repository for the full code
How to avoid data loss and solve the dual-write problem… Wade Waldron weighs in
Marketecture or a solid approach? Adam Bellemare breaks down headless data architecture in a new video
A Kafka Connect Sandbox! Estêvão B. Saleme provides a new test environment to simulate real-world scenarios and help developers detect issues ahead of time. Read the blog post here to learn how to use it
Italo Nesi’s new demo outlines the current steps needed to monitor client metrics. Italo considers it a resource for anyone currently in a bind with respect to client metrics, but also an illustration of the usefulness of KIP-714, which proposes a generally available metrics and telemetry interface
One can never have too much telemetry when working with large distributed systems! Learn why, and how to access metrics when using a Python Kafka client, in Afzal Mazhar’s new video
Say you’re transforming data from a Kafka connector on its way to a database. Can you apply a transformation to more than one field while running Kafka Connect? Yes, you can!
Learn the pattern for applying transformations to more than one field when ingesting change data capture (CDC) events from a database.
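The linked answer has the full details. As a rough, hypothetical sketch of the general idea (the connector class, topic, and field names here are illustrative examples, not taken from the answer), you can chain single message transforms in the connector configuration, with each transform operating on several fields at once:

# Hypothetical JDBC sink config; topic and field names are examples only
name=orders-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=orders
connection.url=jdbc:postgresql://db-host:5432/shop
# Chain two SMTs; each one handles multiple fields in a single pass
transforms=cast,mask
transforms.cast.type=org.apache.kafka.connect.transforms.Cast$Value
transforms.cast.spec=order_id:int64,amount:float64
transforms.mask.type=org.apache.kafka.connect.transforms.MaskField$Value
transforms.mask.fields=email,phone
transforms.mask.replacement=****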
Got your own favorite Stack Overflow answer related to Flink or Kafka? Send it in to devx_newsletter@confluent.io!
Let’s understand the behavior of a Flink dynamic table with a computed column. Computed columns are virtual columns that are not stored in the table, but are computed on the fly based on the values of other columns. These virtual columns are not registered in Schema Registry.
Computed columns are useful for reporting specific values (concatenated first_name and last_name, concatenated device_id and part_id, for example) for downstream applications to consume.
Let’s study this behavior using the Confluent CLI, both its Flink shell and its Schema Registry commands.
Log in to the Confluent CLI:
confluent login --save
Connect with the Confluent shell for Flink:
confluent flink shell --compute-pool <FLINK_COMPUTE_POOL_ID> --environment <ENV_ID>
In the shell, let’s create a Flink dynamic table with a computed column:
CREATE TABLE device_details (
  `device_id` BIGINT,
  `device_name` STRING,
  `part_name` STRING,
  `full_device_name` AS CONCAT(device_name, ' ', part_name)
);
A message confirming that the table was created is displayed:
Statement phase is COMPLETED.
Table 'device_details' created.
The device_details table has the computed column full_device_name, which, although it is a column of the table, will not have an entry in Confluent Schema Registry.
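Even though it won’t appear in the registered schema, the computed column can be queried like any other column. As a quick check (assuming some records have been produced to the table’s backing topic), a projection such as the following returns the concatenated value:

-- Projects the on-the-fly concatenation for whatever rows exist
SELECT device_id, full_device_name
FROM device_details;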
Let’s have a look at the schema.
Display the schema list to find the right schema_id:
confluent schema-registry schema list
From the list of results, identify the schema_id for the table device_details:
 Schema ID |       Subject        | Version
-----------+----------------------+---------
    100039 | device_details-value |       1
The value schema for the device_details table has Schema ID 100039.
Let’s describe the schema to check whether the computed column is part of it:
confluent schema-registry schema describe 100039
Check the result:
Schema ID: 100039
Type: AVRO
Schema:
{
  "type": "record",
  "name": "record",
  "namespace": "org.apache.flink.avro.generated",
  "fields": [
    {
      "name": "device_id",
      "type": [
        "null",
        "long"
      ],
      "default": null
    },
    {
      "name": "device_name",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "part_name",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}
This confirms that the generated Avro schema does not store the computed column full_device_name; the column is only computed on the fly at query time.
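If a downstream application does need the concatenated value to appear in the registered schema, one option is to persist it as a physical column in a second table. The following is a minimal sketch; the table name device_details_enriched is hypothetical:

-- Hypothetical table in which full_device_name is a regular, physical column
CREATE TABLE device_details_enriched (
  `device_id` BIGINT,
  `device_name` STRING,
  `part_name` STRING,
  `full_device_name` STRING
);

-- Persist the computed value so it is registered along with the other columns
INSERT INTO device_details_enriched
SELECT device_id, device_name, part_name, full_device_name
FROM device_details;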
Flowers and Numbers – Natalia Kiseleva creates data badges
When’s the best time to do laundry on campus? – How IU does its laundry
The History of Command Palettes – bridging the gap between traditional commands and natural language
In-person
Data in Motion event (Sep 26): The tour makes a stop in Paris! Reserve your seat today to listen to Kafka enthusiasts and practitioners from companies like FnacDarty, Michelin, Système U, Euronext, sunday, and PeopleSpheres. Learn how Confluent impacted their data pipelines
Apache Kafka® x Apache Flink® Meetup Lisbon (July 10): Lessons learned about Apache Flink from using it in production
Brisbane Apache Kafka® Meetup (July 11): Dive deep into Kafka client metrics
We hope you enjoyed our curated assortment of resources! If you’d like to provide feedback, suggest ideas for content you’d like to see, or you want to submit your own resource for consideration, email us at devx_newsletter@confluent.io!
If you’re viewing this newsletter online, know that we appreciate your readership and that you can get this newsletter delivered directly to your inbox by filling out the sign-up form on the left-hand side.
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.