
Prevent Data Loss and Solve the Dual-Write Problem

July 11, 2024

In our last newsletter, we featured a blog post from Wade Waldron on solving the dual-write problem. This time, Wade’s back with a new case study video on the same topic. We’ve also got a brand-new demo of the (early access) confluent-kafka-javascript client, including the code behind the website, www.lets-settle-this.com. In addition, we’re featuring a parallel consumer for Kafka, and a summary of headless data architecture.

Our Terminal Tips section today is a mini-tutorial on computed columns with Flink dynamic tables, which we think you’ll find especially enlightening.

Enjoy!

Data Streaming Resources:

  • Kafka Pyrallel Consumer, a parallel consumer for Kafka built on confluent_kafka, with deduplication capabilities

  • Visit Let's Settle This: Programming Style to settle the classic coding debates once and for all… merge vs. rebase, Vim vs. Emacs, and KStreams vs. Flink SQL! Lucia Cerchie delivers a summary of the relevant code, powered by confluent-kafka-javascript (early access), in this video; see the GitHub repository for the full code

  • How to avoid data loss and solve the dual-write problem… Wade Waldron weighs in

  • Marketecture or a solid approach? Adam Bellemare breaks down headless data architecture in a new video

  • A Kafka Connect Sandbox! Estêvão B. Saleme provides a new test environment to simulate real-world scenarios and help developers detect issues ahead of time. Read the blog post here to learn how to use it

  • Italo Nesi’s new demo outlines the current steps needed to monitor client metrics. Italo considers it a resource for anyone currently in a bind with respect to client metrics, but also an illustration of the usefulness of KIP-714, which proposes a generally available metrics and telemetry interface

  • One can never have too much telemetry when working with large distributed systems! Learn why, and how to access metrics when using a Python Kafka client, in Afzal Mazhar’s new video

A Droplet From Stack Overflow:

Say you’re transforming data as it flows through a Kafka connector on its way to a database. Can you apply a transformation to more than one field while running Kafka Connect? Yes, you can!

Learn the pattern for adding transformations on more than one field when ingesting change data capture (CDC) events from a database.
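To get a feel for the pattern, here's a minimal sketch of a sink connector configuration that chains two single message transforms (SMTs): a Cast covering several fields at once, and a ReplaceField rename. The connector, topic, and field names below are hypothetical, and the transforms you actually need will depend on your CDC payload:

name=orders-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=orders
# Chain the transforms; they run in the order listed
transforms=castFields,renameField
# Cast accepts a comma-separated list, so one transform can cover multiple fields
transforms.castFields.type=org.apache.kafka.connect.transforms.Cast$Value
transforms.castFields.spec=order_id:int64,amount:float64
# ReplaceField renames a field before it reaches the database
transforms.renameField.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.renameField.renames=ts:created_at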

Got your own favorite Stack Overflow answer related to Flink or Kafka? Send it to devx_newsletter@confluent.io!

Terminal Tip of the Week:

Let’s understand the behavior of a Flink dynamic table with a computed column. Computed columns are virtual columns that are not stored in the table, but are computed on the fly based on the values of other columns. These virtual columns are not registered in Schema Registry.

Computed columns are useful for reporting specific values (concatenated first_name and last_name, concatenated device_id and part_id, for example) for downstream applications to consume.

Let’s study this behavior using the Confluent CLI for Flink and the Confluent Schema Registry CLI.

Log in to the Confluent CLI:

confluent login --save

Connect with the Confluent shell for Flink:

confluent flink shell --compute-pool <FLINK_COMPUTE_POOL_ID> --environment <ENV_ID>

In the shell, let’s create a Flink dynamic table with a computed column:

CREATE TABLE device_details (
  `device_id` BIGINT,
  `device_name` STRING,
  `part_name` STRING,
  `full_device_name` AS CONCAT(device_name, ' ', part_name)
);

A message confirming that the table was created is displayed:

Statement phase is COMPLETED.

Table 'device_details' created.

The device_details table has the computed column full_device_name, which, although it is a column of the table, will not have an entry in Confluent Schema Registry.

Let’s have a look at the schema.

Display the schema list to find the right schema_id:

confluent schema-registry schema list

From the list of results, identify the schema_id for the table device_details:


  Schema ID |              Subject              | Version  
------------+-----------------------------------+----------
     100039 | device_details-value              |       1 

The value schema for the device_details table has Schema ID 100039.

Let’s describe the schema to check whether the computed column is part of it:

confluent schema-registry schema describe 100039

Check the result:

Schema ID: 100039
Type: AVRO
Schema:
{
    "type": "record",
    "name": "record",
    "namespace": "org.apache.flink.avro.generated",
    "fields": [
        {
            "name": "device_id",
            "type": [
                "null",
                "long"
            ],
            "default": null
        },
        {
            "name": "device_name",
            "type": [
                "null",
                "string"
            ],
            "default": null
        },
        {
            "name": "part_name",
            "type": [
                "null",
                "string"
            ],
            "default": null
        }
    ]
}

This confirms it: the generated Avro schema does not include the computed column full_device_name; the column is only computed on the fly when the table is queried.
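To see this behavior from the Flink shell, you can write a row and read it back; only the three physical columns are persisted, and full_device_name is produced at query time. A minimal sketch, assuming the device_details table created above (the sample values are illustrative):

-- Only the physical columns are written; the computed column cannot be inserted
INSERT INTO device_details VALUES (1, 'Model-X', 'temperature-probe');

-- full_device_name is evaluated when the row is read
SELECT device_id, full_device_name FROM device_details;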


By the way…

We hope you enjoyed our curated assortment of resources! If you’d like to provide feedback, suggest ideas for content you’d like to see, or you want to submit your own resource for consideration, email us at devx_newsletter@confluent.io!

If you’re viewing this newsletter online, know that we appreciate your readership and that you can get this newsletter delivered directly to your inbox by filling out the sign-up form on the left-hand side.


We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.
