Fixing Bad Data in Event Streams

July 29, 2024

Data Quality for Streams:

Implementing a “shift-left” architecture and ensuring event streams carry good-quality data result in a healthy data engineering pipeline and greatly reduce the risk of a “contaminated” data lake. In this edition of the newsletter, we feature an in-depth study of techniques to prevent bad data in event streams, written by Adam Bellemare, Staff Technologist at Confluent. Read it here.

Data Streaming Resources:

  • Learn how to build real-time data products on Confluent Cloud using the Shift Left architecture from Kai Waehner’s blog.
  • Srinivasulu Grandhi, Confluent’s VP of Engineering and Site Leader, delved into the challenges and opportunities of data streaming and stream processing for modern businesses, and how they are applicable across different industries. Read the full interview for ideas on how your business can be “always on” too!
  • Learn about Apache Kafka®’s zero-copy OS optimization from Stanislav Kozlovski’s wonderful explanation in this LinkedIn post.
  • Watch a YouTube video showing how to build a streaming data pipeline to a cloud data warehouse for real-time analytics using the Confluent Data Streaming Platform.

A Droplet From Stack Overflow:

What’s the purpose of the CoProcessFunction in Apache Flink®?

David Anderson gives an answer, including an example of how you might encounter a need for the CoProcessFunction in the wild, in today’s droplet.

Got your own favorite Stack Overflow answer related to Flink or Kafka? Send it in to devx_newsletter@confluent.io!

Terminal Tip of the Week:

Let’s say your underlying dataset uses one of the reserved keywords in Confluent Cloud for Apache Flink SQL, like “blob”:

CREATE TABLE blob (
      fluid_density INT,
      acceleration INT,
      pressure INT
);

You might get an error like this:

Something went wrong.
SQL parse failed. Encountered "blob" at line 1, column 14.
Was expecting one of:
     <BRACKET_QUOTED_IDENTIFIER> ...
     <QUOTED_IDENTIFIER> ...
     <BACK_QUOTED_IDENTIFIER> ...
     <BIG_QUERY_BACK_QUOTED_IDENTIFIER> ...
     <HYPHENATED_IDENTIFIER> ...
     <IDENTIFIER> ...
     <UNICODE_QUOTED_IDENTIFIER> ...

The fix is to enclose the reserved keyword in backticks:

CREATE TABLE `blob` (
      fluid_density INT,
      acceleration INT,
      pressure INT,
      ...
);
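The same rule applies anywhere the table is referenced later on, such as in queries. As a sketch against the hypothetical table above:

```sql
-- Backtick-quote the reserved word wherever it appears as an identifier
SELECT fluid_density, pressure
FROM `blob`
WHERE acceleration > 0;
```

Without the backticks around `blob`, the parser would fail with the same kind of error as in the CREATE TABLE statement.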

Links From Around the Web:

Upcoming Events:

  • NAM West Summer Apache Kafka® Meetup (Virtual): Sandon Jacobs, Senior Developer Advocate at Confluent, will talk about data stream quality and discuss best practices for making producer applications good stewards of data streams for downstream consumers.

By the way…

We hope you enjoyed our curated assortment of resources! If you’d like to provide feedback, suggest ideas for content you’d like to see, or you want to submit your own resource for consideration, email us at devx_newsletter@confluent.io!

If you’d like to view previous editions of the newsletter, visit our archive.

If you’re viewing this newsletter online, know that we appreciate your readership and that you can get this newsletter delivered directly to your inbox by filling out the sign-up form on the left-hand side.

P.S. If you want to learn more about Kafka, Flink, or Confluent Cloud, visit our developer site at Confluent Developer.
