Fixing Bad Data in Event Streams

July 29, 2024

Data Quality for Streams:

Implementing a “shift-left” architecture and ensuring event streams carry good-quality data result in a healthy data engineering pipeline and greatly reduce the risk of a “contaminated” data lake. In this edition of the newsletter, we feature an in-depth study of techniques to prevent bad data in event streams, written by Adam Bellemare, Staff Technologist at Confluent. Read it here.

Data Streaming Resources:

  • Learn how to build real-time data products on Confluent Cloud using the Shift Left architecture from Kai Waehner’s blog.
  • Srinivasulu Grandhi, Confluent’s VP of Engineering and Site Leader, delved into the challenges and opportunities of data streaming and stream processing for modern businesses, and how they are applicable across different industries. Read the full interview for ideas on how your business can be “always on” too!
  • Learn about Apache Kafka®’s zero-copy OS optimization from Stanislav Kozlovski’s wonderful explanation in this LinkedIn post.
  • Watch a YouTube video showing how to build a streaming data pipeline to a cloud data warehouse for real-time analytics using the Confluent Data Streaming Platform.

A Droplet From Stack Overflow:

What’s the purpose of the CoProcessFunction in Apache Flink®?

David Anderson gives an answer, including an example of how you might encounter a need for the CoProcessFunction in the wild, in today’s droplet.

Got your own favorite Stack Overflow answer related to Flink or Kafka? Send it in to devx_newsletter@confluent.io!

Terminal Tip of the Week:

Let’s say your underlying dataset uses one of the reserved keywords in Confluent Cloud for Apache Flink SQL, like “blob”:

CREATE TABLE blob (
      fluid_density INT,
      acceleration INT,
      pressure INT
);

You might get an error like this:

Something went wrong.
SQL parse failed. Encountered "blob" at line 1, column 14.
Was expecting one of:
     <BRACKET_QUOTED_IDENTIFIER> ...
     <QUOTED_IDENTIFIER> ...
     <BACK_QUOTED_IDENTIFIER> ...
     <BIG_QUERY_BACK_QUOTED_IDENTIFIER> ...
     <HYPHENATED_IDENTIFIER> ...
     <IDENTIFIER> ...
     <UNICODE_QUOTED_IDENTIFIER> ...

The fix is to enclose the reserved keyword in backticks:

CREATE TABLE `blob` (
      fluid_density INT,
      acceleration INT,
      pressure INT,
      ...
);
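The same rule applies anywhere the table is referenced later on, such as in queries. As a sketch against the hypothetical table above:

```sql
-- Backtick-quote the reserved word wherever it appears as an identifier
SELECT fluid_density, pressure
FROM `blob`
WHERE acceleration > 0;
```

Without the backticks around `blob`, the parser would fail with the same kind of error as in the CREATE TABLE statement.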

Links From Around the Web:

Upcoming Events:

  • NAM West Summer Apache Kafka® Meetup (Virtual): Sandon Jacobs, Senior Developer Advocate at Confluent, will talk about data stream quality and discuss best practices for making producer applications good stewards of data streams for downstream consumers.

By the way…

We hope you enjoyed our curated assortment of resources! If you’d like to provide feedback, suggest ideas for content you’d like to see, or you want to submit your own resource for consideration, email us at devx_newsletter@confluent.io!

If you’d like to view previous editions of the newsletter, visit our archive.

If you’re viewing this newsletter online, know that we appreciate your readership and that you can get this newsletter delivered directly to your inbox by filling out the sign-up form on the left-hand side.

P.S. If you want to learn more about Kafka, Flink, or Confluent Cloud, visit our developer site at Confluent Developer.
