Current 2025 Bengaluru - Registrations Open!

February 20, 2025

Current 2025 Bengaluru, the first Bengaluru edition of the world’s most popular data streaming event, will take place on March 19, 2025, at the Sheraton Grand Bengaluru Whitefield.

Registrations are now open for everyone! Register now to take advantage of a whopping 50% discount with the Super Early Bird Pass!

Data Streaming Resources:

  • Freight clusters are generally available for Confluent customers using AWS. You can now spin up Freight clusters in Confluent Cloud and immediately take advantage of their unparalleled cost savings, letting you onboard more data-intensive workloads to Confluent Cloud in a budget-friendly way. Learn more about massively reducing Apache Kafka® workload costs here.

  • BYOC (Bring-Your-Own-Cloud) has taken off big time with WarpStream. Consequently, streaming data governance for WarpStream has also become absolutely essential. The WarpStream Data Governance (schema) product page is now live! Learn more about how to govern streaming data workloads with WarpStream agents.

  • Learn how to use the Flink Table API with Confluent Cloud for Apache Flink® with this blog by Martijn Visser from Confluent. The Table API lets developers express complex processing logic declaratively, using a fluent API in Java or Python (a minimal sketch follows this list).

  • Check out the most viewed documents on Confluent Cloud and Confluent Platform:
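
To give a feel for the fluent style the blog covers, here is a minimal sketch using the open-source Apache Flink Table API in Java. It runs against a small inline table rather than Confluent Cloud, whose connection setup is described in the blog; the table and column names are made up for illustration.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.row;

public class TableApiSketch {
    public static void main(String[] args) {
        // Local, in-memory Table environment; Confluent Cloud for Apache Flink
        // needs its own connection settings (see the linked blog).
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // A tiny inline table standing in for a Kafka-backed source (hypothetical columns).
        Table orders = env.fromValues(row("alice", 12.50), row("bob", 99.99))
                .as("customer", "amount");

        // Declarative, fluent filtering and projection.
        Table largeOrders = orders
                .filter($("amount").isGreater(50.0))
                .select($("customer"), $("amount"));

        largeOrders.execute().print();
    }
}
```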

Links From Around the Web:

  • Anthropic has just launched an initiative aimed at understanding AI's effects on labor markets and the economy over time. Its initial report provides first-of-its-kind data and analysis, based on millions of anonymized conversations on Claude, revealing the clearest picture yet of how AI is being incorporated into real-world tasks across the modern economy. Read it here.

  • Cursor, the AI code editor, has become the fastest-growing SaaS in the history of SaaS. Read a reaction on X here.

  • Read a thought-provoking blog written by Apurva Mehta from Responsive, on what makes Apache Kafka a great base layer for applications, the technology landscape for building Kafka applications, and when to use Kafka Streams for your applications.

Catalyst Insight:

In our brand-new “Catalyst Insight” section, we ask catalysts from the data streaming community to share their experiences.

In this edition, we ask Amandeep Midha to share his insights. Amandeep is a Senior Software Engineer at Hybrid Greentech Energy Storage Intelligence in Denmark. Before that, he worked as a Principal IT Consultant, Data Architect, and CTO in the fintech and banking space in the Nordics, helping to integrate banking systems with event streaming and developing and upskilling engineering talent to migrate platforms from legacy systems to the cloud.

How would you describe your role in the data world? Not necessarily as in your title, but what unique perspective and experiences do you bring?

I bring a pragmatic perspective, rooted in years of experience, where I’ve seen technologies evolve from on-prem databases to cloud-native streaming systems. I’ve learned the importance of balancing cutting-edge solutions with real-world constraints like cost, scalability, and human understanding.

Can you tell us the story of an interesting data streaming bug you ran into and solved at one point?

In a financial services project, we aimed to decouple data from legacy systems into Kafka to support an Anti-Money Laundering (AML) initiative. The goal was to send anonymized account and transaction information to external vendors, while master data remained on the mainframe. We needed to refine, enrich, aggregate, and anonymize the data before sending it on for real-time fraud detection.

We faced issues when preparing the data for third-party vendors. Missing and duplicate transactions compromised the integrity of the AML reports. Kafka’s auto-committed offsets prematurely marked data as processed, and consumer lag during peak periods led to unprocessed messages. Additionally, inconsistencies in the anonymization process raised compliance concerns. The complexity was further compounded by the need to orchestrate this pipeline with Kubernetes for scalability and resource management.

The Solution

  • Manual offset management: We implemented manual offset commits, ensuring that offsets were committed only after data had been fully processed and anonymized (a minimal sketch follows this list).
  • Optimized consumer load balancing: We adjusted partitioning and consumer strategies to mitigate lag and avoid rebalances, ensuring smoother data flows.
  • Anonymization improvements: The anonymization layer was enhanced to ensure that data was consistently anonymized before being sent to vendors, guaranteeing compliance.
  • Replayability: Kafka’s ability to replay data enabled us to backfill missing transactions without data loss.
  • Identifying retriable errors: By tracking retriable errors, we were able to handle failures gracefully, achieving state-machine-like behavior that kept data flowing during transient issues. Kubernetes helped orchestrate and monitor the entire pipeline, enabling scalability and resilience during peak loads.
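
As a rough illustration of the first and last points, here is a minimal sketch of a plain Java consumer loop with auto-commit disabled and retriable commit failures tolerated. The topic name, group id, and anonymizeAndForward helper are placeholders, not the project's actual code.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.RetriableException;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "aml-anonymizer");          // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Disable auto-commit so offsets advance only after records are fully processed.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("transactions")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    anonymizeAndForward(record.value());
                }
                try {
                    // Commit only after the whole batch has been anonymized and forwarded.
                    consumer.commitSync();
                } catch (RetriableException e) {
                    // Transient broker issue: log it and try again on the next loop iteration.
                    System.err.println("Retriable commit failure: " + e.getMessage());
                }
            }
        }
    }

    private static void anonymizeAndForward(String payload) {
        // Stand-in for the real anonymization, enrichment, and vendor hand-off.
    }
}
```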

Key Takeaways

  • Manual offset management is critical for ensuring data integrity in compliance-sensitive projects.
  • Kafka’s replayability and retriable error handling are essential for recovering missed data and maintaining pipeline reliability (see the replay sketch after this list).
  • Optimizing Kafka consumers and anonymization layers is crucial for accurate, compliant, and high-throughput data flows in a high-stakes environment like financial services.
  • Kubernetes plays a key role in ensuring the scalability and resilience of Kafka-based pipelines during peak data loads.
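
For the replayability takeaway, here is a small sketch of rewinding a plain Java consumer to the offsets that were current at a given timestamp, so missed records can be reprocessed; the rewindTo helper is purely illustrative.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ReplaySketch {
    // Seek every assigned partition back to the offset in effect at `since`,
    // so the consumer replays (backfills) everything from that point onward.
    static void rewindTo(KafkaConsumer<String, String> consumer, Instant since) {
        Map<TopicPartition, Long> query = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            query.put(tp, since.toEpochMilli());
        }
        Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(query, Duration.ofSeconds(10));
        offsets.forEach((tp, offsetAndTimestamp) -> {
            if (offsetAndTimestamp != null) {
                consumer.seek(tp, offsetAndTimestamp.offset());
            }
        });
    }
}
```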

What advice would you offer an aspiring data streaming engineer?

  1. Understand what you’re trying to solve:
    Start with clarity. What’s the real problem you’re addressing? Is it real-time data integration, event processing, or system resilience? Grasp the CAP theorem to understand distributed system trade-offs, and make informed architectural decisions.
  2. Know the ecosystem:
    Dive into the connectors, frameworks, and tools that interact with Kafka. Learn their origins—TIBCO, WebMethods, Fivetran, Debezium, and others—and what problems they were designed to solve. This historical and functional understanding will guide you to select the right tools for your scenario.
  3. Master the fundamentals before scaling complexity:
    Begin with stateless operations like filtering and mapping, and stateful operations like joins and aggregations. Build a strong foundation here before tackling more complex stateful operations like windowed joins, event-time processing, and checkpointing (a minimal sketch follows this list). Ensure you incorporate observability, debuggability, and replayability into your pipelines early; these will save you time and effort when scaling or troubleshooting.
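
As a rough illustration of point 3, here is a minimal Kafka Streams sketch that combines stateless filtering and mapping with a stateful windowed count; the clicks and page-view-counts topics are hypothetical.

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class FundamentalsSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Stateless: drop empty events and normalize values.
        KStream<String, String> clicks = builder.stream("clicks"); // hypothetical input topic
        KStream<String, String> pageViews = clicks
                .filter((user, page) -> page != null && !page.isBlank())
                .mapValues(String::toLowerCase);

        // Stateful: count page views per user over five-minute windows.
        pageViews
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                .count()
                .toStream((windowedUser, count) -> windowedUser.key()) // drop window bounds for the sketch
                .to("page-view-counts", Produced.with(Serdes.String(), Serdes.Long())); // hypothetical output topic

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fundamentals-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```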

Want to learn more about our Confluent Community Catalyst Program? Visit the page here to get all of the details!

Upcoming Events:

In-Person Meetups:

Stay up to date with all Confluent-run meetup events by adding the following link to your personal calendar platform:

https://airtable.com/app8KVpxxlmhTbfcL/shrNiipDJkCa2GBW7/iCal

(Instructions for GCal, iCal, Outlook, etc.)

By the way…

We hope you enjoyed our curated assortment of resources! If you’d like to provide feedback, suggest ideas for content you’d like to see, or you want to submit your own resource for consideration, email us at devx_newsletter@confluent.io!

If you’d like to view previous editions of the newsletter, visit our archive.

If you’re viewing this newsletter online, know that we appreciate your readership and that you can get this newsletter delivered directly to your inbox by filling out the sign-up form on the left-hand side.

P.S. If you want to learn more about Kafka, Flink, or Confluent Cloud, visit our developer site at Confluent Developer.
