January 13, 2022 | Episode 194

From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine

  • Transcript
  • Notes

Implementing an event-driven data pipeline can be challenging, but doing so within the context of a legacy architecture is even more complex. Having spent three years building a streaming data infrastructure and being on the first team at a financial organization to implement Apache Kafka® event-driven data pipelines, Danica Fine (Senior Developer Advocate, Confluent) shares about the development process and how ksqlDB and Kafka Connect became instrumental to the implementation.

By moving away from batch processing to streaming data pipelines with Kafka, data can be distributed with increased data scalability and resiliency. Kafka decouples the source from the target systems, so you can react to data as it changes while ensuring accurate data in the target system. 

In order to transition from monolithic micro-batching applications to real-time microservices that can integrate with a legacy system that has been around for decades, Danica and her team started developing Kafka connectors to connect to various sources and target systems. 

  1. Kafka connectors: Building two major connectors for the data pipeline, including a source connector to connect the legacy data source to stream data into Kafka, and another target connector to pipe data from Kafka back into the legacy architecture. 
  2. Algorithm: Implementing Kafka Streams applications to migrate data from a monolithic architecture to a stream processing architecture. 
  3. Data join: Leveraging Kafka Connect and the JDBC source connector to bring in all data streams to complete the pipeline.
  4. Streams join: Using ksqlDB to join streams—the legacy data system continues to produce streams while the Kafka data pipeline is another stream of data. 

As a final tip, Danica suggests breaking algorithms into process steps. She also describes how her experience relates to the data pipelines course on Confluent Developer and encourages anyone who is interested in learning more to check it out. 

Continue Listening

Episode 195January 20, 2022 | 30 min

Optimizing Cloud-Native Apache Kafka Performance ft. Alok Nikhil and Adithya Chandra

Maximizing cloud Apache Kafka performance isn’t just about running data processes on cloud instances. There is a lot of engineering work required to set and maintain a high-performance standard for speed and availability. Alok Nikhil (Senior Software Engineer, Confluent) and Adithya Chandra (Staff Software Engineer II, Confluent) share about their efforts on how to optimize Kafka on Confluent Cloud and the three guiding principles that they follow whether you are self-managing Kafka or working on a cloud-native system:

Episode 196January 24, 2022 | 4 min

Apache Kafka 3.1 - Overview of Latest Features, Updates, and KIPs

Apache Kafka 3.1 is here with exciting new features and improvements! On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights that you won’t want to miss, including foreign-key joins in Kafka Streams and improvements that will provide consistency for Kafka latency metrics.

Episode 197January 27, 2022 | 31 min

Expanding Apache Kafka Multi-Tenancy for Cloud-Native Systems ft. Anna Povzner and Anastasia Vela

In an effort to make Apache Kafka cloud native, Anna Povzener (Principal Engineer, Confluent) and Anastasia Vela (Software Engineer I, Confluent) have been working to expand multi-tenancy to cloud-native systems with automated capacity planning and scaling in Confluent Cloud. They explain how cloud-native data systems are different from legacy databases and share the technical requirements needed to create multi-tenancy for managed Kafka as a service.

Got questions?

If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.

Email Us

Never miss an episode!

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free