March 10, 2022 | Episode 203

Why Data Mesh? ft. Ben Stopford

  • Transcript
  • Notes

With experience in data infrastructure and distributed data technologies, author of the book “Designing Event-Driven Systems” Ben Stopford (Lead Technologist, Office of the CTO, Confluent) explains the data mesh paradigm, differences between traditional data warehouses and microservices, as well as how you can get started with data mesh.   

Unlike standard data architecture, data mesh is about moving data away from a monolithic data warehouse into distributed data systems. Doing so will allow data to be available as a product—this is also one of the four principles of data mesh: 

  1. Data ownership by domain
  2. Data as a product
  3. Data available everywhere for self-service
  4. Data governed wherever it is

These four principles are technology agnostic, which means that they don’t restrict you to a programming language, Apache Kafka®, or other databases. Data mesh is all about building point-to-point architecture that lets you evolve and accommodate real-time data needs with governance tools.

Fundamentally, data mesh is more than a technological shift. It’s a mindset shift that requires cultural adaptation of product thinking—treating data as a product instead of data as an asset or resource. Data mesh invests ownership of data by the people who create it with requirements that ensure quality and governance. Because data mesh consists of a map of interconnections, it’s important to have governance tools in place to identify data sources and provide data discovery capabilities. 

There are many ways to implement data mesh, event streaming being one of them. You can ingest data sets from across organizations and sources into your own data system. Then you can use stream processing to trigger an application response to the data set. By representing each data product as a data stream, you can tag it with sub-elements and secondary dimensions to enable data searchability. If you are using a managed service like Confluent Cloud for data mesh, you can visualize how data flows inside the mesh through a stream lineage graph. 

Ben also discusses the importance of keeping data architecture as simple as you can to avoid derivatives of data products.

Continue Listening

Episode 204March 15, 2022 | 41 min

Handling 2 Million Apache Kafka Messages Per Second at Honeycomb

In this episode, you’ll get a taste of how Apache Kafka is used at Honeycomb! Liz Fong-Jones (Principal Developer Advocate, Honeycomb) explains how Honeycomb manages Kafka-based telemetry ingestion pipelines and scales Kafka clusters. Honeycomb is an observability platform that helps you visualize, analyze, and improve cloud application quality and performance. Their data volume has grown by a factor of 10 throughout the pandemic, while the total cost of ownership has only gone up by 20%.

Episode 205March 22, 2022 | 42 min

Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole

Data availability, usability, integrity, and security are words that we sometimes hear a lot. But what do they actually look like when put into practice? That’s where data governance comes in. This becomes especially tricky when working with real-time data architectures.

Episode 206March 29, 2022 | 23 min

Bridging Frontend and Backend with GraphQL and Apache Kafka ft. Gerard Klijs

What is GraphQL? And how can you combine GraphQL with Apache Kafka to query data in real time? With over 10 years of experience as a backend engineer, Gerard Klijs is a Confluent Community Catalyst, a contributor to the GraphQL project, and also a creator and maintainer of a Rust library to use Confluent Schema Registry with Java client. In this episode, he explains why you want to use Kafka with GraphQL and how they work together to bridge the gap between backend and frontend to make data more easily accessible in the frontend.

Got questions?

If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.

Email Us

Never miss an episode!

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free