Data Mesh

Learn about building a data mesh on event streams with Apache Kafka® and Confluent.

What Is Data Mesh?

Data mesh is a decentralized approach to data architecture that promotes data access, governance, federation, and interoperability across distributed teams and systems. First introduced by Zhamak Dehghani in How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, data mesh aims to solve the scaling failures of centralized data architectures. As companies increasingly become software companies, there is a general trend toward decentralized data ownership to enable scale.

[Image: decentralized data ownership in a data mesh]

How Data Mesh Works

Data mesh is not simply another type of data architecture but a new approach to designing modern data architectures, one that embraces organizational constructs, such as team ownership, data management, and governance, alongside data-centric ones. To understand how it works, we need to understand the four principles of data mesh: domain ownership, data as a product, self-service data access, and federated governance. Confluent's free Data Mesh 101 course is a detailed, video-based learning path that explores these concepts and covers each of the four principles in depth.

4 Principles of Data Mesh Architecture

  • Domain ownership: Responsibility over modeling and providing important data is distributed to the people closest to it, providing access to the exact data they need, when they need it.
  • Data as a product: Data is treated as a product like any other, complete with a data product owner, consumer consultations, release cycles, and quality and service-level agreements.
  • Self-service: Empower consumers to independently search, discover, and consume data products. Data product owners are provided standardized tools for populating and publishing their data product.
  • Federated governance: This is embodied by a cross-organization team that provides global standards for the formats, modes, and requirements of publishing and using data products. This team must maintain the delicate balance between centralized standards for compatibility and decentralized autonomy for true domain ownership.
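To make the four principles concrete, here is a minimal, purely illustrative sketch in Python. All names (DataProduct, Registry, the required tags) are hypothetical and are not part of any Confluent API: each product has an owning domain team and a schema (domain ownership, data as a product), consumers search the catalog themselves (self-service), and a shared validation step enforces organization-wide standards (federated governance).

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a data product catalog illustrating the four
# data mesh principles; names and fields are illustrative only.

@dataclass
class DataProduct:
    name: str                  # e.g. "orders.v1"
    owner: str                 # domain team responsible (domain ownership)
    schema: dict               # field name -> type (data as a product)
    tags: list = field(default_factory=list)

class Registry:
    """Catalog that consumers use to discover data products themselves."""

    REQUIRED_TAGS = {"domain", "sla"}  # federated governance standard

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct):
        # Federated governance: reject products missing global standards.
        missing = self.REQUIRED_TAGS - {t.split("=")[0] for t in product.tags}
        if missing:
            raise ValueError(f"missing required tags: {sorted(missing)}")
        self._products[product.name] = product

    def search(self, keyword: str):
        # Self-service: consumers discover products without a central team.
        return [p for p in self._products.values()
                if keyword in p.name or any(keyword in t for t in p.tags)]

registry = Registry()
registry.publish(DataProduct(
    name="orders.v1",
    owner="orders-team",
    schema={"order_id": "string", "amount": "double"},
    tags=["domain=sales", "sla=99.9"],
))
print([p.name for p in registry.search("sales")])  # ['orders.v1']
```

In a real deployment the catalog, schemas, and governance checks would be backed by platform tooling rather than in-process classes, but the division of responsibilities is the same.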

Why a Data Mesh? Benefits and Use Cases

Data mesh solves a number of common problems. At a smaller scale, it addresses many of the issues seen with data pipelines, which often grow into brittle webs of point-to-point connections over time. It also addresses larger organizational issues, such as different departments in a company disagreeing on core facts of the business; in a data mesh, you're less likely to have conflicting copies of the same facts. In both cases, a data mesh can bring much-needed order to a system, resulting in a more mature, manageable, and evolvable data architecture.

Building a Data Mesh

The Definitive Guide to Building a Data Mesh with Event Streams blog post provides a summary of the foundational concepts and includes a data mesh prototype that you can build and run yourself.

[Image: preview of the data mesh prototype]

The data mesh prototype is built on Confluent Cloud using event streams, ksqlDB, and the fully managed data catalog API. The prototype's GitHub repository has all the details.
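To illustrate the "data product as an event stream" idea behind the prototype, here is a hedged, stdlib-only Python sketch. A plain Python list stands in for a Kafka topic, and the class name and schema check are hypothetical; the actual prototype uses Confluent Cloud topics, ksqlDB, and the data catalog API instead.

```python
import json

# Hypothetical in-memory stand-in for a data product published as an
# event stream; a list plays the role of an append-only Kafka topic.

class EventStreamProduct:
    def __init__(self, name, schema):
        self.name = name
        self.schema = schema          # field name -> expected Python type
        self._log = []                # append-only log, like a topic

    def produce(self, event: dict):
        # Quality gate: enforce the product's schema before publishing.
        for fld, typ in self.schema.items():
            if not isinstance(event.get(fld), typ):
                raise TypeError(f"field {fld!r} must be {typ.__name__}")
        self._log.append(json.dumps(event))

    def consume(self, offset=0):
        # Consumers replay the stream from any offset, as with Kafka.
        return [json.loads(e) for e in self._log[offset:]]

orders = EventStreamProduct("orders.v1", {"order_id": str, "amount": float})
orders.produce({"order_id": "a1", "amount": 9.99})
print(orders.consume())  # [{'order_id': 'a1', 'amount': 9.99}]
```

The key property this models is that the event stream itself is the product interface: producers publish validated events once, and any number of consumers read them independently from their own offsets.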

Reach Out

Want to learn more about data mesh and how to use it? For questions, feedback, or to open a discussion on data mesh, sign up and engage with the friendly Confluent Community. For more resources like this, be sure to explore all of Confluent Developer and find complete courses, tutorials, examples, and more.

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.
