Data mesh is a new approach to designing modern data architectures that embraces organizational constructs as well as data-centric ones such as data management and governance. The idea is that data should be easily accessible and interconnected across the entire business.
Microservices are a set of principles for designing modern software architectures, and data mesh plays a similar role for data architectures, but it is a much newer concept and still early in its evolution. While microservices have frameworks such as Spring Boot and Micronaut, along with defined practices, books, and all kinds of other infrastructure, there aren’t many training materials, tutorials, or guides that show how to put data mesh into action.
Although the concept is new and may seem abstract at the moment, you’re doing the right thing by learning it now. In this data mesh tutorial, we’ll give you a complete overview of data mesh architecture: how it works, its benefits and use cases, examples, and how to get started.
To understand how data mesh works, it helps to be familiar with four related concepts: data marts, domain-driven design (DDD), microservices, and event stream processing. Data mesh has its own unique set of four principles, however: data ownership by domain, data as a product, self-serve data availability, and governance wherever the data lives. You need to adhere to these principles in order to gain its benefits.
The four principles are all technology agnostic, so they don't confine you to Java, Apache Kafka®, or relational databases. A good way to think about data mesh is to frame the problem that it solves: a well-implemented data mesh lets you scale a data architecture in both a technological sense and an organizational sense. It is flexible in that it lets you add more compute when necessary, and also in that it can accommodate a business as it evolves and grows, along with the changing things that people want out of the data.
Data Ownership by Domain
In much the same way that microservices each own a specific business function, in data mesh, data is broken down around a specific business domain. Access to that data is decentralized; there's not a central data bureau, a data team, an analytics team, etc. Rather, there's a place where that data lives, probably next to its functionality, i.e. the microservices that produce the data. Access to the data is granted at that point.
Data as a Product
In data mesh, data is considered a product by each team that publishes it. A team owns data just like a team would own the set of services that implement the slice of the business that they support. That team has to engage in product thinking about the data: They're wholly responsible for the data, including its quality, its representation, and its cohesiveness.
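To make "product thinking" a little more concrete, here is a minimal sketch of how a team might describe a data product it owns. Data mesh itself is technology agnostic and does not prescribe any such structure; the `DataProduct` class, its field names, and the `orders` example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical descriptor for a data product. Every field name here is an
# illustrative assumption, not part of any data mesh standard.
@dataclass
class DataProduct:
    name: str
    domain: str
    owner_team: str                     # the team responsible for quality
    schema: dict[str, type]             # column name -> expected Python type
    quality_checks: list[Callable[[dict], bool]] = field(default_factory=list)

    def validate(self, record: dict) -> bool:
        """Return True if the record matches the schema and passes every check."""
        if set(record) != set(self.schema):
            return False
        if not all(isinstance(record[k], t) for k, t in self.schema.items()):
            return False
        return all(check(record) for check in self.quality_checks)


# An example product owned by an assumed "sales-platform" team.
orders = DataProduct(
    name="orders",
    domain="sales",
    owner_team="sales-platform",
    schema={"order_id": str, "amount": float},
    quality_checks=[lambda r: r["amount"] >= 0],
)
```

The point of the sketch is that ownership, representation (the schema), and quality rules travel together with the data, and the publishing team is the one that maintains them.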
Data is Available Everywhere, Self Serve
In data mesh, all data is available everywhere in the company on a self-serve basis. Governance is still an issue, but in principle, data products are published and they are available everywhere. If you're producing a sales forecast for Japan, for example, you could find all of the data that you need to drive that report, ideally in a few minutes. You'd be able to quickly get all of the data you need from all of the places it lives into a database or reporting system that you control.
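As a rough illustration of self-serve discovery, the sketch below models a tiny in-memory catalog of published data products and a lookup by tag. A real mesh would use some shared catalog service; the `CatalogEntry` class, the `discover` function, and the entries and locations shown are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    domain: str
    tags: frozenset[str]
    location: str  # placeholder for wherever a consumer would read the data


# Hypothetical catalog contents; names and locations are made up.
CATALOG = [
    CatalogEntry("orders", "sales",
                 frozenset({"sales", "japan", "emea"}), "topic:sales.orders"),
    CatalogEntry("inventory-levels", "inventory",
                 frozenset({"stock", "japan"}), "topic:inventory.levels"),
    CatalogEntry("web-clicks", "marketing",
                 frozenset({"clickstream"}), "topic:marketing.clicks"),
]


def discover(catalog, *required_tags):
    """Return the entries that carry every one of the required tags."""
    wanted = set(required_tags)
    return [entry for entry in catalog if wanted <= entry.tags]


# Everything tagged for Japan, regardless of which domain publishes it.
japan_products = discover(CATALOG, "japan")
```

Someone building the Japan sales forecast could call `discover(CATALOG, "japan")` and then pull each result from its `location` into a system they control, without filing a request with a central data team.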
Data is Governed Wherever it is
As mentioned, governance is still a concern in data mesh, but that concern is ideally driven down to the place where the data is created and produced. No data architecture is ever perfect or perfectly static; things always evolve and grow. Governance lets you ring-fence that change, so that you can trust and more quickly navigate data in the mesh and believe, subject to governance constraints, that you can use the data you find.
Data architectures often lack rigor, and commonly evolve in an ad hoc way with minimal discipline and structure. A typical data architecture ends up as a web of point-to-point connections between systems.
Fortunately, applying data mesh can turn such a point-to-point architecture into something more uniform and manageable, a kind of central nervous system for your data.
In fact, a nervous system is a great metaphor for data mesh because it is itself a mesh: Look inside a brain and you'll see a mesh of independent little products connected to one another. Similarly, with data mesh, an event in one part of your organization can immediately trigger an event in another part, since, in an analytics sense, they are all interconnected.
Data mesh solves a number of common problems. At the smaller scale, it addresses many of the issues seen with data pipelines, which often become brittle and problematic over time as they grow into messy, point-to-point webs of their own. It also addresses larger organizational issues, such as different departments in a company disagreeing on core facts of the business. In a data mesh, you’re less likely to have multiple copies of the same facts. In both of these cases, a data mesh can bring much needed order to a system, resulting in a more mature, manageable, and evolvable data architecture.
As you've learned, data mesh has organizational and technological dimensions. You can think of a data mesh as a network of nodes and connections for exchanging data about a business and about what's happening in it right now.
The nodes in the mesh are data products: a microservice, a database, an application, and so on. The nodes produce high-quality, locally curated data within the mesh. Data is curated by the team that's building it for the benefit of the other nodes, and related nodes are grouped into domains (for example, “inventory” or “pipeline sensors”). The data flows from node to node, and thus from domain to domain, as needed.
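The node-and-domain picture can be sketched as a small directed graph. This is only an illustration under assumed names: the nodes, their domain assignments, and the flows below are hypothetical, and the `domain_flows` helper is not part of any library.

```python
from collections import defaultdict

# Which domain each node (data product) belongs to. All names are hypothetical.
DOMAIN_OF = {
    "orders-service": "sales",
    "forecast-app": "sales",
    "stock-db": "inventory",
    "sensor-gateway": "pipeline sensors",
}

# Directed data flows between nodes: (producer, consumer).
FLOWS = [
    ("stock-db", "orders-service"),
    ("orders-service", "forecast-app"),
    ("sensor-gateway", "stock-db"),
]


def domain_flows(domain_of, flows):
    """Collapse node-to-node flows into the set of cross-domain flows."""
    result = defaultdict(set)
    for producer, consumer in flows:
        src, dst = domain_of[producer], domain_of[consumer]
        if src != dst:  # flows inside a single domain are not cross-domain
            result[src].add(dst)
    return dict(result)


cross_domain = domain_flows(DOMAIN_OF, FLOWS)
```

With the assumed data, `cross_domain` maps `"inventory"` to `{"sales"}` and `"pipeline sensors"` to `{"inventory"}`, which shows how node-level flows imply domain-level ones.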
The next module will get into the first principle of data mesh, but before proceeding, you might consider setting up a Confluent Cloud account, which you can do with the promo code
To get started, go to the Confluent Cloud sign up page. Enter your name, email address, and password, which you will use later to log in to Confluent Cloud. Select the Start Free button and watch your inbox for a confirmation email to continue. The link in your confirmation email will lead you to the next step, where you should choose a Basic cluster. Click Begin Configuration to choose your preferred cloud provider, region, and availability zone. Costs are shown on the dropdowns. Continue to set up your billing information. Click Review to get one last look at your choices (the associated costs are listed, but the startup amount of free usage will cover more than you need for this course), and then launch your new cluster.
Make sure to delete your cluster when you are finished so that you don't incur extra charges. To do this, go to Cluster settings on the left-hand side menu, then click Delete cluster. Enter your cluster name, then select Continue.