Data mesh can seem like an abstract concept at the moment, since the idea is still in its early days. Basically, data mesh is a set of principles for designing a modern data architecture, much as microservices are a set of principles for designing modern software architectures. However, while microservices have frameworks such as Spring Boot and Micronaut, along with established practices, books, and plenty of other supporting infrastructure, data mesh is so early in its evolution that it has very little of this. It's truly a new idea, and you're doing the right thing by studying it now.
Data mesh draws on a number of earlier ideas: it overlaps with data marts, domain-driven design (DDD), microservices, and event streaming. It has its own set of four principles, however, and you need to adhere to them in order to gain its benefits.
The four principles are all technology agnostic, so they don't confine you to Java or Apache Kafka® or relational databases. A good way to think about data mesh is to frame the problem that it solves: a well-implemented data mesh lets you scale a data architecture in both a technological sense and an organizational sense. So data mesh is flexible in the sense that it lets you add more compute when necessary, but also in the sense that it can accommodate change as a business evolves and grows, including changes in what people want out of the data.
Data Ownership by Domain
In much the same way that microservices each own a specific business function, in data mesh, data is broken down around a specific business domain. Access to that data is decentralized; there's not a central data bureau, a data team, an analytics team, etc. Rather, there's a place where that data lives, probably next to its functionality, i.e. the microservices that produce the data. Access to the data is granted at that point.
Data as a Product
In data mesh, data is considered a product by each team that publishes it. A team owns data just like a team would own the set of services that implement the slice of the business that they support. That team has to engage in product thinking about the data: They're wholly responsible for the data, including its quality, its representation, and its cohesiveness.
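One way to picture product thinking is as a small descriptor that records who owns a data product and which checks the data must pass before it is published. The following is only a sketch under assumed names: `DataProduct`, `owner_team`, `required_fields`, and `quality_checks` are hypothetical and do not come from any data mesh standard or library.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration: none of these names come from a data mesh standard.
Record = dict[str, object]


@dataclass
class DataProduct:
    name: str
    domain: str                       # business domain that owns the data
    owner_team: str                   # team accountable for quality and representation
    required_fields: tuple[str, ...]  # minimal "schema" for this sketch
    quality_checks: list[Callable[[Record], bool]] = field(default_factory=list)

    def validate(self, record: Record) -> bool:
        """Return True only if the record has every required field and passes every check."""
        if any(name not in record for name in self.required_fields):
            return False
        return all(check(record) for check in self.quality_checks)

    def publish(self, records: list[Record]) -> list[Record]:
        """Keep only the records the owning team is willing to stand behind."""
        return [r for r in records if self.validate(r)]


orders = DataProduct(
    name="orders",
    domain="sales",
    owner_team="sales-platform",
    required_fields=("order_id", "amount"),
    quality_checks=[lambda r: isinstance(r["amount"], (int, float)) and r["amount"] >= 0],
)

published = orders.publish([
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -5.0},  # rejected: fails the quality check
    {"amount": 10.0},                 # rejected: missing order_id
])
print(published)
```

The point of the sketch is that the quality rules live with the owning team's product definition rather than with a downstream consumer.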
Data Is Available Everywhere, Self-Serve
In data mesh, all data is available everywhere in the company on a self-serve basis. Governance is still an issue, but in principle, data products are published and they are available everywhere. If you're producing a sales forecast for Japan, for example, you could find all of the data that you need to drive that report, ideally in a few minutes. You'd be able to quickly get all of the data you need from all of the places it lives into a database or reporting system that you control.
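Self-serve discovery can be imagined as a searchable catalog of published data products. The sketch below assumes a hypothetical in-memory `Catalog` with tag-based search; the entry names, tags, and `location` strings are invented for illustration, and a real mesh would likely rely on a dedicated catalog service instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    domain: str
    tags: frozenset[str]
    location: str  # hypothetical address, such as a topic or table name


class Catalog:
    """Toy in-memory catalog of published data products (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, *tags: str) -> list[CatalogEntry]:
        """Return entries carrying all of the given tags, sorted by name."""
        wanted = set(tags)
        return sorted(
            (e for e in self._entries.values() if wanted <= e.tags),
            key=lambda e: e.name,
        )


catalog = Catalog()
catalog.register(CatalogEntry("orders", "sales", frozenset({"sales", "japan"}), "sales.orders.v1"))
catalog.register(CatalogEntry("stock-levels", "inventory", frozenset({"inventory", "japan"}), "inventory.stock.v1"))
catalog.register(CatalogEntry("orders-emea", "sales", frozenset({"sales", "emea"}), "sales.orders-emea.v1"))

# Someone building the Japan sales forecast looks up every product tagged "japan".
japan_inputs = [e.name for e in catalog.search("japan")]
print(japan_inputs)
```

Here the forecast author finds inputs from two different domains without asking a central team to locate them.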
Data Is Governed Wherever It Is
As mentioned, governance is still a concern in data mesh, but that concern is ideally driven down to the place where the data is created and produced. No data architecture is ever perfect or perfectly static; things always evolve and grow. Governance lets you ring-fence that change, so that you can trust and more quickly navigate data in the mesh and believe, subject to governance constraints, that you can use the data you find.
Data architectures often lack rigor, and commonly evolve in an ad hoc way with minimal discipline and structure. A typical data architecture may look like this:
Fortunately, applying data mesh can turn such a point-to-point architecture into something more uniform and manageable—a kind of central nervous system for your data:
In fact, a nervous system is a great metaphor for data mesh because it is itself a mesh: Look inside a brain and you'll see a mesh of independent little products connected to one another. Similarly, with data mesh, an event in one part of your organization can immediately trigger an event in another part, since, in an analytics sense, they are all interconnected.
Data mesh solves a number of common problems. At the smaller scale, it addresses many of the issues seen with data pipelines, which often become brittle over time as they grow into messy webs of point-to-point connections. It also addresses larger organizational issues, such as different departments in a company disagreeing on core facts of the business. In a data mesh, you're less likely to have multiple copies of the same facts. In both of these cases, a data mesh can bring much-needed order to a system, resulting in a more mature, manageable, and evolvable data architecture.
As you've learned, data mesh has organizational and technological dimensions. You can think of a data mesh as a network of nodes and connections for exchanging data about a business and about what's happening in it right now.
The nodes in the mesh are data products: a microservice, a database, an application, etc. The nodes produce high-quality, locally curated data within the mesh. Data is curated by the team that's building it for the benefit of the other nodes, and related nodes are grouped into domains (for example, “inventory” or “pipeline sensors”). The data flows from node to node, and thus from domain to domain, as needed.
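The relationship between nodes, domains, and data flows can be sketched as a small directed graph. The node names, domains, and flows below are invented for illustration; the sketch only shows how node-to-node flows imply domain-to-domain flows.

```python
from collections import defaultdict

# Hypothetical mesh: node names and domains are made up for illustration.
domain_of = {
    "stock-levels": "inventory",
    "reorder-alerts": "inventory",
    "orders": "sales",
    "sales-forecast": "sales",
}

# Directed edges: data flows from the producing node to the consuming node.
flows = [
    ("orders", "sales-forecast"),
    ("orders", "stock-levels"),
    ("stock-levels", "reorder-alerts"),
    ("stock-levels", "sales-forecast"),
]


def domain_flows(flows, domain_of):
    """Collapse node-to-node flows into cross-domain flows (same-domain flows are dropped)."""
    result = defaultdict(set)
    for producer, consumer in flows:
        src, dst = domain_of[producer], domain_of[consumer]
        if src != dst:
            result[src].add(dst)
    return {src: sorted(dsts) for src, dsts in result.items()}


cross_domain = domain_flows(flows, domain_of)
print(cross_domain)
```

In this toy mesh, "sales" feeds "inventory" through the orders node, and "inventory" feeds "sales" through the stock-levels node.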
The next module will get into the first principle of data mesh, but before proceeding, you might consider setting up a Confluent Cloud account, which you can do with the promo code DATAMESH101.
To get started, go to the Confluent Cloud sign-up page. Enter your name, email address, and password, which you will use later to log in to Confluent Cloud. Select the Start Free button and watch your inbox for a confirmation email to continue. The link in your confirmation email will lead you to the next step, where you can choose between Basic, Standard, or Dedicated clusters (the associated costs are listed, but the startup amount will cover more than you need for this course). Click Begin Configuration to choose your preferred cloud provider, region, and availability zone. Costs are shown at the bottom of the screen. Continue to set up your billing information. Click Review to get one last look at your choices, and then launch your new cluster.