Learn how to build a data mesh architecture that enables decentralization, governance, domain ownership, and interoperability across distributed systems and teams.
VP Developer Relations
Lead Technologist, Office of the CTO (Author)
Principal Technologist (Author)
Data mesh is a new approach to designing modern data architectures that embraces organizational constructs as well as data-centric ones such as data management and governance. The idea is that data should be easily accessible and interconnected across the entire business.
Just as microservices are a set of principles for designing modern software architectures, data mesh is a set of principles for designing modern data architectures, but it is much earlier in its evolution. While microservices have frameworks such as Spring Boot and Micronaut, along with established practices, books, and supporting infrastructure, there aren't yet many training materials, tutorials, or guides that show data mesh in action.
Although the concept is new and may seem abstract at the moment, you're doing the right thing by learning it now. In this data mesh tutorial, we'll give you an overview of data mesh architecture: how it works, its benefits and use cases, examples, and how to get started.
To understand how data mesh works, we need to understand its four founding principles: data as a product, domain ownership, self-service, and federated governance.
The four principles are all technology agnostic, so they don't confine you to Java or Apache Kafka® or relational databases. A good way to think about data mesh is to frame the problem that it solves: a well-implemented data mesh lets you scale a data architecture in both a technological sense and an organizational sense. So data mesh is flexible in two senses: it lets you add more compute when necessary, and it accommodates change as the business evolves and grows and as people want different things out of the data.
Data Ownership by Domain
In much the same way that microservices each own a specific business function, in data mesh, data is broken down around a specific business domain. Access to that data is decentralized; there's not a central data bureau, a data team, an analytics team, etc. Rather, there's a place where that data lives, probably next to its functionality, i.e. the microservices that produce the data. Access to the data is granted at that point.
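As a minimal sketch of what "access is granted at that point" could look like, the Python below models a domain that holds its own datasets and hands out read grants itself, with no central data team involved. The `Domain` class, the dataset name, and the consumer names are all hypothetical and chosen only for illustration; a real mesh would delegate this to the access controls of whatever platform hosts the data.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A business domain that owns its data and decides who may read it."""
    name: str
    datasets: dict = field(default_factory=dict)  # dataset name -> list of records
    readers: dict = field(default_factory=dict)   # dataset name -> set of consumer names

    def publish(self, dataset, records):
        # The owning domain is the only place the dataset is registered.
        self.datasets[dataset] = list(records)
        self.readers.setdefault(dataset, set())

    def grant(self, dataset, consumer):
        # Access is granted by the owning domain, not by a central bureau.
        self.readers.setdefault(dataset, set()).add(consumer)

    def read(self, dataset, consumer):
        if consumer not in self.readers.get(dataset, set()):
            raise PermissionError(f"{consumer} has no grant on {self.name}.{dataset}")
        return list(self.datasets[dataset])


# Hypothetical usage: the orders domain publishes data and grants one consumer.
orders = Domain("orders")
orders.publish("completed_orders", [{"order_id": 1, "total": 42.0}])
orders.grant("completed_orders", "finance-reporting")
rows = orders.read("completed_orders", "finance-reporting")
```

A consumer without a grant, such as a hypothetical `"marketing"` reader, would get a `PermissionError` from `orders.read`, which is the decentralized decision point the paragraph above describes.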
Data as a Product
In data mesh, data is considered a product by each team that publishes it. A team owns data just like a team would own the set of services that implement the slice of the business that they support. That team has to engage in product thinking about the data: They're wholly responsible for the data, including its quality, its representation, and its cohesiveness.
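One way to make product thinking concrete is to give each data product an explicit owner and a published schema that the owning team checks records against. The sketch below is illustrative only: the `DataProduct` class, its fields, and the `inventory.stock_levels` example are assumptions, not part of any data mesh standard or library.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """A published dataset with a named owner and an explicit schema."""
    name: str
    owner_team: str
    schema: dict          # field name -> expected Python type
    description: str = ""

    def validate(self, record):
        """Return a list of quality problems; an empty list means the record passes."""
        problems = []
        for field_name, expected_type in self.schema.items():
            if field_name not in record:
                problems.append(f"missing field: {field_name}")
            elif not isinstance(record[field_name], expected_type):
                problems.append(f"wrong type for {field_name}")
        return problems


# Hypothetical product owned by a hypothetical inventory team.
stock_levels = DataProduct(
    name="inventory.stock_levels",
    owner_team="inventory",
    schema={"sku": str, "quantity": int},
    description="Current stock per SKU, curated by the inventory team.",
)

good = stock_levels.validate({"sku": "A1", "quantity": 3})
bad = stock_levels.validate({"sku": "A1"})
```

Here the owning team, not the consumer, is the one that defines and enforces what a valid record looks like, which is the responsibility for quality and representation described above.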
Data is Available Everywhere, Self Serve
In data mesh, all data is available everywhere in the company on a self-serve basis. Governance is still an issue, but in principle, data products are published and they are available everywhere. If you're producing a sales forecast for Japan, for example, you could find all of the data that you need to drive that report, ideally in a few minutes. You'd be able to quickly get all of the data you need from all of the places it lives into a database or reporting system that you control.
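Self-serve discovery usually implies some kind of searchable catalog of published data products. The following sketch assumes a tiny in-memory catalog with free-form tags; the entries, tag names, and the `find_products` helper are hypothetical and stand in for whatever catalog or search tool an organization actually uses.

```python
# Hypothetical catalog entries; in practice these would come from a catalog service.
catalog = [
    {"name": "orders.completed", "domain": "orders", "tags": {"sales", "japan", "emea"}},
    {"name": "pricing.list_prices", "domain": "pricing", "tags": {"sales", "japan"}},
    {"name": "sensors.pipeline_pressure", "domain": "pipeline sensors", "tags": {"iot"}},
]


def find_products(entries, *required_tags):
    """Return the names of data products that carry every requested tag."""
    wanted = set(required_tags)
    return [entry["name"] for entry in entries if wanted <= entry["tags"]]


# Everything someone might need for a Japan sales forecast.
japan_sales_sources = find_products(catalog, "sales", "japan")
```

With this catalog, `japan_sales_sources` contains `orders.completed` and `pricing.list_prices`, which the analyst could then load into a database or reporting system of their own.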
Data is Governed Wherever it is
As mentioned, governance is still a concern in data mesh, but that concern is ideally driven down to the place where the data is created and produced. No data architecture is ever perfect or perfectly static; things always evolve and grow. Governance ring-fences that evolution, so you can trust the data in the mesh, navigate it more quickly, and believe that, subject to governance constraints, you can use the data you find.
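A rough way to picture governance applied at the source is a small set of organization-wide rules that every domain runs against its own data products before publishing them. The two rules below (an owning team must be named, and fields flagged as PII must declare a masking rule) are invented for illustration, as are the field names in the product descriptions.

```python
def has_owner(product):
    # Every data product must name the team responsible for it.
    return bool(product.get("owner_team"))


def pii_is_masked(product):
    # Every field flagged as PII must also declare a masking rule.
    return all(f.get("masking") for f in product.get("fields", []) if f.get("pii"))


# Hypothetical global policies, agreed centrally but checked by each domain.
GLOBAL_POLICIES = {
    "has an owning team": has_owner,
    "PII fields declare masking": pii_is_masked,
}


def violations(product):
    """Return the names of the policies that the product fails."""
    return [name for name, rule in GLOBAL_POLICIES.items() if not rule(product)]


compliant = {
    "name": "orders.completed",
    "owner_team": "orders",
    "fields": [{"name": "email", "pii": True, "masking": "hash"}],
}
non_compliant = {
    "name": "crm.contacts",
    "owner_team": "crm",
    "fields": [{"name": "email", "pii": True}],
}

compliant_result = violations(compliant)
non_compliant_result = violations(non_compliant)
```

The design choice sketched here is that the rules are shared while the checking happens inside each domain, so consumers can trust what they find without a central team inspecting every dataset.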
Data architectures often lack rigor, and commonly evolve in an ad hoc way with minimal discipline and structure. A typical data architecture may look like this:
Fortunately, applying data mesh can turn such a point-to-point architecture into something more uniform and manageable—a kind of central nervous system for your data:
In fact, a nervous system is a great metaphor for data mesh because it is itself a mesh: Look inside a brain and you'll see a mesh of independent little products connected to one another. Similarly, with data mesh, an event in one part of your organization can immediately trigger an event in another part, since, in an analytics sense, they are all interconnected.
Data mesh solves a number of common problems. At the smaller scale, it addresses many of the issues seen with data pipelines, which often become brittle and problematic over time as they grow into their own webs of messy point-to-point connections. It also addresses larger organizational issues, such as different departments in a company disagreeing on core facts of the business. In a data mesh, you're less likely to have multiple copies of the same facts. In both of these cases, a data mesh can bring much-needed order to a system, resulting in a more mature, manageable, and evolvable data architecture.
As you've learned, data mesh has organizational and technological dimensions. You can think of a data mesh as a network for exchanging data about a business, about what's happening right now with nodes and with connections.
The nodes in the mesh are data products: a microservice, a database, an application, etc. The nodes produce high quality, locally curated data, within the mesh. Data is curated by the team that's building it for the benefit of the other nodes, and related nodes are grouped into domains (for example, “inventory” or “pipeline sensors”). The data flows from node to node, and thus from domain to domain, as needed.
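The node-and-domain picture can be sketched as a small directed graph: each node belongs to one domain, and domain-to-domain flows fall out of the node-to-node flows. The node names and connections below are made up for illustration; only the domain names "inventory", "orders", and "pipeline sensors" come from the course.

```python
# Hypothetical nodes (data products) and the domain each one belongs to.
node_domain = {
    "order-service": "orders",
    "stock-db": "inventory",
    "restock-app": "inventory",
    "sensor-gateway": "pipeline sensors",
}

# Hypothetical node-to-node data flows as (producer, consumer) pairs.
node_flows = [
    ("order-service", "stock-db"),
    ("stock-db", "restock-app"),
    ("sensor-gateway", "stock-db"),
]


def domain_flows(domains, flows):
    """Derive the distinct cross-domain flows implied by node-to-node flows."""
    result = set()
    for producer, consumer in flows:
        source, target = domains[producer], domains[consumer]
        if source != target:  # flows inside one domain are not cross-domain
            result.add((source, target))
    return sorted(result)


cross_domain = domain_flows(node_domain, node_flows)
```

In this example the flow from `stock-db` to `restock-app` stays inside the inventory domain, so only the flows from orders and from pipeline sensors into inventory appear in `cross_domain`.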
The next module is the first step in a hands-on guide to building a data mesh prototype. If you'd like to perform the hands-on parts of the course, you will need a Confluent Cloud account.
To get started, go to the Confluent Cloud sign-up page. Enter your name, email address, and password, which you will use later to log in to Confluent Cloud. Select the Start Free button and watch your inbox for a confirmation email to continue. New accounts are given $400 of free usage within the first 30 days (this is subject to change). In addition, be sure to apply the promo code DATAMESH101 for an additional $25 of free cloud usage.
Hi, I'm Tim Berglund with Confluent. Welcome to this course on Data Mesh. In it, we're going to learn what a Data Mesh even is, some of the challenges and trade-offs faced during the implementation of one, and the best approaches for trying to build one out in your organization.
Now, Data Mesh seems to be a really abstract concept right now in these early days. What it is, really, is a set of principles for designing a modern data architecture. This is similar to the way microservices are a set of principles for modern software architectures, and microservices have frameworks. You know, there's Spring Boot, there's Micronaut in the Java world. There are practices, there are books, there are all kinds of things people have built. Data Mesh really is conceptually very similar, but it's a lot earlier in its evolution. It's really a new idea, and one I think you're doing the right thing by studying right now.
There are four principles that comprise Data Mesh, and they're all technology agnostic. Data Mesh is not a Java thing or a Kafka thing or a relational database thing. These principles don't really care what technology you're using.
The best way to think about Data Mesh is to frame the problem that it solves. A well-implemented Data Mesh lets you scale a data architecture in two ways. First, crassly, the technology-scale way: can I add enough computers to make it run fast enough? So it's fundamentally scalable in that way. And second, can I scale it across an organization as a business evolves and changes and grows, and the things people want out of data change? We're pushing that data down to the operational level, where the jobs of individual contributors are affected by data products, and not just reports that sit on the desks of important people.
So this is some idea of the way for you to think conceptually about Data Mesh. Now, it has many genetic influences, the things that have come before it that influence it. Things like data marts and domain-driven design. If you're a DDD person, you're going to see those ideas all over the place here. Microservices again, a strong analogy. I think the best analogy really is: Data Mesh is to data and analytics what microservices are to application architecture. Personally, I may be a little bit biased here, but I can't look at Data Mesh and not think about Kafka and not think about event streaming. I think you'll see that as we go on.
Now, we're going to get into more detail throughout this course. I have concepts and high-level ideas I need to get across to you, but we'll try to make it concrete. For now, though, let's talk a little bit about those influences and how they combine. I think they really culminate in four principles that govern how data is managed in the mesh. Not surprisingly, you're looking at the outline for the middle part of this course here, so don't be surprised if you see modules named this as we move on.
First of all, data is always owned by a specific domain. In much the same way that microservices each own a specific business function, data here in Data Mesh is also broken down around a specific domain in the business. Access to that data is then decentralized. So there's not a data bureau, a data team, an analytics team, anything like that. There's a place where that data lives, probably coextensive with the functionality, that is, the microservices that produce that data, and access to that data is granted at that point.
Second, data is considered a product by each team that publishes it. I kind of just implied in the first principle that a team owns data, just like a team would own the set of services that implement the slice of the business that they're supporting. Well, same thing here.
And that team has to engage in product thinking about the data. They're wholly responsible for it: its quality, its representation, its cohesiveness. Like it was a thing that they were, I don't know, sculpting out of glass and polishing and putting in a storefront. They just want that thing to look nice and to be useful.
Third, data is available everywhere and self-serve anywhere in the company. Now, I hear what you're saying. You're thinking about governance. Governance is still a thing, but in principle, these data products are published and they're available everywhere. So if you're producing a sales forecast for Japan, you can find and source all of the data you need to drive that report, ideally in a small amount of time, a few minutes. That means getting that data from all the places where it lives into some database or some reporting system that you have control of.
Fourth, data is governed where it is. Now, governance is always a concern, of course, and that concern is going to be driven down, ideally, to the place where the data is created and produced. No data architecture is perfect or ever perfectly static. Things always evolve and grow. Governance kind of ring-fences this, allowing you to trust and more quickly navigate data in the mesh and believe, subject to these governance constraints, that you can use the stuff you find.
Now, that picture probably looks like things you've built or used or seen. We all know data architectures often lack rigor, and they evolve in an ad hoc way, often maybe without as much discipline or structure as we'd like. Applied well, Data Mesh takes this messy kind of architecture, which we've all seen, and turns it into something more uniform and more manageable, a kind of central nervous system for the data in your organization. And that's a great metaphor, because nervous systems are themselves meshes, right?
I mean, you think of, okay, there's a brain and there are nerves that go out to the parts of the body. But look inside, say, that brain. That really is a mesh of independent little products connected to one another. And that's kind of the analogy we're going for here. So if something happens in one part of your organization, it can immediately trigger something to happen in another part.
This approach solves a number of common problems. At the smaller scale, it addresses many of the issues we see with data pipelines, which often become brittle and problematic over time by kind of creating their own web, a messy point-to-point-to-point kind of system. It also addresses larger organizational issues, like different departments in a company disagreeing on core facts of the business. We're going to be less likely in a Data Mesh to have those copies of facts. Of course it's still possible, but this mesh approach discourages them. In both cases, the Data Mesh principles, when applied correctly, can bring much-needed order to a system. This results in a more mature, more manageable, hopefully more evolvable data architecture.
And as you've noticed, this concept of Data Mesh has organizational and technological dimensions. Essentially, we can think of Data Mesh as a network to exchange data about the business, about what's happening right now, with nodes and with connections. Now, the nodes in the mesh are the data products. We're going to learn more about them later in their own lesson, but for now let's just imagine a node could be a microservice, a database, an application, something like that, just to keep it concrete, so our feet are still on the ground here. The nodes function in such a way as to produce and consume high-quality, locally curated data within the mesh. So that data is, again, curated and productized by the team that's building it for the benefit of the other nodes.
Related nodes, those nodes that are connected, are grouped into domains. A domain might be inventory or orders or pipeline sensors or something like that. And the data flows node to node, and thus from domain to domain, as needed.
So that's a quick look at what a Data Mesh is. We're going to dive into the first principle of Data Mesh in more detail as we explore data ownership by different domains. But first, let me get you geared up to try things out in Confluent Cloud. So note this code you see on the slide right now: DATAMESH101, no spaces, all capitals. And there's the URL you see on the screen, or the QR code if you would like.
Now, you could probably figure out how to sign up for Confluent Cloud, but you know what? I'm still going to walk you through it and show you a little screencast of it. To get started, you go to the URL on the screen, then enter your name, email, and the password you'd like. I'm going to trust you to do those things. This email and password are your account for Confluent Cloud, so save those things. Put them in your password manager or on a post-it note on your monitor, whatever it is you do. Please, please don't really do the post-it note thing. That would be bad.
So click the Start Free button and watch your inbox for a confirmation email. Once you get it, click the link in the confirmation email. It takes you to the next step, where you can choose between a basic, standard, or dedicated cluster. The UI here is going to tell you something about the differences between those and their associated costs. In general, if you're taking a course here on Confluent Developer, in most cases basic clusters work, and those are the cheapest, so you'll get the most out of your free trial money there. Click Begin Configuration to choose your preferred cloud provider, region, and availability zone. All of the basic stuff.
Costs will vary with those choices, but they're all clearly shown on the screen, so you can see what your provider and region are going to cost you. Then continue to set up billing info. Here you see you get a chunk of free usage for your first three months, and also the DATAMESH101 code, which gets you an additional $101, because we honestly would like you to be able to do the exercises in this course without having to pay for them. Click Review one last time, make sure everything looks good, then launch your new cluster. There you are. You're ready to go. Here's that promo code again, just in case. Go sign up, and we'll see you in the next lesson.