A complete get-started tutorial on how to build a data mesh with step-by-step instructions.
VP Developer Relations
Lead Technologist, Office of the CTO (Author)
Principal Technologist (Author)
This module covers the typical journey you might go through when implementing a data mesh in your organization. It doesn't cover the nitty-gritty technical details (code, APIs, etc.), but rather gives guidance on how a data mesh architecture works. The hands-on modules after this one will walk you through the prototype set up in module 2.
You can view the four principles of the data mesh, covered in previous modules, as an evolution.
For example, you wouldn't start with Principle 3, data available everywhere as a self-service; rather, you'd start with Principle 1, data ownership by domain. You'd move from there to Principle 2, data as a product, and so on. Each principle is generally more difficult than the last, but your capabilities will also develop as you move through them.
Data mesh has a concrete implementation, but it is more importantly a set of ideas. Reducing those ideas to practice by building out the mesh is a journey. So if you get started now, you're probably not going to have a data mesh next week. But there will come a point when you can reasonably say "I've built a data mesh." There may not be an obvious threshold, but you will recognize when you've reached that point.
To begin a data mesh, you first need to get the necessary management commitment. Then start with some concrete initial use cases: ideally, things that are contained, simple, and owned by high-capability, forward-looking teams. You also want them to be visible, i.e., you want results that you can show to the business.
While data mesh is a valuable concept, it's not everything. It works in conjunction with other important systems such as microservices and domain-driven design, as we've mentioned. So those other methods are most likely going to need to be a part of your work, alongside and sometimes even orthogonal to data mesh. Basically, you should apply the data mesh concepts as you see fit to gain the maximum benefit for your company.
Continue to the following modules to evaluate the data mesh prototype created in module 2, Hands On: Data Mesh Prototype Setup.
Hi, I'm Tim Berglund with Confluent. Welcome to data mesh module six, implementing a data mesh. In this final module, I want to talk you through the typical journey that you'll go through implementing a data mesh in your organization. Now, this isn't going to be code-level, you know, "here's the framework, and here are the APIs used in this particular language." It's more architectural guidance for how to walk through this process.
Recall that there are four principles of the data mesh, which we've covered in the previous four modules. And you can really see these as an evolution. It's not like you'd start with principle three, that data is available everywhere, right? I mean, you'd start with domain teams owning data, and that's kind of coupled to product thinking, so you might move from there to product thinking. So you go through this 1, 2, 3, 4 sort of evolution. And four might be a little cross-cutting, potentially, but I generally see kind of an increasing difficulty as we go through them.
And that's not to say that you should be discouraged and life's going to get harder as you move on, 'cause really your capabilities are developing. You can't just jump in to three. It's like you can't just run a marathon tomorrow if you've been fairly sedentary for the last year, you know. A marathon is a very difficult thing, but you get there a piece at a time through training, and practice, and development of your capabilities. And that's kind of what we see in this journey.
And, you know, I want to emphasize that data mesh is a concept. It has a concrete implementation, but it's a set of ideas, and reducing those ideas to practice by building out the thing, this building a data mesh in your company, is a journey. If you get started now, you're not going to have a data mesh next week.
There will be some point at which it becomes reasonable and defensible for you to say, "Hey, I've built a data mesh here," but it's not like there's some obvious threshold. It's like, you know, how many grains of sand are required for there to be a pile of sand? It's not obvious. You know one when you see one, but the threshold is a little fuzzy.
So anyway, to do this, you get the necessary management commitment. You know, you do this in the open. And start with some concrete initial use cases: things that ideally are contained and simple, and maybe are owned by high-capability, forward-looking teams if possible. And things that are visible, right? You want to have results that you can show to the business as being meaningful. And then iterate from there, from those initial use cases.
And finally, while data mesh is a valuable concept, it's not everything, right? It sits next to other important things. I've mentioned microservices and domain-driven design a number of times in this course, and those are important things also that are probably going to need to be a part of your life alongside, maybe even sometimes orthogonal to, data mesh. And with its focus on analytical data, it's really just one of the ways that you can apply event streaming in Kafka to drive software architectural success and to create value in the business. I mean, you can use event streaming with microservices to implement strictly operational systems in an extremely powerful way, quite apart from any analytics concerns. So, apply these concepts in the way that you see fit so you can make the technology useful in your company.
To help you get started, let's look at some concrete steps for implementing a data mesh in practice. Again, these are at the architectural level. We're not talking about code yet, because that stuff's just not there in the data mesh world just yet. So, things you want to have in place for a successful data mesh implementation. You want data in motion to be centralized.
That's a central event streaming platform. I can think of a nice one: Kafka, as the open source name for that, and Confluent Cloud as the fully managed service. That seems like a good idea to me, as a potentially biased source of that opinion. But yeah, I mean, you need that centralized platform.
You want to nominate data owners. You need concrete owners for the key data sets in the organization. Now, you don't need concrete owners for everything, but you do want concrete owners for the things that you're chiseling off at first. And you want to make that ownership information broadly accessible: this team owns this product, that team owns that product.
You want to publish data on demand: stored in Kafka indefinitely, ideally in very long retention period topics, or, you know, data that can somehow be republished by data products on demand.
You want a system for schema change, all right. Owners are going to publish schemas. That's a thing that they need to own and document and publish to the mesh. And the form that takes: is it a wiki? Is it data extracted from Confluent Cloud Schema Registry and transformed into an HTML document? You know, these are all things that can be done. There are lots of ways to get this done, but it needs to be done. That schema stuff needs to be published. There's also a process for schema change, where we can negotiate that. Is that a pull request on a GitHub repo? Is it a meeting? That'll just depend on your own local organizational flavor.
You want to secure event streams. This is governance. So you want access to individual event streams to be permissioned by a central authority. There are probably regulatory concerns here. So, there may be literal laws enforced by a literal government. If you're a public company in the United States, there are systems you're going to have to have in place to do this, and likewise in other jurisdictions.
So you wanna be able to get data in from any database.
There are source and sink connectors made available for many, many supported database types. And you want to make sure that that connection exists for you in Kafka Connect, just to make provisioning of new output ports and input sources easier.
Finally, a central user interface for discovery and registration of new event streams. Again, this can take any form. Can this be an application that you create? Yes. Can it be a wiki? I keep saying wiki and I feel like I should stop, but, you know, for an extremely pragmatic first step that's very, very simple, it can be that. But ultimately you're going to need to support searching for schemas for data of interest. You wanna be able to preview event streams. Certainly that's functionality that's not going to be in the day-zero wiki form. And to request access to new event streams that are governed and might be out of your control. And of course, just to see some idea of data lineage; that's a thing that in a mature tool you're gonna have a way of exploring.
Now, I think I said earlier in this course, I'm recording this early in the summer of 2021. I'm not sure when you're watching it. But if you're doing this work, if you're working on an implementation relatively soon after this recording, you are on the leading edge. This is new stuff. I think it's important stuff. I think five years from now we're going to look back at this as the early days, and at these early adopters as risk-taking visionaries who built out the expertise and the knowledge, and paved the path for the frameworks that we'll be able to use five years from now, and the products that evolved around this. I think this technology is a thing that really is going to have legs. So if you're walking through these steps, and thinking about these things, and blazing this trail, you really are making a significant contribution to the community. So, I want to say thank you for that. And I really look forward to hearing the kinds of things you build.
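The checklist in the transcript (nominated owners, long-retention topics, published schemas, centrally permissioned access, and a discovery interface) can be pictured as a small catalog. The following is a minimal Python sketch under stated assumptions, not a reference implementation: `DataProduct`, `MeshCatalog`, and every method and field name are hypothetical, and nothing here talks to a real Kafka cluster. The only real setting shown is the Kafka topic config `retention.ms`, where `-1` disables time-based deletion.
```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class DataProduct:
    """One catalog entry. All field names are illustrative assumptions."""
    name: str
    owner_team: str                 # the nominated data owner
    topic: str                      # event stream acting as the output port
    schema_fields: Dict[str, str]   # field name -> type, published by the owner
    # retention.ms = -1 is Kafka's topic setting for unlimited time retention.
    topic_config: Dict[str, str] = field(
        default_factory=lambda: {"retention.ms": "-1"}
    )
    readers: Set[str] = field(default_factory=set)


class MeshCatalog:
    """Hypothetical stand-in for the central discovery and registration UI."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}
        self._pending: List[Tuple[str, str]] = []

    def register(self, product: DataProduct) -> None:
        if product.name in self._products:
            raise ValueError(f"{product.name} is already registered")
        self._products[product.name] = product

    def search(self, keyword: str) -> List[str]:
        """Return names of products whose name or schema fields match."""
        kw = keyword.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if kw in p.name.lower()
            or any(kw in f.lower() for f in p.schema_fields)
        )

    def owner_of(self, name: str) -> str:
        return self._products[name].owner_team

    def request_access(self, name: str, team: str) -> None:
        """Record a request; a central authority decides later."""
        if name not in self._products:
            raise KeyError(name)
        self._pending.append((name, team))

    def approve(self, name: str, team: str) -> None:
        """Called by the central authority to grant read access."""
        if (name, team) not in self._pending:
            raise ValueError("no pending request to approve")
        self._pending.remove((name, team))
        self._products[name].readers.add(team)

    def can_read(self, name: str, team: str) -> bool:
        product = self._products[name]
        return team == product.owner_team or team in product.readers


if __name__ == "__main__":
    catalog = MeshCatalog()
    catalog.register(DataProduct(
        name="orders",
        owner_team="checkout-team",
        topic="orders.v1",
        schema_fields={"order_id": "string", "amount": "double"},
    ))
    print(catalog.search("order"))
    catalog.request_access("orders", "analytics-team")
    catalog.approve("orders", "analytics-team")
    print(catalog.can_read("orders", "analytics-team"))
```
Keeping access requests separate from approvals mirrors the point that streams are permissioned by a central authority rather than self-granted. A day-zero wiki could hold the same fields before any tooling exists.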