The first principle of data mesh is data ownership by domain, or domain-driven decentralization. This principle simply requires that data be owned by those who understand it best. It's a reaction to the years that most companies have spent doing things the other way: putting everything into a centralized data warehouse.
To illustrate the data mesh pattern, it's most effective to begin with its antipattern: the data warehouse. A typical data warehouse implementation draws on many data sources spread across a company, with varying levels of quality. There will be many ETL jobs, possibly running in different systems, that pull data sets back to the central data warehouse. The data warehouse team will often be required to clean up and fix a lot of the data. If you've ever done this kind of work, you know that 95% of the work is cleaning. Extracting and loading take up what little time remains.
This centralized approach cuts across domains, or units of business, by optimizing for a common set of technological skills rather than a set of business skills. The team running the data warehouse usually doesn't understand the data sets very well, because its focus is on running the data warehouse as expertly as possible. Even if its members have some familiarity with the data, they will never know it as well as the domain teams where the data originates.
Another challenge is that the source systems feeding data into the data warehouse don't always behave particularly well. They haven't been built with responsible data sharing in mind: their builders are pressed to deliver application features and are not thinking about the best possible way to make the data shareable.
So instead of optimizing horizontally as with a data warehouse, you should optimize vertically.
In a data mesh, ownership of an asset is given to the local team that knows it best: the people who are intimately familiar with its structure, its purpose, and its value. In this decentralized approach, many parties work together to ensure excellent data. The parties that own the data are responsible for being good stewards of it, and they are explicitly identified and known to the rest of the organization.
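One way to make owners "explicitly identified and known" is a small catalog that maps each data product to exactly one owning domain. The following is a minimal sketch under stated assumptions, not a prescribed implementation: the `DataProduct` and `OwnershipRegistry` classes, the product names, and the contact addresses are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str     # hypothetical identifier, e.g. an event stream or table
    domain: str   # the business domain that owns the data
    steward: str  # contact point for questions about quality


class OwnershipRegistry:
    """In-memory catalog mapping each data product to one owning domain."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Ownership is explicit: refuse to silently hand a product
        # to a different domain once it has been claimed.
        existing = self._products.get(product.name)
        if existing is not None and existing.domain != product.domain:
            raise ValueError(
                f"{product.name!r} is already owned by domain {existing.domain!r}"
            )
        self._products[product.name] = product

    def owner_of(self, name: str) -> DataProduct:
        # Raises KeyError if no domain has claimed the data product.
        return self._products[name]


# Illustrative entries only; names and addresses are made up.
registry = OwnershipRegistry()
registry.register(DataProduct("orders.placed", "orders", "orders-team@example.com"))
registry.register(DataProduct("inventory.levels", "inventory", "inventory-team@example.com"))

print(registry.owner_of("orders.placed").steward)
```

With a lookup like this, anyone who receives questionable data can find the responsible team directly instead of routing the problem through a central warehouse team.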
Imagine Alice, at the top of this diagram, is working on an orders domain. Joe, at the bottom of the diagram, is working on inventory management, and he gets some bad inventory data from Alice.
So for the data mesh to work, there are a couple of requirements:
As in domain-driven design, a standard language and nomenclature should be used for all of the data in your decentralized data mesh. This ensures that your streams of events create a shared narrative that all business users can understand. (All of the various flows through the mesh should be comprehensible to non-technical people.)
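A shared nomenclature can also be checked mechanically. The sketch below assumes a hypothetical glossary of agreed field names and flags any event field that falls outside it. The `GLOSSARY` contents, the `unknown_terms` helper, and the sample event are illustrative assumptions, not part of any particular data mesh tool.

```python
# Hypothetical shared glossary: canonical term -> plain-language definition.
GLOSSARY = {
    "order_id": "Unique identifier of a customer order",
    "customer_id": "Unique identifier of the customer who placed the order",
    "sku": "Stock keeping unit of the item",
    "quantity": "Number of units",
}


def unknown_terms(event: dict) -> list[str]:
    """Return the event's field names that are not in the shared glossary, sorted."""
    return sorted(field for field in event if field not in GLOSSARY)


# "cust" is a local abbreviation that the glossary does not recognize.
event = {"order_id": "o-1", "cust": "c-9", "sku": "A12", "quantity": 2}
print(unknown_terms(event))  # ['cust']
```

A check of this kind could run when a domain team publishes a new event type, so that naming drift is caught before other teams start to depend on it.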