The first principle of data mesh is data ownership by domain, or domain-driven decentralization. This principle simply requires that data be owned by those who understand it best. It's a reaction to the years that most companies have spent doing things the other way: putting everything into a centralized data warehouse.
To illustrate the data mesh pattern, it's most effective to begin with its antipattern: the data warehouse. A typical data warehouse implementation has many data sources spread across a company, with varying levels of quality. There will be many ETL jobs, possibly running in different systems, pulling data sets back to the central data warehouse. The data warehouse teams are often required to clean up and fix a lot of the data. If you've ever done this kind of work, you know that 95% of the work is cleaning; extracting and loading take up the little bit of remaining time.
This centralized approach cuts across domains, or units of business, by optimizing for a common set of technological skills, rather than a set of business skills: The team running the data warehouse usually doesn't understand the data sets very well because their focus is running the data warehouse as expertly as possible. Even if they do have some familiarity with the data, they are never going to know it as well as the domain teams where the data originates.
Another challenge is that the source systems feeding data into the data warehouse don't always behave particularly well. They haven't been built with responsible data sharing in mind, since their builders are pressed to ship application features rather than to think about the best way to make the data shareable.
So instead of optimizing horizontally around a shared technology stack, as a data warehouse does, you should optimize vertically, around business domains.
In a data mesh, ownership of an asset is given to the local team that's most familiar with it—the ones who are intimately familiar with its structure, its purpose, and its value. In this decentralized approach, many parties work together to ensure excellent data. The parties that own the data have the responsibility to be good stewards of that data, and they are explicitly identified and known to the rest of the organization.
Imagine Alice, at the top of this diagram, is working on an orders domain. Joe, at the bottom of the diagram, is working on inventory management, and he gets some bad inventory data from Alice.
The first thing Joe does is try to fix the data locally for himself. While he may be successful, the problem is that this only fixes the data for his service. Effectively, by fixing the data locally, he couples his inventory service to the broken shape of Alice's data. So there will be a problem if Alice fixes the data on her end after Joe has made these local changes.
A better solution is for Alice to fix the data on her end, which will eliminate Joe's problem. This is for the benefit of not just Joe, but anyone else who might want to consume Alice's data.
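To make the coupling problem concrete, here is a minimal sketch of Joe's consumer-side patch. The function name and the specific bug (quantities reported in dozens instead of units) are invented for illustration; they are not from any real system.

```python
# Hypothetical illustration of why Joe's local fix is fragile.
def joes_local_fix(record: dict) -> dict:
    # Suppose Alice's orders stream currently reports quantity in
    # dozens. Joe compensates locally so his inventory math works
    # in units. If Alice later fixes her producer to emit units
    # directly, this line silently inflates every quantity
    # twelvefold -- Joe's service is now coupled to the old bug.
    return {**record, "quantity": record["quantity"] * 12}

patched = joes_local_fix({"sku": "WIDGET-1", "quantity": 2})
```

When Alice fixes the data at the source instead, the compensation logic can simply be deleted, and every consumer benefits at once.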
So for the data mesh to work, there are a couple of requirements:
There must be a process that allows Joe and Alice to communicate quickly.
There must be a sense of responsibility amongst members of the mesh: Alice needs to be both willing and able to make her changes quickly, which will ensure that her data in the mesh is always of good quality (there should be incentives in place to ensure that this happens).
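One way a team like Alice's can uphold that responsibility is a quality gate that runs before events are published. This is a hedged sketch only; the function name and the specific field rules are assumptions invented for illustration, not part of any data mesh standard.

```python
# Hypothetical producer-side quality gate for order events.
def validate_order(event: dict) -> list[str]:
    """Return a list of quality problems; an empty list means the
    event is fit to publish to the mesh."""
    problems = []
    if not event.get("sku"):
        problems.append("missing sku")
    qty = event.get("quantity")
    if not isinstance(qty, int) or qty < 0:
        problems.append("quantity must be a non-negative integer")
    return problems

# A clean event passes; a malformed one is caught before consumers
# like Joe ever see it.
assert validate_order({"sku": "WIDGET-1", "quantity": 3}) == []
assert validate_order({"quantity": -1}) != []
```

Catching problems at the producer keeps the fix in one place, rather than forcing every consumer to patch around them.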
As in domain-driven design, a standard language and nomenclature should be used for all of the data in your decentralized data mesh. This ensures that your streams of events will create a shared narrative that all business users can understand. (All of the various flows through the mesh should be comprehensible to non-technical people.)
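A shared vocabulary can be as simple as agreeing on one name and one shape per business fact. The sketch below assumes a mesh-wide `OrderPlaced` event; the class and field names are invented for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class OrderPlaced:
    """The one agreed-upon name and shape for the 'order placed'
    business event, used by every producer and consumer in the mesh."""
    order_id: str
    customer_id: str
    sku: str
    quantity: int        # always whole units, never dozens or cases
    placed_at: datetime  # always UTC

event = OrderPlaced(
    order_id="o-1001",
    customer_id="c-42",
    sku="WIDGET-1",
    quantity=3,
    placed_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

In practice such definitions often live in a shared schema registry, so that Alice's domain and Joe's domain cannot drift apart on what an "order" means.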