Sr. Director, Developer Advocacy (Presenter)
Lead Technologist, Office of the CTO (Author)
Principal Technologist (Author)
As with the first principle, data ownership by domain, federated governance is largely an organizational concern. The objective is to ensure that independent and autonomous teams, which own all of the data products within the mesh, can actually work together. You want to make sure that the various constituents of the mesh actually "mesh," so that you can generate real network effects and create value.
This mostly comes down to creating global standards and applying them across the full mesh. So although you started with decentralization in the first and second principles, you now want to strike the right balance between decentralization and centralization in order to tie everything together. This isn't easy, and in practice, the execution will likely differ from organization to organization.
One important task is to ensure that the same data or data attribute is represented identically across domains. Otherwise, you can't join or correlate data (for example, customer data) across the various domains in the company, because every domain has a different view of the customer. Data contracts, schemas, and schema registries will help you implement and enforce the standards for the mesh.
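To make the idea of a shared data contract concrete, here is a minimal sketch in Python. It is not a real schema registry client: `InMemorySchemaRegistry`, `FieldSpec`, the `customer` subject, and the field names are all hypothetical, chosen only to show how one globally agreed schema can flag a domain whose view of the customer has drifted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field of a data contract (hypothetical helper type)."""
    name: str
    type_: type
    required: bool = True


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a real schema registry service."""

    def __init__(self):
        self._schemas = {}

    def register(self, subject, fields):
        # Store the agreed contract for a subject such as "customer".
        self._schemas[subject] = tuple(fields)

    def validate(self, subject, record):
        """Return a list of contract violations (empty if the record conforms)."""
        errors = []
        fields = self._schemas[subject]
        known = {f.name for f in fields}
        for f in fields:
            if f.name not in record:
                if f.required:
                    errors.append(f"missing field: {f.name}")
            elif not isinstance(record[f.name], f.type_):
                errors.append(
                    f"wrong type for {f.name}: expected {f.type_.__name__}"
                )
        for key in record:
            if key not in known:
                errors.append(f"unexpected field: {key}")
        return errors


# The global standard: every domain represents a customer the same way.
registry = InMemorySchemaRegistry()
registry.register("customer", [
    FieldSpec("customer_id", str),
    FieldSpec("email", str),
    FieldSpec("loyalty_tier", str, required=False),
])

# Two domains publishing their own view of the same customer.
orders_view = {"customer_id": "c-42", "email": "a@example.com"}
billing_view = {"customer_id": 42, "mail": "a@example.com"}

orders_errors = registry.validate("customer", orders_view)
billing_errors = registry.validate("customer", billing_view)
```

In this sketch the orders domain conforms, while the billing domain is rejected because it uses a numeric ID and a differently named email field. Those are exactly the kinds of mismatches that would prevent joining customer data across domains.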
Another issue is how data products should detect and recover from errors. Here, you can use common strategies such as application logs, data profiling, and data lineage to unearth errors in the mesh. Streams are very useful in this context, because they capture live and historical data as a sequence of events, which lets you identify cause-and-effect relationships much more easily (particularly if you can join and correlate streams from different domains). Streams also let you recover from and fix errors by replaying and reprocessing historical data.
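The replay idea can be illustrated with a small sketch. It assumes a stream is simply an ordered, retained list of events with offsets; the event fields, the `replay` helper, and both handlers are hypothetical and stand in for whatever your streaming platform and processing code actually provide.

```python
# A retained stream of events for one domain, in the order they occurred.
events = [
    {"offset": 0, "order_id": "o-1", "amount_cents": 1250},
    {"offset": 1, "order_id": "o-2", "amount_cents": 399},
    {"offset": 2, "order_id": "o-1", "amount_cents": -1250},  # refund
]


def replay(log, handler, from_offset=0):
    """Rebuild state by feeding every event at or after from_offset to handler."""
    state = {}
    for event in log:
        if event["offset"] >= from_offset:
            handler(state, event)
    return state


def buggy_handler(state, event):
    # Bug: refunds (negative amounts) are silently dropped.
    if event["amount_cents"] > 0:
        key = event["order_id"]
        state[key] = state.get(key, 0) + event["amount_cents"]


def fixed_handler(state, event):
    # Fix: every event contributes to the order total, including refunds.
    key = event["order_id"]
    state[key] = state.get(key, 0) + event["amount_cents"]


# The original (incorrect) result, and the corrected result obtained by
# replaying the same historical events through the fixed logic.
wrong_totals = replay(events, buggy_handler)
corrected_totals = replay(events, fixed_handler)
```

Because the events themselves were never modified, fixing the error only requires deploying corrected logic and reprocessing the stream from the beginning.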
There are a lot of streams in a data mesh, and they may span systems, technology platforms, data centers, clouds, and so on. For the purpose of tracking data lineage, you ideally want to cover the full mesh. Event streaming is a key technology for implementing this in practice, because it allows you to track data in motion all the way from its origins, through its intermediate steps, to its final destination.
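One way to picture mesh-wide lineage is as a graph of streams, where each processing step records which stream it read from and which it wrote to. The sketch below is an illustration under that assumption only; `LineageGraph` and the stream names are hypothetical, not part of any particular product.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical lineage tracker: records which streams feed which."""

    def __init__(self):
        # Maps each destination stream to the set of streams it is derived from.
        self._parents = defaultdict(set)

    def record(self, source, destination):
        """Register that `destination` is derived from `source`."""
        self._parents[destination].add(source)

    def upstream(self, stream):
        """All streams that feed into `stream`, directly or indirectly."""
        seen = set()
        stack = [stream]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def origins(self, stream):
        """Upstream streams that have no recorded parents of their own."""
        return {s for s in self.upstream(stream) if not self._parents.get(s)}


lineage = LineageGraph()
lineage.record("orders.raw", "orders.validated")
lineage.record("customers.raw", "customers.clean")
lineage.record("orders.validated", "orders.enriched")
lineage.record("customers.clean", "orders.enriched")
lineage.record("orders.enriched", "analytics.revenue")

all_upstream = lineage.upstream("analytics.revenue")
sources = lineage.origins("analytics.revenue")
```

Here the final `analytics.revenue` stream can be traced back through its intermediate steps to the two raw streams it originates from, even though those streams belong to different domains.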