Data Governance and Federation

Federated data governance ensures that autonomous, distributed teams can own the data within their domain while meshing with others for a true data mesh.

5 min
course: Data Mesh 101

Tim Berglund

VP Developer Relations

Ben Stopford

Lead Technologist, Office of the CTO (Author)

Michael Noll

Principal Technologist (Author)

Data Governed Wherever It Is

As with the first principle (data ownership by domain), federated governance is largely an organizational concern. The objective is to ensure that the independent, autonomous teams that own the data products in the mesh can actually work together. You want the various constituents of the mesh to actually "mesh," so that you can generate real network effects and create value.

This mostly comes down to creating global standards and applying them across the full mesh. So although you started with decentralization in the first and second principles, you now need to strike the right balance between decentralization and centralization in order to tie everything together. This isn't easy, and in practice the execution will likely differ from organization to organization.

Example Standard: Data Contracts, Schemas, and Schema Registries

One important task is to ensure that the same data or data attribute is represented identically across domains. Otherwise, you can't join or correlate data (for example, customer data) across the company's domains, because each domain has a different view of the customer. Data contracts, schemas, and schema registries help you implement and enforce these standards for the mesh.
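
To make the idea of an enforced standard more concrete, here is a minimal sketch of a schema registry that rejects incompatible changes to a data contract. Everything in it is an assumption made for illustration: `Field`, `InMemorySchemaRegistry`, and the simplified backward-compatibility rule (added fields need a default, existing fields keep their type) are hypothetical and are not the API of any real schema registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One attribute in a data contract (hypothetical, simplified model)."""
    name: str
    type: str
    has_default: bool = False


class IncompatibleSchemaError(Exception):
    """Raised when a new schema version would break existing readers."""


def is_backward_compatible(old, new):
    """True if a reader using `new` can still read records written with `old`."""
    old_by_name = {f.name: f for f in old}
    for f in new:
        previous = old_by_name.get(f.name)
        if previous is None:
            # Added field: old records lack it, so the reader needs a default.
            if not f.has_default:
                return False
        elif previous.type != f.type:
            # Existing field changed type: old records can no longer be read.
            return False
    return True


class InMemorySchemaRegistry:
    """Toy registry: keeps every accepted schema version per subject."""

    def __init__(self):
        self._versions = {}

    def register(self, subject, schema):
        """Store `schema` as the next version of `subject`; return its number."""
        versions = self._versions.setdefault(subject, [])
        if versions and not is_backward_compatible(versions[-1], schema):
            raise IncompatibleSchemaError(
                f"{subject}: incompatible with version {len(versions)}"
            )
        versions.append(list(schema))
        return len(versions)


registry = InMemorySchemaRegistry()
customer_v1 = [Field("customer_id", "string"), Field("email", "string")]
customer_v2 = customer_v1 + [Field("country", "string", has_default=True)]
customer_bad = [Field("customer_id", "int"), Field("email", "string")]

print(registry.register("customer", customer_v1))  # accepted as version 1
print(registry.register("customer", customer_v2))  # accepted as version 2
try:
    registry.register("customer", customer_bad)
except IncompatibleSchemaError as err:
    print("rejected:", err)
```

The point of the sketch is the division of labor: each domain still evolves its own schema, but a shared check decides whether a change is safe for consumers in other domains.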

Example Standard: Error Detection and Recovery

Another issue is how data products should detect and recover from errors. Here, you can use common strategies such as application logs, data profiling, and data lineage to unearth errors in the mesh. Streams are very useful in this context because they capture live and historical data as a sequence of events, which makes it much easier to identify cause-and-effect relationships (particularly if you can join and correlate streams from different domains). Streams also let you recover from and fix errors by replaying and reprocessing historical data.
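
The replay idea can be sketched in a few lines. The example below assumes a hypothetical in-memory list, `ORDER_EVENTS`, standing in for a durable, ordered stream, and it uses no real streaming client. It profiles the stream to detect a duplicate delivery, then rebuilds the derived totals by reprocessing the full history with corrected logic.

```python
from collections import defaultdict

# Hypothetical stand-in for a durable, ordered stream of order events.
ORDER_EVENTS = [
    {"offset": 0, "order_id": "o1", "customer": "alice", "amount_cents": 1200},
    {"offset": 1, "order_id": "o2", "customer": "bob", "amount_cents": 500},
    # The same order delivered a second time:
    {"offset": 2, "order_id": "o1", "customer": "alice", "amount_cents": 1200},
    {"offset": 3, "order_id": "o3", "customer": "alice", "amount_cents": 700},
]


def find_duplicate_offsets(events):
    """Detection: report offsets whose order_id was already seen earlier."""
    seen = set()
    duplicates = []
    for event in events:
        if event["order_id"] in seen:
            duplicates.append(event["offset"])
        seen.add(event["order_id"])
    return duplicates


def totals_v1(events):
    """Original logic: sums every event, so duplicates inflate the totals."""
    totals = defaultdict(int)
    for event in events:
        totals[event["customer"]] += event["amount_cents"]
    return dict(totals)


def totals_v2(events):
    """Fixed logic: counts each order_id only once."""
    totals = defaultdict(int)
    seen = set()
    for event in events:
        if event["order_id"] in seen:
            continue
        seen.add(event["order_id"])
        totals[event["customer"]] += event["amount_cents"]
    return dict(totals)


def replay(events, processor, from_offset=0):
    """Recovery: rebuild derived state by reprocessing history from an offset."""
    return processor([e for e in events if e["offset"] >= from_offset])


print(find_duplicate_offsets(ORDER_EVENTS))  # offsets of suspicious events
print(replay(ORDER_EVENTS, totals_v1))       # inflated totals
print(replay(ORDER_EVENTS, totals_v2))       # corrected totals after replay
```

Because the events themselves are never modified, the fix is applied to the processing logic, and the corrected state is derived again from the same history.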

Example Standard: Tracking Data Lineage

A data mesh contains many streams, and they may span systems, technology platforms, data centers, clouds, and so on. For the purpose of tracking data lineage, you ideally want to cover the full mesh. Event streaming is a key technology for implementing this in practice, because it lets you track data in motion all the way from its origin, through its intermediate steps, to its final destination.
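
One way to picture lineage for data in motion is to let each event carry the list of hops it has passed through. The sketch below is an illustration under stated assumptions, not a description of any particular lineage tool: `Event`, `publish`, and `describe` are hypothetical names, and the domain and stream names are invented. In a real system, this kind of metadata might travel alongside the payload, for example in record headers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A record plus the ordered hops it has passed through (hypothetical)."""
    payload: dict
    lineage: tuple = ()


def publish(event, domain, stream, payload=None):
    """Return the event as it would appear on `stream`, with one more hop."""
    return Event(
        payload=event.payload if payload is None else payload,
        lineage=event.lineage + ((domain, stream),),
    )


def describe(event):
    """Render the lineage from origin to final destination."""
    return " -> ".join(f"{domain}:{stream}" for domain, stream in event.lineage)


raw = publish(Event({"order_id": "o1", "amount_cents": 1200}), "orders", "orders.raw")
validated = publish(raw, "orders", "orders.validated")
report = publish(validated, "finance", "revenue.daily", payload={"total_cents": 1200})

print(describe(report))
# orders:orders.raw -> orders:orders.validated -> finance:revenue.daily
```

Each step adds its own hop without rewriting earlier ones, so the final event records its origin, its intermediate steps, and its destination, even when those steps belong to different domains.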

Recommendations for Federated Governance

  • You need the right mindset: Be pragmatic, and don't expect the governance system to be perfect. It only needs to be "sufficient" (and what counts as sufficient is often decided by legal counsel). There will always be parts of a data architecture that aren't covered or that were changed only recently.
  • Governance is more of an ongoing organizational process than a technology.
  • There are things that you will need to centralize, but beware of centralized data models, which can become too slow to change. Use processes and tooling such as GitHub to collaborate and change quickly.
