Data Governance and Federation

Federated data governance ensures that autonomous, distributed teams can own the data within their domain while meshing with others for a true data mesh.

5 min
course: Data Mesh 101

Data Governed Wherever It Is

Tim Berglund

VP Developer Relations

Ben Stopford

Lead Technologist, Office of the CTO (Author)

Michael Noll

Principal Technologist (Author)

Data Governed Wherever It Is

As with the first principle, data ownership by domain, federated governance is largely an organizational concern. The objective is to ensure that the independent and autonomous teams that own the data products within the mesh can work together. You want to make sure that the various constituents of the mesh actually "mesh," so that you can generate real network effects and create value.

Predominantly, this has to do with creating global standards and applying those across the full mesh. So although you started with decentralization in the first and second principles, you now want to strike the right balance between decentralization and centralization in order to tie everything together. This isn't easy, and in practice, the execution will likely differ from organization to organization.

Example Standard: Data Contracts, Schemas, and Schema Registries

One important task is to ensure that the same data or data attribute is represented identically across domains. Otherwise, you couldn't join or correlate data (for example, customer data) across the company's domains, because each domain would have its own view of the customer. Data contracts, schemas, and schema registries will help you to implement and enforce the standards for the mesh.
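
To make the idea concrete, here is a minimal Python sketch of a shared customer contract and a toy in-memory registry. Everything in it is an assumption for illustration: the field names, the `violations` helper, the `ToyRegistry` class, and its compatibility rule (existing fields must keep their name and type) are hypothetical and do not reflect the API of any real schema registry.

```python
from dataclasses import dataclass, field

# Hypothetical shared contract for customer data: field name -> required type.
# This is an illustrative format, not a real registry schema language.
CUSTOMER_CONTRACT_V1 = {"customer_id": str, "email": str}


def violations(record: dict, contract: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    problems = []
    for name, expected in contract.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return problems


@dataclass
class ToyRegistry:
    """In-memory stand-in for a schema registry (illustrative only)."""
    versions: dict = field(default_factory=dict)  # subject -> list of contracts

    def register(self, subject: str, contract: dict) -> int:
        history = self.versions.setdefault(subject, [])
        if history:
            latest = history[-1]
            # Toy compatibility rule: existing fields must keep name and type.
            for name, expected in latest.items():
                if contract.get(name) is not expected:
                    raise ValueError(f"incompatible change to field: {name}")
        history.append(dict(contract))
        return len(history)  # 1-based version number


registry = ToyRegistry()
registry.register("customer", CUSTOMER_CONTRACT_V1)

# Two domains describing the same customer; the second breaks the contract.
orders_record = {"customer_id": "c-42", "email": "a@example.com"}
shipments_record = {"customer_id": 42, "email": "a@example.com"}

print(violations(orders_record, CUSTOMER_CONTRACT_V1))
print(violations(shipments_record, CUSTOMER_CONTRACT_V1))
```

In this sketch, the orders record passes while the shipments record is flagged because it encodes `customer_id` as an integer, which is exactly the kind of mismatch that would prevent a join across domains.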

Example Standard: Error Detection and Recovery

Another issue is how data products should detect and recover from errors. Here, you can use common strategies such as application logs, data profiling, and data lineage to unearth errors in the mesh. Streams are very useful in this context, because they capture live and historical data as a sequence of events, which lets you identify cause-and-effect relationships much more easily (particularly if you can join and correlate streams from different domains). Streams also let you recover from and fix errors by replaying and reprocessing historical data.
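
The replay idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the event log is a plain list standing in for a durable stream, and the event fields, the two processors, and the `replay` helper are all hypothetical names chosen for illustration.

```python
# Append-only log of order events. In a real mesh this would be a durable
# stream; the event shape and field names are assumptions for this sketch.
EVENT_LOG = [
    {"offset": 0, "order_id": "o-1", "amount_cents": 1250},
    {"offset": 1, "order_id": "o-2", "amount_cents": -300},  # a refund
    {"offset": 2, "order_id": "o-3", "amount_cents": 900},
]


def buggy_total(events):
    """Original processor: wrongly ignores refunds (negative amounts)."""
    return sum(e["amount_cents"] for e in events if e["amount_cents"] > 0)


def fixed_total(events):
    """Corrected processor: refunds reduce the total."""
    return sum(e["amount_cents"] for e in events)


def replay(log, processor, from_offset=0):
    """Rebuild derived state by reprocessing the log from a chosen offset."""
    return processor([e for e in log if e["offset"] >= from_offset])


bad_state = replay(EVENT_LOG, buggy_total)   # state produced by the bug
good_state = replay(EVENT_LOG, fixed_total)  # state after fix and replay
print(bad_state, good_state)
```

Because the log still holds every historical event, the corrected processor can rebuild the derived total from offset 0 without asking upstream domains to resend anything.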

Example Standard: Tracking Data Lineage

There are a lot of streams in a data mesh, and they may span systems, technology platforms, data centers, clouds, and so on. For the purpose of tracking data lineage, you ideally want to cover the full mesh. Event streaming is a key technology for implementing this in practice, because it allows you to track data in motion all the way from its origins, through its intermediate steps, to its final destination.
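
One simple way to picture lineage tracking is to let each event carry a record of the steps it has passed through. The Python sketch below assumes exactly that; the `lineage` field, the `with_hop` helper, and the step names such as `orders.ingest` are hypothetical, and real lineage tooling typically captures this metadata outside the event payload.

```python
import copy


def with_hop(event: dict, step: str) -> dict:
    """Return a copy of the event with one more lineage hop recorded."""
    out = copy.deepcopy(event)
    out.setdefault("lineage", []).append(step)
    return out


def ingest(raw: dict) -> dict:
    """Origin: wrap raw data produced by the (hypothetical) orders domain."""
    return with_hop({"payload": raw}, "orders.ingest")


def enrich(event: dict) -> dict:
    """Intermediate step: add a placeholder attribute from another domain."""
    out = with_hop(event, "customers.enrich")
    out["payload"]["tier"] = "gold"  # placeholder enrichment value
    return out


def publish(event: dict) -> dict:
    """Final destination: hand the event to a downstream consumer."""
    return with_hop(event, "analytics.publish")


final = publish(enrich(ingest({"order_id": "o-1"})))
print(" -> ".join(final["lineage"]))
```

Reading `final["lineage"]` at the destination gives the full path from origin through the intermediate step, which is the property you want to hold across the whole mesh.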

Recommendations for Federated Governance

  • You need the right mindset: Be pragmatic, and don't expect the governance system to be perfect. It only needs to be "sufficient" (a word whose definition is often supplied by legal counsel). There will always be parts of a data architecture that aren't covered or that were just recently changed.
  • Governance is more of an ongoing organizational process than a technology.
  • There are things that you will need to centralize, but beware of centralized data models, which can become too slow to change. Use processes and tooling such as GitHub to collaborate and change quickly.

Data Governed Wherever It Is

I'm Tim Berglund with Confluent. Welcome to Data Mesh, module five, "Data Governed Wherever It Is." So let's talk about this fourth and final principle of a data mesh: federated governance of your data, that is, data governed wherever that data happens to be. Like principle one, decentralization, this is largely an organizational concern. The objective here is to ensure that these independent and autonomous teams, owning all these data products within the mesh, can actually work together. That is, we wanna make sure that the various constituents of the mesh actually mesh, so that we generate real network effects, creating value maybe in proportion to the number of nodes in the mesh raised to some exponent greater than one, for all this data we wanna share across the organization.

Now, predominantly this is about establishing global standards, like data governance, and applying those across the full mesh. So even though we started with decentralization in principles one and two, we now want to strike the right balance between decentralization and centralization to tie everything together. This isn't easy, and in practice the execution will likely differ from organization to organization, company to company.

So let's look at some concrete examples, and you can start to apply this to your situation. One important task is to ensure that the same data (that's a difficult concept in reality, I know), or the same data attribute, is represented identically across different domains. Otherwise we can't join or correlate, as examples of two things we might wanna do, data about our customers across various domains in the company, like inventory, orders, and shipments, if everybody's got a different view of the customer. Now, a comprehensive solution to that problem is beyond our scope here.
But what we can say is that once you have them, data contracts, schemas, and schema registries will help you to implement and enforce standards like this throughout the data mesh.

Another example is how data products should detect and recover from errors. Here, we can use common strategies like application logs, data profiling, and data lineage to unearth errors in the mesh. Streams are very useful here because they capture live and historical data in a sequence of events, which lets you identify cause-and-effect relationships much more easily, particularly if you can join and correlate streams from different domains, as discussed on the previous slide. Also, streams let you recover from and fix errors by replaying and reprocessing historical data, since by definition they've got that historical data lying around in them, since they're streams.

Last example: we can see again that there are a lot of data streams within a data mesh. All right? These data streams may span systems, technology platforms, data centers, clouds, you know, historical bits of the company that have been merged and acquired; that's a typical reality. For the purpose of tracking data lineage, we ideally wanna cover the full mesh, because we have to follow the data; otherwise it's not useful lineage. Event streaming is again a key technology to implement this in practice, because it lets you track data in motion all the way from its origins, through intermediate steps, to its final destinations.

Before we wrap up, let's talk briefly about some further considerations for federated governance. In my opinion, the most important recommendation is having the right mindset. Be pragmatic. Don't expect a governance system to be perfect. It only needs to be sufficient. And often the definition of sufficient is answered by legal counsel, right? So there'll always be parts of a data architecture that aren't covered or that were just recently changed, anything like that.
So governance is really more of a process, an ongoing, evolving process, than a particular technology or point-in-time solution.

And finally, we have talked a lot in this course about decentralization versus centralization; governance is on the centralized side of things. It has to be. I mean, fundamentally, in the real world it is. So, while there are things you need to centralize, beware of the temptation to centralize data models, okay? Be careful of that siren song; it really is contrary to the spirit of data mesh. Those models will eventually become too slow to change and outside the control of the people who know them best.

So this wraps up our fourth principle, federated governance. In the next module, we'll dive a little bit into how to implement a data mesh in practice.