From instant fraud detection and predictive analytics to powerful customer experiences, data streaming systems have countless benefits, especially at large scale. Unfortunately, managing data at scale comes with many challenges. As the amount of data and the number of streams grow, the complexity of managing the system becomes prohibitive.
However, it’s not just the number of streams that becomes a problem. As more data becomes available, more teams will want access to it. But with such a large number of streams, it becomes difficult for those teams to find what they need. There is plenty of data available, but that doesn’t necessarily mean that it is immediately useful.
Furthermore, the data in our streams is in constant flux. As new business requirements are discovered, the data in those streams needs to evolve in order to support the changes. We need to ensure that the integrity of the streams is maintained even as those evolutions occur.
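To make that kind of evolution concrete, here is a minimal, hypothetical sketch in plain Python (not a Confluent API, and the field names are invented): a new field is added to the schema with a default value, so records written before the change, and consumers built against the old schema, both keep working.

```python
# Hypothetical sketch: a backward-compatible schema evolution.
# Old records carry "id" and "amount"; the evolved schema adds a
# "currency" field with a default, so older records remain readable.

OLD_SCHEMA = {"id": None, "amount": None}
NEW_SCHEMA = {"id": None, "amount": None, "currency": "USD"}

def read_with_schema(record: dict, schema: dict) -> dict:
    """Fill in fields missing from the record using the schema's defaults."""
    result = dict(schema)
    result.update(record)
    missing = [field for field, value in result.items() if value is None]
    if missing:
        raise ValueError(f"record missing required fields: {missing}")
    return result

# A record written before "currency" existed still deserializes under
# the new schema, because the added field has a default.
old_record = {"id": 1, "amount": 9.99}
print(read_with_schema(old_record, NEW_SCHEMA))
# {'id': 1, 'amount': 9.99, 'currency': 'USD'}
```

Changes that break this property, such as adding a required field with no default, are exactly what a governance layer is there to catch before they reach consumers.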
And of course, the biggest challenge of all is maintaining all of this functionality while ensuring that the data is only accessible to authorized users.
The solution to these problems is known as Data Governance.
Data governance is a collection of standards, processes, roles, and metrics that ensure that data is usable, accessible, and effective. Through data governance, organizations can make use of their data in an efficient, secure way.
In short, data governance was established to help teams manage this complexity when dealing with data-at-rest. But in a modern, real-time system that empowers data-in-motion, we need to view the problem through the lens of streaming data. This is known as Stream Governance.
Data governance is defined by four main principles: availability, usability, integrity, and security.
The tools that we use must support these four principles. We need to be able to guarantee that our data is accurate and available when we need it, but also that it is protected and only accessible to authorized users.
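As a toy illustration of that last requirement, here is a hypothetical access-control sketch in plain Python (the stream names and principals are invented, and this is not how any particular system implements it): a principal may read a stream only if it holds an explicit grant.

```python
# Hypothetical sketch: per-stream read authorization.
# GRANTS maps a stream name to the set of principals allowed to
# consume it; anything not explicitly listed is denied.

GRANTS = {
    "orders": {"analytics-team", "billing-service"},
    "payments": {"billing-service"},
}

def can_read(principal: str, stream: str) -> bool:
    """Return True only if the principal has an explicit grant."""
    return principal in GRANTS.get(stream, set())

print(can_read("analytics-team", "orders"))    # True
print(can_read("analytics-team", "payments"))  # False
```

Real systems layer roles, wildcards, and auditing on top of this idea, but the deny-by-default shape is the same.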
Confluent Cloud provides a set of stream governance tools to support these principles using three main pillars: Stream Catalog, Stream Lineage, and Stream Quality.
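To suggest what a quality check might look like in practice, here is a minimal, hypothetical validator in plain Python (not Confluent's actual tooling; the schema and field names are illustrative) that flags malformed records before they enter a stream:

```python
# Hypothetical sketch: validate records against a simple declared
# schema (field name -> expected type) before producing them.

SCHEMA = {"order_id": int, "customer": str, "total": float}

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, expected in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems

print(validate({"order_id": 42, "customer": "acme", "total": 99.5}))  # []
print(validate({"order_id": "42", "customer": "acme"}))
# ['order_id should be int', 'missing field: total']
```

Pushing checks like this to the point of production, rather than cleaning data after the fact, is what keeps downstream consumers trustworthy.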
When our data is organized according to good governance principles, and we leverage tools such as those provided by Confluent Cloud, we can build streaming applications that are robust, scalable, and secure.
In this course, we will learn the key elements of streaming data governance and how it can help you build robust, scalable applications. We will look at the specific tools provided by Confluent Cloud and see how they support each of the principles of data governance. Finally, you will have an opportunity to build a small pipeline of data streams and use the governance tools to explore them.
Wade has been a Software Developer since 2005. He has worked on video games, backend microservices, ETL Pipelines, IoT systems, and more. He is an advocate for Test-Driven Development, Domain-Driven Design, Microservice Architecture, and Event-Driven Systems. Today, Wade works as a Staff Software Practice Lead at Confluent where he shows people how to build modern data streaming applications.