Wade Waldron

Staff Software Practice Lead

Streaming Data Governance

About This Course

From instant fraud detection and predictive analytics to powerful customer experiences, data streaming systems have countless benefits, especially at large scale. Unfortunately, managing data at scale comes with many challenges. As the amount of data and the number of streams grow, the complexity of managing the system becomes prohibitive.

However, it’s not just the number of streams that becomes a problem. As more data becomes available, more teams will want access to it. But with such a large number of streams, it becomes difficult for those teams to find what they need. There is plenty of data available, but that doesn’t necessarily mean that it is immediately useful.

Furthermore, the data in our streams is in constant flux. As new business requirements are discovered, the data in those streams needs to evolve in order to support the changes. We need to ensure that the integrity of the streams is maintained even as those evolutions occur.

And of course, the biggest challenge of all is maintaining all of this functionality while ensuring that the data is only accessible to authorized users.

The solution to these problems is known as Data Governance.

What is Data Governance?

Data governance is a collection of standards, processes, roles, and metrics that ensure that data is usable, accessible, and effective. Through data governance, organizations can make use of their data in an efficient, secure way.

Data governance was established to help teams manage this complexity when dealing with data-at-rest. But in a modern, real-time system built on data-in-motion, we need to view the problem through the lens of streaming data. This approach is known as Stream Governance.

The 4 Principles of Streaming Data Governance

Data governance is defined by four main principles. They are:

  • Availability
  • Usability
  • Data Integrity
  • Security

The tools that we use must support these four principles. We need to be able to guarantee that our data is accurate and available when we need it, but also that it is protected and only accessible to authorized users.

Confluent Cloud provides a set of stream governance tools to support these principles using three main pillars. They are:

  • Stream Quality
  • Stream Catalog
  • Stream Lineage

Benefits of Streaming Data Governance

Organizing our data according to good governance principles, and leveraging tools such as those provided by Confluent Cloud, allows us to:

  • Deliver trusted, high-quality data streams to the business
  • Increase collaboration and productivity with self-service data discovery
  • Understand complex data relationships and uncover more insights

What is in the Course?

In this course, we will learn the key elements of streaming data governance and how they can help us build robust, scalable applications. We will look at the specific tools provided by Confluent Cloud and see how they support each of the principles of data governance. Finally, you will have an opportunity to build a small pipeline of data streams and use the governance tools to explore them.

Intended Audience

  • You will have already done some projects using Kafka, but now you are concerned about how the system will operate at scale.
  • You are familiar with the following concepts:
    • Topics
    • Brokers
    • Producers
    • Consumers
    • Schemas
  • A DevOps mindset is an asset.

Course Outline

  • This course will cover:
    • Streaming at Scale
    • Stream Governance
    • Stream Quality
    • Schema Registry
    • Stream Discoverability
    • Stream Catalog
    • Visualizing Streams
    • Stream Lineage
    • Closing Remarks

Prerequisites

  • Required Knowledge
    • Basic Kafka
    • Java development experience is an asset, but not required.
  • Required Setup
    • A local development environment including:
      • JDK
      • Maven

Length

  • Approximately 2-3 hours

Staff

Wade Waldron (Course Author)

Wade has been a Software Developer since 2005. He has worked on video games, backend microservices, ETL pipelines, IoT systems, and more. He is an advocate for Test-Driven Development, Domain-Driven Design, Microservice Architecture, and Event-Driven Systems. Today, Wade works as a Staff Software Practice Lead at Confluent, where he shows people how to build modern data streaming applications.

Use the promo code GOVERNINGSTREAMS101 to get $25 of free Confluent Cloud usage
