Wade Waldron

Staff Software Practice Lead

Streaming Data Governance

About This Course

From instant fraud detection and predictive analytics to powerful customer experiences, data streaming systems have countless benefits, especially at large scale. Unfortunately, managing data at scale comes with many challenges. As the amount of data and the number of streams grow, the complexity of managing the system becomes prohibitive.

However, it’s not just the number of streams that becomes a problem. As more data becomes available, more teams will want access to it. But with such a large number of streams, it becomes difficult for those teams to find what they need. There is plenty of data available, but that doesn’t necessarily mean that it is immediately useful.

Furthermore, the data in our streams is in constant flux. As new business requirements are discovered, the data in those streams needs to evolve in order to support the changes. We need to ensure that the integrity of the streams is maintained even as those evolutions occur.
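To make the idea of a safe evolution concrete, here is a minimal sketch of one common rule: a new version of a schema can still read older records if every field it adds carries a default value. Everything in it is an assumption made for illustration. The class `SchemaEvolutionSketch`, the method `canReadOldData`, and the field names are hypothetical, and a schema is modeled as a simple map from field name to "has a default". It is not the API of any governance tool, and it ignores type changes and other rules a real compatibility check would apply.

```java
import java.util.Map;

// Illustrative only: a hand-rolled model of schema evolution, not a real library API.
final class SchemaEvolutionSketch {

    // Each schema is modeled as: field name -> whether that field has a default value.
    // Returns true when a reader using newSchema can still read records written
    // with oldSchema, i.e. every field that newSchema adds has a default.
    static boolean canReadOldData(Map<String, Boolean> oldSchema,
                                  Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean existedBefore = oldSchema.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (!existedBefore && !hasDefault) {
                // Old records carry no value for this field and there is nothing to fall back on.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical versions of an "order" record.
        Map<String, Boolean> v1 = Map.of("orderId", false, "amount", false);
        Map<String, Boolean> v2 = Map.of("orderId", false, "amount", false, "currency", true);
        Map<String, Boolean> v3 = Map.of("orderId", false, "amount", false, "region", false);

        System.out.println(canReadOldData(v1, v2)); // added field has a default
        System.out.println(canReadOldData(v1, v3)); // added field has no default
    }
}
```

In this sketch, the second version is accepted because the added field has a default, while the third is rejected because older records would have no value for the new required field.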

And of course, the biggest challenge of all is maintaining all of this functionality while ensuring that the data is only accessible to authorized users.

The solution to these problems is known as Data Governance.

What is Data Governance?

Data governance is a collection of standards, processes, roles, and metrics that ensure that data is usable, accessible, and effective. Through data governance, organizations can make use of their data in an efficient, secure way.

Data governance was originally established to help teams manage this complexity when dealing with data-at-rest. But in a modern, real-time system built on data-in-motion, we need to view the problem through the lens of streaming data. This is known as Stream Governance.

The 4 Principles of Streaming Data Governance

Data governance is defined by four main principles. They are:

  • Availability
  • Usability
  • Data Integrity
  • Security

The tools that we use must support these four principles. We need to be able to guarantee that our data is accurate and available when we need it, but also that it is protected and only accessible to authorized users.

Confluent Cloud provides a set of stream governance tools to support these principles using three main pillars. They are:

  • Stream Quality
  • Stream Catalog
  • Stream Lineage

Benefits of Streaming Data Governance

When our data is organized according to good governance principles, and we leverage tools such as those provided by Confluent Cloud, it allows us to:

  • Deliver trusted, high-quality data streams to the business
  • Increase collaboration and productivity with self-service data discovery
  • Understand complex data relationships and uncover more insights

What is in the Course?

In this course, we will learn the key elements of streaming data governance and how it can help you build robust, scalable applications. We will look at the specific tools provided by Confluent Cloud and see how they support each of the principles of data governance. Finally, you will have an opportunity to build a small pipeline of data streams and use the governance tools to explore them.

Intended Audience

  • You have already built some projects using Kafka, and you are now concerned about how the system will operate at scale.
  • You are familiar with the following concepts:
    • Topics
    • Brokers
    • Producers
    • Consumers
    • Schemas
  • A DevOps mindset is an asset.

Course Outline

  • This course will cover:
    • Streaming at Scale
    • Stream Governance
    • Stream Quality
    • Schema Registry
    • Stream Discoverability
    • Stream Catalog
    • Visualizing Streams
    • Stream Lineage
    • Closing Remarks

Prerequisites

  • Required Knowledge
    • Basic Kafka
    • Java development experience is an asset, but not required.
  • Required Setup
    • A local development environment including:
      • JDK
      • Maven

Length

  • Approximately 2-3 hours

Staff

Wade Waldron (Course Author)

Wade has been a Software Developer since 2005. He has worked on video games, backend microservices, ETL Pipelines, IoT systems, and more. He is an advocate for Test-Driven Development, Domain-Driven Design, Microservice Architecture, and Event-Driven Systems. Today, Wade works as a Staff Software Practice Lead at Confluent where he shows people how to build modern data streaming applications.

Use the promo codes GOVERNINGSTREAMS101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Overview

Hi, I'm Wade from Confluent. I'm here to show you some of the ways you can govern data streams in Confluent Cloud and to teach you why that's important.

When you build your first microservices, things might seem pretty easy. If you're working in a greenfield environment, you don't need to worry about integrating with an existing system. Or, if you're decomposing a monolith, you may only have a few small touchpoints to worry about. In either case, you have a small number of data streams that are easy to manage with a small team. But what happens if you scale from a few streams to hundreds or even thousands? Suddenly, small issues turn into big issues, and you no longer have the resources to manage them.

Throughout this course, we'll be introducing you to stream governance through hands-on exercises that will have you producing and consuming data through Confluent Cloud. If you haven't already signed up for Confluent Cloud, sign up now so when your first exercise asks you to log in, you're ready to do so. Be sure to use the promo code when signing up to get the free usage that it provides.

In this course, we'll talk about the challenges you face as your system scales up. We'll introduce Data Governance and Stream Governance, and see how Confluent Cloud can be used to simplify them. We'll cover how to use the Confluent Schema Registry to ensure high-quality data streams. But quality alone isn't enough. At scale, it is difficult to find and use the data we need. We'll see how Confluent Stream Catalog can help us discover streams and understand how to use them. As more streams are introduced, it becomes difficult to visualize the system. We'll see how the Confluent Stream Lineage provides an overall picture of the streaming system. And finally, we'll briefly touch on the security layers that protect each of these features. So, let's get started.

If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course and its hands-on exercises.