From instant fraud detection and predictive analytics to powerful customer experiences, data streaming systems have countless benefits, especially at large scale. Unfortunately, managing data at scale comes with many challenges. As the amount of data and the number of streams grow, the complexity of managing the system becomes prohibitive.
However, it’s not just the number of streams that becomes a problem. As more data becomes available, more teams will want access to it. But with such a large number of streams, it becomes difficult for those teams to find what they need. There is plenty of data available, but that doesn’t necessarily mean that it is immediately useful.
Furthermore, the data in our streams is in constant flux. As new business requirements are discovered, the data in those streams needs to evolve in order to support the changes. We need to ensure that the integrity of the streams is maintained even as those evolutions occur.
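To make that kind of evolution concrete, here is a minimal, hypothetical sketch in plain Python (not a Confluent API, and the field names are invented): a new field is added to the schema with a default value, so records written before the change, and consumers built against the old schema, both keep working.

```python
# Hypothetical sketch: a backward-compatible schema evolution.
# Old records carry "id" and "amount"; the evolved schema adds a
# "currency" field with a default, so older records remain readable.

OLD_SCHEMA = {"id": None, "amount": None}
NEW_SCHEMA = {"id": None, "amount": None, "currency": "USD"}

def read_with_schema(record: dict, schema: dict) -> dict:
    """Fill in fields missing from the record using the schema's defaults."""
    result = dict(schema)
    result.update(record)
    missing = [field for field, value in result.items() if value is None]
    if missing:
        raise ValueError(f"record missing required fields: {missing}")
    return result

# A record written before "currency" existed still deserializes under
# the new schema, because the added field has a default.
old_record = {"id": 1, "amount": 9.99}
print(read_with_schema(old_record, NEW_SCHEMA))
# {'id': 1, 'amount': 9.99, 'currency': 'USD'}
```

Changes that break this property, such as adding a required field with no default, are exactly what a governance layer is there to catch before they reach consumers.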
And of course, the biggest challenge of all is maintaining all of this functionality while ensuring that the data is only accessible to authorized users.
The solution to these problems is known as Data Governance.
Data governance is a collection of standards, processes, roles, and metrics that ensure that data is usable, accessible, and effective. Through data governance, organizations can make use of their data in an efficient, secure way.
In short, data governance was established to help teams manage this complexity when dealing with data-at-rest. But in a modern, real-time system that empowers data-in-motion, we need to view the problem through the lens of streaming data. This is known as Stream Governance.
Data governance is defined by four main principles: availability, usability, integrity, and security.
The tools that we use must support these four principles. We need to be able to guarantee that our data is accurate and available when we need it, but also that it is protected and only accessible to authorized users.
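As a toy illustration of that last requirement, here is a hypothetical access-control sketch in plain Python (the stream names and principals are invented, and this is not how any particular system implements it): a principal may read a stream only if it holds an explicit grant.

```python
# Hypothetical sketch: per-stream read authorization.
# GRANTS maps a stream name to the set of principals allowed to
# consume it; anything not explicitly listed is denied.

GRANTS = {
    "orders": {"analytics-team", "billing-service"},
    "payments": {"billing-service"},
}

def can_read(principal: str, stream: str) -> bool:
    """Return True only if the principal has an explicit grant."""
    return principal in GRANTS.get(stream, set())

print(can_read("analytics-team", "orders"))    # True
print(can_read("analytics-team", "payments"))  # False
```

Real systems layer roles, wildcards, and auditing on top of this idea, but the deny-by-default shape is the same.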
Confluent Cloud provides a set of stream governance tools to support these principles using three main pillars: Stream Catalog, Stream Lineage, and Stream Quality.
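To suggest what a quality check might look like in practice, here is a minimal, hypothetical validator in plain Python (not Confluent's actual tooling; the schema and field names are illustrative) that flags malformed records before they enter a stream:

```python
# Hypothetical sketch: validate records against a simple declared
# schema (field name -> expected type) before producing them.

SCHEMA = {"order_id": int, "customer": str, "total": float}

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, expected in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems

print(validate({"order_id": 42, "customer": "acme", "total": 99.5}))  # []
print(validate({"order_id": "42", "customer": "acme"}))
# ['order_id should be int', 'missing field: total']
```

Pushing checks like this to the point of production, rather than cleaning data after the fact, is what keeps downstream consumers trustworthy.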
When our data is organized according to good governance principles, and we leverage tools such as those provided by Confluent Cloud, we can build streaming applications that are robust, scalable, and secure.
In this course, we will learn the key elements of streaming data governance and how it can help you build robust, scalable applications. We will look at the specific tools provided by Confluent Cloud and see how they support each of the principles of data governance. Finally, you will have an opportunity to build a small pipeline of data streams and use the governance tools to explore them.
Wade has been a Software Developer since 2005. He has worked on video games, backend microservices, ETL Pipelines, IoT systems, and more. He is an advocate for Test-Driven Development, Domain-Driven Design, Microservice Architecture, and Event-Driven Systems. Today, Wade works as a Staff Software Practice Lead at Confluent where he shows people how to build modern data streaming applications.