Wade Waldron

Staff Software Practice Lead

Stream Quality

Overview

Large systems of streaming data involve multiple teams working on different components. To ensure effective communication between these components, we need an API for our streams, which often takes the form of a schema. However, if we want teams to be able to use each other's APIs, we need a central authority where those APIs can be managed and explored. In this video, we will explore the challenge of using APIs across multiple teams and introduce the Confluent Schema Registry, which can be used to manage and enforce schemas.

Topics:

  • Cross-Team Compatibility
  • Stream APIs
  • Schemas
  • Confluent Schema Registry

Code

Customer Record: Before

{
	"name":"John Smith",
	"phone":"(123)456-7890"
}

Customer Record: After

{
	"name":"John Smith",
	"phone":[
		{
			"type":"mobile",
			"number":"(123)456-7890"
		},
		{
			"type":"work",
			"number":"(123)987-6543"
		}
	]
}
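The transcript below describes how changing `phone` from a single string to an array of objects can break a downstream consumer. A minimal sketch of a defensive consumer that accepts both shapes is shown here; the `normalize_phones` function and the `"unknown"` type label are illustrative, not part of the course material.

```python
import json

def normalize_phones(record):
    """Return the customer's phone numbers as a list of dicts,
    whether the record uses the old string format or the new array format."""
    phone = record.get("phone")
    if phone is None:
        return []
    if isinstance(phone, str):
        # Old format: a single phone number string; wrap it with a placeholder type.
        return [{"type": "unknown", "number": phone}]
    return phone  # New format: already a list of {"type", "number"} objects

old = json.loads('{"name":"John Smith","phone":"(123)456-7890"}')
new = json.loads('{"name":"John Smith","phone":['
                 '{"type":"mobile","number":"(123)456-7890"},'
                 '{"type":"work","number":"(123)987-6543"}]}')

print(normalize_phones(old))
print(normalize_phones(new))
```

This kind of shim only papers over the problem for one consumer; a shared schema with enforced compatibility, as discussed below, removes the need for it.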

Customer Schema

{
	"name":"Customer",
	"type":"record",
	"fields":[
	  { "name":"name", "type":"string" },
	  {
	     "name":"phone",
	     "type":{
	        "type":"array",
	        "items":{
	           "type":"record",
	           "name":"PhoneNumber",
	           "fields":[
	              { "name":"type", "type":"string" },
	              { "name":"number", "type":"string" }
	           ]
	        }
	     }
	  }
	]
}
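A schema like the one above can be published to the Confluent Schema Registry over its REST API. The sketch below assumes a registry running at `localhost:8081` and a subject named `customers-value` (both assumptions); the schema payload is trimmed to a single field here for readability.

```shell
# Register a (trimmed) Customer schema under the subject "customers-value".
# The schema itself is passed as an escaped JSON string in the request body.
curl -s -X POST http://localhost:8081/subjects/customers-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}"}'

# Ask the registry whether a proposed new schema is compatible
# with the latest registered version before producing with it.
curl -s -X POST http://localhost:8081/compatibility/subjects/customers-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}"}'
```

In practice, clients such as the Confluent serializers perform these registration and compatibility calls automatically.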

Resources

Use the promo code GOVERNINGSTREAMS101 to get $25 of free Confluent Cloud usage


Stream Quality

Imagine we have built a large-scale system of data streams. Some of those streams might be managed by our team; however, many of them will be managed by someone else. Those other teams may have very different standards and processes. If we want to make use of the data coming from those teams, we need to establish a standard way to communicate. Otherwise, it becomes difficult to guarantee the quality of the data we are receiving.

Let's consider a concrete example. Our team is responsible for producing data about online orders. As part of that process, we sometimes want data about the customer placing the order. For example, we may want to display the data on an invoice. However, the customer data is managed by another team. That team produces data streams that contain the data we need. Unfortunately, those streams change constantly. Just last week, they modified the customer record to include an array of phone number objects rather than a single string. If it wasn't done carefully, that change could break our downstream consumer, because we process the phone number as a single string rather than as an array. These are the types of problems we want to avoid.

What we need is a standard contract both teams can agree on. This contract typically takes the form of a schema. If the producer needs to make a change to the schema, they must ensure the change is compatible with downstream consumers. At the same time, they need to advertise those changes so that consumers know what the new format will look like. We can use message formats such as Avro or Protobuf, both of which include the ability to define a schema. JSON doesn't have schemas built in, but the separate JSON Schema specification fills that gap. We can then build a schema that outlines the exact details of what a message should look like. However, by itself, that's not enough.
Once we establish the schema and share it with the individual producers and consumers, what is to stop them from making changes without letting everyone know? To ensure the schemas don't drift, we need a central authority to manage them.

This central authority can help us build trust in the schemas. Used properly, it can guarantee that the schemas remain compatible, even as they evolve. We'll no longer need to worry that someone might change the schema without us knowing.

This is where the Confluent Schema Registry comes in. It provides tools to manage and enforce schemas. This allows us to advertise the schema used for a stream, but also to enforce compatibility rules as it evolves. It can also be linked to schema validation to guarantee that all published data matches the schema. By leveraging the Schema Registry, we can build trust in our data streams and feel confident that they meet the quality standards we desire.

If you aren't already on Confluent Developer, head there now, using the link in the video description, to access the rest of this course and its hands-on exercises.
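The transcript mentions linking the registry to schema validation so that published data always matches the schema. The following is a hand-rolled sketch of what such a check does for the Customer schema above; a real pipeline would use an Avro library together with Schema Registry validation, not this illustrative function.

```python
def validate_customer(record):
    """Check that a message matches the Customer schema's shape:
    a string "name" and a "phone" array of {"type", "number"} string pairs.
    Illustration only; real validation is done by an Avro library."""
    if not isinstance(record.get("name"), str):
        return False
    phones = record.get("phone")
    if not isinstance(phones, list):
        return False
    return all(
        isinstance(p, dict)
        and isinstance(p.get("type"), str)
        and isinstance(p.get("number"), str)
        for p in phones
    )

# A record matching the new schema passes; the old single-string format does not.
new_record = {"name": "John Smith",
              "phone": [{"type": "mobile", "number": "(123)456-7890"}]}
old_record = {"name": "John Smith", "phone": "(123)456-7890"}

print(validate_customer(new_record))  # True
print(validate_customer(old_record))  # False
```

With validation enforced at publish time, a producer cannot silently ship data in the old format once the schema has evolved.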