Wade Waldron

Staff Software Practice Lead

Schema Registry

Overview

The Confluent Schema Registry is a tool for building trust in the quality of data streams. Users of a registered schema can feel confident that the data they consume will match the specified format. In this video, we will see how to use the Confluent Schema Registry, including registering, evolving, and validating schemas. We'll show how to do this with both the Confluent Cloud user interface and the command-line interface (CLI).

Topics:

  • Registering Schemas
  • Accepted Schema Formats
  • Schema Evolution
  • Validating Schemas
  • Code Generation
  • Command-Line Interface (CLI)

Code

Register a Schema

confluent schema-registry schema create --schema Customers-value.avsc --subject Customers-value
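For reference, here is a rough sketch of what a Customers-value.avsc file like the one referenced above might contain. The record name, namespace, and fields are purely illustrative and not part of the course material:

{
  "type": "record",
  "name": "Customer",
  "namespace": "examples.customers",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "name", "type": "string" }
  ]
}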

Register a New Schema Version

confluent schema-registry schema create --schema Customers-value.avsc --subject Customers-value
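Note that this is the same command as the initial registration; because it targets the same subject, the schema is added as a new version. As a sketch, a compatible evolution of the illustrative schema above might add an optional field with a default value:

{
  "type": "record",
  "name": "Customer",
  "namespace": "examples.customers",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "email", "type": ["null", "string"], "default": null }
  ]
}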

Download Schemas with Maven

mvn schema-registry:download
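This goal is provided by Confluent's Schema Registry Maven plugin. A minimal, illustrative pom.xml configuration might look like the following; the registry URL, credentials, output directory, and subject pattern are placeholders you would replace with your own values:

<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.5.0</version>
  <configuration>
    <schemaRegistryUrls>
      <!-- Placeholder Schema Registry endpoint -->
      <param>https://psrc-xxxxx.us-east-2.aws.confluent.cloud</param>
    </schemaRegistryUrls>
    <!-- Placeholder Schema Registry API key and secret -->
    <userInfoConfig>SR_API_KEY:SR_API_SECRET</userInfoConfig>
    <!-- Where the downloaded .avsc files are written -->
    <outputDirectory>src/main/avro</outputDirectory>
    <subjectPatterns>
      <param>^Customers-value$</param>
    </subjectPatterns>
  </configuration>
</plugin>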

Resources

Use the promo code GOVERNINGSTREAMS101 to get $25 of free Confluent Cloud usage


Schema Registry

The Confluent Schema Registry is there to ensure the quality of the data in our streams. It provides a centralized repository that establishes rules for how the data should be structured. These rules help build trust in the streams: they allow producers and consumers to share an enforced contract that ensures the data will be readable by anyone who needs it.

The first step to building trust is to register the schemas in the Confluent Schema Registry. This advertises a contract that enforces our quality standards. Both producers and consumers must adhere to the contract, which eliminates surprises. Schemas can be registered under any name; however, if you want them to be associated with a topic, certain conventions must be followed. By default, a schema registered under a name of the form "<topic name>-key" or "<topic name>-value" will be automatically associated with the named topic. The key and value suffixes associate the schema with the corresponding part of the message. Typically, message keys are primitive types, such as strings or integers, while the value is often a complex type. As an example, if we create a schema named "Customers-value", it will be associated with the value of any messages in the "Customers" topic. Other naming strategies are also available if the default is insufficient.

There are three primary ways to add a new schema. It can be added directly to the topic through the UI, which will automatically name the schema appropriately. Alternatively, we can add the schema directly to the registry through the UI; in this case, we'll need to be careful to name the schema appropriately if we want it to be associated with a specific topic. Finally, we can use the command-line interface to add a new schema. Once again, if we choose this option, we'll need to be careful how we name it.

The Confluent Schema Registry supports three format options: Avro, Protobuf, and JSON Schema. We'll need to choose whichever option is suitable for the use case.

It's important to understand that schemas are immutable. Once a version of a schema has been created, it cannot be updated. This is done to protect downstream consumers from unexpected updates. However, sometimes updates need to be made. When we need to update a schema, we do so by creating a new version. This is known as "evolving the schema". We can create a new version through the UI using the "Evolve Schema" button. This presents us with the old schema, allows us to make changes, and commits a new version. Alternatively, we can evolve the schema through the command line by submitting a new request to create a schema. As long as it uses the same name, it will be added as a new version.

Whether we evolve the schema through the UI or the command line, it will first undergo validation. The validation checks that the new schema is compatible with the previous version. If the validation fails, the schema evolution will be rejected. Depending on whether you use JSON Schema, Avro, or Protobuf, different validation rules will be applied.

The Confluent Schema Registry enforces rules that ensure schemas remain compatible from one version to the next. However, it does not enforce the use of a schema. If a client wishes to publish messages without using a schema, that's entirely possible. Additional steps must be taken if we want to guarantee that the data going into the topic matches an advertised schema. To ensure that the data in the topic matches the schema, we can make use of Kafka serializers that perform schema validation. Essentially, the serializer in the client downloads the schema and validates each record against it. If the data is incompatible, the serializer will fail. This gives us an extra layer of protection to ensure that the data going into the topic is valid.
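As a rough sketch of this client-side validation, a Java producer can be configured with the Confluent Avro serializer so that each record is checked against the registered schema before it is sent. The Customer class is assumed to have been generated from the Customers-value schema, and the endpoints shown are placeholders (security settings are omitted for brevity):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;

public class CustomerProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap server; replace with your own cluster endpoint.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "pkc-xxxxx.us-east-2.aws.confluent.cloud:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
        // The serializer uses the Schema Registry to look up and validate the schema.
        props.put("schema.registry.url", "https://psrc-xxxxx.us-east-2.aws.confluent.cloud");

        try (KafkaProducer<String, Customer> producer = new KafkaProducer<>(props)) {
            // Customer is assumed to be generated from the downloaded Customers-value schema.
            Customer customer = new Customer("customer-123", "Jane Doe");
            // Serialization fails here if the record does not match the registered schema.
            producer.send(new ProducerRecord<>("Customers", customer.getId(), customer));
        }
    }
}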
In addition to providing data validation, schemas can be used for code generation. Various tools can be run against the schema to produce application code. The exact tools depend on your programming language, build tools, and choice of schema format. Regardless of which tools you use, you usually need to download the schema first. To make use of a schema in the client, we can download it into our project. This can be done by viewing the schema and selecting the "Download" option, or we can make use of tools and plugins that do this automatically.

Once our schemas are in place, we have created an environment where we can trust the data in our streams. The Confluent Schema Registry allows us to apply evolutions to our schemas and ensure they remain compatible for future use. Combined with client-side validation, consumers can work with the streams without fearing unexpected changes. They can be confident those streams will maintain a consistent level of quality for as long as they are needed.

If you aren't already on Confluent Developer, head there now using the link in the video description to access the rest of this course and its hands-on exercises.