Senior Developer Advocate (Presenter)
Effective schema management requires an understanding of the following concepts and processes:
In this section you will learn how to manage schema files. Schema management starts with registering schemas. It also includes updating schemas and viewing or downloading them. Testing a schema will also be covered when we get to the section on evolving a schema.
Once you’ve written a schema you will want to register it. There are multiple ways to register a schema, but let’s first talk about what happens during registration.
When you register a schema, you need to provide the subject name and the schema itself. The subject name is the namespace for the schema, almost like a key in a hash map. The standard naming convention is topic-name-key or topic-name-value. Other subject naming strategies are covered in a later module.
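The naming convention above can be sketched in a few lines of Python. The helper name subject_name is hypothetical and exists only for illustration; it is not part of any Confluent library.

```python
def subject_name(topic: str, is_key: bool = False) -> str:
    """Build a subject name using the topic-name-key / topic-name-value convention."""
    suffix = "key" if is_key else "value"
    return f"{topic}-{suffix}"


print(subject_name("purchases"))               # purchases-value
print(subject_name("purchases", is_key=True))  # purchases-key
```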
Once Schema Registry receives the schema, it assigns it a unique ID number and a version number. The first time you register a schema for a given subject name, it is assigned a version of 1.
There are several methods available to register a schema. You can use the Confluent CLI, Schema Registry REST API, Confluent Cloud Console, or the Maven and Gradle plugins discussed earlier.
This example illustrates how to register a schema using the Confluent CLI. Take note of the subject purchases-value. This indicates that the schema represents the value part of the key-value records in the purchases topic. The default type for Schema Registry is AVRO, so if you are registering a schema of any other type, you must specify it, e.g., PROTOBUF in this example.
Note: If you have a schema for the key part of the record, the subject name would be purchases-key. While you can have a schema for the key, this course takes the opinionated approach that scalar values (string, integer, etc.) are better suited for the key, so we will only discuss using schemas for the value for the remainder of this course.
This next example uses the command line JSON tool jq in conjunction with the Schema Registry REST API. Since Avro schemas are defined in JSON, you can use them as is in the command.
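As a rough Python equivalent of the jq and curl approach, the sketch below builds the registration request without sending it. It assumes a Schema Registry listening at http://localhost:8081, and the Purchase schema with its field names is hypothetical. The REST API expects the schema as a JSON string inside a JSON body, posted to /subjects/{subject}/versions.

```python
import json
import urllib.request

# Assumption: a local Schema Registry; change this for your environment.
REGISTRY_URL = "http://localhost:8081"

# Hypothetical Avro schema for the purchases topic (field names are illustrative).
PURCHASE_AVSC = {
    "type": "record",
    "name": "Purchase",
    "namespace": "io.example",
    "fields": [
        {"name": "item", "type": "string"},
        {"name": "total_cost", "type": "double"},
        {"name": "customer_id", "type": "string"},
    ],
}


def build_register_request(subject, schema_text, schema_type=None):
    """Build (but do not send) the POST request that registers a schema."""
    body = {"schema": schema_text}
    if schema_type is not None:  # AVRO is the default, so the type can be omitted
        body["schemaType"] = schema_type
    return urllib.request.Request(
        url=f"{REGISTRY_URL}/subjects/{subject}/versions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


# The Avro schema is already JSON, so it only needs to be serialized to a string.
request = build_register_request("purchases-value", json.dumps(PURCHASE_AVSC))
print(request.get_method(), request.full_url)
# POST http://localhost:8081/subjects/purchases-value/versions

# To actually register, send the request to a running registry:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))  # the response body carries the assigned ID
```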
Since Protobuf schema definitions are not defined in JSON, you need to first get them into JSON format before you can register them using the REST API. In this example, a helper script is used to parse and format the purchase.proto schema into JSON format. The curl command then uses this JSON formatted output from the script.
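The helper script itself is not reproduced here, but the formatting step it performs can be sketched in Python. The .proto contents below are hypothetical, and proto_to_payload is an illustrative name. The idea is that the raw Protobuf text becomes an escaped JSON string, with the schema type stated explicitly.

```python
import json

# Hypothetical contents of purchase.proto (illustrative only).
PURCHASE_PROTO = '''syntax = "proto3";

message Purchase {
  string item = 1;
  double total_cost = 2;
  string customer_id = 3;
}
'''


def proto_to_payload(proto_text: str) -> str:
    """Wrap raw .proto text in the JSON body the REST API expects."""
    # json.dumps escapes the quotes and newlines inside the schema text.
    return json.dumps({"schemaType": "PROTOBUF", "schema": proto_text})


payload = proto_to_payload(PURCHASE_PROTO)
print(payload)
```

The resulting string is what would be sent as the request body, in place of the Avro payload used for an Avro schema.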
The Gradle and Maven plugins discussed earlier are also capable of registering a schema. This example illustrates registering two schemas using the Gradle Schema Registry plugin.
In the build.gradle script, you need to provide a subject entry in the register block which includes the path to the schema file and the schema type. The register block can include one or multiple subject entries and each entry can be either the AVRO, PROTOBUF, or JSON_SR schema type.
Using a plugin with your development environment could potentially be the easiest approach to registering schemas. Just the one registerSchemasTask command will register all the entries in the register block.
When you are using a Kafka producer, you can enable it to “auto-register” a schema. In this case, if a producer is unable to retrieve the ID for its schema from Schema Registry because it has not been previously registered, it will respond by registering the schema. While this is very handy for development, it is not something you want to do in production.
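In the Confluent serializers this behavior is controlled by the auto.register.schemas property, which defaults to true. A minimal sketch of switching it off outside development follows. The function name and the environment labels are hypothetical, and the returned dictionary is only the serializer portion of a client configuration.

```python
def serializer_conf(environment: str) -> dict:
    """Return serializer settings; auto-registration stays on only outside production."""
    # "production" is a hypothetical environment label used for illustration.
    return {"auto.register.schemas": environment != "production"}


print(serializer_conf("development"))  # {'auto.register.schemas': True}
print(serializer_conf("production"))   # {'auto.register.schemas': False}
```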
At some point you will likely need to update previously registered schemas. To do so, you use the same methods that you used to first register the schema. Provided you are making compatible changes (compatibility is covered in an upcoming module), Schema Registry will simply assign the schema a new ID and a new version number. The new ID is not guaranteed to be sequential, but the version number is always incremented by one, so versions are sequential within a subject.
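To make the bookkeeping concrete, here is a toy in-memory model of how subjects, versions, and IDs relate. It is not how Schema Registry is implemented: it skips compatibility checks entirely, and it hands out sequential IDs, which the real service does not guarantee. The class name ToyRegistry and the placeholder schema strings are hypothetical.

```python
class ToyRegistry:
    """Toy model of subject/version/ID bookkeeping (not the real implementation)."""

    def __init__(self):
        self._ids = {}       # schema text -> globally unique ID
        self._subjects = {}  # subject -> list of schemas; list index + 1 is the version

    def register(self, subject, schema):
        """Register a schema under a subject and return (schema ID, version)."""
        versions = self._subjects.setdefault(subject, [])
        if schema not in versions:
            versions.append(schema)  # a new schema becomes the next version
        # Simplification: IDs are sequential here; the real service makes no such promise.
        schema_id = self._ids.setdefault(schema, len(self._ids) + 1)
        return schema_id, versions.index(schema) + 1

    def get(self, subject, version="latest"):
        """Look up a schema by subject and version number, or the latest one."""
        versions = self._subjects[subject]
        return versions[-1] if version == "latest" else versions[version - 1]


registry = ToyRegistry()
print(registry.register("purchases-value", "schema-v1"))  # (1, 1)
print(registry.register("purchases-value", "schema-v2"))  # (2, 2)
print(registry.get("purchases-value"))                    # schema-v2
print(registry.get("purchases-value", 1))                 # schema-v1
```

Note that lookups go through the subject and version rather than the ID, which is why the version number matters most when viewing a schema.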
You will see shortly how the version number is more important than the ID when viewing the schema.
Here are the options for viewing or downloading a schema. Notice that you provide the version number and the subject name when you are pulling the schema from Schema Registry. You can also leave out the version number and use the word latest which gives you the latest version available for the schema.
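For the REST API, the lookup described above maps onto a URL built from the subject name and the version. The sketch below only constructs that URL. It assumes a registry at http://localhost:8081, and schema_version_url is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumption: a local Schema Registry; change this for your environment.
REGISTRY_URL = "http://localhost:8081"


def schema_version_url(subject, version="latest"):
    """Build the URL for fetching one version of a subject's schema."""
    # The version is either a number or the keyword "latest".
    return f"{REGISTRY_URL}/subjects/{quote(subject, safe='')}/versions/{version}"


print(schema_version_url("purchases-value"))
# http://localhost:8081/subjects/purchases-value/versions/latest
print(schema_version_url("purchases-value", 1))
# http://localhost:8081/subjects/purchases-value/versions/1
```

A GET request to such a URL returns a JSON document containing the subject, version, ID, and schema.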
The Gradle plugin also has a task (not shown here), downloadSchemasTask, for viewing schemas.
Hi, I'm Danica Fine, thanks for joining me for another Schema Registry module. In this module, we'll see how to manage both Protobuf and Avro schema files. Schema management starts with registering schemas, but further in the life cycle, it also includes updating schemas and viewing or downloading them. Later in this course, we'll also cover evolving and testing a schema. For now, let's see how to register a schema. Suppose we've already created some schema and we want to register it. Before we get into the ways that we can register a schema, let's first talk about what happens during registration. When you register a schema, you provide the subject name in addition to the schema itself. The subject name is what we'll use to identify the schema, and it's usually just the namespace for the schema, almost like a key when you use a HashMap. The standard naming convention is the topic name followed by key or the topic name followed by value, depending on whether you're defining a schema to be used for a key or a value. It doesn't just end there, though, there are some other strategies for subject names that we'll cover in a later session. Once Schema Registry receives the schema, it assigns it a unique ID number and a version. Unsurprisingly, the first time you register a schema for a given subject name, it's assigned a version of 1. After this happens, we're done, the schema has been registered. All right, let's get back to the registration process. Like I said, there are several methods at your disposal for registering a schema. You can use the Confluent CLI, the Schema Registry REST API, the UI on Confluent Cloud, or the Maven and Gradle plugins we discussed earlier. The world is your oyster. The example seen here uses the Confluent CLI, take note of the subject name, "purchases-value". This indicates that this schema represents the value part of the key-value records within the "purchases" topic. 
If we also wanted to have a schema to define the key, the subject name of that schema would be "purchases-key". And as a side note, while you can have a schema for the key, it's worth mentioning that scalar values, like string and integer, are better suited for the key. With that in mind, we'll only discuss using schemas for the value for the rest of this course. Going back to the command here, note that when using Protobuf, you need to explicitly specify the schema type as PROTOBUF. For Avro, you don't need to provide the type; Schema Registry uses Avro by default. Let's see the registration process using the Schema Registry REST API. In this example, we use the command line JSON tool, jq, in conjunction with the REST API. Since Avro schemas are defined in JSON, you can use them as-is in the command. Protobuf schema definitions are not defined in JSON, so you need to first get them into JSON format before you can register them using the REST API. In this example, we use a helper script to parse and format the schema into JSON. The curl command then uses this JSON-formatted output from the script. The Gradle and Maven plugins we discussed earlier are also capable of registering a schema. This example illustrates registering two schemas using the Gradle Schema Registry plugin. In the Gradle build script, you need to provide a subject entry in the register block which includes the path to the schema file and the schema type. This block can include one or multiple subject entries, and each entry can be either the AVRO, PROTOBUF, or JSON_SR schema type. Simple! Using a plugin with your development environment could potentially be the easiest approach to registering schemas. Just the one registerSchemasTask command will register all of the entries in the register block. And finally, when you're using a Kafka Producer, you can also enable it to auto-register a schema.
Although, while this is very handy for development, it's not something I'd recommend doing in production. All right, so we've created and registered schemas, what do we do when we need to update a schema? Well, you can actually use any of these register methods to update the schema as well. Provided you're making compatible changes, Schema Registry will simply assign a new ID to the schema and increment the version number. While the new ID is not guaranteed to be in sequential order, the version number will always be incremented by one. And as you'll see in a second, the version number is actually more important than the ID when viewing the schema. So, we've updated schemas, but how do we view or download a schema? Again, you have a couple of options. All you need to pull a schema from Schema Registry is a subject name, and surprise surprise, a version number. If you don't know which version number to use, you can also just use the keyword "latest", which simply gives you the latest version available for the schema. The Gradle plugin also has an additional downloadSchemasTask command to view the schemas. And with that, you have a better idea of how to register and evolve your schemas over time. See you in the next course module where we'll explore how to apply our knowledge to client applications.