If you’re a programmer you’ll be familiar with APIs, whether it’s a RESTful API or an object interface between your code and some other module or library. APIs provide a contract between two programs or modules. A contract usually encapsulates state and behavior. In event-driven programming there is no parallel for behavior—it’s a pure data discipline—but schemas and the schema registry provide the explicit contract that a program producing events provides to other programs that are consuming them. Implementing schemas over your data is essential for any enduring event streaming system, particularly ones that share data between different microservices or teams—programs that will vary independently of one another and hence value having a well-defined contract for the data they share.
In this module you will learn about the key concept of using schemas as contracts, stored in a schema registry. You will learn about Kafka’s loosely coupled design and how it solves one problem but opens the door for client applications to possibly get out of sync with each other as you make changes to the design and structure of your model objects. We’ll also describe how a schema registry provides what you need to keep client applications in sync with the data changes in your organization or business.
So let’s dive in and learn about the schema registry now.
Applications that use Kafka fall into two categories: producers, which write messages to a topic, and consumers, which read those messages (some applications do both).
Because a consumer reads what a producer wrote, there is an implicit contract between the two applications. The producer assumes that the consumer will understand the format of the message. Meanwhile, the consumer assumes that the producer will continue to send messages in the same format.
But what if that isn’t the case?
Applications are rarely static. Over time, they evolve as new business requirements develop. These new requirements may result in changes to the format of the message. This could be something simple such as adding or removing a field. Or it could be more complex involving structural changes to the message. It may even mean a different serialization format, such as converting Avro to Protobuf.
When this happens, if the producer and consumer are not both updated at the same time, then the contract is broken. You could end up in a situation where the producer uses one format, but the consumer expects something different. The result is that the consumer may fail to read the new messages and be unable to proceed, leaving those messages stuck in the topic.
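To make the broken contract concrete, here is a minimal sketch in plain Python with no Kafka involved. The function names and the `order_id`, `amount`, and `total` fields are hypothetical, chosen only for illustration. The producer renames a field, and a consumer written against the old format can no longer read the message.

```python
import json


def produce_v1(order_id: int, amount: float) -> bytes:
    """Original producer: writes the price under the field name 'amount'."""
    return json.dumps({"order_id": order_id, "amount": amount}).encode("utf-8")


def produce_v2(order_id: int, amount: float) -> bytes:
    """Updated producer: renames 'amount' to 'total' without coordinating."""
    return json.dumps({"order_id": order_id, "total": amount}).encode("utf-8")


def consume(message: bytes) -> float:
    """Consumer written against the v1 format; it still expects 'amount'."""
    record = json.loads(message.decode("utf-8"))
    return record["amount"]


def try_consume(message: bytes) -> str:
    """Report whether the consumer could process the message."""
    try:
        return f"ok: {consume(message)}"
    except KeyError as missing:
        return f"failed: missing field {missing}"


print(try_consume(produce_v1(1, 9.99)))   # old format is still readable
print(try_consume(produce_v2(2, 19.99)))  # new format breaks the consumer
```

Nothing in this sketch warned the producer that the rename would break a consumer. That missing check is what an explicit contract is meant to supply.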
In the real world, when you want to establish a concrete set of rules for how two parties should behave, you sometimes do that in the form of a legal contract. If either of the two parties violates that contract, there are consequences.
In software, you want to provide a similar contract. If either your producer or consumer fails to meet the rules established by the contract, there needs to be a consequence. This contract is codified in something called a schema.
A schema is a set of rules that establishes the format of the messages being sent. It outlines the structure of the message, the names of any fields, what data types they contain, and any other important details. This schema is a contract between the two applications. Both the producer and consumer are expected to support the schema. If the schema needs to change for some reason, then you need to have processes in place to handle that change. For example, you may need to support the old schema for a period of time while both applications are being updated.
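As a rough illustration of a schema acting as a set of rules, the sketch below writes an Avro-style record schema as a Python dictionary and checks records against it. The `Order` record, its fields, and the `violations` helper are assumptions made for this example. A real application would use an Avro, Protobuf, or JSON Schema library instead of a hand-written check.

```python
from typing import List

# Avro-style record schema for a hypothetical "Order" message.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "customer", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

# Python types that stand in for the three schema types used above.
PYTHON_TYPES = {"long": int, "string": str, "double": float}


def violations(record: dict, schema: dict) -> List[str]:
    """Return a list of the ways the record breaks the schema (empty if none)."""
    problems = []
    expected = {field["name"]: field["type"] for field in schema["fields"]}
    for name, type_name in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], PYTHON_TYPES[type_name]):
            problems.append(f"wrong type for {name}: expected {type_name}")
    for name in record:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems


good = {"order_id": 1, "customer": "alice", "amount": 9.99}
bad = {"order_id": "1", "customer": "alice"}

print(violations(good, ORDER_SCHEMA))
print(violations(bad, ORDER_SCHEMA))
```

The point of the sketch is that the names, types, and structure are written down once, in one place, so both the producer and the consumer can be checked against the same rules.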
Returning to the real world, when a contract is written, someone is assigned to arbitrate that contract and ensure it is followed. This is generally handled by lawyers. In software, the arbiter of the contract is something known as a schema registry. The schema registry is a service that records the various schemas and their different versions as they evolve. Producer and consumer clients retrieve schemas from the schema registry via HTTPS, store them locally in a cache, and use them to serialize and deserialize messages sent to and received from Kafka. A client retrieves a given schema only once; from that point on, it relies on the cached copy.
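The retrieve-once-then-cache behavior can be sketched as follows. This is not the Confluent client. `CachingSchemaClient`, `frame`, and `unframe` are hypothetical names, and the registry is faked with an in-memory dictionary so the example runs on its own. In a real client, the fetch would be an HTTPS request to the registry. The framing follows the Confluent wire format as commonly documented (a zero magic byte, then a 4-byte big-endian schema ID, then the payload), and the payload here is plain JSON only to keep the sketch short.

```python
import json
import struct
from typing import Callable, Dict, Tuple

MAGIC_BYTE = 0  # leading byte that marks a schema-registry-framed message


class CachingSchemaClient:
    """Fetches each schema ID at most once, then serves it from a local cache."""

    def __init__(self, fetch: Callable[[int], dict]):
        self._fetch = fetch              # stand-in for an HTTPS call to the registry
        self._cache: Dict[int, dict] = {}
        self.fetch_count = 0             # counts how often the registry is contacted

    def get_schema(self, schema_id: int) -> dict:
        if schema_id not in self._cache:
            self.fetch_count += 1
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id]


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the magic byte and the schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into its schema ID and payload."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("message is not framed with a schema ID")
    return schema_id, message[5:]


# In-memory stand-in for the registry: schema ID -> schema definition.
FAKE_REGISTRY = {1: {"type": "record", "name": "Order", "fields": []}}
client = CachingSchemaClient(lambda schema_id: FAKE_REGISTRY[schema_id])

message = frame(1, json.dumps({"order_id": 7}).encode("utf-8"))
for _ in range(3):  # consume the same kind of message three times
    schema_id, payload = unframe(message)
    schema = client.get_schema(schema_id)

print(client.fetch_count)  # the registry was contacted only once
```

Because each message carries only a small schema ID and not the full schema, a consumer can look up the right schema on first sight and then decode later messages without further calls to the registry.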
Confluent provides a schema registry that is integrated directly into your Kafka applications. It supports Avro, Protobuf, and JSON schema formats. While much of the information in this course is applicable to any schema registry, the focus is on Confluent Schema Registry. As such, from this point forward the course will refer to it simply as the schema registry with the understanding that it is the Confluent Schema Registry being discussed.
Start Kafka in Minutes with Confluent Cloud
This course introduces you to Confluent Schema Registry through hands-on exercises that will have you produce data to and consume data from Confluent Cloud. If you haven’t already signed up for Confluent Cloud, sign up now so when your first exercise asks you to log in, you are ready to do so.
Review your selections and give your cluster a name, then click Launch cluster. This might take a few minutes.
While you’re waiting for your cluster to be provisioned, be sure to add the promo code SCHEMA101 to get an additional $25 of free usage. From the menu in the top right corner, choose Administration | Billing & Payments, then click on the Payment details tab. From there click on the +Promo code link, and enter the code.
You’re now ready to complete the upcoming exercises as well as take advantage of all that Confluent Cloud has to offer!