course: Apache Kafka® 101

Confluent Schema Registry

9 min

Tim Berglund

VP Developer Relations


Gilles Philippart

Software Practice Lead

Confluent Schema Registry

In Apache Kafka®, data is produced and consumed as messages whose structure is described by schemas. As more consumers start reading from topics, sometimes written by different teams or even entirely separate departments, it's crucial that they understand the schema of the data they're receiving. Over time, schemas evolve: fields may be added, data types may change, and new versions are introduced. To manage these changes seamlessly and prevent breaking consumers, many Kafka-based systems rely on the Confluent Schema Registry.

What is the Confluent Schema Registry?

The Schema Registry is a standalone server that stores and manages schemas for Kafka topics. It's not part of Apache Kafka itself; it's a community-licensed component from Confluent that is widely adopted across the Kafka ecosystem. To Kafka, it just looks like another application producing and consuming messages, but its purpose is specific: it maintains a database of schemas for the topics it monitors. That database is itself stored in an internal Kafka topic, ensuring high availability and durability.

How Does It Work?

When a producer sends a message, it first contacts the Schema Registry through its REST API to register the schema if it’s new. The producer then includes the schema ID in the message before sending it to the Kafka topic. This ID allows consumers to look up the schema when they receive the message, ensuring they know how to deserialize and interpret it correctly.
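
To make that concrete, here is a minimal sketch of a producer configured against the Schema Registry, assuming Avro and the Confluent kafka-avro-serializer dependency on the classpath; the broker and registry addresses, topic name, and schema are placeholders.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PaymentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro serializer registers the schema (if new) and embeds its ID in each message.
        props.put("value.serializer",
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry address

        // Example schema; in practice this usually comes from an .avsc file.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"amount\",\"type\":\"double\"}]}");

        GenericRecord payment = new GenericData.Record(schema);
        payment.put("id", "p-1001");
        payment.put("amount", 42.50);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "p-1001", payment));
        }
    }
}
```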

On the consumer side, the consumer checks the schema ID from the message against the Schema Registry. If the schema matches what it expects, it proceeds to consume the data. If not, the consumer can throw an exception, signaling that it can't process that version of the schema. This mechanism prevents consumers from breaking when message formats change unexpectedly.
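
And a matching consumer sketch, again with placeholder addresses and topic name; the KafkaAvroDeserializer performs the schema lookup described above.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PaymentConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder broker address
        props.put("group.id", "payment-readers");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // The Avro deserializer reads the schema ID from each message and
        // fetches (and caches) the matching schema from the Schema Registry.
        props.put("value.deserializer",
                  "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry address

        try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payments"));
            ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, GenericRecord> record : records) {
                GenericRecord payment = record.value();
                System.out.println(payment.get("id") + " -> " + payment.get("amount"));
            }
        }
    }
}
```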

Schema Compatibility Rules

The Schema Registry supports compatibility rules that help manage schema changes gracefully:

  • Forward Compatibility: Old consumers can read messages produced with newer schemas.
  • Backward Compatibility: New consumers can read messages produced with older schemas.
  • Full Compatibility: Both forward and backward compatibility are maintained.

These rules are critical for deciding how changes are rolled out. For instance, if you control the producers and update them first, you might choose forward compatibility: consumers you don't necessarily control can then be updated at a later date. If you control the consumers instead, backward compatibility ensures they can still process older message versions while producers are updated progressively (or perhaps not at all) by other teams to use the newer schema.
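
As an illustration of what these rules mean in practice, here is a small sketch using Avro's own compatibility checker. The Payment schemas are made up for this example, and the SchemaCompatibility helper is part of the Apache Avro library (method names can vary between versions).

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class CompatibilityExample {
    public static void main(String[] args) {
        // Version 1 of a hypothetical Payment schema.
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"amount\",\"type\":\"double\"}]}");

        // Version 2 adds a field WITH a default value.
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"amount\",\"type\":\"double\"},"
            + "{\"name\":\"currency\",\"type\":\"string\",\"default\":\"USD\"}]}");

        // Backward: a reader on the NEW schema can decode data written with the OLD one,
        // because the default fills in the missing field.
        System.out.println("new reader, old data: "
            + SchemaCompatibility.checkReaderWriterCompatibility(v2, v1).getType());

        // Forward: a reader on the OLD schema can decode data written with the NEW one,
        // because the extra field is simply ignored. Both hold, so this change is fully compatible.
        System.out.println("old reader, new data: "
            + SchemaCompatibility.checkReaderWriterCompatibility(v1, v2).getType());
    }
}
```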

Serialization Formats

Schema Registry supports three major serialization formats:

  • Avro – Highly compact and widely used with Kafka.
  • JSON Schema – Human-readable, great for web applications.
  • Protobuf – Efficient and language-neutral, popular with gRPC services.

For Avro, the schema is typically defined in an AVSC file, a JSON document that serves as an interface description for the data. The Schema Registry uses it to validate messages, ensuring producers only write data that fits the schema and consumers can read it correctly.
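
For instance, here is a rough sketch of producing with a class generated from an AVSC file. The Payment class and its package are hypothetical output of the Avro Maven or Gradle code-generation plugin, the builder-style methods follow the usual pattern of Avro-generated code, and the addresses and topic are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
// Hypothetical class generated from Payment.avsc by the Avro Maven/Gradle plugin;
// adjust the package to match your build configuration.
import com.example.avro.Payment;

public class GeneratedClassExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry address

        // Generated classes expose a type-safe builder derived from the schema,
        // so code that doesn't fit the schema fails to compile.
        Payment payment = Payment.newBuilder()
                .setId("p-1002")
                .setAmount(99.99)
                .build();

        try (KafkaProducer<String, Payment> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", payment.getId().toString(), payment));
        }
    }
}
```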

Why Schema Registry is Essential

In any non-trivial system, the Schema Registry is indispensable. As applications grow, new consumers emerge, and schemas inevitably evolve. The Schema Registry provides:

  • Centralized schema management – All producers and consumers share a common understanding of message formats.
  • Compatibility enforcement – Prevents breaking changes by validating schemas during production and consumption.
  • Governance and collaboration – Teams can negotiate schema changes safely, using IDL files as a shared contract.
  • Compile-time checks – You can verify compatibility before deploying code, avoiding runtime surprises (see the sketch after this list).
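
As a sketch of that last point, a build step can ask the registry whether a candidate schema is compatible with the latest registered version before anything ships. This uses the registry's documented compatibility REST endpoint; the registry URL, subject name, and schema are placeholders, and Confluent also provides Maven and Gradle plugins that automate the same check.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CompatibilityCheck {
    public static void main(String[] args) throws Exception {
        String registryUrl = "http://localhost:8081";   // placeholder registry address
        String subject = "payments-value";              // placeholder subject name

        // Candidate schema to test against the latest registered version.
        String schema = "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"},"
                + "{\"name\":\"amount\",\"type\":\"double\"},"
                + "{\"name\":\"currency\",\"type\":\"string\",\"default\":\"USD\"}]}";

        // Wrap the schema in the JSON envelope the registry expects, escaping embedded quotes.
        String body = "{\"schema\": \"" + schema.replace("\"", "\\\"") + "\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(registryUrl + "/compatibility/subjects/" + subject + "/versions/latest"))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A successful reply looks like {"is_compatible": true|false}.
        System.out.println(response.body());
    }
}
```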

With Confluent Cloud, Schema Registry extends even further, offering governance tools to manage schema changes across large organizations, enabling data contracts, data discoverability and better integration for enterprise-scale deployments.



Confluent Schema Registry

This is the introduction to Apache Kafka, and this module is on the Confluent Schema Registry. Now, once applications are busily producing messages into topics and consuming messages from them, two things are gonna happen. First, new consumers of existing topics are going to emerge. There's gonna be data in there. Maybe you put that data there. Other people are gonna think, wow, that data is useful. I can do something cool with it. And those consumers are brand new applications. They could be written by the same team that wrote the original producer of those messages, or maybe it's another team. Maybe people you don't even know, depending on how big your organization is. It's a perfectly normal thing for new consumers to emerge, written by new people. And they're gonna need to understand the format of the messages in the topic.

That's the first thing that's gonna happen. The second thing is, shocker, schemas are gonna change. Maybe your original data model was absolutely perfect, but the world is a thing that undergoes development and mutation and change, and so our software has to change too. You need a way to keep track of those changes. Will my consumers down the line be okay when I produce messages of a different format? Do I have many producers upstream, some of which are the new version, some of which are the old version? Will my consumer be okay consuming different versions of those messages? Schema changes roll out in complex ways, and we need help with this.

The Confluent Schema Registry exists to solve these problems. So, schema registry is a standalone server process that runs on a machine external to the Kafka brokers. It's not a part of open source Apache Kafka. It is this other component, a community-licensed component made by Confluent, but it's widely adopted and highly standardized in the Kafka world. To the Kafka cluster, it looks like another application. I said in an earlier module that everything that's not a broker is either a producer or a consumer, and that's the case with the schema registry. Kafka just sees somebody producing and consuming to some special topic, but its job is to maintain, effectively, a database of all of the schemas that have been written into topics in the cluster for which it is responsible.

And that database is, of course, persisted in an internal Kafka topic, which should come as no surprise to you, and it's cached in the schema registry for low-latency access. So don't worry, there's not too much network round-tripping happening here. The schema registry can be run in a redundant, high-availability configuration if you like, so you're not just dependent on one instance; it can be a little cluster there.

Now, how does this work? What's the flow here? When I'm producing a message, I'm gonna configure the producer with that properties map, that little list of key-value pairs. I'm gonna say, hey, there's a schema registry and it's here on the network. You give it a name or an address of where the schema registry is. When the producer produces a message, it's going to have an object that it's serializing, not just a string, but some compound object type in your language's type system, and that object will have a schema ID associated with it internally. And it's gonna tell the schema registry, hey, I've got this object.

The producer now is connecting to the schema registry using the schema registry's REST API saying, look, I've got this new object, I would like you to know about it. And the schema registry either says, I've never seen that one before, thank you, check, got it. Or yes, of course, I know that one, please go on. Then the producer will take that schema ID, put that in the message and produce it to the topic.
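
If you want to see what that registration step looks like outside of a producer, here's a rough sketch using the Confluent Schema Registry client library, assuming the io.confluent:kafka-schema-registry-client dependency (version 5.5 or later; method names may differ in other versions). The registry address, subject name, and schema are placeholders.

```java
import io.confluent.kafka.schemaregistry.avro.AvroSchema;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

public class RegisterSchemaExample {
    public static void main(String[] args) throws Exception {
        // Placeholder registry address; 100 is the size of the client's local schema cache.
        SchemaRegistryClient client =
                new CachedSchemaRegistryClient("http://localhost:8081", 100);

        AvroSchema schema = new AvroSchema(
                "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"},"
                + "{\"name\":\"amount\",\"type\":\"double\"}]}");

        // Registers the schema under the topic's value subject if it's new,
        // or returns the existing ID if the registry has already seen it.
        int schemaId = client.register("payments-value", schema);
        System.out.println("Schema ID: " + schemaId);
    }
}
```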

On the consume side, the consumer also has this object, right, that looks like, if it's Java, a POJO, a plain old Java object, nothing plain about it. There's a lot of machinery in there, but that object knows the schema ID, and the consumer will say, okay, this is the schema ID I'm using; it consumes a message from the topic and asks, is this compatible? Is this the same schema I'm expecting? And it will go on to consume or not consume accordingly.
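
Under the covers, the serializer prepends a small header to every Avro payload, which is how the consumer gets the schema ID: in the Confluent wire format, byte 0 is a magic byte (currently 0) and bytes 1 through 4 are the schema ID as a big-endian integer. Here's a rough sketch of pulling the ID out of a raw record value; the fake payload is made up for the example.

```java
import java.nio.ByteBuffer;

public class WireFormat {
    // Extracts the schema ID from a message value written by the Confluent Avro serializer:
    // 1 magic byte + 4-byte big-endian schema ID + Avro-encoded payload.
    static int schemaIdOf(byte[] rawValue) {
        ByteBuffer buffer = ByteBuffer.wrap(rawValue);
        byte magicByte = buffer.get();          // currently always 0
        if (magicByte != 0) {
            throw new IllegalArgumentException("Unknown magic byte: " + magicByte);
        }
        return buffer.getInt();                 // the ID the consumer looks up in the registry
    }

    public static void main(String[] args) {
        // Fake record value: magic byte 0, schema ID 42, then an (empty) Avro payload.
        byte[] rawValue = ByteBuffer.allocate(5).put((byte) 0).putInt(42).array();
        System.out.println("Schema ID: " + schemaIdOf(rawValue));
    }
}
```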

Now, a part of the system here is that you get to configure your topic with a compatibility rule, whether it wants to be forward compatible, backward compatible, or both, or I suppose neither is an option also, but that wouldn't be very interesting in terms of the schema registry. The basic idea is you want to know which order you plan to roll out upgrades in. Do you have more control over the produce side when you've got a new version of a schema? If you know you're going to be able to roll out new versions of the producer first, and consumers might have to deal with versions they haven't seen before, then it'd be forward compatible.

Or do you know that on the consume side, you'll be able to roll out new versions of consumers when your schemas change, but there are lots of producers and you don't have control over all of those? Then your consumers might have to be backward compatible. They might have to get old messages of a previous schema version and deal with them in a functional way without throwing an exception. So you've got that forward or backward or both compatibility there.

And what this does, in cooperation with the schema registry, is make the producer throw an exception if it's going to produce a message that the consumers won't be able to handle, or make the consumer throw an exception if it's about to consume a message that it thinks it won't be able to handle. Since you can't predict in the code what you might have to do with a truly unknown format, the schema registry basically stops you: it causes a predictable failure of a produce or consume operation that would otherwise have broken something. That's the key idea.

The Confluent Schema Registry supports these three serialization formats as of this recording: JSON Schema, Avro, and Protobuf. And depending on the format, if it's, say, Avro, you could have an IDL, an Interface Description Language, available to you in the form of an AVSC file like you see here. And there's tooling, in the case of Java a Gradle or a Maven plugin, that'll turn that AVSC file into the Java object, the POJO-looking thing with, again, all kinds of interesting machinery under the covers there; you should check it out. But that transformation is automatic.

So now you have the object that represents that schema. There are a number of cool things about this, but one is that it drives collaboration around schema change toward the IDL file, the AVSC file, so that you and your team, when you wanna negotiate change, have a text file you can use to negotiate that change. It can happen through a pull request, it can happen through conversation, but this is the source of truth.

The build process can generate the POJO, and the schema registry compatibility rules can do the work for you of making sure either your producers or your consumers check themselves before they do something that will cause an exception of an unpredictable kind. There are also compile-time checks where, if you can access the schema registry, you could say, hey, this topic, this new version of the schema, whatever the compatibility rules are, is this one gonna be okay? So you don't have to deploy code and wait to see if it works. You can check at compile time, which is pretty darn cool.

Now I'd go so far as to make this slightly opinionated statement: in any non-trivial system, using the schema registry is non-negotiable. There are gonna be people writing consumers at some point that maybe you haven't talked through everything with. Again, in a non-trivial system at organizational scale, they're gonna need a standard way of learning about that schema and some kind of automated way of negotiating the changes around that schema. No matter how good of a job you do defining things upfront, the world out there changes, and your schemas are gonna change too.

This is another thing we'll talk about briefly in the last module. Confluent Cloud has features on top of the schema registry that help you govern data, publish catalogs, and manage change at scale in a large organization, in the way that you're always gonna have to. So I think it's a non-negotiable part of your non-trivial Kafka-based system.
