Course: Apache Kafka® for Python Developers

Integrate Python Clients with Schema Registry

7 min

Dave Klein

Developer Advocate



Hi, Dave Klein here again with the Apache Kafka for Python Developers course. In this module, we'll learn how to use schemas and the Confluent Schema Registry to provide structure and consistency for our event-driven applications. Let's get started.

In Kafka applications, producers and consumers are completely decoupled. Producers don't know who's going to consume the data they produce, and consumers don't know where the data they're consuming is coming from. This is by design, and it helps us build more loosely coupled applications and pipelines. However, producers and consumers do need to agree on the structure of the data they're working with. Producers need to know how to serialize the data to bytes before sending it to Kafka, and consumers need to know how to deserialize the bytes they receive from Kafka. For this, we use schemas with the help of the Confluent Schema Registry.

Schema Registry provides a way to store, discover, and evolve schemas in JSON Schema, Protocol Buffers (Protobuf), or Avro format. This enables us to build more resilient pipelines and applications with greater data integrity and flexibility. We could go on about the benefits of schemas when working with Apache Kafka, but for now let's take a look at how to use them in our applications with the Confluent Python library.

The key players when working with schemas in Python are serializers, deserializers, and the SchemaRegistryClient. Serializers are used in producer applications to convert event keys and values into the bytes that are sent to Kafka. Deserializers are used by consumers to turn those bytes back into something we can use in our applications. And the SchemaRegistryClient provides the connection to Schema Registry, as we'll see shortly.

Not all serializers interact with Schema Registry; there are also serializers for basic data types: string, integer, and double. These are often used for keys, which are frequently simple data types. You can find these serializers in the serialization module of the Confluent Kafka package. The more interesting serializers are found in the schema_registry module. These convert complex objects in JSON Schema, Protobuf, or Avro format into bytes ready to be written to Kafka. There are many similarities in how each of these classes is used, but there are some differences as well, so we'll look at examples of each one. But first, let's look at the SchemaRegistryClient, since each of the following serializers will need an instance passed to its constructor.

We won't usually need to access the SchemaRegistryClient directly, but the serializers will use it behind the scenes. To construct an instance, we pass in a dictionary that contains, at a minimum, the endpoint where Schema Registry can be reached. Often we'll also need to include authentication information, such as a username and password passed via basic auth. For Confluent Cloud, these would be the API key and secret.

JSON Schema provides structure and validation for JSON, a very popular data format. To create an instance of the JSON serializer, we pass a schema string, which defines the structure and constraints for our JSON objects, an instance of the SchemaRegistryClient, and finally a function that will turn our target object into a Python dictionary. You'll notice that in the sketch that follows, we're using a string serializer for the key of our event, since our key is a UUID. Serializers are all callable objects, which means that each serializer class implements the __call__ dunder method.
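Here's a minimal sketch of constructing the SchemaRegistryClient. The endpoint and credentials are placeholders; for Confluent Cloud, the user:secret pair would be your Schema Registry API key and secret.

```python
from confluent_kafka.schema_registry import SchemaRegistryClient

# Placeholder endpoint and credentials; substitute your own values.
# For Confluent Cloud, the user:secret pair is a Schema Registry API key and secret.
sr_config = {
    'url': 'https://<schema-registry-endpoint>',
    'basic.auth.user.info': '<username>:<password>'
}

schema_registry_client = SchemaRegistryClient(sr_config)
```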
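And here is a sketch of constructing and calling a JSON serializer, with a string serializer for the UUID key. The Temperature class, the schema string, the temp_readings topic, and the temp_to_dict function are hypothetical stand-ins for whatever your application uses; schema_registry_client is the instance created above.

```python
from dataclasses import dataclass
from uuid import uuid4

from confluent_kafka.serialization import (
    StringSerializer, SerializationContext, MessageField
)
from confluent_kafka.schema_registry.json_schema import JSONSerializer

# Hypothetical value type and JSON Schema, for illustration only.
@dataclass
class Temperature:
    city: str
    reading: float

schema_str = """
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Temperature",
  "type": "object",
  "properties": {
    "city":    {"type": "string"},
    "reading": {"type": "number"}
  },
  "required": ["city", "reading"]
}
"""

def temp_to_dict(temp, ctx):
    # Turns our object into a plain dictionary for the serializer.
    return {'city': temp.city, 'reading': temp.reading}

key_serializer = StringSerializer('utf_8')
value_serializer = JSONSerializer(schema_str, schema_registry_client, temp_to_dict)

topic = 'temp_readings'
temp = Temperature(city='Portland', reading=42.5)

# Serializers are callable: pass the data plus a SerializationContext
# naming the topic and whether this is the key or the value.
key_bytes = key_serializer(str(uuid4()),
                           SerializationContext(topic, MessageField.KEY))
value_bytes = value_serializer(temp,
                               SerializationContext(topic, MessageField.VALUE))
```

The resulting key and value bytes are what we hand to Producer.produce() when sending the event.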
To use our JSON serializer, we call the object instance like a function and pass in the object to be serialized along with a SerializationContext, which contains the topic we are producing to and a constant signifying whether we are serializing the key or the value of the event; in this case, it's the value. The return value of the serializer is a sequence of bytes, ready to be written to Kafka.

The Protobuf serializer works with Google's Protocol Buffers to do the same basic thing we did with the JSON serializer, but in a more space-efficient way. Protobuf is a binary format with its own schema definition language. With Protobuf, we compile our schema to produce a Python module containing a class based on that schema. When constructing an instance of the Protobuf serializer, we pass in that class along with the SchemaRegistryClient. Since our data is in a Protobuf class instance, which handles the mapping itself, we do not need a to_dict function with this serializer. Currently, though, we do need to pass in a config dictionary containing the use.deprecated.format property; this prevents clashes with older code. Once the serializer instance is created, its usage is identical to the JSON serializer.

Avro, which is an Apache project, is very popular among Kafka users working in Python and other languages. It has some of the benefits of JSON Schema as well as some of the benefits of Protobuf. Like Protobuf, Avro is a binary format that can significantly reduce the size of our event payloads. But like JSON Schema, the schema itself is written in JSON, so it's easy for both computers and people to understand. Constructing an Avro serializer is similar to the JSON serializer, except that we pass in an Avro schema string and the order of parameters is slightly different. This serializer also requires a function to map our object to a dictionary.

To recap: serializers are used in producer applications to convert keys and/or values to bytes that can be written to Kafka. There are serializers for simple data types as well as for more complex objects using schemas, and the three schema formats supported by the Confluent Schema Registry are JSON Schema, Protobuf, and Avro.

Now let's talk about their counterparts, deserializers. Deserializers are used mainly in consumer applications. Just like the serializers, deserializers are callable objects, so we can use instances of these classes as functions. We'll see an example of this shortly. The simple data type deserializers, found in the serialization module of the Confluent Kafka package, will take bytes from a Kafka event and turn them into a string, integer, or double, as appropriate. We also have deserializers in the schema_registry module for JSON Schema, Protocol Buffers, and Avro. There are, again, similarities and differences in how they are constructed, so we'll take a look at each one now.

The constructor for the JSON deserializer takes the same schema string we used to create the JSON serializer instance, along with a function that will map a dictionary to our required object. To use the deserializer, we call it like a function and pass in the data we want deserialized, in this case the event value. And just as with the serializers, we pass in a SerializationContext, which contains additional information about the data being deserialized.
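Here's a rough consumer-side sketch using the JSON deserializer. It reuses the hypothetical Temperature class and schema_str from the producer sketch above, and the broker and group settings are placeholders.

```python
from confluent_kafka import Consumer
from confluent_kafka.serialization import SerializationContext, MessageField
from confluent_kafka.schema_registry.json_schema import JSONDeserializer

def dict_to_temp(d, ctx):
    # Maps the dictionary produced by the deserializer back to our object.
    return Temperature(city=d['city'], reading=d['reading'])

# Same schema string used by the serializer, plus the mapping function.
json_deserializer = JSONDeserializer(schema_str, from_dict=dict_to_temp)

consumer = Consumer({
    'bootstrap.servers': '<broker-endpoint>',   # placeholder
    'group.id': 'temp_readings_group',
    'auto.offset.reset': 'earliest'
})
consumer.subscribe(['temp_readings'])

while True:
    event = consumer.poll(1.0)
    if event is None or event.error():
        continue   # error handling elided for brevity
    temp = json_deserializer(
        event.value(),
        SerializationContext(event.topic(), MessageField.VALUE))
    if temp is not None:
        print(f'{temp.city}: {temp.reading}')
```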
The Protobuf deserializer takes the class that we generated from our Protobuf schema and a configuration dictionary with use.deprecated.format set to false. Because it has the class of the objects we want in return, we don't need to pass a from_dict function. Calling this deserializer is identical to calling the JSON deserializer. In fact, all three of these deserializers are called in the same way; they only differ in how they're constructed and in what they return.

The Avro deserializer requires the SchemaRegistryClient, just as the Avro serializer did, along with the Avro schema string and, similar to the JSON deserializer, a function that will produce an object instance from a dictionary.

And now let's head to the next module, where we'll get some hands-on experience building a producer and a consumer using JSON Schema and some of the classes we just learned about. If you are not already on Confluent Developer, head there now using the link in the video description to access the rest of this course and its hands-on exercises.