course: Schema Registry 101

Working with Schema Formats

Danica Fine

Senior Developer Advocate (Presenter)

Protobuf

sr101-m5-01

This is the Protobuf file we looked at in the previous module. Let’s dive in to discuss its format.

Protobuf defines each field of a message in type-name format: the type comes first, followed by the field name. In this example, the types are the scalar values string and double. Other scalar types supported by Protobuf include float, int32 (int in Java), int64 (long in Java), bool, and bytes.

Each field in the message definition is assigned a unique number, which Protobuf uses to identify the field in the binary message format. Once your schema is in use, you should not change the number assigned to an existing field, because data that has already been written relies on it.
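To make the role of field numbers concrete, here is a minimal sketch of how the Protobuf wire format combines a field number with a wire type to form the tag that precedes each encoded value. The class name and the field numbers 1 and 2 are illustrative assumptions, not values taken from the schema in this module.

```java
// Hand-written illustration of the tag calculation; not code generated by protoc.
class FieldTagSketch {
    // Wire types defined by the Protobuf encoding.
    static final int WIRETYPE_VARINT = 0;            // int32, int64, bool, enum
    static final int WIRETYPE_FIXED64 = 1;           // double, fixed64
    static final int WIRETYPE_LENGTH_DELIMITED = 2;  // string, bytes, nested messages

    // The tag is the field number shifted left by three bits, OR-ed with the wire type.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        // Assumed numbering: a string field numbered 1 and a double field numbered 2.
        System.out.printf("string field 1 -> 0x%02X%n", tag(1, WIRETYPE_LENGTH_DELIMITED));
        System.out.printf("double field 2 -> 0x%02X%n", tag(2, WIRETYPE_FIXED64));
    }
}
```

Because the number is part of every encoded field, a reader that expects a different number for the same field will not find the value where it looks for it.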

Protobuf supports a number of more complex field types. Let’s take a look at some of them now.

Protobuf Collections

sr101-m5-02

Protobuf supports collection types such as lists and maps. Here you see both being used to add fields to the Purchase message. For a list, you use the repeated keyword; in Java, this translates to a List. For a map, you use the map keyword. Note that the key can only be a string or an integer type (int32 or int64, for example), while the value can be any type except another map.

Protobuf Enumerations

sr101-m5-03

Protobuf also supports an enumeration type. Note that the first element must always map to 0, because 0 is the default value for an enum field.
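The zero-value rule can be sketched in plain Java. This is a hand-written stand-in, not code generated by protoc; the enum name PurchaseStatus, its constants, and the readStatus helper are hypothetical and only illustrate why a constant numbered 0 has to exist.

```java
class EnumDefaultSketch {
    // Hypothetical enum; the constant numbered 0 plays the role of the default.
    enum PurchaseStatus {
        UNKNOWN(0), PENDING(1), COMPLETE(2);

        private final int number;

        PurchaseStatus(int number) {
            this.number = number;
        }

        int getNumber() {
            return number;
        }

        // Look up a constant by its number; null when the number is not defined.
        static PurchaseStatus forNumber(int number) {
            for (PurchaseStatus status : values()) {
                if (status.number == number) {
                    return status;
                }
            }
            return null;
        }
    }

    // An enum field that was never written reads back as the constant numbered 0.
    static PurchaseStatus readStatus(Integer wireValue) {
        return PurchaseStatus.forNumber(wireValue == null ? 0 : wireValue);
    }

    public static void main(String[] args) {
        System.out.println(readStatus(null)); // field absent from the message
        System.out.println(readStatus(2));    // field present with number 2
    }
}
```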

Importing Protobuf

sr101-m5-04

You can use definitions from other .proto files by adding an import statement at the top of your file.

Consider this schema that tracks the online events generated by a customer. The purchase and page_view event types are already defined in separate .proto files, so you can import them and then use them as field types in your own .proto file.

Alternate Values for a Field

sr101-m5-05

Protocol buffers also support having a field that could be one of an arbitrary number of values. Let’s take a look at the CustomerEvent schema again. If we know that only one of the page_view or purchase fields will be populated, we can define a single customer_action field with the oneof keyword, which holds at most one of these possible events.

To help determine which object fills the field, Protobuf generates a Java enum named <field name>Case. With this example, the enum would be CustomerActionCase, and its value would be either PURCHASE or PAGE_VIEW depending on which value actually fills the field.
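Here is a minimal sketch of how application code might branch on that case enum. The classes below are hand-written stand-ins that mimic the shape described above; they are not output from protoc. The factory methods and the CUSTOMERACTION_NOT_SET constant, which models the situation where neither event is populated, are assumptions made for illustration.

```java
class OneofCaseSketch {
    // Stand-ins for the imported message types.
    static final class Purchase { }
    static final class PageView { }

    static final class CustomerEvent {
        // Mirrors the <field name>Case enum; the NOT_SET name is an assumption.
        enum CustomerActionCase { PURCHASE, PAGE_VIEW, CUSTOMERACTION_NOT_SET }

        private final CustomerActionCase actionCase;
        private final Object action; // whichever member is set, or null

        private CustomerEvent(CustomerActionCase actionCase, Object action) {
            this.actionCase = actionCase;
            this.action = action;
        }

        static CustomerEvent ofPurchase(Purchase purchase) {
            return new CustomerEvent(CustomerActionCase.PURCHASE, purchase);
        }

        static CustomerEvent ofPageView(PageView pageView) {
            return new CustomerEvent(CustomerActionCase.PAGE_VIEW, pageView);
        }

        static CustomerEvent empty() {
            return new CustomerEvent(CustomerActionCase.CUSTOMERACTION_NOT_SET, null);
        }

        CustomerActionCase getCustomerActionCase() {
            return actionCase;
        }

        Object getAction() {
            return action;
        }
    }

    // Branch on the case enum instead of probing each field in turn.
    static String describe(CustomerEvent event) {
        switch (event.getCustomerActionCase()) {
            case PURCHASE:
                return "purchase";
            case PAGE_VIEW:
                return "page view";
            default:
                return "no action set";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(CustomerEvent.ofPurchase(new Purchase())));
        System.out.println(describe(CustomerEvent.ofPageView(new PageView())));
        System.out.println(describe(CustomerEvent.empty()));
    }
}
```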

Default Values

sr101-m5-06

If a field is absent from the serialized message, Protobuf assigns it a default value based on the type of the field when the message is read. The default for a string field is an empty string, for a numeric field it is 0, and for a boolean it is false.

It’s also worth noting that if a field is set to its default value, for example the total_cost of the sale is zero, it is not serialized or sent across the wire.
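The effect on message size can be sketched with a tiny hand-rolled encoder. This is not the Protobuf library, only a simplified imitation of its wire format for two fields. The field name item and the numbers 1 and 2 are assumptions; total_cost is the field mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class DefaultSkippingSketch {
    // Write an unsigned value as a base-128 varint, low-order group first.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode a two-field message, skipping any field that holds its default value.
    static byte[] encode(String item, double totalCost) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!item.isEmpty()) {
            byte[] utf8 = item.getBytes(StandardCharsets.UTF_8);
            writeVarint(out, (1 << 3) | 2);  // assumed field 1, length-delimited
            writeVarint(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        if (Double.doubleToRawLongBits(totalCost) != 0L) {
            writeVarint(out, (2 << 3) | 1);  // assumed field 2, 64-bit
            byte[] bits = ByteBuffer.allocate(8)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .putDouble(totalCost)
                    .array();
            out.write(bits, 0, bits.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode("book", 9.99).length + " bytes with a non-zero total_cost");
        System.out.println(encode("book", 0.0).length + " bytes when total_cost is zero");
        System.out.println(encode("", 0.0).length + " bytes when every field is default");
    }
}
```

A reader cannot distinguish a total_cost that was explicitly set to zero from one that was never set, since neither appears in the bytes.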

Now let’s move on to Avro schemas.

Avro

sr101-m5-07

Here’s the Avro definition of the Purchase schema we’ve seen so far. Avro schemas are defined in JSON.

  • You will always use a type of record when defining a schema.
  • The namespace field is a way to prevent name collisions with other generated Avro objects. When using Avro with Java, the namespace becomes the package name.
  • You define the fields for an Avro object as a JSON array and each field is defined as a JSON object with the name of the field and the type. Avro supports the usual scalar types for fields – string, int, long, double, boolean, float, bytes. Avro also supports more complex types which we will look at next starting with collection types.

Avro Collections – Arrays

sr101-m5-08

Avro supports array types. Notice here that you nest another JSON object when declaring the array. The items of the coupon_codes array could also be a complex type instead of the string shown here.

Avro Collections – Maps

sr101-m5-09

In Avro, maps are also defined using a nested type. The keys of an Avro map are always assumed to be strings, but the values can be complex types.

Avro Enumerations

sr101-m5-10

Avro supports enumeration types as well.

Avro Records in a Schema

sr101-m5-11

Avro permits using another record as a field type. You can either include the full JSON definition in the schema or use the fully qualified name of the record, as shown here. If your schema references other record types, it is recommended that you use the name of the record so that, when you make changes, other Avro files that reference it won’t have to be updated.

Avro Unions

sr101-m5-12

Similar to Protobuf, Avro also supports a field that can contain one of multiple types. This is represented in an Avro schema by using array notation for the type, with the array listing the different types that the field can hold. Note that unlike Protobuf, Avro does not provide any support for determining which type is present. The generated code will have a type of Object for the action field, and you have to determine the type yourself by using the instanceof operator in Java.
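The instanceof check described above might look like the following sketch. The classes are hand-written stand-ins for Avro-generated records, not actual generated code; only the action field and its Object type come from the text, and the describe helper is hypothetical.

```java
class UnionInstanceofSketch {
    // Stand-ins for the record types listed in the union.
    static final class Purchase { }
    static final class PageView { }

    // Stand-in for a generated record whose union field is exposed as Object.
    static final class CustomerEvent {
        private Object action;

        Object getAction() {
            return action;
        }

        void setAction(Object action) {
            this.action = action;
        }
    }

    // With no case enum available, the caller inspects the runtime type directly.
    static String describe(CustomerEvent event) {
        Object action = event.getAction();
        if (action instanceof Purchase) {
            return "purchase";
        }
        if (action instanceof PageView) {
            return "page view";
        }
        return "unrecognized action";
    }

    public static void main(String[] args) {
        CustomerEvent event = new CustomerEvent();
        event.setAction(new PageView());
        System.out.println(describe(event));
    }
}
```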

Avro Default Values

sr101-m5-13

Avro has default values like Protobuf but in Avro you need to explicitly provide them in the schema.

Working with Generated Objects

sr101-m5-14

To work with the generated object from either Avro or Protobuf, you follow the builder pattern: create a builder instance, set the desired fields, and then call build() to get the concrete object type.

Changing State of Generated Objects

sr101-m5-15

Avro provides setter methods on the generated objects that allow you to directly change their state. With Protobuf, the objects returned by the builder are immutable. To update the value of a field with Protobuf, you need to pass the object into a builder, update the field(s) you want to change, and then call build() again, resulting in a brand-new object.

Avro generated classes also provide an overloaded newBuilder method that accepts an existing object of the same type that the builder returns, so you can start a new builder from a copy of that object.
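The two styles of changing state can be contrasted in plain Java. Both classes below are hand-written stand-ins, not generated code: ImmutablePurchase imitates the Protobuf style and MutablePurchase imitates the Avro style. The item field is an assumption; total_cost is the field mentioned earlier in this module.

```java
class GeneratedObjectSketch {
    // Protobuf style: the built object is immutable, so changes go through a builder.
    static final class ImmutablePurchase {
        private final String item;
        private final double totalCost;

        private ImmutablePurchase(Builder builder) {
            this.item = builder.item;
            this.totalCost = builder.totalCost;
        }

        String getItem() { return item; }
        double getTotalCost() { return totalCost; }

        static Builder newBuilder() { return new Builder(); }

        // Start a builder pre-populated with this object's values.
        Builder toBuilder() {
            return new Builder().setItem(item).setTotalCost(totalCost);
        }

        static final class Builder {
            private String item = "";
            private double totalCost;

            Builder setItem(String item) { this.item = item; return this; }
            Builder setTotalCost(double totalCost) { this.totalCost = totalCost; return this; }
            ImmutablePurchase build() { return new ImmutablePurchase(this); }
        }
    }

    // Avro style: the object exposes setters, so it can be changed in place.
    static final class MutablePurchase {
        private String item = "";
        private double totalCost;

        String getItem() { return item; }
        double getTotalCost() { return totalCost; }
        void setItem(String item) { this.item = item; }
        void setTotalCost(double totalCost) { this.totalCost = totalCost; }
    }

    public static void main(String[] args) {
        ImmutablePurchase original = ImmutablePurchase.newBuilder()
                .setItem("book").setTotalCost(9.99).build();
        // Updating a field yields a brand-new object; the original is untouched.
        ImmutablePurchase discounted = original.toBuilder().setTotalCost(4.99).build();
        System.out.println(original.getTotalCost() + " vs " + discounted.getTotalCost());

        MutablePurchase purchase = new MutablePurchase();
        purchase.setItem("book");
        purchase.setTotalCost(4.99); // same object, new state
        System.out.println(purchase.getItem() + " " + purchase.getTotalCost());
    }
}
```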
