Senior Developer Advocate (Presenter)
This is the Protobuf file we looked at in the previous module. Let’s dive in to discuss its format.
Protobuf defines each field of a message as a type followed by a name. In this example, the types are the scalar types string and double. Other scalar types supported by Protobuf include float, int32 (int in Java), int64 (long in Java), bool, and bytes.
Each field in the message definition is assigned a unique number, and Protobuf uses these numbers to identify your fields in the binary message format. Once the schema is in use, you should not change the numbers assigned to existing fields.
Protobuf supports a number of more complex field types. Let’s take a look at some of them now.
Protobuf supports collection types like a list or a map. Here you see both being used to add fields to the Purchase message.
For a list you use the repeated keyword. In Java, this will translate to a List.
For a map you use the keyword map. Note that the key can only be a string or integer type (int32, int64, for example) but the value can be any type, just not another map type.
Protobuf also supports an enumeration type. Note that the first element must always map to the value 0, so that 0 can serve as the default value.
You can use definitions from other .proto files by adding an import statement at the top of your file.
Consider this schema that tracks the online events generated by a customer. The event types, page_view among them, are already defined in separate .proto files, so you can import them and then list the fields in your CustomerEvent schema.
Protocol buffers also support a field that can hold exactly one of several possible types. Let’s take a look at the CustomerEvent schema again. If we know that only one of the event fields, such as page_view or purchase, will be populated, we can define a single customer_action field that is a oneof over these possible events.
To help determine which object fills the field, Protobuf generates an enum named <field name>Case. With this example, the enum would be CustomerActionCase and its value would be, for example, PURCHASE or PAGE_VIEW, depending on which value actually fills the field.
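The case enum is easiest to see in a switch statement. The following is a minimal sketch in plain Java, not code generated by protoc: the CustomerEvent class, its factory methods, the NOT_SET constant, and describe are hand-written, hypothetical stand-ins that only mimic the shape described above.

```java
// Hand-written stand-in for generated code; illustrative only.
class OneofCaseSketch {

    // Mirrors the idea of the <field name>Case enum: one constant per
    // possible event, plus a hypothetical NOT_SET for an empty field.
    enum CustomerActionCase { PURCHASE, PAGE_VIEW, NOT_SET }

    static final class CustomerEvent {
        private final CustomerActionCase actionCase;
        private final Object action; // whichever event was set, or null

        private CustomerEvent(CustomerActionCase actionCase, Object action) {
            this.actionCase = actionCase;
            this.action = action;
        }

        // Hypothetical factories; each one fills the single shared slot.
        static CustomerEvent ofPurchase(String item) {
            return new CustomerEvent(CustomerActionCase.PURCHASE, item);
        }

        static CustomerEvent ofPageView(String url) {
            return new CustomerEvent(CustomerActionCase.PAGE_VIEW, url);
        }

        static CustomerEvent empty() {
            return new CustomerEvent(CustomerActionCase.NOT_SET, null);
        }

        CustomerActionCase getCustomerActionCase() { return actionCase; }

        Object getAction() { return action; }
    }

    // Branch on the case enum instead of probing each field in turn.
    static String describe(CustomerEvent event) {
        switch (event.getCustomerActionCase()) {
            case PURCHASE:
                return "purchase: " + event.getAction();
            case PAGE_VIEW:
                return "page view: " + event.getAction();
            default:
                return "no action set";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(CustomerEvent.ofPurchase("book")));
        System.out.println(describe(CustomerEvent.ofPageView("/home")));
        System.out.println(describe(CustomerEvent.empty()));
    }
}
```

Because only one slot exists, setting a second event would replace the first, which is why a single enum value is enough to describe the field.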
If a field is not set when the message is serialized, Protobuf falls back to a default value based on the field’s type when the message is read. The default for a string field is an empty string, for a numeric field it is 0, and for a boolean it is false.
It’s also worth noting that if a field is set to the default value, for example the total_cost of the sale is zero, it’s not serialized and sent across the wire.
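To make the "not sent across the wire" point concrete, here is a rough sketch that hand-encodes a single double field the way the Protobuf binary format lays it out: a one-byte tag followed by eight little-endian bytes. It is not the Protobuf library. The field number 3 for total_cost is an assumption made for illustration (the real number comes from the .proto file), and encodeTotalCost is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hand-rolled illustration of skipping a default-valued field; not the Protobuf library.
class DefaultSkipSketch {

    // Assumed field number for total_cost; the real one is set in the .proto file.
    static final int TOTAL_COST_FIELD_NUMBER = 3;

    // Wire type 1 is the 64-bit fixed-width encoding used for double.
    static final int WIRE_TYPE_FIXED64 = 1;

    static byte[] encodeTotalCost(double totalCost) {
        // A value equal to the default (0.0) contributes no bytes at all.
        if (Double.doubleToRawLongBits(totalCost) == 0L) {
            return new byte[0];
        }
        ByteBuffer buf = ByteBuffer.allocate(9).order(ByteOrder.LITTLE_ENDIAN);
        // Tag byte: field number shifted left by three bits, combined with the wire type.
        buf.put((byte) ((TOTAL_COST_FIELD_NUMBER << 3) | WIRE_TYPE_FIXED64));
        // Payload: the double as eight little-endian bytes.
        buf.putDouble(totalCost);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println("bytes for 0.0:   " + encodeTotalCost(0.0).length);
        System.out.println("bytes for 19.99: " + encodeTotalCost(19.99).length);
    }
}
```

A reader that finds no bytes for the field simply reports the type’s default, which is how the zero comes back on the other side.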
Now let’s move on to Avro schemas.
Here’s the Avro definition of the Purchase schema we’ve seen so far. Avro schemas are written in JSON.
The type is record when defining a schema.
The namespace field is a way to prevent name collisions with other generated Avro objects. When using Avro with Java, the namespace becomes the package name.
Avro supports array types. Notice here that you nest another JSON object when declaring the array. The coupon_codes field could also be a complex type instead of the string shown here.
In Avro, maps are also defined using a nested type. The keys of a map in Avro are assumed to be strings. But you can also have complex types for the values of a map.
Avro supports enumeration types as well.
Avro permits having another record as a field type. You can either have the full JSON definition in the schema or use the fully qualified name of the record as shown here. If your schema references other record types, it is recommended that you refer to them by name so that when you make changes, other Avro files that reference them won’t have to be updated.
Similar to Protobuf, Avro also supports a field that can contain one of several types. In an Avro schema this is a union, written using array notation for the type, with the array listing the different types the field could hold. Note that unlike Protobuf, Avro does not provide any support for determining which type is present. The generated code will have a type of Object for the action field, and you have to determine the type yourself by using the instanceof operator in Java.
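The instanceof check can be sketched as follows. This is plain Java with hand-written stand-ins, not Avro-generated classes: Purchase, PageView, their fields, and the CustomerEvent wrapper are hypothetical, and only the Object-typed action field and the instanceof dispatch reflect what is described above.

```java
// Hand-written stand-ins for generated classes; illustrative only.
class UnionFieldSketch {

    static final class Purchase {
        final String item;
        Purchase(String item) { this.item = item; }
    }

    static final class PageView {
        final String url;
        PageView(String url) { this.url = url; }
    }

    static final class CustomerEvent {
        // A union field surfaces as plain Object, so the type is lost here.
        private final Object action;
        CustomerEvent(Object action) { this.action = action; }
        Object getAction() { return action; }
    }

    // Recover the concrete type by testing each candidate in turn.
    static String describe(CustomerEvent event) {
        Object action = event.getAction();
        if (action instanceof Purchase) {
            return "purchase of " + ((Purchase) action).item;
        }
        if (action instanceof PageView) {
            return "view of " + ((PageView) action).url;
        }
        return "unknown action";
    }

    public static void main(String[] args) {
        System.out.println(describe(new CustomerEvent(new Purchase("book"))));
        System.out.println(describe(new CustomerEvent(new PageView("/home"))));
        System.out.println(describe(new CustomerEvent(null)));
    }
}
```

The fall-through branch matters: nothing forces the checks to cover every type in the union, so a new type added to the schema would land there silently.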
Avro has default values like Protobuf but in Avro you need to explicitly provide them in the schema.
To work with the generated object from either Avro or Protobuf you need to follow the builder pattern. You first create a builder instance, then set the desired fields, and finally call build to get the concrete object.
Avro provides setter methods on the generated objects that allow you to directly change their state. With Protobuf, the objects returned by the builder are immutable. To update the value of a field with Protobuf, you need to pass the object into a builder, update the field(s) you want to change, and then call build again, resulting in a brand-new object.
Avro builders also have an overloaded constructor that accepts an object of the same type that the builder returns.
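The Protobuf-style flow described above (build once, then copy into a new builder to change a field) can be sketched in plain Java. The Purchase class here is a hand-written stand-in, not generated code; the item field and the method names newBuilder, toBuilder, setItem, and setTotalCost are assumptions chosen to resemble the builder style, so treat this as an illustration of the pattern rather than the generated API.

```java
// Hand-written stand-in for a generated immutable message; illustrative only.
class BuilderSketch {

    static final class Purchase {
        private final String item;
        private final double totalCost;

        private Purchase(Builder builder) {
            this.item = builder.item;
            this.totalCost = builder.totalCost;
        }

        String getItem() { return item; }

        double getTotalCost() { return totalCost; }

        // Step 1: obtain a fresh builder.
        static Builder newBuilder() { return new Builder(); }

        // To "update", copy the current state into a new builder.
        Builder toBuilder() {
            return new Builder().setItem(item).setTotalCost(totalCost);
        }

        static final class Builder {
            private String item = "";
            private double totalCost = 0.0;

            // Step 2: set the desired fields; each setter returns the builder.
            Builder setItem(String item) { this.item = item; return this; }

            Builder setTotalCost(double totalCost) { this.totalCost = totalCost; return this; }

            // Step 3: build produces the immutable object.
            Purchase build() { return new Purchase(this); }
        }
    }

    public static void main(String[] args) {
        Purchase original = Purchase.newBuilder().setItem("book").setTotalCost(9.99).build();
        // There are no setters on Purchase, so a change means building a new object.
        Purchase discounted = original.toBuilder().setTotalCost(7.99).build();
        System.out.println(original.getTotalCost() + " -> " + discounted.getTotalCost());
        System.out.println("same object? " + (original == discounted));
    }
}
```

The original object is untouched after the update, which is the practical difference from the Avro setters mentioned above.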