Schema Evolution

Schema Evolution is an important aspect of data management. Similar to how APIs evolve and need to be compatible with all applications that rely on older or newer API versions, schemas also evolve, and likewise need to be compatible with all applications that rely on older or newer schema versions.

Problem

How can I restructure or add new information to an event, ideally in a way that ensures Schema Compatibility?

Solution

schema-evolution

One approach in evolving a schema is to evolve the schema "in place" (as illustrated above). In this approach, a stream can contain events that use new and previous schema versions. The schema compatibility checks then ensure that Event Processing Applications and Event Sinks can read schemas in both formats.

schema-evolution

Another approach is to perform dual schema upgrades, also known as creating versioned streams (illustrated above). This approach is especially useful when you need to introduce breaking changes into a stream's schema(s) and the new schema will be incompatible with the previous schema. Here, Event Sources write to two streams:

One stream that uses the previous schema version, such as payments-v1.
Another stream that uses the new schema version, such as payments-v2.

Event Processing Applications and Event Sinks then each consume from the stream that uses the schema version with which they are compatible. After upgrading all consumers to the new schema version, you can retire the stream that uses the old schema version.

Implementation

For "in-place" schema evolution, an Event Stream has multiple versions of the same schema, and the different versions are compatible with each other. For example, if a field is removed, then a consumer that was developed to process events without this field will be able to process events that were written with the old schema and contain the field; the consumer will simply ignore that field. In this case, if the original schema was as follows:

{"namespace": "io.confluent.examples.client",
 "type": "record",
 "name": "Event",
 "fields": [
     {"name": "field1", "type": "boolean", "default": true},
     {"name": "field2", "type": "string"}
 ]
}

Then we could modify the schema to omit field2, as shown below, and the event stream would then have a mix of both schema types. If new consumers process events written with the old schema, these consumers would ignore field2.

{"namespace": "io.confluent.examples.client",
 "type": "record",
 "name": "Event",
 "fields": [
     {"name": "field1", "type": "boolean", "default": true}
 ]
}

Dual schema upgrades involve breaking changes, so the processing flow needs to be more explicit: clients must only write to and read from the Event Stream that corresponds to the schema that they can process. For example, an application that can process a v1 schema would read from one stream:

CREATE STREAM payments-v1 (transaction_id BIGINT, username VARCHAR)
    WITH (kafka_topic='payments-v1');

And an application that can process a v2 schema would read from another stream:

CREATE STREAM payments-v2 (transaction_id BIGINT, store_id VARCHAR)
    WITH (kafka_topic='payments-v2');

Considerations

Follow Schema Compatibility rules to determine which schema changes are compatible and which are breaking changes.

References

See also the Event Serializer pattern, in which we encode events so that they can be written to disk, transferred across the network, and generally preserved for future readers
See also the Schema-on-Read pattern, which enables the reader of events to determine which schema to apply to the Event that the reader is processing.
Schema evolution and compatibility covers backward compatibility, forward compatibility, and full compatibility.
Working with schemas covers how to create schemas, edit them, and compare versions.
The Schema Registry Maven Plugin allows you to test for schema compatibility during the development cycle.

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Articles

Patterns

FAQs

Blog

NEWStreamables

NEWLearn More

Language Guides

Tutorials

Demos

Language Guides

Tutorials

Demos

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

Meetups

Ask the Community

Community Catalysts

NEWCommunity Use Cases

Confluent Developer Newsletter

Data Streaming Awards

NEWCurrent 2024

Kafka Summit 2024 - Bangalore

Kafka Summit 2024 - London

Current 2023

Kafka Summit 2023

NEWKafka® 101

NEWApache Flink® SQL

NEWApache Flink® Table API: Processing Data Streams in Java

NEWDesigning Event-Driven Microservices

NEWApache Flink® 101

NEWBuilding Flink® Apps in Java

NEWKafka® 101

Kafka® Connect 101

Kafka Streams 101

Schema Registry 101

ksqlDB 101

Data Mesh 101

Articles

Patterns

FAQs

Blog