Schema Validator

In an Event Streaming Platform, Event Sources, which create and write Events, are decoupled from Event Sinks and Event Processing Applications, which read and process the events. Ensuring interoperability between the producers and the consumers of events requires that they agree on the data schemas for the events. This is an important aspect of putting Data Contracts in place for data governance.

Problem

How can I enforce a defined schema for all Events sent to an Event Stream?

Solution

Validate that an Event conforms to the defined schema(s) of an Event Stream prior to writing the event to the stream. You can perform this schema validation in two ways:

  1. On the server side, the Event Streaming Platform that receives the Event can validate the event. The Event Streaming Platform can reject the event if it fails schema validation and thus violates the Data Contract.
  2. On the client side, the Event Source that creates the Event can validate the event. For example, an Event Source Connector can validate events prior to ingesting them into the Event Streaming Platform, or an Event Processing Application can use the schema validation functionality provided by a serialization library that supports schemas (such as the Confluent serializers and deserializers for Kafka).
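
The client-side option can be sketched as follows. This is a minimal illustration, not the Confluent serializer API: the schema is a hypothetical mapping of field names to Python types, the stream is an in-memory list standing in for a topic, and the names EMPLOYEE_SCHEMA, validate, and publish are assumptions made for this example.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]

# Hypothetical schema: field name -> required Python type.
# A real deployment would use Avro, JSON Schema, or Protobuf instead.
EMPLOYEE_SCHEMA: Dict[str, type] = {"id": int, "name": str, "department": str}


class SchemaValidationError(ValueError):
    """Raised when an event violates the schema of the stream."""


def validate(event: Event, schema: Dict[str, type]) -> None:
    """Raise SchemaValidationError unless the event matches the schema exactly."""
    missing = schema.keys() - event.keys()
    if missing:
        raise SchemaValidationError(f"missing fields: {sorted(missing)}")
    unexpected = event.keys() - schema.keys()
    if unexpected:
        raise SchemaValidationError(f"unexpected fields: {sorted(unexpected)}")
    for field, expected in schema.items():
        # Exact type match, so a bool is not accepted where an int is required.
        if type(event[field]) is not expected:
            raise SchemaValidationError(
                f"field {field!r} must be {expected.__name__}, "
                f"got {type(event[field]).__name__}"
            )


def publish(event: Event, stream: List[Event], schema: Dict[str, type]) -> None:
    """Validate first; only a conforming event is appended to the stream."""
    validate(event, schema)
    stream.append(event)
```

Calling publish with {"id": "1", ...} raises SchemaValidationError and leaves the stream unchanged, which is the "Schema on Write" behavior described above. In practice a schema-aware serializer would perform this check as part of serialization.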

Implementation

With Confluent, schema validation is fully supported with a per-environment managed Schema Registry. Use the Confluent Cloud UI to enable Schema Registry in your cloud provider of choice. Schemas can be managed per topic using either the Confluent Cloud UI or the Confluent Cloud CLI. The following command creates a schema using the CLI:

ccloud schema-registry schema create --subject employees-value --schema employees.json --type AVRO
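
The command above expects employees.json to contain an Avro schema. The source does not show that file, so the following sketch generates a hypothetical one; the record name, namespace, and fields are assumptions chosen purely for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical Avro record schema for the employees-value subject.
# Field names and types are illustrative, not taken from the original text.
EMPLOYEE_AVRO_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "Employee",
    "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        # Optional field: a union with null, defaulting to null.
        {"name": "department", "type": ["null", "string"], "default": None},
    ],
}


def render_schema(schema: Dict[str, Any]) -> str:
    """Serialize the schema definition to pretty-printed JSON text."""
    return json.dumps(schema, indent=2)


def write_schema(path: str, schema: Dict[str, Any]) -> None:
    """Write the schema to a file that can be passed to the CLI via --schema."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_schema(schema))
```

Running write_schema("employees.json", EMPLOYEE_AVRO_SCHEMA) would produce a file suitable for the --schema argument shown above.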

Considerations

  • Schema Validator is a data governance implementation of "Schema on Write", which enforces data conformance prior to event publication. An alternative strategy is "Schema on Read", where data formats are not enforced on write, but consuming Event Processing Applications are required to validate data formats as they read each event.
  • Server-side schema validation is preferable when you want to enforce this pattern centrally inside an organization. In contrast, client-side validation assumes the cooperation of client applications and their developers, which may or may not be acceptable (for example, in regulated industries).
  • Schema validation increases load because it sits on the write path of every event. Client-side validation primarily adds load to client applications. Server-side validation adds load to the Event Streaming Platform, whereas client applications are less affected; for them, the main impact is dealing with rejected events (see Dead Letter Stream).
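
The last consideration, handling events rejected by server-side validation, can be sketched as follows. This is an in-memory illustration under stated assumptions: Stream, RejectedEvent, and send are hypothetical names, the validity check is supplied by the caller, and a plain list stands in for the Dead Letter Stream.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class RejectedEvent(Exception):
    """Signals that the platform refused an event that failed validation."""


class Stream:
    """In-memory stand-in for an event stream with server-side validation."""

    def __init__(self, is_valid: Callable[[Event], bool]) -> None:
        self.is_valid = is_valid
        self.events: List[Event] = []

    def write(self, event: Event) -> None:
        # The "server" enforces the data contract on every write.
        if not self.is_valid(event):
            raise RejectedEvent(f"event violates schema: {event!r}")
        self.events.append(event)


def send(event: Event, stream: Stream, dead_letters: List[Event]) -> bool:
    """Try to write the event; route it to the dead letter list if rejected."""
    try:
        stream.write(event)
        return True
    except RejectedEvent:
        dead_letters.append(event)
        return False
```

Here the client does no validation of its own; its only extra work is the except branch, which matches the point that server-side validation shifts most of the load to the platform.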

References