Sometimes compression can reduce message size but there are various use cases that entail large message payloads where compression may not be enough. Often these use cases are related to image, video, or audio processing: image recognition, video analytics, audio analytics, etc.
How can we handle use cases where the Event payload is too large or too expensive to move through the Event Streaming Platform?
Instead of storing the entire event in the event streaming platform, store the event payload in a persistent external store that can be shared between producers and consumers. The producer can write the reference address into the event streaming platform, and downstream clients use the address to retrieve the event from the external store and then process it as needed.
The event stored in Kafka contains only a reference to the object in the external store. This can be a full URI string, an abstract data type (e.g., Java object) with separate fields for bucket name and filename, or whatever fields are required to identify the object. Optionally, the event may contain additional data fields to better describe the object (e.g., metadata such as who created the object).
The following example uses Kafka's Java producer client. Here, we keep things simple as the event's value stores no information other than the reference (URI) to its respective object in external storage.
// Write object to external storage storageClient.putObject(bucketName, objectName, object); // Write URI to Kafka URI eventValue = new URI(bucketName, objectName); producer.send(new ProducerRecord<String, URI>(topic, eventKey, eventValue));
The Event Source is responsible for ensuring that the data is properly stored in the external store, such that the reference passed within the Event is valid. Since the producer should be doing this atomically, take into consideration the same issues as mentioned in Database Write Aside.
Also, if a Compacted Event Stream is used for storing the "reference" events (e.g., topic compaction in the case of Kafka), then the compaction will remove just the event with the reference. However, it will not remove the referenced (large) object itself from the external store, so that object needs a different expiry mechanism.