Infinite Retention Event Stream

Many use cases demand that Events in an Event Stream be stored forever so that the dataset is available in its entirety.

Problem

How can we ensure that events in a stream are retained forever?

Solution

[Figure: Infinite Retention Event Stream]

The solution for infinite retention depends on the specific Event Streaming Platform. Some platforms support infinite retention "out of the box", requiring no action on the part of end users. If an Event Streaming Platform does not support infinite storage, infinite retention can be partially achieved with the Event Sink Connector pattern, which offloads Events into permanent external storage.
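To make the sink-based approach concrete, here is a minimal in-memory sketch, not a real connector. The names `Event`, `BoundedStream`, and `archive_sink` are hypothetical, and a plain Python list stands in for permanent external storage. It illustrates why retention is only partially achieved this way: the archive keeps the full history, but the stream itself still discards old Events.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    offset: int
    timestamp: int  # simplified logical clock, in seconds
    value: str


@dataclass
class BoundedStream:
    """Hypothetical stream that keeps only Events newer than retention_s."""
    retention_s: int
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def expire(self, now: int) -> None:
        # Drop Events that have aged out of the retention window.
        self.events = [e for e in self.events if now - e.timestamp < self.retention_s]


def archive_sink(stream: BoundedStream, archive: list, last_offset: int) -> int:
    """Copy not-yet-archived Events to external storage; return the new high-water mark."""
    for e in stream.events:
        if e.offset > last_offset:
            archive.append(e)
            last_offset = e.offset
    return last_offset


stream = BoundedStream(retention_s=10)
archive: list = []  # stand-in for permanent external storage
last = -1
for i in range(5):
    stream.append(Event(offset=i, timestamp=i * 5, value=f"v{i}"))
    last = archive_sink(stream, archive, last)  # offload before expiry
    stream.expire(now=i * 5)

print("in stream:", [e.offset for e in stream.events])
print("in archive:", [e.offset for e in archive])
```

In this sketch, subscribers that need the full history must read from the archive rather than from the stream, which is the main limitation compared with a platform that retains Events natively.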

Implementation

When using Confluent Cloud, infinite retention is built into the Event Streaming Platform (availability may be limited based on cluster type and cloud provider). Users of the platform can benefit from infinite storage without any changes to their client applications or operations.

For on-premises Event Streaming Platforms, Confluent Platform adds the ability for infinite retention by extending Apache Kafka with Tiered Storage. Tiered Storage separates the compute and storage layers, allowing the operator to scale each independently as needed. Newly arrived Events are considered "hot", but as time passes, they become "colder" and migrate to more cost-effective external storage, such as an Amazon S3 bucket. As cloud-native object stores can effectively scale to infinite size, the Kafka cluster can act as the system of record for infinite Event Streams.
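As an illustration, the topic-level settings involved might be assembled as follows. `retention.ms=-1` and `retention.bytes=-1` are the standard Apache Kafka settings that disable time-based and size-based deletion. The helper names `infinite_retention_config` and `as_properties` are hypothetical. The `confluent.tier.local.hotset.ms` setting (how long data also stays on broker disks) is included on the assumption that Tiered Storage is already enabled on the brokers; verify its name and semantics against the documentation for your Confluent Platform version.

```python
from typing import Optional


def infinite_retention_config(hotset_ms: Optional[int] = None) -> dict:
    """Build topic-level settings that disable retention-based deletion."""
    config = {
        "cleanup.policy": "delete",  # plain log, not compacted
        "retention.ms": "-1",        # no time limit
        "retention.bytes": "-1",     # no size limit
    }
    if hotset_ms is not None:
        # Assumption: Confluent Tiered Storage is enabled on the cluster.
        config["confluent.tier.local.hotset.ms"] = str(hotset_ms)
    return config


def as_properties(config: dict) -> str:
    """Render the settings as a comma-separated key=value list."""
    return ",".join(f"{key}={value}" for key, value in config.items())


topic_config = infinite_retention_config(hotset_ms=3_600_000)
print(as_properties(topic_config))
```

A string in this form is what the `kafka-configs` tool accepts for its `--add-config` option when altering a topic, though the exact invocation depends on the installation.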

Considerations

  • Infinite Retention Event Streams are typically used to store entire datasets that will be used by many subscribers. For example, storing the canonical customer dataset in an Infinite Retention Event Stream makes it available to any other system, regardless of that system's database technology. The customer dataset can be easily imported or reimported as a whole.
  • Compacted Event Streams are often used as a form of Infinite Retention Event Stream. However, compacted streams are not infinite. Instead, they retain only the most recent Event for each key, meaning their contents match the dataset held in an equivalent CRUD database table.
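The difference between a compacted stream and a truly infinite one can be sketched in a few lines. This is a simplified model, not how any platform implements compaction: the `compact` function and the sample customer Events are hypothetical, and real log compaction runs asynchronously and also handles deletions, which this sketch ignores.

```python
def compact(events):
    """Return the Events that survive compaction: the latest one per key."""
    latest = {}
    for offset, (key, value) in enumerate(events):
        latest[key] = (offset, key, value)  # a later Event replaces an earlier one
    return sorted(latest.values())  # restore offset order


# Full history, as an Infinite Retention Event Stream would keep it.
events = [
    ("alice", "v1"),
    ("bob", "v1"),
    ("alice", "v2"),
    ("alice", "v3"),
]

compacted = compact(events)

# The compacted stream is equivalent to the current state of a CRUD table.
table = {key: value for _, key, value in compacted}

print(len(events), "Events in the full stream")
print(len(compacted), "Events after compaction")
```

The earlier values for `alice` are gone after compaction, so a subscriber can rebuild the current state of the table but cannot replay how it got there.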

References