How can we ensure that events in a stream are retained forever?
The solution for infinite retention depends on the specific Event Streaming Platform. Some platforms support infinite retention "out of the box", requiring no action on the part of end users. If an Event Streaming Platform does not support infinite storage, infinite retention can be partially achieved with the Event Sink Connector pattern, which offloads Events into permanent external storage.
When using Confluent Cloud, infinite retention is built into the Event Streaming Platform (availability may be limited based on cluster type and cloud provider). Users of the platform can benefit from infinite storage without any changes to their client applications or operations.
For on-premises Event Streaming Platforms, Confluent Platform adds support for infinite retention by extending Apache Kafka with Tiered Storage. Tiered Storage separates the compute and storage layers, allowing the operator to scale each independently as needed. Newly arrived Events are considered "hot"; as they age, they become "colder" and migrate to more cost-effective external storage, such as an AWS S3 bucket. Because cloud-native object stores can scale to effectively unlimited size, the Kafka cluster can act as the system of record for infinite Event Streams.
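As a rough illustration of the two ideas in this paragraph, the sketch below pairs the Apache Kafka topic settings that disable time- and size-based deletion (`retention.ms=-1` and `retention.bytes=-1`) with a toy model of hot and cold data placement. The `Segment` class, the `place_segments` function, and the `hotset_ms` parameter are hypothetical names chosen for this example. They do not reproduce how a broker implements Tiered Storage, and the Tiered Storage settings themselves should be taken from the documentation for your Confluent Platform version.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Topic-level settings that turn off time- and size-based deletion in Apache Kafka.
# A value of -1 means "no limit" for both settings.
INFINITE_RETENTION_CONFIG: Dict[str, str] = {
    "cleanup.policy": "delete",
    "retention.ms": "-1",
    "retention.bytes": "-1",
}


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a log segment: where it starts and when it was last written."""
    base_offset: int
    last_append_ms: int


def place_segments(segments: Iterable[Segment], now_ms: int, hotset_ms: int) -> Dict[int, str]:
    """Toy model of tiering: recent ("hot") segments stay local, older ones go to object storage.

    hotset_ms is an assumed parameter for this sketch: the maximum age at which
    a segment is still kept on local broker disk.
    """
    placement: Dict[int, str] = {}
    for segment in segments:
        age_ms = now_ms - segment.last_append_ms
        placement[segment.base_offset] = "local" if age_ms <= hotset_ms else "object-store"
    return placement


example_segments = [Segment(base_offset=0, last_append_ms=0),
                    Segment(base_offset=100, last_append_ms=9_000)]
example_placement = place_segments(example_segments, now_ms=10_000, hotset_ms=5_000)
```

In this model nothing is ever deleted: a segment only changes location as it ages, which is what lets the stream remain complete while local disk usage stays bounded.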
Infinite Retention Streams are typically used to store entire datasets which will be used by many subscribers. For example, storing the canonical customer dataset in an Infinite Retention Event Stream makes it available to any other system, regardless of that system's database technology. The customer dataset can be easily imported or reimported as a whole.
Compacted Event Streams are often used as a form of Infinite Retention Event Stream. However, compacted streams are not infinite. Instead, they retain only the most recent Event for each key, meaning their contents match the dataset held in an equivalent CRUD database table.
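To make the comparison with a CRUD table concrete, here is a minimal sketch of what compaction preserves. It is an in-memory model, not the broker's log cleaner: the `compact` function and the sample customer Events are hypothetical, and the sketch assumes the Kafka convention that an Event with a null value (a "tombstone") deletes its key.

```python
from typing import Dict, Iterable, Optional, Tuple

# An Event is modeled as a (key, value) pair; a value of None is a tombstone.
Event = Tuple[str, Optional[dict]]


def compact(events: Iterable[Event]) -> Dict[str, dict]:
    """Return the latest value per key, which is the state a compacted stream preserves."""
    latest: Dict[str, dict] = {}
    for key, value in events:
        if value is None:
            # Tombstone: the key is removed, like a DELETE on a table row.
            latest.pop(key, None)
        else:
            # A newer Event for the same key replaces the older one, like an UPDATE.
            latest[key] = value
    return latest


customer_events = [
    ("c1", {"name": "Ada"}),
    ("c2", {"name": "Bob"}),
    ("c1", {"name": "Ada L."}),  # update to c1
    ("c2", None),                # delete c2
]
customer_table = compact(customer_events)
```

The full stream holds four Events, while the compacted view holds a single row for `c1`. The history of changes is lost, which is the sense in which a compacted stream is not infinite.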