Get Started Free
‹ Back to courses
course: Event Sourcing and Event Storage with Apache Kafka®

Incorporating Event Storage into Your System

4 min
Anna Mcdonald

Anna McDonald

Principal Customer Success Technical Architect (Presenter)

Ben Stopford

Ben Stopford

Lead Technologist, Office of the CTO (Author)

Incorporating Event Storage into Your System

In the previous modules, you’ve learned about the significant advantages of event sourcing systems, including their evidentiary qualities, the way in which their behavioral views benefit analytics, their fast reads and writes, etc. However, you should also be aware of their primary downside: for several reasons, event sourcing systems are harder to implement than state-based methods.

CRUD vs. CQRS

Traditional systems often have just an application and a database. But in a CQRS system, there is at least one more moving part—the Apache Kafka topic that separates the write side of the application from the read side:

CRUD-vs-CQR

A second complexity stems from the fact that as applications evolve over time, their data models change. This leads to events with many different versions of the same schema being stored inside a single topic. (This is similar to what happens in a schemaless database such as Apache Cassandra or MongoDB.) As the number of schemas increases, you need to add different parsing code to allow the applications to understand each schema. This is a perfectly valid strategy for building software, but it nonetheless adds complexity to the system.

Finally, as mentioned earlier, any CQRS system is only eventually consistent, meaning that you can't immediately read your own writes. So basically, there is some extra complexity with CQRS, but the model works well for many different kinds of use cases.

Use Case: The New York Times Website

A good example of a CQRS use case is the New York Times website: each edit, image, article, and byline from the newspaper, all the way back to 1851, is stored in a single Kafka topic. The raw data is stored as events that are then summarized into tables to be delivered on the website. CQRS works well here, because the CMS where the content originates doesn't need to be immediately consistent with the serving layer—thus, eventual consistency is satisfactory.

nyt

Change Data Capture (CDC)

In fact, event sourcing isn't the only way to provide event-level storage as a system of record. Change Data Capture (CDC) provides most of its benefits without altering the underlying data model.

Change-Data-Capture

In the figure above, there is a mutable database table representing a shopping cart. A CDC connector pulls from the table as rows are added or changed. These are pushed into a Kafka topic, from which they can be consumed by other systems. Because CDC records changes at an event level, we get all of the evidentiary benefits of event sourcing. The main problem with a CDC setup is that it is not replayable like an event sourcing solution. However, this can be rectified with the outbox pattern.

Outbox Pattern

The outbox pattern extends replayability to a CDC solution. It features a regular table that you mutate, as well as an events table that is append-only. A trigger appends entries to the events table when inserts or updates are made to the original table. (If the database doesn't support triggers, then the updating application needs to write to both tables in the same transaction.) The events table is then CDC'ed into Kafka. The advantage of this approach over the regular CDC approach is that you can easily replay events directly from the database, because they are stored there in full.

This provides an alternative to storing everything in Kafka, as the New York Times does in their CQRS implementation. As we’ll see in the next section, storing data in Kafka rather than an outbox table works better where there are many systems that need access to the stored data. The outbox pattern remains a good compromise though for simple use cases that are not performance bound.

CQRS-implementation-image

Use the promo code EVENTS101 & CONFLUENTDEV1 to get $25 of free Confluent Cloud usage and skip credit card entry.

Incorporating Event Storage into Your System

Based on the previous modules event sourcing should sound quite appealing. You get an evidentiary log, you get fast reads and fast writes. You get a behavioral view that can be used for analytics. So what's the catch? Event sourcing is harder to implement than traditional state-based methods for a number of reasons. In a traditional system there is often just an application in a database. In a CQRS system there is at least one more moving part. The Kafka topic that separates the right side of the application from the read side. A second complexity is that as applications evolve over time, the data model changes also. This leads to events with many different versions of the same schema being stored inside a single topic. Some schemas old, some more recent. This is similar to what happens in schema-less databases like Cassandra or MongoDB. As the number of schemas increases different passing code must be added to understand each one. This is a perfectly valid strategy for building software, but it adds complexity to the system nonetheless. Finally, any CQRS based system is eventually consistent, meaning you can't necessarily read your own writes. So there's a little bit of extra complexity with CQRS, but the model works well for many different kinds of use cases. A good example of this working well is in New York Times website. This application stores every single edit, image, article and byline from the newspaper, right back to 1851. All this data is stored in a single Kafka topic. The raw data is stored as events, which are summarized into tables to be delivered on the website. This use case works well because the CMS or the content originates doesn't need to be immediately consistent with the serving layer. So being eventually consistent works well. But event sourcing isn't the only approach we can take to provide event level storage as our system of record. Change data capture provides most of the same benefits without changing the underlying data model. In this figure, we have a mutable table in the database representing the shopping cart. A CDC connector pulls events from this table as rows are added or changed. These are pushed into a Kafka topic where they can be consumed by other systems. Because CDC records changes at an event level, we get all the evidentiary benefits of event sourcing. Sadly, unlike event sourcing the mechanism is not replaceable. However, the outbox pattern can be used as a means to rectify this. In the outbox pattern, there are two tables; a regular table that we mutate and an events table that is append only. A trigger is used to append entries to the event table as rights and updates are made to the original table. If the database doesn't support triggers, then the updating application needs to write to both tables in the same transaction. The events table is then CDC into Kafka. The advantage of this approach over the regular CDC approach is that you can easily replay events directly from the database as they are stored there in full.

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.