Principal Customer Success Technical Architect (Presenter)
Lead Technologist, Office of the CTO (Author)
In the previous modules, you’ve learned about the significant advantages of event sourcing systems, including their evidentiary qualities, the way in which their behavioral views benefit analytics, their fast reads and writes, etc. However, you should also be aware of their primary downside: for several reasons, event sourcing systems are harder to implement than state-based methods.
Traditional systems often have just an application and a database. But in a CQRS system, there is at least one more moving part—the Apache Kafka topic that separates the write side of the application from the read side:
A second complexity stems from the fact that as applications evolve over time, their data models change. This leads to events with many different versions of the same schema being stored inside a single topic. (This is similar to what happens in a schemaless database such as Apache Cassandra or MongoDB.) As the number of schemas increases, you need to add different parsing code to allow the applications to understand each schema. This is a perfectly valid strategy for building software, but it nonetheless adds complexity to the system.
Finally, as mentioned earlier, any CQRS system is only eventually consistent, meaning that you can't immediately read your own writes. So basically, there is some extra complexity with CQRS, but the model works well for many different kinds of use cases.
A good example of a CQRS use case is the New York Times website: each edit, image, article, and byline from the newspaper, all the way back to 1851, is stored in a single Kafka topic. The raw data is stored as events that are then summarized into tables to be delivered on the website. CQRS works well here, because the CMS where the content originates doesn't need to be immediately consistent with the serving layer—thus, eventual consistency is satisfactory.
In fact, event sourcing isn't the only way to provide event-level storage as a system of record. Change Data Capture (CDC) provides most of its benefits without altering the underlying data model.
In the figure above, there is a mutable database table representing a shopping cart. A CDC connector pulls from the table as rows are added or changed. These are pushed into a Kafka topic, from which they can be consumed by other systems. Because CDC records changes at an event level, we get all of the evidentiary benefits of event sourcing. The main problem with a CDC setup is that it is not replayable like an event sourcing solution. However, this can be rectified with the outbox pattern.
The outbox pattern extends replayability to a CDC solution. It features a regular table that you mutate, as well as an events table that is append-only. A trigger appends entries to the events table when inserts or updates are made to the original table. (If the database doesn't support triggers, then the updating application needs to write to both tables in the same transaction.) The events table is then CDC'ed into Kafka. The advantage of this approach over the regular CDC approach is that you can easily replay events directly from the database, because they are stored there in full.
This provides an alternative to storing everything in Kafka, as the New York Times does in their CQRS implementation. As we’ll see in the next section, storing data in Kafka rather than an outbox table works better where there are many systems that need access to the stored data. The outbox pattern remains a good compromise though for simple use cases that are not performance bound.
We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.