Principal Customer Success Technical Architect (Presenter)
Lead Technologist, Office of the CTO (Author)
You now know that event sourcing is an alternative way of storing data, using a log of events rather than a table that is updated, read, and deleted. You also know that reading data with event sourcing requires an additional step:
First you read all of your events in, then you perform a chronological reduce over them to arrive at the current state. You keep the full detail of your events, while the transformation step ensures that your application logic still gets the data in the shape it needs, i.e., a compact, table-shaped description of what's in the shopping cart.
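To make that read path concrete, here is a minimal sketch of such a chronological reduce in Java. The `CartEvent` record, its field names, and the sample history are illustrative assumptions, not types from the course:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type for illustration: an item was added to (+n) or
// removed from (-n) the cart. Field names are assumptions.
record CartEvent(String item, int quantityDelta) {}

public class CartReducer {

    // Chronological reduce: fold the full event history, in order,
    // into the compact, table-shaped current state of the cart.
    static Map<String, Integer> currentState(List<CartEvent> history) {
        Map<String, Integer> cart = new HashMap<>();
        for (CartEvent event : history) {
            cart.merge(event.item(), event.quantityDelta(), Integer::sum);
            if (cart.get(event.item()) <= 0) {
                cart.remove(event.item()); // fully removed items drop out of the state
            }
        }
        return cart;
    }

    public static void main(String[] args) {
        List<CartEvent> history = List.of(
                new CartEvent("t-shirt", 1),
                new CartEvent("hat", 1),
                new CartEvent("hat", -1)); // hat added, then removed
        System.out.println(currentState(history)); // {t-shirt=1}
    }
}
```

Note that the loop walks the entire history on every read; that cost is exactly the problem described next.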
A problem with this setup occurs, however, when the number of events becomes very large: the reduce operation takes a long time to execute and may consume more compute than you're comfortable with. This may not be an issue for a small shopping cart, but if you were to perform a chronological reduce across the entire history of a checking account, for example, the computation could take far too long for an interactive application.
The solution to this problem is Command Query Responsibility Segregation (CQRS), which performs computations when the data is written, not when it's read. This way, each computation is performed only once, no matter how many times the data is read in the future.
CQRS is by far the most common way that event sourcing is implemented in real-world applications. A CQRS system always has two sides, a write side and a read side:
On the write side (shown on the left of the diagram), you send commands or events, which are stored reliably. The read side (shown on the right of the diagram) is where you run queries to retrieve data. (If you are using Apache Kafka, it provides the segregation between the two sides.) So unlike in vanilla event sourcing, the translation between event format and table format in CQRS happens at write time, albeit asynchronously.
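As a sketch of what the write side might look like with the standard Kafka Java producer: the topic name, broker address, cart key, and JSON payload below are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WriteSide {
    public static void main(String[] args) {
        // Standard producer configuration; the broker address is an assumption.
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // events must be stored reliably

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The write side only appends events; no table is updated here.
            // Keying by cart ID keeps each cart's events in order on one partition.
            producer.send(new ProducerRecord<>("shopping-cart-events",
                    "cart-42", "{\"item\":\"hat\",\"quantityDelta\":1}"));
        }
    }
}
```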
Separating reads from writes has some notable advantages: You get the benefits of event-level storage, but also much higher performance, since the write and read layers are decoupled. Of course, there is also a tradeoff: the system becomes eventually consistent, so a read may not be possible immediately after an event is written. This must be taken into account when designing a system.
To implement CQRS with Kafka, you write directly to a topic, creating a log of events. (This is faster than writing to a database, where indexes must be updated.) The events are then pushed from Kafka into a view on the read side, from where they can be queried; the chronological reduce is performed before the data is inserted into the view. If you have multiple use cases with different read requirements, it's common to create multiple views for them, all from the same event log. To make sure this is possible, events should be stored in a Kafka topic with infinite retention, so they're available to the different use cases should the read side need to be rebuilt, or a new view need to be created from scratch.
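One way to sketch the read side is with a Kafka Streams aggregation that folds each arriving event into a queryable state store. This is an illustrative sketch, not the course's reference implementation; the topic and store names are assumptions, and the source topic would be created with `retention.ms=-1` so the view can always be rebuilt from the full log.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ReadSideView {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cart-view-builder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // The chronological reduce runs at write time: each event appended to
        // the log is folded into the per-cart view as soon as it arrives.
        // For brevity this aggregation just concatenates raw events; a real
        // view would parse each event and update item quantities.
        builder.stream("shopping-cart-events",
                        Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .aggregate(
                       () -> "",                                    // the empty cart
                       (cartId, event, view) -> view + event + ";", // fold in the new event
                       Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("cart-view")
                               .withValueSerde(Serdes.String()));   // queryable state store

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Because the view is derived entirely from the topic, adding a second view for a new use case is just another aggregation over the same event log.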
To actually implement CQRS, you can use Kafka with Kafka Connect and a database of your choice, or you can simply use Kafka with ksqlDB.
ksqlDB provides both the required streaming computation (the chronological reduce) and the materialization of the view to read from, all within one tidy package. As mentioned, the reduce is done at write time, not at read time, so as you write a new event, its effect is immediately incorporated into the view. This means that the contents of the shopping cart are calculated once and updated every time you add or remove an item, much as they would be with a regular database. However, all of the events are held in the event log in Kafka, preserving all of their detail.
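A sketch of this approach: the two ksqlDB statements below declare the event log as a stream and materialize the chronological reduce as a table, submitted here through the ksqlDB Java client (`io.confluent.ksql:ksqldb-api-client`). The stream, table, and column names, and the server address, are illustrative assumptions.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class KsqlDbCqrs {
    public static void main(String[] args) throws Exception {
        // Connect to a ksqlDB server; host and port assume a local setup.
        ClientOptions options = ClientOptions.create()
                .setHost("localhost")
                .setPort(8088);
        Client client = Client.create(options);

        // Declare the event log as a stream (the write side)...
        client.executeStatement(
                "CREATE STREAM cart_events "
              + "(cart_id VARCHAR KEY, item VARCHAR, quantity_delta INT) "
              + "WITH (KAFKA_TOPIC='shopping-cart-events', VALUE_FORMAT='JSON');").get();

        // ...and materialize the chronological reduce as a table (the read side).
        // ksqlDB keeps this view up to date as each new event is written.
        client.executeStatement(
                "CREATE TABLE cart_view AS "
              + "SELECT cart_id, SUM(quantity_delta) AS items_in_cart "
              + "FROM cart_events GROUP BY cart_id EMIT CHANGES;").get();

        client.close();
    }
}
```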
Now that you understand the basics of event sourcing and CQRS, get some practical experience with events on Confluent Cloud by completing the exercise in the next section.
As we covered in modules one and two, event sourcing is an alternative means of storing data, using a log of events rather than a table that is updated, read, and deleted, the approach you're no doubt familiar with from building systems that use databases in a traditional way. In event sourcing, you create a log and append events to it as they occur, building up your event log. But with event sourcing, reading data has an additional step. First you read the data in, points one and two on the diagram. Then you combine the events together at runtime, a kind of chronological reduce that gets you to the current state, point three on the diagram. The key point is that your data is physically stored in event format, and you get all the benefits that come with that. The transformation step ensures your application logic also gets the data in the shape it needs, i.e., the tidy, table-shaped description of what's in your shopping cart.

One problem with this approach occurs when the number of events gets very large. Lots of events will make the reduce operation take a long time to execute, and use more clock cycles in the process than you might be comfortable with. This probably isn't an issue for a shopping cart, but consider other use cases that incorporate a large number of events. Displaying the balance of a checking account, for example, is a chronological reduce over your entire payment history, and that can take a while to compute. Not great if you're clicking buttons and expecting things to happen then and there.

A solution to this problem is to do the computation when the data is written, not at the time it is read. This way, each computation is done just once, no matter how many times the value is read in the future. This technique is called CQRS, Command Query Responsibility Segregation, and it is by far the most common way in which event sourcing is implemented in real-world applications.

Let's look at an example. CQRS implementations always have two sides, a write side and a read side. When we implement CQRS with Kafka, it's Kafka that provides the segregation between the two sides. On the write side, on the left of the figure, you send commands or events, which are stored reliably. The read side, over to the right, is where you run queries to get data back. Unlike in the plain event sourcing case, the translation between event format and table format now happens at write time, albeit asynchronously. In a database, both of these concerns are conflated into a single layer. CQRS segregates the two from one another, hence the name: Command Query Responsibility Segregation.

Segregating reads from writes has some notable advantages. We get the benefits of event-level storage, as we discussed in module three. We also get much higher performance, as the write layer and read layer are decoupled. Of course, this is a trade-off, and there is a price to be paid for both fast reads and writes: the system becomes eventually consistent, so a read might not be possible immediately after an event was written. This is something that must be taken into account when you design your system.

To implement CQRS with Kafka, we write directly to a topic, creating a log of events. This is a fast mechanism for storing data compared to writing to a database, which has indexes that must be updated, a time-consuming operation. The events are then pushed from Kafka into a view on the read side, from where they can be queried. The chronological reduce we discussed previously is performed before the data is inserted into the view.
This process makes the event log the system of record. The read-optimized view is created specifically to suit the use case we have. If we have multiple use cases with different read requirements, it's common to create multiple views for these different use cases, all from the same event log. To ensure this is possible, events should be stored in an infinite-retention Kafka topic so they're available to these different use cases should the read side need to be rebuilt or a new view created from scratch.

To physically implement CQRS, you can use Kafka with Kafka Connect and a database of your choice, or you can simply use Kafka with ksqlDB. Here, ksqlDB performs both the required streaming computation, the chronological reduce, as well as materializing the view we read from, all within one tidy package. Note that in contrast to the original event sourcing example, the reduce computation is done at write time, not read time. So as we write a new event, the effect of that event is immediately incorporated into the view. This means the contents of the shopping cart are calculated once and updated every time you add, remove, or delete an item from the shopping cart, much like you might use a regular database. However, unlike a database, all the events are held in the event log in Kafka, preserving all the wonderful, amazing event-level detail.

Now that you've got all the basics, it's time to try this out, and you can do it in just a few minutes using Confluent Cloud. Just follow the steps in the next module and you'll create your first event sourcing application.