Principal Customer Success Technical Architect (Presenter)
Lead Technologist, Office of the CTO (Author)
In the previous module, you learned how event sourcing differs from traditional data management in that the source of truth is an immutable event log rather than a mutable table. But why would you consider implementing event sourcing in your system? There are three primary benefits of event sourcing: it's evidentiary, it's recoverable, and it's insightful.
Unlike a database table where rows are updated with new values, events simply accumulate in an event log, providing the perfect evidentiary basis for a system. This is similar to the way that accountants perform double-entry bookkeeping, a method where no numbers are changed, ever. Instead, entries are always appended to the ledger. (You may have heard the old adage, "Accountants don't use erasers.") Accountants work this way because it's evidentiary: If a calculation goes wrong for whatever reason, they can always go back and figure out why.
Because it is append-only, event sourcing works the same way. You can look back in time at the event log and figure out what really happened or why things went wrong. This is a huge advantage when trying to figure out why a problem occurred or why a result has incorrect figures.
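To make the append-only idea concrete, here is a minimal sketch in Python. The event names (`Deposited`, `Withdrawn`) and the in-memory list standing in for the log are illustrative assumptions, not part of any particular framework: the point is that entries only accumulate, and the current state is derived by folding over the full history.

```python
from dataclasses import dataclass

# Illustrative event types for a bank account (hypothetical names).
@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

# The event log only ever grows; existing entries are never modified.
log = []
log.append(Deposited(100))
log.append(Withdrawn(30))
log.append(Deposited(50))

def balance(events):
    """Derive the current state by replaying the entire history."""
    total = 0
    for e in events:
        if isinstance(e, Deposited):
            total += e.amount
        elif isinstance(e, Withdrawn):
            total -= e.amount
    return total

print(balance(log))  # 120, and the log itself explains how we got there
```

Unlike a table row holding only the final value `120`, the log preserves every step, so if the balance ever looks wrong you can inspect exactly which events produced it.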
The second advantage of event sourcing is recovery through replayability, which is particularly important for data systems. Implementing a fix for a standard bug, like a formatting failure on a web application form, is generally a straightforward process: You change the code to fix the bug, and you ship. But data-related problems are often not so easy. If your service performs a computation such as calculating interest on an account and there is a bug in the computation, fixing the software likely isn't enough. There will be a significant number of accounts whose data has been corrupted as a result of the bug. Fortunately, with an event-based model, the problem is simple to fix: First fix the bug, then rewind to a point before the bug surfaced, and replay the old events. Both the software and its resulting data are repaired in one go.
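A small sketch of that fix-and-replay cycle, under assumed details: the tuple-based event shape, the 5% rate, and the deliberate bug (an extra factor of 10 in the interest calculation) are all hypothetical. Because the events survive the bug, rebuilding state with corrected logic repairs every affected account at once.

```python
# Events recorded while the buggy code was live (illustrative shape).
events = [
    ("alice", "deposited", 1000),
    ("bob", "deposited", 2000),
    ("alice", "interest_period_ended", None),
    ("bob", "interest_period_ended", None),
]

RATE = 0.05

def buggy_interest(balance):
    return balance * RATE * 10   # bug: extra factor of 10

def fixed_interest(balance):
    return balance * RATE        # corrected computation

def replay(events, interest_fn):
    """Rebuild all account balances from scratch with the given logic."""
    balances = {}
    for account, kind, amount in events:
        if kind == "deposited":
            balances[account] = balances.get(account, 0) + amount
        elif kind == "interest_period_ended":
            balances[account] += interest_fn(balances[account])
    return balances

print(replay(events, buggy_interest))  # corrupted: {'alice': 1500.0, 'bob': 3000.0}
print(replay(events, fixed_interest))  # repaired:  {'alice': 1050.0, 'bob': 2100.0}
```

With a mutable table, the corrupted balances would be all that remained, and repairing them would mean writing a one-off migration. Here, deploying `fixed_interest` and replaying the log is the migration.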
The final advantage of event sourcing comes as a result of its collecting detailed, event-level data: This data can be put to great use in analytics systems, whether for machine learning or for other types of analysis. Returning to the e-commerce example from the previous module, using events to represent the cart gives you an accurate, truthful record of the user's entire journey. This lets you solve useful problems that would be difficult to address otherwise. For example, you can use the data to figure out why people aren't buying much in your shop at a particular time or within a given category. This is in stark contrast to what a CRUD data model is capable of: simply representing the end state.
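As a sketch of the kind of question only event-level data can answer, consider counting which categories shoppers remove items from before checkout. The event dictionaries below are hypothetical; a CRUD model storing only the final cart would have no trace of the removed headphones at all.

```python
from collections import Counter

# Illustrative cart events; only the final cart would survive under CRUD.
cart_events = [
    {"action": "added",   "item": "headphones", "category": "electronics"},
    {"action": "added",   "item": "keyboard",   "category": "electronics"},
    {"action": "removed", "item": "headphones", "category": "electronics"},
    {"action": "added",   "item": "novel",      "category": "books"},
    {"action": "checked_out"},
]

# Which categories do users abandon? Answerable only because removals
# were recorded as events rather than overwritten.
removals = Counter(
    e["category"] for e in cart_events if e.get("action") == "removed"
)
print(removals.most_common())  # [('electronics', 1)]
```

The same log can feed many analyses after the fact, since the raw behavior, not just its end state, has been preserved.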
In module two, we discussed the basics of event sourcing. But you might wonder, "We've been building systems with traditional databases for decades. Why should we change?" Storing data as events rather than in mutable tables has three main advantages, which we're going to cover in this module.

The first is that events are evidentiary. So what does that mean, exactly? One reason is that events are immutable. They never change. So unlike a database table, where we update different rows with new values, events simply accumulate in an event log. We can feel safe in the knowledge that they can never, ever change. This provides the perfect evidentiary basis for a system, one that allows you to look back at what happened at a previous time.

This approach is quite similar to the way accountants do double-entry bookkeeping, a method where no numbers are changed, ever. Instead, entries are always appended to the ledger. This is where the old adage "Accountants don't use erasers" comes from. Accountants do this because it is evidentiary: If a calculation goes wrong for whatever reason, they can always go back and figure out why, because all the data for the previous steps remains present. Being append-only, event sourcing is similar. We can look back in time at the event log and figure out what really happened or why things went wrong. This is a huge advantage when trying to figure out why a problem occurred or why some result has incorrect figures.

The second advantage of event sourcing is replayability, and this is of particular importance for data systems. To understand this, first consider a typical bug, say, a formatting failure in one of the input forms of a web application. This kind of bug is usually pretty easy to fix: You change the code to fix the bug, and you ship it. But data-related problems are often not so easy to fix.
If your service does a computation, for example, calculating interest on an account, and there is a bug in the computation, fixing and releasing the software likely isn't enough. There will be a whole set of accounts whose data has been corrupted as a result of the bug. But with an event-based model, the problem is simple to fix: First, fix the bug. Then rewind to a point before the bug surfaced and replay the old events. Thus both the software and the resulting data are fixed in one go.

The final advantage comes from collecting detailed, event-level data that can be fed into an analytics system, whether for machine learning or for other types of analysis. To return to the shopping cart we used as an example in module two, using events to represent the cart gives an accurate, truthful record of a user's behavior. Just like the chess game from module one, we're tracking the whole game, not just the end state. This lets us solve useful problems that would be really hard to solve otherwise, for example, figuring out why people aren't buying much in our shop at a particular time or within a given category. This sort of analysis is possible with the event-based model because the user's shopping behavior has been captured, i.e., exactly what they did: add to the shopping cart, remove from the shopping cart, et cetera. This is in stark contrast to what a CRUD data model is capable of: simply representing the end state.

So, storing data as events comes with significant benefits. But the context thus far has been simple, monolithic applications. What happens when our architectures grow larger? What happens when we incorporate event streaming as a storage medium? We'll find all this out in module four, when we discuss event sourcing with Kafka.