The third dimension to consider when designing events and event streams is the relationship between your event definitions and the event streams themselves. One of the most common questions we receive is, “Is it okay to put multiple event types in one stream? Or should we publish each event type to its own stream?”
This module explores the factors that contribute to answering these questions, and offers a set of recommendations that should help you find the best answer for your own use cases.
The consumer’s use case should be a top consideration when deciding how to structure your event streams. Event streams are replayable sources of data that are written only once, but that can be read many times by many different consumers.
We want to make it as easy as possible for consumers to use the data according to their own needs.
As we saw in Dimension 1, deltas work very well for general change alerting. Consumer applications can respond to the delta events that another application exposes from inside its boundary. Splitting up events so that there is only one type per stream provides high granularity, and permits consumer applications to subscribe to only the deltas they care about.
But what if a single application needs to read several deltas, and the ordering between events is very important? Following a one-event-type-per-stream strategy introduces the risk that events may be read and processed out of order, giving inconsistent sequencing results. While both Kafka Streams and ksqlDB contain logic to process events in both ascending timestamp and offset order, it is merely a best-effort attempt. Out-of-order processing may still occur due to intermittent failure modes such as network and application failures.
But what do you do if you need something with stronger guarantees? A precise and strict ordering of events may be a significant factor for your business use case.
In this case, you may be better off putting all of your events into a single event stream, so that your consumer receives them in the same order as they are written. You will also need a consistent partitioning strategy to ensure that all events with the same key go to the same partition, as Kafka only guarantees order on a per-partition basis.
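As a minimal sketch of such a partitioning strategy, assuming a hypothetical cart_events stream with a cart_id column, a ksqlDB query can repartition an existing stream so that every event for a given cart lands on the same partition:

```sql
-- Repartition cart events by cart_id so that all events for the same cart
-- go to the same partition and keep their relative order.
-- (cart_events, cart_id, and the topic name are illustrative.)
CREATE STREAM cart_events_by_cart
  WITH (KAFKA_TOPIC = 'cart_events_by_cart', PARTITIONS = 6) AS
  SELECT *
  FROM cart_events
  PARTITION BY cart_id;
```

Any producer writing directly to the topic would need to follow the same convention, for example by always using cart_id as the record key.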
Note that this technique is not about reducing the number of topics you’re using—topics are relatively cheap, and you should choose to build your topics based on the data they’re carrying and the purposes they’re meant to serve—not to simply cut down on topic count. Apache Kafka is perfectly capable of handling thousands of topics without any problem.
Putting related event types into a single topic partition provides a strict incremental order for consumer processing, but it does require that all events are written by a single producer, since that producer needs strict control over the ordering of events.
In this example, we have merged all of the add, remove, and discount-code events for the shopping cart into a single partition of a single event stream.
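As a rough sketch of how such a combined stream might be declared in ksqlDB (the stream, topic, and column names here are assumptions for illustration), a discriminator column identifies the delta type, and fields not relevant to a given type are simply left null:

```sql
-- One stream carrying all cart delta types on a single-partition topic.
-- event_type identifies the delta; unused fields are NULL for a given type.
CREATE STREAM cart_events (
  cart_id       STRING KEY,
  event_type    STRING,   -- 'ITEM_ADDED', 'ITEM_REMOVED', or 'DISCOUNT_APPLIED'
  item_id       STRING,   -- populated for add and remove events
  quantity      INT,      -- populated for add and remove events
  discount_code STRING    -- populated for discount events
) WITH (
  KAFKA_TOPIC  = 'cart_events',
  PARTITIONS   = 1,
  VALUE_FORMAT = 'JSON'
);
```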
Zooming back out, you can see a single consumer coupled to this stream of events. It must be able to understand and interpret each of the event types in the stream. It's important not to turn your topic into a dumping ground for multiple event types and expect your consumers to simply figure it out. Rather, the consumer must know how to process each delta type, and any new types or changes to existing types would need to be negotiated between the application owners.
You can also use a stream processor like ksqlDB to split the single cart events stream up into an event stream per delta, writing each event to a new topic. Consumers can choose to subscribe to these purpose-built delta streams, or they can subscribe to the original stream and simply filter out events they do not care about.
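For example, a sketch of that split using the hypothetical cart_events stream above:

```sql
-- Derive a purpose-built stream containing only the item-added deltas.
CREATE STREAM items_added AS
  SELECT cart_id, item_id, quantity
  FROM cart_events
  WHERE event_type = 'ITEM_ADDED';

-- Similar statements would produce items_removed and discounts_applied streams.
```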
A word of caution, however. This pattern can result in a very strong coupling between the producer and the consumer service. Usually it is only suitable for applications that are intended to be strongly coupled, such as a pair of systems using event sourcing, and not for general purpose usage. You should also ask yourself if these two applications merit separation, or if they should be redesigned into a single application.
Fact events present a much better option for transferring state, do not require the consumer to interpret a sequence of events, and offer a much looser coupling option. In this case, only a single event type is used per event stream - there is no mixing of facts from various streams.
Keeping only one fact type per stream makes it much easier to transfer read-only state to any application that needs access to it. Streams of facts effectively act as data building blocks for you to compose purpose-built applications and services for solving your business problems.
The convention of one type of fact per stream shows up again when you look into the tools you can build your applications with—like Kafka Streams or ksqlDB.
In this example, a ksqlDB application materializes the item facts into a table. The query specifies the table schema, the Kafka topic source, the key column, and the key and value schema format.
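The exact query isn't reproduced here, but a minimal sketch of such a table declaration might look like the following, where the topic, columns, and formats are illustrative assumptions:

```sql
-- Register the item facts topic as a table keyed by item id.
CREATE TABLE items (
  id    STRING PRIMARY KEY,
  name  STRING,
  price DECIMAL(10, 2)
) WITH (
  KAFKA_TOPIC  = 'item.facts',  -- source topic of item fact events
  KEY_FORMAT   = 'KAFKA',       -- key schema format
  VALUE_FORMAT = 'AVRO'         -- value schema format
);
```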
ksqlDB enforces a strict schema definition and will discard incoming events that do not adhere to it, much as a relational database will throw an exception if you try to insert data that doesn't meet the table schema requirements.
You can leverage ksqlDB’s join functionality when consuming multiple types of facts from different streams, selecting only the fields you need for your own business logic and discarding the rest. In this example, the ksqlDB application consumes from both inventory and item facts, and selects just the id, price, name, and stock, but only keeps records where there is at least one item in stock. The data is filtered and joined together, then emitted to the in-stock items facts stream, which can be used by any application that needs it.
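A sketch of that join, reusing the hypothetical items table above and assuming a similar inventory facts table keyed by item id:

```sql
-- Inventory facts, keyed by item id.
CREATE TABLE inventory (
  item_id STRING PRIMARY KEY,
  stock   INT
) WITH (
  KAFKA_TOPIC  = 'inventory.facts',
  VALUE_FORMAT = 'AVRO'
);

-- Join item and inventory facts, keep only in-stock items,
-- and emit the result for any downstream application to use.
CREATE TABLE in_stock_items AS
  SELECT items.id AS id,
         items.name,
         items.price,
         inventory.stock
  FROM items
  JOIN inventory ON items.id = inventory.item_id
  WHERE inventory.stock > 0;
```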
When recording an event, it’s important to keep everything that happened in a single detailed event.
Consider an order that consists of both a cart entity and a user entity. When creating the order event, we insert all of the cart information as well as all of the user information as it was at that point in time. We record the event as a single atomic message and do not split it up into multiple events across several other topics. This lets us maintain an accurate representation of what happened, while also giving our consumers the freedom to select only the data they really want. You can always split up the compound event later, but it's much harder to reconstruct the original event if you split it up too soon.
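For illustration only (the names and nesting here are assumptions, not the course's exact schema), such a compound order event might be declared like this:

```sql
-- One atomic order event: the full cart contents and the user's details
-- exactly as they were when the order was placed.
CREATE STREAM order_events (
  order_id STRING KEY,
  event_id STRING,   -- unique ID for this event, used for tracing (see below)
  cart STRUCT<
    cart_id STRING,
    items   ARRAY<STRUCT<item_id STRING, quantity INT, price DECIMAL(10, 2)>>
  >,
  customer STRUCT<
    user_id          STRING,
    name             STRING,
    shipping_address STRING
  >
) WITH (
  KAFKA_TOPIC  = 'order_events',
  VALUE_FORMAT = 'AVRO'
);
```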
A best practice is to give the initial event a unique ID, and then propagate it down to any derivative events. This lets you trace a derived event back to where it originated. We will cover event IDs in more detail in the best practices module.
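As a sketch of that propagation, a derived event built from the hypothetical order_events stream above simply carries the original event_id forward:

```sql
-- Derive a shipping-request event from the order event, keeping the
-- originating event_id so the derivative can be traced back to its source.
CREATE STREAM shipping_requests AS
  SELECT event_id,
         order_id,
         customer->shipping_address AS shipping_address
  FROM order_events;
```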
Consumers can also compose applications by selecting the fact streams they need and combining them with selected deltas.
This approach is best served by a single type per event stream, as it allows for easy mixing of data according to each consumer's needs.
Single streams of single delta types make it easy for applications to respond to specific edge conditions, but they remain responsible for building up their own state and applying their own business logic.
You can also put multiple event types in the same stream if you need strict ordering between them.
And finally, use a single event type for fact streams. Your consumers can mix, match, and blend the fact streams they need for their own use cases, with the support of stream processing tools such as Kafka Streams and ksqlDB.