What do distributed asynchronous computing, event-driven microservices, data in motion, and the modern data flow all have in common? They start with the event: a single piece of data that describes, as a snapshot in time, something important that happened.
But what is an event? What goes inside of it? How do you choose what to include, and what to avoid?
Properly designing your events and event streams is essential for any event-driven architecture. Precisely how you design and implement them will significantly affect not only what you can do today, but what you can do tomorrow. Unfortunately, many learning materials tend to gloss over event design, either assuming that you know how to do it or simply ignoring it altogether.
But not us, and not here.
In this course, we’re going to put events and event streams front and center. We’re going to look at the dimensions of event and event stream design and how to apply them to real-world problems. But dimensions and theory are nothing without best practices, so we’ll also take a look at those to help you steer clear of pitfalls and set you up for success.
Let’s start with a bit of context to make sure we’re on the same page. There are two important questions we need to ask ourselves: how are events used in practice, and what exactly is an event?
A common first step many businesses take is to connect their existing systems together—reading data from a source, and writing it to a sink. Connectors reduce the burden of writing custom business logic to get data into Kafka topics, where it can then circulate to the systems that need it most.
Connectors automatically convert extracted data into well-defined events that typically mirror the source schema. Consumers rely on these events for dual purposes: materializing the source’s data into their own systems, and reacting to individual changes as they occur.
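To make this concrete, here is a minimal sketch of a source connector configuration, assuming Kafka Connect’s JDBC source connector; the connection URL, table, and topic prefix are hypothetical, and in practice the configuration would typically be submitted as JSON to the Connect REST API.

```java
import java.util.Map;

public class JdbcSourceConnectorConfigSketch {
    public static void main(String[] args) {
        // Property keys follow Kafka Connect's JDBC source connector;
        // the connection URL, table, and topic prefix are hypothetical.
        Map<String, String> config = Map.of(
            "connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url", "jdbc:postgresql://db.example.com:5432/shop",
            "table.whitelist", "orders",            // source table to capture
            "mode", "incrementing",                 // detect new rows via an increasing column
            "incrementing.column.name", "order_id",
            "topic.prefix", "db-"                   // rows are written to the "db-orders" topic
        );
        config.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```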
While connectors are a common first use case, native event production by event-driven applications is a close second.
For example, an application may publish facts about business changes to its own event stream: details about a sale, incoming inventory, or a flight booking.
Similarly, a user's behavior may be recorded in a stream of events as they navigate through a website—products they’ve looked at and flight options they’ve clicked on.
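As a rough sketch of what producing such an event can look like with the Java Kafka client, consider the following; the topic name, key, and JSON payload are hypothetical, and a real application would typically use a schema-backed serializer rather than raw strings.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FlightBookedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A past-tense fact about something that happened: a flight was booked.
            // Keying by booking ID keeps all events for one booking in the same partition.
            String key = "booking-1234";
            String value = "{\"bookingId\":\"booking-1234\",\"flightId\":\"UA-328\",\"passengerId\":\"p-42\"}";
            producer.send(new ProducerRecord<>("flight_bookings", key, value));
        } // closing the producer flushes any buffered records
    }
}
```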
Event streams can also be natively consumed by whichever consumer services need the events. Stream processing frameworks like ksqlDB and Kafka Streams are a great choice for building event-driven consumer applications.
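For instance, a minimal Kafka Streams application that consumes an event stream might look like the following sketch; the application ID, topic name, and processing logic are illustrative.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class FlightBookingConsumerApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "flight-booking-consumer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // React to each event as it arrives on the stream.
        builder.stream("flight_bookings")
               .foreach((key, value) -> System.out.println("Processing booking " + key));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```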
First and foremost, events provide a record of something that has happened in the business. These business occurrences are modeled as individual events, and record all the important business details about what happened and why.
Events are named using past-tense terminology. For example, booking a flight may result in a flight_booked event, while the completion of an e-commerce order may result in an order_shipped event.
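One simple way to model such an event is as an immutable value type; in this sketch the field names are hypothetical.

```java
import java.time.Instant;

// A past-tense fact: the flight has been booked. Field names are hypothetical.
public record FlightBooked(
    String bookingId,   // unique identifier of the booking
    String flightId,    // which flight was booked
    String passengerId, // who booked it
    Instant bookedAt    // when the booking occurred
) {}
```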
Once an event is recorded, it is written into an event stream.
A log of events in a stream can be used to construct a detailed picture of the system over time, rather than just as a snapshot of the present.
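Here is a minimal sketch of that idea, with hypothetical FlightBooked and FlightCancelled events: replaying the log in order rebuilds the current state, and replaying only a prefix of the log reconstructs the state as of any earlier point in time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BookingStateRebuilder {
    // Hypothetical event types for illustration.
    sealed interface BookingEvent permits FlightBooked, FlightCancelled {}
    record FlightBooked(String bookingId) implements BookingEvent {}
    record FlightCancelled(String bookingId) implements BookingEvent {}

    // Replaying the full log yields the current state; replaying only the
    // events up to some earlier offset yields the state at that point in time.
    static Map<String, String> replay(List<BookingEvent> log) {
        Map<String, String> statusByBooking = new HashMap<>();
        for (BookingEvent event : log) {
            if (event instanceof FlightBooked booked) {
                statusByBooking.put(booked.bookingId(), "BOOKED");
            } else if (event instanceof FlightCancelled cancelled) {
                statusByBooking.put(cancelled.bookingId(), "CANCELLED");
            }
        }
        return statusByBooking;
    }

    public static void main(String[] args) {
        List<BookingEvent> log = List.of(
            new FlightBooked("booking-1"),
            new FlightBooked("booking-2"),
            new FlightCancelled("booking-1"));
        System.out.println(replay(log)); // booking-1 ends CANCELLED, booking-2 stays BOOKED
    }
}
```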
But what do you put into an event?
Some of the initial questions you need to ask yourself are: Who is the intended user of the data? Is this an event made for your own internal usage? Or are you looking to share it across your boundary with others?
It’s useful to start thinking about the boundaries of your systems. Any given service has its own service boundary—a division between where the internal world transitions to the external world, where encapsulation of data models and business logic give way to APIs, event streams, and remote procedure calls.
In the world of domain-driven design, the boundary between the internal and external world is known as a “bounded context.”
Let’s take a closer look at what the internal world can look like in practice.
Here are two examples of possible “internal worlds.”
On the left is a typical relational-database-backed system: data is stored in database tables, modeled according to the application’s needs and use cases. This data is intended to power the regular application operations.
On the right is an application powered by event sourcing: the internal event streams record the modifications to the domain that, when merged together, form the current state of the system.
Data on the inside is private, and is meant specifically for use inside the system. It is modeled according to the needs of its service. It is not meant for general use by other systems and teams.
Now, let’s take a look at the external world and, most importantly, the data that crosses the boundary from the internal world to the external world.
Data on the outside is composed of sources that have deliberately been made available for others to use and consume. In this case, as events served through event streams.
In contrast to internal world data, data on the outside is purpose-built, as a first-class citizen, to share with other services and teams.
There are several questions that you need to ask yourself when designing data on the outside.
One way to think about data on the outside is to picture the event as a data transfer object. A data transfer object is an encapsulated object that contains data purpose-built for communication to a process or system on the outside.
Data on the inside tends to be more fluid and can change more frequently depending on business requirements. Data on the outside ideally won't change too often because that creates more volatility in the system and may require more extensive changes by downstream consumers.
One challenge of designing events is determining what that data transfer object should contain. Another challenge is creating the code and processes that take data from the inside, and convert it into a format that is suitable for use on the outside.
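A minimal sketch of that conversion is shown below, with hypothetical types on both sides: the internal model carries private operational fields, while the outward-facing event exposes only the deliberately shared subset.

```java
import java.time.Instant;

public class OrderEventMapper {
    // Data on the inside: shaped for the service's own needs, including
    // operational fields that should never leave the boundary. All hypothetical.
    record OrderRow(String orderId, String customerId, String warehouseCode,
                    String creditCardToken, Instant shippedAt) {}

    // Data on the outside: the event acts as a data transfer object,
    // exposing only the deliberately chosen, stable subset.
    record OrderShipped(String orderId, String customerId, Instant shippedAt) {}

    static OrderShipped toOutsideEvent(OrderRow row) {
        // Deliberately omit warehouseCode and creditCardToken:
        // internal details stay on the inside.
        return new OrderShipped(row.orderId(), row.customerId(), row.shippedAt());
    }
}
```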
Precisely how you model data on the outside can be a bit complicated, but we get into the thick of it in the upcoming modules.
To start, we look at four important dimensions to consider when you design your events.
We also go through some hands-on exercises focused on showcasing how these dimensions work in practice.
Following the four dimensions, we look at a series of modules focused on event design best practices, including everything from schemas, to naming, to IDs, and more.
This course introduces you to designing events and event streams through hands-on exercises using Confluent Cloud and ksqlDB. If you haven’t already signed up for Confluent Cloud, sign up now so when your first exercise asks you to log in, you are ready to do so.
Once you’re signed in, create a new cluster: review your selections, give your cluster a name, and click Launch cluster. Provisioning might take a few minutes.
While you’re waiting for your cluster to be provisioned, be sure to add the promo code EVENTDESIGN101 to get an additional $25 of free usage. From the menu in the top-right corner, choose Administration | Billing & Payments, then click on the Payment details tab. From there, click on the +Promo code link and enter the code.
You’re now ready to complete the upcoming exercises as well as take advantage of all that Confluent Cloud has to offer!