Hello, you are listening to the Streaming Audio podcast. We spend a lot of time on this podcast talking about the design of event systems, but I think it's high time we had an episode talking about how to design the events themselves. Getting that data model right, it's a core part of building a system that does its job elegantly, efficiently, well. Joining us today, we have Adam Bellemare.
He's the author of two books about event systems design and a new course specifically about designing events and event streams. We're going to go through his four principles, four key factors you want to consider when you're deciding what events to write and what events to publish. How should they be structured? How should you design them? As ever, this podcast is brought to you by Confluent Developer.
More about that at the end, but for now, I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it. Joining us today, returning guest, Adam Bellemare. How are you doing, Adam?
I'm doing well, Kris. Thanks for having me on today.
I'm glad you're here from sunniest Canada.
It's actually incredibly foggy today, believe it or not. It's actually pretty rare to have that here, so it's a bit of a novelty.
I imagine either blinding sunshine or ice in your part of the world.
It's like you've been here.
That's for our different podcast, Streaming Weather. Let's get on track. You are a staff technologist here at Confluent.
Yes.
You have just released a new training course, free training course on Confluent Developer, which is designing events and event streams. Tell us the background to that, why did you want to do this course?
I want to break courses down, break content down, into a couple of different aspects. We have a lot of content about how a thing works: how transactions work, how Schema Registry works, Kafka fundamentals, Kafka 101. A lot of this is: these are tools that you have, that you can do things with, and this is how these tools work — this is how you evolve schemas, and so on.
What we were looking for was a relatively introductory course that is sort of one abstraction layer up. You can register schemas and you can commit events and transactions, but how would you go about using them more holistically, let's say, in a real-world environment? The goal of this was to talk about the relationship between creating events and using events, and how they end up in event streams.
The relationship between all of those components. Also, featuring cameos: this is when you would use transactions, this is when you'd use a schema registry, this is why the partitioned log matters, and so forth.
It's partly that thing where there's no end of tutorials about how you store an event, but what event do I actually want to store? What events are interesting?
Rightly so, a lot of the materials, not only that we've created, but a lot of materials that I personally have seen, often leave it as an exercise to the reader. They'll present an event matter-of-factly: "Here's an event." There might be two dozen fields in there, but why are they in there? Why are they not? Does that model the internal domain? Does that model the external domain?
There can be a lot of questions around that, but that's usually not the core focus of whatever the work or the presenter is speaking about.
It's like we know how to say it, but we don't know what to say. Finding your voice as an event designer.
Yes, exactly that's a good way to put it.
This course you've done, it breaks down a few concepts. The first one you've just mentioned that I wanted to get into the idea of internal versus external events.
Yes.
What's that?
One of the things with events — well, there's something I want to talk about a bit later, so I'll leave that for later. Basically, it's the idea of data on the inside and data on the outside, or internal models and external models. This isn't something that's specific to the event domain. This is true of any computer system. I have my own private data inside my database. I might expose a bit of it outside through a materialized view or an API.
With event streams, it's very much the same idea. The data on the inside can really be structured in any particular way. It could be event sourced — a set of event-sourced events. It could be a relational database, it could be a document database. That internal model is really specific to that system, however they wanted to structure and model their data. The data on the outside — given that we're on a podcast about event streams and streaming — is an event.
We publish information about what went on inside that internal model to the outside, and the selection and modeling of that translation from the internal to the external — some people call it an abstraction layer; I believe the domain-driven design term is an anti-corruption layer. Effectively, what we're doing is trying to insulate the inside and the outside so that they can both evolve independently, yet still maintain compatibility across that boundary.
What are you considering to be the boundary of inside versus outside? Is that cross department, cross company?
Well, that's-
Microservice.
Yes, yes, and yes. The reality is you can draw those dotted lines anywhere. If you plot out your whole — this is our entire technology map — where you draw those dotted lines is usually a matter of semantics. This isn't to be dismissive in any way, but very much so, your data on the outside — sorry, your data on the inside — the safest, smallest unit is probably a single system. Now, you might have multiple owners and domains within that system; think of a monolith.
Then you might actually have further dotted lines inside those monoliths. But let's just say for the sake of simplicity, it's granular, it's at a system level. Now, you may have multiple systems owned by a single team and you draw the dotted line around there, and that could all be considered the inside data. They may be sharing data between them relatively freely. Outside of that boundary, again, that is now data on the outside.
Now, if you zoom out even further, you have the company-wide view. Well, from an external company over there, that's really outside. If you draw the dotted line around the whole company, that's still data on the inside. Really, it's a bit of a hierarchy, and you would have concentric circles. To simplify things, the real question is basically: do you want the outside to have access to that particular piece of information?
That's really the acid test for things like personally identifiable information: names, phone numbers, addresses, and credit cards. That's a pretty clear-cut one where usually you say, "No, I don't want to share that."
This is definitely just private inside data.
Exactly right. For other ones, you may want to make that more widely available, sanitized, highly clean data, and that might be a better candidate for data on the outside.
It reminds me of like Conway's law where if you've got three teams, you're inevitably going to end up with three separate systems that need some way to communicate with each other.
That's exactly it. The matter of data on the inside and data on the outside does become much more of a factor as your teams grow, as your company grows, as you have independent teams — because they're independent. Again, they're all working at the same company. There are these certain common, basic pieces of data that, even if you split people up into different teams, they find that they need.
I'm an e-commerce company, I need access to item information. More than one team will need that access for various purposes. Effectively, what we would like to do with event streams — well-designed ones — is make it easy for people to get that data on the outside while maintaining that healthy boundary between the teams, or between a service's inner domain and the outer domain.
You're considering kind of events as being a primary collaboration tool?
Yes.
That design you want to get right?
Yep.
We know where we are. You broke the course down into four dimensions.
Yes.
Let's go through those and you can explain what you're talking about there. First one on your list is fact versus Delta. What's that?
Yes. I'm going to open this one up: facts are what I've defined as a complete picture of context and, effectively, state. It's very much synonymous with — if you think of a relational database row — all of the data in that row is effectively a fact event.
It's a snapshot of the latest state.
A Delta event indicates a change of a particular field, parameter, or even of context. For example — a very contrived e-commerce example — a shopping cart: here's a fact of the shopping cart, and this is everything that's in it right now. If you add something or remove something, you'll still get a new event with a complete picture, including the previous items that were already in there.
A Delta event would say, I've added something or I've removed something, but you'd have to aggregate them all together to get the final complete picture.
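As a rough illustration of that distinction — this sketch isn't from the course; the Java record names and fields are hypothetical:

```java
import java.util.List;

// Fact: a complete snapshot of the cart as it stands right now.
// Every event carries the full picture a consumer needs.
record CartFact(String cartId, List<String> itemIds, long totalCents) {}

// Deltas: each event describes only the change that occurred.
// A consumer has to aggregate them, in order, to rebuild the current state.
record ItemAddedToCart(String cartId, String itemId, long priceCents) {}
record ItemRemovedFromCart(String cartId, String itemId, long priceCents) {}
```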
I can see that distinction.
Now, one of the things I want to call out here, because I know for sure there are going to be some people listening who are like, "Wait a minute, that's not my definition of a fact." Delta — that one tends to be less controversial. The thing is, naming things is hard. I want to call this out because naming things is hard; I don't want people to get too hung up on the specific name, but rather on the concept.
As close contenders to just fact: state and noun were also in the running — "Gosh, should we call it this? Should we call it that?" There was a bit of debate back and forth, but we had to call it something. Whereas Delta is probably the most ubiquitous term, but we were also debating calling it an action — but action also connotes a command. We're like, "Let's not call it that." Also verbs: added to cart, removed from cart — some sort of transformation of a thing, and that thing being an aggregate.
Capturing the idea of what's changed.
Yes. Naming things is hard. One of them is, here's a current picture of something that happened, the results of what happened, here it is. The other one is, this is describing the change that you would then apply.
With those concepts firmly in hand, and we'll just label them that way for now.
Yes, state and Delta. Sorry, fact and Delta.
We've got the distinction now, what design guidance would you give someone looking at that distinction?
I like to take a step back here and talk again about data on the inside, data on the outside. If you're familiar with event sourcing, event sourcing is you take a series of immutable events and the events are recording the changes that you are trying to impose or add into your system, and then you build up the materialized view of it or you build up the view of it by aggregating all those together.
Delta events are very well-suited for that. Most event-sourced systems — I dare say all, though I'm not a total expert on that — are very much Delta-driven: add this to cart, remove this from cart, et cetera. The business logic to understand, process, and apply those Deltas correctly is also encompassed within the definitions of those events. It's all within the same data on the inside.
Fact events work really well when you are doing what we call event-carried state transfer, which is: I want to provide you, on the outside, with some data out there, but I don't want to have to explain it all. For a shopping cart, if it just says add or remove, you may say, "Wow, that's really easy, that's really simple, it's not that hard." The reality is most business problems end up more complicated than that.
If you just give them a fact event with everything computed, you can define everything very clearly. This is the basket, this is the checkout, current checkout price, this is what's on sale. These are any coupon codes you may have applied, but all of the complex logic about building it and aggregating it, that downstream consumer doesn't need to know. It's abstracted away from them and it's kept inside.
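A small sketch of that split, reusing the hypothetical record types above: the producer applies Deltas internally (data on the inside) and publishes an already-computed fact for event-carried state transfer (data on the outside). This illustrates the idea; it isn't code from the course.

```java
import java.util.ArrayList;
import java.util.List;

class CartAggregate {
    private final String cartId;
    private final List<String> itemIds = new ArrayList<>();
    private long totalCents = 0;

    CartAggregate(String cartId) { this.cartId = cartId; }

    // Internal business logic: only this service needs to know how to apply each Delta.
    void apply(ItemAddedToCart e)     { itemIds.add(e.itemId());    totalCents += e.priceCents(); }
    void apply(ItemRemovedFromCart e) { itemIds.remove(e.itemId()); totalCents -= e.priceCents(); }

    // The external view: a single, easy-to-consume fact for downstream consumers.
    CartFact toFact() { return new CartFact(cartId, List.copyOf(itemIds), totalCents); }
}
```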
I can see how that makes the downstream person's job a lot easier.
It makes it a lot easier, yeah. Now, of course there's nothing stopping you from exposing all of those Deltas, but you also expose the need for that downstream consumer to have to know how to handle them all properly. With great power comes great responsibility, sort of thing. Or we can just give you this well-formatted, easy-to-consume block of data.
If you flip that on its head for the writing side, you're essentially saying to the producer: we can accept a Delta. We don't want to give you the inconvenience of having to send us the whole state at once.
Absolutely, yep. You could do that for sure.
Is it fair to say accepting Deltas is a good way to make life easier for the writers and emitting facts is a good way to make life easy for the readers?
I didn't think about it that way. In some ways I would say that, that's a good one.
I'm asking you to commit to a heuristic on the fly here.
Yes, that's fair. Let me augment this. Building a fact is probably going to be a bit more complicated for the producer, but that producer is also likely going to need that complete picture of internal information for its own purposes. If your domain — Conway's law again — say this team is responsible for receiving sales and making sure they have a standardized view of data relating to sales, they're still going to compose one of those internally.
What they select to expose — there's a fairly clean transition between the internal model and the external model. They can select some stuff out, obscure some other things, cut some things, maybe denormalize it a bit, and write that data out. With Deltas, it is easier if a consumer says — you know what, let me rephrase this — "I care if they sign up for a newsletter through the checkout process. That's the only thing I care about."
You can definitely expose that data selectively, with the understanding that basically, once that downstream user starts needing to accept multiple Deltas, and those Deltas can relate to each other, then things start getting complicated. If it's just, "I only care about this, just send this out," that's usually not a big deal. Although it does lead to another problem, which is what I call hyper-specific Deltas.
I only care when this newsletter is signed up. Fair. But what if your consumer only cares about, say, "I want to send them a promotion when there's $50 worth of products in the shopping cart." $50. I don't want to have to maintain state about what's been added to or removed from the shopping cart. I also don't want to maintain — maybe there are no fact events, maybe we're not sending those — how do they know that there's $50 in that cart?
Well, it's either up to the producer to keep track of that state and then tell them — but it's only the consumer that cares about that business logic. They may decide $50 is too low. Maybe they say it should actually be $60. Now you have all these hyper-specific events that aren't broadly useful. Really, what you're trying to avoid is having one central brain that does all this thinking and just sends out these reaction signals.
Rather, you're trying to provide useful, general-purpose information, and have your consumers do the intelligent work: apply their own business logic, build their own state within their own domain. It's finding that right balance there.
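One way to picture that balance, as a hypothetical sketch (the names are invented here): the promotions consumer keeps its own threshold and applies it to the general-purpose CartFact sketched earlier, instead of asking the producer to emit a hyper-specific "cart crossed $50" Delta.

```java
class PromotionRule {
    // Owned by this consumer; it can change to $60 tomorrow without touching the producer.
    private long thresholdCents = 50_00;

    boolean shouldSendPromotion(CartFact cart) {
        return cart.totalCents() >= thresholdCents;
    }
}
```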
How can I empower people without giving them a crutch in a way?
Becoming beholden to their business logic?
Yeah.
You want to make sure there's enough autonomy that these folks — the consumers — can change their business logic as they need, when they need, how they need, and have access to data that'll help them make those decisions.
Ideally, you want the same event data that you publish to be potentially used by lots of departments. You don't want a bespoke feed for everyone — that defeats the purpose, right?
Not only for the maintainability — if you have bespoke feeds for multiple teams, are you sure they're precisely in agreement? Just a touch of background: I worked in big data for about 10 years, and that was our number one, chronically plaguing problem. You'd have data sets that were supposed to be the same, that maybe had a bit of different logic applied, a little bit of different — I don't want to say filtering, because we weren't trying to filter anything out.
Then you would compare them, and out of a billion rows there might be, say, 113 that didn't agree. This one had a little extra, that one had a little extra, and there was some overlap. [inaudible 00:21:25]
I've had that in the past with marketing department saying, "Why are the figures in this table not the same as the figures in this table?" You look back, and the answer is because when you asked for each one to be defined, you asked a slightly different question.
Stuff like that. Then there are semantic differences that are baked into some code somewhere. Less is more: if you can publish fewer event streams with data that's, what I would say, generally usable by many, it makes using it easier. It might make for a bit more work at the consumer side, but you end up with a much more consistent view.
These systems will be sourcing their data from the same source and therefore any discrepancies would be localized within their boundary as a consumer, not necessarily chained three or four or five systems upstream for their bespoke feed.
I'm already getting the feeling like I will point people to the course again because there are hands-on exercises in the course, and that will help to make some of this stuff more concrete.
I took us out of the weeds a bit there.
No, no, that's great. I think we've got the idea of fact versus Delta, the good shape of it. Let's move on to point number two, which is a classic battle in the world of data systems: denormalization versus normalization.
Normalized tables in a relational database make for lousy events, they do. The reason is that you have to, as a consumer, do a lot of joins. You've got to join stuff together — a lot of joins. Now, document-style databases are largely denormalized. That's how we would like to have our events structured, by and large. I'm thinking this is more about fact events — but is this also about Deltas?
It could really be either. I add items to the shopping cart. Now, those items — if you're using, say, a relational database style — and quite frankly, relational databases are phenomenal. They're not magic, but they're close, and they do so much important and powerful work. They're purpose-built to have a very efficient way of handling relations. It's in the name: those engines, relational database engines, will join 12 tables together without blinking an eye.
To do the same thing in event streams is much more challenging, because you are dealing with time and you're also dealing with discrete streams of events that will arrive asynchronously. You also have to specify: are you joining, say, on a streaming window, or are you joining on — in Kafka Streams we call them KTables, in ksqlDB they're just tables — where you materialize the entire stream?
If you're transferring state, like say with facts, and you have all these highly relational event streams, oftentimes you'll have to join them together if you want to make decisions based on, say, properties of the item. Let's say you're building a UI and that UI has different treatment for more expensive items — well, you're going to need to know what the prices are. Maybe you want different treatment depending on which merchant owns it.
You want different treatment for proximity, and different treatment for things that are in stock, product reviews, descriptions, related items. You now have all of these components that in a relational database would reasonably each be in their own table. You've produced this item, and the bare-bones item says, "Well, here's the item ID, and here's a bunch of relational IDs — foreign keys — that you'd go join against other tables."
You get that event and you're like, "Well, this is useless." I mention this specifically because many people, myself included, started our event journeys with change data capture, and we capture data directly from these relational databases, and it ends up in relational form. One thing I remember — at the company I worked at a while ago, called Flip, we were basically an e-commerce-style company, without getting super deep into the weeds.
I remember sitting at lunch, back when we still went to offices and saw our colleagues in person, and I overheard one of my colleagues complaining about how he has to join all of his — basically just what I discussed — all these events, all these relational things, together. I'm thinking, you know what? I have to do that too for the work I was doing, because I was trying to rebuild and restructure some of this data for some big data and streaming purposes.
He's got to do that and I've got to do that. Then a third colleague at the table — I forget the person's name now — says, "I had to do some of that too." What ended up happening is that instead of writing business logic, a lot of our teams were basically just using Kafka Streams and joining together the same six, seven, eight, nine tables — their streams, in table form — because they just needed a document-style representation of these core business entities, like items and merchants and stores.
People would literally just go find someone else's code and then copy and paste all of it into their own thing. It's like, "There's an indicator here that we're doing something wrong: if everyone needs this, and everyone's doing something very similar to the point where we're literally copying and pasting code, maybe we should do some denormalization further upstream."
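The kind of join everyone at the lunch table was copy-pasting looks roughly like this in Kafka Streams. This is a sketch only: the topic names and record types are hypothetical, and it assumes serdes are configured elsewhere.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;

record Item(String itemId, String name, long priceCents, String merchantId) {}
record Merchant(String merchantId, String merchantName) {}
record EnrichedItem(Item item, Merchant merchant) {}

public class ItemEventification {
    public static void buildTopology(StreamsBuilder builder) {
        // Each normalized source topic materialized as a table (latest value per key).
        KTable<String, Item> items = builder.table("items");
        KTable<String, Merchant> merchants = builder.table("merchants");

        // Foreign-key join: look up each item's merchant by its merchantId.
        KTable<String, EnrichedItem> enriched =
                items.join(merchants, Item::merchantId, EnrichedItem::new);

        // Publish the denormalized, document-style stream for general consumption.
        enriched.toStream().to("items-enriched");
    }
}
```

Doing this once, upstream, is the "denormalize further upstream" idea: consumers read the enriched stream instead of each re-implementing the same join.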
Would you say the right solution to that is to get the database team, if you can, to publish a denormalized outbox stream? Or to have your own aggregate stream that consumes all these changes and joins them together for you — as the team at the lunch table — and then you all consume off that?
Well, those are both reasonable answers in their own right. At Flip, we ended up doing the outbox pattern. We would do denormalization at the database table side. We had to make some decisions around handling what we call slowly changing dimensions.
Sounds very Dr. Who.
Basically — I won't get too far into the weeds on that — we would do some denormalization at the relational database side. We would pump out some of it, and "some" is the weasel word there: well, how much? What we tried to do was basically a heuristic: all these people already want this data, so what is it they actually want?
We talked to the lunch table folks and extracted the common patterns — effectively paving the roads where people were already walking. We relied on that intrinsically, but in hindsight, we could also look at it and say, "I can see why we would probably denormalize these things, given what we know about our business domain, given what we know about what people are looking for, what we're serving, and how we're using our data."
We made some relatively educated guesses — estimates — about, "We think people would want to have this data readily available to join; these dimensions very seldom change. Most store merchants don't change their name. Most store locations don't physically move." They do occasionally — a new building.
Maybe it would be okay to have a separate stream for those changes.
One of the things, for example, we wouldn't join in would be a live-updated stream of inventory, because that's changing every single time someone makes an order or purchase. If you're selling thousands of items — boom, boom, boom — that's a lot of data that's going to be pouring out.
I see. You'd avoid the stuff that hardly ever changes. You'd also avoid the stuff that changes so often.
Sorry — we would actually keep the stuff that seldom changes in there, because when it does change, that's fine: we'll update the whole set of events that relates to it. It so seldom changes that most of the time it's not going to be a factor.
I misunderstood.
No, no, no, it's fair. It's finding that sort of balance about what works or not. That's the outbox table approach. Now, the other one you mentioned is — to paraphrase — deputizing one of the lunchroom people to be like, "You're in charge of joining these together and making sure they work at the stream level." That's also a valid pattern.
That's one that I, and my colleagues, have called eventification, where you basically take streams and you make them into useful events. I can give you a stream of highly relational data that's not super easy to use, or I can make it easy to use as events. This could look something like: you build a ksqlDB application that joins the tables together, or you build a Kafka Streams one. The reality is, most of the systems that handle streaming joins — for indefinite, materialized-table-type joins — are all Java-based.
Flink, Spark, Kafka Streams, ksqlDB. The reason I mention that is there are a lot of folks who don't want to use Java, or they want to use something else. They want to use Go, they want to use Rust, they want to use Python, they want to use Pascal, whatever. This is all to say that denormalization isn't simply about making it easy to use so I don't have to write all these joins; it's about unlocking the ability to use clients that don't necessarily natively support joins.
You can leverage expertise and tools and frameworks and stuff that you've already built in a different language. The purpose-built stream joiner — that pattern is also often used when your database admin says, "Absolutely not, you can't denormalize in our database."
We don't have the luxury of choice. [inaudible 00:33:06]
Then you say, "Well, we'll do it the next closest step, which is immediately after let's say we use change data capture, got that data out, we'll denormalize it right away, and then produce an easy-to-use denormalized stream for general purpose consumption.
You can also get that when you're dealing with a third-party company's stream, where you have no chance of affecting their systems. That's two of four. Number three — and this is one I have no good heuristics on — when do you put a single event type per topic, versus when do you have multiple different event types per topic?
I'll start with the easy one. The easy one is if you're building fact events, I would say you use one type per topic. Let's go back to the relational database. In a relational database, you have a table per entity. Now you could just union them all together and put them in one big nonsensical table.
I've actually seen it done.
I'm not surprised, but I am disappointed.
The Postgres-as-a-key-value-store approach.
You could do that, but typically we don't, and we don't because you want that strong typing in a database — strong typing like: this field can't be null, this field maybe can be null, this field has a default. We do the same thing with events. For fact-based events, a single event type per stream is basically how you would do it. The other thing is that most stream processing tools expect that convention too — if you're going to materialize a stream into a table, Kafka Streams expects one specific definition for that key and value.
Of course, if you're using — which you should be — Avro or Protobuf or whatnot for your schema format, it can handle compatibility forwards and backwards depending on your rules. Your tools are basically built around expecting one definition per stream for facts, for materializing.
Especially, when you're aggregating things.
Exactly, yes. Now with Deltas, you have a lot of choices. With Deltas, you can do multiple types for one stream. One of the cases where you might see that, for example, is if you're choosing to use an event stream as your list, your sequence of event-sourced events — you can write all of the different changes in your domain to that single stream.
Maintaining ordering.
Maintaining ordering. Now of course, if we're using a partitioned stream, the caveat here is you're only going to have order within a partition. If you have a stream with a single partition, there you go: you have a strict ordering of everything sequentially that's occurred. Then when you go to consume it, you can apply those changes very strictly in that same consistent order. That makes it harder, though, to figure out what specific data is in that stream.
If you have 50 different event types, you'll somehow have to know about all of the different formats of data in there. That's important because when you go to consume it, if you get an event and you have no idea what it means or what to do with it, what do you do? Probably throw an exception; maybe you throw the event out, but now your aggregate's going to be totally off.
You need to know exactly what's going on there. There is again, a tighter coupling between the consumer logic and each of the pieces of data in that stream. Again, that's something we've already discussed earlier.
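A sketch of that coupling, with hypothetical types not taken from the course: when several Delta types share one topic, the consumer has to know how to apply every one of them, and a new type added upstream forces a change on the consuming side.

```java
sealed interface CartDelta permits ItemAdded, ItemRemoved, CartCheckedOut {}
record ItemAdded(String cartId, String itemId, long priceCents) implements CartDelta {}
record ItemRemoved(String cartId, String itemId, long priceCents) implements CartDelta {}
record CartCheckedOut(String cartId) implements CartDelta {}

class CartProjector {
    // The consumer must handle every Delta type in the stream, in order,
    // or its aggregate drifts out of sync.
    void apply(CartDelta delta) {
        switch (delta) {
            case ItemAdded e      -> addItem(e);
            case ItemRemoved e    -> removeItem(e);
            case CartCheckedOut e -> closeCart(e);
        }
    }
    private void addItem(ItemAdded e) { /* apply the change */ }
    private void removeItem(ItemRemoved e) { /* apply the change */ }
    private void closeCart(CartCheckedOut e) { /* apply the change */ }
}
```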
Would you then lean towards not having multiple events per topic for an outbound topic?
Again, yes, you could do that. Let's go back to that newsletter example. I signed up for a newsletter as I was checking out; maybe that's literally the event name, signed-up-for-newsletter-while-checking-out. Naming things is hard. You have that event, and distinctly just that event type is in that stream. If you're a consumer and you want to listen to that event, there you go. You know what's in there, you know what to expect.
That's the only thing that's going to be in there. Now, the challenge starts coming in where, let's say, this downstream service also needs to keep track of newsletter signups from different domains, and also people who say, "I don't want to receive this newsletter anymore." Now you might have three, four, five inbound signup streams and maybe one or two I-don't-want-this-newsletter-anymore streams. They're all going to be timestamped in a certain way. Let me get a bit deeper into that.
They'll be timestamped either with the time that the event occurred locally at the source system — assuming all the clocks are synced, you're probably close enough, a couple hundred milliseconds here or there — or with the time that the event broker received it, which might also have some skew. If someone signs up for the newsletter and then says, "Actually, you know what, I don't want it," there's only a one- or two-second delay.
If your stream processor isn't processing things in a very strict sequential offset and timestamp order, there's a chance that you might process the unsubscription before you process the subscription. The person doesn't get unsubscribed, they keep getting those annoying emails, and then they start reporting you as spam, which affects your emailing campaigns. All of this is to say that multiple event types for Deltas will work, but the more topics your consumer needs to listen to, the more tightly coupled it becomes to the source domain.
The more appealing it would look to have a single topic with multiple event types, so that you're guaranteed that strict ordering again. Basically, the more independent Delta events a consumer listens to, the tighter the coupling becomes, and the more it needs to know how to handle and process those Deltas.
That makes sense to me. That's a good heuristic. That leads me on to part 3B. One other question on that topic: one thing I do see fairly often is people splitting the same event type over multiple topics, usually for reasons of scale — we have small customers, so purchases for small customers go on the small-customers topic. Ignoring space and size and [inaudible 00:40:40] planning, do you think there's ever any logical reason to have the same event type on multiple different topics?
Probably the most common one I've seen is largely due to legal requirements. I'm Canadian, and the reason why I mention this is, for a period of time, Canada—
Are you a real Canadian?
I swear I am. We had legalized marijuana, and I mention that because certain e-commerce companies, dealing with multinational accounts in different places of the world, didn't want to route Canadian data that could be related to marijuana purchases to countries where it's illegal to purchase it. Similarly, maybe it's GDPR — perhaps your data also has to reside within the country of origin.
Splitting up data feeds in that way makes sense, because the schema might be exactly the same, the contents may be exactly the same, but you're routing it: this is an explicitly different topic than that one, for legal reasons. Or because of — well, another reason could be priority topics. Some applications would have, "This is a premium user. They've spent money for the premium treatment, for the lowest SLA, et cetera."
Their data's going to go into a high priority topic and then everyone else is going to be in the free tier topic. The application that's consuming and processing this will always treat the high priority data first. It'll drain that topic and make sure it's empty before it even looks at the other one.
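One way that "drain the premium topic first" behavior can be approximated with the plain Java consumer is to pause the free-tier partitions while premium records are still arriving. This is a simplified sketch with made-up topic names, not a production-ready priority scheduler.

```java
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PriorityConsumerLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(List.of("orders-premium", "orders-free"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));

            // While premium records keep arriving, pause the free-tier partitions
            // so subsequent polls only pull premium data.
            boolean premiumPending = records.records("orders-premium").iterator().hasNext();
            List<TopicPartition> freePartitions = consumer.assignment().stream()
                    .filter(tp -> tp.topic().equals("orders-free"))
                    .toList();
            if (premiumPending) {
                consumer.pause(freePartitions);
            } else {
                consumer.resume(freePartitions);
            }

            for (ConsumerRecord<String, String> record : records) {
                process(record);
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) { /* handle the order */ }
}
```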
That makes sense, those are good examples. I think my favorite one now is the partitioning strategy of partition-by-narcotic. [inaudible 00:43:00] Canada, love it. Which takes us to the final of the four, which is — and you're going to have to explain this one in depth — discrete versus continuous.
This one, I will admit, I struggled with a bit. I'm going to back up. It's the relationship between events and the completion of, let's say, a workflow. Out of the four, this one's a bit more nebulous than the other three; the other three I found a little easier to give practical examples for. This one's basically about, when you're communicating data, what is it that you are expecting?
Partially, what is it that you're expecting the users to do, but also, how do you signal that something is complete — like a purchase order? I put in an order for, let's say, machined parts or something with a lead time. Something that's not just click, buy, ship, all in one day. I put in an order, it goes through, it goes into receivables, then they allocate it somewhere and say, okay, we're going to have to get this work done, this gets deployed to a shop, this shop's assigned to it.
They have to procure the materials for it, they have to build it, they have to mark it for shipment, and then it gets shipped out. That could be part of a workflow that is discrete. It has a very discrete start point and a very discrete end point. Now, that end point might be the start point of another workflow: is the customer happy? Do they want to return it? Do they need modifications? That might loop back around to another work order in and of itself, or that might go to a termination that says end.
I'm building the flow chart in my head.
For people who are familiar — one of the things I didn't want to get into in this, because I thought it was going to be a bit too much, is business process modeling notation, BPMN. Business process modeling is a massive field of study. They talk about these sorts of discrete workflows with a domain language that was really beyond what we could put into, say, a 15-minute video for this course. If you are interested in discrete workflows, there is a whole body of information there that is rich with very intelligent options, solutions, and so forth.
A continuous workflow is something that would react, say, at very tight intervals or nearly continuously. Think of temperature control settings. The thermostat in your house — even the older ones, even the old mechanical ones — those were the ultimate example of a continuous event flow because of their physical properties. It was continually trying to say, "Is it cold enough? Cold enough? Cold enough? Now it's cold enough, I'm going to turn the heater on."
With the continuous event flow, once it's turned the heater on, it's not done — it's still tracking. Now, is it too hot? Is it too hot? Do I need to turn it off? Effectively, the continuous event flow is really around things such as the internet of things, sensors, measurements. You can even build continuous workflows around, say, meta-analysis of clicks, click streams: what are people looking at? What are people buying? What are people viewing?
Those people of course have their own discrete workflow, where they go through your website, click on stuff, buy, it gets shipped out, done. That can all be events, and those can be discrete workflows and discrete chains of events, but you can also harvest them as a continuous stream for other purposes.
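A toy sketch of the contrast (the names are invented here): a continuous flow evaluates every reading indefinitely and never reaches a terminal state, unlike the discrete order-to-shipment workflow above.

```java
record TemperatureReading(String sensorId, double celsius, long timestampMs) {}

class Thermostat {
    private final double targetCelsius;
    private boolean heaterOn = false;

    Thermostat(double targetCelsius) { this.targetCelsius = targetCelsius; }

    // Called for every reading, forever — the workflow never "completes".
    void onReading(TemperatureReading reading) {
        if (reading.celsius() < targetCelsius - 0.5) {
            heaterOn = true;   // cold enough: turn the heater on
        } else if (reading.celsius() > targetCelsius + 0.5) {
            heaterOn = false;  // warm enough: turn it back off
        }
    }
}
```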
Number of Christmas trees bought [inaudible 00:47:30]. A number that spikes at times.
A lot of this one is about thinking of how you're creating this data, and the relationship between what people want to do with it and how it's structured. Again, I did say this was maybe the most nebulous of the four. Hopefully, that gives you a bit of a sample.
That seems almost uniquely in this set of four things, something that's kind of outside of your control.
It is to some extent. I think part of it is, if I wanted to tie it back to, say, the data-on-the-inside concept: a discrete workflow, for example, would be one that you would probably have a series of dotted lines around. Maybe within a certain part of that workflow you are going to be using a lot of Delta events inside — data on the inside. Then, when you hand something off to the next part, it might be a fact statement like, "We've done all this work, here you go. Here's a fact," for the event-carried state transfer part.
It's within your control. I'm trying to bring to light the idea that there are discrete workflows and there are continuous ones and they do relate and they may use the same events, but some events may not be suitable for one, but more suitable for the other.
Well, that's the tour. Again, I think some people are going to want to go to the course and do some of the hands-on exercises.
The hands-on ones. They all use ksqlDB; you can do it all in the browser.
Low code.
You can literally copy and paste the code in. Obviously, you can try to solve it yourself, but the solutions are there.
Guided learning.
Yeah, guided learning, there we go.
Thank you for the overview. One other thing before we let you go, you have a new book out, right?
I do, yes.
The moment where we let the guest plug the book.
I just completed it right before Current 2022, which we had this year. Come to the next one, everybody, if you didn't go to this one.
Coming up in August '23.
It's called Practical Data Mesh. One of the things that I do talk about there is data on the inside and data on the outside, but it's about how you make data readily available. How do you make data accessible for others to use? How do event streams play in? What roles do they play? What are they really good at in a data mesh? This book — I think we're going to attach the URL to the preamble of the description of this episode, so we can—
Link to that book in the show notes.
In the show notes, there we go. I won't bother reading out the URL for you to type in. Please check it out, it's freely available. It's on our Confluent website.
I was at Current and you were doing a book signing of that book. I missed the book signing, I got a copy of the physical book, but haven't got it signed by you.
Next time I see you, Kris.
You're going to have to sign my book for me. Adam, good to see you as always. Thanks for joining us.
Wonderful to see you again, Kris. Thanks for everything.
Catch you again. Bye. That was Adam Bellemare, coming to us live from Canadia. His word, not mine. I'm sure he's a real Canadian — you can tell from the vowel sounds, he's definitely genuine. Before we go, shall I leave you with a bonus Adam fact? I think I will. He's surprisingly capable at throwing an axe. Axe throwing: grab an axe and hurl it at a target. Don't ask me how I know, but he is. I would absolutely hate to be a moose when he's having a bad day. There you go.
As we said, his course on designing events and event streams is available now for free, covering the ideas we've just gone through in a more structured way. It's got some hands-on exercises that I'm sure you'd find useful. You'll find a direct link to it in our show notes, or just head to developer.confluent.io, where you'll find that course and many others to teach you what we know about building event systems successfully.
If you enjoyed this episode, please do leave us a like or a rating or a comment, or share it with a friend, because feedback is always great, of course. There'll be another episode along next week, so if you're not already subscribed, consider subscribing now. With all that said, it remains for me to thank Adam Bellemare for joining us and you for listening. I've been your host, Kris Jenkins, and I will catch you next time.
What are the key factors to consider when developing event-driven architecture? When properly designed, events can connect existing systems with a common language and allow data exchange in near real time. They also help reduce complexity by providing a single source of truth that eliminates the need to synchronize data between different services or applications. They enable dynamic behavior, allowing each service or application to respond quickly to changes in its environment. Using events, developers can create systems that are more reliable, responsive, and easier to maintain.
In this podcast, Adam Bellemare, staff technologist at Confluent, discusses the four dimensions of events and designing event streams along with best practices, and an overview of a new course he just authored. This course, called Introduction to Designing Events and Event Streams, walks you through the process of properly designing events and event streams in any event-driven architecture.
Adam explains that the goal of the course is to provide you with a foundation for designing events and event streams. Along with hands-on exercises and best practices, the course explores the four dimensions of events and event stream design and applies them to real-world problems. Most importantly, he talks to Kris about the key factors to consider when deciding what events to write, what events to publish, and how to structure and design them to trigger actions like broadcasting messages to other services or storing results in a database.
How you design and implement events and event streams significantly affects not only what you can do today, but how you scale in the future. Head over to Introduction to Designing Events and Event Streams to learn everything you need to know about building an event-driven architecture.