Anna McDonald is an occasional guest on this show, a tour de force in the Kafka community. Always great to have a conversation with her. Today, we talk about her course, her Confluent Developer Course, on event sourcing. We're not going to go over all the content of the course, you can watch it and take the course for that purpose but really she lays some foundational ideas that I think will prepare you to benefit from that course, a little bit better.
Streaming Audio is brought to you by Confluent Developer. That's developer.confluent.io. And the course on event sourcing is there along with a number of other video courses, patterns for event-driven architecture, executable tutorials on basic Kafka concepts, Kafka Streams, ksqlDB, all that kind of stuff. The examples, a lot of the labs for the courses and Kafka tutorials take place in Confluent Cloud. When you sign up for Confluent Cloud, use the discount code PODCAST100 for an extra 100 United States Federal Reserve Notes in credit, which should get you through your course or whatever it is you're doing without you having to pay anything other than your time and attention. With that, be sure to check that stuff out, and now let's get to today's show.
Hello and welcome to another episode of Streaming Audio. I am as usual, your host Tim Berglund, and I'm joined in the studio by, I don't know, many time repeat returning guest, illustrating the triumph of hope over experience, Anna McDonald.
Hello.
Hi.
Hi.
Hi. Anna is a Customer Service Technical Architect.
Close. Customer Success Technical Architect.
Oh, my goodness. Okay. I still say TAM in my mind or technical account manager and probably most people don't.
I know, yeah.
But you coined the pronunciation siesta of [crosstalk 00:01:55]
Yes. I don't think I was the first one on my team to notice that, I think it might have been Doug. I don't know who picked that, but it was a money call because it's fun to say.
That's on brand for you. Yeah, yeah.
Yeah.
Anna is on the show today and Anna. Just a little peek behind the curtain of how this works in the calendar invite, whether you're a Confluent person or a non Confluent person, there's a link. We use a service called Zencastr, it's just a link and it's all through the web to record this and then a little link of audio and video best practices and this that, the thing. And a Google Doc for you to fill out your outline for what you want to talk about. Anna, I don't even know what we're talking about today, because-
That's totally cool. I think I do. We're talking about-
I like it better. I just want to know. I want you to know, I like it better this way. I don't feel like I really need to be guiding the conversation. I think I'm just here to-
Yeah.
...just talk to you. Yeah.
Yeah. I mean, I would agree. I think what we're going to talk about is, we're going to talk about event sourcing. I got contacted sometime in the early spring, and asked if I would fly out to tape an event sourcing course, an eventing in Kafka course. And, I said yes, for many reasons, because it was a worthy endeavor. Also, because I hadn't been anywhere in about 18 months.
There's that.
Yeah, I was fully vaccinated. I wanted to see Sanjana, Ahnapee, my friends. I hadn't seen them in a long time. I would... And the course-
I could add some reasons. You're a super good teacher and your presence on camera is amazing. There were all kinds of good reasons for you to do this.
Absolutely. And, most importantly, I love eventing. How I got started in Kafka was with event streaming. It was my first use case and I am very, very passionate about it. Very passionate about how you should bring it into your space. What order you should go in to make sure you don't crash and burn in a fiery ball of death. And I have a fun analogy about that too.
A fiery ball... Okay. We'll get to the fireball of death-
Yes.
...Analogy. I want to say when it comes to event sourcing, so I'm going to ask you obviously to define what that is and walk us through the course. This will be like the podcast amuse-bouche before the meal, that is that course. Maybe it's a full on appetizer. It's an actual thing that you order, but anyway-
It's like those little mini beef Wellingtons, where you could just eat them all as a meal. I like those. They have them at weddings.
Yeah. But, it's just the one that comes there, yeah. This is like six of them for the table-
I eat all of them.
...And, everybody else is, "I want some" and you're, "well-
Too bad.
I don't make the rules, right. Should have got on that. Okay.
That's right.
I want to ask you what event sourcing is. And I just want to point out that this is one of those things, if you get up on a stage or I don't know, tweet something or write something, whatever. And you say, "Hey, I want to talk about event driven architecture." And you use that phrase, okay. People want to talk about it. It's interesting. You know there could be questions and interaction and people, there could be pushback on certain things. It can be a robust dialogue. I want to say, when I say event sourcing that is the biggest "well, actually" trigger that's kind of currently in my world that-
Yes, absolutely. Absolutely. It is. It is so gate keeping, it kind of makes me vomit because here's the thing. Is event sourcing a specific event pattern? Yes. Yes it is. Do people mean lots of different things when they say event sourcing erroneously? Yes. But there's no need to be rude about it. Event sourcing is not event driven everything. That's not event streaming. It's not event driven architecture.
It's its own thing. It's a narrower category.
It's a specific category. I like to refer to it as the Grand Poobah of eventing. It's like eventing taken to the extreme.
We may engage some corrective feedback. Who knows, could happen. But tell us what event sourcing is. And then let's just kind of walk through the course and just talk about it.
Yeah. I love corrective feedback. Like bring it. Like really do.
Yeah.
It'd be great. Yet the course starts off with what's the dif- And I like the way that it rolls out because it starts off with kind of, what's the difference? Between event driven architecture and also just our normal kind of at rest architecture. And I think we use like a chess game to describe it. And then we also use like Cartesian coordinates to describe it. There's a couple of like good analogies in there. But what it comes down to is it's about whether or not you look at the things, it's how you look at the things in your world. Whether that thing is an invoice, whether that thing is a person like an account holder.
Things happen to that. A person might update their address. An invoice might be paid, an invoice might be credited. Events happen to things. And unless you're using some kind of event streaming in some sort of eventing pattern, you're not doing that. And for the longest time we didn't do that. We were like, you're a person, you're a thing. You stay in this database and we're just going to continually just update the state of this person. It's like taking a Polaroid. I always like to say that you're on vacation and you're taking a Polaroid. And let's say that in one picture... Has this happened to anyone where like somebody breaks their arm on vacation? And in the first half of the vacation, they're like. And then in the second half, they're like. Or they got a broken leg and crutches and you like, how the hell? What happened?
Or like a real bad sunburn. And in the second half they've got like defined sunburn lines and the first half there's nothing.
Right. You're missing the-
Healthy body.
You know something has changed. You have no idea how? You only know the current state. You're like, I'm not... Wait, how did? And that's to me what I love about events because they model our real world. It's like, yo dude broke his arm needed to go to the hospital. We're going to tell you that. That's kind of-
There are things and there are things that happen. And both of those get to be first class citizens in-
Correct. Correct.
In event sourcing. Yeah.
Yes. And I love that because people are interested in events. Someone who broke their arm, probably a doctor wants to know that someone's coming in. Probably your Aunt Janice. Mine would because she would've been like, I told you not to let them climb on those walls. And you did.
Exactly. She would've had that sort of corrective feedback for you.
Yes.
That would've helped you heal.
Absolutely. There are people in real life interested in those events. And then there are a lot of people inside an organization that are interested in those events. Those aggregates or domains as they are called. And so I think the course kind of goes over that in a very strategic fashion to build a case. One of the things, and I like the way you say that this is kind of an appetizer, like the mini taco or beef Wellington, if you will, two of my favorite things at a wedding. I go there for the appetizers, that's... And of course the love to celebrate somebody's union.
Well, there's also love. Yeah.
Yeah. I'm sure it's great. But they usually have the tacos and like those little beef Wellingtons. I like those. One of the things that we don't go over-
Or is it beefs' Wellington.
Oh. That's like the mini pastry. It's beef and then there's pastry around it.
Yeah. I know. But you said beef Wellingtons, or I said, is it beefs' Wellington? I'm not sure. Anyway, go on.
You're right. I don't know. That's a good call.
I interrupted.
I don't know which one it is. Is it? I think it's beef Wellingtons.
Okay. All right.
Because the full-
But that's like the modifier.
The official name is beef Wellington.
Yeah. But that the modifier, the adjective is second, which is not the way we... Anyway.
Yeah.
We should have a different podcast episode on that. I'm sorry. I distracted you.
We should. I agree. And so one of the things we don't talk about in the course, and I think this is something that people ask me again and again. And I started out in Kafka doing event streaming and talking about event patterns. And since the first day that I decided to show my face to everybody, people have asked-
The day that would not be about.
I don't think a week has ever gone by where somebody has not asked me this question. And yet we don't cover it in the course. And so if this is the appetizer, I'd like to talk about that. Because the course will tell you what it is, how to do it, what to think about. But what it doesn't tell you is, how do I do this in my organization? How do I get started putting event driven things in my organization?
And some people will say, well, it's different for everyone. I'd say, no, it's not. There are clear ways to fail and I've seen them again and again and again. And I'll give you an example. And I thought about it today. And I just came up with an analogy. And I was like, yeah, baby. Because I think this will kind of be a way to talk about this. I'm sure everyone listening to this knows what a boil water advisory is. It's a boil advisory. Where if something bad happens to your water supply, there's a water boil advisory.
There might be pathogens.
Correct.
Because there was a break or a flood or whatever.
Boil your water. Let's say that you're a news station, back in the day. When there was a water boil advisory and you're a newspaper, or whatever. You'd have to like print that maybe in the evening edition. After people had probably already got Giardia, which is not very useful.
No.
It's kind of like a look back. Like, hey, by the way yesterday, I know why your stomach's not feeling good today. This is why. And then obviously you're like, crap, I got Giardia. I'm going to go to the doctor. I akin that to where you're starting with where it's a batch job. 24 hours later after you run something, you're like, well, crap, this is the stuff that didn't process or this was... It's almost like looking back at what happened during the day.
And that historically was how kind of newspapers ran. And then you have cable television or even just network news. And when we think about this, this is where it gets sticky. Because I postulate that the easiest way to gain traction is to provide immediate value without having to spend a year working on something to get it to production. To even look like one tiny slice. What's an immediate value you can provide with events? If we look at a water boil advisory, one of the things you can do is if I'm a newscaster, I can just say, "Hey, there's a boil water advisory in our region." I don't have to tell you even where it is. Because if there's one, you could say, well, crap, is it my region? Because I'm going to go look it up.
Ah.
And so-
Okay.
Yeah. That's akin-
An event without a case.
To event notification.
Yeah.
Where we just tell you what happened. We don't really tell you any details about it at all. We just tell you something happened. If we say tonight in Denver, there's a water boil advisory. For more information, call your town. Or whatever. Get on the phone, call your town and be like, "Hey am I going to get Giardia?" Yeah. Yes. No. It gives you the opportunity to act without necessarily telling you if it's specific to you, but it tells you enough. So you can go, oh, well, crap. I better go check that out.
And that's really easy to do. Now, does that provide value over you finding out a day later or the evening after you've already drunk the water? Yes.
It does.
Absolutely. And it's a tiny step. And so that is event notification. It could be very valuable and you can do it very quickly. It's like a quick win. And if we looked at that in a real context, it would be something like every time an invoice is cut. Every time an order is placed. Just tell me an order was placed. Maybe I go look it up. In my legacy database. But now all of a sudden, instead of having to wait for some kind of flat file every night, I know every time a new order is placed. It's kind of an interim step that's really easy to do.
That's kind of where I suggest people start because the next step is really quick to get to. And that's event carried state transfer. That would be going on the news and going, in the Southwest quadrant of Denver, there is a boil water advisory. It's not only telling you something happened, it's giving you enough information so you know what you need to act on. You know what's going on. In that case, I don't even have to make a phone call anymore. [crosstalk 00:15:12]. I just need to go, oh, well, do I live in that quadrant? Yes or no. That's even more valuable and quicker.
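The difference between those first two steps can be sketched in a few lines of Python. This is only an illustration of the boil water analogy: the class names, the `ADVISORY_DB` lookup table, and the field names are all made up for the example and do not come from the course.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the legacy system a consumer must query for details.
ADVISORY_DB = {"adv-1": {"quadrant": "southwest", "active": True}}


@dataclass(frozen=True)
class AdvisoryNotification:
    """Event notification: says only that something happened."""
    advisory_id: str


@dataclass(frozen=True)
class AdvisoryIssued:
    """Event-carried state transfer: carries the details the consumer needs."""
    advisory_id: str
    quadrant: str
    active: bool


def must_boil_from_notification(event: AdvisoryNotification, my_quadrant: str) -> bool:
    # The consumer has to "call the town": look the details up somewhere else.
    details = ADVISORY_DB[event.advisory_id]
    return details["active"] and details["quadrant"] == my_quadrant


def must_boil_from_state(event: AdvisoryIssued, my_quadrant: str) -> bool:
    # Everything needed to decide is already in the event, so no lookup happens.
    return event.active and event.quadrant == my_quadrant


if __name__ == "__main__":
    print(must_boil_from_notification(AdvisoryNotification("adv-1"), "southwest"))
    print(must_boil_from_state(AdvisoryIssued("adv-1", "southwest", True), "northeast"))
```

Both consumers reach the same kind of answer; the second one simply does it without the phone call.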
Now, let me tell you what event sourcing is. Event sourcing is, you know what I really need to do, from day one, I need to keep the state of the water in every single quadrant and track whether or not there's a boil water advisory for every single possible quadrant and street. Every time there is a boil water advisory, I'm going to keep a history on it. I also need to make sure that at any point in time, I can go back to the exact minute and tell you, was there a boil water advisory? Yes or no, for an entire city.
Okay. The complete state of the quadrant or the city or whatever the object is and history of that state.
Correct.
And notification of when the state changes.
Correct.
Okay.
And I can play that back at any point in time and say on that, now is that valuable for someone who's doing historical research about water quality? Sure. Do I need to do that first? No, you don't. Because it's a lot harder. That value that you're getting, it better be worth it because it is a lot harder for people to see that as valuable, than the first two. If I tell you the first two, I say, "Hey, look, anytime there's a boil water advisory, I'm going to tell you about it." And then you'll just say, "Oh, I'm going to go call this and see if I need to stop drinking." Versus finding out a day later and getting Giardia. You're going to be like, yeah okay. I can see the value in that.
And then if I tell you, you know what, not only that, I'm also going to tell you whether or not it's in your neighborhood, so you don't even have to call anymore. People are like, this is great. Awesome sauce. But if I, from day one, I go, look, this is what we're going to do. We're going to create a project to track the state of all boil water advisories for all time, historically. It's going to take about a year and a half maybe. And in that time you still won't know and you'll still have Giardia the entire time. But it's going to be super valuable for historical researchers. Can you see which one is a harder sell?
Yeah. The second one, who cares about the value for historical researchers? You're not getting immediate value.
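The event-sourced version of the advisory example, where you can go back to an exact minute and ask whether an advisory was in force, might look roughly like the sketch below. The `AdvisoryEvent` type, the sample log, and the rule that a quadrant has no advisory before its first recorded event are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AdvisoryEvent:
    at: datetime
    quadrant: str
    active: bool  # True = advisory issued, False = advisory lifted


def advisory_active_at(log: list[AdvisoryEvent], quadrant: str, when: datetime) -> bool:
    """Replay the log up to `when` and return the state for one quadrant."""
    active = False  # assumption: no advisory before the first recorded event
    for event in sorted(log, key=lambda e: e.at):
        if event.at > when:
            break  # everything after `when` is in the future for this query
        if event.quadrant == quadrant:
            active = event.active
    return active


# Hypothetical history: an advisory issued on Nov 1 and lifted on Nov 3.
LOG = [
    AdvisoryEvent(datetime(2021, 11, 1, 8, 0), "southwest", True),
    AdvisoryEvent(datetime(2021, 11, 3, 17, 30), "southwest", False),
]

if __name__ == "__main__":
    print(advisory_active_at(LOG, "southwest", datetime(2021, 11, 2, 12, 0)))
    print(advisory_active_at(LOG, "southwest", datetime(2021, 11, 4, 9, 0)))
```

Even in this toy form, the history has to exist for every quadrant from day one before the query is useful, which is the extra cost being described.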
Yeah. And I think people get caught up in that, and I've seen this like where people are like, you ever see that thing on Saturday Night Live, years ago where they're like, if it's not Scottish, it's crap. Worst Scottish accent ever.
Mike Myers. I'm not going to try one because then Scottish listeners will let me know how bad it was.
I know. Well, I'm sure they'll let me know. And I apologize in advance. I really do.
That was Mike Myers thing. You weren't even really trying.
But no, I wasn't. I always put really bad accents on, so no one will know if it's me.
That way you're like, I wasn't even trying to do that. I was just changing my voice a little bit.
That's right. But we have a community built where a lot of people who have... I feel that if you're somebody who's well known and you're seen as someone who people look to for advice. You kind of have a little bit of a responsibility not to take a whole, if it's not event sourcing, it's crap, point of view.
And I just want to tell people because I feel I have that responsibility because what ends up happening is people don't even enter the event space because they're like, well, there's no way I can take two years. I can't do that. I don't have the organizational pull. And when in reality, if you start small and you show that value, you can get more people behind you. And then those use cases where event sourcing is money for, then you've got all this, you've got a lot of street cred built up.
Success and credibility. Yeah.
Absolutely.
Which is always how technology adoption works. It doesn't work by creating the perfect finished thing that conforms to a platonic ideal in every way. And is just a shining example of perfection. You kind of usually build something that is valuable and a little bit crap and does something that like stakeholders in the business are going to recognize as good. There's lots of little, what's the word, minimum viable-
Yeah. Minimum valuable product. I like to call it that.
Valuable. Valuable is nice. Yeah.
I read that somewhere that somebody said we should move away from minimum viable and go to minimum valuable.
There you go. I buy that.
And I like that better.
Yeah. That's a good point, because that's always how we drive technology change. You get a little bit of a victory and it can be imperfect. What else? If you remember off the top of your head, the other kind of sections in the course, these are really good foundational ideas, but what other things do you take people through?
Again, one of them is there's a lot of words, in computers. There always have been. We like to make up new words for things there's already words for.
Oh, we do.
Hashtag job security maybe? I don't know.
No, it's a linguistic thing. We have to like mark ourselves as insiders and it's specialized vocabulary is complex, but we do it.
It's true. There's a lot of definitions for things you might hear that you might have said, okay, well I think I know what CQRS is, but it'd be great to have an example in practice. And the other thing I like about it is you can try it all. One of the biggest barriers that people have as well is, if they are somebody in their organization, is brand new, they might not have a streaming library of choice. They might not be able to spin up... They might have an old, archaic Windows laptop that won't even run and just falls over if they try to even... And throughout the whole course there's Confluent Cloud, there's credits provided. You can use ksqlDB, which is fully managed. And you can just play around with it.
And that's the thing I like the most. We go through an example of using CQRS. Where we're going to separate out that state building from where we're serving up a view from where we're keeping. Which is great. I like it when my applications do one thing and do it well, within reason. And I love the idea of stopping front ends from having to be so smart. And just saying, I'm just going to serve you up whatever view you want. Because I don't care because they're easy to make. Boom, boom, boom.
There you go.
And I think CQRS lends itself naturally to that because it's basically decoupling the idea of, I need to like store my data the same way it's... And then do this dance and magic either in the front end or I'll have another service at mass- Or like, blah, garbage. Just make a service, have that view be served up the way that your dashboard needs it. And then you're done and it's easy.
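That separation can be sketched as a toy in-memory service, with one side that only records events and another that only serves views already shaped for a front end. The `OrderSystem` class, its method names, and the order fields are hypothetical; the course itself works through this with ksqlDB rather than Python.

```python
from collections import defaultdict


class OrderSystem:
    """Toy CQRS split: commands append events, queries read prebuilt views."""

    def __init__(self) -> None:
        self.event_log: list[dict] = []               # write side, append-only
        self.orders_by_customer = defaultdict(list)   # view for an account page
        self.revenue_cents_by_day = defaultdict(int)  # view for a dashboard

    # Command side: validate, record the event, then update the views.
    def place_order(self, order_id: str, customer: str, day: str, cents: int) -> None:
        if cents <= 0:
            raise ValueError("order amount must be positive")
        event = {"type": "OrderPlaced", "order_id": order_id,
                 "customer": customer, "day": day, "cents": cents}
        self.event_log.append(event)
        self._project(event)

    def _project(self, event: dict) -> None:
        self.orders_by_customer[event["customer"]].append(event["order_id"])
        self.revenue_cents_by_day[event["day"]] += event["cents"]

    # Query side: each caller gets the shape it needs, with no joins or math.
    def orders_for(self, customer: str) -> list[str]:
        return list(self.orders_by_customer.get(customer, []))

    def revenue_for(self, day: str) -> int:
        return self.revenue_cents_by_day.get(day, 0)


if __name__ == "__main__":
    system = OrderSystem()
    system.place_order("o-1", "alice", "2021-11-01", 2500)
    system.place_order("o-2", "alice", "2021-11-01", 1000)
    print(system.orders_for("alice"), system.revenue_for("2021-11-01"))
```

Adding a third view for a new dashboard means adding one more projection, not changing how the events are stored.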
All of which, and this isn't really our topic today. But just that that pattern you described is kind of logs are down in the corner winking at you with that whole pattern. Because it makes it more sensible to keep your system of record as a log of events and then materialize whatever views are necessary.
Absolutely.
It's not required. You can do CQRS with different underlying data infrastructure. But that emerges as the sensible option.
Yeah. And we talk about that. We talk about how if you have to do a map reduce every single time something's updated from the beginning, that's not very workable. It's a consistent view that keeps up. You're not starting from scratch every time.
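The point about not starting from scratch every time can be shown with a small comparison: a full recompute over the whole log versus a view that is folded forward one event at a time. The function names and the account-delta event shape are assumptions for this sketch.

```python
def recompute_totals(events: list[tuple[str, int]]) -> dict[str, int]:
    """Rebuild the whole view from the start of the log (cost grows with history)."""
    totals: dict[str, int] = {}
    for account, delta in events:
        totals[account] = totals.get(account, 0) + delta
    return totals


def apply_event(view: dict[str, int], event: tuple[str, int]) -> None:
    """Update an existing view with one new event (constant work per event)."""
    account, delta = event
    view[account] = view.get(account, 0) + delta


if __name__ == "__main__":
    log = [("alice", 50), ("bob", 20), ("alice", -30)]

    # Incrementally maintained view: touched once per arriving event.
    view: dict[str, int] = {}
    for event in log:
        apply_event(view, event)

    # Both approaches agree; only the amount of work per update differs.
    print(view == recompute_totals(log))
```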
And so we talk about that and there's some really neat things too coming in ksqlDB. And I hope I don't get in trouble for this. Probably not. I think we published the blog post. I know that we're allowed to give it to customers. I don't know if it's been published yet, but we just did... They released scalable or are releasing scalable push queries. Which I like to call server side filtering for Kafka.
Oh.
Yeah. Yeah. It's really cool. And that's kind of that direction where I can actually tell ksqlDB, hey, I want this view and I want a long lived connection that's going to give me this view and give me all the updates. And it's great we're learning and the new release will scale that out to a significant number of clients that can do that at the same time. Which I think fits in with CQRS and right in with this, how do we make it easy for these people to have that consistent view? And then get consistent updates. From a client side. I think that's going to be cool too. That's not in the course.
It's funny. It's funny. You said we can say this to customers. I don't know if we say? I'm always in the opposite situation. You in a customer facing role, me in a community facing role. When I talk to customers, I'm like, wait, I know this, but can I say it? I don't know if I'm allowed to say it. I'm just the worst at that. It's funny. There's like zero memory for is that account referenceable? I don't know. I'm not in sales.
Yes.
Anyway. Yes. Scalable server side filtering for Kafka. That is-
Yeah. And I think that's one of the strengths of the course too, is because if you're just getting started with this, it can be a bit Herculean to try to set up all of your own stuff, even just to play around with it. The course is nice that way that it leads you through the examples and you can just use Cloud and use all that stuff and get it going.
Yep. And I said this in the intro, hopefully you remember me saying it, but I'll say it in the message body as well, not just in the headers. But PODCAST100: if you are signing up for Confluent Cloud and you use that code while doing the exercises in Anna's event sourcing course, you get an extra hundred dollars of free stuff, which is nice. And since we're talking about it, I'll tell you: you're going to use ksqlDB, so make sure you don't leave your application up and running forever. It is more expensive to keep that running because there are dedicated resources associated with it.
Yep.
So, chisel off some time, do your thing, shut down the application. Otherwise, your free credits will get eaten up. Be careful. But the idea is you should be able to do all this without paying anything. We give you enough startup stuff for that.
Yeah, absolutely. And I would even go one further and say, if you feel like, oh crap, if I shut this down, I'm going to lose everything. And it's not really quick to redeploy it. Shoot us an email and tell us exactly what you're having trouble with. Because it should be easy just to tear down and then just reapply whatever queries there are in the course.
And they're all copyable and pasteable so.
Yep. Exactly.
Nice. Well, Anna, what's next for you? You got anything else like this that you want to work on or?
Yes, I have a research spike I'm doing, if we can call that my personal research spike.
I like that.
Because I don't really, this is going to sound horrible. I really don't like the way that we do schemas today. At all. Any of them. They're necessary, but I want schemas to be more than necessary. I want them to be amazing. And right now the way that you evolve schemas is incredibly restrictive and it just doesn't fit. It slows people down too much for me. And it doesn't represent a beautiful allowable contract. If you think about it, look at something like Kafka, the Kafka protocol. I can hook up pretty much any version of client to Kafka and Kafka will downcast up, do whatever it needs to do to talk to that client.
When we talk about protocol. Yeah, schemas aren't that easy at all to work that way. They break. Are we using subject? Are we using top? They just seem overly complex for something that I feel like we figured out exactly how to do in other areas. And I know data mesh is gaining a lot of steam and momentum. And when we look at treating data as a product, treating that data that you're using as a product, you've got to have compatibility. And I think that there's a better way to do that. And that all ties in with eventing. It's how do we, because you've said it best, Tim, multiple times, things change. They do. They change on ya. A lot. And so you can't fight that, you have to embrace it. And I feel like the schemas we have today do not embrace that. They tolerate it.
And this is consistent with when I talk to folks who are the big enterprise deployment people who are, as it were, doing events right. Kind of following after our recommendations and putting our worldview into practice and all that stuff that we want to see. Their questions are all about how to manage schema at that scale. There is something here and I'm glad that you are spiking on that because I suspect interesting product ideas could come of such a thing.
Yeah. I'm going to think about it while I take a turkey nap next week.
My guest today has been Anna McDonald.
I love turkey naps. Sorry. Happy Turkey Day, everybody.
Happy Thanksgiving Day, and Anna, thanks for being a part of Streaming Audio.
Thank you very much.
And there you have it. Thanks for listening to this episode. Now, some important details before you go. Streaming Audio is brought to you by Confluent Developer, that's developer.confluent.io, a website dedicated to helping you learn Kafka, Confluent, and everything in the broader event streaming ecosystem. We've got free video courses, a library of event-driven architecture design patterns, executable tutorials covering ksqlDB, Kafka Streams, and core Kafka APIs. There's even an index of episodes of this podcast. So if you take a course on Confluent Developer, you'll have the chance to use Confluent Cloud. When you sign up, use the code PODCAST100 to get an extra hundred dollars of free Confluent Cloud usage.
Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me at TL Berglund on Twitter. That's T-L B-E-R-G-L-U-N-D. Or you can leave a comment on the YouTube video if you're watching and not just listening or reach out in our community Slack or forum. Both are linked in the show notes. And while you're at it, please subscribe to our YouTube channel, and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple Podcast, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support, and we'll see you next time.
What is event sourcing and how does it work?
Event sourcing is often used interchangeably with event-driven architecture and event stream processing. However, Anna McDonald (Principal Customer Success Technical Architect, Confluent) explains it's a specific category of its own—an event streaming pattern.
Anna is passionate about event-driven architectures and event patterns. She’s a tour de force in the Apache Kafka® community and is the presenter of the Event Sourcing and Event Storage with Apache Kafka course on Confluent Developer. In this episode, she previews the course by providing an overview of what event sourcing is and what you need to know in order to build event-driven systems.
Event sourcing is an architectural design pattern, which defines the approach to handling data operations that are driven by a sequence of events. The pattern ensures that all changes to an application state are captured and stored as an immutable sequence of events, known as a log of events. The events are persisted in an event store, which acts as the system of record.
Unlike traditional databases where only the latest status is saved, an event-based system saves all events into a database in sequential order. If you find a past event is incorrect, you can replay each event from a certain timestamp up to the present to recreate the latest status of data.
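As a rough illustration of replaying from a certain timestamp, the sketch below rebuilds the latest state from a saved snapshot plus the events recorded after it, and checks that this matches a full replay. The `Event` type, the integer timestamps, and the account-balance domain are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    timestamp: int  # assumption: unique, increasing integer timestamps
    account: str
    delta: int


def replay(events: list[Event],
           snapshot: Optional[dict[str, int]] = None,
           since: int = 0) -> dict[str, int]:
    """Apply every event with timestamp > `since` on top of `snapshot`."""
    state = dict(snapshot or {})  # copy so the snapshot itself is never mutated
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > since:
            state[event.account] = state.get(event.account, 0) + event.delta
    return state


EVENTS = [
    Event(1, "alice", 100),
    Event(2, "bob", 40),
    Event(3, "alice", -30),
    Event(4, "bob", 10),
]

if __name__ == "__main__":
    # State as of timestamp 2, saved as a snapshot.
    snapshot_at_2 = replay([e for e in EVENTS if e.timestamp <= 2])
    # Latest state, rebuilt from the snapshot plus the later events only.
    latest = replay(EVENTS, snapshot=snapshot_at_2, since=2)
    print(latest == replay(EVENTS))
```

Because the log is the system of record, a corrected or reprocessed history produces a corrected state simply by running the same replay again.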
Event sourcing is commonly implemented with a command query responsibility segregation (CQRS) system to perform data computation tasks in response to events. To implement CQRS with Kafka, you can use Kafka Connect, along with a database, or alternatively use Kafka with the streaming database ksqlDB.