November 4, 2021 | Episode 184

Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck


Tim Berglund:

Bill Bejeck is an Apache Kafka Committer and the author of Event Streaming with Kafka Streams and ksqlDB. He's also the author of the Kafka Streams course on Confluent Developer. So I had him on the show today to talk about that course. Really, I tried to take him through a whirlwind summary of that content. Can I compress that? It's 25 minutes, let's try. So listen to me and Bill talk about Kafka Streams. Before we get there, a reminder, Streaming Audio is brought to you by Confluent Developer, that's developer.confluent.io. There's, of course, a video course on Kafka Streams there, and there are Kafka tutorials that show you step-by-step how to write code and do basic operations with Kafka Streams.

Tim Berglund:

There's a library of event-driven design patterns, all kinds of great resources for running Kafka. When you go there, if you do any of the exercises, you'll have to sign up for Confluent Cloud. You want to use the PODCAST100 code, which gives you an extra $100 of free usage. So, check out Confluent Developer, developer.confluent.io. And now, listen in as Bill and I talk about Kafka Streams.

Tim Berglund:

Hello and welcome to another episode of Streaming Audio, I'm your host, Tim Berglund. And I'm joined here in the studio today by my colleague and author of Kafka Streams in Action, Bill Bejeck.

Bill Bejeck:

Thank you.

Tim Berglund:

I paused for a second, there are several of these books. That's the one, yes. Manning's Kafka Streams in Action.

Bill Bejeck:

There's actually a second edition underway, Event Streaming with Kafka Streams and ksqlDB.

Tim Berglund:

Nice. Okay. That sounds like a whole new book.

Bill Bejeck:

Yeah, it's meant to be a second edition, but I decided to pull back the lens a little bit and cover the whole streaming platform. And then, of course, I look at what I wrote back then a couple of years ago and have to change it.

Tim Berglund:

You got to change it, you just have to. Yeah, this reminds me of, years ago I was taking a graduate philosophy course on metaphysics, and it was a super wild class. Loved the experience. Easily the most difficult reading I've ever done in my life. You could clock five to seven pages an hour.

Bill Bejeck:

Oh wow.

Tim Berglund:

And you'd have to go through it again. It was just at the limit of my ability to comprehend. And there was a philosophy of computation book, and I can't remember the name of the professor, but it was a second edition. And in the preface he said something like, this is the second edition of a book. I'm not sure it's the same book as the first book. And for a metaphysics book, that's actually really funny because of holes and parts. Anyway, this is not a philosophy podcast, this is a Kafka and event streaming podcast and you're a Kafka Committer and a guy who writes about streams and works in developer relations at Confluent. You're also the author of our course on Confluent Developer on Kafka Streams.

Bill Bejeck:

Yes.

Tim Berglund:

And so, I want to talk about that. I want to go over the whole thing in a rapid-fire fashion, if we can, just to give people a tease. But before I do that, I know you've been on the podcast before, but maybe current listeners weren't listening then. Tell us a little bit about what you do and just you, before we get into it.

Bill Bejeck:

Sure. Well, I'll just start at the top. I've been at Confluent for over four years now, which is hard to believe, it's gone by really fast. I spent the first three-ish years as an engineer on the Kafka Streams team. And then I decided to follow ... I discovered, partly through writing the book, the first one, the first edition, that I like to teach. And I also like to present in a teaching manner. It's hard to describe the charge you get out of that.

Tim Berglund:

It's a powerful realization. If it's in you, once you see it, it changes your choices.

Bill Bejeck:

Yeah. And I still love to write code, I still like engineering. There are parts of engineering that I've missed since being an engineer full-time. But I think the scales tip a little bit more toward being in developer relations and writing code versus being in engineering and doing some developer relations stuff. So that's the balance. So I've been in DevX for over a year now. [crosstalk 00:04:43].

Tim Berglund:

I guess that's about right. And the team that Bill's on, we call it Integration Architecture, but a more standard name, as much as these things are standardized, might be Developer Relations Engineering on other teams. You write code for the purpose of explaining how things work.

Bill Bejeck:

Yeah. Yes, exactly. And that's one nice thing about our team. If anyone's thinking of joining, one thing they'll see about our team is, you've got a lot of latitude for the different roles you can get into.

Tim Berglund:

Yeah. Very much. All right, so we've got this course on Confluent Developer. It's a video course on Kafka Streams. I think, of the courses that we have deployed in the last few months, and we're recording this at the end of September 2021, it's easily the most comprehensive, because it's written by you. Because you're a guy who likes to teach and you're kind of a world-leading expert on the subject matter. And this podcast is going to be a survey of the material in that course, not a substitute for it, it's more of a teaser. My goal is, if you're listening to this and you're interested in Kafka Streams, you're going to think, wow, I should really go take that course. So Mr. Bejeck ...

Bill Bejeck:

Yes.

Tim Berglund:

What's Kafka Streams?

Bill Bejeck:

Kafka Streams is the native streaming library, if you will, for Apache Kafka, and it allows you to do stream processing, event stream processing, on the events you have going into a Kafka cluster. The nice thing about it is, it's an abstraction over Kafka producers and consumers. So you could do the same things with consumers and producers, but you'd have to handle a lot of the administrative-type things, like when to commit, things along those lines. So it kind of frees you from that, lets you really focus on business logic and what you want to do to those records in the event stream.
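
For readers who want to see the shape of that in code, here's a minimal sketch of a Kafka Streams application. The topic names, serdes, and the toUpperCase transformation are placeholders standing in for real business logic:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class MinimalStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "minimal-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // Consume from an input topic, apply some business logic, produce back to Kafka.
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> value.toUpperCase())
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Offset commits, rebalancing, and the consumer/producer plumbing are handled for you.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```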

Tim Berglund:

There you go. That's a point I make frequently and there's this joke I've been making for years. I have a slide that's just a picture of Admiral Ackbar because it's a trap for you to write that kind of framework code. It can seem fun, it is fun ... Actually, recreating Kafka Streams by yourself wouldn't be fun. That would be very hard. But little bits of framework things, when you encounter those problems and you're just writing consumers and you're thinking, "Oh, hey, the way I'm doing windowing, I could make that a little more sophisticated." And you start down that path and you're building code that isn't delivering value to a customer or a user.

Bill Bejeck:

Yeah, exactly. It's like, developers like to write code and that's all fun, but then you get to the point where you're deviating from what the business needs are, if you will. And you're spending lots of time ... And the thing is, not to belabor the point, but I think developers lose sight of the fact that what you write, you have to support. So the more infrastructure-type things, as you said, that aren't delivering business value, you still have to support.

Tim Berglund:

It's going to need care and feeding and you never can give it as much as you want because there are business stakeholders needing you to do other things. And the tempting thing about the framework is, you're in charge. It's not some seemingly crazy business stakeholder who's asking for weird things that require you to do all kinds of special-case things that aren't beautiful and violate your schema. The whole business software thing. And the framework is just us. It's just computer science and [crosstalk 00:08:32] be beautiful. So it's very tempting, but you never get to give it the care and feeding it needs. It's always buggy. It's always missing features. And so, using the standard thing, like Kafka Streams, is probably a good idea. Okay, so Kafka Streams is this computational framework that lets you focus more on business logic and less on the framework. It's an abstraction on top of consumers and producers that takes away the boilerplate and framework stuff you'd have to build. How does it scale?

Bill Bejeck:

That's one of my favorite parts about Kafka Streams. I mentioned it's an abstraction over producers and consumers, and I bring it up intentionally because the Kafka consumer has the rebalance protocol. So you start up three consumers and they're part of a consumer group. Logically, it looks like one consumer to the broker. Say it's consumers A, B, and C. If for some reason you stop consumer C, there's going to be a rebalance, and whatever topic partitions consumer C was responsible for get reassigned to A and B. You don't have to handle that. So Kafka Streams, natively, has that same thing. There are only two required configurations when you have a Kafka Streams app, and the application ID is one of them [crosstalk 00:10:04].

Tim Berglund:

Which becomes the group ID, right?

Bill Bejeck:

Yeah, exactly.

Tim Berglund:

And bootstrap servers, you said. I just talked over you.

Bill Bejeck:

And you give it the application ID, and it's the same thing, so you spin up three Kafka Streams applications ... Well, I'm sorry, let me start with a simple case. You start with one.

Tim Berglund:

Sure. One instance, one application. Application ID is Angry Monkey or something.

Bill Bejeck:

Yeah. And you want to have more processing power. You want to process more records. You spin up a second instance with that same application ID. Logically it's one application. Now you've got two separate applications running two separate [crosstalk 00:10:43].

Tim Berglund:

Two instances.

Bill Bejeck:

Two instances running, but logically it's one. And the same thing happens, whatever ... Let's just say you're processing a topic with four partitions. That initial instance is assigned all four partitions, but now you spin up your second one, it's going to rebalance and then dynamically allocate two of those partitions to that new instance. And you could do that up to four application instances with the same ID. And they would all end up [crosstalk 00:11:15].
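
To make that concrete, here's a hedged sketch of the configuration side. Every instance launched with the same application.id (borrowing the hypothetical "angry-monkey" from above) is logically one application, and no code change is needed to scale out:

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// The two required configs. The application.id doubles as the underlying
// consumer group.id, so every instance started with it joins one logical
// application; with four input partitions, the rebalance protocol spreads
// the work across up to four instances.
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "angry-monkey");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
```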

Tim Berglund:

Given that you've got four partitions.

Bill Bejeck:

Exactly.

Tim Berglund:

But Bill, says the skeptic, regular consumers do that.

Bill Bejeck:

Yes, that's true, and that's leveraging ... To me, that's the beauty of it, because it's leveraging the same rebalance protocol. So it's ...

Tim Berglund:

Exactly. So it's Kafka built out of Kafka. And we're going to talk about stateful operations later, there's some real money that comes in here. There are some pieces we don't have on the table, but if you're thinking we just talked about how consumer groups scale, yeah, we did. But your state comes with you, is the short story, and that's an amazingly good thing.

Bill Bejeck:

Yes, yes.

Tim Berglund:

Okay. The structure of the API, I read about a thing called a DSL, a declarative mode, and the Processor API. What's that all about? Tell us that.

Bill Bejeck:

Sure. Kafka Streams comes with ... It's got two APIs. You've got the DSL, which follows a fluent pattern where you can say ... You create a StreamsBuilder object, that's what starts everything else, but then you say builder.stream() and that returns a KStream object. And then you can just keep chaining different methods on top of that,

Tim Berglund:

Like methods that result in a modified KStream, return an instance of a KStream [crosstalk 00:12:42] dot, dot, dot, dot, dot.

Bill Bejeck:

Typical pattern. So you've got the DSL, and now I'm just going to talk about, you had mentioned stateful before, and now I'm just going to talk about stateless operators. So a typical pattern could be, say you call builder.stream() and then you could say .filter(), where you pass in a predicate, I only want to see records that match this given condition. And then you can call a mapValues() after that, where, okay, I want to change each key-value pair that's coming through that passes the filter. I want to modify it. I want to create a new value.
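
A rough sketch of that stateless pattern, with a hypothetical orders topic and string values, might look like this:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();

// builder.stream() sets up the source node and returns a KStream.
KStream<String, String> orders =
    builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

orders.filter((key, value) -> value.contains("priority")) // predicate: only matching records pass
      .mapValues(value -> value.toUpperCase())            // create a new value for each record
      .to("priority-orders", Produced.with(Serdes.String(), Serdes.String()));
```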

Tim Berglund:

Append the last name and first name into a single name field or something like that.

Bill Bejeck:

Exactly.

Tim Berglund:

Some very sophisticated stream processing that would take years of research.

Bill Bejeck:

Exactly. But with the DSL, it's the easiest way to get started and it's very opinionated with what you can do. But you're never telling Kafka Streams how to do it, you're just saying what you want to do to the record. So in that respect, the burden on the developer is very light. Burden's not the right word, but you're really just focused on what you want to do. But as with any other framework, it's never going to solve every single problem for everybody. So there's something called the Processor API, which is the ... not the opposite, but you're wiring everything up. With the DSL, you're just saying builder.stream(), operator, operator, operator, each returning a modified KStream instance. And how it all gets wired up is handled for you. And then, you're really just providing the little bits of logic, because, under the covers, Kafka Streams has processors. But you're not providing all the code for that, you're just providing the snippet that does, when I mentioned the predicate, you're just providing ... Usually it's just a lambda.

Tim Berglund:

Usually, yes, you pass in a lot of Lambdas.

Bill Bejeck:

Yeah, exactly. And you're just passing in the thing that does the evaluation. But under the covers, there's a processor, if you will, that's got a little more code to it. But that's all handled for you. With the Processor API, you provide the entire processor. There's an interface you have to implement and you provide ... It's still pretty easy to do, but you're providing all the code there. But it gives you the ultimate in flexibility. If there's something ... This comes into play more with stateful operators, but if there's something you really want to do that's custom to your business, you can implement that. And really, with the Processor API, you're unlimited in what you can do.
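
As an illustration of what providing the entire processor can look like, here's a sketch against the newer Processor API interface; the FilterProcessor name and its predicate are hypothetical:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

// A hand-rolled processor: you implement the interface and own all the logic,
// including deciding which records to forward downstream.
public class FilterProcessor implements Processor<String, String, String, String> {

    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context; // keep the context so we can forward records later
    }

    @Override
    public void process(Record<String, String> record) {
        // The moral equivalent of the DSL's filter(): only matching records move on.
        if (record.value() != null && record.value().contains("priority")) {
            context.forward(record);
        }
    }
}
```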

Tim Berglund:

Right. I don't think I've ever said this on air before, I need to confess something. I love Kafka Streams, that's not a confession, that's just bona fides. I think it's an amazing solution.

Bill Bejeck:

I do too.

Tim Berglund:

I know. You've dedicated some evenings to this. I've never liked the fact that we say DSL. I feel like we should say the fluent API or something like that. It's not a DSL. It's an API with nicely named methods, but, I don't know. Is this common? Am I the only one?

Bill Bejeck:

No, but that's great that you say that because that's what I was struggling for before, I was thinking of the fluent builder pattern. I didn't want to say that because it's not a builder. But fluent API is correct. And that's what you're doing, you're just going along, chaining these methods and then-

Tim Berglund:

And it should not be otherwise. I mean, it's a functional stream processing API, it should use the fluent builder pattern. It's really good, it's super easy to read. I mean, you can write streams code that's not easy to read, but that's your fault. It's not the API, the API is pretty easy to understand, so ... Anyway, we don't need to ... It's not a DSL. But when you're writing the builder code, the DSL, I'll just say it, it's interesting that that code is creating data structures. That code executes, it's creating data structures inside the builder, which then get operated on in these processors and inside workers and threads and all the stuff in the framework that you shouldn't necessarily need to care about. When you're doing the processor thing, it's like, okay, here's a message. You've got access to all this state and storage and all these parts of the framework. You do what you want.

Bill Bejeck:

Yeah. That's a good point you raise because when you're using the fluent API, the DSL, under the covers it's building a topology and it's wiring everything up. Because when you create a source node, and your source node [inaudible 00:17:37] a topic, it's going to consume those records or get those records in. But then it needs to hand them off to another processor for it to do something. And when you're using the fluent approach, that's handled for you. You just say builder.stream() and that sets up the source node. And, to go back to the filter, you say .filter(). Under the covers, that's establishing the parent/child relationship, where the source node is the parent of the filter.

Tim Berglund:

Right.

Bill Bejeck:

But then if you put-

Tim Berglund:

Not modifying that first stream, you can't.

Bill Bejeck:

No, no, no, no.

Tim Berglund:

[crosstalk 00:18:13] a new one. Yeah.

Bill Bejeck:

Yeah. And so, that's the parent of the filter. So it's going to feed its records to the filter operator. Filter node, if you will. But then, if you do something after that, which you will, you were going to filter records, well then you want to do something with the ones that pass through the filter. That becomes the parent of the next node and so on. So it builds up this topology, which is really just a directed acyclic graph. DAG.

Tim Berglund:

Always a DAG in things like this.

Bill Bejeck:

And so these records come in and they flow through, and a parent can have more than one child node. And so, my point in bringing that up is, when you go to the Processor API, you have to set up those relationships. You create your processor node and declare it as a source node, and you give it a unique name. And then when you create your next processor, part of the arguments you provide is the parent name or names that are going to feed that processor.
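
Here's a hedged sketch of that explicit wiring, reusing the hypothetical FilterProcessor and topic names from the earlier sketches; each node gets a unique name, and downstream nodes name their parents:

```java
import org.apache.kafka.streams.Topology;

Topology topology = new Topology();
topology.addSource("source-node", "orders");                                // reads from the topic
topology.addProcessor("filter-node", FilterProcessor::new, "source-node"); // parent: source-node
topology.addSink("sink-node", "priority-orders", "filter-node");           // parent: filter-node
```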

Tim Berglund:

Ah, got you.

Bill Bejeck:

So you're explicitly [crosstalk 00:19:29]. Yeah. And it's not hard, but it's more, I like to use the term boilerplate type work.

Tim Berglund:

Yeah, yeah, you're doing more stuff yourself.

Bill Bejeck:

Yeah. Yeah.

Tim Berglund:

So my attempts at being rapid-fire are failing. We're going to go over, everybody. This podcast is going to go a little long, but not too long. Bill and I need to get through this stuff. And the DAG thing, it's funny, you should be able to look at the fluent builder API and just smell a DAG. It's like walking down South Broadway by all the antique shops, there's this thing that you smell. It's just, you know that it's there. And so, yeah, good point that you bring up that particular data structure. Tables, we didn't really define streams. We're talking about Kafka, there are topics, there are messages, I get it. Tables now are a part of our life. What are tables all about?

Bill Bejeck:

Sure. So I mentioned KStream before. A KStream is considered an event stream. And what that means, and I'll tie this in with tables in a second. Under the covers it's Kafka, so everything's a key-value pair. And in an event stream, records where the keys are the same don't relate to each other. If we have a banking app and we're looking at customer interactions, let's just say the key is the customer name. We see some records coming in where Tim Berglund's the key, and even though we've got multiple records with Tim Berglund as the key, they're not related to each other. They're independent events. Now, a KTable is considered an update stream. And what that means is, now the records where the keys are the same are related to each other. They're considered updates to previous ones.
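
In code, the distinction is just which abstraction you ask the builder for. A sketch, with hypothetical topic names and default serdes assumed:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// Event stream: records with the same key are independent events.
KStream<String, String> interactions = builder.stream("customer-interactions");

// Update stream: a new record for an existing key replaces the previous value,
// so the table reflects only the latest value per key.
KTable<String, String> latestByCustomer = builder.table("customer-latest");
```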

Tim Berglund:

Got it. So my customer ID is 24601, and every time that shows up, that's a Tim Berglund thing, and subsequent ones are an update to the previous one.

Bill Bejeck:

Exactly. And banking is a great example. You withdraw cash. So that's your latest action. And then you're going to deposit cash or whatever, and it doesn't make sense what I just said, but still, those subsequent things, the subsequent action, is an update to the previous one. So you really only want to ... usually you're only concerned about the latest one.

Tim Berglund:

The latest one. So a table gives you that, an in-memory view of the current state of those users, or whatever that object is in the stream.

Bill Bejeck:

Yeah, yeah. And I guess maybe stock prices are a better analogy.

Tim Berglund:

Yeah. I always use a user account. So you could put that in a database table, but if you're doing this in an event-driven way, every time somebody updates, creates, or deletes, you're producing a message with a serialized copy of the user object.

Bill Bejeck:

Yeah. And you had mentioned keeping ... It was a great segue. You mentioned, okay, we have a table now and it's got this in-memory version, if you will. Whatever. Actually, KTables lead us into stateful [crosstalk 00:22:52].

Tim Berglund:

Just going to pivot us there, so let's do it.

Bill Bejeck:

And a KTable is ... KStreams are stateless and they don't keep anything. Records come in, you do a filter, it comes in, true/false, it's done. A KTable is stateful, and by default Kafka Streams ... they're just [inaudible 00:23:15] underneath the covers, and by default, it uses RocksDB as the implementation for persistent stores. Now you can, if you want to use in-memory stores, there's [crosstalk 00:23:26].

Tim Berglund:

Or you could plug in anything else you wanted if you wanted to, right?

Bill Bejeck:

Yeah, and also with the API, there's just an interface, a state store supplier, I believe. You just have to-

Tim Berglund:

Somebody did that with ScyllaDB once. I talked to a person who had done that.

Bill Bejeck:

Yeah. And you just have to provide it and it just has to adhere to a few methods on the interface and that's it. But by default, it's stateful. I'm sorry, it's a persistent store, on local disk. And that's one of my favorite things, it's on local disk. This means that when a record comes in and it goes to look up the value, it's key-value stores, it's right there. There's no network hop, no going over the wire, it's right there on local disk. And then, they're backed up by what's called a changelog topic. So you never really have to worry about it. You lose your Kafka Streams instance, struck by lightning, boom, gone. All that data that was in that store has been persisted in the changelog topic. So when you spin up an instance to replace the one you lost, it's going to replenish the state store from the changelog topic. So you're back in business. And that's-

Tim Berglund:

On local disk, and it's in the cluster if you need to go to the network and bring it in off the cluster. It's persisted there and replicated, like everything else in your cluster is.

Bill Bejeck:

Yeah, exactly. And as I mentioned, you can opt in and use in-memory stores. In-memory is going to be a little faster for you because it doesn't have to read from disk. But when you shut down an instance, even if it doesn't get struck by lightning, you just decide you need to turn it off [inaudible 00:25:10], it's going to lose everything because it was only in memory. But when you start it back up, again the changelog topic, which is your best friend, is going to replenish it. So even though it's in-memory, you pick up right where you left off.
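
A sketch of how you might choose between the two store types, assuming the hypothetical interactions KStream from earlier; Materialized and Stores are the real API entry points, but the store names are made up:

```java
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

// Default flavor: a persistent RocksDB store on local disk, backed by a changelog topic.
KTable<String, Long> counts = interactions
    .groupByKey()
    .count(Materialized.as(Stores.persistentKeyValueStore("counts-store")));

// Opt-in flavor: an in-memory store. Faster reads, contents lost on shutdown,
// but replenished from the changelog topic on restart.
KTable<String, Long> countsInMemory = interactions
    .groupByKey()
    .count(Materialized.as(Stores.inMemoryKeyValueStore("counts-mem-store")));
```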

Tim Berglund:

Tell us about timestamps. This is a tough one to do quickly, I know. What's time all about?

Bill Bejeck:

Okay. Back in Kafka, I think it was 0.10, timestamps were introduced to the records. So when you produce a record ... Part of the producer record, one of the overloaded constructors, you can give it a timestamp. If you don't, the producer puts one in for you, and timestamps do all kinds of wonderful things. That allows the brokers to decide ... You can specify how long you want records to live, and the timestamps help drive that. It's going to go find, okay, this record is past the time, whatever. Or the segment. I'm sorry, I'm probably going into too much detail, but it finds a segment that's ready to go and will delete that segment.
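
For instance, one of the overloaded ProducerRecord constructors accepts an explicit timestamp. This sketch assumes an existing KafkaProducer<String, String> named producer and made-up topic, key, and value:

```java
import org.apache.kafka.clients.producer.ProducerRecord;

// Explicit event timestamp (partition left null so the partitioner decides);
// use a constructor without a timestamp and the producer fills one in for you.
long eventTime = System.currentTimeMillis();
producer.send(new ProducerRecord<>("customer-interactions", null, eventTime, "24601", "withdraw-cash"));
```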

Bill Bejeck:

Okay. So we've talked about timestamps. Kafka Streams uses those timestamps for driving behavior with stateful applications. With windowing and ... Let me back this up. There's a notion of stream time, which is the highest timestamp that it's seen so far. And Kafka Streams uses those timestamps to determine which record to process next. So it's going to look at its input partitions, and the topic partition with the lowest timestamp on it is what gets picked next for processing records.

Tim Berglund:

Got it.

Bill Bejeck:

And then, well, I guess this kind of ... So we've got this notion of stream time, I mentioned-

Tim Berglund:

Which you can't really talk about without talking about windows. So if you want to go to windows, go to windows.

Bill Bejeck:

Yeah. And so stream time is the ... I'm sorry, did I say lowest? Stream time is the furthest timestamp you've seen so far.

Tim Berglund:

Okay. Most recent.

Bill Bejeck:

Most recent timestamp.

Tim Berglund:

But most progressed in time.

Bill Bejeck:

And the significant thing about that is, as records come in with timestamps that are greater than that, that's what advances stream time. But if you get what's known as an out-of-order record, one that's earlier than stream time, it doesn't advance stream time. That record gets processed, but it doesn't advance. So that helps us pivot into windowing. Stateless operators, I'm sorry, stateful operators. Think of an aggregation, a count is a good one, you just want to do a count of [crosstalk 00:28:22].

Tim Berglund:

How many times user 24601 updated his account in the last five minutes. Weird aggregation, but let's go with it.

Bill Bejeck:

Yeah. So records come in, you're keeping track by key, and you're going to do a count. But that's just going to keep growing over time. Customer 24601, let's just say this person's really active all the time, that count just keeps getting bigger. So windowing gives you a way to bucket it, for lack of a scientific term. And you can say, I only want to know ... well, not that I only want to know, it'll give you the count for a defined window. And there are different windows in Kafka Streams. You've got what's called a tumbling window: you say windowed by a tumbling window of, say, an hour, and it's going to give you the count for an hour. The last hour when that customer did something.
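
A sketch of that tumbling-window count over the hypothetical interactions stream from earlier, with one-hour windows:

```java
import java.time.Duration;

import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

// Tumbling, epoch-aligned one-hour windows: the count is bucketed per window,
// and windows advance based on record timestamps, not the clock on the wall.
KTable<Windowed<String>, Long> hourlyCounts = interactions
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofHours(1)))
    .count();
```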

Tim Berglund:

[crosstalk 00:29:21] Kafka sets at the end of that hour.

Bill Bejeck:

Yeah, exactly.

Tim Berglund:

Those are epoch-aligned, right? So that's on an hour boundary.

Bill Bejeck:

Exactly. But driven by the time, this is where the timestamps come into play. The window is driven by the timestamps on the records. So customer 24601, super busy, but then he doesn't do anything for ... let's say he stops working at 12:00. His last record's at 12:00. Another record doesn't come until 2:00 PM by our clock on the wall, that's still within the window of the aggregation.

Tim Berglund:

Yes.

Bill Bejeck:

So it's driven by the timestamps. So the windows only advance based on the timestamps of the records themselves.

Tim Berglund:

Got you. Oh, okay, so the record has to happen. There's no thread sitting there with a timer saying, "Okay, your window's closed."

Bill Bejeck:

Exactly. Exactly. So the timestamps of the records are what drive the behavior of the windowing.

Tim Berglund:

Yeah. Yeah. Vastly simpler that way. I wouldn't want to think of the timers, that sounds terrible. I'm sorry I even said that on air. We're not going to edit it out though, I'm just going to take responsibility for those words. Finally, last question, testing. If I want to do unit testing of a streams job, a streams topology, and I don't want to have a Kafka Cluster around because that wouldn't be a unit test and it would be overall terrible. What's my story?

Bill Bejeck:

Yes. That's one of the, and again, a great gem in the Kafka Streams library, there's something called the TopologyTestDriver. What that does is allow you to write an end-to-end unit test of your Kafka Streams application. But there's no broker involved. So you provide and specify the input records, and it'll run them through your entire topology. Even if you have state in there, your stateful operators, it hits those. And then you extract the output, there are methods for saying, okay, give me the output records. And then you can validate your entire topology against whatever final output you're expecting.
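
A sketch of what that can look like, assuming the hypothetical filter-and-uppercase topology and builder from the earlier sketches:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.Properties;

import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted, no broker involved

try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
    TestInputTopic<String, String> input =
        driver.createInputTopic("orders", new StringSerializer(), new StringSerializer());
    TestOutputTopic<String, String> output =
        driver.createOutputTopic("priority-orders", new StringDeserializer(), new StringDeserializer());

    // Pipe a record through the whole topology and assert on the typed output.
    input.pipeInput("24601", "priority order");
    assertEquals("PRIORITY ORDER", output.readValue());
}
```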

Tim Berglund:

Nice. And those final outputs come in terms of your types. Your [inaudible 00:31:44] objects, whatever the types of the streams or tables or whatever it was. So it's not like Kafka ByteArrays or anything like that.

Bill Bejeck:

Yeah, exactly. And it comes out with the expected types. And if you do need to do more of an integration-type test, I would point people to Testcontainers. I was pointed to that by my good friend Viktor Gamov, and Testcontainers is very nice because-

Tim Berglund:

A long-time advocate of Testcontainers before it was cool.

Bill Bejeck:

Yeah. And it really makes ... if you need to use a [inaudible 00:32:26] broker, it makes life a lot easier for you so that you're not handling ... It integrates very well with the JUnit test framework.
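
For example, a skeletal JUnit 5 integration test with Testcontainers might look like this; the Docker image tag, class names, and test body are assumptions:

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

@Testcontainers
class StreamsIntegrationTest {

    // Testcontainers starts (and tears down) a real, throwaway Kafka broker in Docker.
    @Container
    KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.0.0"));

    @Test
    void runsAgainstARealBroker() {
        // Point the app at the container's broker instead of managing one by hand.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "integration-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
        // ... build the topology, produce real input, and assert on the output topic.
    }
}
```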

Tim Berglund:

You thinking about writing any other courses?

Bill Bejeck:

Yes. I would love to do a course on Schema Registry. And part of that course would be multiple events in a topic, because it's not a ... I wouldn't say it's like a use case. Everyone needs [inaudible 00:32:55], but when you need it, there are some extra considerations you [inaudible 00:33:00]. But there are cases where you have records that need to be distinct objects in your domain model, if you will, but they're closely related, so you'd like to process them in the same stream. And I find that to be useful ... That could be a useful thing to do, but there are some things you need to know. Tricks of the trade, if you will, that you need to know about doing that.

Tim Berglund:

Well, I love your work. So I hope we get to do that one soon.

Bill Bejeck:

Yeah, all right.

Tim Berglund:

My guest today has been Bill Bejeck. Bill, thanks for being, once again, a part of Streaming Audio.

Bill Bejeck:

All right. Thanks for having me, Tim.

Tim Berglund:

And there you have it. Thanks for listening to this episode. Now, some important details before you go. Streaming Audio is brought to you by Confluent Developer. That's developer.confluent.io, a website dedicated to helping you learn Kafka, Confluent, and everything in the broader event streaming ecosystem. We've got free video courses, a library of event-driven architecture and design patterns, and executable tutorials covering ksqlDB, Kafka Streams, and core Kafka APIs. There's even an index of episodes of this podcast. And if you take a course on Confluent Developer, you'll have the chance to use Confluent Cloud. When you sign up, use the code PODCAST100 to get an extra $100 of free Confluent Cloud usage. Anyway, as always, I hope this podcast was helpful to you.

Tim Berglund:

If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on the YouTube video if you're watching and not just listening, or reach out in our community Slack or forum. Both are linked in the show notes. And while you're at it, please subscribe to our YouTube channel and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support and we'll see you next time.

Kafka Streams is a native streaming library for Apache Kafka® that consumes messages from Kafka to perform operations like filtering a topic’s messages and producing output back into Kafka. After working as a developer in stream processing, Bill Bejeck (Apache Kafka Committer and Integration Architect, Confluent) has found his calling in sharing knowledge and authoring his book, “Kafka Streams in Action.” As a Kafka Streams expert, Bill is also the author of the Kafka Streams 101 course on Confluent Developer, where he delves into what Kafka Streams is, how to use it, and how it works.

Kafka Streams provides an abstraction over Kafka consumers and producers, minimizing administrative details like the framework code you would otherwise need to write and manage when using plain Kafka consumers and producers to process streams. Kafka Streams is declarative: you state what you want to do, rather than how to do it. Kafka Streams leverages the Kafka consumer protocol internally; it inherits its dynamic scaling properties and uses the consumer group protocol to dynamically redistribute the workload. When Kafka Streams applications are deployed separately but have the same application.id, they are logically still one application.

Kafka Streams has two processing APIs. The declarative API, or domain-specific language (DSL), is a high-level language that enables you to build anything needed with a processor topology, whereas the Processor API lets you specify a processor topology node by node, providing the ultimate flexibility. To underline the differences between the two APIs, Bill says it’s almost like using an object-relational mapping framework (ORM) versus SQL.

The Kafka Streams 101 course is designed to get you started with Kafka Streams and to help you learn the fundamentals of: 

  • How streams and tables work 
  • How stateless and stateful operations work 
  • How to handle time windows and out-of-order data
  • How to deploy Kafka Streams

