If you subscribe to this podcast, you already know who Gwen Shapira is. But if you're new and you don't, it's okay, because you can meet her today as we talk about the six things she wants everyone to know about Kafka in honor of Confluent's sixth birthday. It's all on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.
Hello and welcome back to another episode of Streaming Audio. I am joined today by my Confluent podcast co-host, Gwen Shapira. Gwen, welcome.
Hey, Tim. So awesome to be here.
I love it when we do shows together. And you know I think we don't do it enough, and we should do more, but I'm glad you're here.
Super happy to be here.
Now the occasion, besides the fact that it's just good to talk, is that Confluent is turning six. The company's sixth birthday is this month. And, boy, I'm embarrassed. I don't know the day. What's the day? Is it the 11th? I thought it was [crosstalk 00:01:10].
I don't know the day either.
Like some otherwise famous day, or infamous. Anyway, I guess we're both fired for not knowing enough Confluent trivia. But it is in September and this is six years. And just a couple of weeks ago, you and Michael Noll and Ben Stopford and I talked about you all having been in the company for five years. So this is a similar occasion. You're a long-tenured employee, who everybody knows and has made massive contributions to the community. And so, just thought, "Hey, we should talk about Confluent turning six." And in honor of its sixth birthday, I wanted to ask you for a list of six things you think people should know about Kafka.
Wow. That's a big one.
Yes, it is. There are-
... so many things.
Can I start by lodging a protest that Confluent's turning six, and I did not get a cupcake? You know that if we still had an office, we would be getting cupcakes.
We would so be getting cupcakes. And I agree, that is terrible. And I hope we're meeting in our office again by next June, because that's my birthday, and I want to be there for my birthday. I think I was last year, and I ate a cupcake, and I was on Instagram and all those kinds of things that a guy like me likes. Well, I guess I like being on Instagram more than I like cupcakes, but anyway... Yes, I agree. Protest lodged.
Anyway, six things people should know. And this is the woman who has literally written the book, who has helped many actual paying customers, building real systems with money on the table, make those systems work, and who is now an Engineering Manager on the Cloud Kafka team. I didn't really need to introduce Gwen to you, listeners, but you should probably listen to her list. So what's first?
Yeah. I think the first thing is that people sometimes forget that Kafka has brokers and clients.
And the clients are actually like... They're not like REST, where you have a very, very thin layer on top of a protocol. They actually have a lot of logic in them, and they're a core component of your system in a way that's similar to how the brokers are. And people kind of forget about that. They don't always collect metrics from the client. Clients have very useful metrics about latency, about errors. If a client tries to send something, and the broker responds with an error, the client will actually count how many errors and how many retries there were. All those things are presumably important, and people just forget to look at them.
And then, clients also have logs. They could have important error messages that you probably should collect and look at and maybe even alert on. So it's an important part of the system. And I know people who are into distributed tracing never forget about the clients, because for them, it's another [inaudible 00:04:20] in their spans and traces, and it's one of the things that they always look at. But I don't know if distributed tracing is a universally-used practice yet. So regardless, don't forget that clients exist and have important information.
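To make the kind of client-side metrics being discussed here concrete, the Java clients expose them over JMX. A sketch of a few metric names worth watching (these names are from the Java client; other clients expose similar data differently, and availability can vary by version):

```properties
# Producer-side JMX metrics worth watching (Java client).
# MBean: kafka.producer:type=producer-metrics,client-id=<your-client-id>
#   record-error-rate     - records/sec that ultimately failed after retries
#   record-retry-rate     - records/sec that were retried
#   request-latency-avg   - average broker round-trip latency in ms
#
# Consumer-side:
# MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id=<your-client-id>
#   records-lag-max       - maximum lag in records across assigned partitions
```

Hooking these into whatever you already use for broker metrics (a JMX exporter, for instance) is usually all it takes to stop flying blind on the client side.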
Yeah. Good. I think a lot of people don't realize how much logic is in the client. I just think of things like consumer group rebalancing. You're tempted to think that's a thing the brokers do, that they just kind of tell you what partitions you have, but there's a lot of code in the client to do that.
And [inaudible 00:05:02] and also for the producer: the way we manage batches, the way we manage reconnects with specific brokers, leader elections, transactions for that matter. A lot of stuff is happening, both on the producer and on the consumer. Obviously, if you're a Confluent Cloud user, the clients are all you have. We have a Metrics API, so you can look at some broker metrics. We aggregate them. We clean them up. We don't want you to panic about stuff that we, at the broker end, are taking care of for you. But the clients are all yours. They're on your side. You're responsible for them. You should be monitoring them.
Exactly. And that's... I always try to nuance that when we talk about Cloud lifting the operational burden. It would be typical language coming out of, maybe, a marketing team to overstate that and say, "Well, all the operational burden is gone." Well, no. Operating services in the Cloud is still a thing. It's easier, and you want that, but you've still got your programs, and you need to operate those, and monitoring is part of that.
I feel like everyone who is really running stuff in the Cloud is probably not just running Kafka in the Cloud. They're running a lot of stuff in the Cloud, so it's kind of hard to miss that you still need to manage everything you have in the Cloud.
I've spent a lot of this morning on the phone with AWS support, trying to get them to reproduce a certain load balancer issue to see if our fix actually handles the problem. Technically, it's a [inaudible 00:06:44] managed load balancer, and I shouldn't be spending the morning trying to figure out if my workaround for an issue actually fixed the problem or not, and yet here we are.
Uh-huh (affirmative). But as we know, and I think you and I have talked about this kind of thing before, that's the sort of halfway-in-the-background, cloud-managed networking resource that isn't part of the typical perspective you take on stuff in the Cloud. You're thinking of storage and compute, and maybe some other kind of thing that does something interesting between storage and compute. Those are the biggies. And, "Oh, yeah. Sure. There are load balancers. I have one in my Terraform config. It gets set up. Whatever." It's just kind of back there. But there are a finite number of those, and they don't always work right, and you have to know these things. But anyway... Clients have metrics.
Okay. So [crosstalk 00:07:31].
The other thing that I really wish everyone paid attention to, or more attention to, is bug fix releases. So Apache Kafka has major releases, things like 1.0, 2.0, and the soon-to-arrive 3.0. And then we have minor releases, things like 2.1, 2.2, and the just-released 2.6. And then, for every one of those, we also have bug fix releases, so 2.5.1, 2.5.2, et cetera. And the bug fix releases have fixes for bugs. They don't have new features. We are very picky about what we backport into bug fix releases. It has to be well tested. It has to be low risk. And the goal is that people who encounter bugs will install a bug fix release and will have their bugs fixed and not get any new bugs. And this is something we've been putting a lot of effort into. So basically, monitor the Kafka mailing list for release announcements. And if you see a bug fix release for the version you're on, you should install it with celerity. I learned a new word recently, and now I'm overusing it.
Ooh. Okay. What is that word again?
I'm probably mispronouncing it, because I learned it from a book, but it means with urgency, with a sense of urgency, basically. I'm reading the General Grant biography, and... I don't know why, but for half the book he's doing stuff with celerity.
Well, okay. That is how you should approach them. So bug fix releases include fixes for bugs. But, Gwen, there isn't a release video for those with me in front of a river somewhere, and there isn't a blog post. So I guess people just think they're not exciting, or what?
That's the whole point. They should not be exciting. Excitement is good, but when you run stuff in production, you very often try to minimize excitement.
The goal of bug fix releases is absolutely to minimize the amount of excitement in your life. For example, there are bugs that get your replicas stuck in a way that doesn't allow a broker to come up, and then you have to delete a bunch of files and re-replicate in order to even start the thing. That's super exciting. Some of the most exciting nights of my life have been spent trying to recover from those bugs. And we're trying to minimize excitement. You know, rivers are amazing. You can do a lot of exciting things on rivers. You can go rafting and things. I don't really feel like Kafka releases should be your source of excitement in life.
You want them not to be. Exactly. You want the excitement to be all of the great systems you've built on top of Kafka and the great applications that you've written that serve people. That's exciting.
And the time you don't have to spend troubleshooting all those bugs that were already fixed, and therefore, you can go do rock climbing and river rafting and mountain biking [crosstalk 00:10:46].
Or film making or hardware hacking or home brewing or maybe you write poems. Any of those kinds of things are there for you.
I heard kombucha is where it's at these days.
I have a friend who thinks that. That's all I'll say. He definitely-
It's [inaudible 00:11:02] an acquired taste.
Yeah. No, I'm not anti-kombucha. It's just he's definitely... He's brewing his own, and he's pretty into it. His name is Tim, also. So, Tim, shout out to you and your kombucha. You and Amber and the kids are awesome. Okay. So install bug fix releases, people. Clients have metrics. Bug fix releases include fixes for bugs, which are good. What is your third thing you want people to know about Kafka?
The fact that you can switch a small config and avoid duplicates, which is almost [inaudible 00:11:41] exactly once for 90% of the use cases.
Right. For single-partition use cases, that gets you there.
Is that statement true? Okay. I think that statement's true.
Yeah. [inaudible 00:11:56] it is. And a lot of use cases still use Kafka as a big pipe.
So people write stuff into Kafka, and then they use a connector to dump it into BigQuery. That's still a very, very common pattern. And I think the idempotent producer is a pretty big deal in those cases.
Yeah. Yeah. Could you talk to us about how it works? I know it's a config switch. You said that. And listeners, if you didn't know that, it is. But walk us through it. First of all, if there's anybody who's new to this and doesn't know what idempotent means, tell them what that means. And then, how does it get the job done?
Yeah. It's funny. I forgot that there are maybe people who don't know what idempotent means. So idempotent basically means that if you do the same operation one after another, it doesn't change the result. Normally, without it, if you produce an event to Kafka, and then you produce the exact same event again, you'll have the same event twice. Even worse, if you produce an event once, and the producer sends it to Kafka, and Kafka gets the event and writes it to the log, and for whatever reason the producer doesn't know that it happened and automatically retries and sends it again, you will get the same event twice and not even know it. You didn't intend for it to show up twice. You didn't do anything different, and here you are with the same event twice.
Idempotency basically means that no matter how much the producer retries, you will only get the event one time, which, for a lot of use cases where you count stuff, or anything financial, is obviously very, very important.
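The "small config" under discussion is the producer-side idempotence flag. A minimal sketch of what that looks like in a Java producer's properties file (the companion settings and defaults shown here are from recent Kafka versions and may differ in yours):

```properties
# Minimal settings for the idempotent producer.
enable.idempotence=true
# Idempotence requires acknowledgement from all in-sync replicas.
acks=all
# Retries are now safe: retried sends are deduplicated by the broker.
retries=2147483647
# Must be 5 or fewer for the broker to track per-batch sequence numbers.
max.in.flight.requests.per.connection=5
```

In newer Kafka versions, setting `enable.idempotence=true` implies sensible values for the other three, but stating them explicitly makes the intent clear.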
Yes. So idempotency is... I always compare it to turning on a light switch in a room. So if you walk into a room and it's dark and you're groping for the switch and you swipe your hand up over the switch, it's on. If you do that a second time, it's still on. It's not any more on, and it didn't turn off. How does it actually work? Could you tell us what is going on under the covers?
Yeah. It's actually fairly simple, and like all good ideas, in my mind, it's [inaudible 00:14:04] from TCP, anything around reliability. I feel like TCP pioneered a lot of reliability ideas. So we have sequence numbers, and when the producer sends an event, it attaches a sequence number. And if the broker gets the same sequence number twice from the same producer... that bit is important... it basically says, "Hey, sorry, I already got that." And the producer will log, "Hey, I have a duplicate," and the broker will not write the duplicate. And I think it shows up as either an info or a warning in your logs, so you will see that it happened, but it's nothing to worry about, because it means that stuff worked as expected, and you only got the event once.
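To make the mechanism concrete, here is a toy Python sketch of the broker-side bookkeeping Gwen describes: the broker remembers the last sequence number seen per producer ID, acknowledges a retried duplicate, but only writes events whose sequence advances. This is an illustration of the idea only, not Kafka's actual implementation, which also deals with batches, producer epochs, and out-of-order edge cases.

```python
# Toy sketch of broker-side idempotent-producer bookkeeping.
# Each producer attaches (producer_id, sequence); the "broker" appends an
# event only if the sequence advances past the last one it has seen.

class ToyBroker:
    def __init__(self):
        self.log = []        # the "partition"
        self.last_seq = {}   # producer_id -> last sequence number written

    def produce(self, producer_id, seq, event):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # A duplicate (e.g., a client retry): ack it, don't write again.
            return "duplicate"
        self.last_seq[producer_id] = seq
        self.log.append(event)
        return "written"

broker = ToyBroker()
print(broker.produce("p1", 0, "order-123"))  # written
print(broker.produce("p1", 0, "order-123"))  # retry of the same send: duplicate
print(broker.produce("p1", 1, "order-456"))  # next sequence: written
print(len(broker.log))                       # 2 - the retry was deduplicated
```

The per-producer key in `last_seq` is the "that bit is important" part: sequence numbers only mean anything relative to a single producer ID.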
So that's from a single producer which has a unique ID-
That's a big part of it. The key is the unique producer ID.
And it turned out that just a unique producer ID and a sequence number was not enough. There are a lot of tricky edge conditions. For example, how do producers know the next sequence number to use? And so, there is a lot of stuff around that. And one of the edge conditions that was really interesting... I don't know if you've hit it. You've probably seen all kinds of errors around "unknown producer," that kind of thing.
No. No. Go on.
You have not seen those? That's good.
Yeah. So there is some logic around what happens if two producers show up with the same ID and how to handle that. And that actually got a bit better in... I'm trying to remember the release... 2.4, I want to say.
That sounds right. There was something idempotent producer-related in 2.4 that I remember, but my memory for these things is awful.
[crosstalk 00:16:02]. I remember the KIP, because it was KIP-360, and that's just a really cool number. So a lot of improvements around the producers and transactions, and basically fencing: if something restarts, how do you fence things off? A lot of that ended up over there. So, as I said, it's usually good to be on newer versions, where we fix things, but...
Oh, and even starting before that, we had some improvements. It used to be that you could have only one in-flight request in order to enjoy the idempotent producer. You can now do it with more. So there have been a lot of improvements that weren't in [inaudible 00:16:45] releases [crosstalk 00:16:46].
Yeah. Just getting better and better. [crosstalk 00:16:49].
Is generating the sequence number on the... This is an unreasonably detailed question, and I almost feel guilty for asking, but... Does generating the sequence number on the producer side affect the thread-safety of the producer?
The producer itself always has one thread, so you can [inaudible 00:17:07] send from a lot of threads, and that will actually be [inaudible 00:17:11] correctly. [crosstalk 00:17:13].
There you go. Because internally, there is one thread doing that...
... which is-
That's one of, I think, the best design decisions that has happened to the producer, and it's... Obviously, it's the Java producer we're talking about.
It's not true for like... I don't know what other clients are doing, but it's so much easier to reason about what is happening, especially around sequence numbers and retries, when you only have one thread to worry about.
Because otherwise... Yeah, with retries and multiple threads, I would get a different job. Without retries-
You would get a different job.
What'd you say?
Just saying that I think a lot of engineers...
Yeah, yeah. That's not pleasant stuff, locking on that kind of thing.
Yeah. [crosstalk 00:18:08].
So, the idempotent producer. There's a thread. It maintains a sequence number in the producer. Producers themselves have a unique ID. Given that, you can tell if you've seen this movie before, as it were.
All right. That is three. So we're up to Confluent's third birthday so far. I think I joined when Confluent was three. That was 2017.
And that's actually only shortly after we released exactly [inaudible 00:18:38] with them, but [inaudible 00:18:39] the producer, so I may still be doing things in a logical order.
Well, that blog post was, I want to say, right after I joined or right around the time I joined. And they were like, "Hey, Tim. You run the blog now." And I'm like, "Uh, what? Oh, okay. That's-
I don't remember that you run the blog, but I remember that, at some point, they wanted me to emcee Kafka Summit. And the moment you joined, I just [inaudible 00:19:06] out of there immediately.
I do remember that, however, I didn't mind.
Oh my God, and you're so good at it. [inaudible 00:19:15] out of that was the best decision I've made.
That's... Thank you. That's actually probably my favorite thing that I do at work. It only happens once a year. All right. Number four. Confluent's turning four. What is the fourth thing people should know about Kafka?
So I want to remain in that kind of 2016-ish, '17-ish frame of mind and remind people that Connect is part of Kafka. Kafka Connect is [inaudible 00:19:47]. There are connectors that get data from Kafka and write it elsewhere, or get data from elsewhere and write it to Kafka. And it is actually a part of the Apache Kafka project, in the sense that the framework is part of the Apache Kafka project. Connectors end up being all over the place. It's just funny. Sometimes, for fun, I go into GitHub and search for "Kafka connector Elastic" and count. And it's a high number, and it keeps going up, which I feel like... The success of the ecosystem is one of the most amazing things the Kafka community has ever done, and I feel like Connect is a huge part of it. I'm really proud of the decision to put the Connect framework in Kafka and just let [inaudible 00:20:36] connectors bloom, kind of thing.
Yes. And I... Connect is such a good example of some positive dynamics and design decisions in the Kafka community. And I just... If you don't mind, I want to riff on this, because it gets me going. First of all, if you're brand new to the subject matter and you don't know what Connect is, we will link to some Connect episodes in the show notes. I believe I've had Robin Moffatt on here talking about Connect. But briefly, it's a data integration framework. Imagine you knew how to write code that reads and writes Kafka, and you need to connect it to something outside, like you need to read from a topic and write to Elasticsearch. It's not that hard to do. You could just bang that code out. But there are all these other problems that crop up around it, and it really needs to be its own framework, and Connect is that framework. And it's part of Kafka.
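For listeners who haven't seen one, a connector is configured with a small JSON document posted to the Connect REST API rather than with code. A sketch of what the Elasticsearch example above might look like (the connector name, topic, and URL here are made up; the connector class shown is the commonly-published Confluent one, and option names can vary by connector version):

```json
{
  "name": "orders-to-elasticsearch",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "2",
    "topics": "orders",
    "connection.url": "http://localhost:9200",
    "key.ignore": "true"
  }
}
```

That config, POSTed to a Connect worker, is the entire "application": the framework handles the consuming, offset tracking, retries, and scaling across tasks.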
And so, two things there. It's part of Kafka, but as you said, Gwen, there are all these connectors that are these little pluggable pieces of code. They're JAR files at the end of the day that anyone can write. Yes, go search for an Elasticsearch connector. It's like searching for Ruby libraries for Kafka. I mean, how many do we need? Apparently, a lot. And when-
[inaudible 00:22:00] zero.
Yeah. "Always one more" is the answer. But Connect creates the opportunity for this ecosystem to emerge, which gives us a good opportunity to talk about licenses. Because Connect, like the server process, and, to be fair, the file system connector that always gets forgotten... Those are Apache-licensed open source things. Connectors are whatever their authors choose. And if you go to Confluent Hub... and I'll link that in the show notes, too. I always want to misspell that as Connect Hub.
I like that.
Yeah, me too. I think that's a positive thing. Connect Hub is good. Anyway, Confluent Hub. You just scroll through there, and there's open source, and there's community-licensed source-available, and there are extremely proprietary Oracle-licensed things. It's all over the place.
Yeah. Definitely. And Connect does have a lot of good design in it that enables that. And I think another thing that people miss is that it's not all connectors on the hub.
They're not. I mean-
.... that does not contain all known connectors.
You may get a transformation.
If you go to the Connect Hub and look, you could get...
Yeah. You are absolutely right. Okay. That's kind of cool.
Yeah. I mean... People just... Connect is just crazy customizable. It's not just that you can write connectors. You can actually write your own custom simple transformations to plug in as well.
Yes. And those are, again, pluggable little bits of JVM code. So say Connect is, on the source side, reading from some external system and then producing into a Kafka topic. Maybe there's information in the source system that, for regulatory reasons or security reasons, you don't ever want to be in a Kafka topic, so you can filter it out. Or maybe there's a field in a message that you want to be the key, so you can have a little transformation that will extract that field from the message and make it your message key, or whatever. Just little, stateless, functional transformations. It's a good design decision, because it's precisely the thing that you need, but limited enough that if you were tempted to get stupid and try to implement stream processing in Connect, you know...
Yeah. [crosstalk 00:24:25].
It would always be painful enough that you'd be motivated to stop.
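The key-extraction example Tim describes maps directly onto two single message transforms that ship with Apache Kafka, chained in the connector config. As a sketch (the field name and transform aliases here are illustrative):

```properties
# Single Message Transforms in a connector config: copy the "user_id" field
# from the value into the record key, then reduce the key struct to just
# that one field. ValueToKey and ExtractField ship with Apache Kafka.
transforms=makeKey,extractKey
transforms.makeKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.makeKey.fields=user_id
transforms.extractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractKey.field=user_id
```

Each transform sees one record at a time and is stateless, which is exactly the "precisely what you need, but limited" design being praised here.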
Yeah. I always felt like one of the best things engineers can do, the most powerful part of the engineering job, is to find really, really great APIs, because that creates a huge force multiplier. And I'm pretty proud of how Connect turned out. It keeps evolving. We keep adding stuff to the APIs. We keep getting KIPs, but [crosstalk 00:24:51].
Yes. Yes. And every major Kafka release has an interesting collection of Connect-related KIPs that have been merged into it. I talk more-
You talk about them on your videos.
I do, standing in front of some river. These days it's always a river in Colorado. It's a good thing we have several.
Good thing you're not in California. They're drying out.
Right. Right. No, we're doing okay here, although the wintertime releases are a bit of a struggle. You don't necessarily want to do those in the mountains. And I've got a river just 10 minutes from my house. It's the South Platte, and I did... I forget. I think maybe it was 2.4 was last January.
I recorded it in December, and it was fine. It was a nice day. It wasn't bad or anything. But up at 9,000 feet up in the mountains, it could be a struggle. So we'll see how that goes.
A good friend of mine in Colorado, I visited him in February, a few years back. And I remember it was minus... I don't know, 14 maybe? Snowstorm. Super crazy weather. We're all huddling next to the fireplace, and then he said, "It's time for barbecue." Sorry, "Time for grilling-"
... which is different from the barbecue.
Yes, it is.
And he literally opened his back door, stepped into a snowstorm on his back porch, and fired up the grill.
So all of that-
I've never been so impressed by someone in my life.
To me, as a nearly lifelong resident of Colorado, that just sounds utterly normal. Like, "Yes, you want to cook out, and... well, it snows sometimes, and then you do it in the snow." It's the really hardcore Colorado person who will do that in a sweatshirt and shorts. And I'm not that guy, but there's that guy who just wears shorts all winter, and I'm not-
I am trying to remember. There may have been shorts involved and [inaudible 00:26:46].
It would not look weird. I'm saying I can't do it, but it's, again, extremely typical.
Yeah, I feel like sometimes the rest of the world really underestimates how determined Americans can be about certain things like grilling.
Yeah, yeah. Like, "We're going to do it, and the weather doesn't matter." That reminds me, Coloradans and Norwegians have a lot in common there, because in Norway, everybody is very outdoorsy. You've got the gear. You've got the skills. You know how to go do things. Everybody, it seems, can build a fire effortlessly. There's just all that kind of skill set, which is common here, too. And you know you're going to get cold and wet, and they're like, "Sure. Yeah, it happens, but we've got this."
[inaudible 00:27:35] people living in cold places.
Yes. Yes. Okay. Connect. Did we... I feel like there's something else I wanted to say about Connect. Oh, yeah. The ecosystem, we've mentioned that, right? It's an API that naturally gives rise to an ecosystem, and that's been successful. The thing I like best about it is that it came along at a time in the life of Kafka when people were starting to use Kafka, and it wasn't much of a platform yet. It was a pipe, and people's thinking about it was very "pipey" at the time. And the early adopters were building their own pluggable data integration frameworks. And if everybody is doing that non-differentiated task in their own way, then everybody has a buggy, partial implementation of the framework component that ought to exist. And so, the community said, "Hey, let's build this once and pool effort on Connect as it should be, and give an opportunity for people to pool effort on connectors as they should be," which is open source problem solving the way it emerges in a healthy way. That's exactly the kind of need that open source projects find and fix.
You are so right, Tim. And I want to strengthen that a bit, because there are a lot of unhappy Cloud customers. Well, not a lot, but I feel like I end up talking to every single unhappy Cloud customer. And a good portion of them are people [inaudible 00:29:06] homemade implementations of something that should have been done in Connect, and it would have saved them a lot of pain if it had been Connect. And every single time, I ask, "Well, why didn't you use Connect?" And they say, "Oh, it was missing this small feature that was really, really important to us, and we couldn't live without it. So we had to rewrite the whole thing." Connect still takes contributions. Please consider fixing the tiny things that you really need in Connect, rather than writing your own homegrown Connect. I can make a fairly good bet that you will regret that decision somewhere down the line when it just explodes out of control. I've seen it.
I don't see it in small cases, but in every large use case, where you talk about tens or hundreds of megabytes, maybe gigabytes, per second, you get to a point where it's hard to get it right on your own.
Yeah. Yeah, it is. And I always remind people of the sage words of Admiral Ackbar: "It's a trap." You think this is the time you should write your own, and you're wrong.
It's never the answer.
It just isn't. And I know sending the pull request is scary if you're not involved in the project. You think, "Oh, no. They're going to get mad because I didn't write enough tests," or whatever. There's all of that if you're not a habitual open source contributor. There are significant psychological barriers to sending your one PR every year or every other year because there was this thing you needed. But the Kafka community, and that part of the Kafka community who are the committers, that very elite section of the community, they're not mean.
And they're not that elite either. I wouldn't want to be seen that way.
I mean, just statistically. You've got hundreds of thousands of people who use Kafka, and then the number of people who are committers is orders of magnitude smaller. So just statistically elite, not-
[inaudible 00:31:16]. I just hate it when people treat it as a status symbol rather than a job that you do, or the things that you've committed to doing for the community. For me, it just takes things in the wrong direction.
It does. It's a job, and it's a role of service. I mean, that's kind of-
Exactly. Exactly. That's how I like thinking about it.
Yeah. Yeah. So there aren't many of you, but you're not lording it over-
[crosstalk 00:31:47]. Yeah.
... those of us. And I include myself in this. I've never sent a commit to Kafka, and I seem to be getting along okay in life.
Yeah, [inaudible 00:31:57]. I'm also thinking that you've interviewed tons of committers on the show here, so people should be able to get a sense that they're very much normal people.
They are. Yeah, they are. They really are. So, send that PR. Listen to what Admiral Ackbar says.
Yeah. Anyway, we went way into a segue.
It was a good one though. This is Connect. Connect brings up philosophical issues, and they need to be explored. Okay. Number five. What is the fifth thing?
We're still slightly in Connect land, but more toward people who are actually using Connect. So if you use Connect, you may know that when you add a new connector or change a connector config on a running system, it basically does this really annoying thing where every single connector stops, refigures out the state of the world, and starts again. And I promise that there were very good reasons we had to do it that way, but it was pretty painful for users. And it's also just the surprise factor: who expects that changing one config on one connector will cause every single connector on the same Connect worker to freeze and try to figure out life from scratch?
So in, I think... I want to say 2.3, we introduced the cooperative rebalancing protocol that allows Connect to no longer do stop-the-world reassignments and rebalances. So you can make a change, and it will just do the minimal change necessary, and everything else will keep working as usual. Maybe one or two tasks will stop and move around if needed, but that's it. And everything else will keep on working. And this is just a marvel of engineering that I keep being impressed by. I just love the name, doing things cooperatively.
And I love the architecture behind it. We have two blog posts explaining it, one specific to Connect and the other for Kafka consumers and Kafka Streams. You can actually use the cooperative protocol with any Kafka consumer, which is even cooler. It's just that Connect had been the biggest pain point, I think. So the idea is that connectors and consumers can declare, "Hey, I am giving up those partitions or those tasks," and then someone else can pick them up, and we can manage this conversation just for those specific things, without this hard barrier of everyone giving everything up and then everyone starting from scratch. I think this is really awesome, a really, really great piece of distributed systems engineering. It took, I think... I don't know, three or four people to get this done across the board, and it's really amazing. I love it. So more people should know about it.
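On the plain-consumer side Gwen mentions, opting in is a single assignor setting (Connect workers manage their own version of this for you; the cooperative assignor for consumers arrived around Kafka 2.4, so this sketch assumes a client at least that new):

```properties
# Opt a plain Kafka consumer into cooperative (incremental) rebalancing.
# With this assignor, a rebalance revokes only the partitions that actually
# need to move, instead of stopping the world for the whole group.
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```

Rolling this out to an existing group takes some care, since all members need to be on a compatible assignment strategy during the transition.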
I do, too. People should know about that. This is the kind of thing where, going forward, if you've just started using Connect since that KIP was merged, or since, I guess, 2.3, you're like, "Oh, this is nice. I can reconfigure, and I'm not surprised by this thing I didn't like."
Yeah, that's the only thing that's sometimes bittersweet about improvements. The new people who join don't know how hard it used to be. We used to carry connectors uphill both ways.
Those connectors had to mill their own flour, but not anymore. You kids these days.
[inaudible 00:35:35] used to not cooperate with each other and now they do.
Which is a wonderful thing.
Okay. And I'm going to link to that KIP in the show notes because that's a good one to read about.
I keep remembering that you also had an episode about cooperative rebalances.
I think we did. Yeah. Okay.
So I... Here's a confession about myself as a podcast host: my memory for previous episodes is not good. People who know me well right now are saying, "Oh, your memory for previous episodes is the thing that's not good. Oh, I see." Well, fair enough. But I don't know... Probably, I could safely say my favorite podcaster is still a guy named Russ Roberts of EconTalk. I've ruthlessly stolen his sign-off. The way I sign off the show, I get from that podcast. He's been doing this for 10 or 15 years. And I don't know if it's his memory, or if it's in the notes, or a producer helping, but somebody will drop a name... He'll be interviewing some economist, and they'll say, "Oh, and the work of this other person," and he'll say, "Oh, yeah. Past EconTalk guest in 2012." I'm like, "Wow. I've done like 100 episodes now, and I forget things I've talked about." So anyway, that's just what it's like to be me. Gwen, what is the sixth-
It's kind of a humblebrag. Yes.
It is. I didn't mean it as a humblebrag, but maybe it's a humblebrag.
Yeah. So last thing everyone needs to know... I think the last thing is a warning and a promise, a threat, and a promise, maybe. You use Kafka and your data architectures will never be the same. Kafka will absolutely influence the way you think. I think, Tim, you mentioned Kafka having an agenda of its own [crosstalk 00:37:32].
Kafka has an agenda. I've said this. It absolutely does. You're so... Yeah, go on.
Yeah, it definitely... I think it's true for great tools in general that they want to be used in a specific way and you feel it when you use it. And I don't know if like, I mean... I'm into cars and cars have the way that they want to be driven, and you feel that when you drive.
So I feel the same is true for any good software, and really any good tool that you bring into your life. And Kafka, I think the big thing is that Kafka is really about sharing, in my mind. It will enable people in your organization, I'd say empower them, to do things with data that they probably wouldn't have even considered before Kafka showed up, because suddenly the data is there, it's available. They can innovate. And some of the design patterns in Kafka are different, and they're very natural. Kafka is events and the APIs are events. You'll write systems that are more event-driven. You will start thinking in events. You will start siphoning off your databases and creating local, application-specific views of them. It feels like... I think there has been a bunch of recent talk about the data mesh pattern. I don't know if you've had data mesh on your show, but that would be a good one.
No. Ooh. Okay. Note.
Kafka really enables a lot of those more distributed organizational patterns. And part of it is really that Kafka turns your database inside out, right? You take all those central databases, you turn them into a stream, and you allow everyone to materialize the shared source of truth in the way they need inside their own applications. It's hugely powerful. And I think we've both talked to enough organizations that really adopted Kafka a while back, and we've seen how it really changed the way the organization as a whole works.
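The "turn the database inside out" idea can be sketched in a few lines: treat each database change as a keyed event, and let every application fold the stream into its own local view. This is a toy illustration, not a real Kafka consumer; the event shape and field names are invented, but the tombstone-deletes convention mirrors how compacted Kafka topics work.

```python
# Toy changelog: each record is a keyed update, like rows arriving from a
# CDC stream or a compacted Kafka topic. Field names are invented.
changelog = [
    {"key": "user-1", "value": {"name": "Ada", "plan": "free"}},
    {"key": "user-2", "value": {"name": "Grace", "plan": "pro"}},
    {"key": "user-1", "value": {"name": "Ada", "plan": "pro"}},  # update
    {"key": "user-2", "value": None},  # tombstone: delete this key
]

def materialize(events):
    """Fold a keyed event stream into a local table (latest value per key)."""
    view = {}
    for event in events:
        if event["value"] is None:
            view.pop(event["key"], None)  # tombstones remove the key
        else:
            view[event["key"]] = event["value"]
    return view

view = materialize(changelog)
print(view)  # {'user-1': {'name': 'Ada', 'plan': 'pro'}}
```

Every application can run its own `materialize` over the same shared stream and keep exactly the view it needs, which is the organizational shift being described here.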
Yes. This is what happens. So if you don't want significant changes in your technological life, just avoid Kafka, because it does have an agenda for the way you think about data, the way you think about application architecture. It shows you that you're building a giant application-specific database, and it's the infrastructure that you build databases out of.
You say it as if nobody would ever consider not getting Kafka, because obviously people want better architectures. But I've actually seen centralized IT teams, not necessarily block it outright, but try to slow down the adoption exactly because of the sense that if everyone starts using Kafka, does it mean that we lost some sense of control?
Yeah. As I said, I don't know. I mean, I don't think it's common, but in very, very traditional organizations, I've seen a bit of that dynamic of people really feeling that Kafka is a bit of a threatening change.
Yeah. And governance is a thing. It's not like-
Yeah. [crosstalk 00:40:47].
... "Oh, sure. Every byte of data is everyone's equally, comrades." That's not the world.
Yeah. No, not at all. Obviously, Kafka has ACLs and access control. And it's all hugely important. And it's all hugely feasible. If you use Kafka, you have all those options. But that's exactly the thing. Kafka gives you options, and it's up to you how you use it.
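To make the "Kafka gives you options" point concrete: Kafka expresses access control as ACL bindings of principal, operation, and resource. Here's a toy model of that check, not Kafka's actual authorizer, just the shape of the idea; all principals and topic names are invented, and real Kafka ACLs also carry host, pattern type, and permission type.

```python
# Toy ACL bindings in the spirit of Kafka's (principal, operation, resource)
# model. This sketch only models topic-level allows with deny-by-default.
acls = [
    {"principal": "User:analytics", "operation": "Read",  "topic": "orders"},
    {"principal": "User:billing",   "operation": "Write", "topic": "invoices"},
]

def is_allowed(principal, operation, topic, bindings):
    """Allow only if an explicit binding matches (deny by default)."""
    return any(
        b["principal"] == principal
        and b["operation"] == operation
        and b["topic"] == topic
        for b in bindings
    )

print(is_allowed("User:analytics", "Read", "orders", acls))   # True
print(is_allowed("User:analytics", "Write", "orders", acls))  # False
```

In practice you'd manage these bindings with the `kafka-acls` tool or the AdminClient API rather than by hand; the point is that the options are there, and the policy is yours to define.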
Yes. But now this honestly is the thing. And I won't talk as much here as I did about Connect, but this gets me as excited as Connect does. It just... This is the key idea, the big idea. You said-
I think people who get excited about it should go listen to Kafka Summit talks.
There are a lot of stories of how it actually happened in practice, and they are incredibly exciting stories.
Yes. You said something about how cars have a certain way they want to be driven and tools shape the way we use them. There's a great Churchill quote. I did a keynote, I don't know, 8-10 years ago, based on this. But during the Blitz in World War II, part of Parliament was destroyed and had to be rebuilt, and there was a certain way the seats were arranged inside the Parliament. And there were, in the rebuilding process, some people wanting to go for... and it was kind of like two rows of seats that faced each other. And there were people who wanted the in-the-round thing, like the French Parliament or the U.S. Congress where it's kind of circular. And so there were some architectural, philosophical discussions about how is this going to influence the kind of deliberation that happens in the room.
And the quotable Churchill quote was, "We shape our buildings, then our buildings shape us," which is very true. Our tools shape us, too. You make a decision to adopt Kafka usually because there's a thing you're trying to get done. And then, over a year or two, the process you described takes place where you know, "Okay, everything's an event. My APIs are all events. Wait a second. Yeah, literally everything is an event. Logs of immutable events are much easier to share than database rows." And so, you get it into your system.
Yes. And I can't remember who said it, but there's also a famous quote about that: "The only books worth reading are the ones that change the way you think." Yeah. Sorry. It's one of those things on the tip of my tongue, but I don't actually remember it. And I feel like this is true. The only systems worth using are the systems that shape the way you think... also, programming languages.
Very true. Unfortunately, with books, it's difficult to tell ahead of time. So you have to take the guess. You don't always know.
You can get recommendations from people you trust. By the way, Tim, do you have...
Yes. Hang on. I forget the author. Googling right now to get the author's name. And he's a total guru, so it's crazy that I don't know his name. And I might even say his last name incorrectly, so by all means, tell me on Twitter if I did. But Peter Senge. It's The Fifth Discipline: The Art and Practice of the Learning Organization. It's a systems theory approach to organizational structure. Currently rocking my world.
Okay. I have noted that down. That's-
May have influenced at least one slide I gave to some folks at the executive level just this week. So that's a little peek inside. It's a good book.
That sounds absolutely amazing. Thank you.
How about you, Gwen? Last question. Bonus question. Any books you're reading or have read recently that you'd recommend?
Yes. There is a book by the founders of Pixar. It's called Creativity, Inc., and it's mind-blowing. Super honest. Super detailed. Maybe the best management book I've read, because it has really great, grounded advice on managing creative people. But the best parts were the stories about how they had to manage Steve Jobs, who was basically the owner of the company... part-owner of the company. And I don't know if it was CEO or Chairman of the Board or something like that... But basically, how to create a creative organization without Steve Jobs taking it all over was just a fantastic story, which I absolutely loved.
I need to read faster, because I want to read that next, and there are a few other books in the way as well. But thank you for that.
Yeah. And they have all those great examples, like stuff from Monsters, Inc., and... I don't remember what it was called... the one with the fish who got lost... Finding Nemo.
Yeah, yeah, yeah. Finding Nemo. That's a great one.
So the examples are just so good. And you're like, "Okay, if it helps them get Finding Nemo out the door, surely it can help us with the next release as well."
That's right. My guest today, I'm very happy to say, has been Gwen Shapira. Gwen, thanks for being a part of Streaming Audio.
Thank you, Tim.
Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 21st, 2021, and use it within 90 days after activation. And any unused promo value on the expiration date will be forfeit, and there are a limited number of codes available, so don't miss out.
Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me at @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or, you can leave a comment on a YouTube video or reach out to our community Slack. There's a Slack sign up link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. Thanks for your support, and we'll see you next time.
This year, Confluent turns six! In honor of this milestone, we are taking a very special moment to celebrate with Gwen Shapira by highlighting the top six things everyone should know about Apache Kafka®:
Listen as Tim and Gwen talk through the importance of Kafka Connect, cooperative rebalancing protocols, and the promise (and warning) that your data architecture will never be the same. As Gwen puts it, “Kafka gives you the options, but it's up to you how you use it.”
If there's something you want to know about Apache Kafka, Confluent, or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.