December 2, 2020 | Episode 131

Tales from the Frontline of Apache Kafka DevOps ft. Jason Bell

Tim Berglund:

Jason “Jase” Bell is not just a Yorkshireman, not just a stalwart of the Kafka Community, but he's also a guy who actually operates Kafka clusters on behalf of customers for a living. He has to know how it's done. We talk about how he got into this role, what the challenges of Kafka DevOps are, and what he wishes all Kafka developers knew on this episode of Streaming Audio, a podcast about Kafka, Confluent, and the Cloud. Hello and welcome to another episode of Streaming Audio. I continue to be your host, Tim Berglund; I have not changed my name. I am, as usual, excited to be joined in the virtual studio by a new guest. Today it is Jason “Jase” Bell, Kafka DevOps engineer at consulting firm Digitalis. Jase, welcome to the show.

Jason Bell:

Thank you for having me.

Tim Berglund:

I am very glad to have you on. I want to talk about really your job. That's kind of the thing.

Jason Bell:

Okay.

Tim Berglund:

But let's… Tell us the story, how do you get to be a Kafka DevOps engineer? What's your… Give us your career story. How'd you get to where you are?

Jason Bell:

Okay. Well, my career is 32 years and we've only got 45 minutes so-

Tim Berglund:

Summarize.

Jason Bell:

I'll condense it down a bit. Yeah. Actually, I started off as a hardware engineer back in 1988 when I was 16. During the course, it turned out that everyone wanted to be a software developer instead, so I learned C at that stage. Then over a period of time, I've been involved in software development, supply chain, customer loyalty. I did a lot of data mining in the early 2000s, so things like the Target baby story and the way that the Tesco Clubcard works, which is a loyalty card in the UK for anyone not in the UK. How those kinds of data mining things work fascinated me and still does.

Tim Berglund:

The Target baby story being the young woman who started receiving maternity promotions at home-

Jason Bell:

Yes, she was-

Tim Berglund:

Before her family knew she was-

Jason Bell:

Before her father found out, yeah. Her father was less than amused and went to the store and complained of why his daughter was getting these baby club offers and then he went back about two or three weeks later apologizing because he'd been given some new information.

Tim Berglund:

Right. [crosstalk 00:02:38] it was in fact the game that he should have been hating and not the player in this case.

Jason Bell:

Yeah. Those kinds of things fascinate me. Around 2009, 2010, I started getting into the big data side of things, so I was doing a lot of Hadoop work. Then that led me into more streaming things when Spark came out, the Spark Streaming API and those kinds of things. That then eventually led me to RabbitMQ. I made the confession at the Cleveland meet-up last night that the reason I used RabbitMQ is because I didn't like having a ZooKeeper dependency on Kafka, but I'm sure we'll talk about that later. Then I moved on to Kafka.

Jason Bell:

I moved on to Kafka about 2014 and I worked with a company called Mastodon C in London and I looked after a couple of Kafka clusters for them and that's how I really got into it. I came in fairly… I knew bits and pieces about streaming data, those kinds of things, but in terms of actually looking after a Kafka cluster, I'd never really done that before. You know how people tend to sort of fall into these things by learning while doing. It's like, “We need to look after this, here you are,” and then you just go off and start learning about it and how it all works. That cluster was really interesting; it was for flight search data.

Jason Bell:

That's as much as I can say about it. It was an awful lot of data being passed through, so it was really interesting because we ended up having to rebuild that cluster and we started using Marathon and Mesos for deployment, which comes with its own set of problems. I was there for three and a half years and then I moved to Digitalis last year, last June, and I've been looking after Kafka clusters for them since. Yeah, I absolutely love working with this tech so it's kind of what I'm known for now, working with Kafka and Hadoop. Mainly Kafka, not so much Hadoop, yeah, but it's kind of what I'm known for.

Tim Berglund:

Yeah, which is not a bad thing to be known for. Interesting, in the early part of your story, you and I have some overlap. I-

Jason Bell:

Cool.

Tim Berglund:

Had aspired to be a hardware designer. That's really where I think I was headed or where I was trying to head and ended up getting a firmware job as a sophomore in college. It wasn't like a formal internship program, it was just a startup in 1992 looking for cheap labor. They found some and there I was and I learned this craft.

Jason Bell:

Cheap labor. It's great, isn't it?

Tim Berglund:

Isn't it? Yeah. [inaudible 00:05:40] I didn't benefit from that arrangement.

Jason Bell:

Absolutely, [crosstalk 00:05:45], used to do a lot of stuff on single board computers and you would just, you were basically flipping switches and pressing a button to commit the binary number. [crosstalk 00:05:54] after 10 minutes, yeah, you hope that the LEDs did actually flash at the end because if they didn't, you'd done it wrong.

Tim Berglund:

Right.

Jason Bell:

Yeah, it was interesting. [inaudible 00:06:09]. You know that now and it's interesting how things have come full circle because we've now got Arduino Unos and we've got the Raspberry Pi and those kinds of things and they're absolutely fantastic to learn. I think there are some good Kafka posts on running Raspberry Pi clusters on Kubernetes and that kind of thing, which I've not really had time to look at but it's something I want to get into. Yeah, [inaudible 00:06:33].

Tim Berglund:

Yeah. There was a Kickstarter, some outfit doing a little eight module Raspberry Pi-

Jason Bell:

Yeah. [crosstalk 00:06:41]. I saw a pre-order for it and I can't remember what it was called. I was looking for it the other day.

Tim Berglund:

Yeah. If I find it, we'll put a link in the show notes.

Jason Bell:

Yeah [inaudible 00:06:50].

Tim Berglund:

But I looked at it, it was one of those aspirational Kickstarters that like, “I want to be the kind of person who does this so I'll pay for this thing.” But I knew like, “I'm not going to make time for that,” so I didn't do it but-

Jason Bell:

Tell me about it. Between Moleskine notebooks and bits of Arduino hardware, Raspberry Pi hardware, it's like, “I'll do something with that,” and then never do. I've got skim card readers and barcode readers from previous customer loyalty startups that I've been working on, and yeah, there's all sorts of stuff kicking around but I never get-

Tim Berglund:

[crosstalk 00:07:31] for me it was FPGA Kickstarter Projects. I do FPGA development on this easy board, but yeah, I want to do that but I stopped after like three of them. Like, “Okay, I'm actually probably not going to make time for this. I understand that I aspire to be that person but I'm not so it’s okay.”

Jason Bell:

I think the last Kickstarter thing I contributed to was, there was a chap that was making potato salad-

Tim Berglund:

[crosstalk 00:07:58] right. It was just that, right? It was just, “I'm going to-”

Jason Bell:

[crosstalk 00:08:00] just potato salad. I want feedback on potato salad that I made and it's like, “Well, I didn't receive any, but I do want to support,” so I was, “Yeah, it's great-”

Tim Berglund:

[crosstalk 00:08:10] I remember seeing that. I didn't support it because I hate potato salad so it just didn't make any sense but-

Jason Bell:

I’m not a huge fan in all fairness but it was just the whole, the audacity of doing [crosstalk 00:08:22] potato so I absolutely love stupid stuff like that.

Tim Berglund:

Right. It was more the aesthetic of it than the actual-

Jason Bell:

Absolutely. Yeah.

Tim Berglund:

Experience of putting potato salad in your mouth and-

Jason Bell:

[crosstalk 00:08:35] wasn't interested in eating the end product. [crosstalk 00:08:37], if you want $5 to make a potato salad then-

Tim Berglund:

I'm going to get tweets about this by the way. I know I'm going to get a tweet about my hatred of potato salad and I-

Jason Bell:

[crosstalk 00:08:46]. I know Twitter has its moments but that's a bit strong, Tim.

Tim Berglund:

Yeah. I think I will. [crosstalk 00:08:52] find out-

Jason Bell:

You think so?

Tim Berglund:

I do. In fact, I'm going to call them out. I think Neil [Busing 00:08:58] is going to tweet about this. Neil-

Jason Bell:

No, come on. No-

Tim Berglund:

Neil-

Jason Bell:

Neil won't do that because-

Tim Berglund:

Yes, he will.

Jason Bell:

[crosstalk 00:09:05], he will listen to me before he listens [inaudible 00:09:08]-

Tim Berglund:

Accept the challenge Neil.

Jason Bell:

That was good to see Neil last night in the Cleveland meet-up [inaudible 00:09:15]-

Tim Berglund:

Yes. Good. Excellent. Solid guy. Any time I can spend time with that guy is a good thing.

Jason Bell:

The one nice thing about the community on the Confluent Community side of things or the meet-ups, especially this year with COVID and everything, is everyone's so supportive. I've met a bunch of people now, that I wouldn't have got to have met otherwise with the exception of Twitter. Yeah, so it's been really good this year because I've met Dave Klein and I met Shay, Anna McDonald, Mitch Henderson, and Neil, and a bunch of other folks, did brilliantly this year for that.

Tim Berglund:

Yes. Dave Klein, who at the time of this recording, this is not announced and not a thing, but by the time this airs, Dave Klein will be a developer advocate here at Confluent. That's a happy thing [crosstalk 00:10:00]. Yes.

Jason Bell:

Brilliant. That's great news.

Tim Berglund:

Yes. It'll be old news on Twitter by the time this airs but I just want to say the words.

Jason Bell:

Yeah. But how much do I have to bribe him to be one of the Confluent Community champions then?

Tim Berglund:

Yes. Well, that's all backchannel. We can't talk about that on the air.

Jason Bell:

[crosstalk 00:10:18]. Okay. Or we can have a conversation later-

Tim Berglund:

[crosstalk 00:10:22] clear nominee for the 2021 class.

Jason Bell:

Well, that you're very [inaudible 00:10:28], I'm not too sure of that to be honest, but anyway.

Tim Berglund:

Well, 2014, was it? That makes you a fairly early adopter of Kafka-

Jason Bell:

It does, yeah.

Tim Berglund:

Yeah. It was not what it is now then. I was certainly aware of it.

Jason Bell:

No. That was something I was talking about last night, oddly enough. Yeah, I was talking about… it's really sad, these are the things that keep me going. I was talking about the performance test tools for the producer and the consumer on the command line and how they were great for performance testing from the producer to a broker and from the broker to the consumer, but with all the tools that we now have, whether that's KSQL, Kafka Connect, Schema Registry, the bits and pieces like that, doing end-to-end testing with those tools is actually really difficult.

Tim Berglund:

Okay. Well, talk to me, about what you've tried to do and how it's been hard.

Jason Bell:

Well, when I'm-

Tim Berglund:

Give me an example.

Jason Bell:

When I'm testing Kafka Connect, what I actually do is just look at the logs and panic.

Tim Berglund:

That is very normal.

Jason Bell:

It’s just like, “Help.” Yeah, so [inaudible 00:11:39]. Seriously, monitoring wise, yeah, we tend to use Prometheus for monitoring so I have access to pretty much every metric that's coming out. We'll do an awful lot of functional testing. The nice thing, even though I do DevOps, I'm still a developer at heart as well.

Tim Berglund:

Right.

Jason Bell:

It doesn't stop me from writing a basic consumer or producer with an Avro payload, for example, and testing it that way. Some of the clients that we have are not necessarily involved in any of the development so I don't know what Dev teams are doing, I'm mainly responsible for the cluster itself. Anything that goes in from a consumer… Sorry, anything goes in from a producer, sorry, and goes out to a consumer is none of my business because those messages are sensitive, private information which is good.

Tim Berglund:

Got it.

Jason Bell:

I don't need to know.

Tim Berglund:

Got it.

Jason Bell:

I don't need to know. I don't want to know and I've signed NDAs to say that I won't look so I won’t. Yeah, so when it comes to testing like that, it tends to be a team sport. But the interesting thing is over the last, especially the last couple of weeks, when I said I'd do this talk on capacity planning, I really went down into the internals of disk, network, and throughput to a level that I'd not really considered before. It was actually a really interesting learning exercise.

Jason Bell:

Testing has been interesting in all honesty. I think it's something I'm still learning as I go. Some things are working, some things aren't. A lot of it I do in my own time as well because I'm still a serial tinkerer of Kafka outside of work hours to see how these things work and how we can improve things. I've never committed anything though, which is a bit of a shame. I will do one day I'm sure.

Tim Berglund:

It’s okay. There are other ways to be a part of the community. You can-

Jason Bell:

[crosstalk 00:13:54] [inaudible 00:13:55] I think and I was becoming-

Tim Berglund:

That’s been my strategy for some years now [inaudible 00:13:59] be working out.

Jason Bell:

If it's working for you, I'm sure it will work for me as well.

Tim Berglund:

Yeah.

Jason Bell:

Yeah. We went down this interesting rabbit hole yesterday. What I was really getting at is that the days of saying, “Can you create me a topic?” are kind of gone, because now it's, “Create a topic. I'm using Kafka Connect.” “Okay. Have you got DLQs?” Then DLQs [inaudible 00:14:25] more topics, which have got more partitions and a replication factor, so there's an impact there. “I've got five KSQL jobs as well.” “Right. Okay.” That's another four times five partitions plus replication. It starts-
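
To make that arithmetic concrete, here is a minimal sketch in Python. The topic names, partition counts, and replication factor are hypothetical, chosen only to mirror the shape of the conversation (a requested topic, its dead letter queue, and five KSQL jobs assumed to add one four-partition topic each); they are not figures from any real cluster.

```python
# Rough capacity arithmetic: every partition is stored once per replica, so
# the brokers host partitions * replication_factor partition replicas.
# All topic names and numbers below are hypothetical, for illustration only.


def partition_replicas(partitions: int, replication_factor: int) -> int:
    """Number of partition replicas the cluster must host for one topic."""
    return partitions * replication_factor


def total_partition_replicas(topics: dict) -> int:
    """Sum replicas over a {name: (partitions, replication_factor)} mapping."""
    return sum(partition_replicas(p, rf) for p, rf in topics.values())


requested = {
    "orders": (6, 3),      # the topic that was originally asked for
    "orders-dlq": (6, 3),  # dead letter queue for a Kafka Connect sink
}

# Five KSQL jobs, each assumed to add one four-partition topic.
for job in range(1, 6):
    requested[f"ksql-job-{job}-output"] = (4, 3)

print(total_partition_replicas(requested))  # 18 + 18 + 5 * 12 = 96
```

Under these assumptions, a request that sounds like "one topic" turns into 96 partition replicas for the cluster to host.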

Tim Berglund:

[crosstalk 00:14:40] topics associated with those queries and-

Jason Bell:

Absolutely, yeah. Then obviously, things like aggregations have network capacity considerations to take into account if you're taking huge volumes of data through to do aggregate counts and those kinds of things. Yeah, it's actually been really interesting. But like you say, in 2014, the Streaming API didn't exist then and I caught the first wave for the Streaming API which was really interesting for me because I was writing consumers to save data on to things like Amazon S3 because Kafka Connect didn't exist at that time.

Jason Bell:

Then the Streaming API came out and actually I started writing Streaming API jobs to do those kinds of data [inaudible 00:15:31] onto S3. When did I first really see the power of Streaming API? I'll tell you where it was. It was at Strata London. I think it's 2015 in the end. Michael Noll was doing an intro talk to the Streaming API and then the year after he did KSQL. It was like, “Right, Michael, you've just written off 80% of my Streaming API jobs with that.” Yeah. Michael's really good. I like Michael.

Tim Berglund:

Very much.

Jason Bell:

Yeah. It's an interesting ecosystem, isn't it?

Tim Berglund:

Yeah.

Jason Bell:

I have actually spent most of this year just trying to keep up.

Tim Berglund:

Well, it's my job.

Jason Bell:

It's your job, yeah. You get paid for that. Yeah.

Tim Berglund:

To keep up and it is a lot of work-

Jason Bell:

[crosstalk 00:16:23] I get paid for that too.

Tim Berglund:

You do, don’t you? Right [crosstalk 00:16:27]. [inaudible 00:16:29] great things about you, being a Kafka DevOps and be just, for one thing, doing this podcast, I get to keep up with all kinds of things. It's good.

Jason Bell:

Yeah. Why are you talking to me?

Tim Berglund:

Why am I talking to you?

Jason Bell:

Yeah.

Tim Berglund:

Yeah. Actually, Jason, I'm asking the questions here. This is my interview but [inaudible 00:16:48]. [crosstalk 00:16:51] [inaudible 00:16:53].

Jason Bell:

[crosstalk 00:16:53] I've told you this-

Tim Berglund:

[crosstalk 00:16:53]-

Jason Bell:

I told you this wouldn't be serious. I warned you in advance.

Tim Berglund:

You 100% warned me and here we are and it's delightful so-

Jason Bell:

Are you regretting-

Tim Berglund:

No, because-

Jason Bell:

[crosstalk 00:17:02] you've got half an hour to go yet.

Tim Berglund:

That's true. No, you're a guy who runs Kafka in production so yeah. We were talking about the command line data gen tools-

Jason Bell:

Yeah, that goes with my age. The command line is my place.

Tim Berglund:

Absolutely. Yeah. It's funny. Okay, talking about age, I was talking to my adult son, who was asking me a tech question. He normally does web game development on a Mac and he was having to do something on Windows and he's like, “Why is there no Homebrew on Windows? I can't install Git. I'm having trouble installing Git on the command line. Do people just use the GUI?” “Yeah. They use the GUI. It's okay. You're still a good developer, it's just how it works on Windows and there are good reasons for that.” But apparently, I've passed on that love of the command line to the next generation, which is good.

Jason Bell:

Cool. Yeah. He has a very valid point though, no Homebrew on Windows. Is-

Tim Berglund:

No Homebrew on Windows, there's no package manager on Windows. I think the thing is, and Windows developers, I encourage you to tweet corrections at me here or just more information, I know the command line experience has gotten better recently. It's just that for literally decades, it was kind of a children's toy. Over in the olden days of the Unix world and later in Linux, we had these sophisticated programming environments that we called a terminal, and Command Prompt wasn't that, so the tools ecosystem that evolved given that constraint was a lot of GUI things because that's how you got work done there. That's not the case anymore, but that tradition still sort of carries over.

Jason Bell:

Yeah, it does. [crosstalk 00:18:58] absolutely. I remember the old Solaris boxes. They were great.

Tim Berglund:

Yes. Yeah, that was actually my first job. That firmware job, there was a network management system being developed on Solaris and I wrote some… What did they call it? It was a Sun API that just wrapped the Socket API, TLI maybe. Yeah. Everything was just like a direct analog to the Socket API. I was like, “Why are we doing this?” But yeah, that was my first Unix code. Anyway, Kafka DevOps, what kinds of things come up in your daily? You’re operating on-prem clusters or clusters on self-managed cloud instances, [crosstalk 00:19:46] about that?

Jason Bell:

All on-prem.

Tim Berglund:

All on-prem. Okay.

Jason Bell:

Yeah, I'm a [inaudible 00:19:54]-

Tim Berglund:

Nice. [crosstalk 00:19:57] data centers, blinky lights-

Jason Bell:

I don't see them. I assume that other people do.

Tim Berglund:

Somebody does.

Jason Bell:

I work from home. I have worked from home for nine years so COVID didn't really affect me.

Tim Berglund:

[crosstalk 00:20:10].

Jason Bell:

I feel for anyone that it has affected.

Tim Berglund:

Right.

Jason Bell:

[crosstalk 00:20:14] well, a few of friends might’ve gone, “I don’t know how you do this in this life of-”

Tim Berglund:

[crosstalk 00:20:20], for a while.

Jason Bell:

As long as there's a kettle and I can make a cup of tea, then all is well.

Tim Berglund:

Everything is fine.

Jason Bell:

Exactly. Yeah, the on-prem stuff, the medium-sized clusters that I work with, I can't go into numbers, or I'd be uncomfortable going into numbers. But there are fairly hefty throughputs, and the usual monitoring stuff; we have alerts coming out of our ears if we need to. For instance, replicas going down, re-partition jobs that are taking too long, all sorts of stuff. But on a day-to-day basis, as I said previously, because I don't get involved with the coding of the applications, it's up to those teams to tell me what they want. In test environments, they have the run of the show so they can do whatever they wish.

Jason Bell:

If they need to create a topic, they'll create a topic. Usually, that's when they run the application for the first time and it automatically creates a topic, and then I get a little bit upset because I just see all these topics that have only got one partition and one replica. But you know what, it's not their fault so that's fine. Then Schema Registry through the REST API, those kinds of things. But when you start getting on to pre-production and production, then it becomes requests and justifications for why they need those certain things, and then it's up to me to create those topics, apply those ACLs and sort the Schema Registry out for them.
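
For context, topic auto-creation is controlled on the broker by `auto.create.topics.enable`, and auto-created topics pick up the broker's `num.partitions` and `default.replication.factor`, both of which default to 1. A minimal sketch of how one might spot such topics follows; the topic names and metadata are hypothetical, and in practice the metadata would come from an admin client or the command line tools rather than a hard-coded dictionary.

```python
# Flag topics that look like they were auto-created with broker defaults.
# The topic metadata below is made up for illustration.


def looks_auto_created(partitions: int, replication_factor: int) -> bool:
    """True when a topic has the default shape: one partition, one replica."""
    return partitions == 1 and replication_factor == 1


# {topic name: (partitions, replication_factor)}
topics = {
    "payments": (12, 3),
    "test-app-events": (1, 1),
    "scratch": (1, 1),
}

suspects = sorted(
    name for name, (p, rf) in topics.items() if looks_auto_created(p, rf)
)
print(suspects)  # ['scratch', 'test-app-events']
```

A one-partition, one-replica topic is not always a mistake, so a check like this is only a prompt for a conversation with the owning team.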

Jason Bell:

Those kinds of operations. Install Kafka Connect connectors. Me and Kafka Connect have become quite good friends over the last 12 months. Yeah, so that's kind of the day-to-day. What I've also done, and made a very conscious effort on in the last six months, is being very involved with our clients' internal development communities as well. I might do a presentation or two about the ecosystem or about the Streaming API and KSQL. Because developers have questions and they don't necessarily know how Kafka works but they're developing applications for it. Do you see what I mean?

Tim Berglund:

That is a chilling statement, but it's true. Yeah. They're just doing their job. They're trying to understand the domain and everything else about their platform and frameworks and everything and here's this new data infrastructure that is a little weird and shouldn't it just be in… Why can't it just have the courtesy of being a database?

Jason Bell:

Yeah, and it's actually really interesting because when you surround yourself in an ecosystem like Kafka with a very tight community like the Confluent Kafka Community is, and the wider Apache Kafka Community, don't get me wrong, the assumption is we all know how it works. Partitions, replication factor, consumer groups, all those kinds of things. It was impressed on me this year, especially when talking to the developer communities within some of our clients who, I'm not meaning this in a horrible way, just don't understand the sort of core concepts because they don't need to.

Jason Bell:

In reality, it's like, “Well, we needed to write a producer that takes this message, bundles it as Avro, and sends it to the broker.” As long as they've got their broker addresses and ports correct in their properties, after that, it doesn't really matter. Now, I have made it my mission this year to just get people to be more serious about things like retries, because one of the sorts of questions I get is, “I sent a message and I got an exception back because it didn't send.”

Jason Bell:

It's like, “Okay, let's start looking into it.” Then I'm going to start looking through the logs and, yeah, I'm quite good at reading Kafka server logs now, so I can usually correlate a rebalance somewhere at just the time that they were sending that message with the exception they got. Okay. Then we have to start talking about retry policies and those kinds of things, and what is your acks set to? Is it all, zero, or one? “Well, it's at least set to zero.” I said, “Well, that's fire and forget.”
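
The settings being discussed here are ordinary producer configuration properties. As a minimal sketch of a more cautious configuration, expressed as a plain Python dictionary: the property names (`acks`, `retries`, `retry.backoff.ms`, `delivery.timeout.ms`, `enable.idempotence`) are standard Kafka producer configs, but the helper name, the broker address, and the specific values are assumptions for illustration, not recommendations from the episode.

```python
# What each acks setting asks for before a send is considered complete.
ACKS_MEANING = {
    "0": "fire and forget: the producer does not wait for any acknowledgement",
    "1": "the partition leader has written the record",
    "all": "the leader and all in-sync replicas have written the record",
}


def producer_config(bootstrap_servers: str, acks: str = "all") -> dict:
    """Build a producer config dict (hypothetical helper, example values)."""
    if acks not in ACKS_MEANING:
        raise ValueError(f"acks must be one of {sorted(ACKS_MEANING)}")
    return {
        "bootstrap.servers": bootstrap_servers,
        "acks": acks,
        "retries": 5,                  # retry transient send failures
        "retry.backoff.ms": 500,       # wait between retry attempts
        "delivery.timeout.ms": 120000, # upper bound on send plus retries
        # Idempotence requires acks=all, so only enable it in that case.
        "enable.idempotence": acks == "all",
    }


config = producer_config("broker1:9092")  # placeholder broker address
print(config["acks"], "->", ACKS_MEANING[config["acks"]])
```

With `acks` set to `"0"`, an exception caused by something like a rebalance may never surface to the application at all, which is the gap the conversation is pointing at.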

Tim Berglund:

Yes.

Jason Bell:

What happens to your message? It could be anyone's guess. I then start providing producer interceptor classes, so little sort of wrapped demos of those to say, “Okay, so when your message sends, then there's this listener that will say, ‘Here you go. The message was sent, okay,’” and then yeah. It's all those kinds of little educating things which make everybody's life easier. It also helps increase the adoption going forward as well, which is actually really important because you always come up against the push and pull. I've heard so many times people say, “But Kafka isn't performant.” It's like, “I beg your pardon. Where did you hear that from?”
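
The interceptor classes mentioned here belong to the Java client. As a rough Python analogue, and plainly a swapped-in technique rather than the same API, the sketch below uses a delivery report handler with the `(err, msg)` signature that the confluent-kafka Python client uses for delivery callbacks. `DeliveryLog` and `FakeMessage` are hypothetical names; the fake message exists only so the sketch runs without a broker.

```python
# A delivery report handler that records which sends succeeded or failed.
# This stands in for the idea of "a listener that tells you the message was
# sent"; it is not the Java ProducerInterceptor interface.


class DeliveryLog:
    def __init__(self):
        self.delivered = []  # (topic, partition, offset) per confirmed send
        self.failed = []     # (topic, error text) per failed send

    def on_delivery(self, err, msg):
        """Called once per message with either an error or the final record."""
        if err is not None:
            self.failed.append((msg.topic(), str(err)))
            return
        self.delivered.append((msg.topic(), msg.partition(), msg.offset()))


class FakeMessage:
    """Stand-in for a client message object, used only for this demo."""

    def __init__(self, topic, partition, offset):
        self._topic, self._partition, self._offset = topic, partition, offset

    def topic(self):
        return self._topic

    def partition(self):
        return self._partition

    def offset(self):
        return self._offset


log = DeliveryLog()
log.on_delivery(None, FakeMessage("transactions", 0, 42))
log.on_delivery("broker not available", FakeMessage("transactions", 0, -1))
print(len(log.delivered), len(log.failed))  # 1 1
```

With a real confluent-kafka producer, a handler like `log.on_delivery` would be passed as the `on_delivery` argument to `produce()` and invoked when the application calls `poll()` or `flush()`.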

Jason Bell:

Then, “I heard from…” It becomes this whole whisper circle of I heard it from so-and-so at the water cooler who heard it from someone else at the coffee machine who heard it on the radio somewhere. Yeah. It's that kind of thing. I'm fairly sure this happens in a lot of places. I don't think it's just me in isolation but I always felt that my job was to encourage and educate those kinds of things. I've sat in meetings with 35 people where 10 of them are telling me that Kafka is not performant and I said, “Well, I know that there's a telecoms company that are processing 3 million messages a second.”

Tim Berglund:

Right.

Jason Bell:

[inaudible 00:26:56], “Really?” Like, “Well, yeah. They do it like this.” “Right. Okay.” I said, “There's an ad auction startup that does ad auctions via Kafka through 10s of 1000s of nodes and their response times are less than like eight or nine milliseconds so I don't know how you can sit here and tell me it's not performant.”

Tim Berglund:

Does it take you a great deal of tweaking and customizing to get to that point or is there a fairly sensible set of defaults? You were talking about producer side configuration as a knowledge gap for developers that teams can get wrong, what's it look like in the cluster or on the clients to get to that?

Jason Bell:

There is some tweaking, but it's usually around retention policy, because the default is seven days, 168 hours. We normally double that to 14 days, which is fine and fair enough. But the actual throughputs… Yeah. The tweaking is more for things like Replicator and Connect, especially Replicator, because obviously you may be working across a wide area network, so your latency times increase. It's things like that more than anything.
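
The retention numbers can be checked with a small sketch. `log.retention.hours` (broker level, default 168) and `retention.ms` (the topic-level override) are real Kafka settings; the helper name is hypothetical and the 14-day figure simply follows the conversation.

```python
HOURS_PER_DAY = 24
MS_PER_HOUR = 60 * 60 * 1000


def retention_settings(days: int) -> dict:
    """Express a retention period in days as broker and topic settings."""
    hours = days * HOURS_PER_DAY
    return {
        "log.retention.hours": hours,         # broker-wide default
        "retention.ms": hours * MS_PER_HOUR,  # per-topic override
    }


default = retention_settings(7)
doubled = retention_settings(14)
print(default["log.retention.hours"], doubled["log.retention.hours"])  # 168 336
```

Doubling retention roughly doubles the disk each topic needs at a steady ingest rate, which is why it sits alongside capacity planning rather than throughput tuning.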

Tim Berglund:

Got you. Okay. That does look a little custom and it takes someone with some knowledge of the internals and how to monitor things and how to take measurements. It takes a Jase.

Jason Bell:

Well, as I said in my talk last night, I'm referring to my talk last night, it will be available on the Confluent Community website.

Tim Berglund:

Okay. [inaudible 00:28:40] link in the show notes.

Jason Bell:

Now I've completely lost my train of thought because I'm being silly. I'm sorry.

Tim Berglund:

[crosstalk 00:28:47] we're talking about-

Jason Bell:

[crosstalk 00:28:50] I going to say?

Tim Berglund:

Tweaking of things and-

Jason Bell:

Tweaking of things.

Tim Berglund:

Requiring you.

Jason Bell:

It’s gone. I'm sorry.

Tim Berglund:

Well, next question. It's okay.

Jason Bell:

[crosstalk 00:28:59] come back to me-

Tim Berglund:

[crosstalk 00:29:01] it’ll come back to you and I'm not done, I have more. I'm a little shaken by you saying that those simple producer settings are a common problem, and that's motivating to me because I don't feel like my team addresses that publicly enough. We sort of have this attitude: “Yeah. Those are those settings. Of course, you know those. Let's get on to the meaty stuff.” But maybe not really.

Jason Bell:

No. I see where you're coming from and I can understand that when you're selling an ecosystem and a platform, that's your way of thinking. But when it goes to the outside world, there may be… Alright, let's take a cluster, for example, that might say, “Okay, we have producers that are taking information in from…” Could be point of sale. Okay, “It's using that producer to send the transaction event to the brokers.” Okay. That's the simple part. That's easy. “I want KSQL to do this.” Okay. That's the KSQL, but what we're not talking about is how do we get the POS system to talk to the producer? How's that working? How's that monitored?

Jason Bell:

There's one team on that side of things and then on the other side of the spectrum, especially with things like Kafka Connect, I said this last night in one of my slides: your biggest latency problem is the thing that Kafka Connect sinks to, which is either a database or it could be Elasticsearch, or there's something in that connection that can go wrong. If that connection is slow, then everything else is slow. You can have the fastest broker cluster setup in the world, but if everything's sluggish between Kafka Connect and, say, an Oracle database or MySQL or Elasticsearch, doesn't matter what, then that's where the actual problem is.

Jason Bell:

It kind of goes out of scope a little bit. But there's all… There might be a database team involved. There might be a data science team involved who are writing KSQL jobs, but their sole focus is just writing those jobs for that function. Then there might be developers that are writing producers and consumers, but not necessarily knowing the internals of Kafka like I just touched on. I don't…

Jason Bell:

Like I said, with the community aspect, we all think that everyone operates in this bubble, that we all know the ecosystem and we all know how Kafka works. I now maintain that developers don't work like that. They don't necessarily know about the internals, but they know how to create a producer, they know how to create a consumer or they know how to create a Streaming API job, for example, and deploy them.

Tim Berglund:

Now, you gave some examples of some, I'll use the word ignorance, that's in-

Jason Bell:

[crosstalk 00:32:13] ignorance. I like to be the stupidest person in the room so I learned-

Tim Berglund:

Well, stupid is different. Ignorance in the nonjudgmental sense of “I haven't been told yet,” but there's ignorance of what happens when you set acks=0. You talked about that basic producer behavior. What do developers need in your experience? You're operating the cluster, you're taking this stance of not only will I not look at the messages but I may not look at the messages, but you're also being this educator, popularizer, evangelist. What's the core set of Kafka knowledge that a developer should have? Jason's doing my 2021 planning for me right here, folks, if you couldn't tell-

Jason Bell:

Yeah-

Tim Berglund:

[crosstalk 00:33:04] directly extracting this knowledge from him so I know what kind of materials to build. But please Mr. Bell, go on.

Jason Bell:

Mr. Bell? It's all got formal all of a sudden.

Tim Berglund:

Yes, yes, yes.

Jason Bell:

It's now contractual. That’s a really good question. From a developer's perspective, a developer’s overview of what Kafka does, and I know that there are tons and tons of videos, blog posts, meet-up sessions, and those kinds of things that already do a very good job of doing that. But the whole crux for a developer, and it's usually a time constraint, is like, “We're going to adopt Kafka, we need to write a producer, we need to write a consumer, go.” It's like that. What we do as developers is we go hunting.

Jason Bell:

We go to GitHub and we pull down the Confluent examples or the Apache examples, for example. Yeah. Most of my learning started from the getting started page. Yeah. I wasn't really concerned about partition counts or how replication worked; it was really a case of, I have a job to do here so I need to write this code. What I've done with our customers is I have a lot of Google Documentation with overviews of Kafka for developers. It's not written specifically for developers, but I do get asked, “Could you just put this document together that just sort of says what it is and what it does?” It's like, “Yeah, okay.”

Jason Bell:

Then we'll take it to the next stage where it then becomes kind of internalized in the Wiki and then becomes what we're trying to get to as a place of best practice. The thing with the retries is a good example of that. It's like, “Okay, so I want every…” Yeah. Because also, with a big organization, you might have 10 different teams working with Kafka that don't necessarily know how Kafka works and they've all got their own individual use cases. Excuse me, and they may work in different ways.

Jason Bell:

But if you can give them a community of practice guidelines that say, “Okay, this is how we're going to base our producer. It's going to have acks=all. We need to know that the message has been written to the leader and all the followers, and we're going to implement the producer interceptor to know and register that the message has been successfully written. Or if there's an issue, then what happens with that message?” It's those kinds of things. Do I make-
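[Editor's sketch] The guideline described here, knowing that a write has reached the leader and its followers, corresponds to the producer setting `acks=all`. As a rough illustration of how a team might encode such a guideline, here is a minimal Python sketch. `acks` and `enable.idempotence` are standard Kafka producer configuration names, but the `GUIDELINE` values, `guideline_violations`, and `DeliveryLog` are hypothetical names invented for this example. It does not use a Kafka client and is not the approach described in the episode.

```python
# Hypothetical team guideline. The keys are standard Kafka producer
# configuration names; the chosen values are assumptions for illustration.
GUIDELINE = {
    "acks": "all",                 # wait for the leader and the in-sync replicas
    "enable.idempotence": "true",  # avoid duplicates when retries happen
}


def guideline_violations(producer_config):
    """Return a sorted list of human-readable guideline violations."""
    problems = []
    for key, expected in GUIDELINE.items():
        actual = str(producer_config.get(key, "<unset>")).lower()
        if actual != expected:
            problems.append(f"{key}: expected {expected}, found {actual}")
    return sorted(problems)


class DeliveryLog:
    """Stand-in for what a producer interceptor or callback might record."""

    def __init__(self):
        self.acked = []
        self.failed = []

    def on_result(self, key, error=None):
        # A real client would invoke something like this once the broker
        # has responded; here it is called by hand.
        (self.failed if error else self.acked).append(key)
```

A team could run a check like `guideline_violations({"acks": "0"})` in code review or CI to catch a producer that silently drops the durability guarantee, and use the failed list to decide what happens to messages that were not written.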

Tim Berglund:

What do you do, what do you think about larger organizations who wrap APIs? Like so instead of directly using the consumer API, use the big corp consumer or a big corp producer. Is that a thing that you see? Do you love it? Do you hate it?

Jason Bell:

Fortunately, I don't have to develop and explain myself so I'm quite happy with that. But some industries will be quite happy to wrap it-

Tim Berglund:

[crosstalk 00:36:20] folks that was shade. I don't know if you caught it, but that was shade.

Jason Bell:

You'll find a lot of things written in Spring Boot, which is fine. It's years since I've done any proper job development in Spring so it's a long time ago now, seven years ago. They tend to work off frameworks and they… Actually, it's an interesting overview of how certain developers work within certain organizations. If I went into… Well, I'll actually give you an example. There’s a company in Belfast, I won't name who it was, who invited me down years ago now, and the CTO and I are friends. Anyway, I talk at his conferences. I usually cause trouble at his conferences by talking.

Jason Bell:

I do this all on purpose. This is why I get asked to do these things. But we will… He said, “Come down, I'll show you around. I'd like to have a chat with you and see if there's a way that we can get you here.” I said, “Okay.” It's a big organization, it's IPO, it's big. Within 10 minutes, we both looked at each other in the room and go, “I'd be too much of a maverick here. I wouldn't work. You'd hate me. You'd all hate me. This isn't how I do software.”

Tim Berglund:

[crosstalk 00:37:46] would end badly.

Jason Bell:

It would. We both looked at each other and he agreed with me and I said, “Look, I'd either get bored here or I would probably drive everyone up the wall, because there are certain things that I would expect code to do that may not necessarily fit with where you're at.” Those kinds of things. Developer to organization fit is important to start off with. See, certain organizations may not necessarily be looking for a software developer, they might be looking for someone who is absolutely brilliant at Spring, and that is fine.

Jason Bell:

There are people out there who know Spring far better than I do and they will fit and they will do a fantastic job and more power to them. Whereas, there are also the sort of maverick startup-y type developers as well. It takes all sorts to make the world go round. It's not a criticism, I'm actually all for it, but they wouldn't fit in a big organization and those kinds of folks tend to know straight away, “I wouldn't fit there or be able to do that.” It's interesting from the point of view when it comes to learning because it's very much a case of, “We need to do this in a big organization, go write it. We’re doing it in Spring Boot. This is how you do it. The docs are there.”

Jason Bell:

Then they go off, they write the code, it goes to QA and all the rest of it and it's perfect code-wise, but they still don't know how it's really going to behave on a cluster until it's published and deployed on the cluster and they get going with it. Now, as I said, I don't see half of this stuff. All I see are the messages coming through on monitoring statistics and then I'll get an email that says, “We're doing this and we're thinking of doing this and what do you think of this?” Instead of saying, “It's not my job,” what I say is, “Okay, let's have a conversation about it,” and that's what we end up doing. That's fine. Everyone's got a different way of doing things.

Tim Berglund:

They do. It's good to take stock of yourself and your way of doing things and a larger, perhaps more staid organization and whether [crosstalk 00:40:16] fit.

Jason Bell:

I have been really blessed to work with some great SMEs, some great larger companies and then also have service companies who have large clients. I've seen a fairly broad range of different people, different use cases, different ways of working on the opposite side of that. I've also seen different politics which happens wherever you are.

Tim Berglund:

It’s a thing.

Jason Bell:

It’s a thing. How people react in meetings and how people get to a point of an agreement and how things are discussed and how do we move forward with this and yeah. It's all interesting as part of life's rich tapestry.

Tim Berglund:

Indeed. Another thing that's part of life's rich tapestry, segue coming up here is you've got a book.

Jason Bell:

That was pretty neatly done.

Tim Berglund:

Yeah, that wasn't a, it wasn't bad so I could just let it be a segue but I felt like I had to kind of hang a lantern on it. Like, “Wow, this is such a segue. Let's just call it out.” You've got a book, not even on Kafka but on machine learning. Tell us-

Jason Bell:

[crosstalk 00:41:24] I know. I was approached by another author who'd been referred to me by a good friend of mine, Matt Johnson. Basically, he said, “Yeah, Jase can string a sentence together.” We started chatting about co-authoring a machine learning book, and this chap had already made inroads with Wiley to publish it and that was fine. About a month went by of back and forth and the publisher was all along for it and we needed to do some polishing on the proposal, and then, unfortunately, there was something in a contract that said that he was not allowed to work on that-

Tim Berglund:

[crosstalk 00:42:11]-

Jason Bell:

Yeah. These things happen-

Tim Berglund:

[crosstalk 00:42:16] good to do to your employees to encourage them not to contribute to the state-of-the-art or teaching other people or creating educational artifacts. I recommend that, employers. Anyway, Jason, sorry, go on. Your co-author unfortunately-

Jason Bell:

Unfortunately, my co-author had to pull out of the project. The commissioning editor came back to me and she said, “What do you want to do? Do you want to find someone else to help you?” The great thing about being completely blind to writing a book for the first time is you go, “Yeah, I’ll be fine. Can’t be that difficult.”

Tim Berglund:

It's easy.

Jason Bell:

It's easy. Yeah. It's not, it's really hard. Yeah. But I said, “What I'd like to do is actually take the proposal and redo it because there are dozens and dozens of books on machine learning that are all theoretical.” The proposal that we had was great. Do not get me wrong, it was a fantastic proposal. But I said, “I want to write something a bit more developer-focused,” and it's not so much a case of how we learn how these algorithms work.

Jason Bell:

I said, “That's been done dozens and dozens of times. The real question is how would we deploy these things? How do we write code around it and how does a software developer who wants to learn about machine learning actually start learning about it?” That was back in 2014. I wrote that over seven months and it got published in October, 2014. I-

Tim Berglund:

It only took you seven months?

Jason Bell:

Yes. There was a life event that kind of accelerated things a bit. I had to have an operation on my neck to have a disc removed because it was [inaudible 00:44:06] spinal cord.

Tim Berglund:

Yeah.

Jason Bell:

Yeah, it’s just one of the obvious ones. I became very focused on getting as much of the first draft done as I could not knowing what was going to happen post-op because one of the downsides of the operation if anything happened was either I was going to die or I was going to be paralyzed from the neck down. No, but everything is okay, so everything was fine and I finished the book. No, the consultant surgeon is a fantastic man. Yeah, so I finished that book in October, 2014. That was all good, that was all very interesting.

Jason Bell:

It was an interesting time because it was just ahead of the curve for the media explosion of AI and machine learning and all those kinds of things. It was an interesting time to be involved and to be sort of coding things and understanding how it happened. In 2019, I was asked to do the second edition, and by that point an awful lot had happened. Self-driving cars and biased algorithms and-

Tim Berglund:

Yeah. I was going to say 2014 to 2019, the huge difference [inaudible 00:45:23]. This is the second edition of a book, it's not clear whether it's even the same book at this point, right?

Jason Bell:

I would say it was… I went back to them, I was honest with them. I said, “Look, it’s half a rewrite here.”

Tim Berglund:

Yeah. It's the same, I guess, same ISBN, second edition and all that, but [crosstalk 00:45:41] call your office.

Jason Bell:

Yeah. The part that you'll be really interested in Tim is that I took out the Hadoop chapter and replaced it with a Kafka chapter.

Tim Berglund:

I am not sad about that. I-

Jason Bell:

I didn’t think you would be, somehow. Bye-bye yellow elephant and hello K with circles on the end of the logo.

Tim Berglund:

K with [crosstalk 00:46:05], [inaudible 00:46:05] circles, yeah?

Jason Bell:

Yeah, so there's a chapter on Kafka in there and that is pretty much an entire chapter on building a self-learning machine learning platform. Data gets ingested with Kafka Connect, goes through the system, and gets piped out with Kafka Connect to create files which then can trigger the models. There's a linear regression model, there’s a neural network model and there's a decision tree model, I had to rack my brain. It's been a while since I looked at this now. Then there was an API that I wrote in Clojure, my favorite programming language, to do the predictions via HTTP. Yeah. It was interesting…

Jason Bell:

That started out as a talk that I did for Strata London in 2018 and then became part of the chapter of the book because this is a project in itself really, it's quite a big project. Then the rest of the book, there's a chapter on Spark, a chapter on R, and then there's a starters 101 chapter. Now, there are things about data cleaning and data acquisition, which don't really get talked about either, and I think it's one of the joys of the job that I do, is I'm so knee-deep in data, cleaning it and all the rest of it, that I don't mind writing about it and talking about it. Whereas, there's a lot of people that just think it's not important and to be honest, it's probably the most important part, [inaudible 00:47:43] thing-

Tim Berglund:

Yeah. When-

Jason Bell:

[crosstalk 00:47:47]. This is why we should all be using Schema Registry and Avro payloads. Anyway, yeah. Yes, I wrote a book. The second edition came out in February 2020 and then COVID happened so I've still got the author copies at my feet under my desk.

Tim Berglund:

Right-

Jason Bell:

[crosstalk 00:48:08] give them out.

Tim Berglund:

Yeah. What's a thing you're most excited about coming up in the near future of Kafka? Thing you'd like to see happen. Give us an aspiration for the future.

Jason Bell:

Right. Well, first of all, it's great to see tiered storage, that's really interesting. I've not looked at it properly yet because I just have not had the time to look at it. Obviously, I've been dancing in circles in the streets about KIP-500 because it means that I now no longer have to even think about RabbitMQ as much as I love RabbitMQ. Yeah.

Tim Berglund:

You don't have to feel the allure of a ZooKeeper-less system that's kind of like Kafka in-

Jason Bell:

No, no, no. The one thing that I would miss about RabbitMQ is RPC-like client connections. Anyway-

Tim Berglund:

Okay.

Jason Bell:

One thing that Kafka… If you were to do it in Kafka, you'd have an input topic and an output topic, which is actually quite interesting if you take other things into consideration, like the Rendezvous Model, which Ted Dunning has talked about quite a bit in times gone by, where you can have multiple machine learning models. You send out a single request via a topic, for example, and then 10 answers come back and the Rendezvous Model collates the answers and then forwards them on to, say, an output prediction topic, those kinds of things.
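[Editor's sketch] The fan-out-and-collate flow described here can be illustrated in plain Python, with lists standing in for topics. This is a simplified sketch and not Ted Dunning's design or code from the book: the function names, the record shapes, and the choice to average the scores are all assumptions made for illustration, and no Kafka client is involved.

```python
from collections import defaultdict


def fan_out(request, models):
    """Send one request to every model; return (request_id, name, score) tuples."""
    return [(request["id"], name, model(request["value"]))
            for name, model in models.items()]


def rendezvous(answers, expected):
    """Group answers by request id and emit a record once all models have replied."""
    pending = defaultdict(dict)
    output_topic = []  # stand-in for an output prediction topic
    for request_id, name, score in answers:
        pending[request_id][name] = score
        if len(pending[request_id]) == expected:
            scores = pending.pop(request_id)
            output_topic.append({
                "id": request_id,
                "scores": scores,
                # Averaging is an arbitrary collation rule for this sketch.
                "mean": sum(scores.values()) / expected,
            })
    return output_topic
```

A production version would also need a deadline, so that a slow or failed model does not hold a request in the pending state forever.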

Jason Bell:

But as I say, the RPC thing, it was just something that was really cool in Rabbit that I always liked because it was just simple to do. It meant you weren't required to write a separate consumer and producer for it. You could just write it in a single producer, wait for the response and there you are. Anyway, these things-
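[Editor's sketch] To show why request/reply over an input topic and an output topic needs both a producing side and a consuming side, here is a minimal in-memory Python sketch. Two deques stand in for the topics, and a correlation id ties each reply to its request. All names here (`send_request`, `serve_once`, `await_reply`, the record fields) are hypothetical, and this is neither RabbitMQ's RPC feature nor a Kafka client API.

```python
import uuid
from collections import deque

input_topic = deque()   # stand-in for the request topic
output_topic = deque()  # stand-in for the reply topic


def send_request(payload):
    """Caller, producing side: publish a request and return its correlation id."""
    correlation_id = str(uuid.uuid4())
    input_topic.append({"correlation_id": correlation_id, "payload": payload})
    return correlation_id


def serve_once(handler):
    """Service side: consume one request and publish the matching reply."""
    request = input_topic.popleft()
    output_topic.append({
        "correlation_id": request["correlation_id"],
        "payload": handler(request["payload"]),
    })


def await_reply(correlation_id):
    """Caller, consuming side: take the reply with the matching id, if present."""
    for reply in list(output_topic):
        if reply["correlation_id"] == correlation_id:
            output_topic.remove(reply)
            return reply["payload"]
    return None
```

For example, `cid = send_request("ping")`, then `serve_once(str.upper)`, then `await_reply(cid)` returns the service's reply. The caller has to do the matching itself, which is the extra work being contrasted with a built-in RPC pattern.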

Tim Berglund:

Yeah. No, I can’t confirm but-

Jason Bell:

[crosstalk 00:51:37] all time zones. That's what I don't understand.

Tim Berglund:

That is the thing. Nobody understands how but it, he does and-

Jason Bell:

[crosstalk 00:51:46] he got Hootsuite account. Is that what he's doing?

Tim Berglund:

[crosstalk 00:51:52]. A Time-Turner. I'm trying to remember from Harry Potter, maybe I [inaudible 00:51:57].

Jason Bell:

Yeah. The virtual, being forced not to do in-person conferences, which I kind of miss, has opened up this explosion of some fantastic content and meet-ups virtually. It was so cool last night that… Listeners, look, it's all very well talking at the Cleveland Ohio Kafka meet-up, but when you're in the UK, I'm basically talking at 11 o'clock at night and I'm half asleep.

Tim Berglund:

No, that's just us extremely rewarding.

Jason Bell:

It's always the best trust [inaudible 00:53:19] wonderful. Yeah. Because this goes to 11 is kind of used. No?

Tim Berglund:

It is, and nobody even really knows the Spinal Tap thing. It's just kind of in the lexicon and the “Hello Cleveland” is a lot more-

Jason Bell:

When I took up the bass guitar in 1996, a guy I worked with in a guitar shop lent me his copy of Spinal Tap on video and he said, “If you're going to be in the band, you'll need to watch this,” and it was the best thing [inaudible 00:53:49].

Tim Berglund:

Yes.

Jason Bell:

He’s still a dear friend now. Yeah, I don't take these things too seriously, as you gathered. No, the reason that I did Cleveland is because Dave Klein, Anna McDonald, Shay, all encouraged me to do so and I said, “Yeah, okay. It's going to be late.” “That's fine. No problem.” I actually quite enjoyed doing it. It’s interesting to get a different audience. If there is one thing that I would encourage people to do, especially if they're doing talks on Kafka or development on Kafka for that matter, it is to do talks that are not in their natural time zone because you get a different kind of audience. Some of them have got some very different questions, yeah, it's really interesting. Not just that-

Tim Berglund:

I've found that in my speaking career, very true, even different regions in the United States, definitely different regions of the world, there are different default interests and kinds of questions and the nature of question asking varies from place to place. Sometimes it's stump the chump, sometimes it's super deferential and embarrassed to be asking, sometimes it's why the heck haven't you answered this already. It just depends and it's great just to get that.

Jason Bell:

One of the things I've always tried to put across in talks is there's no such thing as a stupid question. I think every question's valid and every question that someone asks deserves an answer. It's like, yeah, I would hate to say to someone, “You should know that.” That's just very arrogant to do that kind of thing. I would never do that. If I have at any point in my 32-year career, I do apologize. I always pitch my talks to someone who is coming into this new and doesn't know, and if people know their stuff, they usually don't mind hearing it again because they might actually learn something new.

Jason Bell:

That's one of the great things I've realized with the Kafka meet-ups this year that have happened virtually. There's a real big amount of push and pull and it's absolutely great. I did not hang about for too long last night, I was really tired, but there were the conversations afterwards as well. We were talking about deleting topics with thousands of partitions. It takes so long, it blocks things. How do we solve that? Is there anything we can do about it? Those kinds of things. I was talking this afternoon with my CTO, Johnny Miller, and he learned KSQL does not have a DLQ for failed messages on converts. It's just like, “I'd really like that.” I think that's actually quite important because that's the way I think. If-

Jason Bell:

Absolutely. I can't keep an eye on Slack channels 24 hours a day and I'm not even going to attempt to in all fairness. I subscribed to about six or seven different Slack channels, Clojure ones, the Confluent Community one, which is great where they've climbed [inaudible 00:58:13] about doing under the talk which is really nice. It's nice to be wanted.

Tim Berglund:

Isn’t it?

Jason Bell:

Yeah. But there's so much content at the minute, I can't keep it all in. I can't find the time to sit down and listen to it all either. I'm probably going to spend Christmas... I actually said to Shay last night because, “Do you want to do another meet-up talk?” I said, “January.” I'm not doing anything else in 2020. Tim, you're the last talk that I'm doing this year.

Tim Berglund:

With that, my guest today has been Jason Bell. Jase, thanks for being a part of Streaming Audio.

Jason Bell:

An interesting concept I've found is that I'm willing to listen to someone that has used Kafka for six years, seven years, but you know what, I'm also willing to listen to someone who's used Kafka for six months because they tend to uncover things that you had not even considered or thought about. Or they might have found something out that you just didn't know about and I want to learn that. Yeah, that's really interesting. I want to know about that. Please tell me about it. What I've found really interesting about talks as well, especially virtually, is that you, as a speaker, can come away with new information that you'd never even considered before, as much as the information that you're imparting to the audience.

Jason Bell:

Thank you very much. I really enjoyed that. Hope you did too.

Tim Berglund:

Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST. That's 6-0-P-D-C-A-S-T to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31, 2021 and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeit and there are a limited number of codes available so don't miss out. Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out in our community Slack. There's a Slack signup link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe to Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support and we'll see you next time.

Jason Bell (Apache Kafka® DevOps Engineer, digitalis.io, and Author of “Machine Learning: Hands-On for Developers and Technical Professionals”) delves into his 32-year journey as a DevOps engineer and how he discovered Apache Kafka. He began his voyage in hardware technology before switching over to software development. From there, he got involved in event streaming in the early 2000s where his love for Kafka started. His first Kafka project involved monitoring Kafka clusters for flight search data, and he's been making magic ever since!

Jason first learned about the power of event streaming during Michael Noll’s talk on the streaming API in 2015. It turned out that Michael had written off 80% of Jason’s streaming API jobs with a single talk.

As a Kafka DevOps engineer today, Jason works with on-prem clusters and faces challenges like instant replicas going down and bringing other developers who are new to Kafka up to speed so that they can eventually adopt it and begin building out APIs for Kafka. He shares some tips that have helped him overcome these challenges and bring success to the team.
