March 14, 2023 | Episode 262

How to use Python with Apache Kafka

Kris Jenkins (00:00):

Joining me on this week's Streaming Audio is Dave Klein. Now, a lot of you will know Dave Klein. He's been something of a staple of the Kafka scene for a number of years, and he's also spent a lot of happy hours as part of the Python scene, and it's the cross-pollination of those two worlds that he's here to talk to us about. What's the state of Python for Kafka, or Kafka for Python?

Kris Jenkins (00:24):

Dave has just released a course for people getting started using Python and Kafka together, so I thought we'd get him in to chat about it. About the options available, if that's what you want to do, the pros and cons of different libraries, the use cases, or the things you should know if you're about to dive in.

Kris Jenkins (00:42):

There's a link to Dave's detailed course in the show notes, and you'll find it at developer.confluent.io, but for now, sit back and let Dave tell you about that particular corner of computing where the snake meets the stream.

Kris Jenkins (01:02):

Joining me today is Dave Klein. Dave, how you doing?

Dave Klein (01:05):

Doing all right, thanks. How about you?

Kris Jenkins (01:07):

Man, very well. Glad to have you here.

Dave Klein (01:09):

Yeah.

Kris Jenkins (01:09):

You've just released a Python and Kafka course based on your Python and Kafka world-leading expertise...

Dave Klein (01:18):

Right, right.

Kris Jenkins (01:19):

Expertise I think is the word I wanted. So, I thought I'd grab you in to talk about the state of Python and Kafka in the world.

Dave Klein (01:29):

Sounds good, yeah.

Kris Jenkins (01:30):

Yeah, yeah, but there is definitely a perception in the Kafka world that it's a Java thing.

Dave Klein (01:37):

Yes.

Kris Jenkins (01:38):

Which I think is just because it's written in Java and there's been a lot of historical support for Java, but it's not a Java platform, it's an anyone who wants to write data platform, right?

Dave Klein (01:49):

Right. Well, it's also that the client that comes with Kafka when you download it is the Java client.

Kris Jenkins (01:53):

Yeah, but there are plenty of others out there, right?

Dave Klein (01:55):

Yes, lots of them.

Kris Jenkins (01:56):

Indeed. You're going to tell us about one or two specifically.

Dave Klein (01:59):

Yeah, there are a couple of really good libraries out there for Python. There's a community-based one, which you can find on GitHub, and it's got quite a few users, it's well-maintained and active. There's also one that's put out by Confluent, which is based on the librdkafka project, which is a C library that does have the advantage of being probably the one that's closest to parity with the Java library. And Confluent puts a lot of effort into keeping it that way. So it's a great library to use, and that's the one that we do talk about in the course, but both of them will get you what you need to do. I did some real just ad hoc sort of... What's the word I'm looking for? Benchmarking, sorry.

Kris Jenkins (02:46):

Yeah.

Dave Klein (02:47):

We did a little bit of ad hoc benchmarking with the two, and the Confluent library is a little bit faster, about 10%, in most use cases that I tried. That's nothing scientific there, but just something to think about.

Kris Jenkins (02:57):

Like first stab, rule of thumb thing.

Dave Klein (02:59):

Yeah.

Kris Jenkins (02:59):

Yeah. One thing-

Dave Klein (03:01):

But yeah, really I've been surprised to see how popular Kafka is in the Python community.

Kris Jenkins (03:07):

Really?

Dave Klein (03:08):

Yeah. There's so many resources out there for it. So yeah.

Kris Jenkins (03:11):

What kind of things do people do with it?

Dave Klein (03:14):

Well, obviously Python is very heavily used in machine learning applications and data engineering in general, and so there's a lot of use cases there for just building streaming machine learning pipelines, feeding the data to your models. As well as for training models, you can use the same data set, and just by resetting your offset, you can re-feed that same data through over and over again as you tweak your models and things like that. So, there are people using Kafka for that.
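[Editor's sketch] The replay idea Dave describes, resetting the offset so the same data can be fed through again, might look roughly like this with the Confluent Python client. The broker address, the `handle` callback, and the idle-poll stopping rule are all assumptions made for illustration; they are not from the episode or the course.

```python
def rewind(partitions, offset):
    """Point every assigned partition at the given offset and return the list."""
    for partition in partitions:
        partition.offset = offset
    return partitions


def replay_topic(topic, group_id, handle, bootstrap="localhost:9092", max_idle_polls=10):
    """Read a topic from its first message, calling handle() on each value.

    Stops after max_idle_polls consecutive empty polls (a hypothetical
    stopping rule; pick whatever suits your training loop).
    """
    # Imported lazily so rewind() can be tried without the client installed.
    from confluent_kafka import OFFSET_BEGINNING, Consumer

    consumer = Consumer({"bootstrap.servers": bootstrap, "group.id": group_id})

    def on_assign(assigned_consumer, partitions):
        # Override any committed position before consumption begins.
        assigned_consumer.assign(rewind(partitions, OFFSET_BEGINNING))

    consumer.subscribe([topic], on_assign=on_assign)
    idle = 0
    try:
        while idle < max_idle_polls:
            msg = consumer.poll(1.0)
            if msg is None:
                idle += 1
                continue
            idle = 0
            if msg.error():
                continue  # sketch only: real code should log or raise here
            handle(msg.value())
    finally:
        consumer.close()
```

Calling `replay_topic("training_data", "model_v2", train_step)` once per experiment would feed the same records through each time, because the assignment callback ignores whatever offsets the group committed earlier.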

Dave Klein (03:40):

There's actually... was it folks from Baddr. Actually, they were on the podcast with you before, and they've done some things on that, and they actually have a blog post on the Confluent blog where they're using Kafka to train their models-

Kris Jenkins (03:54):

Ah, okay.

Dave Klein (03:54):

... in real-time. As they're using them even, so yeah, it's used there.

Dave Klein (03:59):

But also, and this was a surprise to me since I'm used to Java for microservices, but there's a lot of people building microservices with Python. Again, most of them are using the traditional request response structure for microservices, but I've spoken at several conferences on building event driven microservices with Python and Kafka, and those talks have been very popular. In fact, each time I've given that talk, it's been pretty much standing room only in that [inaudible 00:04:26].

Kris Jenkins (04:25):

Wow.

Dave Klein (04:28):

So yeah, and there are a lot of great discussions afterwards and people are starting to see the value of that there. So that's another popular use case apparently.

Kris Jenkins (04:35):

I can see that being actually quite popular. I mean, there's something very nice about Python, maybe TypeScript, kind of lighter weight languages for doing microservices.

Dave Klein (04:44):

Right.

Kris Jenkins (04:44):

I'm not going to get into language wars, but Java is relatively heavy.

Dave Klein (04:51):

Yes.

Kris Jenkins (04:51):

Yeah, I can see that being popular. And then you've got the whole Kafka microservices story that we've talked about a few times.

Dave Klein (04:57):

Right.

Kris Jenkins (04:57):

I can see that being an overlap. So why don't we go right back to the start and get your history? How did you get into the world of Kafka and the world of Python? Did you come to one before the other, or?

Dave Klein (05:11):

I've dabbled with Python a little bit, mostly just in building Lambdas.

Kris Jenkins (05:15):

Okay.

Dave Klein (05:16):

So I've used those for a few small projects that I've done in the past. I've been doing Java development though mostly for 20 years or more. And although a big part of that, I was actually working with the Groovy programming language, which is-

Kris Jenkins (05:28):

Oh, really?

Dave Klein (05:30):

... [inaudible 00:05:30] language, which is heavily inspired by Python. It's, yeah, in fact, I saw that more clearly once I started working with Python more lately, you could see a lot of the Python influence on the Groovy language. And I loved Groovy, but I wasn't able to find work in it for the most part. So-

Kris Jenkins (05:46):

Yeah.

Dave Klein (05:47):

... [inaudible 00:05:47] Java.

Dave Klein (05:51):

So I'd done some Python, like I said, for Lambdas, but then I started working at Confluent. I started working with Kafka actually at a company before I went to Confluent. And then I started using Kafka heavily then again, mostly with Java. But while I was there, we started realizing that a lot of Kafka users were using Python. In fact, there was a survey done at one point by somebody on the community team checking for language popularity among people in the Kafka community. And Python was the second most popular language after Java. So that's a bit outdated on that survey, but it was an important data point. And so-

Kris Jenkins (06:27):

Yeah.

Dave Klein (06:29):

... I started pushing for... we just started doing more content for Python developers, and that's how this course came about. And hopefully there'll be more coming, but I think it's important to start providing resources for people in the Python community.

Kris Jenkins (06:42):

Yeah, absolutely. I'm hoping we'll do one with JavaScript and TypeScript before long.

Dave Klein (06:46):

For all the different languages would be helpful. Because there are so many, like you said, Kafka can be used with almost any language, even with Haskell, right?

Kris Jenkins (06:53):

Even with Haskell, yeah. I'm not holding my breath for the Haskell, Kafka course being commissioned, but on that day I'll be there already.

Dave Klein (07:01):

It is, I'm thinking of doing it though, right? Yeah.

Kris Jenkins (07:05):

So I'm going to segue with Haskell then. So one thing you get in Haskell is there is a Kafka library, and it's a wrapped C library, and you find that in a lot of languages, that they take librdkafka, which is a very fully featured C library, and wrap it.

Dave Klein (07:24):

Yeah.

Kris Jenkins (07:24):

And then you get a few languages like Python. And I think JavaScript is one where there is also a native implementation.

Dave Klein (07:32):

Right.

Kris Jenkins (07:34):

Which should people pick in your opinion, what are the trade offs there?

Dave Klein (07:38):

Well, yeah, it really depends on your use cases. Like I said, one of the trade-offs is going to be that with the Confluent one, which is the librdkafka wrapper, you're going to be more in sync with the Java library, which is what most of the documentation out there is written for, so you can use all the same configuration values, or most of them anyways. And most of the docs that refer to the Java client will apply to your client as well. So there's that. And that also is a little bit more performant, I think, as I mentioned earlier.

Dave Klein (08:09):

But there's some drawbacks with it as well. And that is, it's not pure Python, it is a wrapper of the C library. And so if you were trying to embed a Kafka consumer or producer in a PyScript client running in a browser, you wouldn't be able to do that with the Confluent library. So there's some trade-offs there. And then also just if your background's purely Python and you don't really have any reason to try to match the Java client, then the Python client is probably going to be more comfortable to you, more familiar to you.

Kris Jenkins (08:41):

So you generally get the more kind of idiomatic language style that's more familiar in the native implemented client, right?

Dave Klein (08:47):

Right.

Kris Jenkins (08:48):

And that's nice when you're getting started.

Dave Klein (08:49):

Exactly, yes.

Kris Jenkins (08:51):

Yeah. And it makes me think, I've got this little pet project with, do you know Adafruit? They make all this great hardware stuff.

Dave Klein (08:58):

Mm-hmm, yes.

Kris Jenkins (08:59):

Yeah, and I have a pixel display board that just runs Python.

Dave Klein (09:05):

Oh, okay.

Kris Jenkins (09:07):

And I tried to get the Confluent Kafka library on it, and I couldn't because it requires C. And I'm hoping that after this conversation, I'll know what I need to know to get the native one running.

Dave Klein (09:16):

Yeah, you could give that a try.

Kris Jenkins (09:19):

That'd be cool. But the flip side, and I have to mention this for performance, and I think this is again, my expertise, this is definitely true for Java. I assume it's true for Python that the C Library has things like producer batching, which makes a huge impact on performance when you get large.

Dave Klein (09:37):

Right. And the Python client definitely has... at least the Confluent Python client definitely has that, I'm not certain about the native client. I would assume that it does something like that, but I don't know for sure.
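[Editor's sketch] For readers who want to see what producer batching looks like in practice, here is a rough example using the Confluent Python client. The `linger.ms`, `batch.size`, and `compression.type` properties are librdkafka settings that mirror the Java client's names; the specific values, the broker address, and the helper names are illustrative assumptions, not recommendations from the episode.

```python
def batching_producer_config(bootstrap="localhost:9092", linger_ms=50, batch_size=65536):
    """Build a producer config that favours larger batches over low latency."""
    return {
        "bootstrap.servers": bootstrap,
        "linger.ms": linger_ms,      # wait up to this long for a batch to fill
        "batch.size": batch_size,    # upper bound on a batch, in bytes
        "compression.type": "lz4",   # compression is applied per batch
    }


def send_all(topic, values, config=None):
    """Produce every value to the topic, letting the client batch them."""
    # Imported lazily so the config helper works without the client installed.
    from confluent_kafka import Producer

    producer = Producer(config or batching_producer_config())
    for value in values:
        while True:
            try:
                producer.produce(topic, value=value)
                break
            except BufferError:
                # Local queue is full: give the client time to send, then retry.
                producer.poll(1.0)
        producer.poll(0)  # serve delivery callbacks without blocking
    producer.flush()      # block until everything queued has been delivered
```

The `produce()` call only queues the message locally; the C library groups queued messages into batches in the background, which is why the loop can stay this simple.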

Kris Jenkins (09:48):

That's one of those things where you'd need to check the feature list before you-

Dave Klein (09:51):

Yeah. But the Confluent Kafka client does definitely have that. Also, the consumers participate in consumer groups just like the Java client consumers do. So that's what I say, that's one of the key advantages too, is the things that you hear about from the Java world in Kafka, they're all going to be there in the Confluent Kafka client.

Kris Jenkins (10:15):

Yeah.

Dave Klein (10:15):

Having trouble with that word today.

Kris Jenkins (10:18):

Yeah, I think when you're getting started,

Dave Klein (10:21):

[inaudible 00:10:21] a client, right?

Kris Jenkins (10:23):

Yeah, a client. In fact, someone's going to create a library with that name now plient, Python Kafka Library client. Yeah, I can see that happening. So you go into this in-depth in your course, but do you have any tips for someone getting started in Python with Kafka?

Dave Klein (10:46):

Well, yeah, if you want to just start really quickly and easily, you could do it with a Jupyter Notebook, and I was surprised, yeah, how well that runs. Confluent provides some great Docker Compose files that you can find on their tutorial site, and they'll load up infrastructure for you. You can actually run those within a Jupyter Notebook.

Kris Jenkins (11:06):

Oh.

Dave Klein (11:07):

[inaudible 00:11:07] did that. Yeah, you could run shell scripts in a notebook, so just run them there. But, you know, you could do it on the command line as well. But I would recommend just running the Docker Compose files to get your infrastructure up; they'll spin up your ZooKeeper and Kafka broker, all those things you need. And then you can start by importing the Confluent Kafka library. First it's just pip install confluent-kafka. And then once you've got that installed, you can just import the Producer and Consumer from that library, and away you go.
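[Editor's sketch] The steps Dave lists, install the package and then import `Producer` and `Consumer`, come out to something like the following. The topic name, group id, and `localhost:9092` broker address are assumptions for a local Docker Compose setup; this is not the notebook from Dave's repository.

```python
# Install first:  pip install confluent-kafka


def client_config(bootstrap="localhost:9092", group_id=None):
    """Shared config; consumers also need a group id and a starting position."""
    config = {"bootstrap.servers": bootstrap}
    if group_id is not None:
        config["group.id"] = group_id
        config["auto.offset.reset"] = "earliest"  # start at the oldest message
    return config


def produce_hello(topic="hello_topic"):
    """Send a single greeting and wait for it to be delivered."""
    # Imported lazily so client_config() can be tried without the client installed.
    from confluent_kafka import Producer

    producer = Producer(client_config())
    producer.produce(topic, key="greeting", value="Hello from Python")
    producer.flush()


def consume_one(topic="hello_topic", timeout=10.0):
    """Return the next message value as text, or None if nothing arrives."""
    from confluent_kafka import Consumer

    consumer = Consumer(client_config(group_id="hello_group"))
    consumer.subscribe([topic])
    try:
        msg = consumer.poll(timeout)
        if msg is None or msg.error():
            return None
        return msg.value().decode("utf-8")
    finally:
        consumer.close()
```

The dotted property names (`bootstrap.servers`, `group.id`, `auto.offset.reset`) are the same ones the Java client documentation uses, which is the parity Dave mentions earlier in the conversation.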

Dave Klein (11:43):

In fact, it's so simple to use, it's just a few lines. Like I said, you could do it... on my GitHub repository I've got a notebook that's just a producer and a consumer in it. That's really simple. But I recently did a half day workshop on Kafka, and it's a Java based workshop.

Kris Jenkins (12:00):

Okay.

Dave Klein (12:00):

And the whole time I was doing it I was thinking, "Boy, I wish I was doing this with Python, it'd be so much..." Like, you know you give exercises for the students in a workshop to do, right?

Kris Jenkins (12:08):

Yeah.

Dave Klein (12:08):

And so I had to basically build a project, put it on GitHub, and leave out a few bits for them to fill in because there's no way they can do the whole thing from scratch in the time that we have in this workshop because there's so much boilerplate code to include in a Java project.

Kris Jenkins (12:22):

Yeah.

Dave Klein (12:22):

And so Python is actually a great way to learn Kafka because there's so much less you have to do before you're actually doing the important bits.

Kris Jenkins (12:29):

Yes. Yeah. I found recently that I've started to lean more on things like Python and TypeScript for prototyping, if not the final product. Right?

Dave Klein (12:39):

Exactly, yeah.

Kris Jenkins (12:40):

And a lot of people would stick with that for the whole lifetime of the project and go for it. But I would suggest to anyone using Java that maybe having a lightweight scripting-ish language under your belt is a valuable addition to getting things like sketched out.

Dave Klein (12:58):

Yeah, exactly. Yeah, you've got the REPL, you've got notebooks, you've got lots of different ways to do things quickly that don't take all of the extra work that you would need to do with Java.

Kris Jenkins (13:06):

Yeah. But then that gets us into another topic, which I'm sure Java purists will come back to every time, types and type checking and schemas and validation of type shapes. What can you teach us about that?

Dave Klein (13:25):

Well, we do have a section in the course that covers working with schemas. As far as when you're working with Kafka, you do want to use schemas, and that's supported with the Confluent Kafka Python library. So it still works with the schema registry as well. So you can still have your schemas stored in the registry and the producers can pick them up from... or, can store them there, and then consumers can pick them up from the schema registry. So you still have those same advantages, and I recommend doing that. So that's why we wanted to make sure to include a module on that in the course. As far as types in your programming, that's still up for debate, there are some of those folks like you that types that are [inaudible 00:14:08] direction. So there are type hints you can add to Python now, so you can have some of that behavior that you're looking for. They don't actually exist when you're running it. They're not enforced in quite the same way, but they'll help you out in your IDEs and things like that.

Kris Jenkins (14:22):

Okay. Is it like a compile time linter?

Dave Klein (14:25):

Yeah, basically.
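[Editor's sketch] A small illustration of what "not enforced in quite the same way" means for Python type hints. The `Reading` class and the temperature conversion are invented for this example; a static checker such as mypy or an IDE would flag the bad call, while Python itself only complains once the arithmetic fails.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """A hypothetical message payload with hinted field types."""
    sensor_id: str
    celsius: float


def to_fahrenheit(reading: Reading) -> float:
    """Convert the reading's temperature from Celsius to Fahrenheit."""
    return reading.celsius * 9 / 5 + 32


# The hints are honoured here, so everything works.
boiling = to_fahrenheit(Reading("s1", 100.0))

# The hint says float, but nothing stops a string at construction time.
suspicious = Reading("s2", "hot")

# The mistake only surfaces later, when the value is actually used.
try:
    to_fahrenheit(suspicious)
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True
```

This is the "compile time linter" role Kris asks about: the annotations give tools and teammates information, but the interpreter does not check them.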

Kris Jenkins (14:26):

Okay. Yeah, yeah.

Dave Klein (14:27):

You can. And there's a few different ways to do that. I prefer not having them. That's why I said I really enjoyed Groovy because it was a dynamic version of Java and types were optional. But I understand the argument. I don't want to get into...

Kris Jenkins (14:45):

No. No, I think that you could do a whole podcast just every week-

Dave Klein (14:49):

Exactly, yeah.

Dave Klein (14:55):

Right. So you can, and a lot of people do use that for providing, especially on shared projects with multiple people on a team working on it. It does give more information to other developers who don't know what you were thinking when you wrote the code. But since, I guess I've worked so much on my own for projects that I know what I meant, and so it's... Also using well named variables, things like that.

Kris Jenkins (15:18):

Yeah. But yeah-

Dave Klein (15:20):

[inaudible 00:15:22].

Kris Jenkins (15:21):

... just agree that schemas, schemas are a good thing.

Dave Klein (15:25):

That is definitely-

Kris Jenkins (15:26):

Company wide.

Dave Klein (15:27):

... because one of the key things with Kafka clients is that the producers and consumers are decoupled. They don't know anything about each other, right? But they do need to agree on the data that they're working with. And so I think schemas are very critical for that.

Kris Jenkins (15:42):

Yeah.

Dave Klein (15:43):

And they work really well with the Python clients.

Kris Jenkins (15:45):

And from what I've used in the Python world, actually, the integration with schemas, frankly it feels nicer than the Java stuff. That's been my experience. It's just a little more lightweight. You have to hint a bit less about what your types are. And I don't find I get any more runtime errors from Schema deserialization than I do in the Java world.

Dave Klein (16:11):

Right, right. Yeah, it works really well with them there. The integration with the schema registry is not quite as thorough as it is with the Java client. It's there, but it doesn't do... you still need to give it... Like, with the Java client you don't have to mention the schemas at all really. It's automatically picked up from the message. The schema registry will pick it up and the consumers will pick it up. So it's much more seamless. With Python, you still need to pass in the schema when you're constructing the producer if you're working with the schema registry, some things like that. So there's a little, but it's minor, and it's only when you're first making that first connection. And I'm sure it'll improve over time as well. But yeah, right now the Java client does work a little bit more seamlessly with the schema registry than the Python...
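[Editor's sketch] The extra step Dave mentions, passing the schema in yourself, might look like this with the Confluent client's JSON Schema serializer. The `Order` schema, the topic name, and both URLs are made up for illustration, and the serializer may need the client's optional Schema Registry and JSON Schema dependencies installed.

```python
import json

# A hypothetical JSON Schema for the message value.
ORDER_SCHEMA = json.dumps({
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Order",
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number"},
    },
    "required": ["order_id", "amount"],
})


def order_to_dict(order, ctx):
    """Reduce an order to exactly the fields the schema describes."""
    return {"order_id": order["order_id"], "amount": order["amount"]}


def produce_order(order, topic="orders", bootstrap="localhost:9092",
                  registry_url="http://localhost:8081"):
    """Serialize one order against the schema and send it."""
    # Imported lazily so the schema helpers work without the client installed.
    from confluent_kafka import Producer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.json_schema import JSONSerializer
    from confluent_kafka.serialization import MessageField, SerializationContext

    registry = SchemaRegistryClient({"url": registry_url})
    # The schema string is handed over explicitly when the serializer is built.
    serializer = JSONSerializer(ORDER_SCHEMA, registry, order_to_dict)

    producer = Producer({"bootstrap.servers": bootstrap})
    producer.produce(
        topic,
        key=order["order_id"],
        value=serializer(order, SerializationContext(topic, MessageField.VALUE)),
    )
    producer.flush()
```

Once the serializer exists, the produce call itself looks the same as the schema-less one, which matches Dave's point that the friction is only in the initial setup.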

Kris Jenkins (17:01):

Oh, you think? Okay. Maybe it's because I've been mostly writing Kafka Streams lately, but I find it a little bit painful the amount of [inaudible 00:17:09] I have to do.

Dave Klein (17:10):

In working with it it's great, it's just in setting up the initial connection with the schema registry that's not quite as seamless.

Kris Jenkins (17:19):

Fair enough. That takes us to another topic, which I know absolutely nothing about. But you covered this in your course, which is there's a whole general admin client in the Python world.

Dave Klein (17:30):

Yeah. Well, there's actually one in the Java world as well. It's still there-

Kris Jenkins (17:32):

Okay, I've not used it.

Dave Klein (17:33):

In the Java library. Yeah, it's not... The thing with admin client is you mainly would use it if you're building a Kafka tool. Right?

Kris Jenkins (17:40):

Oh, okay.

Dave Klein (17:41):

It allows you to create topics and delete topics and get information about topics. You can also change configurations on the brokers. You can find out what brokers you have. You can even do things like ACLs. There's all kinds of things that you can do with the admin client that you don't normally do in your application.

Dave Klein (18:02):

But the thing that I point out in the course and the place where I think it is helpful sometimes is to ensure that certain things are there when your application starts up. So you can use... and this is something I'd recommend people to do, if there's any configurations or topics that they need that have to be there for the application to work, is you can use the admin client to check for the existence of the topics that you need. And if they don't exist, you can create them. You can also check for configurations. So if you know that your application is relying on a certain max message size, right? Maybe your-

Kris Jenkins (18:36):

Right.

Dave Klein (18:37):

... [inaudible 00:18:37] messages are a little bit bigger than the default, so you want to make sure that that's set or your application is going to fail. So you can check the configurations on the broker, yes, that it's set the way you want it to be, or change it if you need to.
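[Editor's sketch] The startup check Dave recommends could be written along these lines with the Confluent client's `AdminClient`. The helper names, partition and replication counts, and broker address are assumptions. Dave talks about checking the broker's setting; the second function reads the topic-level `max.message.bytes` instead, which is simpler to show because it does not need a broker id.

```python
def missing_topics(required, existing):
    """Return the required topic names that are not present in existing."""
    return [name for name in required if name not in existing]


def ensure_topics(required, bootstrap="localhost:9092", partitions=1, replication=1):
    """Create any required topics that the cluster does not have yet."""
    # Imported lazily so missing_topics() works without the client installed.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": bootstrap})
    existing = admin.list_topics(timeout=10).topics  # dict keyed by topic name
    to_create = missing_topics(required, existing)
    if not to_create:
        return []

    futures = admin.create_topics([
        NewTopic(name, num_partitions=partitions, replication_factor=replication)
        for name in to_create
    ])
    for future in futures.values():
        future.result()  # raises if the broker rejected the request
    return to_create


def topic_max_message_bytes(topic, bootstrap="localhost:9092"):
    """Read the topic's effective max.message.bytes setting."""
    from confluent_kafka.admin import AdminClient, ConfigResource

    admin = AdminClient({"bootstrap.servers": bootstrap})
    resource = ConfigResource(ConfigResource.Type.TOPIC, topic)
    entries = admin.describe_configs([resource])[resource].result()
    return int(entries["max.message.bytes"].value)
```

An application could call `ensure_topics(["orders", "payments"])` before starting its producers and compare `topic_max_message_bytes("orders")` with its largest expected payload, failing fast if the limit is too small.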

Kris Jenkins (18:50):

So that's one of those things that can solve a lot of problems when you're moving from development, to QA, to production. Right?

Dave Klein (18:57):

Yeah. Or, if you're building a package, or building software that might be used by other people as well. Something that you're posting, people can download it, or it's an open source project or even a commercial project that you're giving out.

Kris Jenkins (19:10):

That explains why I've never encountered it. I've never actually gone... I've never actually-

Dave Klein (19:13):

Yeah.

Kris Jenkins (19:13):

Yeah.

Dave Klein (19:17):

But when you have that need, it really is a helpful tool to have.

Kris Jenkins (19:20):

Yeah, yeah. Outside of work, do you use Python and Kafka much, or is it mostly a work tool for you?

Dave Klein (19:32):

Oh boy. Yeah, it's mostly just been for work.

Kris Jenkins (19:35):

That's fair.

Dave Klein (19:36):

Yeah.

Kris Jenkins (19:37):

You're a busy man.

Dave Klein (19:38):

That's the thing. Yeah, I haven't had a lot of side project things in a long time. I used to do more of that, and if I did, I think I would be finding ways to use Kafka now. Kafka I found was one of those things that once you discover it, you start finding uses for it all over the place. I mean, once I started learning about Kafka, my mind raced back to many projects I worked on that really could have used it and would've benefited from it.

Kris Jenkins (20:06):

Yeah.

Dave Klein (20:08):

And so I could even think of side projects of my own that I've done, that yeah, Kafka would've helped with that too.

Kris Jenkins (20:13):

Yeah, it's funny. I've worked on... I've written up one or two of them actually, it's like places where we tried to reinvent a real time data stream in Postgres or things,

Dave Klein (20:26):

Right.

Kris Jenkins (20:27):

And it's like... Back then it was probably the right choice, but back then Kafka didn't exist and now I do it differently.

Dave Klein (20:34):

Right, yeah.

Kris Jenkins (20:35):

Yeah. So we've touched on... One other thing I wanted to discuss with you. I mentioned Kafka Streams and my current wrestling with it in Java.

Dave Klein (20:45):

Mm-hmm?

Kris Jenkins (20:46):

Do I have any options for the Python world?

Dave Klein (20:49):

Yeah, there are some. There is probably the most well-known one is Faust, which I believe started with Robinhood, I think put-

Kris Jenkins (21:01):

Oh yeah, yeah. I've heard some of this... because they kind of dropped it and then it was taken over by someone else.

Dave Klein (21:06):

[inaudible 00:21:07] And it was forked now. So there's a community supported version that's out there now that's still being maintained. And that probably is the closest match to what you would do with Kafka Streams, but for Python, and it can do most of the same things. So it's pretty full featured.

Kris Jenkins (21:24):

Okay.

Dave Klein (21:24):

It's got a new library. There's some others out there that are newer. There's one that's relatively new called Bytewax.

Kris Jenkins (21:31):

Bytewax?

Dave Klein (21:32):

Yeah.

Kris Jenkins (21:32):

Okay.

Dave Klein (21:33):

Which is Python and it has a Rust component built in, but as a user of it you wouldn't really see that, from what I hear. I haven't actually tried using it myself yet, but I did talk with one of the folks on Bytewax recently, and it sounds like it would be a really good option to check out if you want to do streaming with Python. Those are the two that I know of that you can actually build into your applications like you would Kafka Streams. There are other services out there that you can use that will provide streaming as well. There's a company called Quix, which has a streaming library, or streaming platform, that they provide. You can almost think of it as equivalent to ksqlDB running on Confluent Cloud, except that this is Kafka and Python applications running on the cloud.

Kris Jenkins (22:27):

Oh, okay.

Dave Klein (22:28):

So you can [inaudible 00:22:29] applications in Python and has the Streaming components and stuff all built into that. And then you deploy the whole thing onto their cloud infrastructure.

Kris Jenkins (22:39):

Okay, you might have to send me some contacts, maybe we can get them on the podcast too.

Dave Klein (22:42):

Yeah. I think it would be really interesting to have them on. Yeah, it's a pretty interesting... pretty interesting project.

Kris Jenkins (22:49):

Yeah. I'd like to talk to someone about the gory details of implementing your own Kafka library.

Dave Klein (22:56):

Yeah.

Kris Jenkins (22:57):

Yeah.

Dave Klein (22:58):

So as I said, there's a few options to use for that, and especially as Python is becoming used with Kafka more in things like streaming pipelines for machine learning, or event-driven microservices, people are seeing the need for a streaming library more. So you've seen lots of people asking if there's one that Confluent supports, and Confluent doesn't support one directly, but you're seeing a lot of growth in things like Faust and now Bytewax.

Kris Jenkins (23:28):

Yeah.

Dave Klein (23:29):

So I think we'll hear more from both of those in the future.

Kris Jenkins (23:31):

I've got to try and use one of these in anger, against a real project.

Dave Klein (23:35):

That's the key thing.

Kris Jenkins (23:35):

Yeah. Which you don't always find the time to do in this job.

Dave Klein (23:41):

Right.

Kris Jenkins (23:42):

In any job actually. You don't always find the time to experiment, right? Too busy with the day to day.

Dave Klein (23:47):

Yeah, that's what happens.

Kris Jenkins (23:50):

But you get to do this fun thing where you stop the day-to-day and record a training course for Python.

Dave Klein (23:57):

Yeah, that was a lot of fun.

Kris Jenkins (23:59):

Yeah?

Dave Klein (23:59):

Yeah. It was-

Kris Jenkins (24:00):

What's it like, actually doing it?

Dave Klein (24:02):

Well, first I was building the course, and building the course, I started thinking, "Oh, this will be like building a conference talk." And so I started on that road and it's not, it's very different really. And so it still has a slide deck with it. And then I turn my speaker notes. Normally I have just a few bullet points on my speaker notes, and then when you do give the talk, you just kind of ad-lib it as you go. That doesn't work for a recording, something like this course. And so I had to turn the speaker notes into a full-fledged script. Fortunately there was a coworker, Dave Shook, he's really good with this kind of stuff. And he turned my notes into a document for me, which then turns into a script, and then I was able to work through that, and I had to rehearse it a lot more than I would normally do for a conference presentation.

Kris Jenkins (24:48):

I can believe that. If you're expecting to hit specific sentences.

Dave Klein (24:51):

Exactly. Yeah, so I did a lot more rehearsing of it. And so that was unique, or a new thing for me. I'll generally practice a conference presentation, but not to this extent. And then the recording was really fun. There's a team that works out of Oakland. We went up to their little studio. We went-

Kris Jenkins (25:09):

Oh, wow.

Dave Klein (25:11):

[inaudible 00:25:11] their studio, which had its own adventures because their studio is in a little... it's in a converted factory that's turned into a suite of boutique shops of different kinds. Really neat kind of place. Almost like a co-working space except it's little offices, office suites. And they're not just all these offices. The one next door to the studio happens to be an auto detailing place and they made a lot of interesting noises.

Dave Klein (25:41):

So we would have to stop the recording once in a while when they were doing something noisy and the folks doing the recording, they knew how long, what they were doing over there. It's, "Oh, right now they're doing this and that. So they'll be done in a minute." So we'd pause and wait again. But they did an amazing job of filtering all that out and then some edits too and stuff like that. And they also just made it, it was the first time I've done any recording or anything like that in front of a camera reading from a teleprompter and that kind of thing.

Kris Jenkins (26:10):

Oh yeah.

Dave Klein (26:12):

I think they made it a lot easier, or made it very easy for me to do so. But that was a lot of fun. But I was really just happy to be able to do it because Confluent Developer has some great courses as you know, but none of them were there for the Python community, they're all focused, or they're all assuming you're going to be working with Java for the most part.

Kris Jenkins (26:32):

Yeah, there's nothing inherently java-y about Kafka, but we could do more to make sure people know that.

Dave Klein (26:40):

Right, and so that's what I really hoped to do with that course. And I've heard some good feedback from people that have viewed the course. So hopefully it'll be helpful.

Kris Jenkins (26:48):

So what's the structure of it for people thinking of taking a look?

Dave Klein (26:52):

Yeah, so it starts off with a video. There's several modules, each module starts with a video lecture basically, and with some code on the slides so you can see the shape of the code and the structure of the classes that are going to be used and the functions that are included. And then it goes to a hands-on exercise after each module, except for the final one. But after the other modules, each one has hands-on exercises. And that's basically like a tutorial. So there'll be some steps you can follow to do the exercise yourself and get hands-on practice with using the library.

Kris Jenkins (27:24):

Right. And it covers how to build a producer-

Dave Klein (27:27):

Yep.

Kris Jenkins (27:27):

... and a consumer-

Dave Klein (27:28):

It does.

Kris Jenkins (27:29):

... and deal with the admin client.

Dave Klein (27:30):

Producer, the consumer, the admin client, and also using the producer with the schemas, which is a little bit of a separate thing. So the producer by itself is simpler to use, but then, if you're using it with schemas, there's a few more steps to include. So there's a separate module just on that.

Kris Jenkins (27:45):

So if you want to get started with Kafka and Python, Dave is your man.

Dave Klein (27:49):

Yeah. Yeah, check out that course, it's a great way to get started.

Kris Jenkins (27:53):

It's a good addition to the Python Kafka world.

Dave Klein (27:55):

Yeah. And hopefully there'll be more coming down the road somewhere, somehow. But yeah.

Kris Jenkins (27:58):

Hope so. Some point this year-

Dave Klein (28:03):

[inaudible 00:28:02] Haskell course.

Kris Jenkins (28:04):

That Haskell course. Yes, absolutely. I have this theory, if I just say Haskell enough in this podcast I might get a 10th of a percent of the audience testing that.

Dave Klein (28:14):

That might work.

Kris Jenkins (28:14):

Yes. In the meantime, I should probably leave you to get back to your... it seems like you're always at another Python community meetup somewhere.

Dave Klein (28:25):

Yeah, there's a lot of them out there. That's another thing I've enjoyed about Python as I got more involved in it: it's got a very vibrant community, especially if you're getting started. There are so many YouTube channels and Twitter accounts out there where people are just eager to help you get started and learn. And then if you go to an event, it's just a very welcoming, warm community, very much like the Kafka community. So it's a great blend.

Kris Jenkins (28:48):

That's good to hear. Yeah, and it's sort of become one of the de facto, like your first language, languages.

Dave Klein (28:58):

Mm-hmm.

Kris Jenkins (28:58):

As well as being something you can take all the way into production. A lot of people get started with it 'cause it's friendly, right?

Dave Klein (29:07):

Right. Yes. But there are a lot of career opportunities in Python right now too. So it's a great, great... If you just want to get into software development in general, I think Python's a great choice.

Kris Jenkins (29:14):

Yeah, I've had a lot of fun with it. And of course it has that lovely indentation based syntax like Haskell.

Dave Klein (29:20):

Yeah. Okay, good. So I didn't know that it happened to have that. That was the one thing that took a little getting used to for me, but now it's really not a big deal. Once you're used to it, it actually makes a lot of sense. But I do want to mention one other thing. You actually used Python for one of your Coding in Motion videos, right?

Kris Jenkins (29:37):

I did, yes. Thank you for the plug. Yeah, that was Python against the YouTube API to send Telegram messages.

Dave Klein (29:46):

Exactly. Yeah, that was a really good one.

Kris Jenkins (29:46):

Yeah. Thank you.

Dave Klein (29:46):

We'll link to that in the show notes.

Kris Jenkins (29:46):

Yeah.

Dave Klein (29:47):

That was a really good...

Kris Jenkins (29:52):

Doing my job for me, thank you. Yeah, but the more we do Kafka with other languages, the better, the more diverse and thriving our community can be, right?

Dave Klein (30:03):

Exactly. Yeah. And it already is that way. I think it just helps us to get to provide more content for them, and I think it's going to be great.

Kris Jenkins (30:12):

Yeah. Well, thank you very much for recording the course, Dave, and thanks for coming to talk about it.

Dave Klein (30:17):

Yeah, thanks for having me on. I really appreciate it.

Kris Jenkins (30:19):

Cheers. Catch you soon.

Kris Jenkins (30:20):

Thank you, Dave. And hopefully some of you will get to meet Dave at a Kafka Summit soon, or a Python conference near you or something like that, something on the computing scene. Dave really is one of the nicest, warmest people you'll ever meet in computing. So if you do spot him, go up and say hi and tell him Streaming Audio says hi too.

Kris Jenkins (30:42):

As we said in that podcast, if you want to get stuck in with Event Streaming in Python, you'll find a link in the show notes, or you can just head to developer.confluent.io and click on the courses link and you'll find Dave's course and lots of other great courses there for your edification.

Kris Jenkins (30:59):

And since Dave gave me the excuse to mention it, I will say that Confluent Developer will also take you to a few episodes of Coding in Motion, where I've been hacking together some interesting streaming projects in TypeScript, bit of Python, bit of KSQL, and hopefully one day I'll sneak a Haskell episode in there. I'm not sure I'll get away with it, but maybe I'll try shortly before I get fired. Before I take that risk, it just remains for me to thank Dave Klein for joining us and you for listening. I've been your host, Kris Jenkins, and I'll catch you next time.

Can you use Apache Kafka® and Python together? What’s the current state of Python support? And what are the best options to get started? In this episode, Dave Klein joins Kris to talk about all things Kafka and Python: the libraries, the tools, and the pros & cons. He also talks about the new course he just launched to support Python programmers entering the event-streaming world.

Dave has been an active member of the Kafka community for many years and noticed that there were a lot of Kafka resources for Java but few for Python. So he decided to create a course to help people get started using Python and Kafka together.

Historically, Java has had the most documentation, and people have often missed how good the Python support is for Kafka users. Python and Kafka are an ideal fit for machine learning applications and data engineering in general, and there are a lot of use cases for building streaming and machine learning pipelines. In fact, someone conducted a survey to find out what languages were most popular in the Kafka community and Python came in second after Java. That’s how Dave got the idea to create a course for newbies.

In this course, Dave combines video lectures with hands-on exercises. The lectures put code on the slides so you can see the shape of the code and the structure of the classes and functions, and the exercises that follow work like tutorials, giving you hands-on practice using the library. He covers building a producer and a consumer and using the admin client. And, of course, there is a separate module on using the producer with schemas, which takes a few more steps than the producer on its own.
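As a rough illustration of why producing with schemas takes more steps than producing raw messages, here is a minimal sketch in plain Python. It does not use the real client library and is not taken from the course: `FakeProducer`, `SCHEMA`, and `serialize_with_schema` are hypothetical stand-ins invented for this example, and the schema check is a simplified type check rather than real schema validation.

```python
import json


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; it only records messages in memory."""

    def __init__(self):
        self.sent = []

    def produce(self, topic, key, value):
        # A real client would send these bytes to a broker.
        self.sent.append((topic, key, value))


# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"device_id": str, "temperature": float}


def serialize_with_schema(record, schema):
    """Check the record against the schema, then encode it as JSON bytes."""
    missing = set(schema) - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field, expected in schema.items():
        if not isinstance(record[field], expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    return json.dumps(record, sort_keys=True).encode("utf-8")


producer = FakeProducer()

# Plain producer: one step, send whatever bytes you already have.
producer.produce("readings", key=b"sensor-1", value=b"21.5")

# Producer with a schema: define the schema, serialize against it, then send.
record = {"device_id": "sensor-1", "temperature": 21.5}
producer.produce(
    "readings",
    key=b"sensor-1",
    value=serialize_with_schema(record, SCHEMA),
)
```

The sketch only mirrors the order of the steps. With a real client, the serialization step would be handled by the library's own schema support rather than a hand-written function like this one.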

Dave explains that Python opens up a world of opportunity and is ripe for expansion. So if you are ready to dive in, head over to developer.confluent.io to learn more about Dave’s course.

