Joining me on this week's Streaming Audio is Dave Klein. Now, a lot of you will know Dave Klein. He's been something of a staple of the Kafka scene for a number of years, and he's also spent a lot of happy hours as part of the Python scene, and it's the cross-pollination of those two worlds that he's here to talk to us about. What's the state of Python for Kafka, or Kafka for Python?
Dave has just released a course for people getting started using Python and Kafka together, so I thought we'd get him in to chat about it. About the options available, if that's what you want to do, the pros and cons of different libraries, the use cases, or the things you should know if you're about to dive in.
There's a link to Dave's detailed course in the show notes, and you'll find it at developer.confluent.io, but for now, sit back and let Dave tell you about that particular corner of computing where the snake meets the stream.
Joining me today is Dave Klein. Dave, how you doing?
Doing all right, thanks. How about you?
Man, very well. Glad to have you here.
You've just released a Python and Kafka course based on your Python and Kafka world-leading expertise...
Expertise I think is the word I wanted. So, I thought I'd grab you in to talk about the state of Python and Kafka in the world.
Sounds good, yeah.
Yeah, yeah, but there is definitely a perception in the Kafka world that it's a Java thing.
Which I think is just because it's written in Java and there's been a lot of historical support for Java, but it's not a Java platform, it's an anyone-who-wants-to-write-data platform, right?
Right. Well, it's also that the client that comes with Kafka when you download it is the Java client.
Yeah, but there are plenty of others out there, right?
Yes, lots of them.
Indeed. You're going to tell us about one or two specifically.
Yeah, there are a couple of really good libraries out there for Python. There's a community-based one, which you can find on GitHub, and it's got quite a few users, it's well-maintained and active. There's also one that's put out by Confluent, which is based on the librdkafka project, which is a C library that does have the advantage of being probably the one that's closest to parity with the Java library. And Confluent puts a lot of effort into keeping it that way. So it's a great library to use, and that's the one that we do talk about in the course, but both of them will get you what you need to do. I did some real just ad hoc sort of... What's the word I'm looking for? Benchmarking, sorry.
We did a little bit of ad hoc benchmarking with the two, and the Confluent library is a little bit faster, about 10%, in most use cases that I tried. That's nothing scientific there, but just something to think about.
Like first stab, rule of thumb thing.
Yeah. One thing-
But yeah, really I've been surprised to see how popular Kafka is in the Python community.
Yeah. There's so many resources out there for it. So yeah.
What kind of things do people do with it?
Well, obviously Python is very heavily used in machine learning applications and data engineering in general, and so there's a lot of use cases there for just building streaming machine learning pipelines, feeding the data to your models. As well as for training models, you can use the same data set, and just by resetting your offset, you can re-feed that same data through over and over again as you tweak your models and things like that. So, there are people using Kafka for that.
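A minimal sketch of the replay idea described here, assuming the confluent-kafka package and a broker at localhost:9092. The topic and group names (`training-data`, `model-trainer`) are made up for illustration, and the local `OFFSET_BEGINNING` constant is assumed to match the one exported by the library. The consumer rewinds every partition to its start whenever partitions are assigned, so each run re-reads the same data.

```python
# Assumption: mirrors confluent_kafka.OFFSET_BEGINNING, the logical offset
# meaning "start of the partition".
OFFSET_BEGINNING = -2


def rewind_to_beginning(consumer, partitions):
    """on_assign callback: restart every assigned partition from its first message."""
    for partition in partitions:
        partition.offset = OFFSET_BEGINNING
    consumer.assign(partitions)


def replay_topic(topic="training-data", group_id="model-trainer", max_idle_polls=10):
    """Yield raw message values from the start of the topic until it goes quiet."""
    # Imported lazily so the helper above can be used without the library installed.
    from confluent_kafka import Consumer  # pip install confluent-kafka

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # hypothetical local broker
        "group.id": group_id,
    })
    consumer.subscribe([topic], on_assign=rewind_to_beginning)
    idle = 0
    try:
        while idle < max_idle_polls:
            msg = consumer.poll(1.0)
            if msg is None:
                idle += 1  # nothing arrived within the timeout
                continue
            idle = 0
            if msg.error():
                continue
            yield msg.value()
    finally:
        consumer.close()
```

Each call to `replay_topic()` starts again from the first message, which is what lets the same data set be fed through a model repeatedly while it is being tweaked.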
There's actually... was it folks from Baddr. Actually, they were on the podcast with you before, and they've done some things on that, and they actually have a blog post on the Confluent blog where they're using Kafka to train their models-
... in real-time. As they're using them even, so yeah, it's used there.
But also, and this was a surprise to me since I'm used to Java for microservices, but there's a lot of people building microservices with Python. Again, most of them are using the traditional request response structure for microservices, but I've spoken at several conferences on building event driven microservices with Python and Kafka, and those talks have been very popular. In fact, each time I've given that talk, it's been pretty much standing room only in that [inaudible 00:04:26].
So yeah, and there are a lot of great discussions afterwards and people are starting to see the value of that there. So that's another popular use case apparently.
I can see that being actually quite popular. I mean, there's something very nice about Python, maybe TypeScript, kind of lighter weight languages for doing microservices.
I'm not going to get into language wars, but Java is relatively heavy.
Yeah, I can see that being popular. And then you've got the whole Kafka microservices story that we've talked about a few times.
I can see that being an overlap. So why don't we go right back to the start and get your history? How did you get into the world of Kafka and the world of Python? Did you come to one before the other, or?
I've dabbled with Python a little bit, mostly just in building Lambdas.
So I've used those for a few small projects that I've done in the past. I've been doing Java development though mostly for 20 years or more. And although a big part of that, I was actually working with the Groovy programming language, which is-
... [inaudible 00:05:30] language, which is heavily inspired by Python. In fact, I saw that more clearly once I started working with Python more lately; you could see a lot of the Python influence on the Groovy language. And I loved Groovy, but I wasn't able to find work in it for the most part. So-
... [inaudible 00:05:47] Java.
So I'd done some Python, like I said, for Lambdas, but then I started working at Confluent. I started working with Kafka actually at a company before I went to Confluent. And then I started using Kafka heavily then again, mostly with Java. But while I was there, we started realizing that a lot of Kafka users were using Python. In fact, there was a survey done at one point by somebody on the community team checking for language popularity among people in the Kafka community. And Python was the second most popular language after Java. So that's a bit outdated on that survey, but it was an important data point. And so-
... I started pushing for... we just started doing more content for Python developers, and that's how this course came about. And hopefully there'll be more coming, but I think it's important to start providing resources for people in the Python community.
For all the different languages would be helpful. Because there are so many, like you said, Kafka can be used with almost any language, even with Haskell, right?
Even with Haskell, yeah. I'm not holding my breath for the Haskell, Kafka course being commissioned, but on that day I'll be there already.
It is, I'm thinking of doing it though, right? Yeah.
So I'm going to segue with Haskell then. So one thing you get in Haskell is there is a Kafka library, and it's a wrapped C library, and you find that in a lot of languages: they take librdkafka, which is a very fully featured C library, and wrap it.
Which should people pick in your opinion, what are the trade offs there?
Well, yeah, it really depends on your use cases. Like I said, one of the trade-offs is going to be that with the Confluent one, which is the librdkafka wrapper, you're going to be more in sync with the Java library, which is what most of the documentation out there covers, so you can use all the same configuration values, or most of them anyways. And most of the docs that refer to the Java client will apply to your client as well. So there's that. And that also is a little bit more performant, I think, as I mentioned earlier.
But there's some drawbacks with it as well. And that is, it's not pure Python, so it is a wrapper of the C library. And so if you were trying to embed a Kafka consumer or producer in a PyScript client running in a browser, you wouldn't be able to do that with the Confluent library. So there's some trade-offs there. And then also just if your background's purely Python and you don't really have any reason to try to match the Java client, then the Python client is probably going to be more comfortable to you, more familiar to you.
So you generally get the more kind of idiomatic language style that's more familiar in the native implemented client, right?
And that's nice when you're getting started.
Yeah. And it makes me think, I've got this little pet project with, do you know Adafruit? They make all this great hardware stuff.
Yeah, and I have a pixel display board that just runs Python.
And I tried to get the Confluent Kafka library on it, and I couldn't because it requires C. And I'm hoping that after this conversation, I'll know what I need to know to get the native one running.
Yeah, you could give that a try,
That'd be cool. But the flip side, and I have to mention this for performance, and I think this is again, my expertise, this is definitely true for Java. I assume it's true for Python that the C Library has things like producer batching, which makes a huge impact on performance when you get large.
Right. And the Python client definitely has... at least the Confluent Python client definitely has that, I'm not certain about the native client. I would assume that it does something like that, but I don't know for sure.
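To make the batching point concrete, here is a small sketch of producer settings for the Confluent Python client that trade a little latency for larger batches. The property names are librdkafka configuration properties; the specific values and the broker address are illustrative assumptions, not recommendations from the course.

```python
def batching_producer_config(bootstrap_servers="localhost:9092"):
    """Return producer settings that favour larger batches over lowest latency."""
    return {
        "bootstrap.servers": bootstrap_servers,  # hypothetical local broker
        "linger.ms": 50,               # wait up to 50 ms for a batch to fill
        "batch.num.messages": 10000,   # upper bound on messages per batch
        "compression.type": "lz4",     # compress each batch as a unit
    }


def make_batching_producer():
    """Build a Producer from the settings above."""
    # Imported lazily so the config helper works without the library installed.
    from confluent_kafka import Producer  # pip install confluent-kafka

    return Producer(batching_producer_config())
```

Whether these values help depends on message size and traffic, so they are worth measuring against the defaults before adopting them.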
That's one of those things where you'd need to check the feature list before you-
Yeah. But the Confluent Kafka client does definitely have that. Also, the consumers participate in consumer groups just like the Java client's consumers do. So that's what I say, that's one of the key advantages too: the things that you hear about from the Java world in Kafka, they're all going to be there in the Confluent Kafka client.
Having trouble with that word today.
Yeah, I think when you're getting started,
[inaudible 00:10:21] a client, right?
Yeah, a client. In fact, someone's going to create a library with that name now plient, Python Kafka Library client. Yeah, I can see that happening. So you go into this in-depth in your course, but do you have any tips for someone getting started in Python with Kafka?
Well, yeah, if you want to just start really quickly and easily, you could do it with a Jupyter Notebook, and I was surprised, yeah, how well that runs. Confluent provides some great Docker compose files that you can find on their tutorial site, and they'll load up infrastructure for you. You can actually run those within a Jupyter Notebook.
[inaudible 00:11:07] did that. Yeah, you could run shell scripts in a notebook, so just run there. But you know, you could do it on the command line as well. But I would recommend just running the Docker Compose files that set up your infrastructure; they'll spin up your ZooKeeper and Kafka broker, all those things you need. And then you can start by importing the Confluent Kafka library. First it's just pip install confluent-kafka. And then once you've got that installed, you can just import the Producer and Consumer from that library, and away you go.
In fact, it's so simple to use, it's just a few lines. Like I said, you could do it... on my GitHub repository I've got a notebook that's just a producer and a consumer in it. That's really simple. But I recently did a half day workshop on Kafka, and it's a Java based workshop.
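As a rough illustration of "just a few lines", the sketch below produces one message and reads it back with the Confluent Python client. It assumes a broker at localhost:9092; the topic name `hello_topic`, the group id `hello_group`, and the JSON encoding helper are invented for this example and are not taken from the course.

```python
import json


def encode_event(event):
    """Serialize a dict to UTF-8 JSON bytes for use as a message value."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def delivery_report(err, msg):
    """Called once per message to report delivery success or failure."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")


def produce_one(topic="hello_topic"):
    """Send a single JSON message and wait for it to be delivered."""
    from confluent_kafka import Producer  # pip install confluent-kafka

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    producer.produce(topic, key="greeting", value=encode_event({"msg": "hello"}),
                     on_delivery=delivery_report)
    producer.flush()  # block until outstanding messages are delivered


def consume_some(topic="hello_topic", max_polls=10):
    """Poll the topic a few times and print any JSON messages received."""
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "hello_group",        # consumers sharing this id split the partitions
        "auto.offset.reset": "earliest",  # start from the beginning if nothing is committed
    })
    consumer.subscribe([topic])
    try:
        for _ in range(max_polls):
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            print(json.loads(msg.value()))
    finally:
        consumer.close()
```

In a notebook, calling `produce_one()` in one cell and `consume_some()` in the next is enough to see a message make the round trip.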
And the whole time I was doing it I was thinking, "Boy, I wish I was doing this with Python, it'd be so much..." Like, you know you give exercises for the students in a workshop to do, right?
And so I had to basically build a project, put it on GitHub, and leave out a few bits for them to fill in because there's no way they can do the whole thing from scratch in the time that we have in this workshop because there's so much boilerplate code to include in a Java project.
And so Python is actually a great way to learn Kafka because there's so much less you have to do before you're actually doing the important bits.
Yes. Yeah. I found recently that I've started to lean more on things like Python and TypeScript for prototyping, if not the final product. Right?
And a lot of people would stick with that for the whole lifetime of the project and go for it. But I would suggest to anyone using Java that maybe having a lightweight scripting-ish language under your belt is a valuable addition to getting things like sketched out.
Yeah, exactly. Yeah, you've got the REPL, you've got notebooks, you've got lots of different ways to do things quickly that don't take all of the extra work that you would need to do with Java.
Yeah. But then that gets us into another topic, which I'm sure Java purists will come back to every time, types and type checking and schemas and validation of type shapes. What can you teach us about that?
Well, we do have a section in the course that covers working with schemas. When you're working with Kafka, you do want to use schemas, and that's supported with the Confluent Kafka Python library. So it still works with the schema registry as well. So you can still have your schemas stored in the registry: the producers can store them there, and then consumers can pick them up from the schema registry. So you still have those same advantages, and I recommend doing that. So that's why we wanted to make sure to include a module on that in the course. As far as types in your programming, that's still up for debate, there are some of those folks like you that types that are [inaudible 00:14:08] direction. So there are type hints you can add to Python now, so you can have some of that behavior that you're looking for. They aren't actually checked when you're running it. It's not enforced quite the same way, but they'll help you out in your IDEs and things like that.
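A small sketch of what "not enforced quite the same way" means for type hints. The function name `temperature_key` is a hypothetical helper for this example. The hints are stored as metadata that an IDE or a separate checker such as mypy can read, but the interpreter does not check them when the function is called.

```python
from typing import get_type_hints


def temperature_key(sensor_id: int) -> str:
    """Build a message key from a sensor id (hypothetical helper)."""
    return f"sensor-{sensor_id}"


# The hints are recorded as metadata that tools can inspect.
hints = get_type_hints(temperature_key)

# Python does not check them at call time: passing a str where an int
# is hinted still runs without raising an error.
unchecked = temperature_key("abc")
```

A static checker run before deployment would flag the second call, which is the sense in which hints behave like a linter rather than a compiler.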
Okay. Is it like a compile time linter?
Okay. Yeah, yeah.
You can. And there's a few different ways to do that. I prefer not having them. That's why I said I really enjoyed Groovy because it was a dynamic version of Java and types were optional. But I understand the argument. I don't want to get into...
No. No, I think that you could do a whole podcast just every week-
Right. So you can, and a lot of people do use that for providing, especially on shared projects with multiple people on a team working on it. It does give more information to other developers who don't know what you were thinking when you wrote the code. But since, I guess I've worked so much on my own for projects that I know what I meant, and so it's... Also using well named variables, things like that.
Yeah. But yeah-
... just agree that schemas, schemas are a good thing.
That is definitely-
... because one of the key things with Kafka clients is that the producers and consumers are decoupled. They don't know anything about each other, right? But they do need to agree on the format of the data that they're working with. And so I think schemas are very critical for that.
And they work really well with the Python clients.
And from what I've used in the Python world, actually, the integration with schemas, frankly it feels nicer than the Java stuff. That's been my experience. It's just a little more lightweight. You have to hint a bit less about what your types are. And I don't find I get any more runtime errors from Schema deserialization than I do in the Java world.
Right, right. Yeah, it works really well with them there. The integration with the schema registry is not quite as thorough as it is with the Java client. It's there, but you still need to give it... Like, with the Java client you don't have to mention the schemas at all really. It's automatically picked up from the message. The schema registry will pick it up and the consumers will pick it up. So it's much more seamless. With Python, you still need to pass in the schema when you're constructing the producer, if you're working with the schema registry, some things like that. So there's a little extra, but it's minor, and it's only when you're first making that first connection. And I'm sure it'll improve over time as well. But yeah, right now the Java client does work a little bit more seamlessly with the schema registry than the Python...
Oh, you think? Okay. Maybe it's because I've been mostly writing Kafka Streams lately, but I find it a little bit painful the amount of [inaudible 00:17:09] I have to do.
In working with it it's great, it's just in setting up the initial connection with the schema registry that's not quite as seamless.
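The extra setup step mentioned here might look roughly like the sketch below, which assumes the confluent-kafka package with its JSON Schema dependencies installed, a schema registry at http://localhost:8081, and a broker at localhost:9092. The `Temperature` schema, the topic name, and the `reading_to_dict` helper are invented for illustration. The point to notice is that the schema string is handed to the serializer explicitly when it is constructed.

```python
import json

# Hypothetical JSON Schema for a temperature reading.
TEMPERATURE_SCHEMA = json.dumps({
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Temperature",
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "reading": {"type": "number"},
    },
    "required": ["city", "reading"],
})


def reading_to_dict(reading, ctx=None):
    """to_dict hook: convert an application object to a plain dict."""
    return {"city": reading["city"], "reading": float(reading["reading"])}


def produce_with_schema(topic="temperature"):
    """Serialize one reading against the schema and send it."""
    # Imported lazily so the schema and helper above work without the library.
    from confluent_kafka import Producer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.json_schema import JSONSerializer
    from confluent_kafka.serialization import MessageField, SerializationContext

    registry = SchemaRegistryClient({"url": "http://localhost:8081"})
    # The schema is passed in explicitly when the serializer is built.
    serializer = JSONSerializer(TEMPERATURE_SCHEMA, registry, to_dict=reading_to_dict)

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    value = serializer({"city": "London", "reading": 12},
                       SerializationContext(topic, MessageField.VALUE))
    producer.produce(topic, value=value)
    producer.flush()
```

Once the serializer exists, producing a message is the same call as without schemas, which matches the remark that the extra work is confined to the initial setup.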
Fair enough. That takes us to another topic, which I know absolutely nothing about. But you covered this in your course, which is there's a whole general admin client in the Python world.
Yeah. Well, there's actually one in the Java world as well. It's still there-
Okay, I've not used it.
In the Java library. Yeah, it's not... The thing with admin client is you mainly would use it if you're building a Kafka tool. Right?
It allows you to create topics and delete topics and get information about topics. You can also change configurations on the brokers. You can find out what brokers you have. You can even do things like ACLs. There's all kinds of things that you can do with the admin client that you don't normally do in your application.
But the thing that I point out in the course and the place where I think it is helpful sometimes is to ensure that certain things are there when your application starts up. So you can use... and this is something I'd recommend people to do, if there's any configurations or topics that they need that have to be there for the application to work, is you can use the admin client to check for the existence of the topics that you need. And if they don't exist, you can create them. You can also check for configurations. So if you know that your application is relying on a certain max message size, right? Maybe your-
... [inaudible 00:18:37] messages are a little bit bigger than the default, so you want to make sure that that's set or your application is going to fail. So you can check the configurations on the broker to confirm that it's set the way you want it to be, or change it if you need to.
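The startup check described here could be sketched as follows for topics, assuming the confluent-kafka package and a broker at localhost:9092. The helper names and the single-partition, replication-factor-1 settings are assumptions suited to a local development cluster. Checking broker configuration values is not covered in this sketch.

```python
def missing_topics(required, existing):
    """Return the required topic names that are not present, in order."""
    present = set(existing)
    return [name for name in required if name not in present]


def ensure_topics(required, bootstrap_servers="localhost:9092"):
    """Create any required topics that the cluster does not have yet."""
    # Imported lazily so missing_topics can be used without the library installed.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": bootstrap_servers})
    existing = admin.list_topics(timeout=10).topics  # dict keyed by topic name
    names = missing_topics(required, existing)
    if not names:
        return []
    new_topics = [NewTopic(name, num_partitions=1, replication_factor=1)
                  for name in names]
    # create_topics returns {topic_name: future}; result() raises on failure.
    for future in admin.create_topics(new_topics).values():
        future.result()
    return names
```

Calling `ensure_topics(["orders", "payments"])` at application startup would create whichever of those (hypothetical) topics is absent and return the names it created.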
So that's one of those things that can solve a lot of problems when you're moving from development, to QA, to production. Right?
Yeah. Or if you're building a package or software that might be used by other people as well. Something that you're posting so people can download it, or an open source project or even a commercial project that you're giving out.
That explains why I've never encountered it. I've never actually gone... I've never actually-
But when you have that need, it really is a helpful tool to have.
Yeah, yeah. Outside of work, do you use Python and Kafka much, or is it mostly a work tool for you?
Oh boy. Yeah, it's mostly just been for work.
You're a busy man,
That's the thing. Yeah, I haven't had a lot of side project things in a long time. I used to do more of that, and if I did, I think I would be finding ways to use Kafka now. Kafka I found was one of those things that once you discover it, you start finding uses for it all over the place. I mean, once I started learning about Kafka, my mind raced back to many projects I worked on that really could have used it and would've benefited from it.
And so I could even think of side projects of my own that I've done, that yeah, Kafka would've helped with that too.
Yeah, it's funny. I've worked on... I've written up one or two of them actually, it's like places where we tried to reinvent a real time data stream in Postgres or something.
And it's like... Back then it was probably the right choice, but back then Kafka didn't exist, and now I'd do it differently.
Yeah. So we've touched on... One other thing I wanted to discuss with you. I mentioned Kafka Streams and my current wrestling with it in Java.
Do I have any options for the Python world?
Yeah, there are some. Probably the most well-known one is Faust, which I believe started with Robinhood, I think put-
Oh yeah, yeah. I've heard some of this... because they kind of dropped it and then it was taken over by someone else.
[inaudible 00:21:07] And it was forked now. So there's a community supported version that's out there now that's still being maintained. And that probably is the closest match to what you would do with Kafka Streams, but for Python, and it can do most of the same things. So it's pretty full featured.
It's got a new library. There's some others out there that are newer. There's one that's relatively new called Bytewax.
Which is Python and it has a Rust component built in, but as a user of it you wouldn't really see that, from what I hear. I haven't actually tried using it myself yet, but I did talk with one of the folks on Bytewax recently, and it sounds like it would be a really good option to check out if you want to do streaming with Python. Those are the two that I know of that you can actually build into your applications like you would Kafka Streams. There are other services out there that you can use that will provide streaming as well. There's a company called Quix, which has a streaming library, or streaming platform, that they provide. You can almost think of it as equivalent to ksqlDB running on Confluent Cloud, except that this is Kafka and Python applications running on the cloud.
So you can [inaudible 00:22:29] applications in Python and has the Streaming components and stuff all built into that. And then you deploy the whole thing onto their cloud infrastructure.
Okay, you might have to send me some contacts, maybe we can get them on the podcast too.
Yeah. I think it would be really interesting to have them on. Yeah, it's a pretty interesting... pretty interesting project.
Yeah. I'd like to talk to someone about the gory details of implementing your own Kafka library.
So as I said, there's a few options to use for that, and especially as Python is being used with Kafka more in things like streaming pipelines for machine learning, or event-driven microservices, people are seeing the need for a streaming library more. So you've seen lots of people asking if there's one that Confluent supports, and Confluent doesn't support one directly, but you're seeing a lot of growth in things like Faust and now Bytewax.
So I think we'll hear more from both of those in the future.
I've got to try and use one of these in anger, against a real project.
That's the key thing.
Yeah. Which you don't always find the time to do in this job.
In any job actually. You don't always find the time to experiment, right? Too busy with the day to day.
Yeah, that's what happens.
But you get to do this fun thing where you stop the day-to-day and record a training course for Python.
Yeah, that was a lot of fun.
Yeah. It was-
What's it like actually doing it?
Well, first I was building the course, and so building the course, I started thinking, "Oh, this will be like building a conference talk." And so I started on that road and it's not, it's very different really. It still has a slide deck with it, and then there are my speaker notes. Normally I have just a few bullet points on my speaker notes, and then when you do give the talk, you just kind of ad-lib it as you go. That doesn't work for a recording, something like this course. And so I had to turn the speaker notes into a full fledged script. Fortunately there was a coworker, Dave Shook, he's really good with this kind of stuff. And he turned my notes into a document for me, which then turned into a script, and then I was able to work through that, and I had to rehearse it a lot more than I would normally do for a conference presentation.
I can believe that. If you're expecting to hit specific sentences.
Exactly. Yeah, so I did a lot more rehearsing of it. And so that was unique, or a new thing for me. I'll generally practice a conference presentation, but not to this extent. And then the recording was really fun. There's a team that works out of Oakland. We went up to their little studio. We went-
[inaudible 00:25:11] their studio, which had its own adventures because their studio is in a little... it's in a converted factory that's turned into a suite of boutique shops of different kinds. Really neat kind of place. Almost like a co-working space except it's little offices, office suites. And they're not just all these offices. The one next door to the studio happens to be an auto detailing place and they made a lot of interesting noises.
So we would have to stop the recording once in a while when they were doing something noisy and the folks doing the recording, they knew how long, what they were doing over there. It's, "Oh, right now they're doing this and that. So they'll be done in a minute." So we'd pause and wait again. But they did an amazing job of filtering all that out and then some edits too and stuff like that. And they also just made it, it was the first time I've done any recording or anything like that in front of a camera reading from a teleprompter and that kind of thing.
I think they made it a lot easier, or made it very easy for me to do so. But that was a lot of fun. But I was really just happy to be able to do it because Confluent Developer has some great courses as you know, but none of them were there for the Python community, they're all focused, or they're all assuming you're going to be working with Java for the most part.
Yeah, there's nothing inherently Java-y about Kafka, but we could do more to make sure people know that.
Right, and so that's what I really hoped to do with that course. And I've heard some good feedback from people that have viewed the course. So hopefully it'll be helpful.
So what's the structure of it for people thinking of taking a look?
Yeah, so it starts off with a video. There's several modules, each module starts with a video lecture basically, and with some code on the slides so you can see the shape of the code and the structure of the classes that are going to be used and the functions that are included. And then it goes to a hands-on exercise after each module, except for the final one. But after the other modules, each one has hands-on exercises. And that's basically like a tutorial. So there'll be some steps you can follow to do the exercise yourself and get hands-on practice with using the library.
Right. And it covers how to build a producer-
... and a consumer-
... and deal with the admin client.
Producer, the consumer, the admin client, and also using the producer with the schemas, which is a little bit of a separate thing. So the producer by itself is simpler to use, but then, if you're using it with schemas, there's a few more steps to include. So there's a separate module just on that.
So if you want to get started with Kafka and Python, Dave is your man.
Yeah. Yeah, check out that course, it's a great way to get started.
It's a good addition to the Python Kafka world.
Yeah. And hopefully there'll be more coming down the road somewhere, somehow. But yeah.
Hope so. Some point this year-
[inaudible 00:28:02] Haskell course.
That Haskell course. Yes, absolutely. I have this theory, if I just say Haskell enough in this podcast I might get a 10th of a percent of the audience testing that.
That might work.
Yes. In the meantime, I should probably leave you to get back to your, it seems like you're always at another Python community meetup somewhere.
Yeah, there's a lot of them out there. That's another thing I've enjoyed about Python as I got more involved in it: it's got a very vibrant community, especially if you're getting started. There are so many YouTube channels and Twitter accounts out there where people are just eager to help you get started and learn. And then if you go to an event, it's just a very welcoming, warm community, very much like the Kafka community. So it's a great blend.
That's good to hear. Yeah, and it's sort of become one of the de facto, like your first language, languages.
As well as being something you can take all the way into production. A lot of people get started with it 'cause it's friendly, right?
Right. Yes. But a lot of career opportunities in Python right now too. So it's a great, great... If you just want to get into software development in general. I think Python's a great choice.
Yeah, I've had a lot of fun with it. And of course it has that lovely indentation based syntax like Haskell.
Yeah. Okay, good. So I didn't know that it happened to have that. That was the one thing that took a little getting used to for me, but now it's really not a big deal; once you're used to it, it actually makes a lot of sense. But I do want to mention one other thing. You actually used Python for one of your Coding in Motion videos, right?
I did, yes. Thank you for the plug. Yeah, that was Python against the YouTube API to send Telegram messages.
Exactly. Yeah, that was a really good one.
Yeah. Thank you.
We'll link to that in the show notes.
That was a really good...
Doing my job for me, thank you. Yeah, but the more we do Kafka with other languages, the better, the more diverse and thriving our community can be, right?
Exactly. Yeah. And it already is that way. I think it just helps us to get to provide more content for them, and I think it's going to be great.
Yeah. Well, thank you very much for recording the course, Dave, and thanks for coming to talk about it.
Yeah, thanks for having me on. I really appreciate it.
Cheers. Catch you soon.
Thank you, Dave. And hopefully some of you will get to meet Dave at a Kafka Summit soon, or a Python conference near you or something like that, something on the computing scene. Dave really is one of the nicest, warmest people you'll ever meet in computing. So if you do spot him, go up and say hi and tell him Streaming Audio says hi too.
As we said in that podcast, if you want to get stuck in with event streaming in Python, you'll find a link in the show notes, or you can just head to developer.confluent.io and click on the courses link and you'll find Dave's course and lots of other great courses there for your edification.
And since Dave gave me the excuse to mention it, I will say that Confluent Developer will also take you to a few episodes of Coding in Motion, where I've been hacking together some interesting streaming projects in TypeScript, bit of Python, bit of KSQL, and hopefully one day I'll sneak a Haskell episode in there. I'm not sure I'll get away with it, but maybe I'll try shortly before I get fired. Before I take that risk, it just remains for me to thank Dave Klein for joining us and you for listening. I've been your host, Kris Jenkins, and I'll catch you next time.
Can you use Apache Kafka® and Python together? What’s the current state of Python support? And what are the best options to get started? In this episode, Dave Klein joins Kris to talk about all things Kafka and Python: the libraries, the tools, and the pros & cons. He also talks about the new course he just launched to support Python programmers entering the event-streaming world.
Dave has been an active member of the Kafka community for many years and noticed that there were a lot of Kafka resources for Java but few for Python. So he decided to create a course to help people get started using Python and Kafka together.
Historically, Java has had the most documentation, and people have often missed how good the Python support is for Kafka users. Python and Kafka are an ideal fit for machine learning applications and data engineering in general, and there are a lot of use cases for building streaming machine learning pipelines. In fact, someone conducted a survey to find out what languages were most popular in the Kafka community and Python came in second after Java. That’s how Dave got the idea to create a course for newbies.
In this course, Dave combines video lectures with hands-on exercises. The lectures preview the shape of the code and the structure of the classes and functions, and the exercises give developers practice using the library. He also covers building a producer and a consumer and using the admin client. And, of course, there is a module that covers working with schemas, which the library supports.
Dave explains that Python opens up a world of opportunity and is ripe for expansion. So if you are ready to dive in, head over to developer.confluent.io to learn more about Dave’s course.
If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.