Danica Fine is a developer advocate with Confluent. She also has some recent real-world experience building data pipelines in the wild. So I wanted to talk to her about that process: lessons learned, what tools she used, what was easy, what was hard, how it all unfolded, and how it relates to the Data Pipelines course we have on Confluent Developer.
Speaking of Confluent Developer, Streaming Audio is brought to you by Confluent Developer. You may already know that. That's developer.confluent.io, a website with everything you need to get started learning Kafka and Confluent Cloud, all kinds of resources. When you do any of the examples or labs on Confluent Developer, you'll probably sign up for Confluent Cloud. When you do, use the code PODCAST100 to get an extra $100 of free usage credit. Now let's get to the conversation with Danica.
Hello, and welcome to another episode of Streaming Audio. I'm your host, Tim Berglund and I am super happy to be joined in the virtual studio today by Danica Fine. Danica is a developer advocate. She's the newest member of the developer advocate team here at Confluent. Danica, welcome to the show.
Hi, it's great to be here, Tim. Thank you.
Now we're going to talk about data pipelines today. Because data pipelines are a thing that you've done or a thing that you've talked about at Kafka Summit. And I want to get into that. I want to obviously plug the Data Pipelines course on Confluent Developer, I mean, everybody knows we're going to do that. Nobody would be surprised at the end when we say that you should watch the Confluent Developer Data Pipelines course. But before we do that, Danica, tell us about yourself. I mean, you are giving talks out in the world and doing things and being visible. But if there's somebody who doesn't know you, tell us about you, how'd you get to be here?
Yeah. So prior to this, I spent about three years working on a streaming data infrastructure team, and there we were the first group at our company to really implement a streaming pipeline. We were moving from these monolithic applications that were doing sort of micro-batch processing, and we wanted to see if we could actually make it real time. So I got a lot of experience there playing around with Kafka Streams and seeing how that would actually fit into a legacy architecture. And yeah, it was a great experience seeing what works and what doesn't in the context of a large company like that. So it was a lot of fun, and I would totally go back and do it again, especially knowing what I know now.
Right. You get to build a second one.
Yeah.
I guess that's good now that your profession is to advocate on behalf of people who build things with these technologies and build, among other things, streaming pipelines. I guess it's good that you'd want to do it again.
Yeah.
Yeah. With even the same tools. That's a good sign.
Absolutely.
So, I want to back up a step. I think it's been a while since we've had a purely pipelines-focused episode on the show. There are a lot of pieces we frequently cover different aspects of, and you could assemble them into good pipeline knowledge if you listened to the whole catalog and remembered everything in every episode, but that's tough to do. So why data pipelines? I mean, you can speak generally, you don't have to get into what you were doing at your previous employer or what the business motivations were there. But just in general, remind us of the reasons people want to do this.
Yeah, absolutely. So, prior to moving to an event-driven data pipeline, we were leveraging a micro-batching sort of architecture, and there were many benefits to moving away from that. Namely, you obviously want to react to data in real time to get the best and most up-to-date results. And then moving to something like Kafka, you get higher resilience and increased scalability; that was a major benefit for our pipeline. And then of course decoupling the source and target systems. And that's just a small number of the benefits of moving toward this. To our stakeholders, it was a no-brainer; we had to do it.
Yeah. How micro is micro in micro-batch? What kind of latency difference did you see, if you remember? It's a very specific question.
It's been a while since I looked at the numbers.
Right, yeah.
So I mean, in the micro-batching setup, we were on a millisecond latency. It was a really, really super fast application that we had going, and there wasn't really a problem with the speed. We weren't concerned about that, and we actually weren't going to get too much out of moving to an event-driven architecture in that regard. But what we did get was the scalability and the resilience, right? In the previous micro-batching architecture, we only had one instance of the application really running, and then some hot backups. But here we were able to move over to a more distributed and scalable architecture.
Yeah, yeah. You get all the fault tolerance characteristics of Kafka cluster and Kafka consumers and all that kind of stuff.
Mm-hmm (affirmative).
So that makes sense. Cool. So walk us through the whole process. Like I said, I mean abstract away the details, because you're talking about something you did at some other company, but just in general, what are the steps that you, if you had this to do again, what are the steps you go through?
Yeah. So, as I walk through this, for the viewers, you'll see that this actually aligns really, really well with the Data Pipelines course. So you should probably walk through-
Oh, hello Data Pipelines course on Confluent Developer.
Shameless plug.
This message was brought to you by Confluent Developer. It's developer.confluent.io, ladies and gentlemen. Okay, sorry. Go on.
Yeah. So for our company, the legacy architecture was not going anywhere, so we had to implement the data pipeline within that context, obviously. The first step was to be able to integrate with the legacy architecture. So we wrote a couple of applications that allowed our input data sources to be fed into Kafka. That was step one. Step two, we wrote a way to pipe that data from Kafka back into the legacy architecture as well. And then after that, yes?
I do want to drill down into that, so not so fast. But those are the connections on the pipe? So the pipe is not doing anything interesting yet, but it connects to the source and you can put data back into the legacy system.
Mm-hmm (affirmative).
And I've made the point a lot recently in some talks (it's mid-November 2021 when we're recording this, and I gave a bunch of talks last month) that streaming, event-driven technologies are a new thing, and you don't burn down the old thing and build a new event-driven thing.
Yeah.
You supplement, not supplant. So that's exactly what you're saying, which is awesome. And the getting data out of the legacy system, were you able to modify the legacy system to produce to Kafka, so you have a direct connection to the events that you own in that legacy system, or was it change data capture or some other thing? What's the connector on that pipe?
Yeah. So we weren't actually modifying the legacy system at all. You can kind of view this as almost like a-
I was dreaming that you were, it just sounded awesome. Oh no, it's fine. We'll throw some Kafka in there and produce events. We'll do what we want.
Yeah.
And then, but yeah, it's not. Yeah. Okay.
Well, the reason we weren't able to is that this legacy system has been around for decades and it wasn't going anywhere. And the-
It probably compares favorably in age to me.
So instead of altering the legacy system, we, in effect, wrote a sort of Kafka connector. There were mechanisms in place already, provided by the legacy architecture, to make a data connection and pull that data whenever you wanted. So we effectively wrote our own connector to that and allowed it to stream the data straight into Kafka for use later on.
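To make that concrete, here is a minimal sketch of what such a hand-rolled bridge might look like: poll whatever pull-based data-access mechanism the legacy architecture already provides, and produce each record into Kafka. The LegacyFeed interface, topic name, and field names are hypothetical stand-ins, not details from the project Danica describes.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.List;
import java.util.Properties;

public class LegacyToKafkaBridge {

    // Hypothetical view of one record coming out of the legacy system's data feed.
    record LegacyRecord(String id, String payloadJson) {}

    // Hypothetical wrapper around whatever pull mechanism the legacy architecture provides.
    interface LegacyFeed {
        List<LegacyRecord> poll();
    }

    public static void run(LegacyFeed legacyFeed) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                for (LegacyRecord rec : legacyFeed.poll()) {
                    // Key by the legacy identifier so updates for the same entity
                    // land in the same partition and stay ordered.
                    producer.send(new ProducerRecord<>("legacy.input.events", rec.id(), rec.payloadJson()));
                }
            }
        }
    }
}
```

The sink side of the pipe would be the mirror image: a consumer loop reading from a Kafka topic and calling whatever write hook the legacy system exposes.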
Nice, okay. It's not an actual connector proper, like using Connect.
Mm-hmm (affirmative).
Cool.
Not using Connect.
That's not a super mind-bending API, is it?
No.
A couple of methods and you're sort of good.
Mm-hmm (affirmative).
All right. So, sorry, interrupting, which is how Streaming Audio works. I ask questions. So you got that connection on that pipe, and it's basically the same thing on the other end: there's a hook somewhere in the legacy system that allows you to put data back into it.
Yes.
So that's nice actually. Being able to plug into it like that is... You know the meme with the growing brains, where you have the really tiny brain and then at the end there's this godlike figure? That's that. You don't normally get that, but it's a mature enough legacy system that it wanted to cooperate.
Mm-hmm (affirmative).
Yeah. All right. So, integration step one. And I stopped you right before step two. So tell me about step two.
Great. Step two is actually building the meat of the pipeline, the algorithm itself. So in our case, as we were proving that we could integrate our architecture with Kafka and really embrace an event-driven architecture, we chose the most difficult application to move over. And it was on purpose; in hindsight, obviously on purpose, to choose the most difficult algorithm. And so we really had a fun time building out a Kafka Streams application that was able to take the monolithic legacy application and break it down into a stream processing architecture. Very fun, again, in hindsight; very, very difficult in practice while you're in the middle of it.
But yeah, so we allowed our own team to start playing around in Kafka Streams, implementing that algorithm. And since we had those connectors that I told you about, moving data from legacy into Kafka and then from Kafka back out, other groups in the organization were suddenly able to play around with data in Kafka and see where it could work for them. So we went through this initial testing phase and built out this Kafka Streams application that proved that not only could we get the data into and out of Kafka, we could also very successfully implement what had previously been a sort of black-box algorithm and move it into Kafka Streams. And in that process, we increased the scalability and resilience of the application and achieved roughly the same latency that we were getting in the monolith. So that was a major win.
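As a rough illustration of what "breaking the monolith down into stream processing" can look like in Kafka Streams, here is a minimal topology sketch. The topic names and the individual processing stages are hypothetical; the actual algorithm Danica mentions is not public, so the transformation here is a placeholder.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class PipelineTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "legacy-algorithm-rewrite");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Stage 1: read the raw events that the custom bridge wrote into Kafka.
        KStream<String, String> raw = builder.stream("legacy.input.events");

        // Stage 2: drop records the algorithm never cared about.
        KStream<String, String> relevant = raw.filter((key, value) -> value != null && !value.isBlank());

        // Stage 3: one small, named step of the (hypothetical) business calculation,
        // instead of a single opaque method call as in the monolith.
        KStream<String, String> processed = relevant.mapValues(value -> value.toUpperCase());

        // Stage 4: write results back out so the legacy-bound sink can pick them up.
        processed.to("pipeline.output.events");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The point is the shape: each stage is small, named, and independently testable, which is the mindset shift Danica returns to in the lessons-learned discussion later in the episode.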
I like it. Sometimes the host has a question and then he loses his train of thought. We've got the Kafka Streams... Okay, you said, I knew it would come back, you chose the hardest part of the system to do this. Do you remember the decision process? This is always interesting to me. Was it sort of ad hoc, like, "I don't know, let's do this. This sounds good," which is a lot of how engineering proceeds? Or did you have options on the board and pick one with some rubric? How systematic was the choice of what to attack first?
So I mentioned that we were the streaming data infrastructure team, but we were housed under a specific organization that owned a handful of different applications and algorithms. This was the one that was closest to our team; we were directly reporting to someone whose group owned this algorithm and this application. So we probably could have chosen easier ones, we could have fought back. But it made sense, because we were working with the teams that owned this algorithm, that we would just partner with them and make it work.
Right. And there's a tension, because it... That's a wise choice in that the team that you are in effect serving gets a lot of value out of this.
Mm-hmm (affirmative).
They picked it. It's what they want. Therefore, the result, you can assume, is going to be valuable and visible to the stakeholder.
Yes.
And you want that. I'm asking about this because "Where do I start?" is a common question when you're doing any kind of refactoring of a legacy system, or chiseling away at the monolith and replacing it with something new. Which piece do you take first? And I think the best guidance is that you want to pick something that's visible but also easy, and they're usually opposites.
Yeah, unfortunately, yeah. Anytime someone asks me what to start with, I answer the same way. You want to pick something that is visible enough that it will be impressive and valuable to someone who sees it later on, so that a stakeholder can buy into it. Because unfortunately, as engineers, the technology we want to play with and implement and integrate isn't always, time-wise, the cheapest or most efficient option for the stakeholders. So you've got to make it worth their time as well, right?
You absolutely do. And it's a matter of negotiating competing agendas, because our agenda is hopefully to make the system technically better, more flexible, more performant, easier to staff for, because it's not all 15-year-old (or older) technology. There are all these engineering utilities that it's right for us to want to maximize, and they don't always obviously map onto value for the business.
Mm-hmm (affirmative).
So, investment decisions in the business are going to get made on the basis of value to the business. That's just how it goes. And so we have this task of trying to connect our agenda to the right, and frankly sort of legally mandated, responsibilities of the people spending the money, who have a fiduciary responsibility to spend it in a way that benefits the company. They can't just do it because this is how we want to build the system. So doing that mapping is important, and that topic always comes up a lot. Sounds like for you it was hard, but that's the thing I'd rather give on: if the only option is something that's super visible but difficult, sorry, you're going to do the hard thing.
Yep.
So, okay. So you built that, you proved it out, and that gives you momentum for, again, this agenda you have, which is to migrate this thing to an event-driven architecture. Maybe the business wants that, maybe the business doesn't, but showing, hey, this is successful in a way that brings value to the business gives you some credibility for moving forward.
Yeah. By improving that aspect of the processing pipeline, we definitely bought time, right?
Right.
To then move on to the next stages and really realize our vision.
Yeah. And what was next?
So as I said, we moved data into and out of Kafka successfully, and we migrated the algorithm, or enough of the algorithm that we could prove that it worked. The next step was to bring in the rest of the data that we needed to make the pipeline complete, and that involved some configuration data. We achieved that using Kafka Connect. So we spent a lot of time figuring out the right connectors to use, and then figuring out exactly what data we needed, because it was all hidden away in some legacy databases. So eventually set alarm-
Oh that's [inaudible 00:17:05].
Yeah, weird. And it wasn't all just packaged neatly with a bow on it. Bizarre.
High quality and yeah, right. Okay, well I guess things didn't work well at that organization compared to every other organization in the world.
Yeah.
No, data integration is terrible and it's always terrible and that's just how life is.
Mm-hmm (affirmative). But thankfully, even though it is kind of terrible, we leveraged the JDBC source connector.
Okay, I was going to ask.
And [inaudible 00:17:35] it was not as terrible as I thought it was going to be.
Good, okay.
Shameless plug, you can watch my Kafka Summit talk on this, where I go into a lot more detail on how that connector worked. And maybe some of the issues that we encountered.
That is a good idea, and it is linked in the show notes, obviously, because it's very relevant to this topic. That talk was sort of an informal part of Danica's pre-interview process, I can say. And employers, I don't want to make you wary of letting people speak at Kafka Summit, like, "Oh no, they're going to get hired away." It doesn't happen that often. It was just a really good talk, and I was like, hey, we should talk to this lady. So anyway, that's not exactly how the sequence of events went, but it was a good talk and you should watch it.
Thanks.
So Connect, you used Connect, the JDBC source connector, and not any fancy CDC stuff. Was there a-
No.
How come?
Yeah. So the database that we were using was actually an in-house brand of database, so CDC wasn't really an option. And also, the way that the database was set up, I'm not saying the data model was bad, but it wasn't perfect, so it really didn't lend itself well to traditional change data capture. And the way that we set up the query for the JDBC source connector, we were using query mode. We weren't just pulling all of the data from a bunch of tables; we really had to massage the data within the query to get what we wanted.
You needed a query that was doing the joining and the other things that needed to happen, because otherwise you'd hate life. Okay. That makes sense.
Yes. Yes. So it actually ended up being kind of a complicated query in the end, but we were able to get what we wanted, or at least as close to what we wanted or needed as we could. Over time, we revisited that and made some small changes to the database, added some columns that would make the query a lot cleaner. But it took a little bit of time.
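For readers who want to see the shape of this, here is a hedged sketch of registering a JDBC source connector in query mode through the Kafka Connect REST API. The connection URL, SQL query, column names, and topic name are made up; the config keys (connector.class, mode, query, topic.prefix, poll.interval.ms) are standard JDBC source connector settings. Bulk mode is assumed here purely for simplicity, on the idea that slowly changing configuration data can just be re-polled periodically; it is not necessarily what Danica's team used.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterJdbcSourceConnector {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config: query mode with a join that massages the
        // legacy tables into the shape the pipeline needs.
        String config = """
            {
              "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
              "connection.url": "jdbc:somevendor://legacy-db:1521/config",
              "mode": "bulk",
              "query": "SELECT i.INSTRUMENT_ID, i.NAME, p.PARAM_VALUE FROM INSTRUMENTS i JOIN PARAMS p ON i.INSTRUMENT_ID = p.INSTRUMENT_ID",
              "topic.prefix": "config.instruments",
              "poll.interval.ms": "3600000"
            }
            """;

        // PUT to /connectors/{name}/config creates the connector or updates its config.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors/legacy-config-source/config"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(config))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

In query mode, topic.prefix is used as the full topic name, and the connector runs the supplied query instead of discovering tables on its own, which is what makes this approach workable against an imperfect data model.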
Yeah. Okay. Sounds like... I've felt that pain. What was the last step? What did you do after that? So you had connectors, a first pass of stream processing, then the next level of "we need other data sources to join," and that's Connect and the JDBC source. What was after that?
Yeah. So finally, with our pipeline, in our minds, actually complete, the algorithm was there and all the data we needed to process was there. We then wanted to make sure it was good, that the data actually looked like we expected it to. Because, as a reminder, we took one of the most difficult applications we had available to us; it is a very visible application, and the people who are consuming this data are making decisions based off of it. So it needed to be correct, right? While we were building out this whole pipeline, the legacy application continued to produce its data, and we pulled that legacy data into Kafka using our initial connector. So now we had two streams of data: the legacy data and also what we were producing with our Kafka Streams pipeline. And we were able to leverage ksqlDB to join those streams and conduct some validation to see how different they were within a bucket of time, a couple of seconds or so.
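Here is a minimal sketch of that validation idea, expressed with the ksqlDB Java client: declare streams over the legacy results topic and the new pipeline's results topic, then join them within a small time window and keep the per-record difference. The topic, stream, and column names are hypothetical, and the code assumes both topics already exist.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class ValidationQueries {
    public static void main(String[] args) throws Exception {
        Client client = Client.create(ClientOptions.create()
            .setHost("localhost")
            .setPort(8088));

        // Stream over what the legacy application keeps producing.
        client.executeStatement("""
            CREATE STREAM legacy_results (record_id VARCHAR KEY, result DOUBLE)
              WITH (KAFKA_TOPIC='legacy.output.events', VALUE_FORMAT='JSON');
            """).get();

        // Stream over what the new Kafka Streams pipeline produces.
        client.executeStatement("""
            CREATE STREAM pipeline_results (record_id VARCHAR KEY, result DOUBLE)
              WITH (KAFKA_TOPIC='pipeline.output.events', VALUE_FORMAT='JSON');
            """).get();

        // Join the two result streams within a couple of seconds of each other and
        // materialize the per-record difference for inspection.
        client.executeStatement("""
            CREATE STREAM result_validation AS
              SELECT l.record_id AS record_id,
                     l.result    AS legacy_result,
                     p.result    AS pipeline_result,
                     ABS(l.result - p.result) AS diff
              FROM legacy_results l
              INNER JOIN pipeline_results p WITHIN 2 SECONDS
                ON l.record_id = p.record_id
              EMIT CHANGES;
            """).get();

        client.close();
    }
}
```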
That's pretty fancy.
Yeah. It was really, really-
I mean, that's like a nice validation scenario. You could create knowledge and have [inaudible 00:21:36] confidence and all these things that you always want.
No, it was absolutely wonderful. That probably took the most time out of the entire thing, because we really wanted to make sure that our ksqlDB queries were doing what we needed them to do. And then we used the results to prove to the stakeholders that this was what we wanted. And based on those results, we were able to go back to the algorithm and tweak it, because where we had maybe missed something in one of the processors, we were able to correct that.
Absolutely.
And yeah, really prove that it worked.
How cool is that? I love it.
Yeah.
Lessons learned? Looking back, you've kind of hinted at some of those along the way, but to summarize, what are the things that you took away from this? If you had to do it over again, or if it was your job to help people understand how to do this kind of thing, what would you want them to know?
Yeah. So the first thing, I think, just in terms of Kafka Streams and streaming in general, is that you have to think in a certain way to really be successful at it. What we really learned as we were moving that algorithm out of a sort of black-box monolith and breaking it down into processing steps, whether you're doing it in Kafka Streams or even in ksqlDB, is that you really need to think about exactly what you're doing with the data in each step to be most efficient. For us, in Kafka Streams, that meant breaking it into separate processing stages. And in ksqlDB, that would be multiple queries that you'd implement.
Right.
So you really needed to take a step back and think about how to break up your algorithm to be most efficient.
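To make that lesson concrete, here is a small hypothetical example of the "multiple queries" approach in ksqlDB, again using the Java client: rather than one sprawling statement, each persistent query does one job and feeds the next. It assumes a raw_events stream has already been declared over its topic; all names are illustrative only.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class StagedQueries {
    public static void main(String[] args) throws Exception {
        Client client = Client.create(ClientOptions.create().setHost("localhost").setPort(8088));

        // Stage 1: just clean and filter the raw input.
        client.executeStatement("""
            CREATE STREAM cleaned_events AS
              SELECT record_id, amount
              FROM raw_events
              WHERE amount IS NOT NULL
              EMIT CHANGES;
            """).get();

        // Stage 2: do the aggregation on the already-cleaned stream.
        client.executeStatement("""
            CREATE TABLE totals_by_record AS
              SELECT record_id, SUM(amount) AS total
              FROM cleaned_events
              GROUP BY record_id
              EMIT CHANGES;
            """).get();

        client.close();
    }
}
```

Splitting the work this way keeps each intermediate topic inspectable, which is what made the validation step earlier in the conversation possible in the first place.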
Got you.
Yeah.
Any connect lessons?
Yeah. I mean, I love Connect. I pretty much owned that component for most of that project. And even though it's a little bit tricky at first, it's invaluable, especially with a legacy architecture; you need to find a way to plug into that, and really that's what Connect allows you to do. As you're playing around with data pipelines as we were, being able to bring in legacy data and connect with the rest of the legacy architecture is just so important.
Yeah. And not have to write all that code yourself.
Yes. No, we learned that, [crosstalk 00:24:11].
Oh, go ahead.
Oh, we built sort of our own connectors in the beginning, right? To plug in to that legacy architecture. So we've done it both ways, and I will tell you that leveraging Kafka Connect was definitely the easier of the two.
Admiral Ackbar is right, it's a trap to write your own connector.
Mm-hmm (affirmative).
And that's saying something, because like you said, it's finicky. There are a lot of little configuration dials and you have to get them right. It doesn't just work. But it's a more complex problem than it seems.
Yes.
Because as developers, we look at it and we're like, "Oh, come on. I'm just reading from this database and putting it in a topic. I can do that in an afternoon." But there are corner cases, and it's good to have that done for you. How about a last question. This is a successful streaming pipeline; you know how to do this, and you know how to explain to people how to do it, but did you get a sense of boundaries? Would there be things where you wouldn't want to do this? I always think it's important, especially in our line of work, when we're trying to say, "Hey, this is a good idea. Let me show you how simple this is. It's understandable. You can do it. It's the future." That's sort of what we do. I think credibility demands we also say, "But you know, it's not everything." So what did you see there?
Yeah. We were definitely riding that high afterwards. Everything works, streaming is incredible, everyone should use it. And a lot of people in the company were coming to us saying, "Okay, your team implemented this. Obviously it's a very important algorithm. How do we use Kafka Streams now?" So we had a lot of groups sitting down with us asking, "All right, how do we implement this as an event-driven pipeline?" And it's really tough. It's really tough to say no. Some people approached us with an architecture diagram that would involve, "Oh, well, we're going to process it in Kafka Streams and then put it back into a database and then read it back out," and whatever, and I'm like, "Maybe this isn't the right way to do it." When you have a hammer, everything looks like a nail, but sometimes it's just not a nail and you should move on.
And that answer is contextual too, because that speaks to architectural constraints outside of the pipeline, where maybe the problem is perfectly pipeline-shaped, but there are constraints elsewhere in the business that just make it ugly and not worth it.
Yes. Yeah. And it's definitely difficult somewhere with so much legacy architecture. And it was difficult because a lot of these teams were looking to the future, and yes, I do believe that an event-driven architecture is the way to go when everything lines up, right?
Why would we be on this podcast if we didn't believe that? Definitely true. Yes. But the part that you did build that way was certainly a successful project. Like you said, 10 out of 10, would do again.
Mm-hmm (affirmative). Absolutely. And we did have a lot of groups that were able to implement something similar because we spent all this time proving that Kafka could work, that Kafka Streams would work, and all these technologies could successfully plug in to the architecture. So yeah, we had a lot of successes follow after that. I'm not saying we were an inspiration to the company, but we were an inspiration to the company.
My guest today has been Danica Fine. Danica, thanks for being a part of Streaming Audio.
Thank you so much, Tim. I appreciate being here.
And there you have it. Thanks for listening to this episode. Now, some important details before you go. Streaming Audio is brought to you by Confluent Developer, that's developer.confluent.io, a website dedicated to helping you learn Kafka, Confluent, and everything in the broader event streaming ecosystem. We've got free video courses, a library of event-driven architecture design patterns, and executable tutorials covering ksqlDB, Kafka Streams, and the core Kafka APIs. There's even an index of episodes of this podcast. And if you take a course on Confluent Developer, you'll have the chance to use Confluent Cloud. When you sign up, use the code PODCAST100 to get an extra $100 of free Confluent Cloud usage.
Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me at TL Berglund on Twitter. That's T-L B-E-R-G-L-U-N-D. Or you can leave a comment on the YouTube video if you're watching and not just listening or reach out in our community Slack or forum. Both are linked in the show notes. And while you're at it, please subscribe to our YouTube channel, and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple Podcast, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support, and we'll see you next time.
Implementing an event-driven data pipeline can be challenging, but doing so within the context of a legacy architecture is even more complex. Having spent three years building a streaming data infrastructure and having been on the first team at a financial organization to implement Apache Kafka® event-driven data pipelines, Danica Fine (Senior Developer Advocate, Confluent) shares insights into the development process and how ksqlDB and Kafka Connect became instrumental to the implementation.
By moving away from batch processing to streaming data pipelines with Kafka, data can be distributed with increased scalability and resiliency. Kafka decouples the source from the target systems, so you can react to data as it changes while ensuring accurate data in the target system.
In order to transition from monolithic micro-batching applications to real-time microservices that could integrate with a legacy system that had been around for decades, Danica and her team started by developing Kafka connectors to connect to the various source and target systems.
As a final tip, Danica suggests breaking algorithms into processing steps. She also describes how her experience relates to the Data Pipelines course on Confluent Developer and encourages anyone who is interested in learning more to check it out.
EPISODE LINKS