What is multi-tenancy, and how do you make it happen in Kafka? Well, when I want to know things like this, I often find that asking Kafka committer Anna Povzner is a good plan, so today I did just that. She skillfully walks us through it all on today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud. Hello, and welcome back to another episode of Streaming Audio. I am your host, Tim Berglund, and I'm joined again in the virtual studio by my friend and colleague Anna Povzner. Anna is an engineer who spends most of her time working on performance optimizations for Kafka. Anna, welcome back.
Hi, Tim. Thank you for inviting me again.
You bet. Always a pleasure. I learn a lot when I have you on the show, so for completely selfish reasons, I want to make it happen as often as possible. You wanted to talk about multi-tenancy today, so that's really what we have in store. Multi-tenancy has a number of aspects. There's a heavy performance aspect to it, there are some security implications, some API questions; there are all kinds of things, and it's a hot topic, inasmuch as there are hot topics related to Kafka on social media. I mean, the Kafka community is a pretty well-behaved and kind place. It's not like people yell at each other on Twitter, but multi-tenancy is a thing that is being talked about right now, so I think it's a good thing to talk about. To kick us off, why would anybody want multi-tenancy? What is it, and why do we want it?
So obviously, in Kafka you can just deploy any application, or many applications, on a cluster, but that doesn't mean you're actually multi-tenant. What it means in reality is that you want to be able to isolate or protect your data or your application so that you can safely [inaudible 00:02:08] your data sets and applications. I think it's better to give some examples. One case could be a [inaudible 00:02:18] cluster scenario. Suppose that in your organization, one team manages and sets up one or just a few Kafka clusters, and you want other teams to be able to use them as if each had its own, without setting up separate actual clusters. In that case, you really want complete separation of data and applications. But there are other cases where it's really about deploying different applications, and sometimes they might want to share data. So protecting data is maybe not as important, but it's important to protect performance. Say you have an event-driven application that you're running, say [inaudible 00:03:01] detection, and it produces and consumes some data. Then you might want to run another application that uses some of the data it produces to run machine learning models.
So in that case, you might want to share data but separate performance. And of course, there can be any combination of the two: protecting your data, protecting your application's performance, or both. Basically, when you use a shared cluster, you should not be impacted by some other user or application; whatever their behavior, it should not impact your own performance.
The second one sounds a lot harder to pull off, and I think we're going to get into that later on. But isolation with respect to performance... I mean, just running in the cloud, as everybody is doing, or at least on virtualized infrastructure of some kind, you've always got noisy neighbor problems, and that just sounds inherently difficult. Am I onto something there?
Oh, yes. Because basically, the whole point of multi-tenancy is also that you want to be able to share resources, because it's cheaper. Sometimes one tenant spikes and uses your cluster, and maybe during its idle time somebody else uses it. So when you try to save money by packing your applications or users onto one cluster, it means they're using the same resources. And we know that data systems, whether Kafka, key-value stores, or any other data system, are always workload dependent: their performance depends heavily on what the workload is doing. So sharing resources raises many more questions, like how a workload impacts actual performance. So yes, it's a very hard problem to solve.
What does it look like? This is just out of my curiosity, from an API perspective. I mean, the relevant APIs don't have a tenant ID. So what does a multi-tenant cluster look like? Is it, here is an isolated set of bootstrap servers that operationally happen to all be on the same cluster? How do I know as a developer that I am doing multi-tenancy? What do I do?
Oh, right. Yes. So I want to compare Kafka with a few other systems. In Kafka, there are ways to protect your data: you can set up authentication and encryption, and then you can set ACLs on topics. In that case, it's not a tenant ID that you're setting; you're just asking, "What data am I accessing?" and setting up protection through authentication and encryption. For performance, Kafka chose an approach that separates the way you protect data from the way you protect performance. From Kafka's perspective, when you protect performance, your tenant is essentially your authenticated user, or client. There are a few ways to set it: you can set performance quotas on client IDs, or you can set them on your user. Either way, you're setting quotas from the perspective of an application, or of a set of clients that you want to bundle together and call a tenant. Just to give an example, some systems do it the other way, and that goes into that [inaudible 00:06:53] cluster, right?
Say you have an administrative unit for a tenant, which provides both a grouping of topics, like a namespace, and performance settings for that namespace. You'd say, okay, for my set of topics, you won't be able to produce more than a hundred megabytes per second. So that's more tied together, and in some ways it's much easier, I would say: you set up one thing, and that whole thing is one tenant, for both data and performance. Kafka, in a way, is a bit harder. If you read our documentation, we don't use the word tenant, but in reality you define one logically by setting up your data protection. Then, for example, you take your application; if it's authenticated, it's identified by a user principal, and that's what you set your quotas on to protect performance. You can set them for a user or for a user and client ID combination. For example, one application, which is more of a high-level concept, might even run multiple consumer groups, and you can separate the performance of those consumer groups by treating them as multiple tenants. So in Kafka, from a performance perspective, a tenant is either an authenticated user or a user and client ID combination.
Got it. So you can group authenticated users into some sort of concept of tenancy. Potentially, you could say these 15 principals are one tenant, or you could have one principal be one tenant, depending on the scenarios you laid out before, where you might have some data you want to share or no data you want to share. But through authentication and ACLs, you're completely protected, and one tenant is not able to see another tenant's data.
Yes, exactly. But you set them up separately. That's one reason our documentation sometimes doesn't say it directly: tenants are pretty much users, or you can bundle clients into a tenant.
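When clients are bundled into a tenant by authenticating as the same principal, a user-level quota is shared by all of them. The following is a toy Python model of that accounting. It assumes a single fixed window, and the class and principal names are hypothetical; the broker's real accounting uses sampled rate windows.

```python
from collections import defaultdict


class SharedUserQuota:
    """Hypothetical model of a user-level produce quota: every client instance
    that authenticates as the same principal draws from one shared budget."""

    def __init__(self, quotas_bytes_per_window):
        # Maps a user principal name to its byte budget for one window.
        self.quotas = dict(quotas_bytes_per_window)
        # Bytes observed so far in the current window, per principal.
        self.usage = defaultdict(int)

    def record(self, principal, client_id, num_bytes):
        """Add bytes produced by one client; return True while the principal's
        combined usage is still within its quota."""
        # client_id is deliberately not part of the key: in this user-level
        # model, all of a principal's clients are bundled into one tenant.
        self.usage[principal] += num_bytes
        return self.usage[principal] <= self.quotas[principal]

    def reset_window(self):
        """Start a new measurement window."""
        self.usage.clear()
```

Two consumers running as the same principal each stay under the budget on their own, but together they exceed it.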
Got it. So before we get too far, I think we've talked about all these things, but tell me again what it means for a system like Kafka to support multi-tenancy. We've talked about isolating data and isolating performance, but take us through the whole thing.
Yeah, so it's pretty much all about protecting data and protecting performance. For data, of course, some things are absolutely required and some are good to have. Absolutely required, I would say, is authentication, plus being able to protect data both in flight and at rest. In Kafka, for example, that means setting up encryption using SSL and then using ACLs on topics. Another thing that's usually good to have, and that Kafka actually doesn't have, is namespaces, really just so there are no collisions on topic names. Otherwise, one team might create a topic with a name that some other team also wants to create. There are workarounds that try to enforce naming conventions, so namespaces would be nice, but without them it's a little bit harder, I guess, to build all the tooling. So that's the data part.
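Kafka has no namespaces, so the naming-convention workaround usually lives in provisioning tooling. The following is a minimal Python sketch of such a check. The convention that every topic starts with `<tenant>.` is hypothetical. The legal-character rule is Kafka's own: letters, digits, `.`, `_`, and `-`, with at most 249 characters. A prefix convention like this also pairs naturally with prefixed ACLs, which Kafka has supported since 2.0.

```python
import re

# Kafka's legal topic-name characters, up to 249 characters long.
_LEGAL_TOPIC = re.compile(r"^[A-Za-z0-9._-]{1,249}$")


def validate_topic_name(tenant: str, topic: str) -> bool:
    """Return True if `topic` is a legal Kafka topic name that lives inside the
    hypothetical '<tenant>.' namespace convention."""
    prefix = tenant + "."
    return (
        bool(_LEGAL_TOPIC.match(topic))
        and topic.startswith(prefix)
        # Reject a bare prefix such as "payments." with nothing after it.
        and len(topic) > len(prefix)
    )
```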
And then for the performance part, obviously, there are many things. Systems do it in different ways, but in general the question is: how are you able to protect performance for a tenant? Can you achieve a certain service level objective, meaning that my tenant gets a specific bandwidth or latency? There's often a big range in how good your multi-tenancy is. You can just say, "Oh, I want to be able to survive noisy neighbors," meaning that maybe one application just runs [inaudible 00:11:20] the cluster and maybe causes downtime. So that's one end: just getting some performance. Or it could go all the way to: okay, I have my performance objectives. In Kafka, applications usually have some expectation or requirement on, say, bandwidth: how much data can I produce, and how much can I consume? And then the whole question is what it takes to achieve that. In Kafka, you have quota mechanisms that you need to set to achieve this kind of performance protection.
And I was hoping you'd bring this up; I was going to if you didn't. As it turns out, I'm perfectly willing to admit this on the air: I know almost nothing about quotas in Kafka. It just isn't a thing I've had to do or teach or explain. So I would love this for my own sake, and if you're listening, I'm guessing some of you would also like to know more about it. Take it from the beginning, because it sounds like quotas are key. Authentication and ACLs we basically understand; probably everybody has the basics there, and I get how you're describing them as part of a multi-tenant strategy. Naming conventions for topics are where you go for namespacing. It's easy enough to impose those if you control the clients, and you'd better control the clients. But quotas seem really important. Tell us about quotas.
Yes. So a quota is basically a limit. In Kafka, the way you protect performance is by setting quotas, and Kafka has a few kinds: bandwidth quotas and what we call request quotas. I think bandwidth quotas are usually better understood, because they translate more directly: if you have an application, you understand what bandwidth is. So for example-
Sure. With our phone plans, and sometimes our home internet providers, everybody's bumped into a bandwidth quota before [crosstalk 00:13:30].
Yeah, exactly. So those are better understood, and it's always a good approach to start with them. And that's the first thing I want to say about quotas: quotas are limits. When you set quotas, you are pretty much telling the brokers, or the Kafka cluster, to start limiting your application's usage if it exceeds them. So say I set a hundred megabyte per second produce bandwidth quota; the broker will then start throttling me if I exceed it. It's something that you, as an application or whatever tenant you define, set for yourself. That means if you're the only one setting it and others aren't, you're basically saying, "I'm a good neighbor; limit my usage." But if you really want to be protected yourself, every other tenant should also set quotas, because that's what protects you. So it's a collective action: you set quotas for everybody, and that means nobody can overrun and overuse resources.
Good thing is, [inaudible 00:14:48] the operator has to know about every application, right? Or how would they know? For that case, Kafka has default quotas that you can set, so that if somebody deploys an application without thinking about what the quota should be, they just get the default. So I think the most important thing about quotas is that it's not about telling Kafka, "I must have this bandwidth," or "Provide me this bandwidth." It's more about telling it, "Limit me if I start using more, because I want to be a good neighbor."
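Default quotas work through a precedence order. According to the Kafka documentation, the broker uses the most specific configured match, in this order: (user, client-id), (user, default client-id), (user), (default user, client-id), (default user, default client-id), (default user), (client-id), and finally (default client-id). The following is a minimal Python sketch of that lookup. It uses a hypothetical in-memory table; the broker actually stores these as dynamic configs. `None` marks a component that is not part of the entity.

```python
from typing import Dict, Optional, Tuple

DEFAULT = "<default>"

# Key: (user or None, client_id or None). DEFAULT stands for a default entity.
QuotaTable = Dict[Tuple[Optional[str], Optional[str]], int]


def resolve_quota(table: QuotaTable, user: str, client_id: str) -> Optional[int]:
    """Return the quota that applies to (user, client_id), following the
    documented precedence order, or None if nothing is configured."""
    candidates = [
        (user, client_id),
        (user, DEFAULT),
        (user, None),
        (DEFAULT, client_id),
        (DEFAULT, DEFAULT),
        (DEFAULT, None),
        (None, client_id),
        (None, DEFAULT),
    ]
    for key in candidates:
        if key in table:
            return table[key]
    return None
```

In this sketch, a newly deployed application whose user has no explicit entry falls through to the default-user quota.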
Yeah. So that's one thing. I've noticed that people usually set quotas by thinking, "Okay, what does my application need?" Then they set them to whatever it needs, usually at its peak, because once you set a quota, you'll never get more than that. So you have to think about your peaks. But the other part, I would say, is not as easy. From talking to customers and observing how people use quotas, I see that people usually stop at bandwidth. Sometimes that makes sense, because in Kafka clusters, as in many data systems, you most often bottleneck on, say, disk, so that's where you can overload. But there are other cases, because in a system like Kafka, brokers also process your requests, so you also use some CPU. In many cases, especially if you have lots of [inaudible 00:16:36] on each broker, you might not even notice. You might have some CPU utilization, but it's not a big deal; maybe your workloads are not very efficient, but the CPU is still enough. But in some cases, it still matters.
It is possible to actually overload the CPU. When you're using lots of CPU and start getting into very high utilization on a broker, the effective bandwidth of your cluster can start decreasing. For that situation, you actually want to start using what we in Kafka call request quotas. They're really your way to limit CPU usage on a cluster in case you have very inefficient workloads. Inefficient means you're creating lots of connections (encryption, for example, is quite CPU heavy), or you're sending very small requests, so brokers do lots of CPU processing. So you're protecting both yourself and everybody else from the case where your workload takes so much CPU that the broker is mostly processing overhead instead of doing what we would call useful work, which is delivering your bandwidth. So I would always advise everybody to look into request quotas, because that's how you also protect CPU.
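According to the Kafka documentation, a request quota (`request_percentage`) is a percentage of one request-handling thread's time within a quota window. The total available per broker is `(num.io.threads + num.network.threads) * 100` percent. The following is a small Python sketch of that arithmetic, with hypothetical function names. The example values of 8 I/O threads and 3 network threads are Kafka's defaults.

```python
def request_quota_share(request_percentage: float,
                        num_io_threads: int,
                        num_network_threads: int) -> float:
    """Fraction of a broker's request-handling thread time that one tenant's
    request quota allows. A quota of n means n% of a single thread."""
    total_capacity_pct = (num_io_threads + num_network_threads) * 100
    return request_percentage / total_capacity_pct


def tenants_before_saturation(request_percentage: float,
                              num_io_threads: int,
                              num_network_threads: int) -> int:
    """How many tenants at this quota fit before the quotas alone add up to
    more thread time than the broker has."""
    total_capacity_pct = (num_io_threads + num_network_threads) * 100
    return int(total_capacity_pct // request_percentage)
```

With the default thread counts, a quota of 200 (two threads' worth) is about 18% of a broker, and five such tenants fit before the quotas oversubscribe it.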
So you've got bandwidth quotas, and they're really for I/O bound things, specifically disk I/O bound problems.
Yeah. And normally, of course, in Kafka [inaudible 00:18:10] to cache, so in some way you think of [inaudible 00:18:15]. But if you think of sustained bandwidth, at some point Linux has to start flushing to disk. So in reality [inaudible 00:18:24] the system's disks become the bottleneck. Which resource do you run out of first? Very often, you run out of disk bandwidth, or EBS. Obviously I mean disk generically; it could be anything, yeah.
Disk with a little TM next to it. A disk-like thing that may actually be a network connection, but there's a disk at some point. We're relatively sure there's a disk somewhere, and what we see is definitely just an API. So bandwidth quotas protect disk, and request quotas protect CPU. Are those the only quota options we have, or are there others?
Those are the main options. There is one more thing, although it's still not actually in [inaudible 00:19:12]; it's something in progress: what we call connection rate limits. They're not exactly quotas, because quotas are per tenant, which in the Kafka world is basically a user. The problem is protecting CPU. Request quotas in Kafka protect it by throttling requests, and they do take into account things like encryption and authentication. So if [inaudible 00:19:52] disk, it pretty much throttles requests to give the broker room to handle connections. But at some point, throttling requests is not enough, because a misconfigured client can create a connection storm that brings down the brokers.
So for that reason, we're adding what we call connection rate limits, which keep Kafka from accepting too many connections. To limit how much the broker spends on authentication or creating connections, you need to limit how many connections it accepts. That's coming in 2.6. Oh sorry, 2.7.
2.7. Yeah, there are a couple of KIPs, and I can't remember the KIP numbers; maybe you can.
Yeah. What I'm talking about is KIP-612. Although I would say only half of the KIP is in there: what's coming in 2.7 is protection for the broker, or I guess a listener, because you can have multiple listeners. So that's protection of the broker as a whole, which is also useful. But it still means that one tenant, say some clients that are causing connection storms, can take over whatever [inaudible 00:21:09] the broker has. So we're also adding per-IP limits, and those will come, I would say, in the next release. Now, the question is, why isn't it per user? It's because at the point when you accept a connection, you don't know which user it is; you need to authenticate to know who it is. So it's not harder, exactly, but it's a different problem that you're solving.
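Connection limits apply before the broker knows who the user is. The only keys available at that point are the listener and the client's IP address. The following is a minimal sliding-window sketch in Python, and the class and parameter names are hypothetical. KIP-612 defines the real broker configs and their exact behavior, including whether an excess connection is delayed or rejected; this code does not reproduce them.

```python
from collections import defaultdict, deque


class ConnectionRateLimiter:
    """Hypothetical pre-authentication limiter: caps new connections per
    listener and per client IP within a sliding time window."""

    def __init__(self, per_listener_rate: int, per_ip_rate: int, window_s: float = 1.0):
        self.per_listener_rate = per_listener_rate
        self.per_ip_rate = per_ip_rate
        self.window_s = window_s
        self.listener_events = defaultdict(deque)  # listener -> accept times
        self.ip_events = defaultdict(deque)        # client IP -> accept times

    def _prune(self, events: deque, now: float) -> None:
        # Drop accept timestamps that have slid out of the window.
        while events and now - events[0] >= self.window_s:
            events.popleft()

    def try_accept(self, listener: str, ip: str, now: float) -> bool:
        """Return True to accept a new connection, False to refuse it
        (a real broker might delay it instead)."""
        by_listener = self.listener_events[listener]
        by_ip = self.ip_events[ip]
        self._prune(by_listener, now)
        self._prune(by_ip, now)
        if len(by_listener) >= self.per_listener_rate or len(by_ip) >= self.per_ip_rate:
            return False
        by_listener.append(now)
        by_ip.append(now)
        return True
```

In this sketch, a single misbehaving IP hits its own cap before it can exhaust the listener-wide budget for everyone else.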
Great. And so everybody knows, it's October 28th when we're recording this, and the episode will probably come out a few weeks from now. 2.7 is supposed to drop about a week from now, but of course the tests have to run, and the PMC has to vote on everything. There's a process around all this, so anything could happen; maybe by the time this podcast is released, it still won't be out. But KIP-612 is supposed to be part of 2.7. And I happen to have this at the front of my mind right now, because 25 hours ago I was next to a little stream near my house, recording the release video for 2.7 and talking about KIP-612. So this is all fresh in my mind.
Oh, actually, there's one more KIP, because it's important and it's different: KIP-599. I don't know if you talked about it; hopefully you did. It's slightly different because everything I've talked about, the connection limits and protecting CPU and bandwidth, is on the request path. KIP-599 covers another part: it lets you set limits, or quotas, also per user, so basically per tenant, on what we call partition mutations. That basically means creating or deleting partitions. If you create a topic, it creates partitions; if you delete a topic, it deletes partitions; or you might expand a topic by adding partitions. All those operations that create or delete partitions are what we call mutations.
It's not like you create topics every second or all the time. But the quota is useful; at least, we've seen cases with customers. Suppose you somehow create a very large topic or set of topics, meaning thousands of partitions. Before KIP-599, you could actually make the controller very busy, and you don't want the controller to be too busy, because it takes care of [inaudible 00:23:51] and everything. So we've seen cases where somebody tries to create a very large number of partitions, and it basically brings the whole cluster to a crawl. Not down, but super slow. This matters especially if you end up with under-replicated partitions, because to recover, the controller needs to send all the [inaudible 00:24:12] requests. If it's busy creating topics, obviously it becomes a [inaudible 00:24:18].
So it's very hard to control. KIP-599 really protects against the case where somebody creates a large topic and then the whole thing goes pretty badly [inaudible 00:24:29].
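The partition mutation quota counts each partition created or deleted against a per-user rate. One way to model "let a single large topic creation through, then make later requests wait" is a token bucket that is allowed to go into debt. The following is a hypothetical Python sketch of that idea. The names and exact semantics are assumptions for illustration, not the controller's code.

```python
class MutationTokenBucket:
    """Hypothetical token bucket for partition mutations. Each partition
    created or deleted costs one token; tokens refill at `rate` per second up
    to `burst`. A request is admitted whenever the bucket is not in debt,
    even if it pushes the bucket negative."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate      # tokens (partitions) refilled per second
        self.burst = burst    # maximum tokens the bucket can hold
        self.tokens = burst
        self.last = 0.0       # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_mutate(self, partitions: int, now: float) -> float:
        """Return 0.0 if the mutation is admitted, otherwise how many seconds
        the caller should back off before retrying."""
        self._refill(now)
        if self.tokens < 0:
            # Still paying back an earlier burst: report the remaining debt as time.
            return -self.tokens / self.rate
        self.tokens -= partitions
        return 0.0
```

A 1,000-partition topic goes through immediately, but the next mutation has to wait until the debt is repaid, so the controller isn't flooded by a steady stream of such requests.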
Got it. Yeah. So those are extra protections for brokers, above and beyond the quotas, and they're at the broker level. I just summarized these yesterday: there's a way to do it per source IP, and these are broker-level configs, right? Is there any-
Oh, yes. The connection rate limit works at the broker level, at the listener level (because you can set up different listeners), or per IP. That applies only to the connection rate, because at the point when you want to reject a connection, or not accept it, or delay it, you don't yet know who the user is. But partition mutation quotas are still per user, so they use the same tenant definition as request [inaudible 00:25:21] quotas.
Got you. So you've been working on this. Talk about what's been hard, because performance isolation seems really hard when applications are sharing the same physical resources. How do you measure that? What are the actual SLOs that you...
Yeah. So basically, I would say there are two different answers, because my work is in Confluent Cloud, and we [inaudible 00:26:05] in Kafka, and we will be moving some of the work to Confluent Platform. Here's where the hard part actually is. Consider the single-tenant case first, without any quotas or multi-tenancy. When you deploy your cluster, you have to do some capacity planning. You're always answering the question: whatever applications I'm running, will the cluster be able to provide the bandwidth I need? But now imagine that you have multi-tenancy. Kafka lets you set quotas, but it doesn't actually check them against the cluster's capacity.
Brokers in Apache Kafka don't have a notion of "What's my capacity? Am I getting overloaded?" So the hard part when you just use quotas is that it takes quite an iterative approach, and [inaudible 00:27:12] to understand whether you have enough capacity, so that all the noisy neighbors, or the tenants who set those quotas, don't actually exceed it. Normally, I would say, that's a lot of manual work, meaning you try [inaudible 00:27:33] is there not enough capacity? Then you need to expand, or make sure your cluster is big enough. And that's a lot of the work we are doing around Confluent Cloud, where obviously we don't know who is going to use the cluster: can they use more, or is it enough? So a lot of the work is putting mechanisms or indicators inside the brokers, so they know when they exceed capacity or are approaching it, and what they can do about it.
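Kafka doesn't check configured quotas against capacity, so that part of capacity planning is up to you. Kafka enforces byte-rate quotas per broker, so a rough first check is whether the per-broker quotas, summed across tenants, exceed what a single broker can sustain. The following is a minimal Python sketch with hypothetical names and made-up numbers.

```python
def oversubscription_ratio(produce_quotas_mb_s: dict, broker_capacity_mb_s: float) -> float:
    """Ratio of the summed per-broker produce quotas to what one broker can
    sustain. Above 1.0, the quotas alone cannot stop the tenants from
    overloading a broker if they all hit their limits at the same time."""
    if broker_capacity_mb_s <= 0:
        raise ValueError("broker capacity must be positive")
    return sum(produce_quotas_mb_s.values()) / broker_capacity_mb_s
```

For example, quotas of 100, 150, and 50 MB/s on a broker that sustains 250 MB/s give a ratio of 1.2. The tenants are protected from each other only as long as they don't all peak at once.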
And then there's what we call autotuning. In Kafka you have to set all these quotas, and they're manual configuration; if something changes, you have to go and change them manually again. So the work we're doing is: how do we keep all these quotas without you having to set and configure them? For example, for request quotas, you can say, "Okay. Give me some..." We gave a talk about this at Kafka Summit; the whole talk was basically about how to set quotas if you have performance objectives. But again, that requires some iteration and your own capacity planning. And for some of these quotas, like request quotas, it's not even clear where the capacity limit is, because, as I said with bandwidth, it all depends on how those resources interact.
So if you use too much CPU, the bandwidth of the whole cluster can go down, and brokers at a minimum need to be able to recognize that. When all the tenants try to use more resources than the cluster has, we can at least dynamically lower the quotas so we don't overload the cluster, and then expand it. And we do actually have metrics in Apache Kafka that show whether we're throttling or not: when quotas are met or exceeded, the brokers start throttling. When we see clients getting throttled, that indicates we need to expand. So a lot of this work comes down to brokers dynamically detecting when there's not enough capacity, changing quotas, and having auto scaling or auto expansion going on. That's pretty much the direction, at least for how we set it up, especially in cases where you can't afford to have people behind the scenes evaluating your capacity. So that's the hard problem we are solving.
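The following is a hypothetical Python sketch of the autotuning direction described here. When combined demand exceeds capacity, it lowers every tenant's effective quota by the same factor and flags that the cluster should expand. It illustrates the idea only and is not Confluent's algorithm.

```python
def autotune_quotas(demand_mb_s: dict, capacity_mb_s: float):
    """Hypothetical autotuning step. Returns (effective_quotas, needs_expansion).
    If combined demand fits, every tenant keeps its demand; otherwise every
    tenant is scaled down by the same factor so the total equals capacity."""
    total = sum(demand_mb_s.values())
    if total <= capacity_mb_s:
        return dict(demand_mb_s), False
    factor = capacity_mb_s / total
    tuned = {tenant: rate * factor for tenant, rate in demand_mb_s.items()}
    # Tenants are now being throttled below what they want: a signal to add capacity.
    return tuned, True
```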
Because it has to be automated or at least automated to some degree.
Yes. And I feel like it could eventually be useful even outside the cloud, because anything that makes Kafka easier to operate is always better. But making quotas specifically easier is not easy, because you need good algorithms. For example, to detect capacity, we have indicators based on latency and on how overloaded the broker gets. How good those indicators are, how well you do, also defines what your SLA is. Performance management is really about what you're trying to trade off, because you can always over-provision, and then you don't need to worry about things.
But in reality, you don't want to over-provision; you want to run as close to capacity as possible. And that's when it becomes harder, because now you need to be careful: if you overuse it, you risk breaking your SLA, basically the risk of downtime. So the problem is that we want to avoid over-reserving and still protect the cluster from all the tenants suddenly using more resources than we physically have. And then, potentially, autoscaling means that we automatically add those resources when they're needed.
Got it. I mean, if you have super sophisticated auto scaling, which we don't have yet, right? But if you have that, then you could get away without over-provisioning. But it strikes me that no matter what... and like you said, you don't just want to over-provision. If it were that simple, you'd say, "Okay, let's provision x times the resources we need, and we'll probably be okay." But you're paying for that compute, and storage costs money. So, like any engineer, you're trying to optimize: get the most work done with the least amount of valuable resources possible. You don't want to over-provision, but you have to when you're trying to isolate performance, right? I mean, is there any amount of magical autoscaling that could ever get the job done fast enough? Do you always have to over-provision a little?
I don't know if you have to, but it's nice to have some headroom. The problem is, you never know how much. That's more relevant to the cloud case, our case for example, where people use the cloud with usage-based billing. Clusters can sit idle, so you can admit lots of tenants, and they can all spike at the same time. And yes, auto scaling takes time; it's not instant. I would say we can't really get around the case where there aren't enough physical resources. There's no magic: if you don't have the resources, there's nothing you can do.
But here's what we're doing instead. In the normal case, without what we're doing, if you spike, you start overloading. You can actually get [inaudible 00:33:51] partitions, or you can bring the whole cluster down with too much load. So our approach is to protect against that load: keep the total combined usage at a level the brokers can support, and then expand. So there are cases where you might not get all the resources you want from the cluster, but the point is that you don't kill the cluster.
Right. It doesn't go down.
Yeah, exactly. So maybe performance decreases slightly for you temporarily, but it doesn't completely kill the whole thing. But it's true: I don't think anybody can actually solve that problem. If you don't have capacity, you don't have it. Another thing we're considering, and people also bring this up, is implementing priorities. When the cluster suddenly doesn't have enough resources, maybe some applications really need to be served, but others don't. A bit of prioritization, which we actually don't have in Kafka or Confluent Cloud yet, is one approach: how do you prioritize, to some degree? If we add that, it would at least decide who gets the resources when there aren't enough. Right now everybody's equal, so they would all get a bit less. But you might not want everyone to be equal.
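Kafka doesn't currently offer priorities. To illustrate how they differ from today's equal treatment, the following is a hypothetical Python sketch of strict-priority allocation. Higher-priority tenants get their full demand first. Within a level, tenants are scaled down equally when the leftover capacity runs short.

```python
from collections import defaultdict


def allocate_by_priority(demand: dict, priority: dict, capacity: float) -> dict:
    """Hypothetical strict-priority allocation of a scarce resource.
    `demand` maps tenant -> wanted rate; `priority` maps tenant -> level,
    where a larger number is served first."""
    levels = defaultdict(list)
    for tenant in demand:
        levels[priority[tenant]].append(tenant)

    allocation = {}
    remaining = capacity
    for level in sorted(levels, reverse=True):  # highest priority first
        tenants = levels[level]
        wanted = sum(demand[t] for t in tenants)
        # Everyone at this level is scaled by the same factor if capacity is short.
        factor = 1.0 if wanted <= remaining else remaining / wanted
        for t in tenants:
            allocation[t] = demand[t] * factor
        remaining = max(0.0, remaining - sum(allocation[t] for t in tenants))
    return allocation
```

With 100 units of capacity and two tenants each wanting 60, equal priorities give each tenant 50. Making one tenant higher priority gives it 60 and the other 40.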
Right. You might want, say requests to win over bandwidth or bandwidth to win over requests, or one tenant to win over another, or something like that.
Yeah, exactly. Performance management and performance guarantees are something I've worked on for at least 15 years. For example, my PhD topic was actually about [inaudible 00:35:57] on disk performance and storage. So basically, if you want any [inaudible 00:36:04] some guarantees, it's always about keeping some resources in reservation. You can't have it both ways: either you over-provision and [inaudible 00:36:13] the worst case, or, as in most real production systems, you just want to be able to deal with the case where you don't have that capacity. Then the questions are how to get there quickly, and how to keep whatever impact there is minimal and spread it more fairly among all the tenants.
Right. So in other words, it costs something to hold the resource, but it also costs something to get the resource, which is the transaction cost problem rearing its head again. Everything seems to be about transaction costs. So that's-
Yeah, exactly. Take real-time systems, and by real time I don't mean real time with Kafka, but actual real time, like satellites and stuff.
Hard real time [crosstalk 00:37:06].
There, you pretty much have what we call admission control, meaning you can't even admit an application if the resources aren't there. But obviously, in Kafka, do we really want to do that? Probably not. So it's more that we admit everything, but then we have a way to handle it when there's a problem. So maybe that's the other way to see it.
Yeah. Good analogy, because I did real-time stuff early in my career. I wrote firmware, and there'd be some sort of memory allocation API [inaudible 00:37:40] back in the 90s. And I guess a lot of it still is, but you don't have malloc, right? There's no heap. That's just not a thing. You have pre-allocated buffer pools: here is a chunk of memory, here are our resources, and we'll divvy them out a piece at a time. And I guess heaps are, in a sense, pre-allocated as well. But there's an extra intentionality about reserving those resources so that they can be delivered at a predictable transaction cost. And you don't have to go spin up instances, because that's the transaction cost we're talking about here, where that's clearly, well, on a request basis that's [inaudible 00:38:16].
Yeah, exactly. That's why I like to say it's true when people say, oh, it's hard to guarantee performance, or [inaudible 00:38:25] provides certain SLAs. In reality, we're not really providing hard guarantees, whether on bandwidth or on latency. It's always a statistical thing. So the 99th percentile is pretty good if you can get there; the 93rd percentile is much easier. The higher the percentile, the harder it is to hit. And at some point, you do have to start [inaudible 00:38:44]. So that's pretty much the game you're playing.
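Why a higher percentile is harder to hit shows up even in a tiny example. The Python sketch below uses the nearest-rank percentile definition (one of several in common use; Kafka's own metrics may compute percentiles differently), and the latency samples are made up: 95 fast requests at 5 ms and 5 slow ones at 200 ms.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [5.0] * 95 + [200.0] * 5
p93 = percentile(latencies_ms, 93)
p99 = percentile(latencies_ms, 99)
```

With these samples the 93rd percentile is 5 ms while the 99th is 200 ms: a handful of slow requests is invisible at the lower percentile but dominates the higher one.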
Right. Well, again, that's because there are transaction costs in the physical universe and we are bound by them. What happens when a client is bad? Let's not say bad; let's not put a value judgment on it. Let's say: what happens when a client has needs that we're not able to meet right now and violates a quota?
Yeah. So basically, it gets throttled. If it violates bandwidth or request quotas, its requests get throttled. And actually, in Kafka we're on our second iteration of this. Before Kafka 2.0, when a broker detected that, say, you'd used up all your bandwidth, it would calculate a delay: how long you need to wait so that it can process your requests and bring you back under quota. And we used to hold those requests in the broker and delay the response. That doesn't really work well, especially when you have to set long delays, [inaudible 00:39:58] how does the client know? Maybe the message was lost, right? Why is the response not coming back? So since 2.0, the broker still calculates the delay, so it knows, okay, you need to wait a bit longer before we process [inaudible 00:40:15] request to bring you back within your bandwidth. But now it sends that delay in the response back to the client, and the Kafka client, starting from 2.0, knows how to deal with it: it just delays sending the next request on the same channel. And of course, we can still have old clients, so in that case the broker also mutes the channel from that client. Basically, it's not going to read any more requests until the delay passes.
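The delay calculation described here can be derived with simple arithmetic. Assume the broker measures a client's rate O over a window of length W against a quota T. The client has moved O * W bytes, and to bring its average back down to T it must stay idle for a delay X such that O * W / (W + X) = T, which gives X = (O - T) / T * W. The Python function below is a simplified sketch of that idea under those assumptions, not Kafka's actual broker code, which tracks rates over multiple samples.

```python
def throttle_delay_ms(observed_rate: float, quota: float, window_ms: float) -> float:
    """Idle time needed so the average rate over (window + delay) equals quota.

    Derived from observed_rate * window = quota * (window + delay).
    Rates may be in any consistent unit, e.g. MB/s.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    if observed_rate <= quota:
        return 0.0  # within quota: no throttling needed
    return (observed_rate - quota) / quota * window_ms
```

For example, a client measured at 1.5 MB/s against a 1 MB/s quota over a 10-second window would be asked to wait 5 seconds: it moved 15 MB, and 15 MB over 15 seconds averages exactly 1 MB/s.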
So that's pretty much it. The throttling actually happens on the client side so that you don't keep [inaudible 00:40:55], because clients need to have some idea of what's going on. And then, of course, we also have metrics, so you can always observe it, right? We have both client and broker metrics, so you can see if throttling is happening.
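On the client side, the post-2.0 behavior amounts to a gate on each connection: the response arrives immediately, carries a throttle time, and the client holds its next request on that channel until the time has passed. A minimal Python sketch, assuming a hypothetical ThrottledChannel class with an injectable clock so it can be tested; real Kafka clients do this internally rather than through an API like this one.

```python
import time
from typing import Callable


class ThrottledChannel:
    """Hypothetical client-side gate honoring a broker-supplied throttle time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._blocked_until = 0.0  # seconds, on the clock's timeline

    def on_response(self, throttle_time_ms: int) -> None:
        # The broker answered right away but asked the client to back off.
        if throttle_time_ms > 0:
            until = self._clock() + throttle_time_ms / 1000.0
            self._blocked_until = max(self._blocked_until, until)

    def can_send(self) -> bool:
        # Hold the next request on this channel until the delay has passed.
        return self._clock() >= self._blocked_until
```

Because the client knows exactly why it is waiting, it has no reason to assume the request was lost and retry, which would only add load to a broker that is already busy.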
And see if you're encroaching on quotas, and then-
Yes. I think it's useful to observe the [inaudible 00:41:19], because maybe it means your application now wants more capacity; maybe adoption of your application is increasing, right? So often you might then want to expand the cluster or increase your quota. It's useful to know that you've now reached what you thought you'd need.
That seems like yet another way in which clients carry a lot of weight in Kafka. I mean, we try so hard to keep brokers simple, and we're mostly winning that fight, so a lot of stuff gets pushed into the clients. I think that's a sign of good architecture. But I'm constantly discovering some new thing I didn't know clients do.
Yes. Because you do need some way to tell the client why, right? Otherwise the request just looks lost. And the problem before was [crosstalk 00:42:17], and then your cluster is already struggling because you're using a lot, and suddenly you get all these new requests for no reason. So it often had the opposite effect. So in those cases, it's really useful to have it on the client side, or at least it's better for everybody else using the brokers.
My guest today has been Dr. Anna Povzner. Anna, thanks for being a part of Streaming Audio.
Thank you, Tim.
Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST. That's 6-0-P-D-C-A-S-T, to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31, 2021, and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeit, and there are a limited number of codes available, so don't miss out. Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out in our community Slack. There's a Slack signup link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support, and we'll see you next time.
Multi-tenancy has been quite the topic within the Apache Kafka® community. Anna Povzner, an engineer on the Confluent team, spends most of her time working on multi-tenancy in Kafka in Confluent Cloud.
Anna kicks off the conversation with Tim Berglund (Senior Director of Developer Experience, Confluent) by explaining what multi-tenancy is, why it is desirable, and its advantages over single-tenant architecture. By putting more applications and use cases on the same Kafka cluster instead of running a separate Kafka cluster for each individual application and use case, multi-tenancy helps minimize the costs of both physical machines and maintenance.
She then switches gears to discuss quotas in Kafka. Quotas are essentially limits: you must set quotas for every tenant (or set up defaults) in Kafka. Anna says it’s always best to start with bandwidth quotas because they’re better understood.
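The rule that every tenant needs a quota, either its own or a default, can be sketched as a lookup with a fallback. In a real cluster, bandwidth quotas are set with kafka-configs.sh using the producer_byte_rate and consumer_byte_rate configs (with --entity-default for defaults), and Kafka's precedence rules across users and client IDs are more involved than this; the QuotaTable and BandwidthQuota names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandwidthQuota:
    producer_bytes_per_s: float
    consumer_bytes_per_s: float


class QuotaTable:
    """Hypothetical per-tenant bandwidth quotas with a fallback default."""

    def __init__(self, default: BandwidthQuota) -> None:
        self._default = default
        self._overrides: dict[str, BandwidthQuota] = {}

    def set_quota(self, tenant: str, quota: BandwidthQuota) -> None:
        self._overrides[tenant] = quota

    def quota_for(self, tenant: str) -> BandwidthQuota:
        # A tenant-specific override wins; otherwise the default applies,
        # so no tenant is ever left without a limit.
        return self._overrides.get(tenant, self._default)
```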
Stick around until the end as Anna gives us a sneak peek at what’s ahead for multi-tenant Kafka, including KIP-612, the addition of the connection rate quota, which will help protect brokers.
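A connection rate quota caps how fast new connections can be created, which helps protect brokers from bursts of connection attempts. The Python sketch below is a generic sliding-window limiter, assumed for illustration only; it does not reproduce KIP-612's actual configuration names or enforcement details, and it rejects excess connections for simplicity where a broker might instead delay accepting them.

```python
from collections import deque


class ConnectionRateLimiter:
    """Hypothetical limiter: at most max_per_window new connections per window_s."""

    def __init__(self, max_per_window: int, window_s: float = 1.0) -> None:
        self._max = max_per_window
        self._window = window_s
        self._accepted: deque[float] = deque()  # timestamps of accepted connections

    def try_accept(self, now_s: float) -> bool:
        # Forget connections that have slid out of the window.
        while self._accepted and now_s - self._accepted[0] >= self._window:
            self._accepted.popleft()
        if len(self._accepted) >= self._max:
            return False  # over the connection creation rate
        self._accepted.append(now_s)
        return True
```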
If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.