August 11, 2022 | Episode 228

Apache Kafka Security Best Practices


Kris Jenkins: (00:00)

Security: it's an aspect of Apache Kafka that's evolved a lot over the years. In the very early days, you just ran Kafka locally and security was entirely your problem to think about, but that was a long time ago. Things have moved on. These days there are out-of-the-box and pluggable options: things to do with authentication and authorization, encryption, and even quality-of-service guarantees, where you can specify how much bandwidth each consumer can use.

Kris Jenkins: (00:32)

So it's a very important and large topic, and we thought it was time to bring in an expert to get us back up to speed on the current state of Kafka security. Along the way, I get a little of the backstory about how it got to its current state. The expert we have with us is Rajini Sivaram. I'd like to tell you that she wrote the book on Kafka security, but I don't think it's a whole book quite yet.

Kris Jenkins: (00:56)

So I can tell you she wrote the security chapter of Kafka: The Definitive Guide, and she's going to give us the tour. Before we get started, this podcast is brought to you by Confluent Developer, which is our education site for Kafka. More about that at the end, but for now, I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.

Kris Jenkins: (01:22)

Joining me today is Rajini Sivaram. Hi, how are you doing?

Rajini Sivaram: (01:25)

I'm fine. Thank you, Kris. How are you?

Kris Jenkins: (01:27)

I'm very well. I'm looking forward to this. You're going to teach me some more up-to-date things about security.

Rajini Sivaram: (01:32)

I hope so.

Kris Jenkins: (01:35)

So let me get this straight. So you are a principal engineer at Confluent, and you've been working on things like security features for Kafka and geo-replication for, I think about seven years. Is that right?

Rajini Sivaram: (01:51)

Yeah. I started working on security for Apache Kafka around seven years ago. And more recently, I've been working on geo-replication for Confluent Platform and Confluent Cloud.

Kris Jenkins: (02:01)

Two very tasty subjects. We'll get into security in a moment, but first let me get your other credentials on the table. You're also a co-author of Kafka: The Definitive Guide, right?

Rajini Sivaram: (02:14)

That's right. I contributed to the second edition of the book, which was out last year.

Kris Jenkins: (02:20)

What's it like? Because there are four co-authors. What's it like coordinating four people writing a book?

Rajini Sivaram: (02:27)

So it was a very interesting experience for me, because I'd never written a book before, and the first edition of the book was already quite popular. So writing up to that standard was quite an interesting experience. I started off by writing the chapter on Kafka security, because at the time the first edition was written, there was no security in Kafka. So this was the very first time we were introducing Kafka security in a book, which was quite interesting.

Kris Jenkins: (02:57)

I can see why they pulled you in for that chapter then. So maybe that's where we should start the security story because, in the early days, Kafka didn't really have any security beyond SSL, right?

Rajini Sivaram: (03:11)

Yeah. So when I first started working on Kafka, almost seven years ago, there was no security at all. It was just plain text. And this was in 0.8 something.

Kris Jenkins: (03:22)

Which year is this? Give me the timeline.

Rajini Sivaram: (03:28)

Around 20... Well, I don't remember the exact year, but I know it was around seven years ago when I started.

Kris Jenkins: (03:41)

So sometime in the early-to-mid 2010s.

Rajini Sivaram: (03:41)

Yeah. 2015 maybe. And we were thinking of putting Kafka on the cloud and providing it as a service. Without security, obviously we couldn't do that. So that's how I first started working on security. There was some interest in the community at that time, and there was talk of adding SSL and also Kerberos to Kafka. So my initial work was mostly testing those features and doing the reviews, and from then on, I think I haven't stopped.

Kris Jenkins: (04:09)

Yeah. Security is a job that never stops.

Rajini Sivaram: (04:12)

Exactly.

Kris Jenkins: (04:14)

But how do you actually test something like that? Was there a lot of external auditing, or were you working with partners to try and hammer down whether it's actually secure? What's the process you go through?

Rajini Sivaram: (04:30)

To start with, it was all the internal testing that we had to do to make sure that we were using standard protocols. So it was essentially testing the protocols that we were implementing in Kafka. A lot of the work was made easier because we could use existing implementations from Java on the broker side, and wherever we could, on the client side as well. We have also had other people test it externally over the years, once it was in there and it started going into production.

Kris Jenkins: (05:02)

Right. Okay. So there you are, I'm assuming that SSL is relatively straightforward to add to the communication layer. Tell me if I'm wrong.

Rajini Sivaram: (05:13)

So at the very beginning, it was a little bit more difficult because of the way Kafka was implemented, and the fact that we had assumed that everything is plain text. So refactoring it and getting the protocol in at the beginning was a little bit more work. But once the code was in there and we had support for multiple security protocols and multiple transport layers, which is where SSL fits in, it became a lot easier.

Rajini Sivaram: (05:40)

Now the only work that we need to do is when a new protocol comes along, like TLS 1.3 did a few years ago. When you're integrating it, you'll find that there are small changes we need to make, because some of the assumptions we made before may not hold for the new protocol. So we have to update Kafka slightly every time. Now it's much more incremental.
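For readers who want to see what this looks like in practice, here is a minimal client-side sketch of enabling TLS. The broker address, truststore path, and password are placeholders, and setting ssl.enabled.protocols is only needed if you want to restrict which versions are negotiated.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TlsProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093"); // placeholder TLS listener
        props.put("security.protocol", "SSL");                     // encrypt data in transit
        // Truststore containing the CA that signed the broker certificates (placeholder path/password)
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        // Optionally restrict the negotiated versions to the newer protocols
        props.put("ssl.enabled.protocols", "TLSv1.3,TLSv1.2");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The TLS handshake happens when the connection is established; sends work as usual.
            producer.send(new ProducerRecord<>("demo-topic", "key", "hello over TLS"));
        }
    }
}
```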

Kris Jenkins: (06:05)

Yeah. Yeah. I can imagine. It's the classic software problem: making it swappable at all is hard work, but once you've made it swappable, it gets easier. I learned that lesson the hard way with internationalization. Going from English to English and German was a colossal amount of work, but then adding each new language after that was pretty straightforward. So the first authentication mechanism you added was Kerberos.

Rajini Sivaram: (06:33)

That's right.

Kris Jenkins: (06:33)

You're going to have to teach me something about Kerberos, because all I know is it's a bit like OAuth, in that you ask someone else for authentication and then you go over and say, look, I've got my magic token.

Rajini Sivaram: (06:45)

Yeah, it is kind of similar in that way. Active Directory, which supports Kerberos, has been around for a long time, and it's used by a lot of financial institutions. So when security was added to Kafka, it seemed like a very good fit, in the sense that a lot of the people who wanted to secure Kafka would already have had Active Directory setups, which they could connect to if we supported Kerberos. And so we introduced SASL, which is a standard framework for introducing security mechanisms into basically any system.

Rajini Sivaram: (07:21)

So we used SASL, which is already supported in Java, with GSSAPI, which supports the Kerberos protocol. And that's how the very first mechanism was added. It was just support for Kerberos, which was great at the time, because a lot of people who were using Active Directory could straight away start using the system with no additional plugins to add to Kafka. If you had this external server, it became very simple.
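As a rough illustration of what a Kerberos (SASL/GSSAPI) client setup involves, here is a hedged configuration sketch. The broker address, keytab path, and principal are placeholders, and the service name must match whatever Kerberos principal the brokers actually run under.

```java
import java.util.Properties;

public class GssapiClientConfig {
    // Client-side properties for SASL/GSSAPI (Kerberos) over TLS; values are placeholders.
    static Properties kerberosClientProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093");
        props.put("security.protocol", "SASL_SSL");        // Kerberos authentication, TLS encryption
        props.put("sasl.mechanism", "GSSAPI");             // GSSAPI is the SASL mechanism for Kerberos
        props.put("sasl.kerberos.service.name", "kafka");  // must match the broker's Kerberos principal
        // JAAS configuration pointing at a keytab instead of an interactive login
        props.put("sasl.jaas.config",
            "com.sun.security.auth.module.Krb5LoginModule required "
          + "useKeyTab=true storeKey=true "
          + "keyTab=\"/etc/security/keytabs/alice.keytab\" "
          + "principal=\"alice@EXAMPLE.COM\";");
        return props;
    }
}
```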

Kris Jenkins: (07:52)

So that opens it up to a lot of enterprise-y companies. Right?

Rajini Sivaram: (07:57)

Yeah.

Kris Jenkins: (07:57)

And was it largely driven just by the realization that you wanted to get this product into the cloud and working that way? Or was it customers saying, we can't use this until you can integrate with our Active Directory?

Rajini Sivaram: (08:11)

So Kerberos was less about cloud and more about on-premises users who already had Active Directory setups that they wanted to integrate. I think at the time that was the main driving factor for introducing Kerberos first. But slowly we realized that Kerberos itself wasn't sufficient for a lot of users. If you're running Kafka in the cloud, you're much more likely to have username/password authentication, with some kind of backend that gives you support for authenticating passwords. So that's how we introduced the next mechanism, which was SASL/PLAIN, which allows you to verify passwords.

Rajini Sivaram: (08:54)

It's a much simpler protocol. If you have any kind of username/password database, you can integrate with it: you specify the username and password, and it verifies the password. The other side could be, for example, Active Directory again, and you can still bind to it and verify. The way you would do that is by extending it and adding a callback handler to integrate with whatever backend you may have.
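On the client side, SASL/PLAIN is just a username and password in the configuration, normally sent over TLS so it isn't exposed on the wire. A minimal sketch, with placeholder credentials:

```java
import java.util.Properties;

public class PlainClientConfig {
    // Client-side properties for SASL/PLAIN over TLS; username and password are placeholders.
    static Properties plainClientProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093");
        props.put("security.protocol", "SASL_SSL"); // pair PLAIN with TLS so the password isn't sent in the clear
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
          + "username=\"alice\" password=\"alice-secret\";");
        return props;
    }
}
```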

Kris Jenkins: (09:21)

Okay. So at this stage it's gradually becoming much more pluggable?

Rajini Sivaram: (09:29)

Yes. Originally, if you wanted to replace something, you had to replace the entire security provider. Java is good that way: it allows you to plug any security provider into your process. But that became increasingly difficult, because everyone had to replace the entire SASL server just to do something like password authentication. So we started making the interfaces much more pluggable. Today, if you want to plug in your own password database to do the verification, all you need to do is implement a simple callback that verifies whether the password is correct.
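As a sketch of what that looks like on the broker side, the class below implements Kafka's AuthenticateCallbackHandler interface and answers a PlainAuthenticateCallback. The lookup against your own password store is a placeholder, and the class would be registered through the listener's plain.sasl.server.callback.handler.class property.

```java
import java.util.List;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.auth.login.AppConfigurationEntry;
import org.apache.kafka.common.security.auth.AuthenticateCallbackHandler;
import org.apache.kafka.common.security.plain.PlainAuthenticateCallback;

// Illustrative SASL/PLAIN server callback handler that checks passwords against your own store.
public class CustomPlainCallbackHandler implements AuthenticateCallbackHandler {

    @Override
    public void configure(Map<String, ?> configs, String saslMechanism,
                          List<AppConfigurationEntry> jaasConfigEntries) {
        // Read connection details for your password database here, if needed.
    }

    @Override
    public void handle(Callback[] callbacks) throws UnsupportedCallbackException {
        String username = null;
        for (Callback callback : callbacks) {
            if (callback instanceof NameCallback) {
                username = ((NameCallback) callback).getDefaultName();
            } else if (callback instanceof PlainAuthenticateCallback) {
                PlainAuthenticateCallback plain = (PlainAuthenticateCallback) callback;
                // Tell Kafka whether the supplied password is valid for this user.
                plain.authenticated(verify(username, plain.password()));
            } else {
                throw new UnsupportedCallbackException(callback);
            }
        }
    }

    private boolean verify(String username, char[] password) {
        // Placeholder: look the user up in LDAP, a database, etc.
        return false;
    }

    @Override
    public void close() {}
}
```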

Kris Jenkins: (10:10)

Yeah. And what was the development process for that? Was it all internal and then, bang, here it is, it's available? Or were you bikeshedding a series of KIPs? What was life like getting that feature in?

Rajini Sivaram: (10:26)

So in Apache Kafka, whenever you change anything that's public, whether it's an interface or adding a new config, it goes through a Kafka Improvement Proposal, which is the KIP. We originally had a KIP for adding Kerberos, and then we added another one for SASL/PLAIN, which was just adding the mechanism into Kafka with the ability to change it. But then, as we found that was insufficient, we did another KIP, which was to change the whole interface and make everything much more pluggable.

Kris Jenkins: (10:59)

But what's the process like on those? Were they hotbeds of debate or was it fairly easy to reach an agreement?

Rajini Sivaram: (11:06)

So some of these do take time, especially when... We have to make sure that nothing breaks. One of the most critical things when you're adding anything to Kafka is that we preserve compatibility. So it is quite critical that we have lots of eyes on it, to make sure that nobody's setup breaks as a result.

Rajini Sivaram: (11:28)

When we introduced Kerberos initially, we did not add a way to plug in another SASL mechanism, so there was this assumption within the code base that everything is Kerberos. When we had to add SASL/PLAIN, we had to change it slightly so that we could detect the old clients connecting, assuming everything is Kerberos, versus the new ones, which need to negotiate the mechanism.

Kris Jenkins: (11:53)

Oh, God. Yeah.

Rajini Sivaram: (11:56)

So that goes through the process: the discussions to make sure that nothing is breaking, and also the testing, obviously, to make sure that compatibility is retained.

Kris Jenkins: (12:09)

Yeah. Did that eventually get deprecated out, or do you still do that dance today?

Rajini Sivaram: (12:15)

We still do that. In terms of authentication, we haven't removed any of the support that we had before. After we did that initial protocol change, we also did more work later on to make it even easier to evolve the protocol. So now the whole SASL exchange actually goes through the Kafka protocol: there's a Kafka message that goes through, which contains the SASL bytes. So it becomes much easier for us to version and check. It took a few KIPs to get there, but I think it's in much better shape now than it was before.

Kris Jenkins: (12:54)

Yeah. Especially coming from zero. Right?

Rajini Sivaram: (12:55)

Yeah.

Kris Jenkins: (12:56)

Yeah. Cool. So is that the end of the authentication story? Or is it like you can now plug in anything you want, so we're done here?

Rajini Sivaram: (13:08)

So we did SASL/PLAIN and then we made it pluggable. But among the other things we have done, we added SCRAM, for example. One of the problems we had at the time was that if you wanted to integrate authentication with Kafka, if you weren't using SSL and you wanted to use SASL, you either had to have a setup like Active Directory, or you had to write a plugin to connect to something like your own password database. So having something totally built into Kafka, which doesn't need an external server, is useful for people who are starting off with security and don't have other systems.

Rajini Sivaram: (13:42)

So we introduced SASL/SCRAM, which is a stronger protocol than SASL/PLAIN because the password is not sent directly on the wire. It's safer that way: it's salted and hashed before it's sent across. So we introduced that in Kafka, but the SCRAM credentials are stored in ZooKeeper. As long as you protect your ZooKeeper, put it in an internal network and protect that, you could use security with Kafka without introducing a third-party system.
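To give a flavour of how SCRAM credentials are created, here is a sketch using the AdminClient API available in newer Kafka releases (older releases did this with the kafka-configs tool against ZooKeeper). The broker address, username, and password are placeholders.

```java
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ScramCredentialInfo;
import org.apache.kafka.clients.admin.ScramMechanism;
import org.apache.kafka.clients.admin.UserScramCredentialAlteration;
import org.apache.kafka.clients.admin.UserScramCredentialUpsertion;

public class CreateScramCredential {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // The broker stores a salted, iterated hash of the password, never the password itself.
            ScramCredentialInfo info = new ScramCredentialInfo(ScramMechanism.SCRAM_SHA_512, 8192);
            List<UserScramCredentialAlteration> alterations = Collections.singletonList(
                new UserScramCredentialUpsertion("alice", info, "alice-secret"));
            admin.alterUserScramCredentials(alterations).all().get();
        }
    }
}
```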

Kris Jenkins: (14:17)

Okay. So ZooKeeper then becomes your database of password stuff and you just-

Rajini Sivaram: (14:21)

Yeah. But we don't store plain passwords in it; that was the key reason for using SCRAM as opposed to [inaudible 00:14:30].

Kris Jenkins: (14:30)

Yeah. Yeah. Because if you can avoid it, you don't want to send and store unencrypted passwords. All right. In fact, I'm going to take that back. If you can avoid it, you must avoid it.

Rajini Sivaram: (14:41)

Yes.

Kris Jenkins: (14:42)

Yeah, absolutely. So that raises... I know this is always a question on people's lips. Does that create new work now that we're trying to get rid of ZooKeeper?

Rajini Sivaram: (15:00)

Yes. This is something that we need to do, because when something is stored in ZooKeeper, it was easy even for brokers to use it. So inter-broker communication could use SCRAM, because ZooKeeper starts first and then the brokers come up. But in KRaft mode, it's slightly more difficult: if you want to use SCRAM for inter-broker communication as well, how do you bootstrap? So there is work happening now to make sure that can be done.

Kris Jenkins: (15:26)

Give me a clue on how you actually solve that problem. Because ZooKeeper, it's great, it's right there and it's a distributed database, but you don't have that anymore. How's that handled?

Rajini Sivaram: (15:41)

Yeah. I think there's a separate bootstrap process to make that happen.

Kris Jenkins: (15:47)

Okay. Okay. Maybe we should link to the KIP and move on from the technical details of that. But yeah.

Rajini Sivaram: (15:55)

So after SCRAM we also realized that OAuth was becoming more popular. So it was useful to have an implementation in Kafka which supported OAuth. So that was the final authentication protocol that we have added to Kafka.

Kris Jenkins: (16:08)

Okay. Yeah. That's probably my go-to choice for that kind of authentication task, so I'm glad. Is that why... because I know fairly recently Confluent Cloud let you log in with Google and, I think, one other, Facebook as well, wasn't it, as your authentication provider? You can sign up with those now. Is that just piggybacking off the OAuth work you've done?

Rajini Sivaram: (16:29)

Yeah. Once you have a token, it becomes easy to authenticate using OAuth. And because it's part of Apache Kafka, it's available in Confluent as well.

Kris Jenkins: (16:43)

Yeah, that's a question we should keep an eye on: how much of this is going straight into Kafka? How much of this is in open source Apache Kafka?

Rajini Sivaram: (16:50)

So initially, when we introduced OAuth, it was added as a pluggable mechanism in Kafka with an insecure default implementation. If you wanted to use OAuth, you would go and write your own plugin, which acquired the tokens. Because there are so many libraries out there, we didn't want to pick one and say, this is how you acquire tokens. Instead, we provided the framework.

Rajini Sivaram: (17:13)

But the downside of that was anybody who wanted to use OAuth had to go and implement this kind of plugin before they could even get started. So adoption was slow: people who really required it were using it, but it didn't get adopted at the speed it would have if it worked straight out of the box. So a new KIP recently added support for secure OAuth in Apache Kafka. That is in open source as well.
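For a sense of what the out-of-the-box OAuth support looks like, here is a hedged client-side sketch using the built-in OAUTHBEARER login handler. The handler's package has moved between Kafka versions, so check your release, and the identity-provider URL, client ID, and secret are placeholders.

```java
import java.util.Properties;

public class OauthClientConfig {
    // Client-side properties for SASL/OAUTHBEARER using the built-in token retrieval; values are placeholders.
    static Properties oauthClientProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "OAUTHBEARER");
        // Built-in handler that fetches tokens with the client-credentials grant (package varies by Kafka version)
        props.put("sasl.login.callback.handler.class",
            "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler");
        props.put("sasl.oauthbearer.token.endpoint.url", "https://idp.example.com/oauth2/token"); // placeholder IdP
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required "
          + "clientId=\"my-client\" clientSecret=\"my-secret\";");
        return props;
    }
}
```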

Kris Jenkins: (17:45)

So there's a default one if you don't want to bring your own?

Rajini Sivaram: (17:47)

Yeah.

Kris Jenkins: (17:47)

Yeah, yeah. Another common software development pattern, make it work at all and then make it user-friendly. Right?

Rajini Sivaram: (17:55)

Yeah.

Kris Jenkins: (17:58)

Yeah. So does that bring us up to date on the first leg of security, which is authentication?

Rajini Sivaram: (18:05)

Yeah. I think we've come a long way since we started. For people who want to start integrating authentication into Kafka, there are various ways to do it, and there are various ways that work out of the box as well. So yes, I think so.

Kris Jenkins: (18:23)

Okay. So let's pause for a second. You're the security expert; we've covered authentication, and we need authorization next. Give me the difference. Define authentication and authorization.

Rajini Sivaram: (18:37)

So authentication is the process of establishing who you are. It's essentially checking your digital signature and making sure that you are who you say you are. It kind of works both ways. When a client is connecting to a broker, the client's identity is validated by the broker, so that is client authentication. And equally, the client, before it sends its sensitive data to the broker, needs to verify that it's sending to the actual broker, not some [inaudible 00:19:06]. So that is server authentication.

Kris Jenkins: (19:06)

Okay. I didn't realize it's two-way. That's cool.

Rajini Sivaram: (19:08)

It's two-way. That whole process is essentially answering the question, who are you? Authorization, on the other hand, is about what you can do. Having established an identity and made sure that you are who you say you are, authorization determines what you're allowed to do. So basically, authentication is who, and authorization is what.

Kris Jenkins: (19:34)

Yeah. It's fairly easy to identify if someone is or isn't Mick Jagger, but that doesn't mean they can come into your bedroom. I'm stretching that metaphor. Yes. Let's move on from that metaphor. So authorization, presumably at the start, there was no authorization mechanism either.

Rajini Sivaram: (19:55)

No.

Kris Jenkins: (19:56)

Anyone who connects can do anything?

Rajini Sivaram: (19:58)

Yeah. Without authentication, everything is plain text, but you can't [inaudible 00:20:04]. But once authentication came along, you could identify who... During authentication, we establish an identity, called a Kafka principal. And this identity is associated with the connection throughout its lifetime. Once you've got your identity, whenever a request comes in, say, I want to read this topic.

Rajini Sivaram: (20:22)

You can plug in an authorizer, and the authorizer verifies that this identity is allowed to read that topic, basically what action it is allowed to perform. So Kafka has a pluggable authorizer, which is very fine-[inaudible 00:20:37], and you can specify what operations on what resources each identity, each principal, is allowed or not allowed to perform.

Kris Jenkins: (20:46)

How fine-grained was that in the first version?

Rajini Sivaram: (20:50)

To start with, it was very fine-grained, in the sense that the only way you could define access was to specify the full topic name. So if you are accessing foo, you would say this user is allowed to access foo. And we had one sort of wildcard thing: you could say all [inaudible 00:21:08] users are allowed to access foo, or you could say this user is allowed to access anything.

Rajini Sivaram: (21:13)

So those were the only two modes that we had: either sort of blanket access for a principal or to a resource, or specific access to a particular resource. That doesn't work if you are a very large organization. If you have thousands of topics and thousands of users, then very quickly this could become very difficult to manage. If you had a million ACLs, it's not very easy to keep track of them; you're likely to make mistakes. So later on-

Kris Jenkins: (21:43)

Sorry, at that stage, could you also specify read and write separately?

Rajini Sivaram: (21:49)

Yeah, we could. The operations haven't changed that much. I think right from the beginning, we had separate operations for read and write access, configs, and so on.

Kris Jenkins: (21:55)

Okay.

Rajini Sivaram: (21:55)

And read configs, alter configs, and so on. But one of the things that we added later, to help larger organizations manage their ACLs, was prefixed ACLs. If you follow best practices and use separate prefixes for different departments, for example, then it becomes much easier to say that this user is allowed to access topics with this prefix. So you have a much smaller number of ACLs, and that becomes much easier to manage.
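As an illustration, this sketch uses the AdminClient to grant a hypothetical principal read and write access to every topic whose name starts with a department prefix. The broker address, principal, and prefix are all placeholders.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class CreatePrefixedAcls {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9092"); // placeholder

        // Allow the hypothetical principal User:finance-app to read and write
        // any topic whose name starts with "finance." from any host.
        AccessControlEntry read  = new AccessControlEntry("User:finance-app", "*", AclOperation.READ,  AclPermissionType.ALLOW);
        AccessControlEntry write = new AccessControlEntry("User:finance-app", "*", AclOperation.WRITE, AclPermissionType.ALLOW);
        ResourcePattern financeTopics = new ResourcePattern(ResourceType.TOPIC, "finance.", PatternType.PREFIXED);

        try (Admin admin = Admin.create(props)) {
            admin.createAcls(Arrays.asList(
                new AclBinding(financeTopics, read),
                new AclBinding(financeTopics, write)))
                .all().get();
        }
    }
}
```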

Kris Jenkins: (22:28)

Oh, okay. So you're just naming each team's topics with something like finance-underscore-Dave.

Rajini Sivaram: (22:35)

Exactly.

Kris Jenkins: (22:38)

Okay. Yeah.

Rajini Sivaram: (22:38)

And this is available out of the box. But again, just like authentication, everything is customizable in Kafka. One of the things you can do is integrate with, for example, your own LDAP server, and then you will also be able to do group-based or role-based authorization, which makes it even easier to manage in terms of your [inaudible 00:22:59]. If you have an organization which already has these users in LDAP, and you're using LDAP-based authentication, then you could also use the groups and roles that are defined in LDAP, making it much easier for you to manage.

Kris Jenkins: (23:12)

Yeah. Yeah. But that sounds like, again, a thing where if you're an enterprise, you probably have an LDAP server.

Rajini Sivaram: (23:18)

Yes.

Kris Jenkins: (23:19)

And the rest of us maybe don't. So what happened to those people?

Rajini Sivaram: (23:22)

Yeah. And that's part of the reason why the prefixed ACLs are important. This is totally out of the box: you don't need to add anything. It's all contained within Kafka, and it makes things a lot easier if you don't want to manage external servers with this information.

Kris Jenkins: (23:43)

Yeah. Okay. So we've got authentication, and that's authorization. Is that the whole story for authorization?

Rajini Sivaram: (23:51)

Yeah. One of the other things, if you're doing authorization, is auditing. At the moment, what's built into Apache Kafka is that every time we do authorization, we also log a message which says somebody was allowed to do something, or somebody was not allowed to do something. And this is a log4j log, which you can collect into something like Kibana, and you can look out for abnormal patterns or an increase in denied access, for example. So that's another way that you can keep ahead, and monitor and see if there are any attempts to access data that you don't expect.

Kris Jenkins: (24:36)

Okay. Yeah. I'm surprised it goes into a log4j file. How come it doesn't just go onto another topic?

Rajini Sivaram: (24:42)

There are organizations that do that. Confluent, for example, does have an audit feature that sends it directly to a topic. But there is a cost to it as well. For example, we authorize every produce request, and a produce request may have a hundred records, for different partitions, and we are essentially authorizing each of them. So it's a large volume of data that comes out of authorization. So you need to-

Kris Jenkins: (25:12)

So you can, but it's not the default?

Rajini Sivaram: (25:13)

Yeah.

Kris Jenkins: (25:14)

That makes sense. Okay. Yeah. Okay. So we've got authentication, and authorization. I think the next leg of this is probably encryption, right?

Rajini Sivaram: (25:24)

Yes.

Kris Jenkins: (25:25)

Tell me what's available encryption-wise.

Rajini Sivaram: (25:29)

The very first thing that everyone thinks of with encryption is that you're sending data over the wire, and maybe over the public internet, and you want to make sure that data in transit is encrypted. The TLS support that we added gives you that. If you enable TLS, then you've got encryption for everything that's going over the wire. Another thing almost everyone does is encrypt the disks, so that data at rest is encrypted.

Rajini Sivaram: (25:53)

So together, basically, all the data is encrypted. But if you're running in the cloud, even that may not be sufficient: you want to make sure that your cloud provider doesn't see your data. You may have very sensitive data that you don't want anybody else to see, and that's where end-to-end encryption helps. The producer encrypts the data using some key, and the consumer who has access to the same key can see the data, but nobody else can, not even the broker.

Kris Jenkins: (26:21)

Okay. So are you encrypting each record batch?

Rajini Sivaram: (26:26)

Yeah.

Kris Jenkins: (26:26)

So it's at the level of the batch the producer sends over that gets encrypted?

Rajini Sivaram: (26:33)

Yeah.

Kris Jenkins: (26:33)

Okay. What does it take to actually implement that? Because I would've thought you could just do that encryption yourself with no support from Kafka at all. If you want to.

Rajini Sivaram: (26:44)

Yeah, you can add an interceptor that does the encryption. The only thing you need to worry about when you're doing encryption is its performance and how it interacts with compression: does it reduce how much you can compress? So it's mostly about performance, but it can be done without changes to Kafka, and I think a lot of people do.
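To make that concrete, here is a deliberately simplified sketch of a producer interceptor that encrypts each record value with AES-GCM before it leaves the client. Key management is stubbed out (a real setup would fetch a shared data key from a KMS so consumers can decrypt), and you would register the class through the producer's interceptor.classes setting.

```java
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Illustrative end-to-end encryption interceptor; a sketch, not a hardened design.
public class EncryptingProducerInterceptor implements ProducerInterceptor<String, byte[]> {
    private SecretKey key;
    private final SecureRandom random = new SecureRandom();

    @Override
    public void configure(Map<String, ?> configs) {
        try {
            // Placeholder: generates a throwaway key. In reality you would fetch a shared
            // data key from a KMS or keystore so that consumers can decrypt.
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            this.key = gen.generateKey();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public ProducerRecord<String, byte[]> onSend(ProducerRecord<String, byte[]> record) {
        if (record.value() == null) {
            return record; // nothing to encrypt for tombstones
        }
        try {
            byte[] iv = new byte[12];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(record.value());
            // Prepend the IV so the consumer can decrypt; a real design would also carry a key identifier.
            byte[] payload = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, payload, 0, iv.length);
            System.arraycopy(ciphertext, 0, payload, iv.length, ciphertext.length);
            return new ProducerRecord<>(record.topic(), record.partition(), record.timestamp(),
                                        record.key(), payload, record.headers());
        } catch (Exception e) {
            throw new RuntimeException("Encryption failed", e);
        }
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {}

    @Override
    public void close() {}
}
```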

Kris Jenkins: (27:07)

But you baked it in as a change?

Rajini Sivaram: (27:11)

No, it's not baked in as a change.

Kris Jenkins: (27:13)

Okay. So it's just something you bring your own encryption?

Rajini Sivaram: (27:15)

Yeah.

Kris Jenkins: (27:16)

Yeah. Okay, cool. Do you think it will ever be sort of built in? Because if you're using multiple clients, that does seem like an overhead.

Rajini Sivaram: (27:29)

Yeah. I think over time, maybe we will have features that enable this much more easily than what you can bring today, because today you have to write the code to make it happen. So yes.

Kris Jenkins: (27:41)

Okay. Someone's going to open a KIP and flag you to review it at some point in the future. So there's your future work stack. Okay. Well, you made the encryption part sound rather easy.

Rajini Sivaram: (27:55)

It is easy if you want that level of encryption, because everything is built in: you use tools for your disk encryption, and you use TLS for the data in transit. It's when you start doing end-to-end encryption that it is more work, because there is a lot of testing to make sure you're doing it right. And also, you might not want to encrypt the whole message; you may be encrypting only certain fields of your message, in which case you need to integrate with something like a Schema Registry to track what fields need to be encrypted. So there's a lot of work in that area that has been done, but not as part of core Apache Kafka.

Kris Jenkins: (28:36)

Okay. What's the Schema Registry support? How does that work?

Rajini Sivaram: (28:40)

You could track what is sensitive, and move the encryption into a tool.

Kris Jenkins: (28:50)

Yeah. But is that something that's there today, or are you just talking about adding some metadata to the Avro record that Schema Registry happens to store?

Rajini Sivaram: (29:00)

Yeah, I don't know whether we have any support for it today, because, again, it's not part of the Apache Kafka stack.

Kris Jenkins: (29:09)

Okay. We are doing podcast speculation-driven development. I'll coin that term and go on the circuit promoting it. Okay. So you mentioned auditing, but I guess that brings us to the fourth leg of this, which is things like quality of service and denial-of-service attacks. That's another security topic.

Rajini Sivaram: (29:35)

Yeah. Again, when we started off, we didn't have any way to mitigate denial of service; there was no concept of quotas. But over the years we have added various different types of quotas to make sure that we can distribute the load fairly, and also limit the amount of load on brokers. One of them is bandwidth quotas: the rate at which you can produce or consume. And you can set up these quotas for individual users, or you can just set up defaults.

Rajini Sivaram: (30:09)

Which basically controls how much bandwidth each user can use. We also have CPU-level request quotas, which determine how much of the thread time you are taking. So you can't just keep on sending requests which don't produce or consume but maybe just get metadata; they all cost. So we have all these quotas, and there are connection-level quotas. They are all there to help reduce the amount of load and prevent denial of service.
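For a concrete picture, here is a sketch that sets per-user quotas through the AdminClient; the broker address, username, and the specific limits are placeholders, and the same thing can be done with the kafka-configs tool.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

public class SetUserQuotas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9092"); // placeholder

        // Quotas for the hypothetical user "alice": ~1 MB/s produce, ~2 MB/s fetch,
        // and at most 50% of a request-handler thread's time.
        ClientQuotaEntity alice = new ClientQuotaEntity(
            Collections.singletonMap(ClientQuotaEntity.USER, "alice"));
        ClientQuotaAlteration alteration = new ClientQuotaAlteration(alice, Arrays.asList(
            new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0),
            new ClientQuotaAlteration.Op("consumer_byte_rate", 2_097_152.0),
            new ClientQuotaAlteration.Op("request_percentage", 50.0)));

        try (Admin admin = Admin.create(props)) {
            admin.alterClientQuotas(Collections.singletonList(alteration)).all().get();
        }
    }
}
```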

Kris Jenkins: (30:40)

Okay. But how expressive is that? Can you say, I don't know, you can only send a hundred megabytes a minute? Or can you actually say, you can have as much as you want, but if someone else gets busy then you're going to be limited to a hundred megabytes a minute? Can you be bursty? That's what I'm asking. Can you be bursty if things are quiet?

Rajini Sivaram: (31:00)

At the moment, no. But again, we have made this customizable as well, so you can write your own plugins which do specific things. One of the areas where we did make it bursty is controlling the rate at which topics and partitions are created; there we do allow it to be bursty, because that's likely to be bursty traffic.

Rajini Sivaram: (31:19)

But for the overall bandwidth, we expect that it kind of stays steady; at least the default implementation assumes that you are setting it at a fixed rate, and we don't allow bursts. But we are monitoring over periods of time, and if it exceeds whatever window you have set, then you get throttled, so essentially you may not send data for a period of time.

Kris Jenkins: (31:46)

Yeah. So let's step back a second. You see this with two hats on: you see it as a project management committee member for Apache Kafka, and you also see it as a Confluent employee working on the cloud side. Do the two support each other? Do you see different usage patterns? What's the tension informing each one?

Rajini Sivaram: (32:11)

When you're running a cloud service like Confluent Cloud, we have multi-tenant clusters, so the defaults aren't always sufficient, because you are also trying to protect tenants from each other. So we have extended it, because the whole quota system in Apache Kafka is extensible and customizable. We have our own implementation in the cloud, which allows us to handle more around multi-tenancy.

Kris Jenkins: (32:45)

Okay. Yeah. But are you finding... I think one of the big differences is, if you're working on a cloud team, you have this deployed and you are actively seeing the problems, right? So does the reality of running it in the cloud inform stuff that ends up eventually back in the open source?

Rajini Sivaram: (33:09)

Yes. I think one of the advantages of running a cloud service is you get feedback very, very quickly on what is working, what is not working, and what could be improved. And I think a lot of that does feed back into Apache Kafka as well, as we've learned over the years. We have contributed several features to Apache Kafka as a result of our experience in running the cloud service.

Kris Jenkins: (33:33)

Yeah. Because it can be hard when you're doing... I've done much smaller-scale open source work, and I found that I end up prioritizing the features that affect me day to day, because you've got less visibility into other people's lives. I've often wondered if running a cloud service actually brings other people's problems to your doorstep in a useful way.

Rajini Sivaram: (33:56)

Yeah. Yeah. I think so.

Kris Jenkins: (33:59)

And maybe leading into that... Is a lot of the feature development driven by people banging on your door, asking for something? Or is it that you see the gaps and decide these are your priorities, because you think people need them?

Rajini Sivaram: (34:20)

Yeah, I think it's a combination of the two. A lot of the time we are getting feedback from customers on what features would be useful. But we are also seeing, both for on-prem customers and for cloud customers, some of the areas where we think we can help, and this applies to security as well as most of the other features that we work on. But the feedback from customers is absolutely critical, and I think the product teams talk to a lot of customers to prioritize all the work that we need to do.

Kris Jenkins: (34:54)

Yeah. Customer-driven development is probably better than podcast speculation-driven development if I'm honest. There go those consulting fees. So maybe we should bring this to a close by saying if I've just got a vanilla on-prem installation of Apache Kafka, what would be your top tips for security? What should I be worrying about first?

Rajini Sivaram: (35:21)

I think one of the things to think about, when you start planning your deployment, is incorporating security right from the start. That includes using secure protocols and setting up some best practices: it's not enough just to configure security, you need to make sure that if you're using passwords, you're using strong passwords, and so on. So have all those best practices in mind, understand your attack surfaces, and make sure that you lock down as much as you possibly can. Don't expose ZooKeeper, for instance; put it in its own network. And upgrade versions as quickly as you can, because we are fixing issues.

Rajini Sivaram: (36:04)

Whenever we find issues, we fix them. The newer protocols like TLS 1.3 are far stronger than the older ones like TLS 1.1, where vulnerabilities have been found. So I think using the latest versions with all the bug fixes, and integrating security very early on in your project, is much more useful than leaving it to the end, when you come to testing and find you have little time to explore what the problems may be.

Kris Jenkins: (36:40)

Yeah. Yeah. It's one of those things you want to worry about as you approach production, but you should probably focus on it sooner. Otherwise, you have a seven-year journey back, fitting security systems into something that was originally just plain text. Well, this has been a very interesting tour of what's available for security. Rajini, thank you very much for joining us.

Rajini Sivaram: (37:07)

Thank you for having me.

Kris Jenkins: (37:09)

Cheers. We'll catch you again.

Kris Jenkins: (37:10)

And there we go. The current state of security for Kafka. I wouldn't be at all surprised if, in the next few years, someone writes a pluggable traffic shaper that lets you specify custom quality-of-service rules, and that'll be terrifically useful around Black Friday, I think, those kinds of peak burst times. Maybe that's a KIP you could submit for me. That'd be great. Thank you very much. In the meantime, if you want more concrete details on how to implement some of the security options we've talked about, head to developer.confluent.io, because we've recently launched a step-by-step course for configuring Kafka security.

Kris Jenkins: (37:50)

That's presented by our very own Dan Weston. There's a link in the show notes, and I'm sure you'll find it useful. Alternatively, if you'd like Confluent to take care of most of your security configuration issues, not all, but a lot of them, head to Confluent Cloud, where you can spin up a Kafka instance in the cloud and we will manage as much of it for you as we can. If you add the code PODCAST100 to your account, you'll get $100 of extra free credit to run with.

Kris Jenkins: (38:19)

And of course, if you've enjoyed this episode, now is a great time to click like, subscribe, and notify, rate and review, use the comment box and all the buttons that the world gives us. Or you can find my Twitter handle in the show notes. If you want to get in touch, it's always great to hear from people. And with that, it just remains for me to thank Rajini Sivaram for joining us, and you for listening. I've been your host, Kris Jenkins, and I will catch you next time.

Security is a primary consideration for any system design, and Apache Kafka® is no exception. Out of the box, Kafka has relatively little security enabled. Rajini Sivaram (Principal Engineer, Confluent, and co-author of “Kafka: The Definitive Guide” ) discusses how Kafka has gone from a system that included no security to providing an extensible and flexible platform for any business to build a secure messaging system. She shares considerations, important best practices, and features Kafka provides to help you design a secure modern data streaming system. 

In order to build a secure Kafka installation, you need to securely authenticate your users, whether you are using Kerberos (SASL/GSSAPI), SASL/PLAIN, SCRAM, or OAuth. Verifying that your users can authenticate, and non-users can't, is a primary requirement for any connected system.

But authentication is only one part of the security story. We also need to address other areas. Kafka added support for fine-grained access control using ACLs with a pluggable authorizer several years ago. Over time, this was extended to support prefixed ACLs to make ACLs more manageable in large organizations. Now on its second generation authorizer, Kafka is easily extendable to support other forms of authorization, like integrating with a corporate LDAP server to provide group or role-based access control.

Even if you’ve set up your system to use secure authentication, and each user is authorized using a series of ACLs, if the data is viewable by anyone listening, how secure is your system? That’s where encryption comes in. Using TLS, Kafka can encrypt your data in transit.

Security has gone from a nice-to-have to being a requirement of any modern-day system. Kafka has followed a similar path from zero security to having a flexible and extensible system that helps companies of any size pick the right security path for them. 

Be sure to also check out the newest Apache Kafka Security course on Confluent Developer for an in-depth explanation along with other recommendations. 


