Many Kafka workloads, whether consisting of financial information, healthcare records, or personally identifiable details, have demanding data privacy and integrity requirements. These could be in accordance with corporate policies, industry standards, national and international regulations, or a combination of the above. In order to be confident that your data is protected from eavesdroppers throughout its journey, both in transit and at rest, you'll need to use encryption. (The only way that you might be able to get away with not using encryption is if your Kafka system fully resides in a secure and isolated network, and you don't have to answer to any authorities or auditors.)
Encryption uses mathematical techniques to scramble data so that it is unreadable by those who don’t have the right key, and it also protects the data’s integrity so that you can determine if it was tampered with during its journey.
The simplest encryption setup consists of encrypted traffic between clients and the cluster, which is important if clients access the cluster through an unsecured network such as the public internet.
The next thing to consider is encrypting traffic between brokers, and between brokers and ZooKeeper. Even private networks can be breached, so you want to be sure the traffic on your private network is resistant to eavesdroppers or anyone who wishes to tamper with it while it is in motion.
Another thing to consider is your data at rest, which will be extensive as Kafka makes data durable by writing to disk. So you need to think about encrypting your static data, to protect it from anyone who gains unauthorized access to your filesystem.
Finally, there are other ways users could gain unauthorized access to your data, including data residing in memory that could appear in a heap dump, as well as data in logs.
Next, we will cover the three encryption strategies in turn, beginning with data in transit, the only type for which Kafka provides direct support.
An out-of-the-box Kafka installation doesn’t use encryption; it sends everything as easily intercepted plaintext. Fortunately, as we discussed in the Authentication Basics module, it’s relatively simple to implement SSL or SASL_SSL to encrypt data in transit with TLS. For this, you'll need a self-signed certificate (primarily for internal environments) or one signed by a certificate authority (a must for production environments).
In the Authentication with SSL and SASL_SSL module, we demonstrated how, in addition to brokers providing certificates to clients, you can also require that clients provide certificates to brokers. This is accomplished by enabling the SSL security protocol and setting ssl.client.auth=required in the broker config, and it is sometimes referred to as mutual TLS or mTLS. Conversely, if all you want to do is encrypt and you don’t want to check client certificates (which will reduce the scope of your certificate management duties), you can set ssl.client.auth=none.
TLS uses private key and certificate pairs, and each broker needs its own. Each client needs one too if client authentication is enabled. Note that if you want to enable TLS for inter-broker communication, add security.inter.broker.protocol=SSL to your broker properties file.
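On the client side, the same choice between encryption only and mutual TLS shows up in the connection properties. The following is a minimal sketch, not a complete setup: the class name `TlsClientConfig`, the file paths, and the passwords are hypothetical placeholders, while the property keys are standard Kafka client settings. A truststore is always supplied so the client can verify the broker's certificate; a keystore is added only when the broker requires client certificates.

```java
import java.util.Properties;

final class TlsClientConfig {

    // Builds client properties for a TLS-encrypted connection.
    // Pass null for keystorePath when the broker uses ssl.client.auth=none.
    static Properties build(String truststorePath, String truststorePassword,
                            String keystorePath, String keystorePassword) {
        Properties props = new Properties();
        props.setProperty("security.protocol", "SSL");
        // The truststore holds the CA (or self-signed) certificate used to verify brokers.
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", truststorePassword);
        if (keystorePath != null) {
            // Only needed for mutual TLS: the client's own private key and certificate.
            props.setProperty("ssl.keystore.location", keystorePath);
            props.setProperty("ssl.keystore.password", keystorePassword);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder path and password, for illustration only.
        Properties encryptOnly = build("/path/to/client.truststore.jks", "changeit", null, null);
        // Print only the keys so that passwords never reach the console.
        encryptOnly.stringPropertyNames().forEach(System.out::println);
    }
}
```

The resulting `Properties` object would be merged with the usual settings (bootstrap servers, serializers) and passed to a producer or consumer constructor.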
Keep in mind that enabling TLS can have a performance impact on your system because of the CPU overhead needed to encrypt and decrypt data.
Apache Kafka doesn't provide support for encrypting data at rest, so you'll have to use the whole-disk or volume encryption that is part of your infrastructure. Public cloud providers generally provide this. For example, AWS EBS volumes can be encrypted with keys from AWS Key Management Service. For on-premises solutions, you might consider platforms like Vormetric or Gemalto (Thales).
By this point in the course, you’ve likely set up some certificates, encrypted your data at rest, and set strict filesystem permissions. However, you may wish to go even further and encrypt your data from start to finish, so that it will show as encrypted even in places like heap dumps and logs. For this, you'll need end-to-end encryption, which, in the context of Kafka, relies on a key management system, with encryption applied when messages are serialized and decryption applied when they are deserialized. End-to-end encryption provides the greatest amount of security since brokers never see the unencrypted contents of messages.
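To make the serialize-then-encrypt idea concrete, here is a minimal sketch using only the JDK's built-in AES-GCM support. It is not a Kafka `Serializer` implementation and not any particular product's API: the class `EncryptingStringCodec` is hypothetical, and `newDemoKey()` generates a key locally as a stand-in for fetching one from a key management system. Each payload is laid out as a random 12-byte IV followed by the ciphertext and authentication tag.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class EncryptingStringCodec {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    EncryptingStringCodec(SecretKey key) {
        this.key = key;
    }

    // Producer side: serialize to bytes, then encrypt. Output is IV || ciphertext+tag.
    byte[] serialize(String value) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv); // a fresh IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ciphertext.length)
                    .put(iv).put(ciphertext).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Consumer side: decrypt (which also verifies the tag), then deserialize.
    String deserialize(byte[] payload) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, payload, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(payload, IV_BYTES, payload.length - IV_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption or integrity check failed", e);
        }
    }

    // Assumption: stands in for retrieving a data key from a key management system.
    static SecretKey newDemoKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the bytes handed to the producer are already ciphertext, anything downstream of the codec (brokers, disk, broker-side logs) only ever handles the encrypted form, and a modified payload fails the tag check on the consumer.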
In addition to end-to-end encryption, you should add a key rotation policy, since clients will come and go and changes will be made to your system. That way, a security breach compromises only the messages produced since your keys were last rotated.
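One common way to support rotation is to tag every encrypted message with the ID of the key that produced it, so that consumers can still read older messages after a new key becomes active. The sketch below illustrates that idea under stated assumptions: `RotatingKeyRing` is a hypothetical in-memory class standing in for a key management system, and the payload layout (4-byte key ID, 12-byte IV, then ciphertext and tag) is chosen only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class RotatingKeyRing {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private final Map<Integer, SecretKey> keys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();
    private int currentKeyId = 0;

    RotatingKeyRing() {
        rotate(); // start with key ID 1
    }

    // Generates a fresh key, makes it active for new messages, and returns its ID.
    int rotate() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            keys.put(++currentKeyId, generator.generateKey());
            return currentKeyId;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Removes an old key; messages encrypted under it can no longer be decrypted.
    void retire(int keyId) {
        if (keyId == currentKeyId) {
            throw new IllegalArgumentException("cannot retire the active key");
        }
        keys.remove(keyId);
    }

    // Encrypts with the active key. Layout: key ID || IV || ciphertext+tag.
    byte[] encrypt(String value) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, keys.get(currentKeyId),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(Integer.BYTES + IV_BYTES + ciphertext.length)
                    .putInt(currentKeyId).put(iv).put(ciphertext).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Looks up the key named in the payload header, then decrypts.
    String decrypt(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        int keyId = buffer.getInt();
        SecretKey key = keys.get(keyId);
        if (key == null) {
            throw new IllegalStateException("unknown or retired key id " + keyId);
        }
        byte[] iv = new byte[IV_BYTES];
        buffer.get(iv);
        byte[] ciphertext = new byte[buffer.remaining()];
        buffer.get(ciphertext);
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption or integrity check failed", e);
        }
    }

    // Reads the key ID from a payload header without decrypting it.
    static int keyIdOf(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }
}
```

With this layout, a leak of the active key exposes only messages carrying that key's ID, which are the ones produced since the last rotation. How long retired keys are kept would depend on your retention settings and is left out of the sketch.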