Peter Moskovits, Head of Developer Education, Developer Relations
Learn from Peter Moskovits (Confluent | Developer Education) how you can protect sensitive data at the source using Confluent’s client-side field-level encryption (CSFLE). This video walks through key concepts like envelope encryption, key access patterns, and architecture. Whether you're handling PII, financial transactions, or healthcare data, CSFLE helps you keep it secure before it ever touches Kafka.
Hello, I'm Peter Moskovits with Confluent. Let's talk about client-side field-level encryption. Remember the Equifax data breach? Millions of people had their sensitive information exposed. Imagine if that could have been prevented. Client-side field-level encryption is a powerful tool that can help protect your data from such breaches. Let's dive in.

Data privacy concerns and consumer expectations

Data privacy isn't just a buzzword. It's a fundamental expectation. As consumers become more aware of the risks, they demand higher standards of data protection. Studies show that a significant number of consumers are hesitant to share personal information online due to privacy concerns. This isn't just about compliance. It's about building trust. And this is where CSFLE comes in.

How CSFLE works and selective field encryption

Client-side field-level encryption gives you precise control over your data security. Unlike broad, all-or-nothing approaches, it allows you to decide exactly which fields in a record need to be encrypted and which don't. For example, you might choose to encrypt sensitive fields like credit card numbers or social security numbers while leaving less sensitive information like timestamps or status codes in plain text. This field-by-field encryption not only protects what matters most, but also keeps your data pipelines efficient by avoiding unnecessary encryption. And the best part? The encryption happens client-side, before the data even leaves your application. So whether the data is in transit or at rest on disk, only authorized users with the right keys can access those sensitive fields. Here is where we dig a little deeper.

Encryption architecture: keys, KMS, and Kafka flow

This diagram shows the moving parts of client-side field-level encryption. CSFLE uses two keys: a key encryption key and a data encryption key. Let's focus on the key encryption key first.
The key management system, or KMS, stores the key encryption key. Confluent Cloud supports GCP, AWS, Azure, and in some cases HashiCorp Vault as the KMS provider. The key encryption key stored in the KMS is needed for both encryption and decryption. The Kafka producer encrypts the sensitive fields before publishing the message to Kafka. On the consumer side, the process reverses: the encrypted fields are decrypted. In our example, the credit card number and the CVC, the card verification code, are encrypted, while all the other fields are plain text. To make this process efficient, we use envelope encryption.

Envelope encryption explained

Envelope encryption is the backbone of client-side field-level encryption, and for good reason. It's a widely used, highly efficient method that balances security and scalability. Here's the gist: it's a hybrid approach combining symmetric key cryptography, which is fast and efficient for encrypting data, with public key cryptography, which securely protects the keys themselves. This means you get the best of both worlds: high performance and robust security. That said, it's a slightly involved process, with different types of keys playing specific roles. If you're curious about how these keys work together, be sure to check out my dedicated video on envelope encryption, where I break it all down in detail.

Key handling options: share vs. don't share with Confluent

When implementing client-side field-level encryption, you have two main options for how to handle the key encryption key. Each option comes with its own trade-offs, so let's take a closer look. Option one is providing Confluent access to the key encryption key. With this approach, Confluent can decrypt encrypted fields, enabling features like stream processing and support for fully managed connectors. The configuration is simpler, making it a great choice if you need maximum flexibility and ease of use.
However, while Confluent has extensive security checks and balances in place to ensure nobody can access customer data, this setup does mean that Confluent could technically decrypt the fields during processing. Option two is not providing Confluent access to the key encryption key. This is the most secure option: end-to-end encryption ensures that even Confluent cannot decrypt your data. However, this comes with trade-offs. One, a more complex initial setup. Two, no support for stream processing on encrypted fields. And three, no access to fully managed connectors. Ultimately, the decision comes down to your use case. If you are new to configuring client-side field-level encryption, I recommend starting with the first option. At any point in time, you can switch to option two. Both approaches are fully supported, so you can pick what works best for your scenario.

Implementing CSFLE using fully managed connectors

When it comes to implementing client-side field-level encryption, you have several client options to choose from, depending on your setup and security needs: fully managed connectors, self-managed connectors, and one of the many client libraries available. Let's start with the fully managed connectors in Confluent Cloud. When using fully managed connectors, the key encryption key is always shared with Confluent. This enables seamless integration with managed services and simplifies your configuration. The simplified architecture diagram demonstrates that the source connector accesses the data, encrypts it, and publishes it to a Kafka topic. When the data arrives at the consumer, the sink connector decrypts the data and passes it on to the target backend. This is a very powerful yet simple approach, taking full advantage of the capabilities of Confluent Cloud. Confluent Cloud offers an extensive list of connectors, and an ever-growing number of them support CSFLE.
Here, you can see the connectors supported at the time of recording this video. To really understand CSFLE, let's walk through the encryption and decryption process together. As the first step, we have to define the schema and encryption rules in the schema registry and provide the necessary access through RBAC, role-based access control. In step two, we have to grant Confluent Cloud access to the key encryption key stored in an external KMS. In step three, the producer requests the schema and the schema encryption rules. In step four, the schema registry returns the schema and the rules with a decrypted data encryption key. In step five, the connector reads the data from the source backend and, using the data encryption key it just received, encrypts the fields as specified by the rules. In step six, the connector publishes the messages with the encrypted fields. In step seven, the consumer requests the schema and rules. In step eight, the schema registry returns the schema and the data encryption rules. In step nine, messages with the encrypted fields are consumed. And in step 10, the connector decrypts the encrypted fields in the message and writes it to the sink backend. Later in this video, we are going to see mild variations of this diagram, so be sure you take your time to understand this flow.

Using self-managed connectors and custom clients

An alternative to fully managed connectors running on Confluent Cloud is a wide variety of self-managed connectors. Self-managed connectors, as their name suggests, are hosted, scaled, and upgraded by you. There are two capabilities that make self-managed connectors especially powerful. First, you have the option to decide whether you want to give your Confluent platform environment access to your key encryption key or not. The CSFLE encryption and decryption flow is slightly different in the two scenarios. We'll review them in a moment.
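As an aside, the encryption rule defined in step one of the walkthrough above is expressed as a rule set attached to the schema in Schema Registry. The fragment below is a hedged sketch of what such a rule might look like: the rule name, the PII tag, the KEK name, and the KMS key ID are all placeholder values, and the exact parameter names should be verified against the current Confluent documentation.

```json
{
  "ruleSet": {
    "domainRules": [
      {
        "name": "encryptPII",
        "kind": "TRANSFORM",
        "type": "ENCRYPT",
        "mode": "WRITEREAD",
        "tags": ["PII"],
        "params": {
          "encrypt.kek.name": "my-kek",
          "encrypt.kms.type": "aws-kms",
          "encrypt.kms.key.id": "arn:aws:kms:..."
        },
        "onFailure": "ERROR,NONE"
      }
    ]
  }
}
```

With a rule like this in place, any schema field tagged PII, such as the credit card number or the CVC in our example, is encrypted by the client before the message is published.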
The second power of self-managed connectors is that they allow you to build an environment in which data doesn't leave your firewall unencrypted. Encryption takes place within the boundaries of your network, and the data is passed on to Confluent fully encrypted. A very powerful capability. This is a list of self-managed source and sink connectors that support CSFLE. The documentation contains the complete list of self-managed connectors with client-side field-level encryption support. Now, let's take a closer look at our environment and the encryption and decryption flow for self-managed connectors. We'll review two scenarios. In the first one, the key encryption key is shared with Confluent Cloud. The major difference between this and the fully managed connector scenario we reviewed before is that the connector is deployed within your own boundaries, and therefore all the encryption takes place there. Again, no data leaves your firewall unencrypted. The main difference between the shared and not-shared key encryption key options is that when the key encryption key is not shared with Confluent Cloud, it's your responsibility to retrieve it and decrypt the data encryption key with it. This step is required on both the producer and the consumer side, in steps four and 10, so they can perform the encryption and decryption of your data. It's important to know that when you decide not to share your key encryption key with Confluent, you cannot perform stream processing operations in Confluent Cloud on any of your encrypted fields.

In addition to fully managed and self-managed connectors, client-side field-level encryption is also supported for Kafka clients built using a variety of programming languages. As always, for the latest information on additional languages, check the documentation. Similarly to self-managed connectors, custom clients can also choose to share or not to share the key encryption key with Confluent.
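To make the two-key mechanics concrete before we look at the serializer, here is a toy sketch of what envelope encryption does inside a custom client. This is not Confluent's implementation: the XOR keystream below is a deliberately insecure stand-in for a real cipher such as AES, and all the names are illustrative. What it shows is the shape of the flow: a data encryption key (DEK) encrypts the sensitive field, the key encryption key (KEK) wraps the DEK, and whoever holds the KEK can unwrap and decrypt.

```python
# Toy envelope-encryption sketch. The XOR "cipher" stands in for real AES
# from a crypto library; never use it for actual protection.
import hashlib
import secrets

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (symmetric: same call decrypts)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Producer side: a fresh DEK encrypts the sensitive field, and the KEK
# (held in the KMS) wraps the DEK. Only the wrapped DEK travels around.
kek = secrets.token_bytes(32)            # in reality: lives in AWS/GCP/Azure KMS
dek = secrets.token_bytes(32)
record = {"status": "APPROVED", "card_number": b"4111111111111111"}
record["card_number"] = toy_cipher(dek, record["card_number"])
wrapped_dek = toy_cipher(kek, dek)

# Consumer side: unwrap the DEK with the KEK, then decrypt the field.
# If the KEK is not shared with Confluent, this unwrap is the client's job.
recovered_dek = toy_cipher(kek, wrapped_dek)
plaintext = toy_cipher(recovered_dek, record["card_number"])
print(plaintext)  # b'4111111111111111'
```

The share vs. don't-share decision maps directly onto the last two lines: with a shared KEK, Schema Registry can hand the client an already-decrypted DEK; without it, the client must reach the KMS and perform the unwrap itself.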
The serializer performs the encryption and decryption task in your custom clients. Depending on what message format you use, you will use the Avro, Protobuf, or JSON Schema serializer. This diagram shows the steps it takes to perform encryption and decryption. The steps are slightly different depending on whether Confluent Cloud has access to the key encryption key or not. If the key encryption key is shared with Confluent Cloud, the decrypted data encryption key is handed to the clients in steps four and eight. If the key encryption key is not shared with Confluent, it's the responsibility of the clients to retrieve the key encryption key and use it to decrypt the data encryption key, both on the producer side for encryption in step five and on the consumer side for decryption in step 11.

In closing, I wanted to share a few best practices and considerations for you to think about. Encryption introduces some computational overhead. The extent of this overhead depends on the number of encrypted fields and the type of the workload. Keep this in the back of your mind as you use CSFLE. Another thing to consider is how many fields to encrypt, and whether it makes sense to share the key with Confluent. In general, you want to encrypt all sensitive and confidential fields, and share the key with Confluent to get the most, such as stream processing, out of your streaming backbone. For example, if Confluent Cloud can't see the location, the spend amount, and the vendor for a credit card transaction, running Flink fraud detection jobs isn't possible either. And lastly, remember that compressing encrypted data is significantly less efficient than compressing unencrypted data. The impact of this very much depends on your workload.

Client-side field-level encryption isn't just a technical solution. It's a strategic approach to data protection. It gives you precision, control, and peace of mind.
By encrypting what matters most, exactly where it matters most, you're not just protecting data, you're building trust. And now that you know about it, it's your turn. Give it a try.
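The compression consideration from the closing remarks is easy to see for yourself. Ciphertext is statistically close to random data, so compressors find almost no redundancy in it. The snippet below is a small stdlib-only illustration, using random bytes as a stand-in for encrypted field data:

```python
# Why encrypted data compresses poorly: ciphertext looks like random bytes,
# so a compressor like zlib finds almost nothing to squeeze out.
import os
import zlib

plaintext = b'{"status": "APPROVED", "amount": 42}' * 1000
ciphertext_like = os.urandom(len(plaintext))  # stand-in for encrypted bytes

compressed_plain = zlib.compress(plaintext)
compressed_cipher = zlib.compress(ciphertext_like)

print(len(compressed_plain) / len(plaintext))   # tiny ratio: highly compressible
print(len(compressed_cipher) / len(plaintext))  # around 1.0: barely compressible
```

On a typical run, the repetitive JSON shrinks to a small fraction of its original size, while the random stand-in stays at roughly full size. This is why encrypting many large fields can noticeably reduce the benefit of topic compression.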