Get Started Free
May 17, 2022 | Episode 215

Apache Kafka 3.2 - New Features & Improvements

  • Transcript
  • Notes

Danica Fine: (00:00)

Welcome to another episode of Streaming Audio. I'm Danica Fine, Senior Developer Advocate at Confluent, and you're listening to a special episode where I get to do the honor of announcing the Apache Kafka 3.2 release on behalf of the Kafka community. There are so many great KIPs, highlights, and updates in this release. So, let's get to it.

Danica Fine: (00:24)

As usual, we'll be breaking this up into sections based on what the KIP pertains to. We'll cover updates around Kafka Core, Kafka Streams, and Kafka Connect. We'll kick things off with Kafka Core. First up, we have KIP-704, which allows for further communication between the controller and the brokers in the event that an unclean leader election has taken place. In this scenario, the controller sends messages to the brokers stating that an unclean leader election has happened, and that the topic partition leaders ought to recover their state. This helps to coordinate recovery and mitigate potential inconsistencies.

Danica Fine: (01:00)

KIP-764 introduces a new configuration parameter, socket.listen.backlog.size. Increasing this integer allows for a larger number of connections to be made and avoids filling up the acceptor backlog during leader election. Next, we have KIP-784, which made DescribeLogDirs response more consistent with other responses. With the change, an empty response no longer indicates in authorization error. This also opens the door for currently unknown error codes to be added in the future to further classify error responses. KIP-788, better handles network traffic by adding a new configuration to specify the number of network threads per listener. It's pretty common to have multiple types of listeners to find across Kafka brokers, but some of these listeners may have less traffic to handle than others. With KIP-788, the parameter has been altered and can now be prefixed with a specific listener name, allowing network threads to be configured on a per listener basis.

Danica Fine: (02:00)

With KIP-798 and KIP-810, Kafka-console-producer, everyone's favorite Kafka debugging and testing tool, has undergone a crucial facelift. Up until now, Kafka-console-producer was unable to produce messages with headers or null values. Currently scripting with Kafka-console-producer? No worries. These new features are handled with brand new flags, meaning that the existing functionality remains unchanged. The addition of KIP-800 means that we no longer need to guess the reason behind a consumer joining or leaving a consumer group. Clients log a reason when they join or leave a group. But it's difficult to collect this information from the broker side of things for further troubleshooting. With this change, the reason is propagated up to the brokers, making it easier to debug potential consumer group issues.

Danica Fine: (02:48)

For those of you excited about KRaft, KIP-801 introduces the first built-in authorizer that doesn't depend on ZooKeeper. In general, this new StandardAuthorizer does all of the same things that AclAuthorizer does for ZooKeeper-dependent clusters. StandardAuthorizer will be the default authorizer for KRaft-enabled clusters, but you're still free to specify another authorizer if you prefer. And wrapping up our Kafka core updates is KIP-814. This change makes metadata more consistent across a Kafka cluster in the event that a leader leaves and rejoins a group while static membership protocol is enabled. Before this change, a leader was allowed to be absent for a short time and rejoin the group while remaining the leader, but it was never informed that this happened. With this change, the functionality isn't affected. The leader can still leave, rejoin, and remain the leader, but it's now informed of its continued leadership.

Danica Fine: (03:42)

Next up we'll address a few Kafka Streams updates. KIP-708 increases fault tolerance of Kafka Streams by making it rack aware. Individual Kafka Streams clients can be tagged, for example, with our cloud region and cluster. Kafka Streams can then consider Stream clients' cloud region and cluster when distributing standby tasks. With KIP-805, we introduce RangeQuery, a new implementation of the Query interface provided by interactive queries version two. RangeQuery allows a state to be queried over a specific range. Expanding on this KIP is KIP-806, which brings WindowKeyQuery and WindowRangeQueryes class to interactive queries version two. Now you can access either a specific key or all keys within windowed state stores. The bottom line is that both KIP-805 and KIP-806 make it easier than ever to access information from underlying state stores through the new version of interactive queries.

Danica Fine: (04:37)

Our last Kafka Streams update is KIP-791. The StateStore context has been enhanced to include record metadata that may be accessed from state stores. This additional metadata will open the door for the implementation of stronger consistency semantics in the future.

Danica Fine: (04:52)

From the Kafka Connect side of things, our first update is KIP-769. Prior to this change, the Connect plug-ins endpoint printed only connectors. The API has now been extended to include a new connector's only flag, which when set to false will display all available plug-ins, not just the connectors. Using a plug-in output from the previous endpoint, you can also query the configuration of each plug-in. Up next is KIP-808, which supports new unix precisions in the timestamp converter single message transform. Using the new unix precision configuration parameter, you can easily specify your desired timestamp precision as you transform to and from unix timestamps.

Danica Fine: (05:34)

And finally KIP-779 adds the ability for connector source tasks to handle producer exceptions. Prior to this change, when a source task encountered a producer exception, it killed the entire connector. There wasn't really a way to properly recover. Worker source tasks will now utilize a getter method within the retry with tolerance operator and check to see how the exception should be handled either by skipping the record or failing the task.

Danica Fine: (05:59)

All right, those are the highlights from this latest Apache Kafka release. Thank you for listening to this episode, and I hope this podcast was helpful to you. If you have any questions or would like to discuss, you can reach out to our community forum or Slack, both are linked in the show notes. If you're listening on Apple Podcast or other podcast platforms, please be sure to leave a review. We'd love to hear your feedback. If you're watching on YouTube, please do subscribe. We'll notify you with updates that you might be interested in. Thank you again for your support and see you next time.

Apache Kafka® 3.2 delivers new  KIPs in three different areas of the Kafka ecosystem: Kafka Core, Kafka Streams, and Kafka Connect. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent), shares release highlights.

More than half of the KIPs in the new release concern Kafka Core. KIP-704 addresses unclean leader elections by allowing for further communication between the controller and the brokers. KIP-764 takes on the problem of a large number of client connections in a short period of time during preferred leader election by adding the configuration `socket.listen.backlog.size`. KIP-784 adds an error code field to the response of the `DescribeLogDirs` API, and KIP-788 improves network traffic by allowing you to set the pool size of network threads individually per listener on Kafka brokers. Finally, in accordance with the imminent KRaft protocol, KIP-801 introduces a built-in `StandardAuthorizer` that doesn't depend on ZooKeeper. 

There are five KIPs related to Kafka Streams in the AK 3.2 release. KIP-708 brings rack-aware standby assignment by tag, which improves fault tolerance. 

Then there are three projects related to Interactive Queries v2: KIP-796 specifies an improved interface for Interactive Queries; KIP-805 allows state to be queried over a specific range; and KIP-806 adds two implementations of the Query interface, `WindowKeyQuery` and `WindowRangeQuery`.

The final Kafka Streams project, KIP-791, enhances `StateStoreContext` with `recordMetadata`,which may be accessed from state stores.

Additionally, this Kafka release introduces Kafka Connect-related improvements, including KIP-769, which extends the `/connect-plugins` API, letting you list all available plugins, and not just connectors as before.  KIP-779 lets `SourceTasks` handle producer exceptions according to `error.tolerance`, rather than instantly killing the entire connector by default. Finally, KIP-808 lets you specify precisions with respect to TimestampConverter single message transforms. 

Tune in to learn more about the Apache Kafka 3.2 release!

Continue Listening

Episode 216May 19, 2022 | 33 min

Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB

Apache Kafka isn’t just for day jobs according to Danica Fine (Senior Developer Advocate, Confluent). It can be used to make life easier at home, too! Building out a practical Apache Kafka® data pipeline is not always complicated—it can be simple and fun. For Danica, the idea of building a Kafka-based data pipeline sprouted with the need to monitor the water level of her plants at home. In this episode, she explains the architecture of her hardware-oriented project and discusses how she integrates, processes, and enriches data using ksqlDB and Kafka Connect, a Raspberry Pi running Confluent's Python client, and a Telegram bot. Apart from the script on the Raspberry Pi, the entire project was coded within Confluent Cloud.

Episode 217May 26, 2022 | 55 min

Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.

Episode 218June 2, 2022 | 48 min

Data Mesh Architecture: A Modern Distributed Data Model

Data mesh isn’t software you can download and install, so how do you build a data mesh? In this episode, Adam Bellemare (Staff Technologist, Office of the CTO, Confluent) discusses his data mesh proof of concept and how it can help you conceptualize the ways in which implementing a data mesh could benefit your organization.

Got questions?

If there's something you want to know about Apache Kafka, Confluent or event streaming, please send us an email with your question and we'll hope to answer it on the next episode of Ask Confluent.

Email Us

Never miss an episode!

Confluent Cloud is a fully managed Apache Kafka service available on all three major clouds. Try it for free today.

Try it for free