February 7, 2023 | Episode 256

Apache Kafka 3.4 - New Features & Improvements

Danica Fine (00:00):

Welcome to Streaming Audio. I'm Danica Fine, developer advocate at Confluent. You're listening to a special episode where I have the honor of announcing the Apache Kafka 3.4 release on behalf of the Kafka community. There are so many great KIPs in this release, so let's get to it. As usual, the release is broken up based on what each KIP pertains to, and this release covers updates to Kafka Core, Kafka Streams, and Kafka Connect. First up, for Kafka Core we have KIP-866, which provides a bridge to migrate from existing ZooKeeper clusters to new KRaft mode clusters. With this change, you'll be able to migrate your existing metadata from ZooKeeper to KRaft. After metadata is synced between the two clusters using dual-write mode, you can safely transition control to the KRaft controllers, and, just in case, the change also allows failing back to ZooKeeper during the migration.
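
As a rough illustration, here is a minimal sketch of the controller-side settings involved in a migration, expressed as Java properties. In a real deployment these lines would live in the KRaft controller's properties file; all node IDs, quorum voters, and connection strings below are placeholders, not values from the release.

    import java.util.Properties;

    // A minimal sketch (KIP-866) of KRaft controller settings during a
    // ZooKeeper-to-KRaft migration; all addresses and IDs are placeholders.
    public class MigrationControllerConfig {
        public static Properties controllerProps() {
            Properties props = new Properties();
            props.put("process.roles", "controller");
            props.put("node.id", "3000");
            props.put("controller.quorum.voters", "3000@controller1:9093");
            props.put("controller.listener.names", "CONTROLLER");
            props.put("listeners", "CONTROLLER://controller1:9093");
            // Turn on dual-write migration mode and point the controller at the
            // existing ZooKeeper ensemble so metadata can be synced across.
            props.put("zookeeper.metadata.migration.enable", "true");
            props.put("zookeeper.connect", "zk1:2181");
            return props;
        }
    }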

Danica Fine (00:46):

This KIP gives you more flexibility to try out KRaft mode, but keep in mind that these changes are just an early access release; general availability of this feature will be part of Apache Kafka 3.5. Next up is KIP-830. This change treats the JMX reporter like all other reporters so that it can be disabled in environments where it's not being used. The KIP includes a new configuration setting, auto.include.jmx.reporter, to support disabling the reporter. When set to false, a JMX reporter won't be instantiated; instead, only the reporters set via metric.reporters will be used. When set to true, which is the default, a deprecation warning is printed directing the user to use metric.reporters instead. In Apache Kafka 4.0, expect the default value of metric.reporters to include JmxReporter. With KIP-881, consumers can now be rack-aware when it comes to partition assignments and consumer rebalancing.
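
For example, here is a minimal sketch of a client configured with the JMX reporter disabled; the bootstrap address is a placeholder.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    // A minimal sketch of a producer with the JMX reporter disabled per KIP-830;
    // the bootstrap address is a placeholder.
    public class NoJmxProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // With auto.include.jmx.reporter=false, only the reporters listed in
            // metric.reporters (none by default) are instantiated.
            props.put("auto.include.jmx.reporter", "false");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // produce records as usual; no JMX MBeans are registered for client metrics
            }
        }
    }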

Danica Fine (01:37):

In the past, KIP-392 enabled consumers to fetch from their closest replica. KIP-881 is an extension that allows consumers to fetch data from leaders or followers within the same availability zone when possible, to benefit from locality. These changes are also a stepping stone to the consumer group protocol work being done as part of KIP-848, which will introduce rack-aware partition assignments in both the server-side and client-side partition assignors. Check out KIP-881 for more details on how these changes will impact the future design of client protocols. Next up is KIP-876. Cluster metadata snapshots are used to compact the underlying log segments and clean up redundant records. In the past, these snapshots were triggered based on the number of bytes that had been appended to the log since the last snapshot.
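
To give a feel for the client side of this, here is a minimal sketch of a rack-aware consumer. The group ID, topic, and rack label are placeholders; the client.rack value should match the broker.rack of the consumer's availability zone.

    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    // A minimal sketch of a rack-aware consumer. KIP-881 propagates client.rack
    // through the consumer protocol so assignors can favor partitions whose
    // replicas live in the same rack; all values below are placeholders.
    public class RackAwareConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "rack-aware-group");
            props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "us-east-1a");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("orders"));
                // poll loop elided
            }
        }
    }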

Danica Fine (02:22):

But seeing as snapshots are also used as cluster backups, it makes sense to generate snapshots based on time as well, and KIP-876 enables this. To do so, this KIP adds a new property that defines the maximum amount of time that the server will wait to generate a snapshot; the default is one hour. And finally, we have KIP-854, which introduces changes to clean up producer IDs more efficiently. Idempotent producers and transactions are essential to Kafka's exactly-once semantics, and they both require some IDs in order to work properly. Idempotent producers are assigned a producer ID automatically at startup. Transactions are meant to offer guarantees across topic partitions and producer sessions, so they require both a producer ID as well as a user-provided transaction ID.
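
For the snapshot change, here is a minimal sketch of the server-side setting, assuming the property name proposed in KIP-876; in practice this line would live in the server's properties file.

    import java.util.Properties;

    // A minimal sketch of the KIP-876 time-based snapshot setting, assuming the
    // property name from the KIP; 3600000 ms (one hour) is the documented default.
    public class SnapshotIntervalConfig {
        public static Properties serverProps() {
            Properties props = new Properties();
            props.put("metadata.log.max.snapshot.interval.ms", "3600000");
            return props;
        }
    }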

Danica Fine (03:02):

In the past, both producer IDs and transaction IDs would expire and be cleaned up using a single timeout parameter. KIP-679 made all producers idempotent by default, so there are quite a lot more producer IDs floating around. To avoid excess memory usage, there needs to be a way to expire and clean up producer IDs independently of transaction IDs. KIP-854 introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs. Moving on to Kafka Streams, we have KIP-837, which allows users to multicast result records to every partition of downstream sink topics. The change also adds functionality for users to choose to drop result records without sending them. To achieve this, a few changes were made. First, the StreamPartitioner interface was given a new method called partitions(), which returns an optional set of partitions.
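
Here is a minimal sketch of the two broker-side timeouts after KIP-854, assuming the parameter names from the KIP; the values shown are the documented defaults.

    import java.util.Properties;

    // A minimal sketch of the split expiry timeouts from KIP-854, assuming the
    // parameter names from the KIP; values shown are the documented defaults.
    public class ExpiryConfig {
        public static Properties brokerProps() {
            Properties props = new Properties();
            // New in KIP-854: producer IDs now expire on their own clock (1 day).
            props.put("producer.id.expiration.ms", "86400000");
            // The pre-existing parameter now applies only to transaction IDs (7 days).
            props.put("transactional.id.expiration.ms", "604800000");
            return props;
        }
    }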

Danica Fine (03:53):

Next, the RecordCollector and its implementing class now accept a StreamPartitioner object. The RecordCollector uses the partitions() method from the StreamPartitioner to determine which partitions to send the record to. And don't worry, the KeyQueryMetadata class was also updated to account for the fact that a single key could be present in multiple partitions. Rounding out our updates for Kafka 3.4, we have one KIP for Kafka Connect, specifically for MirrorMaker 2. In the past, MirrorMaker 2 used the built-in Kafka admin client and made a number of assumptions about the ACLs and administrative control that a particular user must have in order to run. Although these assumptions simplified MirrorMaker 2 resource management, they particularly affected those trying to run federated or infrastructure-as-code solutions. KIP-787 bypasses these hurdles by allowing users to run with custom implementations of the Kafka resource manager and integrate more easily with their ecosystems.
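
To make the new method concrete, here is a minimal sketch of a partitioner that multicasts every record to all partitions of the sink topic; the class name is hypothetical.

    import java.util.Optional;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    import org.apache.kafka.streams.processor.StreamPartitioner;

    // A minimal sketch of a KIP-837-style partitioner; the class name is hypothetical.
    public class BroadcastPartitioner<K, V> implements StreamPartitioner<K, V> {

        @Override
        @Deprecated
        public Integer partition(String topic, K key, V value, int numPartitions) {
            return null; // superseded by partitions(); null defers to the default
        }

        @Override
        public Optional<Set<Integer>> partitions(String topic, K key, V value, int numPartitions) {
            // Optional of all partitions -> multicast; an empty set would drop the
            // record without sending; Optional.empty() falls back to the default.
            return Optional.of(IntStream.range(0, numPartitions).boxed().collect(Collectors.toSet()));
        }
    }

In the DSL, such a partitioner could be handed to a sink with something like Produced.streamPartitioner(new BroadcastPartitioner<>()).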

Danica Fine (04:46):

Those are the highlights from this latest Apache Kafka release. Thank you for taking the time to listen to this special episode. If you have any questions or would like to discuss the release, you can reach out on our community forum or Slack; both are linked in the show notes. If you're listening on Apple Podcasts or another podcast platform, please be sure to leave a review. We'd love to hear your feedback. If you're watching on YouTube, please subscribe so you'll be notified of updates that you might be interested in. Thanks again for your support, and see you next time.

Apache Kafka® 3.4 is released! In this special episode, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of the Apache Kafka 3.4 release. This release introduces new KIPs in Kafka Core, Kafka Streams, and Kafka Connect.

In Kafka Core:

  • KIP-792 expands the metadata each group member passes to the group leader in its JoinGroup subscription to include the highest stable generation that the consumer was a part of.
  • KIP-830 includes a new configuration setting that allows you to disable the JMX reporter for environments where it’s not being used. 
  • KIP-854 introduces changes to clean up producer IDs more efficiently, to avoid excess memory usage. It introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs.
  • KIP-866 (early access) provides a bridge to migrate from existing ZooKeeper clusters to new KRaft mode clusters, enabling the migration of existing metadata from ZooKeeper to KRaft.
  • KIP-876 adds a new property that defines the maximum amount of time that the server will wait to generate a snapshot; the default is 1 hour.
  • KIP-881, an extension of KIP-392, enables consumers to be rack-aware when it comes to partition assignments and consumer rebalancing.

In Kafka Streams:

  • KIP-770 updates some Kafka Streams configs and metrics related to the record cache size (see the sketch after this list).
  • KIP-837 allows users to multicast result records to every partition of downstream sink topics and adds functionality for users to choose to drop result records without sending.
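
For KIP-770, here is a minimal sketch of how the updated cache-related settings might be applied, assuming the configuration names proposed in the KIP; the application ID, bootstrap address, and sizes are placeholders.

    import java.util.Properties;

    import org.apache.kafka.streams.StreamsConfig;

    // A minimal sketch of the KIP-770 cache settings, assuming the config names
    // proposed in the KIP; all values below are arbitrary placeholders.
    public class StreamsCacheConfig {
        public static Properties streamsProps() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cache-sizing-demo");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            // Replaces the deprecated cache.max.bytes.buffering for sizing the record cache.
            props.put("statestore.cache.max.bytes", String.valueOf(10 * 1024 * 1024L));
            // Bounds the total bytes buffered across all input partitions.
            props.put("input.buffer.max.bytes", String.valueOf(512 * 1024 * 1024L));
            return props;
        }
    }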

And finally, for Kafka Connect:

  • KIP-787 allows users to run MirrorMaker 2 with custom implementations of the Kafka resource manager, making it easier to integrate with your ecosystem.

Tune in to learn more about the Apache Kafka 3.4 release!
