How can I consume Kafka topics with a higher degree of parallelism than the partition count?
This tutorial requires access to an Apache Kafka cluster, and the quickest way to get started free is on Confluent Cloud, which provides Kafka as a fully managed service.
After you log in to Confluent Cloud, click Environments in the lefthand navigation, click on Add cloud environment, and name the environment learn-kafka. Using a new environment keeps your learning resources separate from your other Confluent Cloud resources.
From the Billing & payment section in the menu, apply the promo code CC100KTS to receive an additional $100 free usage on Confluent Cloud (details). To avoid having to enter a credit card, add an additional promo code CONFLUENTDEV1. With this promo code, you will not have to enter a credit card for 30 days or until your credits run out.
Click on LEARN and follow the instructions to launch a Kafka cluster and enable Schema Registry.
Make a local directory anywhere you’d like for this project:
mkdir confluent-parallel-consumer-application && cd confluent-parallel-consumer-application
Next, create a directory for configuration data:
mkdir configuration
From the Confluent Cloud Console, navigate to your Kafka cluster and then select Clients in the lefthand navigation. From the Clients view, create a new client and click Java to get the connection information customized to your cluster.
Create new credentials for your Kafka cluster and Schema Registry, writing in appropriate descriptions so that the keys are easy to find and delete later. The Confluent Cloud Console will show a configuration similar to below with your new credentials automatically populated (make sure Show API keys is checked). Copy and paste it into a configuration/ccloud.properties file on your machine.
# Required connection configs for Kafka producer, consumer, and admin
bootstrap.servers={{ BOOTSTRAP_SERVERS }}
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}';
sasl.mechanism=PLAIN
# Required for correctness in Apache Kafka clients prior to 2.6
client.dns.lookup=use_all_dns_ips
# Best practice for Kafka producer to prevent data loss
acks=all
# Required connection configs for Confluent Cloud Schema Registry
schema.registry.url={{ SR_URL }}
basic.auth.credentials.source=USER_INFO
basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }}
Do not directly copy and paste the above configuration. You must copy it from the Confluent Cloud Console so that it includes your Confluent Cloud information and credentials.
This tutorial has some steps for Kafka topic management and for producing and consuming events, for which you can use the Confluent Cloud Console or the Confluent CLI. Follow the instructions here to install the Confluent CLI, and then follow these steps to connect the CLI to your Confluent Cloud cluster.
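For reference, once the CLI is installed, connecting it to the cluster you created above boils down to logging in and selecting your environment and cluster. A minimal sketch follows (the IDs are placeholders for the values reported by confluent environment list and confluent kafka cluster list; the linked instructions also cover creating an API key for the CLI):
confluent login --save
confluent environment use <environment-id>
confluent kafka cluster use <cluster-id>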
In this step we’re going to create a topic for use during this tutorial. Use the following command to create the topic:
confluent kafka topic create parallel-consumer-input-topic
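If you’d like to confirm that the topic exists and see how many partitions it was created with (relevant later, when we compare consumption parallelism to the partition count), you can describe it:
confluent kafka topic describe parallel-consumer-input-topic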
In order to build the project, first install Gradle 7.5 or later if you don’t already have it.
Create the following Gradle build file, named build.gradle, for the project. Note the parallel-consumer-core dependency, which is available in Maven Central. This artifact includes the Confluent Parallel Consumer’s core API. There are also separate modules for using the Confluent Parallel Consumer with reactive API frameworks like Vert.x (parallel-consumer-vertx) and Reactor (parallel-consumer-reactor). These modules are out of scope for this introductory tutorial.
buildscript {
repositories {
mavenCentral()
}
dependencies {
classpath "com.github.jengelman.gradle.plugins:shadow:6.1.0"
}
}
plugins {
id "java"
}
sourceCompatibility = JavaVersion.VERSION_17
targetCompatibility = JavaVersion.VERSION_17
version = "0.0.1"
repositories {
mavenCentral()
}
apply plugin: "com.github.johnrengelman.shadow"
dependencies {
implementation "io.confluent.parallelconsumer:parallel-consumer-core:0.5.2.4"
implementation "org.apache.commons:commons-lang3:3.12.0"
implementation "org.slf4j:slf4j-simple:2.0.0"
implementation "me.tongfei:progressbar:0.9.3"
implementation 'org.awaitility:awaitility:4.2.0'
testImplementation "junit:junit:4.13.2"
testImplementation 'org.awaitility:awaitility:4.2.0'
testImplementation "io.confluent.parallelconsumer:parallel-consumer-core:0.5.2.4:tests" // for LongPollingMockConsumer
}
test {
testLogging {
outputs.upToDateWhen { false }
showStandardStreams = true
exceptionFormat = "full"
}
}
jar {
manifest {
attributes(
"Class-Path": configurations.compileClasspath.collect { it.getName() }.join(" "),
)
}
}
shadowJar {
archiveBaseName = "confluent-parallel-consumer-application-standalone"
}
And be sure to run the following command to obtain the Gradle wrapper, which we will use to execute the build. The Gradle wrapper is a best practice ancillary build script that enables developers to more easily collaborate on Gradle projects by ensuring that developers all use the same correct Gradle version for the project (downloading Gradle at build time if necessary).
gradle wrapper
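If you have a newer Gradle installed locally, you can also pin the wrapper to the version this tutorial targets when generating it, for example:
gradle wrapper --gradle-version 7.5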
Then create a development configuration file at configuration/dev.properties:
# Consumer properties
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
max.poll.interval.ms=300000
enable.auto.commit=false
auto.offset.reset=earliest
# Application-specific properties
input.topic.name=parallel-consumer-input-topic
file.path=topic-output.txt
Let’s do a quick overview of some of the more important properties here:
The key.deserializer and value.deserializer properties provide a class implementing the Deserializer interface for converting byte arrays into the expected object type of the key and value, respectively.
max.poll.interval.ms is the maximum amount of time a consumer may take between calls to Consumer.poll(). If a consumer instance takes longer than the specified time, it’s considered non-responsive and removed from the consumer group, triggering a rebalance.
Setting enable.auto.commit to false is required because the Confluent Parallel Consumer handles committing offsets itself in order to achieve fault tolerance.
auto.offset.reset - if a consumer instance can’t locate any committed offsets for its topic-partition assignment(s), it will resume processing from the earliest available offset.
Using the command below, append the contents of configuration/ccloud.properties (with your Confluent Cloud configuration) to configuration/dev.properties (with the application properties).
cat configuration/ccloud.properties >> configuration/dev.properties
Create a directory for the Java files in this project:
mkdir -p src/main/java/io/confluent/developer
To complete this introductory application, you’ll build a main application class and a couple of supporting classes.
First, you’ll create the main application class, ParallelConsumerApplication, which is the focal point of this tutorial: it consumes records from a Kafka topic using the Confluent Parallel Consumer.
Go ahead and copy the following into a file src/main/java/io/confluent/developer/ParallelConsumerApplication.java:
package io.confluent.developer;
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Properties;
import static io.confluent.parallelconsumer.ParallelConsumerOptions.CommitMode.PERIODIC_CONSUMER_SYNC;
import static io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder.KEY;
import static io.confluent.parallelconsumer.ParallelStreamProcessor.createEosStreamProcessor;
import static org.apache.commons.lang3.RandomUtils.nextInt;
/**
* Simple "hello world" Confluent Parallel Consumer application that simply consumes records from Kafka and writes the
* message values to a file.
*/
public class ParallelConsumerApplication {
private static final Logger LOGGER = LoggerFactory.getLogger(ParallelConsumerApplication.class.getName());
private final ParallelStreamProcessor<String, String> parallelConsumer;
private final ConsumerRecordHandler<String, String> recordHandler;
/**
* Application that runs a given Confluent Parallel Consumer, calling the given handler method per record.
*
* @param parallelConsumer the Confluent Parallel Consumer instance
* @param recordHandler record handler that implements method to run per record
*/
public ParallelConsumerApplication(final ParallelStreamProcessor<String, String> parallelConsumer,
final ConsumerRecordHandler<String, String> recordHandler) {
this.parallelConsumer = parallelConsumer;
this.recordHandler = recordHandler;
}
/**
* Close the parallel consumer on application shutdown
*/
public void shutdown() {
LOGGER.info("shutting down");
if (parallelConsumer != null) {
parallelConsumer.close();
}
}
/**
* Subscribes to the configured input topic and calls (blocking) `poll` method.
*
* @param appProperties application and consumer properties
*/
public void runConsume(final Properties appProperties) {
String topic = appProperties.getProperty("input.topic.name");
LOGGER.info("Subscribing Parallel Consumer to consume from {} topic", topic);
parallelConsumer.subscribe(Collections.singletonList(topic));
LOGGER.info("Polling for records. This method blocks", topic);
parallelConsumer.poll(context -> recordHandler.processRecord(context.getSingleConsumerRecord()));
}
public static void main(String[] args) throws Exception {
if (args.length < 1) {
throw new IllegalArgumentException(
"This program takes one argument: the path to an environment configuration file.");
}
final Properties appProperties = PropertiesUtil.loadProperties(args[0]);
// random consumer group ID for rerun convenience
String groupId = "parallel-consumer-app-group-" + nextInt();
appProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
// construct parallel consumer
final Consumer<String, String> consumer = new KafkaConsumer<>(appProperties);
final ParallelConsumerOptions options = ParallelConsumerOptions.<String, String>builder()
.ordering(KEY)
.maxConcurrency(16)
.consumer(consumer)
.commitMode(PERIODIC_CONSUMER_SYNC)
.build();
ParallelStreamProcessor<String, String> eosStreamProcessor = createEosStreamProcessor(options);
// create record handler that writes records to configured file
final String filePath = appProperties.getProperty("file.path");
final ConsumerRecordHandler<String, String> recordHandler = new FileWritingRecordHandler(Paths.get(filePath));
// run the consumer!
final ParallelConsumerApplication consumerApplication = new ParallelConsumerApplication(eosStreamProcessor, recordHandler);
Runtime.getRuntime().addShutdownHook(new Thread(consumerApplication::shutdown));
consumerApplication.runConsume(appProperties);
}
}
Let’s go over some of the key parts of the ParallelConsumerApplication, starting with the constructor:
public ParallelConsumerApplication(final ParallelStreamProcessor<String, String> parallelConsumer,
final ConsumerRecordHandler<String, String> recordHandler) {
this.parallelConsumer = parallelConsumer;
this.recordHandler = recordHandler;
}
Here we supply instances of the Confluent Parallel Consumer’s ParallelStreamProcessor and the application’s ConsumerRecordHandler via constructor parameters. The abstract ConsumerRecordHandler class makes it simple to change ConsumerRecord handling without having to change much code.
In this tutorial you’ll inject the dependencies in the ParallelConsumerApplication.main() method, but in practice you may want to use a dependency injection framework, such as the Spring Framework.
Next, let’s review the ParallelConsumerApplication.runConsume() method, which provides the core functionality of this tutorial.
public void runConsume(final Properties appProperties) {
String topic = appProperties.getProperty("input.topic.name");
LOGGER.info("Subscribing Parallel Consumer to consume from {} topic", topic);
parallelConsumer.subscribe(Collections.singletonList(topic)); (1)
LOGGER.info("Polling for records. This method blocks", topic);
parallelConsumer.poll(context -> recordHandler.processRecord(context.getSingleConsumerRecord())); (2)
}
1 | Subscribing to the Kafka topic. |
2 | Simply poll once. With the Confluent Parallel Consumer, you call poll only once and it will poll indefinitely,
calling the lambda that you supply for each message. The library handles everything for you subject to how you configure
the Parallel Consumer. |
Speaking of configuration, this snippet instantiates the ParallelStreamProcessor that our application’s constructor expects:
final Consumer<String, String> consumer = new KafkaConsumer<>(appProperties); (1)
final ParallelConsumerOptions options = ParallelConsumerOptions.<String, String>builder() (2)
.ordering(KEY) (3)
.maxConcurrency(16) (4)
.consumer(consumer) (5)
.commitMode(PERIODIC_CONSUMER_SYNC) (6)
.build();
ParallelStreamProcessor<String, String> eosStreamProcessor = createEosStreamProcessor(options); (7)
1 | Create the Apache Kafka Consumer that the Confluent Parallel Consumer wraps. |
2 | Create the Parallel Consumer configuration via builder pattern. |
3 | Specify consumption ordering by key. |
4 | Specify the degree of parallelism. Here we specify 16 threads for illustrative purposes only (the application only consumes 3 records). |
5 | The Apache Kafka Consumer that we are wrapping. |
6 | Here we specify how to commit offsets. PERIODIC_CONSUMER_SYNC will block the Parallel Consumer’s processing loop until a successful commit response is received. Asynchronous is also supported, which optimizes for
consumption throughput (the downside being higher risk of needing to process duplicate messages in error recovery scenarios). |
7 | Create a ParallelStreamProcessor with the previously created configuration. This is the object we use to consume in lieu of a KafkaConsumer . |
Note that, by ordering by key, we can consume with a much higher degree of parallelism than we can with a vanilla consumer group (i.e., the number of topic partitions). While a given input topic may not have many partitions, it may have a large number of unique keys. Each of these key → message sets can actually be processed concurrently. In other words, regardless of the number of input partitions, the effective concurrency limit achievable with the Confluent Parallel Consumer is the number of unique keys across all messages in a topic.
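To make that relationship concrete, here is a minimal sketch (not part of the tutorial’s application) of how the ProcessingOrder setting bounds effective parallelism; the consumer variable is assumed to be a KafkaConsumer like the one constructed above:
// Sketch only: how ProcessingOrder bounds the Parallel Consumer's effective parallelism.
// PARTITION -> at most one record in flight per partition (same ceiling as a plain consumer group)
// KEY       -> at most one record in flight per key, so the ceiling is the number of distinct keys
// UNORDERED -> bounded only by maxConcurrency
final ParallelConsumerOptions<String, String> keyOrderedOptions =
    ParallelConsumerOptions.<String, String>builder()
        .ordering(ParallelConsumerOptions.ProcessingOrder.KEY)
        .maxConcurrency(100) // actual parallelism is min(100, number of keys currently available)
        .consumer(consumer)
        .build();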
To complete this tutorial, you’ll also need to create an abstract class that we will extend to process messages as we consume them. This abstract class, ConsumerRecordHandler, encapsulates tracking the number of records processed, which will be useful later on when we run performance tests and want to terminate the test application after consuming an expected number of records.
First, create the abstract class at src/main/java/io/confluent/developer/ConsumerRecordHandler.java:
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import java.util.concurrent.atomic.AtomicInteger;
/**
* Abstract class for processing a record. This allows for easier experimentation with the application and perf tests
* in this package. It handles tracking the number of events so that apps can wait on consuming an expected number
* of events.
*/
public abstract class ConsumerRecordHandler<K, V> {
private final AtomicInteger numRecordsProcessed;
public ConsumerRecordHandler() {
numRecordsProcessed = new AtomicInteger(0);
}
protected abstract void processRecordImpl(ConsumerRecord<K, V> consumerRecord);
final void processRecord(final ConsumerRecord<K, V> consumerRecord) {
processRecordImpl(consumerRecord);
numRecordsProcessed.incrementAndGet();
}
final int getNumRecordsProcessed() {
return numRecordsProcessed.get();
}
}
Using this abstract class will make it easier to change how you want to work with a ConsumerRecord without having to modify all of your existing code.
Next you’ll extend the ConsumerRecordHandler abstract class with a concrete class named FileWritingRecordHandler. Copy the following into file src/main/java/io/confluent/developer/FileWritingRecordHandler.java:
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.APPEND;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.WRITE;
import static java.util.Collections.singletonList;
/**
* Record handler that writes the Kafka message value to a file.
*/
public class FileWritingRecordHandler extends ConsumerRecordHandler<String, String> {
private final Path path;
public FileWritingRecordHandler(final Path path) {
this.path = path;
}
@Override
protected void processRecordImpl(final ConsumerRecord<String, String> consumerRecord) {
try {
Files.write(path, singletonList(consumerRecord.value()), CREATE, WRITE, APPEND);
} catch (IOException e) {
throw new RuntimeException("unable to write record to file", e);
}
}
}
Let’s take a peek under the hood at this class’s processRecordImpl method, which gets called for each record consumed:
@Override
protected void processRecordImpl(final ConsumerRecord<String, String> consumerRecord) {
try {
Files.write(path, singletonList(consumerRecord.value()), CREATE, WRITE, APPEND); (1)
} catch (IOException e) {
throw new RuntimeException("unable to write record to file", e);
}
}
1 | Simply write the record value to a file. |
In practice you’re certain to perform a more realistic task for each record.
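As an illustration of what a more realistic handler might look like, here is a hedged sketch of a handler that POSTs each record value to a REST endpoint using the JDK’s built-in HttpClient. The class name and endpoint are hypothetical and not part of the tutorial’s application:
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
/**
 * Hypothetical handler that POSTs each record value to a downstream REST endpoint.
 */
public class RestPostingRecordHandler extends ConsumerRecordHandler<String, String> {
  private final HttpClient httpClient = HttpClient.newHttpClient();
  private final URI endpoint;
  public RestPostingRecordHandler(final URI endpoint) {
    this.endpoint = endpoint;
  }
  @Override
  protected void processRecordImpl(final ConsumerRecord<String, String> consumerRecord) {
    final HttpRequest request = HttpRequest.newBuilder(endpoint)
        .POST(HttpRequest.BodyPublishers.ofString(consumerRecord.value()))
        .build();
    try {
      // block until the downstream service acknowledges the record
      httpClient.send(request, HttpResponse.BodyHandlers.discarding());
    } catch (Exception e) {
      throw new RuntimeException("unable to POST record to " + endpoint, e);
    }
  }
}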
Finally, create a utility class, PropertiesUtil, that we use in our consumer application to load Kafka Consumer and application-specific properties. We’ll also use this class in the two performance testing applications that we will create later in this tutorial.
Go ahead and create the src/main/java/io/confluent/developer/PropertiesUtil.java file:
package io.confluent.developer;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;
/**
* Utility class for loading Properties from a file.
*/
public class PropertiesUtil {
public static Properties loadProperties(String fileName) throws IOException {
try (FileInputStream input = new FileInputStream(fileName)) {
final Properties props = new Properties();
props.load(input);
return props;
}
}
}
In your terminal, run:
./gradlew shadowJar
Now that you have an uberjar for the ParallelConsumerApplication, you can launch it locally. When you run the following, the prompt won’t return, because the application will run until you exit it. There is always another message to process, so streaming applications don’t exit until you force them.
java -cp build/libs/confluent-parallel-consumer-application-standalone-0.0.1-all.jar io.confluent.developer.ParallelConsumerApplication configuration/dev.properties
Using a terminal window, run the following command to start a Confluent CLI producer:
confluent kafka topic produce parallel-consumer-input-topic --parse-key
Each line represents input data for the Confluent Parallel Consumer application. To send all of the events below, paste the following into the prompt and press enter:
fun-line:All streams lead to Kafka
event-promo:Go to Current
event-promo:Go to Kafka Summit
fun-line:Consume gently down the stream
Enter Ctrl-C to exit.
Your Confluent Parallel Consumer application should have consumed all the records sent and written them out to a file.
In a new terminal, run this command to print the results to the console:
cat topic-output.txt
You should see something like this:
All streams lead to Kafka
Go to Current
Go to Kafka Summit
Consume gently down the stream
Note that because we configured the Confluent Parallel Consumer to use KEY ordering, Go to Current appears before Go to Kafka Summit because these values have the same event-promo key. Similarly, All streams lead to Kafka appears before Consume gently down the stream because these values have the same fun-line key.
At this point you can stop the Confluent Parallel Consumer application with Ctrl-C in the terminal window where it’s running.
First, create a test file at configuration/test.properties:
input.topic.name=parallel-consumer-input-topic
Create a directory for the tests to live in:
mkdir -p src/test/java/io/confluent/developer
Testing a Confluent Parallel Consumer application is not too complicated thanks to the LongPollingMockConsumer that is based on Apache Kafka’s MockConsumer. Since the Confluent Parallel Consumer’s codebase is well tested, we don’t need to use a live consumer and Kafka broker to test our application. We can simply use a mock consumer to process some data you’ll feed into it.
There is only one method in ParallelConsumerApplicationTest annotated with @Test, and that is consumerTest(). This method actually runs your ParallelConsumerApplication with the mock consumer.
Now create the following file at src/test/java/io/confluent/developer/ParallelConsumerApplicationTest.java:
package io.confluent.developer;
import io.confluent.csid.utils.LongPollingMockConsumer;
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.MockConsumer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;
import org.awaitility.Awaitility;
import org.junit.Test;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import static io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder.KEY;
import static java.util.concurrent.TimeUnit.SECONDS;
import static org.junit.Assert.assertEquals;
/**
* Tests for the ParallelConsumerApplication.
*/
public class ParallelConsumerApplicationTest {
private static final String TEST_CONFIG_FILE = "configuration/test.properties";
/**
* Test the app end to end with a few records consumable via a mock consumer. The app
* just writes records to a file, so we validate that the expected (mocked) events wind
* up written to a file.
*
* @throws Exception if unable to read properties or create the temp output file
*/
@Test
public void consumerTest() throws Exception {
final Path tempFilePath = Files.createTempFile("test-consumer-output", ".out");
final ConsumerRecordHandler<String, String> recordHandler = new FileWritingRecordHandler(tempFilePath);
final Properties appProperties = PropertiesUtil.loadProperties(TEST_CONFIG_FILE);
final String topic = appProperties.getProperty("input.topic.name");
final LongPollingMockConsumer<String, String> mockConsumer = new LongPollingMockConsumer<>(OffsetResetStrategy.EARLIEST);
final ParallelStreamProcessor<String, String> eosStreamProcessor = setupParallelConsumer(mockConsumer, topic);
final ParallelConsumerApplication consumerApplication = new ParallelConsumerApplication(eosStreamProcessor, recordHandler);
Runtime.getRuntime().addShutdownHook(new Thread(consumerApplication::shutdown));
mockConsumer.subscribeWithRebalanceAndAssignment(Collections.singletonList(topic), 1);
mockConsumer.addRecord(new ConsumerRecord<>(topic, 0, 0, null, "foo"));
mockConsumer.addRecord(new ConsumerRecord<>(topic, 0, 1, null, "bar"));
mockConsumer.addRecord(new ConsumerRecord<>(topic, 0, 2, null, "baz"));
consumerApplication.runConsume(appProperties);
final Set<String> expectedRecords = new HashSet<>();
expectedRecords.add("foo");
expectedRecords.add("bar");
expectedRecords.add("baz");
final Set<String> actualRecords = new HashSet<>();
Awaitility.await()
.atMost(5, SECONDS)
.pollInterval(Duration.ofSeconds(1))
.until(() -> {
actualRecords.clear();
actualRecords.addAll(Files.readAllLines(tempFilePath));
return actualRecords.size() == 3;
});
assertEquals(expectedRecords, actualRecords);
}
private ParallelStreamProcessor<String, String> setupParallelConsumer(MockConsumer<String, String> mockConsumer, final String topic) {
ParallelConsumerOptions options = ParallelConsumerOptions.<String, String>builder()
.ordering(KEY)
.maxConcurrency(1000)
.consumer(mockConsumer)
.build();
ParallelStreamProcessor<String, String> parallelConsumer = ParallelStreamProcessor.createEosStreamProcessor(options);
parallelConsumer.subscribe(Collections.singletonList(topic));
return parallelConsumer;
}
}
Now let’s build a test for the ConsumerRecordHandler implementation used in your application. Even though we have a test for the ParallelConsumerApplication, it’s important that you can test this helper class in isolation.
Create the following file at src/test/java/io/confluent/developer/FileWritingRecordHandlerTest.java:
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.junit.Test;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;
public class FileWritingRecordHandlerTest {
@Test
public void testProcess() throws IOException {
final Path tempFilePath = Files.createTempFile("test-handler", ".out");
try {
final ConsumerRecordHandler<String, String> recordHandler = new FileWritingRecordHandler(tempFilePath);
ConsumerRecord record = new ConsumerRecord<>("test", 0, 0, null, "my record");
recordHandler.processRecord(record);
final List<String> expectedWords = Arrays.asList("my record");
List<String> actualRecords = Files.readAllLines(tempFilePath);
assertThat(actualRecords, equalTo(expectedWords));
} finally {
Files.deleteIfExists(tempFilePath);
}
}
}
Now run the test, which is as simple as:
./gradlew test
Use the following command to create a topic that we’ll use for performance testing:
confluent kafka topic create perftest-parallel-consumer-input-topic
Using a terminal window, run the following command to write 10,000 small dummy records to the input topic:
seq 1 10000 | confluent kafka topic produce perftest-parallel-consumer-input-topic
Let’s kick off this command and let it run. It’ll take a few minutes to produce all 10,000 records. In the meantime, let’s continue with the tutorial.
Then create two performance test configuration files. The first is for a multi-threaded KafkaConsumer-based performance test that we’ll use to set a baseline. Create this file at configuration/perftest-kafka-consumer.properties:
# KafkaConsumer properties
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
max.poll.interval.ms=300000
enable.auto.commit=false
auto.offset.reset=earliest
# increase max.poll.records from default of 500 to ensure we can download many records
max.poll.records=10000
# large fetch.min.bytes to optimize for throughput
fetch.min.bytes=100000
# Application-specific properties
input.topic=perftest-parallel-consumer-input-topic
records.to.consume=10000
record.handler.sleep.ms=20
Then create this file at configuration/perftest-parallel-consumer.properties:
# KafkaConsumer properties
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
max.poll.interval.ms=300000
enable.auto.commit=false
auto.offset.reset=earliest
# increase max.poll.records from default of 500 to ensure we can download many records
max.poll.records=10000
# large fetch.min.bytes to optimize for throughput
fetch.min.bytes=100000
# Confluent Parallel Consumer properties
parallel.consumer.max.concurrency=256
parallel.consumer.order=UNORDERED
parallel.consumer.commit.mode=PERIODIC_CONSUMER_ASYNCHRONOUS
parallel.consumer.seconds.between.commits=60
# Application-specific properties
input.topic=perftest-parallel-consumer-input-topic
records.to.consume=10000
record.handler.sleep.ms=20
Let’s look at some of the more important properties in these configuration files:
We specify fetch.min.bytes to be 100000 in order to optimize for consumer throughput.
The application-specific property records.to.consume is set to 10000 to match the number of records that we produced in the previous step. This will cause the application to terminate upon consuming this many records.
The application-specific property record.handler.sleep.ms is used to simulate a nontrivial amount of work to perform per record. In this case, we sleep for 20ms to simulate a low-but-nontrivial latency operation like a call to a database or REST API.
In the configuration file for the Confluent Parallel Consumer performance test, there are a few Confluent Parallel Consumer-specific properties. parallel.consumer.max.concurrency is set to 256, much higher than the number of partitions in our topic. We use UNORDERED ordering, PERIODIC_CONSUMER_ASYNCHRONOUS offset commit mode, and a high parallel.consumer.seconds.between.commits value of 60 seconds. Together, these values optimize for throughput and keep our test analogous to the KafkaConsumer-based baseline. You may have noticed that, because we are aiming to maximize throughput in these performance tests while ignoring the overhead of offsets handling, the baseline doesn’t even commit offsets!
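As a rough back-of-the-envelope check on what to expect: with a 20ms simulated workload, 10,000 records represent about 200 seconds of total processing time. Spread across one consumer per partition (assuming, for illustration, a topic with six partitions), the baseline can’t finish much faster than roughly 33 seconds plus polling overhead, whereas a concurrency of 256 puts the theoretical floor under one second. The six-partition figure is only an assumption for this arithmetic; your topic’s partition count may differ.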
Using the commands below, append the contents of configuration/ccloud.properties (with your Confluent Cloud configuration) to configuration/perftest-kafka-consumer.properties and configuration/perftest-parallel-consumer.properties:
cat configuration/ccloud.properties >> configuration/perftest-kafka-consumer.properties
cat configuration/ccloud.properties >> configuration/perftest-parallel-consumer.properties
Here you’ll build a performance test application and supporting classes that implement multi-threaded consuming (one KafkaConsumer per partition to maximize parallelism).
First, you’ll create the main performance test application, src/main/java/io/confluent/developer/MultithreadedKafkaConsumerPerfTest.java:
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.io.IOException;
import java.util.Properties;
import static org.apache.commons.lang3.RandomUtils.nextInt;
/**
* Consumption throughput test that runs a KafkaConsumer per thread, synchronously sleeping `record.handler.sleep.ms`
* per event. This simulates the performance characteristics of applications that do something of nontrivial latency
* per event, e.g., call out to a DB or REST API per event. It is the KafkaConsumer analogue to
* ParallelConsumerPerfTest, which is based on the Confluent Parallel Consumer.
*/
public class MultithreadedKafkaConsumerPerfTest {
public static void main(String[] args) {
if (args.length < 1) {
throw new IllegalArgumentException(
"This program takes one argument: the path to an environment configuration file.");
}
// load app and consumer specific properties from command line arg
final Properties appProperties;
try {
appProperties = PropertiesUtil.loadProperties(args[0]);
} catch (IOException e) {
throw new RuntimeException("Unable to load application properties", e);
}
// random group ID for rerun convenience
String groupId = "mt-kafka-consumer-perf-test-group-" + nextInt();
appProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
// create handler that sleeps configured number of ms per record
final int recordHandlerSleepMs = Integer.parseInt(appProperties.getProperty("record.handler.sleep.ms"));
final SleepingRecordHandler recordHandler = new SleepingRecordHandler(recordHandlerSleepMs);
// create and run MultithreadedKafkaConsumer, which runs an ExecutorService with a consumer thread per partition
// and terminates when the total expected number of records have been consumed across the threads
try (MultithreadedKafkaConsumer mtConsumer = new MultithreadedKafkaConsumer(appProperties, recordHandler)) {
mtConsumer.runConsume();
}
}
}
Second, create the class that implements multi-threaded consuming, src/main/java/io/confluent/developer/MultithreadedKafkaConsumer.java:
package io.confluent.developer;
import me.tongfei.progressbar.ProgressBar;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.Closeable;
import java.text.DecimalFormat;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
/**
* This class consumes a specified number of records from a Kafka topic with a consumer thread per explicitly
* assigned partition.
*/
public class MultithreadedKafkaConsumer implements Closeable {
private static final Logger LOGGER = LoggerFactory.getLogger(MultithreadedKafkaConsumer.class.getName());
private final Properties appProperties;
private final SleepingRecordHandler recordHandler;
private final AtomicInteger recordsConsumed; // total records consumed across all threads
private ExecutorService executor;
/**
* Multithreaded Kafka consumer that runs a KafkaConsumer per partition.
*
* @param appProperties application and consumer properties
* @param recordHandler records handler to run per record
*/
public MultithreadedKafkaConsumer(Properties appProperties, SleepingRecordHandler recordHandler) {
this.appProperties = appProperties;
this.recordHandler = recordHandler;
this.recordsConsumed = new AtomicInteger(0);
}
/**
* Main consumption method that instantiates consumers per partition, runs the consumers in a thread pool, and outputs
* progress to the user as it consumes.
*/
public void runConsume() {
int recordsToConsume = Integer.parseInt(appProperties.getProperty("records.to.consume"));
// instantiate all consumers before marking start time to keep init time out of the reported duration
List<KafkaConsumer> consumers = getConsumersPerPartition(appProperties);
// kick off consumers to process all records, updating a progress bar as we go
ProgressBar progressBar = new ProgressBar("Progress", recordsToConsume);
// create thread pool and shutdown hook to clean up in case of exception
executor = Executors.newFixedThreadPool(consumers.size());
Thread shutdownExecutor = new Thread(() -> this.close());
Runtime.getRuntime().addShutdownHook(shutdownExecutor);
// mark the consumption start time and kick off all consumer threads
long startTimeMs = System.currentTimeMillis();
List<Future> futures = new ArrayList<>();
for (KafkaConsumer consumer: consumers) {
futures.add(executor.submit(new ConsumerTask(consumer, recordsToConsume, recordsConsumed, progressBar)));
}
// wait for all tasks to complete, which happens when all records have been consumed
LOGGER.info("Waiting for consumers to consume all {} records", recordsToConsume);
for (Future future : futures) {
try {
future.get();
} catch (InterruptedException|ExecutionException e) {
throw new RuntimeException(e);
}
}
// done! report total time to consume all records
double durationSeconds = (System.currentTimeMillis() - startTimeMs) / 1_000.0;
DecimalFormat df = new DecimalFormat("0.00");
LOGGER.info("Total time to consume {} records: {} seconds", recordsToConsume, df.format(durationSeconds));
}
/**
* Create KafkaConsumer instances per partition, and explicitly assign a partition to each.
*
* @param appProperties consumer and application properties
* @return list of consumers
*/
private List<KafkaConsumer> getConsumersPerPartition(Properties appProperties) {
String topic = appProperties.getProperty("input.topic");
// use temp consumer to inspect the number of partitions
int numPartitions;
try (KafkaConsumer tempConsumer = new KafkaConsumer<>(appProperties)) {
numPartitions = tempConsumer.partitionsFor(topic).size();
LOGGER.info("{} partitions detected for {} topic", numPartitions, topic);
}
List<KafkaConsumer> consumers = new ArrayList<>();
for (int i = 0; i < numPartitions; i++) {
KafkaConsumer consumer = new KafkaConsumer<String, String>(appProperties);
TopicPartition tp = new TopicPartition(topic, i);
consumer.assign(Collections.singletonList(tp));
consumers.add(consumer);
}
return consumers;
}
/**
* Close the multithreaded consumer, which entails shutting down the executor thread pool.
*/
public void close() {
if (executor != null) {
executor.shutdown();
}
}
/**
* Inner class that runs each consumer.
*/
class ConsumerTask implements Runnable {
private final KafkaConsumer consumer;
private final int recordsToConsume; // records expected to be consumed across all threads
private final AtomicInteger recordsConsumed; // records consumed across all threads
private final ProgressBar progressBar;
/**
* Runnable task to consume a partition.
*
* @param consumer the consumer already assigned a specific partition
* @param recordsToConsume total number of records to consume across all threads
* @param recordsConsumed running tally of records consumed across all threads
* @param progressBar progress bar that we update to recordsConsumed / recordsToConsume during consumption
*/
public ConsumerTask(KafkaConsumer consumer, int recordsToConsume, AtomicInteger recordsConsumed, ProgressBar progressBar) {
this.consumer = consumer;
this.recordsToConsume = recordsToConsume;
this.recordsConsumed = recordsConsumed;
this.progressBar = progressBar;
}
/**
* Each task polls until the total number of records consumed across all threads is what we expect. Simply calls
* the record handler for each record.
*/
@Override
public void run() {
int numConsumed;
do {
// Use a poll timeout high enough to not saturate CPU, but fine enough to get interesting comparison numbers.
// Since the perf tests typically take many seconds to run, use 0.5 second poll timeout to strike this balance.
ConsumerRecords<String, String> records = consumer.poll(Duration.of(500, ChronoUnit.MILLIS));
for (ConsumerRecord record : records) {
recordHandler.processRecord(record);
numConsumed = recordsConsumed.incrementAndGet();
progressBar.stepTo(numConsumed);
}
} while (recordsConsumed.get() < recordsToConsume); // tasks block until all records consumed
}
}
}
Finally, create the record handler that sleeps 20ms per record consumed, src/main/java/io/confluent/developer/SleepingRecordHandler.java:
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
/**
* Record handler that sleeps a given number of milliseconds.
*/
public class SleepingRecordHandler extends ConsumerRecordHandler<String, String> {
private final int numMilliseconds;
public SleepingRecordHandler(final int numMilliseconds) {
this.numMilliseconds = numMilliseconds;
}
@Override
protected void processRecordImpl(final ConsumerRecord consumerRecord) {
try {
Thread.sleep(numMilliseconds);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
Let’s rebuild the uberjar to include this performance test. In your terminal, run:
./gradlew shadowJar
Now that you have an uberjar containing MultithreadedKafkaConsumerPerfTest, you can launch it locally. This will run until the expected 10,000 records have been consumed. Ensure that the seq command that you ran previously to produce 10,000 records has completed before running this, so that we can accurately test consumption throughput.
java -cp build/libs/confluent-parallel-consumer-application-standalone-0.0.1-all.jar io.confluent.developer.MultithreadedKafkaConsumerPerfTest configuration/perftest-kafka-consumer.properties
While the performance test runs, take a few sips of the beverage that you previously poured. It will take a minute or two to complete, and the final line output will show you the latency for consuming all 10,000 records, e.g.:
[main] INFO io.confluent.developer.MultithreadedKafkaConsumer - Total time to consume 10000 records: 40.46 seconds
Before we build and run a Confluent Parallel Consumer analogue to this KafkaConsumer baseline, let’s summarize what we’ve seen so far:
We populated a topic with default properties and produced 10,000 small records to it
We maxed out the size of our consumer group by running a KafkaConsumer per partition, with each instance explicitly assigned to one partition
We optimized each KafkaConsumer for throughput by setting high values for max.poll.records and fetch.min.bytes
We struck a balance between latency accuracy and the instrumentation overhead needed to track progress and end when expected by using a 0.5 second poll timeout. (We want to report consumption latency shortly after consumption finishes, but we also want to minimize busy waiting of the KafkaConsumer instances that finish first.)
We scratched our heads writing some tricky multi-threaded code. By the way, is any multi-threaded code not tricky?
The reported performance test latency was 40.46 seconds in our case (your number is surely different).
Here you’ll build a performance test application based on the Confluent Parallel Consumer. This test reuses a couple of classes that we created previously: PropertiesUtil for loading consumer and application-specific properties, and SleepingRecordHandler for simulating a nontrivial workload per record, just as we did in MultithreadedKafkaConsumerPerfTest. Please rewind and create these if you skipped the parts of the tutorial that create these two classes.
Because the Confluent Parallel Consumer API is much lighter weight than the lift required to multi-thread KafkaConsumer instances per partition, let’s knock out the entire thing in one class. Create the file src/main/java/io/confluent/developer/ParallelConsumerPerfTest.java:
package io.confluent.developer;
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelEoSStreamProcessor;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import me.tongfei.progressbar.ProgressBar;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.Collections;
import java.util.Properties;
import static java.time.Duration.ofSeconds;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static org.apache.commons.lang3.RandomUtils.nextInt;
import static org.awaitility.Awaitility.await;
/**
* Consumption throughput test that runs the Confluent Parallel Consumer, synchronously sleeping
* `record.handler.sleep.ms` per event. This simulates the performance characteristics of applications that do
* something of nontrivial latency per event, e.g., call out to a DB or REST API per event. It is the Confluent
* Parallel Consumer analogue to MultithreadedKafkaConsumerPerfTest, which is based on KafkaConsumer.
*/
public class ParallelConsumerPerfTest {
private static final Logger LOGGER = LoggerFactory.getLogger(ParallelConsumerPerfTest.class.getName());
private final ParallelStreamProcessor<String, String> parallelConsumer;
private final SleepingRecordHandler recordHandler;
/**
* Throughput performance test for the Confluent Parallel Consumer.
*
* @param parallelConsumer the parallel consumer
* @param recordHandler a handler to call per record
*/
public ParallelConsumerPerfTest(final ParallelStreamProcessor<String, String> parallelConsumer,
final SleepingRecordHandler recordHandler) {
this.parallelConsumer = parallelConsumer;
this.recordHandler = recordHandler;
}
/**
* Close the parallel consumer on application shutdown
*/
public void shutdown() {
LOGGER.info("shutting down");
if (parallelConsumer != null) {
parallelConsumer.close();
}
}
/**
* Subscribes to the configured input topic and calls (blocking) `poll` method.
*
* @param appProperties application and consumer properties
*/
private void runConsume(final Properties appProperties) {
parallelConsumer.subscribe(Collections.singletonList(appProperties.getProperty("input.topic")));
parallelConsumer.poll(context -> {
recordHandler.processRecord(context.getSingleConsumerRecord());
});
}
public static void main(String[] args) {
if (args.length < 1) {
throw new IllegalArgumentException(
"This program takes one argument: the path to an environment configuration file.");
}
// load app and consumer specific properties from command line arg
final Properties appProperties;
try {
appProperties = PropertiesUtil.loadProperties(args[0]);
} catch (IOException e) {
throw new RuntimeException("Unable to load application properties", e);
}
// random consumer group ID for rerun convenience
String groupId = "parallel-consumer-perf-test-group-" + nextInt();
appProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
// create parallel consumer configured based on properties from app configuration
final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(appProperties);
final int maxConcurrency = Integer.parseInt(appProperties.getProperty("parallel.consumer.max.concurrency"));
final ParallelConsumerOptions.ProcessingOrder processingOrder = ParallelConsumerOptions.ProcessingOrder.
valueOf(appProperties.getProperty("parallel.consumer.order"));
final ParallelConsumerOptions.CommitMode commitMode = ParallelConsumerOptions.CommitMode.
valueOf(appProperties.getProperty("parallel.consumer.commit.mode"));
final int secondsBetweenCommits = Integer.parseInt(appProperties.getProperty("parallel.consumer.seconds.between.commits"));
final ParallelEoSStreamProcessor parallelConsumer = new ParallelEoSStreamProcessor<>(ParallelConsumerOptions.<String, String>builder()
.ordering(processingOrder)
.maxConcurrency(maxConcurrency)
.consumer(consumer)
.commitMode(commitMode)
.build());
parallelConsumer.setTimeBetweenCommits(ofSeconds(secondsBetweenCommits));
// create handler that sleeps configured number of ms per record
final int recordHandlerSleepMs = Integer.parseInt(appProperties.getProperty("record.handler.sleep.ms"));
final SleepingRecordHandler recordHandler = new SleepingRecordHandler(recordHandlerSleepMs);
// kick off the consumer, marking the start time so that we can report total runtime to consume all records
final ParallelConsumerPerfTest consumerApplication = new ParallelConsumerPerfTest(parallelConsumer, recordHandler);
Runtime.getRuntime().addShutdownHook(new Thread(consumerApplication::shutdown));
long startTimeMs = System.currentTimeMillis();
consumerApplication.runConsume(appProperties);
// wait for the consumer to process all records, updating the progress bar as we go
int recordsToConsume = Integer.parseInt(appProperties.getProperty("records.to.consume"));
ProgressBar progressBar = new ProgressBar("Progress", recordsToConsume);
await()
.forever()
.pollInterval(250, MILLISECONDS)
.until(() -> {
int numRecordsProcessed = recordHandler.getNumRecordsProcessed();
progressBar.stepTo(numRecordsProcessed);
return numRecordsProcessed >= recordsToConsume;
});
// done! report total time to consume all records
progressBar.close();
double durationSeconds = (System.currentTimeMillis() - startTimeMs) / 1_000.0;
DecimalFormat df = new DecimalFormat("0.00");
LOGGER.info("Time to consume {} records: {} seconds", recordsToConsume, df.format(durationSeconds));
consumerApplication.shutdown();
}
}
Take a look at the code and note the simplicity. Most of the code is for properties file handling and tracking progress. The interesting part relevant to the Confluent Parallel Consumer is in the four-line runConsume() method:
private void runConsume(final Properties appProperties) {
parallelConsumer.subscribe(Collections.singletonList(appProperties.getProperty("input.topic")));
parallelConsumer.poll(context -> {
recordHandler.processRecord(context.getSingleConsumerRecord());
});
}
Bellissimo!
Let’s rebuild the uberjar to include this performance test. In your terminal, run:
./gradlew shadowJar
Now that you have an uberjar containing ParallelConsumerPerfTest, you can launch it locally. This will run until the expected 10,000 records have been consumed. Ensure that the seq command that you ran previously to produce 10,000 records has completed before running this, so that we can accurately test consumption throughput.
As you kick this off, bear in mind the latency that you recorded when you ran MultithreadedKafkaConsumerPerfTest (40.46 seconds in the run performed for this tutorial).
java -cp build/libs/confluent-parallel-consumer-application-standalone-0.0.1-all.jar io.confluent.developer.ParallelConsumerPerfTest configuration/perftest-parallel-consumer.properties
While the performance test runs, take a few sips of the beverage… actually never mind. It’s done:
[main] INFO io.confluent.developer.ParallelConsumerPerfTest - Time to consume 10000 records: 1.78 seconds
Your latency will surely be different from the 1.78 seconds shown here. But, assuming you are running the test on reasonable hardware and you aren’t running any extremely noisy neighbors on your machine, it should be just a few seconds.
In this section of the tutorial, we created a performance test for the Confluent Parallel Consumer, and a KafkaConsumer baseline to compare it against. This gave us a couple of data points, but only for one specific test context: each test aimed to consume records as quickly as possible in a single JVM while simulating a 20ms workload per record.
We can turn a few knobs and pull some levers to gather more performance test results in other application contexts. Since we used helper classes and parameterized configuration in this tutorial, you can easily choose other performance test adventures. Some questions you might explore:
How does performance compare if we increase or decrease the simulated workload time?
What if we commit offsets more frequently or even synchronously or transactionally in each test?
In the case of the Confluent Parallel Consumer, this entails setting parallel.consumer.seconds.between.commits to a value lower than 60 seconds, and using a parallel.consumer.commit.mode of PERIODIC_CONSUMER_SYNC or PERIODIC_TRANSACTIONAL_PRODUCER, as sketched in the example after this list. These commit modes better simulate an application designed to more easily pick up where it left off when recovering from an error.
What if we change the properties of the KafkaConsumer instance(s) most relevant to throughput (fetch.min.bytes and max.poll.records)?
What if we use KEY or PARTITION ordering when configuring the Confluent Parallel Consumer (as opposed to UNORDERED)?
How does the throughput comparison change if we create perftest-parallel-consumer-input-topic with more (or fewer) partitions?
What if we use larger, more realistic records and not just integers from 1 to 10,000? What if we also play with different key spaces?
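For example, to explore the commit-frequency question above with the existing ParallelConsumerPerfTest, you could copy configuration/perftest-parallel-consumer.properties and change only the Parallel Consumer-specific lines, perhaps to something like this (the values shown are illustrative, not a recommendation):
parallel.consumer.max.concurrency=256
parallel.consumer.order=UNORDERED
parallel.consumer.commit.mode=PERIODIC_CONSUMER_SYNC
parallel.consumer.seconds.between.commits=5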
Have fun with it!
You may try another tutorial, but if you don’t plan on doing other tutorials, use the Confluent Cloud Console or CLI to destroy all of the resources you created. Verify they are destroyed to avoid unexpected charges.