Parallelism in Kafka Consumer Projects
The OpenLegacy Kafka Consumer project supports parallel consumption of messages from a Kafka topic, enabling higher throughput and efficient processing.
To achieve true parallelism, proper topic and consumer configuration is required.
Topic Partitioning
Kafka topics are divided into partitions. Each partition can be consumed independently, allowing for parallel processing. By default, a Kafka topic has only one partition. For parallelism, you must increase the number of partitions in your topic.
Please read more about it here
Consumer Groups
All consumers who intend to process messages in parallel must belong to the same consumer group. Within a consumer group, Kafka assigns each partition to one consumer, ensuring that each message is processed by only one member of the group.
Partition-to-Consumer Ratio
Parallelism is achieved when the number of consumers is equal to or less than the number of partitions: Number of consumers ≤ Number of partitions
If you have more consumers than partitions, some consumers will remain idle.
If you have fewer consumers than partitions, each consumer may process multiple partitions.
For example, with 5 partitions and 3 consumers in the same group, 3 consumers will each process at least one partition, and the remaining partitions will be distributed among them.
You can configure the number of consumers using the property ol.kafka.consumers.NAME.numberOfInstances in the application.yml file - by default, it has the value 1.
Offset Commit Strategies
Kafka consumers track which messages have been processed using "offsets." How these offsets are committed can affect processing guarantees and latency. Offset management configuration
Auto-commit (default)
- The consumer configuration property
enable.auto.commitis set totrueby default for theCOMMITandDEAD_LETTER_QUEUEvalues in theerrorHandlingStrategyproperty of the Kafka Consumer (io.openlegacy.consumer.kafka.properties.KafkaConsumerProperties). - With auto-commit enabled, the consumer automatically commits offsets at the interval specified by
auto.commit.interval.ms(default: 5000 ms = 5 seconds). - This means a message may be processed, but its offset won't be committed until the next interval passes.
- You can tune the interval like this:
ol: kafka: client: additional-properties: "auto.commit.interval.ms": 3000
Immediate Commit after Processing
- The OpenLegacy Kafka Consumer project provides an option to commit offsets immediately after processing each message.
- You can set it in the
application.ymlfile:ol: kafka: client: additional-properties: "ol.force.commit": true - When this option is enabled,
enable.auto.commitis automatically set to false, and the Kafka consumer commits the offset as soon as the message is processed. - This disables the interval-based auto-commit and ensures that you do not need to wait for the next auto-commit interval before the offset is saved.
- This improves the accuracy of offset tracking and can reduce the risk of duplicate processing after restarts.
- Upon consumer restart or rebalance, Kafka will assign partitions based on the last committed offset, and the consumer group will not reprocess already-completed messages, speeding up recovery.
Example Configuration
- If you want 5 consumers processing in parallel, create your topic with at least 5 partitions, and have all 5 consumers join the same group.
- To commit offsets immediately after processing, configure the consumer to commit immediately after processing each message.
ol:
kafka:
client:
bootstrap-servers: http://kafka:9092
group-id: kafka-demo
additional-properties:
"ol.force.commit": true
consumers:
loan:
topics: [screens-in5]
produce-topic: screens-out
flow: loan-flow
numberOfInstances: 5
Updated about 8 hours ago
