Parallelism in Kafka Consumer Projects

The OpenLegacy Kafka Consumer project supports parallel consumption of messages from a Kafka topic, enabling higher throughput and efficient processing.

To achieve true parallelism, proper topic and consumer configuration is required.

Topic Partitioning

Kafka topics are divided into partitions. Each partition can be consumed independently, allowing for parallel processing. By default, a Kafka topic has only one partition. For parallelism, you must increase the number of partitions in your topic.

Please read more about it here

Consumer Groups

All consumers who intend to process messages in parallel must belong to the same consumer group. Within a consumer group, Kafka assigns each partition to one consumer, ensuring that each message is processed by only one member of the group.

Partition-to-Consumer Ratio

Parallelism is achieved when the number of consumers is equal to or less than the number of partitions: Number of consumers ≤ Number of partitions

If you have more consumers than partitions, some consumers will remain idle.

If you have fewer consumers than partitions, each consumer may process multiple partitions.

For example, with 5 partitions and 3 consumers in the same group, 3 consumers will each process at least one partition, and the remaining partitions will be distributed among them.

You can configure the number of consumers using the property ol.kafka.consumers.NAME.numberOfInstances in the application.yml file - by default, it has the value 1.

Offset Commit Strategies

Kafka consumers track which messages have been processed using "offsets." How these offsets are committed can affect processing guarantees and latency. Offset management configuration

Auto-commit (default)

  • The consumer configuration property enable.auto.commit is set to true by default for the COMMIT and DEAD_LETTER_QUEUE values in the errorHandlingStrategy property of the Kafka Consumer (io.openlegacy.consumer.kafka.properties.KafkaConsumerProperties).
  • With auto-commit enabled, the consumer automatically commits offsets at the interval specified by auto.commit.interval.ms (default: 5000 ms = 5 seconds).
  • This means a message may be processed, but its offset won't be committed until the next interval passes.
  • You can tune the interval like this:
    ol:
      kafka:
        client:
          additional-properties:
            "auto.commit.interval.ms": 3000

Immediate Commit after Processing

  • The OpenLegacy Kafka Consumer project provides an option to commit offsets immediately after processing each message.
  • You can set it in the application.yml file:
    ol:
      kafka:
        client:
          additional-properties:
            "ol.force.commit": true
  • When this option is enabled, enable.auto.commit is automatically set to false, and the Kafka consumer commits the offset as soon as the message is processed.
  • This disables the interval-based auto-commit and ensures that you do not need to wait for the next auto-commit interval before the offset is saved.
  • This improves the accuracy of offset tracking and can reduce the risk of duplicate processing after restarts.
  • Upon consumer restart or rebalance, Kafka will assign partitions based on the last committed offset, and the consumer group will not reprocess already-completed messages, speeding up recovery.

Example Configuration

  • If you want 5 consumers processing in parallel, create your topic with at least 5 partitions, and have all 5 consumers join the same group.
  • To commit offsets immediately after processing, configure the consumer to commit immediately after processing each message.
ol:
  kafka:
    client:
      bootstrap-servers: http://kafka:9092
      group-id: kafka-demo
      additional-properties:
        "ol.force.commit": true
    consumers:
      loan:
        topics: [screens-in5]
        produce-topic: screens-out
        flow: loan-flow
        numberOfInstances: 5