
Kubernetes performance against rivals #236

@TheOnlyArtz


Hey Platformatic! First of all, I wanted to say thank you for this awesome project; it's always fun to see competition in the scene.

We rely heavily on Nest.js's Kafka integration at my workplace and are looking to migrate away from it. I'm running benchmarks and stress tests to see what suits us best in a high-throughput environment that requires low latency, and this library sounds like a good fit.

I'm benchmarking this library against Confluent's, and the numbers I'm seeing don't match the benchmarks in this repository. Maybe I'm doing something wrong?

I'm stress testing the environment by sending 8000 1 KB messages from my computer as fast as I can, in batches of 40:

```js
const TOTAL_MESSAGES = 8000
const BATCH_SIZE = 40
const IN_FLIGHT = 100
const LOG_EVERY = 1000
const ACKS = 1
const COMPRESSION = 'none'
```

(The stress test itself uses kafka.js, though that's irrelevant for this issue.)
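For reference, the sending side looks roughly like this (a minimal sketch of the batching logic only; the actual producer call is a kafka.js stand-in, and the helper name is mine, not from the real test):

```typescript
const TOTAL_MESSAGES = 8000
const BATCH_SIZE = 40

// Split message indices into batches of BATCH_SIZE; the real test hands
// each batch to the kafka.js producer (stand-in comment below).
function makeBatches (total: number, size: number): number[][] {
  const batches: number[][] = []
  for (let i = 0; i < total; i += size) {
    const batch: number[] = []
    for (let j = i; j < Math.min(i + size, total); j++) {
      batch.push(j)
    }
    batches.push(batch)
  }
  return batches
}

async function run (): Promise<void> {
  const batches = makeBatches(TOTAL_MESSAGES, BATCH_SIZE)
  for (const batch of batches) {
    // In the real test: await producer.send({ topic, messages: batch.map(...) })
    await Promise.resolve(batch)
  }
  console.log(batches.length) // 200 batches of 40 messages
}

run()
```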

Test environment

  • AWS Kafka instance
  • Stress test runs from my computer against AWS (nothing local)
  • 2 replicas of the microservice in a Kubernetes environment (same Kafka group id to share the load)
  • Pods are capped at 512 MB of memory
  • Node 24 Alpine Docker image

platformatic/kafka

While it processed the messages fast enough and kept up, after only 3 minutes of running the pods' memory usage grew too high and they were eventually killed by the orchestrator:

[screenshot: pod memory usage climbing until the OOM kill]

This is how I've configured the Platformatic consumer and producer:

```ts
const KAFKA_REQUEST_TIMEOUT_MS = 60000
const KAFKA_CONNECTION_TIMEOUT_MS = 10000
const KAFKA_RETRY_INITIAL_TIME_MS = 300
const KAFKA_RETRY_COUNT = 8
const KAFKA_MAX_IN_FLIGHTS = 5

const options: KafkaClientOptions = {
  bootstrapBrokers: brokers,
  groupId: `${kafkaConfig.groupId}-group`,
  tls,
  requestTimeout: KAFKA_REQUEST_TIMEOUT_MS,
  connectTimeout: KAFKA_CONNECTION_TIMEOUT_MS,
  retries: KAFKA_RETRY_COUNT,
  retryDelay: KAFKA_RETRY_INITIAL_TIME_MS,
  maxInflights: KAFKA_MAX_IN_FLIGHTS,
  producerClientId: `${kafkaConfig.groupId}-producer`,
  consumerClientId: `${kafkaConfig.groupId}-consumer`,
  adminClientId: `${kafkaConfig.groupId}-admin`,
}
```

```ts
// producer factory
new Producer<string, string, string, string>({
  ...toKafkaBaseOptions(options, options.producerClientId),
  serializers: stringSerializers,
  idempotent: false,
})

// consumer factory
return new Consumer({
  ...toKafkaBaseOptions(options, options.consumerClientId),
  autocommit: 5000,
  groupId: options.groupId,
})

// stream
this.stream = await this.consumer.consume({
  topics,
  mode: MessagesStreamModes.LATEST,
  fallbackMode: MessagesStreamModes.LATEST,
})

// consuming messages with RxJS-managed concurrency
private async consumeMessages(
  stream: MessagesStream<Buffer, Buffer, Buffer, Buffer>,
): Promise<void> {
  const processingStream$ = from(stream).pipe(
    takeWhile(() => !this.shuttingDown),
    mergeMap((message) => this.processMessage(message), 50),
  );

  await lastValueFrom(processingStream$, { defaultValue: undefined });
}
```
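For comparison, the same consumption loop can be written without RxJS using `for await…of`, which pulls one message at a time and so applies natural backpressure (sequential rather than the concurrency of 50 above). This is a minimal sketch: the async generator stands in for the real Kafka stream, and `processMessage` is a hypothetical handler, not the library's API:

```typescript
// Stand-in for the Kafka MessagesStream: yields a few fake messages.
async function * fakeStream (): AsyncGenerator<string> {
  for (let i = 0; i < 5; i++) {
    yield `message-${i}`
  }
}

// Hypothetical per-message handler.
async function processMessage (message: string): Promise<string> {
  return message.toUpperCase()
}

// Pull-based consumption: the next message is only read after the
// previous one has been fully processed.
async function consumeMessages (stream: AsyncIterable<string>): Promise<number> {
  let processed = 0
  for await (const message of stream) {
    await processMessage(message)
    processed++
  }
  return processed
}

consumeMessages(fakeStream()).then((n) => console.log(n)) // logs 5
```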

confluent/kafka-javascript (librdkafka-based)

The experience with Confluent felt much smoother; I could run the stress test forever. These were the results after 10 minutes:

[screenshot: Confluent client results after 10 minutes]

The Platformatic client is configured the same way as the Confluent client.
I'm really leaning towards Platformatic, but the benchmarks don't add up, and I would love your input on that; maybe I'm just using it wrong.
