perf: Avoid creating processing buffers beyond what is needed.#19426

Merged
gianm merged 4 commits into apache:master from gianm:msq-pbp-tcount
May 12, 2026

@gianm gianm commented May 7, 2026

In Dart, processing buffers are sliced up from the merge buffer. For stages that do not use all processing threads -- perhaps because they do not have enough inputs -- we can be more efficient with memory by slicing the merge buffer based on the actual number of processors, not the number of processing threads.

This patch does so by deferring the choice of how many buffers are needed until the stage actually starts executing. At that point, the stage knows how many processors it will create.
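To make the memory effect concrete, here is a minimal sketch of the arithmetic. It is not the patch's code: `SliceCountSketch`, `slicesNeeded`, and `sliceSize` are hypothetical names, and capping the slice count at the thread count is an assumption about how the number of processors and the number of threads interact.

```java
/** Hypothetical sketch; class and method names are not taken from Druid. */
final class SliceCountSketch
{
  /**
   * Number of slices a stage needs. Assumption: one slice per processor,
   * capped at the thread count because at most that many run at once.
   */
  static int slicesNeeded(final int processorCount, final int threadCount)
  {
    if (processorCount < 1 || threadCount < 1) {
      throw new IllegalArgumentException("processorCount and threadCount must be positive");
    }
    return Math.min(processorCount, threadCount);
  }

  /** Size of each slice when the merge buffer is split evenly. */
  static int sliceSize(final int mergeBufferCapacity, final int slices)
  {
    return mergeBufferCapacity / slices;
  }

  public static void main(final String[] args)
  {
    final int mergeBufferCapacity = 1 << 20; // 1 MiB, an arbitrary example value
    final int threadCount = 8;
    final int processorCount = 2; // known only once the stage starts executing

    // Slicing up front by thread count.
    System.out.println(sliceSize(mergeBufferCapacity, threadCount));
    // Slicing at stage start by the actual processor count.
    System.out.println(sliceSize(mergeBufferCapacity, slicesNeeded(processorCount, threadCount)));
  }
}
```

With these example values, slicing by thread count gives each processor 128 KiB, while slicing by the actual processor count gives each of the two processors 512 KiB from the same merge buffer.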

@github-actions github-actions Bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels May 7, 2026

@FrankChen021 FrankChen021 left a comment

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.


This is an automated review by Codex GPT-5

);
}

final int sliceSize = chunk.capacity() / requestedSlices;
Contributor

Should this have a defensive check that `requestedSlices > 0`, or do we trust upstream callers to always request at least one slice?

Contributor Author

Added a defensive check.
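
For readers without the diff at hand, a guard of this kind might look like the sketch below. It is an illustration only: `BufferSlicerSketch` and `slice` are hypothetical names, and the use of `IllegalArgumentException` is an assumption, since the exception type and message in the merged change are not shown in this conversation. Without a guard, a zero slice count would surface as an `ArithmeticException` from the integer division.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the code from the merged commit. */
final class BufferSlicerSketch
{
  /** Splits one buffer into equal-sized, non-overlapping views. */
  static List<ByteBuffer> slice(final ByteBuffer chunk, final int requestedSlices)
  {
    // Defensive check: fail with a clear message before dividing.
    if (requestedSlices <= 0) {
      throw new IllegalArgumentException("requestedSlices must be positive, got " + requestedSlices);
    }

    final int sliceSize = chunk.capacity() / requestedSlices;
    final List<ByteBuffer> slices = new ArrayList<>(requestedSlices);
    for (int i = 0; i < requestedSlices; i++) {
      // Work on a duplicate so the caller's position and limit are untouched.
      final ByteBuffer view = chunk.duplicate();
      view.clear();
      view.limit((i + 1) * sliceSize);
      view.position(i * sliceSize);
      slices.add(view.slice());
    }
    return slices;
  }
}
```

Each returned view shares memory with `chunk` but has its own position and limit, so processors writing to different slices do not overlap.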

@gianm gianm merged commit 459c62a into apache:master May 12, 2026
63 of 64 checks passed
@gianm gianm deleted the msq-pbp-tcount branch May 12, 2026 19:33
@github-actions github-actions Bot added this to the 38.0.0 milestone May 12, 2026