Skip to content

[ML] Include inference process's RSS memory stat in /_ml/trained_models/_stats output#142312

Merged
darius-vil merged 7 commits intoelastic:mainfrom
darius-vil:darius-vil/pytorch-inference-stats
Feb 25, 2026
Merged

[ML] Include inference process's RSS memory stat in /_ml/trained_models/_stats output#142312
darius-vil merged 7 commits intoelastic:mainfrom
darius-vil:darius-vil/pytorch-inference-stats

Conversation

@darius-vil
Copy link
Copy Markdown
Contributor

@darius-vil darius-vil commented Feb 11, 2026

Related change in ml-cpp: elastic/ml-cpp#2896

I want to add a new section to inference request's response:

"process_stats": {
    "memory_rss": 1234
}

Looks like ml-cpp runs REST tests from the elastic repo, which promptly fail on a new field coming from ml-cpp.
In this case, I think the right order of things is to:

  1. Do the change in elasticsearch (this PR)
  2. Do the change in ml-cpp (draft here: https://github.com/elastic/ml-cpp/pull/2896/changes)
  3. Add an extra integration test in elasticsearch to check if that memory stat is populated: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/ml/qa/native-multi-node-tests/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/PyTorchModelIT.java

@darius-vil darius-vil changed the title Include inference process's RSS memory stat in /_ml/trained_models/_stats output [ML] Include inference process's RSS memory stat in /_ml/trained_models/_stats output Feb 11, 2026
@darius-vil darius-vil added :ml Machine learning >enhancement labels Feb 11, 2026
@elasticsearchmachine elasticsearchmachine added v9.4.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Feb 11, 2026
@darius-vil darius-vil requested a review from valeriy42 February 11, 2026 14:53

@Override
protected Response mutateInstanceForVersion(Response instance, TransportVersion version) {
if (version.supports(MEMORY_STAT_TRANSPORT_VERSION) == false) {
Copy link
Copy Markdown
Contributor Author

@darius-vil darius-vil Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case anyone wonders why is this necesssary - without this BWC tests fail and this appears to be necessary

See this file's git history:

@darius-vil darius-vil force-pushed the darius-vil/pytorch-inference-stats branch 2 times, most recently from d632e63 to d300753 Compare February 12, 2026 15:07
@darius-vil darius-vil force-pushed the darius-vil/pytorch-inference-stats branch from d300753 to b45db20 Compare February 23, 2026 13:48
assert this.lastAccess != null || (inferenceCount == null || inferenceCount == 0);
}

public NodeStats(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload is needed for the serverless build to succeed.

@darius-vil darius-vil force-pushed the darius-vil/pytorch-inference-stats branch from b45db20 to b9a5bf2 Compare February 23, 2026 15:30
@darius-vil darius-vil marked this pull request as ready for review February 25, 2026 08:19
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Feb 25, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @darius-vil, I've created a changelog YAML for you.

Copy link
Copy Markdown
Contributor

@valeriy42 valeriy42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A minor comment.

private final long throughputLastPeriod;
private final Double avgInferenceTimeLastPeriod;
private final Long cacheHitCountLastPeriod;
private final Long avgInferenceProcessMemoryRssBytes;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add getter method for avgInferenceProcessMemoryRssBytes to maintain consistency with all other fields

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not, cause there's no use for it now. Most of the other getters exist only to be used in the BWC test.

@darius-vil darius-vil force-pushed the darius-vil/pytorch-inference-stats branch from 9936516 to 22160f6 Compare February 25, 2026 09:41
@darius-vil darius-vil merged commit 92ccd73 into elastic:main Feb 25, 2026
35 checks passed
@darius-vil darius-vil deleted the darius-vil/pytorch-inference-stats branch February 25, 2026 11:19
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Feb 25, 2026
…els/_stats` output (elastic#142312)

* Measure rss memory returned from ml-cpp

* Change memory stat to average

* Add bytes unit to variable names

* Fix BWC test

* Regenerate transport version

* Fix unit test and apply spotless

* Fix serverless build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team :ml Machine learning Team:ML Meta label for the ML team v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants