refactor Callback implementation#677
Conversation
797f39c to
dea2a2d
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## next #677 +/- ##
==========================================
+ Coverage 96.37% 96.42% +0.05%
==========================================
Files 78 78
Lines 7138 7160 +22
==========================================
+ Hits 6879 6904 +25
+ Misses 259 256 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
096267a to
2650fb4
Compare
|
The logs are normal. We have some clean-up to do (in the IDF and with DAQ) about the PV initialization. As per my comment at slack -- please run with |
|
I ran both again for about an hour each: |
|
Also adding here (from slack) -- when you turn on ==============================================
EWM16074_live_data_memory_leak 347756ab1f (libjemalloc.so LD_PRELOAD)
DeleteWorkspace-[Notice] DeleteWorkspace successful, Duration 0.00 seconds
8890.9876 seconds: RSS 1742600 kB
-----------------------------------------------------------
--------------- Memory-allocation traces ------------------
-----------------------------------------------------------
[ Top 10 ]
/home/ux0/workspaces/SNAPRed/.pixi/envs/default/lib/python3.12/collections/__init__.py:508: size=55.1 KiB, count=242, average=233 B
/home/ux0/workspaces/SNAPRed/tests/cis_tests/live_data_memory_leak.py:96: size=50.7 KiB, count=2, average=25.4 KiB
/home/ux0/workspaces/SNAPRed/src/snapred/backend/dao/RunMetadata.py:342: size=49.7 KiB, count=1149, average=44 B
<frozen importlib._bootstrap_external>:757: size=49.2 KiB, count=971, average=52 B
/home/ux0/workspaces/SNAPRed/src/snapred/backend/dao/RunMetadata.py:334: size=39.1 KiB, count=500, average=80 B
/home/ux0/workspaces/SNAPRed/src/snapred/backend/dao/RunMetadata.py:331: size=39.1 KiB, count=500, average=80 B
/home/ux0/workspaces/SNAPRed/src/snapred/backend/recipe/algorithm/MantidSnapper.py:156: size=29.3 KiB, count=250, average=120 B
/home/ux0/workspaces/SNAPRed/src/snapred/backend/recipe/algorithm/MantidSnapper.py:308: size=21.9 KiB, count=250, average=90 B
/home/ux0/workspaces/SNAPRed/src/snapred/backend/dao/RunMetadata.py:340: size=21.2 KiB, count=376, average=58 B
/home/ux0/workspaces/SNAPRed/.pixi/envs/default/lib/python3.12/re/__init__.py:224: size=20.8 KiB, count=454, average=47 B
-----------------------------------------------------------
(snapred) (base) [ux0@LAP135679 SNAPRed]$
==============================================
next f65ca12829825 (libjemalloc.so LD_PRELOAD)
DeleteWorkspace-[Notice] DeleteWorkspace successful, Duration 0.00 seconds
8901.5263 seconds: RSS 2945964 kB
-----------------------------------------------------------
--------------- Memory-allocation traces ------------------
-----------------------------------------------------------
[ Top 10 ]
/home/ux0/workspaces/SNAPRed/src/snapred/meta/Callback.py:2: size=2270 KiB, count=21921, average=106 B
/home/ux0/workspaces/SNAPRed/src/snapred/meta/Callback.py:66: size=824 KiB, count=21096, average=40 B
/home/ux0/workspaces/SNAPRed/src/snapred/meta/Callback.py:31: size=601 KiB, count=1463, average=421 B
/home/ux0/workspaces/SNAPRed/src/snapred/meta/Callback.py:10: size=387 KiB, count=144, average=2755 B
/home/ux0/workspaces/SNAPRed/tests/cis_tests/live_data_memory_leak.py:96: size=50.7 KiB, count=2, average=25.4 KiB
/home/ux0/workspaces/SNAPRed/src/snapred/backend/dao/RunMetadata.py:342: size=50.7 KiB, count=1168, average=44 B
<frozen importlib._bootstrap_external>:757: size=49.1 KiB, count=974, average=52 B
/home/ux0/workspaces/SNAPRed/src/snapred/backend/dao/RunMetadata.py:334: size=39.1 KiB, count=500, average=80 B
/home/ux0/workspaces/SNAPRed/src/snapred/backend/dao/RunMetadata.py:331: size=39.1 KiB, count=500, average=80 B
/home/ux0/workspaces/SNAPRed/.pixi/envs/default/lib/python3.12/logging/__init__.py:457: size=36.2 KiB, count=309, average=120 B
-----------------------------------------------------------
(snapred) (base) [ux0@LAP135679 SNAPRed]$
|
walshmm
left a comment
There was a problem hiding this comment.
Was able to reproduce the callback results. Approved!
on next:
-----------------------------------------------------------
--------------- Memory-allocation traces ------------------
-----------------------------------------------------------
[ Top 10 ]
/SNS/users/wqp/git/SNAPRed/src/snapred/meta/Callback.py:2: size=2200 KiB, count=21151, average=107 B
/SNS/users/wqp/git/SNAPRed/src/snapred/meta/Callback.py:66: size=802 KiB, count=20544, average=40 B
/SNS/users/wqp/git/SNAPRed/src/snapred/meta/Callback.py:31: size=592 KiB, count=1422, average=426 B
/SNS/users/wqp/git/SNAPRed/src/snapred/meta/Callback.py:10: size=382 KiB, count=140, average=2791 B
/SNS/users/wqp/git/SNAPRed/./tests/cis_tests/live_data_memory_leak.py:96: size=50.7 KiB, count=2, average=25.4 KiB
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/dao/RunMetadata.py:342: size=50.6 KiB, count=1168, average=44 B
<frozen importlib._bootstrap_external>:757: size=49.3 KiB, count=976, average=52 B
/SNS/users/wqp/git/SNAPRed/.pixi/envs/qa/lib/python3.12/threading.py:293: size=46.0 KiB, count=124, average=380 B
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/dao/RunMetadata.py:334: size=39.1 KiB, count=500, average=80 B
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/dao/RunMetadata.py:331: size=39.1 KiB, count=500, average=80 B
-----------------------------------------------------------
on branch:
--------------- Memory-allocation traces ------------------
-----------------------------------------------------------
[ Top 10 ]
/SNS/users/wqp/git/SNAPRed/./tests/cis_tests/live_data_memory_leak.py:96: size=50.7 KiB, count=2, average=25.4 KiB
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/dao/RunMetadata.py:342: size=50.2 KiB, count=1149, average=45 B
<frozen importlib._bootstrap_external>:757: size=49.5 KiB, count=973, average=52 B
/SNS/users/wqp/git/SNAPRed/.pixi/envs/qa/lib/python3.12/threading.py:293: size=46.0 KiB, count=124, average=380 B
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/dao/RunMetadata.py:334: size=39.1 KiB, count=500, average=80 B
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/dao/RunMetadata.py:331: size=39.1 KiB, count=500, average=80 B
/SNS/users/wqp/git/SNAPRed/.pixi/envs/qa/lib/python3.12/collections/__init__.py:508: size=32.7 KiB, count=148, average=226 B
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/recipe/algorithm/MantidSnapper.py:156: size=28.7 KiB, count=245, average=120 B
/SNS/users/wqp/git/SNAPRed/.pixi/envs/qa/lib/python3.12/re/__init__.py:224: size=24.1 KiB, count=525, average=47 B
/SNS/users/wqp/git/SNAPRed/src/snapred/backend/recipe/algorithm/MantidSnapper.py:308: size=21.9 KiB, count=250, average=90 B
-----------------------------------------------------------
b6082ec to
dd452c1
Compare
The old implementation was creating a new `Callback` class every time `callback(clazz)` was called, then dynamically attaching delegated methods to that class before returning an instance. This meant we were repeatedly allocating distinct callback class objects, each with its own type-object overhead and its own set of generated forwarding methods. The memory issue appears to have come from both the volume of those class allocations and the fact that the generated class objects persisted longer than expected. This refactor fixes that by moving to a stable top-level `Callback`, generating forwarding behavior via `CallbackMeta`, and caching one subclass per wrapped type instead of recreating classes on every call.
- Replace the define-class-in-closure pattern with a cached per-type `Callback` subclass.
- Use `CallbackMeta` to generate forwarding magic methods at class creation time instead of patching them on afterward.
- Expand magic-method pass-through so wrapped primitive types and other built-ins behave more transparently.
- Centralize forwarding behavior in `_FORWARDED_MAGIC_METHODS` and `_make_forwarder()` to simplify maintenance.
- Tighten attribute access/assignment handling to better separate internal state from forwarded behavior.
- Preserve the existing “not populated” safeguards while improving debugging via `_wrapped_type`, `__repr__`, and `__str__`.
Note (this next change was not strictly necessary, but it makes testing on a laptop, a bit easier):
- `LoadLiveData` algorithm deletion: match 'MantidSnapper' changes from EWM13905
To test:
Two new test scripts are provided at `tests/cis_tests/live_data_memory_leak.py` and `tests/cis_tests/live_data_memory_leak_v2.py`. The former is a minimal reproducer of the metadata-only loop from the live-data pane; the latter is a mantid-only version of the same thing.
These scripts will run the metadata-only part of SNAPRed's live-data loop, and display a memory-usage summary after each execution. Before these changes, the former script would only run for 20 cycles or so before triggering an OOM-kill on a laptop with 32 GB of memory. After these changes, I've successfully run the script for over 12 hours of cycling.
For convenience, I've attached my version of `dev.yml`. I generally use a mirror of the relevent `/SNS` tree. Note that in the following you must use the `LD_PRELOAD`: otherwise Mantid's extensive memory fragmentation will confuse the test results. (I actually used `jemalloc` for my tests, but `tbbmalloc` should work as well.)
The commands to run the test script should be:
``` bash
export LD_PRELOAD=${HOME}/mambaforge/envs/mantid-developer/lib/libtbbmalloc.so
env=dev tests/scripts/live_data_memory_leak.py
```
dd452c1 to
5b40f3e
Compare


Description of work
This PR deals with a major memory-allocation issue in SNAPRed caused by the previous implementation and usage of the
Callbackclass byMantidSnapper.The old implementation was creating a new
Callbackclass every timecallback(clazz)was called, then dynamically attaching delegated methods to that class before returning an instance. This meant we were repeatedly allocating distinct callback class objects, each with its own type-object overhead and its own set of generated forwarding methods. The memory issue appears to have come from both the volume of those class allocations and the fact that the generated class objects persisted longer than expected. This refactor fixes that by moving to a stable top-levelCallback, generating forwarding behavior viaCallbackMeta, and caching one subclass per wrapped type instead of recreating classes on every call.Explanation of work
This commit includes the following changes:
Callbacksubclass.CallbackMetato generate forwarding magic methods at class creation time instead of patching them on afterward._FORWARDED_MAGIC_METHODSand_make_forwarder()to simplify maintenance._wrapped_type,__repr__, and__str__.Note (this next change was not strictly necessary, but it makes testing on a laptop, a bit easier):
LoadLiveDataalgorithm deletion: match 'MantidSnapper' changes from EWM13905To test
New unit tests were added to cover the new changes.
Dev testing
Two new test scripts are provided at
tests/cis_tests/live_data_memory_leak.pyandtests/cis_tests/live_data_memory_leak_v2.py. The former is a minimal reproducer of the metadata-only loop from the live-data pane; the latter is a mantid-only version of the same thing.These scripts will run the metadata-only part of SNAPRed's live-data loop, and display a memory-usage summary after each execution. Before these changes, the former script would only run for 20 cycles or so before triggering an OOM-kill on a laptop with 32 GB of memory. After these changes, I've successfully run the script for over 12 hours of cycling.
For convenience, I've attached my version of
dev.yml. I generally use a mirror of the relevent/SNStree. Note that in the following you must use theLD_PRELOAD: otherwise Mantid's extensive memory fragmentation will confuse the test results. (I actually usedjemallocfor my tests, buttbbmallocshould work as well.)The commands to run the test script should be:
Link to EWM item
EWM#16074
Verification
Acceptance Criteria
This list is for ease of reference, and does not replace reading the EWM story as part of the review. Verify this list matches the EWM story before reviewing.