feat(profiling): incremental delta export for heap tracker by vlad-scherbich · Pull Request #18125 · DataDog/dd-trace-py

vlad-scherbich · 2026-05-17T15:12:27Z

Description

Switches the heap tracker's snapshot export from emitting the full live set every time to emitting only the deltas since the last snapshot: positive samples for newly-tracked allocations (ADD) and negative-value tombstones for freed ones (REMOVE). The backend (libdatadog → agent → UI) integrates deltas to compute the running live-heap view.

Why this matters:

At sample_size = default, today's full export iterates everything in allocs_m on every snapshot — work that's mostly redundant when live state is stable. Delta path is empty in steady state.
Microbench shows ~5× CPU reduction in representative steady-state workloads and ~500× p90 snapshot-latency reduction at large n (see scripts/heap_export_microbench.py).

Mechanics:

untrack_no_cpython moves the freed traceback's unique_ptr into a REMOVE event in pending_changes (kept alive until the next export emits the tombstone, then returned to the pool).
add_sample_no_cpython queues an ADD event with a raw pointer to the live traceback in allocs_m.
export_heap_no_cpython drains pending_changes; for REMOVEs it negates heap_space lazily at emit time (guarded by tombstone_applied so retries don't re-flip the sign) and emits.
New Sample::negate_heap_space() API splits the negate step from the emit step so a failed emit can be retried without toggling sign on every attempt.

Testing

Six new unit tests in tests/profiling/collector/test_memalloc.py:

test_delta_export_emits_negative_tombstones_after_free — second snapshot after a free emits negative-value tombstones for the allocator stack.
test_delta_export_steady_state_skips_unchanged_samples — unchanged live set => second snapshot has zero positive allocator samples.
test_delta_export_churn_net_heap_is_zero — balanced alloc/free churn nets to zero (positives + negatives cancel via pprof aggregation).
test_delta_export_no_resync_for_steady_state — 15 unchanging snapshots; none re-emit live state (no sneak resync; see 'Risks').
test_delta_export_net_heap_is_zero_across_snapshots — drain workload across multiple snapshots converges to net zero, exercising the retain-on-failure retry path.
test_delta_export_heavy_churn_stays_consistent — sustained churn that grows pending_changes past its initial reserve doesn't crash, hang, or leak.

Full memalloc suite: 31 passed, 5 skipped, no failures. Existing tests test_memory_collector_python_interface_with_allocation_tracking and ..._no_deletion updated to reflect that snapshots now carry deltas (not cumulative live state).

Risks

Wire-format contract change (highest risk). The backend now sees deltas instead of full snapshots. Agent intake, pprof aggregation, and the heap-profile UI must accept negative-value heap-space samples and integrate across uploads. libdatadog already accepts negative i64 end-to-end (libdd-profiling-protobuf/src/sample.rs:29 — explicit i64, no sign validation). The agent and UI need verification before this can ship.
No periodic resync. v1 does not emit a periodic full-snapshot — earlier drafts did, but emitting plain positives in a delta-integrating backend would double-count live state without a reset marker. Implication: if any delta upload is lost (network failure, agent crash mid-upload), the backend's running state diverges from live until the profiler restarts (which clears allocs_m and pending_changes). Profiler restarts are common in production Python (process recycling, fork), so drift is bounded in practice — but it is not bounded by snapshot cadence. Adding a true resync requires either a backend-side reset marker or emitting compensating negative+positive pairs for every live entry; both are out of scope here.
Retain-on-failure with bounded retry. If libdatadog rejects a sample (Profile_add2 returns false), the event is retained in pending_changes for retry on the next snapshot, up to MAX_EXPORT_RETRIES = 2. After that the event is dropped to bound buffer growth on persistent rejection. ADD and REMOVE drops still pair up (the partner event for the same ptr lives in the same buffer and follows the same retry cap), so the backend's running state stays consistent even under drops.
Cuckoo fast-reject filter not included. This PR is independent of the cuckoo-filter free-path optimization (PR perf(profiling): incremental data export in memalloc #18116). They compose cleanly and can land in either order.

Additional Notes

Microbench at scripts/heap_export_microbench.py for measuring delta vs baseline snapshot CPU on this branch.
Addresses Copilot review feedback (vector realloc safety in hook path) and chatgpt-codex-connector review feedback (preserve pending deltas on export failure, no silent drop on rejection).

cit-pr-commenter-54b7da · 2026-05-17T15:14:48Z

Codeowners resolved as

ddtrace/profiling/collector/_memalloc_heap.cpp                          @DataDog/profiling-python

vlad-scherbich · 2026-05-18T14:04:54Z

@codex please reivew

datadog-datadog-prod-us1 · 2026-05-18T14:04:59Z

Tests

✨ Fix all issues with BitsAI or with Cursor

⚠️ Warnings

❄️ 2 New flaky tests detected

test_delta_export_churn_net_heap_is_zero[py3.10] from test_memalloc.py (Fix with Cursor)
net heap-space for allocator stack should be 0 after balanced churn, got 640 (delta path may have skipped REMOVE tombstones)
assert 640 == 0
New test introduced in this PR is flaky.
test_delta_export_churn_net_heap_is_zero[py3.9] from test_memalloc.py (Fix with Cursor)
net heap-space for allocator stack should be 0 after balanced churn, got 648 (delta path may have skipped REMOVE tombstones)
assert 648 == 0
New test introduced in this PR is flaky.
View in Flaky Test Management

ℹ️ Info

No other issues found (see more)

🧪 All tests passed

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: deef68f | Docs | Datadog PR Page | Give us feedback!}

Copilot

Pull request overview

This PR changes the heap memory profiler export model from cumulative heap snapshots to incremental delta exports, including negative tombstones for freed allocations and periodic full-state resyncs.

Changes:

Adds pending heap change tracking with ADD/REMOVE events in the native heap tracker.
Adds negative heap sample export support to the profiling sample wrapper.
Updates and expands memalloc tests for delta semantics, tombstones, steady state, churn, and resync behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`ddtrace/profiling/collector/_memalloc_heap.cpp`	Implements heap delta tracking, tombstone emission, and periodic resync logic.
`ddtrace/internal/datadog/profiling/dd_wrapper/src/sample.cpp`	Adds negative heap sample export behavior.
`ddtrace/internal/datadog/profiling/dd_wrapper/include/sample.hpp`	Declares the new negative sample export API.
`tests/profiling/collector/test_memalloc.py`	Updates existing heap assertions and adds delta export regression tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ab8114437

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T14:09:12Z

+        flush_pending_to_pool_no_cpython();
+        export_full_snapshot_no_cpython();
+        snapshots_since_resync = 0;


Emit explicit reset marker before full resync snapshot

The resync branch flushes pending deltas and then emits allocs_m via plain export_sample() calls, but this payload is indistinguishable from normal positive ADD deltas. In a delta-integrating backend, that means the full live set is added on top of already-integrated state, so every resync interval can inflate heap totals instead of correcting drift. If resync is intended to recover from dropped uploads, this path needs an explicit reset/generation signal (or equivalent protocol change) before sending the full snapshot.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-18T14:09:12Z

+        pending_changes.clear();
+        ++snapshots_since_resync;


Preserve pending deltas when sample export fails

In delta mode, export_sample() / export_sample_negative() return a success flag, but failures are ignored and pending_changes is always cleared afterward. If ddog_prof_Profile_add2 rejects any sample in this snapshot, those ADD/REMOVE deltas are dropped permanently (until a later resync), causing silent state drift and missing frees/allocations in the exported heap stream. The delta buffer should only be cleared after successful emission (or failed events should be retried).

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Each export now emits only what changed since the previous snapshot (positive samples for new allocations, negative-value tombstones for freed ones), with a periodic full-snapshot resync every 10 snapshots to bound backend drift if a delta upload is dropped. Adds Sample::export_sample_negative to the dd_wrapper for emitting tombstones; libdatadog's ddog_prof_Profile_add2 accepts negative i64 values end-to-end. Live tracebacks under untrack are moved into a REMOVE event so they stay alive until the next export, then returned to the pool. ADD events hold raw pointers into allocs_m; lifetime is guaranteed by either allocs_m or a subsequent REMOVE in the same pending buffer.

vlad-scherbich · 2026-05-19T20:20:01Z

@codex please review

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ba017f576

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T20:24:08Z

+            if (evt.kind == change_event::REMOVE) {
+                pool_put_no_cpython(std::move(evt.owner));
+            }


Keep REMOVE-owned traceback alive while ADD retries remain

When an ADD event fails export and is retained for retry, it only keeps a raw tb pointer. If the matching REMOVE event later succeeds in the same pass, this branch immediately returns evt.owner to the pool, which clears/reuses the same traceback_t. On the next snapshot, the retained ADD retry dereferences a stale/recycled pointer, causing corrupted samples or a use-after-free style crash. This occurs whenever Profile_add2 rejects the ADD but accepts the later REMOVE for the same allocation sequence.

Useful? React with 👍 / 👎.

…a export For allocations that are tracked AND freed within the same snapshot interval, the previous delta path emitted both an ADD (positive) and a REMOVE (negative tombstone) — libdatadog aggregated them to a net-zero bucket but still paid two Profile_add2 calls per pair. On alloc-then-free hot loops (per the rapid_python_http_smoke_test handle_alloc_pressure workload) this dominated CPU and surfaced as a ~7% CPU regression with +111% Locked Time vs the full-snapshot baseline. Side map void* -> pending ADD index makes the untrack hot path O(1) for finding the matching ADD; when found, the ADD is flagged `collapsed` and skipped at export time, no REMOVE event is queued, and the traceback is returned to the pool immediately. Microbench Scenario D (balanced churn, 500 pairs / snapshot) drops from ~13.5 ms to ~39 us — ~340x faster than the no-collapse delta path and within ~4x of the full-snapshot baseline. Side map is cleared at the end of each export; retained-on-failure ADDs lose their collapse capability on the next snapshot (acceptable since retain is rare and the existing aggregation-to-zero fallback still works correctly without collapse). Cleared on fork alongside the rest of the delta state. Correctness is covered by the existing delta tests (churn-net-heap-is-zero, heavy-churn-stays-consistent, large-heap-overhead); they pass on this build because the wire-format contract is unchanged — collapse just skips emission of pairs the backend would aggregate to zero anyway.

Previously, a same-snapshot ADD/REMOVE collapse marked the ADD as 'collapsed' in place and left it in pending_changes; the export loop iterated and skipped these entries. Under sustained alloc-then-free churn that buffer fills with skipped entries — at 500 churn pairs per snapshot, ~12.5K wasted branch checks on each export pass. Swap-and-pop removes the canceled ADD from pending_changes at collapse time: the last element moves into the freed slot, and if it's another indexed ADD, its pending_add_idx entry is rewritten to the new position. pending_changes stays compact across churn and the collapsed flag on change_event is no longer needed. Microbench Scenario D (balanced churn, 500 pairs per snapshot): before: 39 us median after: 13 us median At 50 pairs the cost is now the same as 500 pairs (~14 us median), confirming snapshot work is decoupled from churn volume.

…dary When export retains events for retry (libdatadog Profile_add2 rejection), the previous code cleared pending_add_idx entirely, which meant retained ADDs lost their ability to collapse with a later REMOVE — the next snapshot would queue a separate tombstone for that ptr instead of canceling the still-pending ADD, paying an extra Profile_add2 call. Rebuild pending_add_idx after the std::move(retained) so ADD events that survive the retry boundary stay indexed by ptr. Cost is O(retained.size()) per export, which is bounded: retained events only appear on libdatadog rejection and are capped at MAX_EXPORT_RETRIES per event before being dropped.

vlad-scherbich force-pushed the vlad/memalloc-support-mem-domain branch from 89c6829 to 1ee4530 Compare May 18, 2026 02:07

vlad-scherbich changed the base branch from vlad/memalloc-support-mem-domain to main May 18, 2026 13:32

vlad-scherbich changed the base branch from main to vlad/memalloc-support-mem-domain May 18, 2026 13:32

vlad-scherbich force-pushed the vlad/memalloc-incremental-export branch from 2e4a3b9 to 2ab8114 Compare May 18, 2026 13:53

vlad-scherbich requested a review from Copilot May 18, 2026 14:05

Copilot started reviewing on behalf of vlad-scherbich May 18, 2026 14:05 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread ddtrace/profiling/collector/_memalloc_heap.cpp Outdated

Comment thread ddtrace/profiling/collector/_memalloc_heap.cpp Outdated

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

Base automatically changed from vlad/memalloc-support-mem-domain to main May 18, 2026 16:33

vlad-scherbich added changelog/no-changelog A changelog entry is not required for this PR. Profiling Continous Profling labels May 19, 2026

vlad-scherbich force-pushed the vlad/memalloc-incremental-export branch 2 times, most recently from 176731d to 5a701ef Compare May 19, 2026 18:45

vlad-scherbich requested a review from Copilot May 19, 2026 19:02

Copilot started reviewing on behalf of vlad-scherbich May 19, 2026 19:02 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

Comment thread ddtrace/profiling/collector/_memalloc_heap.cpp

vlad-scherbich force-pushed the vlad/memalloc-incremental-export branch from 5a701ef to e0c9c76 Compare May 19, 2026 20:02

vlad-scherbich force-pushed the vlad/memalloc-incremental-export branch from e0c9c76 to 6ba017f Compare May 19, 2026 20:03

vlad-scherbich requested a review from Copilot May 19, 2026 20:19

Copilot started reviewing on behalf of vlad-scherbich May 19, 2026 20:20 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

vlad-scherbich added 3 commits May 20, 2026 11:45

Conversation

vlad-scherbich commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Testing

Risks

Additional Notes

Uh oh!

cit-pr-commenter-54b7da Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codeowners resolved as

Uh oh!

vlad-scherbich commented May 18, 2026

Uh oh!

datadog-datadog-prod-us1 Bot commented May 18, 2026 • edited by datadog-prod-us1-5 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

ℹ️ Info

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

vlad-scherbich commented May 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vlad-scherbich commented May 17, 2026 •

edited

Loading

cit-pr-commenter-54b7da Bot commented May 17, 2026 •

edited

Loading

datadog-datadog-prod-us1 Bot commented May 18, 2026 •

edited by datadog-prod-us1-5 Bot

Loading