Skip to content

fix(native): fix incorrect stacktraces on Linux (ELF segment merging + DWARF unwinding)#1588

Merged
jpnurmi merged 13 commits intomasterfrom
fix/native-backend-stacktraces
Mar 27, 2026
Merged

fix(native): fix incorrect stacktraces on Linux (ELF segment merging + DWARF unwinding)#1588
jpnurmi merged 13 commits intomasterfrom
fix/native-backend-stacktraces

Conversation

@mujacica
Copy link
Copy Markdown
Contributor

@mujacica mujacica commented Mar 22, 2026

Fixes two root causes of incorrect stacktraces reported by customers on Linux:

  1. ELF segment merging: /proc/pid/maps shows separate mappings per ELF LOAD segment. Without merging, base_of_image pointed to the code segment instead of the real ELF load base, causing all RVA computations to be off by the .text section offset. This broke server-side CFI unwinding.

  2. DWARF-based backtrace capture: The daemon's FP-based stack walking fails on x86_64 (and any -fomit-frame-pointer build) because RBP is used as a general-purpose register. Now captures backtrace using libunwind (DWARF .eh_frame) in the signal handler and stores it in shared memory. The daemon prefers this over FP-walking when available. Also primes libunwind's cache at init time to avoid dl_iterate_phdr deadlocks.

@mujacica mujacica changed the title fix(native): fix incorrect stacktraces on Linux (ELF segment merging … fix(native): fix incorrect stacktraces on Linux (ELF segment merging + DWARF unwinding) Mar 22, 2026
@supervacuus
Copy link
Copy Markdown
Collaborator

Since this duplicates code from the Linux module finder, why not extract what you need from there?

@mujacica
Copy link
Copy Markdown
Contributor Author

@supervacuus that is the plan - just did a quick and dirty version to share with customer to try it out. Need to extract it from Unit testing code

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Memory leak: resolved not freed in error paths
    • Added sentry_free(resolved) to both error paths at lines 1634 and 1646 to prevent memory leak.
  • ✅ Fixed: Redundant uname() syscall in system info stream
    • Removed second uname() call and moved CSD version string generation inside the first uname() success block.

Create PR

Or push these changes by commenting:

@cursor push d5f42a5262
Preview (d5f42a5262)
diff --git a/src/backends/native/minidump/sentry_minidump_linux.c b/src/backends/native/minidump/sentry_minidump_linux.c
--- a/src/backends/native/minidump/sentry_minidump_linux.c
+++ b/src/backends/native/minidump/sentry_minidump_linux.c
@@ -515,17 +515,14 @@
 
     // Populate OS version from uname(), matching Crashpad behavior
     struct utsname uts;
+    char csd_version[256] = "";
     if (uname(&uts) == 0) {
         int major = 0, minor = 0, patch = 0;
         sscanf(uts.release, "%d.%d.%d", &major, &minor, &patch);
         sysinfo.major_version = (uint32_t)major;
         sysinfo.minor_version = (uint32_t)minor;
         sysinfo.build_number = (uint32_t)patch;
-    }
 
-    // Write CSD version string with full uname info, matching Crashpad
-    char csd_version[256] = "";
-    if (uname(&uts) == 0) {
         snprintf(csd_version, sizeof(csd_version), "%s %s %s %s", uts.sysname,
             uts.release, uts.version, uts.machine);
     }
@@ -1634,6 +1631,7 @@
         SENTRY_WARN("failed to write module list structure");
         sentry_free(mod_infos);
         sentry_free(module_list);
+        sentry_free(resolved);
         return -1;
     }
 
@@ -1645,6 +1643,7 @@
                     "name/CV patching");
         sentry_free(mod_infos);
         sentry_free(module_list);
+        sentry_free(resolved);
         return dir->rva ? 0 : -1;
     }

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

@mujacica
Copy link
Copy Markdown
Contributor Author

@cursor push d5f42a5

mujacica and others added 9 commits March 27, 2026 09:15
…+ DWARF unwinding)

Fixes two root causes of incorrect stacktraces reported by customers on Linux:

1. ELF segment merging: /proc/pid/maps shows separate mappings per ELF LOAD
   segment. Without merging, base_of_image pointed to the code segment instead
   of the real ELF load base, causing all RVA computations to be off by the
   .text section offset. This broke server-side CFI unwinding.

2. DWARF-based backtrace capture: The daemon's FP-based stack walking fails
   on x86_64 (and any -fomit-frame-pointer build) because RBP is used as a
   general-purpose register. Now captures backtrace using libunwind (DWARF
   .eh_frame) in the signal handler and stores it in shared memory. The daemon
   prefers this over FP-walking when available. Also primes libunwind's cache
   at init time to avoid dl_iterate_phdr deadlocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract sentry__procmaps_parse_module_line() and
sentry__module_mapping_push() from the modulefinder as public API, and
use sentry__module_mapping_push() in the minidump writer's
resolve_modules() instead of duplicating the ELF segment merge logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes ClangCL -Wdocumentation error on Windows CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tackwalking

The native backend's minidump writer produced dumps that differed from
crashpad's in several ways that prevented symbolicator/minidump-stackwalk
from performing CFI-based unwinding:

- Module sizes used page-aligned /proc/maps sizes instead of ELF PT_LOAD
  segment ranges (compute_elf_size_from_phdrs)
- Module names used full filesystem paths instead of ELF DT_SONAME
  (read_elf_soname), causing symbol file lookup mismatches
- OS version fields (major/minor/build) were left as zeros instead of
  being populated via uname()
- Exception flags (si_code) were always 0 instead of reflecting the
  actual signal sub-code (e.g. SEGV_MAPERR)

With these fixes, native minidumps produce identical stackwalking results
to crashpad minidumps when processed with symbols (verified: 20-frame
deep call chain with trust=cfi on aarch64 with -fomit-frame-pointer).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix memory leak: free 'resolved' array in error paths at lines 1634 and 1646
- Fix redundant uname() syscall: reuse first call result for CSD version string

Applied via @cursor push command
@mujacica mujacica force-pushed the fix/native-backend-stacktraces branch from a712c2d to 6188949 Compare March 27, 2026 08:21
}

return max_vaddr - min_vaddr;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ELF parsing functions duplicate existing modulefinder logic

Low Severity

compute_elf_size_from_phdrs and read_elf_soname duplicate ELF parsing patterns already present in sentry_modulefinder_linux.c (program header reading, section header traversal, etc.). The reviewer flagged this, and the author acknowledged it's a "quick and dirty version" that needs extraction. Duplicating this low-level ELF parsing increases the risk that a fix in one copy won't be applied to the other.

Additional Locations (1)
Fix in Cursor Fix in Web

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left as a future exercise

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

jpnurmi and others added 2 commits March 27, 2026 14:21
parsed.inode was set to 0 early in the loop but unconditionally
overwritten just before calling sentry__module_mapping_push (lines
1506-1511), making the initial assignment dead code.
The loop guesses .dynstr by grabbing the first SHT_STRTAB section,
but that may be .shstrtab. When the sh_link override from .dynamic
fails bounds checking, clear the fallback so we bail out instead of
reading section names as SO names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi merged commit 8128cbe into master Mar 27, 2026
86 of 87 checks passed
@jpnurmi jpnurmi deleted the fix/native-backend-stacktraces branch March 27, 2026 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants