Skip to content

Add libbacktrace fallback for stack traces on musl/Alpine#3581

Open
hanxizh9910 wants to merge 22 commits into
valkey-io:unstablefrom
hanxizh9910:libbacktrace-alpine-support
Open

Add libbacktrace fallback for stack traces on musl/Alpine#3581
hanxizh9910 wants to merge 22 commits into
valkey-io:unstablefrom
hanxizh9910:libbacktrace-alpine-support

Conversation

@hanxizh9910

@hanxizh9910 hanxizh9910 commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Problem

On Alpine Linux (musl-based), execinfo.h is not available, so Valkey produces no stack trace on crash. This makes debugging crashes on Alpine very difficult.

Solution

Add libbacktrace as a fallback for stack frame collection on systems without execinfo.h.

  • src/config.h: Split HAVE_BACKTRACE (generic capability) from HAVE_EXECINFO (execinfo-specific). When USE_LIBBACKTRACE is defined and HAVE_EXECINFO is not, HAVE_BACKTRACE is still set so the crash handler compiles in on Alpine.
  • src/debug.c:
    • Add valkey_backtrace() using backtrace_simple() as a drop-in replacement for backtrace() on musl. Macro-aliases backtrace() to valkey_backtrace() so no other code needs to change.
    • Fix fallback messages in symbolizeWithLibbacktrace to correctly distinguish between "execinfo fallback available" and "no fallback available".
  • src/server.c / src/server.h: Initialize backtrace_state once at startup via initLibbacktraceFrameState() to avoid calling malloc from the signal handler, which could deadlock if a crash occurs while the allocator lock is held.

Test Changes

  • Updated bio thread count check from 3 to 5
  • Removed debugCommand pattern check

Notes

Testing

Built on Alpine 3.23 with USE_LIBBACKTRACE=yes and triggered a crash via DEBUG SEGFAULT. Stack traces are produced correctly for all threads.

  • After the implementation:
Part of alpine implementation screenshot

Summary by CodeRabbit

  • New Features

    • Optional libbacktrace support with startup initialization to improve crash symbolization when available.
  • Bug Fixes

    • Separate detection for execinfo-based backtraces and improved logic so libbacktrace is enabled when configured.
    • Safer, conditional symbolization and clearer fallbacks when native backtrace APIs are unavailable.
  • Tests

    • Runtime test updated to detect libbacktrace-enabled builds and adjust behavior.

Review Change Stack

@hanxizh9910 hanxizh9910 marked this pull request as draft April 28, 2026 21:12
@codecov

codecov Bot commented Apr 28, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 77.77778% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.69%. Comparing base (a813df0) to head (23cfe1e).
⚠️ Report is 40 commits behind head on unstable.

Files with missing lines Patch % Lines
src/debug.c 76.47% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3581      +/-   ##
============================================
- Coverage     76.94%   76.69%   -0.26%     
============================================
  Files           162      162              
  Lines         80656    80673      +17     
============================================
- Hits          62058    61869     -189     
- Misses        18598    18804     +206     
Files with missing lines Coverage Δ
src/server.c 89.48% <100.00%> (-0.03%) ⬇️
src/server.h 100.00% <ø> (ø)
src/debug.c 55.11% <76.47%> (+0.27%) ⬆️

... and 24 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch 2 times, most recently from 836b39d to 71ae67a Compare April 30, 2026 00:36
@hanxizh9910 hanxizh9910 marked this pull request as ready for review April 30, 2026 01:35
@rainsupreme rainsupreme self-requested a review April 30, 2026 19:43

@rainsupreme rainsupreme left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good change! Thanks for the fix 😊 I just had a couple ideas to simplify/document a little more.

I'm wondering why the stack trace in your testing was so short. Normally I'd expect it to be somewhat longer. Maybe if you built with CFLAGS="-g" there would be more info. I don't think this is too important though.

Comment thread src/debug.c
bt_frame_state = backtrace_create_state(NULL, 0, bt_simple_error_cb, NULL);
}

static int valkey_backtrace(void **trace, int max_size) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding a comment here explaining that we preallocate bt_frame_state because allocating is unsafe while handling an async signal (such as crash handling). This probably isn't immediately obvious and might help out the next person to read this code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree! I just added the comment.

Comment thread src/debug.c Outdated
#ifdef HAVE_BACKTRACE
#define BACKTRACE_MAX_SIZE 100

#if defined(USE_LIBBACKTRACE) && !defined(HAVE_EXECINFO)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When USE_LIBBACKTRACE is enabled, we should use backtrace_simple() from libbacktrace for frame collection unconditionally — not just when execinfo.h is missing. Both call _Unwind_Backtrace() under the hood, so there's no benefit to keeping the glibc path as a special case. This simplifies the guard here to just #ifdef USE_LIBBACKTRACE

@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch from 5e41066 to c75b577 Compare May 6, 2026 20:26
@hanxizh9910

Copy link
Copy Markdown
Contributor Author

This looks like a good change! Thanks for the fix 😊 I just had a couple ideas to simplify/document a little more.

I'm wondering why the stack trace in your testing was so short. Normally I'd expect it to be somewhat longer. Maybe if you built with CFLAGS="-g" there would be more info. I don't think this is too important though.

Hi, the build already includes -g by default(https://github.com/valkey-io/valkey/blob/unstable/src/Makefile#L142). The short trace is likely due to -03 optimization(https://github.com/valkey-io/valkey/blob/unstable/src/Makefile#L23) which inlines many functions

@rainsupreme rainsupreme left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this improvement!

@roshkhatri roshkhatri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good fix and I have also reviewed the container PR, that is well. just have one question about grabbing the io-threads backtrace.

Also check if we have a test to tests this change in daily.yml

Comment thread src/debug.c Outdated
/* Preallocate at startup: backtrace_create_state() calls malloc, which is not
* async-signal-safe and could deadlock if called from a crash signal handler. */
void initLibbacktraceFrameState(void) {
bt_frame_state = backtrace_create_state(NULL, 0, bt_simple_error_cb, NULL);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, should we make this threadsafe? meaning backtrace_create_state(NULL, 1, bt_simple_error_cb, NULL);

I just want to be sure that we would not mess up the stacktrace when geting the stack threads when io-threads > 1

backtrace_create_state has an option to enable threaded=1

@hanxizh9910 hanxizh9910 added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label May 11, 2026
@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch 2 times, most recently from e8e2574 to 4d12fc5 Compare May 12, 2026 06:23
@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds libbacktrace-backed crash stack collection, separates execinfo.h detection into HAVE_EXECINFO, forces HAVE_BACKTRACE when USE_LIBBACKTRACE is enabled, updates symbolization fallbacks, and initializes libbacktrace state at server startup.

Changes

Libbacktrace integration with separated backtrace detection

Layer / File(s) Summary
Backtrace feature detection
src/config.h
Defines HAVE_EXECINFO for execinfo.h availability and ensures HAVE_BACKTRACE is defined when USE_LIBBACKTRACE is set even if platform detection did not.
Libbacktrace stack collection
src/debug.c
Conditionally include <execinfo.h> when HAVE_EXECINFO is set; add preallocated bt_frame_state, initLibbacktraceFrameState(), valkey_backtrace() using backtrace_simple, and #define backtrace(trace,size) valkey_backtrace(trace,size) when USE_LIBBACKTRACE.
Error handling and symbolization
src/debug.c
On libbacktrace symbolization failures or fork errors, use backtrace_symbols_fd only if HAVE_EXECINFO is available; otherwise emit explicit fallback messages. Use symbolizeWithLibbacktrace() for eip symbolization when USE_LIBBACKTRACE is enabled.
Public API and server initialization
src/server.h, src/server.c
Add guarded prototype initLibbacktraceFrameState() and call it from initServer() when USE_LIBBACKTRACE is enabled to initialize libbacktrace frame state at startup.
Test backtrace detection
tests/support/util.tcl
Add runtime check that greps the server binary for initLibbacktraceFrameState inside system_backtrace_supported to exercise the libbacktrace detection path in tests.

Sequence Diagram

sequenceDiagram
  participant Server as Server (logStackTrace)
  participant ValkeyBacktrace as valkey_backtrace
  participant Libbacktrace as libbacktrace (backtrace_simple)
  Server->>ValkeyBacktrace: invoke backtrace(trace, size)
  ValkeyBacktrace->>Libbacktrace: backtrace_simple(bt_frame_state, callback)
  Libbacktrace-->>ValkeyBacktrace: addresses via callback
  ValkeyBacktrace-->>Server: return frame addresses
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I padded through configs, nose to the code,
Found frames and symbols down a new road,
Libbacktrace readied, state snug and tight,
When crashes arrive, the stack hops to light,
A rabbit's cheer for traces found tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main objective of the pull request: adding libbacktrace as a fallback mechanism to enable stack traces on musl-based systems like Alpine Linux.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description clearly explains the problem (no stack traces on Alpine Linux) and details the solution across multiple files with specific implementation changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/server.c`:
- Around line 2915-2917: Move the libbacktrace pre-initialization so it runs
before signal handlers are registered: call initLibbacktraceFrameState() (under
the USE_LIBBACKTRACE ifdef) prior to invoking setupSignalHandlers(), ensuring
the libbacktrace state is initialized before any signal handler can run; update
the order in the initialization sequence accordingly and keep the
USE_LIBBACKTRACE guard around initLibbacktraceFrameState().

In `@tests/support/util.tcl`:
- Around line 1226-1233: The current catch block mistakenly errors out when it
finds "USE_LIBBACKTRACE" (leftover debug) so the function proceeds to the musl
check and returns 0; change the logic so that when grep finds "USE_LIBBACKTRACE"
(i.e. buildinfo is non-empty) the function returns 1 to indicate backtrace
support instead of calling error. Specifically, update the catch block that
inspects buildinfo (from exec grep -a "USE_LIBBACKTRACE" $::VALKEY_SERVER_BIN)
to avoid the intentional error and instead set/return a success value (1) when
buildinfo is non-empty so the presence of USE_LIBBACKTRACE correctly signals
supported backtraces.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e662b773-4c5f-4289-883f-3356a863ab06

📥 Commits

Reviewing files that changed from the base of the PR and between fdd9039 and 843029c.

📒 Files selected for processing (5)
  • src/config.h
  • src/debug.c
  • src/server.c
  • src/server.h
  • tests/support/util.tcl

Comment thread src/server.c
Comment thread tests/support/util.tcl

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
tests/support/util.tcl (1)

1228-1230: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

system_backtrace_supported should not throw on a positive libbacktrace detection

At Line 1229, this turns a supported case into a hard error, so the proc no longer reliably returns a boolean capability result.

🐛 Proposed fix
-    if {[catch {exec grep -a "initLibbacktraceFrameState" $::VALKEY_SERVER_BIN} buildinfo] == 0} {
-        error "PROOF: Found initLibbacktraceFrameState symbol - this code path IS being executed on [exec uname -s]"
-    }
+    if {[catch {exec grep -a "initLibbacktraceFrameState" $::VALKEY_SERVER_BIN} buildinfo] == 0} {
+        return 1
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/support/util.tcl` around lines 1228 - 1230, The check in
system_backtrace_supported currently calls error when initLibbacktraceFrameState
is found, which turns a supported case into an exception; change that branch so
it does not throw but instead returns a truthy boolean (e.g., return 1 or set
the proc's result to true) when grep finds the symbol, and ensure the proc still
returns a falsy boolean when the symbol is not found—update the block
referencing initLibbacktraceFrameState in tests/support/util.tcl to return
success rather than calling error so system_backtrace_supported consistently
returns a boolean capability result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tests/support/util.tcl`:
- Around line 1228-1230: The check in system_backtrace_supported currently calls
error when initLibbacktraceFrameState is found, which turns a supported case
into an exception; change that branch so it does not throw but instead returns a
truthy boolean (e.g., return 1 or set the proc's result to true) when grep finds
the symbol, and ensure the proc still returns a falsy boolean when the symbol is
not found—update the block referencing initLibbacktraceFrameState in
tests/support/util.tcl to return success rather than calling error so
system_backtrace_supported consistently returns a boolean capability result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e4fccd43-fc20-4805-9006-c102d0604848

📥 Commits

Reviewing files that changed from the base of the PR and between c750932 and 867eb3d.

📒 Files selected for processing (1)
  • tests/support/util.tcl

@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch 3 times, most recently from 0f1ab09 to 23cfe1e Compare May 13, 2026 05:39

@roshkhatri roshkhatri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…cktrace returns 1 instead of 0

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…stered

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…e is being executed

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
… resolved

Module crash tests expect assertCrash and modulesCollectInfo to appear
in stack traces, but Alpine/musl doesn't resolve symbols. Make these
checks conditional on symbol availability, same as debugCommand check.

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch from 23cfe1e to 9bc6942 Compare June 2, 2026 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants