fix: memory and concurrency cleanup#540
Conversation
- Add CMakeLists.txt mirroring configure.ac feature checks - Add config/config.h.cmake.in template for cmake builds - Add build-cmake-linux CI job (gcc/clang matrix) - Add build-windows CI job (MSYS2/MinGW-w64, continue-on-error) - Windows crash dump collection via WER LocalDumps - Gate -Wall by compiler ID (GNU/Clang vs MSVC) - Use CMAKE_DL_LIBS instead of hardcoded -ldl - Parse net-snmp-config --libs into proper NETSNMP_LIBRARIES - Add SNMP_LOCALNAME compile check for feature parity - No C source files modified Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine.h depends on types from common.h (MYSQL, pid_t, size_t, pthread types, RESULTS_BUFFER from config.h). Add a compile-time guard that produces a clear error if spine.h is included without common.h, rather than cascading type errors. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
strncasecmp() returns 0 on match, but the comparisons used the raw return value as truthy, inverting the logic. TCP matched non-TCP strings and vice versa. Also fix: comparisons against 'hostname' instead of 'token' (lines 1020, 1032), and strncasecmp length 3 for 4-char strings "TCP6"/"UDP6" (should be 4). Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
getarg(opt, &argv) advances the argv pointer on each call. Three successive calls in the --mode handler consumed three argv entries instead of one, corrupting subsequent argument parsing when using the space-separated form (--mode online). Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
SECURITY.md promises spine refuses to start when spine.conf has any group or world bit set or is owned by someone other than root or the startup euid. The old soft-warn path leaked credentials on a misconfigured deploy. Open the file with O_NOFOLLOW so a planted symlink at /etc/spine.conf cannot redirect credential loading, add O_CLOEXEC so the fd does not leak into poll-script children, and treat ELOOP, mode drift, and owner mismatch as fatal. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The previous handler called stdio, time(3), localtime(3), strftime(3), and exit(3). None are async-signal-safe, so a SIGSEGV mid-allocation could deadlock inside libc. Pre-format per-signal messages at startup, emit with write(2), and use _exit(128 + signo) on SIGSEGV/SIGBUS/SIGFPE/ SIGABRT to skip atexit handlers. Drop SIGPIPE from the fatal set and ignore it process-wide so a poll script closing stdout early no longer kills spine. Remove the redundant legacy signal() loop. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…e into children Every nft_popen'd poll script used to inherit the full descriptor table, including privileged ICMP sockets, the authenticated MySQL connection, the audit netlink fd, and the log file. A poll script that shells out could reuse any of them. Set cloexec at creation time: SOCK_CLOEXEC atomically on Linux/BSD with fcntl fallback for macOS, fcntl after each raw ICMP socket(), the MySQL net.fd after connect, the audit fd after audit_open(), and O_CLOEXEC on the log open(). Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Two bugs: strncpy(name->hostname, hostname, sizeof(name->hostname)) leaves the buffer unterminated when the source equals or exceeds the bound, and strncpy(...) followed by name->hostname[strlen(token)] = 0 writes one-past-the-buffer for tokens >= sizeof. Funnel both branches through strncopy(), which caps at the bound and always NUL-terminates. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
SNMPv3 engine IDs are opaque octet strings (RFC 3411). strlen() on the engine-ID buffer truncates at the first 0x00, which any RFC-compliant engine ID is free to contain. Decode the Cacti hex-string form at DB load time, carry bytes + length in target_t and host_t, and pass both to snmp_host_init(). The hex path stays as a transitional fallback. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR addresses a broad set of security-review findings focused on memory safety, concurrency robustness, and safer cross-platform runtime behavior.
Changes:
- Adds new platform abstraction layers (process, fd, socket, error, ICMP) and integrates them into hot paths (PHP, SNMP, DB retry sleeps).
- Introduces new security/ops integrations (systemd sd_notify helpers, audit logging with rate limiting, sandbox stubs/implementations, SELinux/AppArmor assets).
- Updates build/release/tooling ecosystem (Docker, Debian/RPM packaging, local test scripts, CI workflows, Nix flake, docs) to align with the CMake/Ninja build path.
Reviewed changes
Copilot reviewed 123 out of 185 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| src/spine_audit.c / src/spine_audit.h | Adds libaudit wrapper + rate limiting for auditable security events |
| src/sql.c | Adds DB write short-circuit for dry-run; improves retry sleep portability; tightens SSL verification; sets CLOEXEC on MySQL socket |
| src/platform/* | Introduces portable wrappers for sleep/env, fds, sockets, spawn, ICMP, and error formatting |
| src/snmp_engine_id.c / src/snmp.c / src/snmp.h | Adds standalone engine-id decoder and extends SNMP session init to support binary engine IDs |
| src/systemd_notify.[ch] / etc/systemd/* | Adds sd_notify integration and systemd unit/timer templates |
| scripts/* / packaging/* / Dockerfile* / .github/workflows/* | Moves build/test/release automation towards CMake/Ninja and expands security posture automation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * -1 means "not yet attempted"; -2 means "attempted and failed, stop | ||
| * retrying" so a non-audit-enabled kernel doesn't burn syscalls forever. */ | ||
| #ifdef HAVE_LIBAUDIT | ||
| static int g_audit_fd = -1; |
There was a problem hiding this comment.
g_audit_fd and the rate-limit buckets (g_buckets) are mutated without any synchronization. If spine_audit_event() can be reached concurrently from multiple threads (e.g., worker threads tripping the circuit breaker), this introduces data races (undefined behavior) and can also leak/duplicate audit fds due to concurrent audit_open() calls. Mandatory fix: guard rate_allow() + the lazy-open block with a mutex (or use atomics + double-checked locking for g_audit_fd), and make the per-bucket counters thread-safe (mutex per bucket or a single global lock).
| int notified; | ||
| }; | ||
|
|
||
| static struct rate_bucket g_buckets[AUDIT_BUCKET__COUNT]; |
There was a problem hiding this comment.
g_audit_fd and the rate-limit buckets (g_buckets) are mutated without any synchronization. If spine_audit_event() can be reached concurrently from multiple threads (e.g., worker threads tripping the circuit breaker), this introduces data races (undefined behavior) and can also leak/duplicate audit fds due to concurrent audit_open() calls. Mandatory fix: guard rate_allow() + the lazy-open block with a mutex (or use atomics + double-checked locking for g_audit_fd), and make the per-bucket counters thread-safe (mutex per bucket or a single global lock).
| static int rate_allow(enum audit_bucket b, const char *op) { | ||
| time_t now = g_clock(); | ||
| struct rate_bucket *rb = &g_buckets[b]; | ||
|
|
||
| if (rb->window_start == 0 || now - rb->window_start >= SPINE_AUDIT_RATE_WINDOW_SECS) { | ||
| rb->window_start = now; | ||
| rb->count = 0; | ||
| rb->notified = 0; | ||
| } | ||
|
|
||
| if (rb->count < SPINE_AUDIT_RATE_MAX) { | ||
| rb->count++; | ||
| return 1; | ||
| } | ||
|
|
||
| if (!rb->notified) { | ||
| fprintf(stderr, "NOTE: audit rate limit reached for op=%s\n", | ||
| op ? op : "(null)"); | ||
| rb->notified = 1; | ||
| } | ||
| return 0; | ||
| } | ||
|
|
||
| void spine_audit_event(const char *op, const char *detail, int result) { | ||
| if (!rate_allow(classify(op), op)) return; | ||
|
|
||
| #ifdef HAVE_LIBAUDIT | ||
| if (g_audit_fd == -2) return; | ||
| if (g_audit_fd == -1) { | ||
| g_audit_fd = audit_open(); | ||
| if (g_audit_fd < 0) { | ||
| g_audit_fd = -2; | ||
| return; | ||
| } |
There was a problem hiding this comment.
g_audit_fd and the rate-limit buckets (g_buckets) are mutated without any synchronization. If spine_audit_event() can be reached concurrently from multiple threads (e.g., worker threads tripping the circuit breaker), this introduces data races (undefined behavior) and can also leak/duplicate audit fds due to concurrent audit_open() calls. Mandatory fix: guard rate_allow() + the lazy-open block with a mutex (or use atomics + double-checked locking for g_audit_fd), and make the per-bucket counters thread-safe (mutex per bucket or a single global lock).
| static void spine_mysql_set_cloexec(MYSQL *mysql) { | ||
| #ifndef _WIN32 | ||
| if (mysql == NULL) return; | ||
| int fd = (int) mysql->net.fd; | ||
| if (fd < 0) return; | ||
| int fl = fcntl(fd, F_GETFD); | ||
| if (fl >= 0) { | ||
| (void) fcntl(fd, F_SETFD, fl | FD_CLOEXEC); | ||
| } |
There was a problem hiding this comment.
This accesses mysql->net.fd, which is an internal struct member and not part of the public MySQL C API. Even if it works on current MySQL/MariaDB client libs, it’s brittle across connector versions and can break compilation (or silently change semantics) when the library updates its internal layout. Prefer the supported API when available (e.g., mysql_get_socket() behind a feature test) and fall back to best-effort behavior if the API isn’t present.
| static void spine_nanosleep_us(unsigned long microseconds) { | ||
| struct timespec req; | ||
| req.tv_sec = (time_t)(microseconds / 1000000UL); | ||
| req.tv_nsec = (long)((microseconds % 1000000UL) * 1000UL); | ||
| while (nanosleep(&req, &req) == -1 && (req.tv_sec > 0 || req.tv_nsec > 0)) { | ||
| /* Resume after EINTR with remaining time. */ | ||
| } | ||
| } |
There was a problem hiding this comment.
These retry loops don’t check errno == EINTR. If nanosleep() fails for any other reason (e.g., EINVAL due to an unexpected platform quirk), the loop can spin indefinitely because the “remaining time” condition is not a reliable proxy for “interrupted by signal.” Mandatory fix: retry only when nanosleep() fails with errno == EINTR; otherwise break/return.
| hex += 2; | ||
| hex_len -= 2; | ||
| } | ||
| if (hex_len == 0 || (hex_len % 2) != 0 || (int)(hex_len / 2) > bin_cap) { |
There was a problem hiding this comment.
(int)(hex_len / 2) > bin_cap can overflow/truncate when hex_len is large (size_t → int), potentially turning an oversized input into a “passes” case. Fix by keeping the comparison in size_t space (e.g., compare (hex_len / 2) to (size_t)bin_cap) after the existing bin_cap <= 0 guard.
| if (hex_len == 0 || (hex_len % 2) != 0 || (int)(hex_len / 2) > bin_cap) { | |
| if (hex_len == 0 || (hex_len % 2) != 0 || (hex_len / 2) > (size_t)bin_cap) { |
| @@ -0,0 +1,308 @@ | |||
| name: Static Analysis | |||
There was a problem hiding this comment.
The PR title/description focuses on “memory and concurrency cleanup from the security review”, but the diff also introduces substantial build/packaging/CI/workflow infrastructure (multiple new workflows, Nix flake, Docker and packaging conversion to CMake/Ninja, SELinux/AppArmor assets, systemd units/timer, devcontainer, etc.). This scope expansion makes the PR harder to review and reason about operationally. Mandatory: either update the PR description to explicitly list these additional deliverables, or split the infrastructure/build-system work into a separate PR from the runtime safety fixes.
|
|
||
| # Sanity checks | ||
| [ ! -e configure.ac ] && echo "ERROR: Your current working directory must be the SVN check out of Spine" && exit -1 | ||
| [ ! -e CMakeLists.txt ] && echo "ERROR: Your current working directory must be a Spine source checkout" && exit -1 |
There was a problem hiding this comment.
exit -1 is not portable in shell scripts (exit codes are 0–255; -1 becomes 255). Use exit 1 (or another explicit positive code) for predictable behavior across shells and CI environments.
| #elif defined(__sun) || defined(__sun__) | ||
| (void) pthread_setname_np(pthread_self(), name); |
There was a problem hiding this comment.
The condition redundantly checks defined(__sun) twice. This should be deduplicated (e.g., defined(__sun) || defined(__sun__) if you intended both macros, or just one if only one is expected in practice) to reduce noise and avoid confusion during future portability work.
| #if defined(MARIADB_BASE_VERSION) || defined(MARIADB_VERSION_ID) | ||
| # define SPINE_SSL_VERIFY_T my_bool | ||
| #elif defined(MYSQL_VERSION_ID) && MYSQL_VERSION_ID >= 80000 | ||
| # define SPINE_SSL_VERIFY_T bool | ||
| #else | ||
| # define SPINE_SSL_VERIFY_T my_bool | ||
| #endif |
There was a problem hiding this comment.
SPINE_SSL_VERIFY_T is defined inside a .c file but not #undef’d afterward. Even though it’s in a single translation unit, leaving helper macros defined beyond their usage increases the chance of accidental reuse/collision in later edits. Consider #undef SPINE_SSL_VERIFY_T after the last use (optional but helps keep preprocessor scope tight).
| size_t body_len = BUFSIZE + 64; | ||
| char *body = malloc(body_len + 2); | ||
| ASSERT_TRUE(body != NULL); | ||
| memcpy(body, "DB_Host ", 8); |
| remaining = MEGA_BUFSIZE - (qp - querybuf); | ||
| qp += snprintf(qp, remaining, " WHERE disabled = ''"); | ||
| size_t remaining = (size_t)(MEGA_BUFSIZE - (qp - querybuf)); | ||
| int n; |
| sqlp += n; remaining -= (size_t)n; | ||
|
|
||
| sqlp += append_hostrange(sqlp, "host_id"); | ||
| remaining = BUFSIZE - (size_t)(sqlp - sqlbuf); |
| @@ -711,12 +765,30 @@ | |||
| set.end_host_id, | |||
| num_rows)); | |||
| } else { | |||
| size_t remaining = BUFSIZE; | |||
| int n; | |||
| if (buff[i] != '\0') { | ||
| spine_config_warn("WARNING: %s:%d embedded NUL byte rejected\n", | ||
| file, lineno); | ||
| return -1; |
|
|
||
| static int long_flag_is_cred(const char *flag, size_t flag_len) { | ||
| int i; | ||
| size_t n; |
|
|
||
| static int long_flag_is_cred(const char *flag, size_t flag_len) { | ||
| int i; | ||
| size_t n; |
98ab52a to
34169c8
Compare
| char *body = malloc(body_len + 2); | ||
| ASSERT_TRUE(body != NULL); | ||
| memcpy(body, "DB_Host ", 8); | ||
| memset(body + 8, 'x', body_len - 8); |
| ASSERT_TRUE(body != NULL); | ||
| memcpy(body, "DB_Host ", 8); | ||
| memset(body + 8, 'x', body_len - 8); | ||
| body[body_len] = '\n'; |
| memcpy(body, "DB_Host ", 8); | ||
| memset(body + 8, 'x', body_len - 8); | ||
| body[body_len] = '\n'; | ||
| body[body_len + 1] = '\0'; |
H1 (SECURITY.md) refuses startup when spine.conf has any group or world bit set or is owned by someone other than root or the startup euid. The integration fixtures shipped the file at the COPY/checkout default of 0644, which H1 correctly rejects. Dockerfile: add RUN chmod 0600 after the COPY so the baked conf (owned by root, mode 0600) satisfies both H1 checks. This fixes the Docker Integration Tests, Poll Timing Benchmark, and build-cmake-linux- sanitizers jobs that use the production image without a bind-mount override. docker-compose.yml: the bind-mount of tests/snmpv3/spine/spine.conf arrives at 0644 from the git checkout and cannot be chmod'd through a :ro mount. Switch to a staging mount at spine.conf.src, set user: "0:0" so the process euid is root, and add a wrapper entrypoint that installs the staged conf to /tmp/spine.conf at 0600 before exec-ing spine. ci.yml / distro-matrix.yml: add chmod 0600 tests/snmpv3/spine/spine.conf before the cmake build in both FreeBSD jobs so ctest passes on FreeBSD 14 (Tier 1). Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…hecks
read_spine_config() relied on sscanf("%15s %255s"), which silently
truncated keywords > 15 bytes and values > 255 bytes and dropped the
remainder of any value containing whitespace. The new tokenizer reads
whole lines, rejects over-length lines, embedded NULs, and overlong
keywords with a warning, and preserves interior whitespace so passwords
with spaces round-trip. DB_Port, RDB_Port, DB_UseSSL, RDB_UseSSL, and
CircuitBreakerThreshold now range-check via strtoul. Config open uses
O_NOFOLLOW|O_CLOEXEC so a swapped symlink cannot redirect credential
reads, and ELOOP is reported distinctly. Security-relevant warnings
route through spine_config_warn() so systemd captures them unconditionally.
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Spine inherits whatever umask the service manager hands it, commonly 0022 on systemd. A 0022 mask leaves log files group-readable, which matters because spine.log captures query errors that can include hosts, user names, and SQL fragments. umask(027) is installed immediately after spine_capture_startup_euid(), before any fd is opened, so the log file, pid file, and any temp files land as 0640. The prior umask is stashed and logged at debug level so operators can see what they inherited. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The inline scrub in main() only fired on the happy path. die() bypassed it, leaving credentials in memory for any post-mortem that catches the fatal-error exit (core dump, journal rotation, etc.). Extracted the logic to spine_scrub_secrets() in util.c so both paths go through the same code. The helper uses explicit_bzero() when available (glibc >= 2.25, musl, *BSD) and a volatile-pointer memset fallback elsewhere. Signal handlers remain untouched; they must stay AS-safe and scrubbing from them is not. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
mkstemp(3) honours the process umask, so on Linux with umask 022 temp files land at 0644. open(2) mode applies only when O_CREAT creates a new file; re-opening an existing mkstemp path with O_CREAT|O_TRUNC,0600 leaves the inode mode untouched. H1 then rejects every test config with rc=-1 before the parser is even reached. Replace write_conf with make_test_conf, which calls fchmod(fd, 0600) on the mkstemp fd before writing content, matching the pattern used in test_config_perms.c. Also add umask(077) and spine_capture_startup_euid() at the top of main so the owner check passes (the test process creates the file, so geteuid() matches st_uid). Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
If strftime returns 0 the buffer is left untouched, so the subsequent strncat in spine_log would read uninitialized stack bytes. Zero the first byte up front so the buffer is always a valid C string. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine_auth and spine_priv are fixed BUFSIZE arrays. The strcat-based list builder has no bounds check, so a large net-snmp capability set could walk off the end. Replace with a local APPEND_CSV_TOKEN macro that tracks used length and truncates instead of overflowing. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The per-device details slot can be cleared by reload or delete under LOCK_THDET while a worker is still running. The completion block dereferences details[device_counter] unconditionally, which segfaults if the slot was freed. Hold LOCK_THDET while testing for NULL and skip the accounting with a debug log when the slot is gone. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The trip path dropped spine_cb_lock and then logged host_id and skip_cycles. A concurrent reload can free the entry between unlock and log, turning the second read into a use-after-free. Snapshot both values into stack locals under the lock, unlock, then log from the locals. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
34169c8 to
9c595a3
Compare
|
Superseded by consolidated PR #535. All 11 memory/concurrency cleanup commits are now on feat/distro-test-matrix via fast-forward. |
Memory and concurrency cleanup from the security review.
Scope
log: initializeflogmessagebeforestrftimeto prevent reading uninitialized stack on strftime failure.util: guarddie()strncatagainst size underflow; replacestrcatin auth/priv list builder with boundedsnprintf-append; guardsnprintfaccumulators against truncation inappend_hostrangeand related sites.breaker: stack-copy state before unlock so logging survives a future concurrent delete; clampCircuitBreakerThresholdto[1, 1000000].nft_popen: replaceassert(prev != NULL)with a runtime null check for concurrent unlink (safer underNDEBUG).db: null-guardmysql_fetch_rowdereferences at six hot-loop call sites.php: breakreadpipeloop when the result buffer is full (prevents 0-size read spin).log: redact SNMP community strings and auth args (-c,-u,-a,-x,-p,--password,--secret) from command-error logs.signals: setsa.sa_flagsexplicitly per sigaction block.poller: null-guarddetails[device_counter]in the worker thread.Verification
21/21 tests pass. New tests:
test_snprintf_guard,test_redact_args; extendedtest_circuit_breakerwith clamp case.