CXXCBC-830: Unstick Jenkins matrix, shrink Windows wall-clock, survive fork(child)#952
Merged
Conversation
46a1d30 to
19a6035
Compare
…e fork(child) The cxx-scripted-build-pipeline regressed on every Linux stage after CXXCBC-777 / CXXCBC-778 added gRPC and protobuf as FetchContent dependencies, the Windows stage's wall-clock grew to ~2 h, and once the build matrix turned green the integration suite then surfaced a null-pointer dereference in the fork example plus a stale env-vs-runtime gate in one observability test. Seven consecutive runs (5223, 5224, 5226, 5227, 5229, 5232, 5238) peeled off cascading failures at successive layers; CXXCBC-830 enumerates the observed symptoms. Four groups of code-side fixes: * Third-party wiring. cmake/ThirdPartyDependencies.cmake sets CURL_ENABLE_EXPORT_TARGET=OFF and CURL_DISABLE_SRP=ON before the OpenTelemetry cpmaddpackage so curl can configure and compile against our static BoringSSL on Linux hosts without libcurl-dev. cmake/Cache.cmake stops propagating find_program's CACHE_BINARY-NOTFOUND sentinel into CMAKE_*_COMPILER_LAUNCHER. cmake/CompilerWarnings.cmake confines four clang-only -Wno-* flags to the CLANG_WARNINGS list so GCC's -Werror is happy. * Toolchain strictness. core/protocol/cmd_mutate_in.hxx, test/test_integration_tracer.cxx, and tools/sysinfo.cxx each get a single-line change to satisfy GCC 8/9/10's stricter -Wconversion, -Wpedantic, and -Wsign-compare respectively. test/test_unit_mcbp_session.cxx moves a constexpr int into the lambda body so MSVC accepts it without making clang complain about an unused capture. * Windows throughput. CMakeLists.txt adds add_compile_options(/MP) for MSVC at top scope, scoped to C/CXX via $<COMPILE_LANGUAGE> so it does not reach BoringSSL's NASM-assembled x86 .asm sources. All FetchContent dependencies inherit the flag. bin/build-tests.rb and .github/workflows/build.yml let --parallel default to Etc.nprocessors instead of capping at 1 or 2. cmake/GrpcProtobuf.cmake marks six unused gRPC subtargets EXCLUDE_FROM_ALL so grpc_unsecure et al. stop landing in ALL_BUILD. * Runtime correctness. core/cluster.cxx stops std::move()'ing the tracer_, meter_, app_telemetry_meter_, app_telemetry_reporter_, and orphan_reporter_ members out during close(): the public-API cluster::notify_fork(fork_event::child) recreates the outer impl, but pre-fork bucket/collection refs keep the inner core impl alive through their shared_ptr, so observability fields zeroed by close() produced null derefs on the next call. Stop in place instead, with a comment block above close() explaining the rule for future maintainers (members whose public accessor does not check stopped_ must not be moved out on close). test/test_integration_wrapper_tracer.cxx changes its version gate from integration.ctx.version (sourced from TEST_SERVER_VERSION env var, defaults to 6.6.0 when unset and the Jenkins pipeline does not export it) to integration.cluster_version() (runtime-derived from /pools/default), aligning with every other test in the suite. The Jenkins pipeline definition still passes /m:4 explicitly; that external one-line change is documented in jenkins-disk-defense.md. Local reconfigure + SDK rebuild + affected-test rebuild under GCC 16 are clean; matrix validation comes from the next cxx-scripted-build-pipeline run.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses CI/build regressions introduced by new FetchContent dependencies (gRPC/protobuf/OpenTelemetry), reduces Windows build wall-clock time, and fixes a fork(child) runtime crash by making cluster shutdown preserve observability state for still-live pre-fork references.
Changes:
- CMake/CI: adjust curl/OpenTelemetry wiring, cache launcher handling, and compiler warning flags to keep GCC/MSVC/Clang builds green.
- Windows throughput: enable MSVC multiprocess compilation, increase build parallelism defaults, and exclude unused gRPC subtargets from ALL_BUILD.
- Runtime/tests: prevent observability shared_ptr members from being moved out during
cluster_impl::close(), and align an integration test’s version gate with runtime-derived cluster version.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
CMakeLists.txt |
Enables MSVC /MP (scoped by compile language) to improve Windows compile throughput. |
cmake/ThirdPartyDependencies.cmake |
Forces curl config options to work with static BoringSSL in the OpenTelemetry FetchContent path. |
cmake/Cache.cmake |
Prevents propagating *-NOTFOUND cache sentinel into compiler launcher variables. |
cmake/CompilerWarnings.cmake |
Moves clang-only -Wno-* suppressions out of common/GCC warning sets to avoid GCC -Werror failures. |
cmake/GrpcProtobuf.cmake |
Marks unused gRPC targets EXCLUDE_FROM_ALL to avoid compiling them on Windows ALL_BUILD. |
core/protocol/cmd_mutate_in.hxx |
Fixes a GCC conversion warning via explicit cast in bitmasking. |
core/cluster.cxx |
Stops observability members in place during close to avoid null derefs after fork(child) with still-live references. |
tools/sysinfo.cxx |
Fixes a sign-compare warning in PID range validation via explicit cast. |
test/test_unit_mcbp_session.cxx |
Adjusts constexpr placement to satisfy MSVC without triggering clang unused-capture warnings. |
test/test_integration_wrapper_tracer.cxx |
Switches feature gating to runtime-derived cluster_version() to match suite behavior. |
test/test_integration_tracer.cxx |
Removes an extra trailing semicolon to satisfy stricter compiler diagnostics. |
bin/build-tests.rb |
Defaults build parallelism to Etc.nprocessors rather than a fixed low value. |
.github/workflows/build.yml |
Stops pinning CB_NUMBER_OF_JOBS, relying on the script’s new auto-detection. |
thejcfactor
approved these changes
May 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The cxx-scripted-build-pipeline regressed on every Linux stage after CXXCBC-777 / CXXCBC-778 added gRPC and protobuf as FetchContent dependencies, the Windows stage's wall-clock grew to ~2 h, and once the build matrix turned green the integration suite then surfaced a null-pointer dereference in the fork example plus a stale env-vs-runtime gate in one observability test. Seven consecutive runs (5223, 5224, 5226, 5227, 5229, 5232, 5238) peeled off cascading failures at successive layers; CXXCBC-830 enumerates the observed symptoms.
Four groups of code-side fixes:
Third-party wiring. cmake/ThirdPartyDependencies.cmake sets CURL_ENABLE_EXPORT_TARGET=OFF and CURL_DISABLE_SRP=ON before the OpenTelemetry cpmaddpackage so curl can configure and compile against our static BoringSSL on Linux hosts without libcurl-dev. cmake/Cache.cmake stops propagating find_program's CACHE_BINARY-NOTFOUND sentinel into CMAKE__COMPILER_LAUNCHER. cmake/CompilerWarnings.cmake confines four clang-only -Wno- flags to the CLANG_WARNINGS list so GCC's -Werror is happy.
Toolchain strictness. core/protocol/cmd_mutate_in.hxx, test/test_integration_tracer.cxx, and tools/sysinfo.cxx each get a single-line change to satisfy GCC 8/9/10's stricter -Wconversion, -Wpedantic, and -Wsign-compare respectively. test/test_unit_mcbp_session.cxx moves a constexpr int into the lambda body so MSVC accepts it without making clang complain about an unused capture.
Windows throughput. CMakeLists.txt adds add_compile_options(/MP) for MSVC at top scope, scoped to C/CXX via $<COMPILE_LANGUAGE> so it does not reach BoringSSL's NASM-assembled x86 .asm sources. All FetchContent dependencies inherit the flag. bin/build-tests.rb and .github/workflows/build.yml let --parallel default to Etc.nprocessors instead of capping at 1 or 2. cmake/GrpcProtobuf.cmake marks six unused gRPC subtargets EXCLUDE_FROM_ALL so grpc_unsecure et al. stop landing in ALL_BUILD.
Runtime correctness. core/cluster.cxx stops std::move()'ing the tracer_, meter_, app_telemetry_meter_, app_telemetry_reporter_, and orphan_reporter_ members out during close(): the public-API cluster::notify_fork(fork_event::child) recreates the outer impl, but pre-fork bucket/collection refs keep the inner core impl alive through their shared_ptr, so observability fields zeroed by close() produced null derefs on the next call. Stop in place instead, with a comment block above close() explaining the rule for future maintainers (members whose public accessor does not check stopped_ must not be moved out on close). test/test_integration_wrapper_tracer.cxx changes its version gate from integration.ctx.version (sourced from TEST_SERVER_VERSION env var, defaults to 6.6.0 when unset and the Jenkins pipeline does not export it) to integration.cluster_version() (runtime-derived from /pools/default), aligning with every other test in the suite.
The Jenkins pipeline definition still passes /m:4 explicitly; that external one-line change is documented in jenkins-disk-defense.md. Local reconfigure + SDK rebuild + affected-test rebuild under GCC 16 are clean; matrix validation comes from the next cxx-scripted-build-pipeline run.