Skip to content

CXXCBC-830: Unstick Jenkins matrix, shrink Windows wall-clock, survive fork(child)#952

Merged
avsej merged 1 commit into
couchbase:mainfrom
avsej:jenkins-fixes
May 23, 2026
Merged

CXXCBC-830: Unstick Jenkins matrix, shrink Windows wall-clock, survive fork(child)#952
avsej merged 1 commit into
couchbase:mainfrom
avsej:jenkins-fixes

Conversation

@avsej
Copy link
Copy Markdown
Member

@avsej avsej commented May 18, 2026

The cxx-scripted-build-pipeline regressed on every Linux stage after CXXCBC-777 / CXXCBC-778 added gRPC and protobuf as FetchContent dependencies, the Windows stage's wall-clock grew to ~2 h, and once the build matrix turned green the integration suite then surfaced a null-pointer dereference in the fork example plus a stale env-vs-runtime gate in one observability test. Seven consecutive runs (5223, 5224, 5226, 5227, 5229, 5232, 5238) peeled off cascading failures at successive layers; CXXCBC-830 enumerates the observed symptoms.

Four groups of code-side fixes:

  • Third-party wiring. cmake/ThirdPartyDependencies.cmake sets CURL_ENABLE_EXPORT_TARGET=OFF and CURL_DISABLE_SRP=ON before the OpenTelemetry cpmaddpackage so curl can configure and compile against our static BoringSSL on Linux hosts without libcurl-dev. cmake/Cache.cmake stops propagating find_program's CACHE_BINARY-NOTFOUND sentinel into CMAKE__COMPILER_LAUNCHER. cmake/CompilerWarnings.cmake confines four clang-only -Wno- flags to the CLANG_WARNINGS list so GCC's -Werror is happy.

  • Toolchain strictness. core/protocol/cmd_mutate_in.hxx, test/test_integration_tracer.cxx, and tools/sysinfo.cxx each get a single-line change to satisfy GCC 8/9/10's stricter -Wconversion, -Wpedantic, and -Wsign-compare respectively. test/test_unit_mcbp_session.cxx moves a constexpr int into the lambda body so MSVC accepts it without making clang complain about an unused capture.

  • Windows throughput. CMakeLists.txt adds add_compile_options(/MP) for MSVC at top scope, scoped to C/CXX via $<COMPILE_LANGUAGE> so it does not reach BoringSSL's NASM-assembled x86 .asm sources. All FetchContent dependencies inherit the flag. bin/build-tests.rb and .github/workflows/build.yml let --parallel default to Etc.nprocessors instead of capping at 1 or 2. cmake/GrpcProtobuf.cmake marks six unused gRPC subtargets EXCLUDE_FROM_ALL so grpc_unsecure et al. stop landing in ALL_BUILD.

  • Runtime correctness. core/cluster.cxx stops std::move()'ing the tracer_, meter_, app_telemetry_meter_, app_telemetry_reporter_, and orphan_reporter_ members out during close(): the public-API cluster::notify_fork(fork_event::child) recreates the outer impl, but pre-fork bucket/collection refs keep the inner core impl alive through their shared_ptr, so observability fields zeroed by close() produced null derefs on the next call. Stop in place instead, with a comment block above close() explaining the rule for future maintainers (members whose public accessor does not check stopped_ must not be moved out on close). test/test_integration_wrapper_tracer.cxx changes its version gate from integration.ctx.version (sourced from TEST_SERVER_VERSION env var, defaults to 6.6.0 when unset and the Jenkins pipeline does not export it) to integration.cluster_version() (runtime-derived from /pools/default), aligning with every other test in the suite.

The Jenkins pipeline definition still passes /m:4 explicitly; that external one-line change is documented in jenkins-disk-defense.md. Local reconfigure + SDK rebuild + affected-test rebuild under GCC 16 are clean; matrix validation comes from the next cxx-scripted-build-pipeline run.

@avsej avsej force-pushed the jenkins-fixes branch 4 times, most recently from 46a1d30 to 19a6035 Compare May 22, 2026 18:43
@avsej avsej changed the title Skip curl export target so OpenTelemetry-fetched curl can configure with BoringSSL CXXCBC-830: Unstick Jenkins matrix, shrink Windows wall-clock, survive fork(child) May 22, 2026
…e fork(child)

The cxx-scripted-build-pipeline regressed on every Linux stage after
CXXCBC-777 / CXXCBC-778 added gRPC and protobuf as FetchContent
dependencies, the Windows stage's wall-clock grew to ~2 h, and once
the build matrix turned green the integration suite then surfaced a
null-pointer dereference in the fork example plus a stale env-vs-runtime
gate in one observability test. Seven consecutive runs (5223, 5224,
5226, 5227, 5229, 5232, 5238) peeled off cascading failures at
successive layers; CXXCBC-830 enumerates the observed symptoms.

Four groups of code-side fixes:

* Third-party wiring. cmake/ThirdPartyDependencies.cmake sets
  CURL_ENABLE_EXPORT_TARGET=OFF and CURL_DISABLE_SRP=ON before the
  OpenTelemetry cpmaddpackage so curl can configure and compile
  against our static BoringSSL on Linux hosts without libcurl-dev.
  cmake/Cache.cmake stops propagating find_program's
  CACHE_BINARY-NOTFOUND sentinel into CMAKE_*_COMPILER_LAUNCHER.
  cmake/CompilerWarnings.cmake confines four clang-only -Wno-*
  flags to the CLANG_WARNINGS list so GCC's -Werror is happy.

* Toolchain strictness. core/protocol/cmd_mutate_in.hxx,
  test/test_integration_tracer.cxx, and tools/sysinfo.cxx each get
  a single-line change to satisfy GCC 8/9/10's stricter
  -Wconversion, -Wpedantic, and -Wsign-compare respectively.
  test/test_unit_mcbp_session.cxx moves a constexpr int into the
  lambda body so MSVC accepts it without making clang complain
  about an unused capture.

* Windows throughput. CMakeLists.txt adds add_compile_options(/MP)
  for MSVC at top scope, scoped to C/CXX via $<COMPILE_LANGUAGE> so
  it does not reach BoringSSL's NASM-assembled x86 .asm sources.
  All FetchContent dependencies inherit the flag. bin/build-tests.rb
  and .github/workflows/build.yml let --parallel default to
  Etc.nprocessors instead of capping at 1 or 2.
  cmake/GrpcProtobuf.cmake marks six unused gRPC subtargets
  EXCLUDE_FROM_ALL so grpc_unsecure et al. stop landing in ALL_BUILD.

* Runtime correctness. core/cluster.cxx stops std::move()'ing the
  tracer_, meter_, app_telemetry_meter_, app_telemetry_reporter_, and
  orphan_reporter_ members out during close(): the public-API
  cluster::notify_fork(fork_event::child) recreates the outer impl,
  but pre-fork bucket/collection refs keep the inner core impl alive
  through their shared_ptr, so observability fields zeroed by close()
  produced null derefs on the next call. Stop in place instead, with
  a comment block above close() explaining the rule for future
  maintainers (members whose public accessor does not check stopped_
  must not be moved out on close). test/test_integration_wrapper_tracer.cxx
  changes its version gate from integration.ctx.version (sourced from
  TEST_SERVER_VERSION env var, defaults to 6.6.0 when unset and the
  Jenkins pipeline does not export it) to integration.cluster_version()
  (runtime-derived from /pools/default), aligning with every other
  test in the suite.

The Jenkins pipeline definition still passes /m:4 explicitly; that
external one-line change is documented in jenkins-disk-defense.md.
Local reconfigure + SDK rebuild + affected-test rebuild under GCC 16
are clean; matrix validation comes from the next
cxx-scripted-build-pipeline run.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses CI/build regressions introduced by new FetchContent dependencies (gRPC/protobuf/OpenTelemetry), reduces Windows build wall-clock time, and fixes a fork(child) runtime crash by making cluster shutdown preserve observability state for still-live pre-fork references.

Changes:

  • CMake/CI: adjust curl/OpenTelemetry wiring, cache launcher handling, and compiler warning flags to keep GCC/MSVC/Clang builds green.
  • Windows throughput: enable MSVC multiprocess compilation, increase build parallelism defaults, and exclude unused gRPC subtargets from ALL_BUILD.
  • Runtime/tests: prevent observability shared_ptr members from being moved out during cluster_impl::close(), and align an integration test’s version gate with runtime-derived cluster version.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
CMakeLists.txt Enables MSVC /MP (scoped by compile language) to improve Windows compile throughput.
cmake/ThirdPartyDependencies.cmake Forces curl config options to work with static BoringSSL in the OpenTelemetry FetchContent path.
cmake/Cache.cmake Prevents propagating *-NOTFOUND cache sentinel into compiler launcher variables.
cmake/CompilerWarnings.cmake Moves clang-only -Wno-* suppressions out of common/GCC warning sets to avoid GCC -Werror failures.
cmake/GrpcProtobuf.cmake Marks unused gRPC targets EXCLUDE_FROM_ALL to avoid compiling them on Windows ALL_BUILD.
core/protocol/cmd_mutate_in.hxx Fixes a GCC conversion warning via explicit cast in bitmasking.
core/cluster.cxx Stops observability members in place during close to avoid null derefs after fork(child) with still-live references.
tools/sysinfo.cxx Fixes a sign-compare warning in PID range validation via explicit cast.
test/test_unit_mcbp_session.cxx Adjusts constexpr placement to satisfy MSVC without triggering clang unused-capture warnings.
test/test_integration_wrapper_tracer.cxx Switches feature gating to runtime-derived cluster_version() to match suite behavior.
test/test_integration_tracer.cxx Removes an extra trailing semicolon to satisfy stricter compiler diagnostics.
bin/build-tests.rb Defaults build parallelism to Etc.nprocessors rather than a fixed low value.
.github/workflows/build.yml Stops pinning CB_NUMBER_OF_JOBS, relying on the script’s new auto-detection.

Comment thread tools/sysinfo.cxx
@avsej avsej merged commit 8cddb8e into couchbase:main May 23, 2026
59 of 64 checks passed
@avsej avsej deleted the jenkins-fixes branch May 23, 2026 02:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants