
Speed up dotnet test: staged approach (xunit.runner.json first; in-process CLI only as separately-merged refactors) #94

@nickna

Description

Background

dotnet test runs at <10% CPU on a 12-core machine across ~10,300 tests in a single assembly (SharpTS.Tests, xUnit v2, no xunit.runner.json). Wall-time variance is ~30s+ run-to-run on the Windows reference box, so any change must be measured against a median-of-3 baseline.

This issue was attempted once (2026-04-30) and fully reverted. That attempt bundled in-process CLI + in-process compiled execution + AsyncLocal ProcessBuiltIns + xunit.runner.json + collection re-shuffles + thread tuning all at once. Wall time went 5:13 baseline → 4:07 best → 6:17 final, with new sporadic flakes and CLR fatal 0x80131506 host crashes (collectible-ALC unload races; Assembly.LoadFrom simple-name collisions on assemblies all named "test"). Lessons captured in the project memory note feedback_test_perf_changes.md. The cost of getting this wrong is high; the cost of staging it small is low.

Root causes (unchanged from original analysis)

  1. Process-spawning integration tests dominate wall time. ~152 tests use CliTestHelper.RunCli / StandaloneDllTests to spawn dotnet SharpTS.dll. Each spawn pays ~300–800 ms of JIT/startup; the xUnit worker blocks in WaitForExit.
  2. xUnit v2 default = collection-per-class. [Theory] rows within a class run serially. Slowest classes: CliVarTests (23), CliCompileTests (18), CliScriptExecutionTests (15), CliBundlerTests (15), CliErrorTests (12).
  3. Broad [Collection] groupings serialize unrelated tests. TimerTestsCollection sets DisableParallelization = true and pulls in 6+ files (TimerTests, MicrotaskTests, TimersModuleTests, TimersPromisesModuleTests, ProcessNextTickTests, FsAsyncTests). ScriptArgs and ClusterTests add more.
  4. Single test assembly → no inter-assembly parallelism.

Mandatory protocol for this work

  • Establish baseline first. Median of 3 clean dotnet test runs before any change. A single timing is meaningless given run-to-run variance.
  • Change one variable at a time. Land, measure, repeat. No bundled PRs.
  • Never quote wall time from a run that aborted or test-host-crashed. A truncated run is not a faster run.
  • Don't preemptively widen DisableParallelization collections. Each addition serializes more tests with each other. Only add to a serialized collection after directly observing the class fail under parallel load.

Staged plan

Stage 1 — xunit.runner.json (low risk, do this first)

Add SharpTS.Tests/xunit.runner.json:

{
  "parallelizeAssembly": true,
  "parallelizeTestCollections": true,
  "maxParallelThreads": -1
}

Wire via csproj:

<ItemGroup>
  <Content Include="xunit.runner.json">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
</ItemGroup>

Start with -1 (= core count). Only consider oversubscribing (24) if a measured run shows worker idle time during the subprocess-bound phase.

Acceptance: measured median wall time improves; pass/fail counts unchanged; no new flakes after 3 consecutive clean runs.

Stage 2 — Document dotnet test --no-build (free)

Add a one-liner to CLAUDE.md for iteration loops where source is unchanged.
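The documented loop might read as follows (the flags are standard `dotnet` CLI; the filter value is illustrative):

```shell
# Build once, then iterate without paying the MSBuild pass on every run:
dotnet build
dotnet test --no-build

# Narrow the loop further while iterating on one area:
dotnet test --no-build --filter "FullyQualifiedName~CliVarTests"
```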

Stage 3 — Split largest [Theory] classes (mechanical, low risk)

Only if Stage 1 leaves measurable headroom. Targets: CliVarTests (23), CliCompileTests (18), CliScriptExecutionTests (15), CliBundlerTests (15), CliErrorTests (12). Split into smaller classes so theory rows can run in parallel under collection-per-class.
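A hypothetical sketch of the mechanical split (class and row names are illustrative, not taken from the repo). Each resulting class becomes its own collection under xUnit's collection-per-class default, so the parts run concurrently with no runner configuration changes:

```csharp
// Before: one class = one collection; all 23 theory rows execute serially.
public class CliVarTests
{
    [Theory]
    [InlineData("let x = 1;")]
    [InlineData("const x = 1;")]
    public void Declarations(string src) { /* spawn CLI, assert output */ }
    // ...more theories...
}

// After: three classes = three collections that parallelize
// against each other.
public class CliVarDeclarationTests  { /* declaration rows */ }
public class CliVarScopingTests      { /* scoping rows */ }
public class CliVarReassignmentTests { /* reassignment rows */ }
```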

Acceptance: measured improvement on the affected classes' time; no new flakes.

Stage 4 — Audit TimerTests / ScriptArgs / ClusterTests collections (per-file, evidence-driven)

For each file currently in a DisableParallelization collection, remove it and demonstrate that the affected tests either (a) pass 3× under parallel load, or (b) fail — in which case the file stays put and the failure mode is documented inline.

Files to review:

  • Infrastructure/TimerTestsCollection.cs
  • SharedTests/{TimerTests,MicrotaskTests}.cs
  • SharedTests/BuiltInModules/{TimersModuleTests,TimersPromisesModuleTests,ProcessNextTickTests,FsAsyncTests,ClusterModuleTests}.cs
  • InterpreterTests/CommandLineArgumentTests.cs, SharedTests/CommandLineArgumentTests.cs
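For reference, the xUnit pattern under audit — a sketch assuming a collection name (the real definition lives in Infrastructure/TimerTestsCollection.cs):

```csharp
// The definition that serializes everything tagged with it:
[CollectionDefinition("TimerTests", DisableParallelization = true)]
public class TimerTestsCollection { }

// Membership is per-class; moving a class out of the serialized
// collection is a one-line deletion (then verify 3x under parallel load):
[Collection("TimerTests")]
public class MicrotaskTests { /* ... */ }
```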

Acceptance: any move out of a serialized collection passes 3 consecutive parallel runs.

Stage 5 — In-process CLI (largest theoretical win, largest risk; separate PRs per prerequisite)

This is not a single refactor. The previous attempt's failure modes prove that the prerequisites are real architectural changes that must land independently and be validated in isolation. Do not start Stage 5 unless Stages 1–4 are exhausted and the user explicitly opts in.

Prerequisite PRs (each independently mergeable, each measured):

5a. Extract CliEntry.Run(string[] args, TextWriter stdout, TextWriter stderr) → int from Program.cs. Program.Main becomes a thin shim. No behavioral change yet. Land + ship.
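A minimal sketch of the extraction (the Run signature is from this issue; the body placement is illustrative):

```csharp
public static class CliEntry
{
    // All logic currently in Program.Main moves here, writing to the
    // injected writers instead of touching Console directly.
    public static int Run(string[] args, TextWriter stdout, TextWriter stderr)
    {
        // ...existing CLI pipeline...
        return 0;
    }
}

public static class Program
{
    // Thin shim: `dotnet SharpTS.dll` behavior is unchanged.
    public static int Main(string[] args)
        => CliEntry.Run(args, Console.Out, Console.Error);
}
```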

5b. AsyncLocal<TextWriter> Console redirector. Build a shared redirector and convert every current Console.SetOut/SetError caller to it:

  • SharpTS.Tests/LspTests/Helpers/LspBridgeTestHelper.cs
  • SharpTS.Tests/SdkTests/DiagnosticReporterTests.cs (CaptureStdOut, CaptureStdErr)
  • SharpTS.Tests/Infrastructure/TestHarness.cs (CompileAndRun)

As long as any raw Console.SetOut caller survives, concurrent in-process tests can siphon each other's stdout and fail with empty-output assertions. The conversion is all-or-nothing.
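One possible shape for the shared redirector (a sketch; the type and member names are assumptions): a single TextWriter installed once via Console.SetOut that delegates per async flow through an AsyncLocal, falling back to the original console for flows that never redirected:

```csharp
public sealed class AsyncLocalWriter : TextWriter
{
    private static readonly AsyncLocal<TextWriter?> Current = new();
    private readonly TextWriter _fallback;

    private AsyncLocalWriter(TextWriter fallback) => _fallback = fallback;

    // Install once at fixture startup; replaces every Console.SetOut call site.
    public static void Install() =>
        Console.SetOut(new AsyncLocalWriter(Console.Out));

    // Scope stdout to the current async flow and its awaited children.
    public static IDisposable Redirect(TextWriter sink)
    {
        var previous = Current.Value;
        Current.Value = sink;
        return new Scope(() => Current.Value = previous);
    }

    private TextWriter Target => Current.Value ?? _fallback;
    public override Encoding Encoding => Target.Encoding;
    public override void Write(char value) => Target.Write(value);
    public override void Write(string? value) => Target.Write(value);

    private sealed class Scope : IDisposable
    {
        private readonly Action _undo;
        public Scope(Action undo) => _undo = undo;
        public void Dispose() => _undo();
    }
}
```

An equivalent wrapper is needed for Console.SetError.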

5c. AsyncLocal-ize ProcessBuiltIns for argv, cwd-shadow, env. Note: CWD is process-global with no AsyncLocal equivalent — the policy is that in-process tests must use absolute paths and never mutate Environment.CurrentDirectory. Document this in a contributing note.
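A sketch of the shape (member names illustrative, not the real ProcessBuiltIns surface): per-flow overrides with the process-global value as fallback, and deliberately no CWD override:

```csharp
public static class ProcessState
{
    private static readonly AsyncLocal<string[]?> _argv = new();

    // In-process tests set this; subprocess runs fall through to the
    // real command line.
    public static string[] Argv
    {
        get => _argv.Value ?? Environment.GetCommandLineArgs();
        set => _argv.Value = value;
    }

    // Deliberately absent: a CWD override. Environment.CurrentDirectory
    // is process-global, so in-process tests must use absolute paths and
    // never mutate it.
}
```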

5d. Fix TestHarness assembly naming. new ILCompiler("test") is called in 7 places. Assembly.LoadFrom keys identity by simple name → parallel tests collide and see each other's IL → host crash. Every compiled assembly needs a GUID-based unique name and a matching runtimeconfig.json filename.
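A sketch of the fix (variable and helper names are assumptions):

```csharp
// Unique simple name per compiled assembly: Assembly.LoadFrom keys
// identity by simple name, so "test" collides across parallel tests.
var asmName  = $"test-{Guid.NewGuid():N}";
var compiler = new ILCompiler(asmName);   // replaces new ILCompiler("test")

// The runtimeconfig must share the simple name or the host won't pair it
// with the DLL.
var dllPath    = Path.Combine(outputDir, asmName + ".dll");
var configPath = Path.Combine(outputDir, asmName + ".runtimeconfig.json");
File.Copy(templateRuntimeConfigPath, configPath);
```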

5e. Decide the process.argv compiled-mode policy. Compilation/RuntimeEmitter.ProcessHelpers.cs::EmitProcessGetArgv reads Environment.GetCommandLineArgs() directly in emitted IL — it does not go through ProcessBuiltIns. The ~7 compiled CommandLineArgumentTests cases will see the test runner's command line, not the test's args. Either:

  • rewrite the IL emitter to call through an AsyncLocal-aware shim, or
  • keep those specific tests on the subprocess path (CliTestHelper.RunCliInProcess returns null/throws → fall back to Process.Start).

5f. Convert CliTestHelper.RunCli to call CliEntry.Run in-process, with the subprocess path retained as a fallback (used by 5e and at least one RealPackageSmokeTests case to exercise the real Program.Main).
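A sketch of the dual-path helper (CliResult and any method name beyond RunCli/RunCliInProcess are illustrative):

```csharp
public static CliResult RunCli(string[] args, bool forceSubprocess = false)
{
    if (!forceSubprocess)
    {
        // Returns null when the case can't run in-process
        // (e.g. the compiled-argv tests from 5e).
        var inProc = RunCliInProcess(args);
        if (inProc != null)
            return inProc;
    }
    // Existing Process.Start path, retained for 5e cases and the
    // RealPackageSmokeTests coverage of the real Program.Main.
    return RunCliViaSubprocess(args);
}
```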

Acceptance for Stage 5 (cumulative, after 5a–5f):

  • Median wall time improves measurably vs. post-Stage-4 baseline.
  • 3 consecutive clean runs with no new flakes.
  • At least one real-process smoke test still exercises Program.Main.
  • No host-process crashes (0x80131506 or otherwise) across 5 consecutive runs.

Acceptance criteria (overall)

  • Each landed stage has a documented before/after median wall time (3 runs each).
  • No regressions in test pass/fail counts at any stage.
  • No new flakiness — verified by 3 consecutive clean runs after the stage lands.
  • At least one real-process smoke test still exercises Program.Main if Stage 5f lands.

Out of scope / explicit non-goals

  • Bundling multiple stages into one PR.
  • Quoting speedups from truncated/crashed runs.
  • Adding more files to DisableParallelization collections without observed parallel-load failure.
  • In-process execution of compiled DLLs (Stage 5d's risk surface) without GUID assembly naming and runtimeconfig parity.
