
Speed up dotnet test: staged approach (xunit.runner.json first; in-process CLI only as separately-merged refactors) #94

@nickna

Description

Background

dotnet test runs at <10% CPU on a 12-core machine across ~10,300 tests in a single assembly (SharpTS.Tests, xUnit v2, no xunit.runner.json). Wall-time variance is ~30s+ run-to-run on the Windows reference box, so any change must be measured against a median-of-3 baseline.

This issue was attempted once (2026-04-30) and fully reverted. That attempt bundled in-process CLI + in-process compiled execution + AsyncLocal ProcessBuiltIns + xunit.runner.json + collection re-shuffles + thread tuning all at once. Wall time went 5:13 baseline → 4:07 best → 6:17 final, with new sporadic flakes and CLR fatal 0x80131506 host crashes (collectible-ALC unload races; Assembly.LoadFrom simple-name collisions on assemblies all named "test"). Lessons captured in the project memory note feedback_test_perf_changes.md. The cost of getting this wrong is high; the cost of staging it small is low.

Root causes (unchanged from original analysis)

  1. Process-spawning integration tests dominate wall time. ~152 tests use CliTestHelper.RunCli / StandaloneDllTests to spawn dotnet SharpTS.dll. Each spawn pays ~300–800 ms of JIT/startup; the xUnit worker blocks in WaitForExit.
  2. xUnit v2 default = collection-per-class. [Theory] rows within a class run serially. Slowest classes: CliVarTests (23), CliCompileTests (18), CliScriptExecutionTests (15), CliBundlerTests (15), CliErrorTests (12).
  3. Broad [Collection] groupings serialize unrelated tests. TimerTestsCollection sets DisableParallelization = true and pulls in 6+ files (TimerTests, MicrotaskTests, TimersModuleTests, TimersPromisesModuleTests, ProcessNextTickTests, FsAsyncTests). ScriptArgs and ClusterTests add more.
  4. Single test assembly → no inter-assembly parallelism.

Mandatory protocol for this work

  • Establish baseline first. Median of 3 clean dotnet test runs before any change. A single timing is meaningless given run-to-run variance.
  • Change one variable at a time. Land, measure, repeat. No bundled PRs.
  • Never quote wall time from a run that aborted or test-host-crashed. A truncated run is not a faster run.
  • Don't preemptively widen DisableParallelization collections. Each addition serializes more tests with each other. Only add to a serialized collection after directly observing the class fail under parallel load.

Staged plan

Stage 1 — xunit.runner.json (low risk, do this first)

Add SharpTS.Tests/xunit.runner.json:

{
  "parallelizeAssembly": true,
  "parallelizeTestCollections": true,
  "maxParallelThreads": -1
}

Wire via csproj:

<ItemGroup>
  <Content Include="xunit.runner.json">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
</ItemGroup>

Start with -1 (= core count). Only consider oversubscribing (24) if a measured run shows worker idle time during the subprocess-bound phase.

Acceptance: measured median wall time improves; pass/fail counts unchanged; no new flakes after 3 consecutive clean runs.

Stage 2 — Document dotnet test --no-build (free)

Add a one-liner to CLAUDE.md for iteration loops where source is unchanged.
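The documented loop might read as follows (the flags are standard `dotnet` CLI; the filter value is illustrative):

```shell
# Build once, then iterate without paying the MSBuild pass on every run:
dotnet build
dotnet test --no-build

# Narrow the loop further while iterating on one area:
dotnet test --no-build --filter "FullyQualifiedName~CliVarTests"
```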

Stage 3 — Split largest [Theory] classes (mechanical, low risk)

Only if Stage 1 leaves measurable headroom. Targets: CliVarTests (23), CliCompileTests (18), CliScriptExecutionTests (15), CliBundlerTests (15), CliErrorTests (12). Split into smaller classes so theory rows can run in parallel under collection-per-class.
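A hypothetical sketch of the mechanical split (class and row names are illustrative, not taken from the repo). Each resulting class becomes its own collection under xUnit's collection-per-class default, so the parts run concurrently with no runner configuration changes:

```csharp
// Before: one class = one collection; all 23 theory rows execute serially.
public class CliVarTests
{
    [Theory]
    [InlineData("let x = 1;")]
    [InlineData("const x = 1;")]
    public void Declarations(string src) { /* spawn CLI, assert output */ }
    // ...more theories...
}

// After: three classes = three collections that parallelize
// against each other.
public class CliVarDeclarationTests  { /* declaration rows */ }
public class CliVarScopingTests      { /* scoping rows */ }
public class CliVarReassignmentTests { /* reassignment rows */ }
```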

Acceptance: measured improvement on the affected classes' time; no new flakes.

Stage 4 — Audit TimerTests / ScriptArgs / ClusterTests collections (per-file, evidence-driven)

For each file currently in a DisableParallelization collection, remove it and demonstrate that the affected tests either (a) pass 3× under parallel load, or (b) fail — in which case the file stays put and the failure mode is documented inline.

Files to review:

  • Infrastructure/TimerTestsCollection.cs
  • SharedTests/{TimerTests,MicrotaskTests}.cs
  • SharedTests/BuiltInModules/{TimersModuleTests,TimersPromisesModuleTests,ProcessNextTickTests,FsAsyncTests,ClusterModuleTests}.cs
  • InterpreterTests/CommandLineArgumentTests.cs, SharedTests/CommandLineArgumentTests.cs
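For reference, the xUnit pattern under audit — a sketch assuming a collection name (the real definition lives in Infrastructure/TimerTestsCollection.cs):

```csharp
// The definition that serializes everything tagged with it:
[CollectionDefinition("TimerTests", DisableParallelization = true)]
public class TimerTestsCollection { }

// Membership is per-class; moving a class out of the serialized
// collection is a one-line deletion (then verify 3x under parallel load):
[Collection("TimerTests")]
public class MicrotaskTests { /* ... */ }
```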

Acceptance: any move out of a serialized collection passes 3 consecutive parallel runs.

Stage 5 — In-process CLI (largest theoretical win, largest risk; separate PRs per prerequisite)

This is not a single refactor. The previous attempt's failure modes prove that the prerequisites are real architectural changes that must land independently and be validated in isolation. Do not start Stage 5 unless Stages 1–4 are exhausted and the user explicitly opts in.

Prerequisite PRs (each independently mergeable, each measured):

5a. Extract CliEntry.Run(string[] args, TextWriter stdout, TextWriter stderr) → int from Program.cs. Program.Main becomes a thin shim. No behavioral change yet. Land + ship.
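A minimal sketch of the extraction (the Run signature is from this issue; the body placement is illustrative):

```csharp
public static class CliEntry
{
    // All logic currently in Program.Main moves here, writing to the
    // injected writers instead of touching Console directly.
    public static int Run(string[] args, TextWriter stdout, TextWriter stderr)
    {
        // ...existing CLI pipeline...
        return 0;
    }
}

public static class Program
{
    // Thin shim: `dotnet SharpTS.dll` behavior is unchanged.
    public static int Main(string[] args)
        => CliEntry.Run(args, Console.Out, Console.Error);
}
```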

5b. AsyncLocal<TextWriter> Console redirector. Build a shared redirector and convert every current Console.SetOut/SetError caller to it:

  • SharpTS.Tests/LspTests/Helpers/LspBridgeTestHelper.cs
  • SharpTS.Tests/SdkTests/DiagnosticReporterTests.cs (CaptureStdOut, CaptureStdErr)
  • SharpTS.Tests/Infrastructure/TestHarness.cs (CompileAndRun)

As long as any raw Console.SetOut caller survives, concurrent in-process tests can siphon each other's stdout and fail with empty-output assertions. The conversion is all-or-nothing.
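One possible shape for the shared redirector (a sketch; the type and member names are assumptions): a single TextWriter installed once via Console.SetOut that delegates per async flow through an AsyncLocal, falling back to the original console for flows that never redirected:

```csharp
public sealed class AsyncLocalWriter : TextWriter
{
    private static readonly AsyncLocal<TextWriter?> Current = new();
    private readonly TextWriter _fallback;

    private AsyncLocalWriter(TextWriter fallback) => _fallback = fallback;

    // Install once at fixture startup; replaces every Console.SetOut call site.
    public static void Install() =>
        Console.SetOut(new AsyncLocalWriter(Console.Out));

    // Scope stdout to the current async flow and its awaited children.
    public static IDisposable Redirect(TextWriter sink)
    {
        var previous = Current.Value;
        Current.Value = sink;
        return new Scope(() => Current.Value = previous);
    }

    private TextWriter Target => Current.Value ?? _fallback;
    public override Encoding Encoding => Target.Encoding;
    public override void Write(char value) => Target.Write(value);
    public override void Write(string? value) => Target.Write(value);

    private sealed class Scope : IDisposable
    {
        private readonly Action _undo;
        public Scope(Action undo) => _undo = undo;
        public void Dispose() => _undo();
    }
}
```

An equivalent wrapper is needed for Console.SetError.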

5c. AsyncLocal-ize ProcessBuiltIns for argv, cwd-shadow, env. Note: CWD is process-global with no AsyncLocal equivalent — the policy is that in-process tests must use absolute paths and never mutate Environment.CurrentDirectory. Document this in a contributing note.
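A sketch of the shape (member names illustrative, not the real ProcessBuiltIns surface): per-flow overrides with the process-global value as fallback, and deliberately no CWD override:

```csharp
public static class ProcessState
{
    private static readonly AsyncLocal<string[]?> _argv = new();

    // In-process tests set this; subprocess runs fall through to the
    // real command line.
    public static string[] Argv
    {
        get => _argv.Value ?? Environment.GetCommandLineArgs();
        set => _argv.Value = value;
    }

    // Deliberately absent: a CWD override. Environment.CurrentDirectory
    // is process-global, so in-process tests must use absolute paths and
    // never mutate it.
}
```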

5d. Fix TestHarness assembly naming. new ILCompiler("test") is called in 7 places. Assembly.LoadFrom keys identity by simple name → parallel tests collide and see each other's IL → host crash. Every compiled assembly needs a GUID-based unique name and a matching runtimeconfig.json filename.
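A sketch of the fix (variable and helper names are assumptions):

```csharp
// Unique simple name per compiled assembly: Assembly.LoadFrom keys
// identity by simple name, so "test" collides across parallel tests.
var asmName  = $"test-{Guid.NewGuid():N}";
var compiler = new ILCompiler(asmName);   // replaces new ILCompiler("test")

// The runtimeconfig must share the simple name or the host won't pair it
// with the DLL.
var dllPath    = Path.Combine(outputDir, asmName + ".dll");
var configPath = Path.Combine(outputDir, asmName + ".runtimeconfig.json");
File.Copy(templateRuntimeConfigPath, configPath);
```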

5e. Decide the process.argv compiled-mode policy. Compilation/RuntimeEmitter.ProcessHelpers.cs::EmitProcessGetArgv reads Environment.GetCommandLineArgs() directly in emitted IL — it does not go through ProcessBuiltIns. The ~7 compiled CommandLineArgumentTests cases will see the test runner's command line, not the test's args. Either:

  • rewrite the IL emitter to call through an AsyncLocal-aware shim, or
  • keep those specific tests on the subprocess path (CliTestHelper.RunCliInProcess returns null/throws → fall back to Process.Start).

5f. Convert CliTestHelper.RunCli to call CliEntry.Run in-process, with the subprocess path retained as a fallback (used by 5e and at least one RealPackageSmokeTests case to exercise the real Program.Main).
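A sketch of the dual-path helper (CliResult and any method name beyond RunCli/RunCliInProcess are illustrative):

```csharp
public static CliResult RunCli(string[] args, bool forceSubprocess = false)
{
    if (!forceSubprocess)
    {
        // Returns null when the case can't run in-process
        // (e.g. the compiled-argv tests from 5e).
        var inProc = RunCliInProcess(args);
        if (inProc != null)
            return inProc;
    }
    // Existing Process.Start path, retained for 5e cases and the
    // RealPackageSmokeTests coverage of the real Program.Main.
    return RunCliViaSubprocess(args);
}
```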

Acceptance for Stage 5 (cumulative, after 5a–5f):

  • Median wall time improves measurably vs. post-Stage-4 baseline.
  • 3 consecutive clean runs with no new flakes.
  • At least one real-process smoke test still exercises Program.Main.
  • No host-process crashes (0x80131506 or otherwise) across 5 consecutive runs.

Acceptance criteria (overall)

  • Each landed stage has a documented before/after median wall time (3 runs each).
  • No regressions in test pass/fail counts at any stage.
  • No new flakiness — verified by 3 consecutive clean runs after the stage lands.
  • At least one real-process smoke test still exercises Program.Main if Stage 5f lands.

Out of scope / explicit non-goals

  • Bundling multiple stages into one PR.
  • Quoting speedups from truncated/crashed runs.
  • Adding more files to DisableParallelization collections without observed parallel-load failure.
  • In-process execution of compiled DLLs (Stage 5d's risk surface) without GUID assembly naming and runtimeconfig parity.
