4 changes: 2 additions & 2 deletions pages/directory-and-name-resolution.md
@@ -113,7 +113,7 @@ public Task<SqlDatabase> Build(
string? databaseSuffix = null,
[CallerMemberName] string memberName = "")
```
<sup><a href='/src/LocalDb/SqlInstance.cs#L133-L155' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConventionBuildSignature' title='Start of snippet'>anchor</a></sup>
<sup><a href='/src/LocalDb/SqlInstance.cs#L138-L160' title='Snippet source file'>snippet source</a> | <a href='#snippet-ConventionBuildSignature' title='Start of snippet'>anchor</a></sup>
<!-- endSnippet -->

With these parameters the database name is derived as follows:
@@ -150,7 +150,7 @@ If full control over the database name is required, there is an overload that ta
/// </summary>
public async Task<SqlDatabase> Build(string dbName)
```
<sup><a href='/src/LocalDb/SqlInstance.cs#L170-L177' title='Snippet source file'>snippet source</a> | <a href='#snippet-ExplicitBuildSignature' title='Start of snippet'>anchor</a></sup>
<sup><a href='/src/LocalDb/SqlInstance.cs#L175-L182' title='Snippet source file'>snippet source</a> | <a href='#snippet-ExplicitBuildSignature' title='Start of snippet'>anchor</a></sup>
<!-- endSnippet -->

Which can be used as follows:
16 changes: 16 additions & 0 deletions src/LocalDb.MultiProcessHelper/LocalDb.MultiProcessHelper.csproj
@@ -0,0 +1,16 @@
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>
    <SignAssembly>true</SignAssembly>
    <AssemblyOriginatorKeyFile>..\key.snk</AssemblyOriginatorKeyFile>
    <RootNamespace>LocalDb.MultiProcessHelper</RootNamespace>
    <GeneratePackageOnBuild>false</GeneratePackageOnBuild>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <ProjectReference Include="..\LocalDb\LocalDb.csproj" />
    <PackageReference Include="ProjectDefaults" PrivateAssets="all" />
  </ItemGroup>
</Project>
189 changes: 189 additions & 0 deletions src/LocalDb.MultiProcessHelper/Program.cs
@@ -0,0 +1,189 @@
// Child-process driver for the multi-process race tests.
// Mode "wrapper-start": <wrapper-start> <instanceName> <directory> <signalFile>
//   Reproduces the symmetric race: runs Wrapper.Start once and reports the outcome.
// Mode "killer": <killer> <instanceName> <signalFile> <durationMs>
//   Calls LocalDbApi.StopAndDelete(name) in a tight loop for the given duration.
// Mode "victim": <victim> <instanceName> <signalFile> <durationMs>
//   Opens a SqlConnection to (LocalDb)\name in a tight loop and exits 0 ("race observed")
//   on the first exception whose Win32 native error code is 0x89C50107
//   (LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST). Other failures are recorded and the loop
//   continues. If the duration elapses without observing 0x89C50107, exits 1 when some
//   other error occurred, or 2 when no error fired at all ("race not observed in window").

using System.ComponentModel;
using LocalDb;
using Microsoft.Data.SqlClient;

if (args.Length < 1)
{
    Console.Error.WriteLine("Usage: <mode> <args...> (mode is wrapper-start | killer | victim)");
    return 64;
}

var mode = args[0];
return mode switch
{
    "wrapper-start" => await RunWrapperStartAsync(args.AsSpan()[1..].ToArray()),
    "killer" => await RunKillerAsync(args.AsSpan()[1..].ToArray()),
    "victim" => await RunVictimAsync(args.AsSpan()[1..].ToArray()),
    _ => Fail($"Unknown mode: {mode}")
};

int Fail(string message)
{
    Console.Error.WriteLine(message);
    return 64;
}

async Task<int> RunWrapperStartAsync(string[] args)
{
    if (args.Length < 3)
    {
        return Fail("wrapper-start usage: <instanceName> <directory> <signalFile>");
    }
    var instanceName = args[0];
    var directory = args[1];
    var signalFile = args[2];

    await WaitForSignalAsync(signalFile);

    try
    {
        using var wrapper = new Wrapper(instanceName, directory);
        Func<SqlConnection, Task> noOp = _ => Task.CompletedTask;
        wrapper.Start(new DateTime(2000, 1, 1), noOp);
        await wrapper.AwaitStart();
        Console.Out.WriteLine($"pid {Environment.ProcessId}: success");
        return 0;
    }
    catch (Exception exception)
    {
        ReportException(exception);
        return 1;
    }
}

async Task<int> RunKillerAsync(string[] args)
{
    if (args.Length < 3)
    {
        return Fail("killer usage: <instanceName> <signalFile> <durationMs>");
    }
    var instanceName = args[0];
    var signalFile = args[1];
    var durationMs = int.Parse(args[2]);

    await WaitForSignalAsync(signalFile);

    var deadline = Environment.TickCount64 + durationMs;
    var killCount = 0;
    while (Environment.TickCount64 < deadline)
    {
        try
        {
            LocalDbApi.StopAndDelete(instanceName);
            killCount++;
        }
        catch
        {
            // Expected: the instance may already be gone, or a victim is using it. Keep hammering.
        }
    }
    Console.Out.WriteLine($"pid {Environment.ProcessId} killer: {killCount} StopAndDelete cycles");
    return 0;
}

async Task<int> RunVictimAsync(string[] args)
{
    if (args.Length < 3)
    {
        return Fail("victim usage: <instanceName> <signalFile> <durationMs>");
    }
    var instanceName = args[0];
    var signalFile = args[1];
    var durationMs = int.Parse(args[2]);

    await WaitForSignalAsync(signalFile);

    var connectionString = $@"Data Source=(LocalDb)\{instanceName};Initial Catalog=master;Pooling=False;Connect Timeout=2";
    var deadline = Environment.TickCount64 + durationMs;
    var attempts = 0;
    Exception? otherError = null;

    while (Environment.TickCount64 < deadline)
    {
        attempts++;
        try
        {
            await using var connection = new SqlConnection(connectionString);
            await connection.OpenAsync();
        }
        catch (SqlException sql)
        {
            if (HasNativeCode(sql, unchecked((int)0x89C50107)))
            {
                Console.Out.WriteLine(
                    $"pid {Environment.ProcessId} victim: observed LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST (0x89C50107) on attempt {attempts}: {FirstLine(sql.Message)}");
                return 0;
            }
            otherError = sql;
        }
        catch (Exception other)
        {
            otherError = other;
        }
    }

    if (otherError == null)
    {
        Console.Error.WriteLine($"pid {Environment.ProcessId} victim: no errors after {attempts} attempts in {durationMs}ms");
        return 2;
    }

    Console.Error.WriteLine($"pid {Environment.ProcessId} victim: {attempts} attempts, no 0x89C50107; last other error: {otherError.GetType().Name}: {FirstLine(otherError.Message)}");
    var inner = otherError.InnerException;
    while (inner != null)
    {
        Console.Error.WriteLine($" inner: {inner.GetType().Name}: {FirstLine(inner.Message)}");
        inner = inner.InnerException;
    }
    return 1;
}

bool HasNativeCode(Exception exception, int code)
{
    var current = exception;
    while (current != null)
    {
        if (current is Win32Exception win32 && win32.NativeErrorCode == code)
        {
            return true;
        }
        current = current.InnerException;
    }
    return false;
}

async Task WaitForSignalAsync(string signalFile)
{
    while (!File.Exists(signalFile))
    {
        await Task.Delay(20);
    }
}

void ReportException(Exception exception)
{
    Console.Error.WriteLine($"pid {Environment.ProcessId}: {exception.GetType().Name}: {FirstLine(exception.Message)}");
    var inner = exception.InnerException;
    while (inner != null)
    {
        Console.Error.WriteLine($" inner: {inner.GetType().Name}: {FirstLine(inner.Message)}");
        if (inner is Win32Exception win32)
        {
            Console.Error.WriteLine($" NativeErrorCode: 0x{win32.NativeErrorCode:X8}");
        }
        inner = inner.InnerException;
    }
}

string FirstLine(string message) => message.Replace("\r", "").Split('\n')[0];
117 changes: 117 additions & 0 deletions src/LocalDb.MultiProcessHelper/README.md
@@ -0,0 +1,117 @@
# Multi-process race reproducers

This folder is a regression-test scaffold, not a shipped artifact. It exists to deterministically reproduce a class of races in `Wrapper.InnerStart` that surface when multiple OS processes share a single LocalDB user instance.

## Symptom

When two test-host processes (e.g. two `dotnet test` invocations, or Rider's runner concurrent with a CLI run) target the same `SqlInstance<T>` for the same Windows user, intermittent failures appear with stack traces like:

```
SetUp : Microsoft.Data.SqlClient.SqlException :
A network-related or instance-specific error occurred while establishing a connection to SQL Server.
... error: 50 - Local Database Runtime error occurred.
The specified LocalDB instance does not exist.
----> System.ComponentModel.Win32Exception : Unknown error (0x89c50107)
at Wrapper.OpenMasterConnection() in C:\projects\localdb\src\LocalDb\Wrapper.cs:line 274
at Wrapper.CreateAndDetachTemplate(...) in ...:line 229
at Wrapper.CreateDatabaseFromTemplate(String name) in ...:line 83
at EfLocalDb.SqlInstance`1.Build(String, IEnumerable`1) in ...:line 73
at EfLocalDbNunit.LocalDbTestBase`1.Reset() in ...:line 85
```

The `0x89C50107` native code is `LOCALDB_ERROR_INSTANCE_DOES_NOT_EXIST`. Other manifestations of the same underlying race include SQL deadlocks during `CREATE DATABASE [template]` and `Operating system error 2: cannot find the file specified` on `template.mdf`.

Once a machine is in this state it tends to stay broken: every subsequent `dotnet test` triggers the same race because the wrapper directory is empty (no `template.mdf`), so every process re-runs the destructive `StopAndDelete + CleanStart` branch.

## Root cause

`Wrapper.InnerStart` (LocalDb.csproj, `Wrapper.cs`):

```csharp
var info = LocalDbApi.GetInstance(instance);
if (!info.Exists) { CleanStart(); return; }
if (!info.IsRunning) { LocalDbApi.StartInstance(instance); }
if (!File.Exists(DataFile))
{
    LocalDbApi.StopAndDelete(instance);
    CleanStart(); // CreateInstance + StartInstance + CreateAndDetachTemplate
    return;
}
```

There are two unsynchronized concurrency surfaces here:

1. **In-process** — `Wrapper.semaphoreSlim` is declared but never `WaitAsync`'d; two `Wrapper` instances for the same instance name running in the same process race on `LocalDbApi.*` calls and on the SQL DDL inside `CreateAndDetachTemplate`.
2. **Cross-process** — even if (1) were fixed with an in-process lock, the `LocalDbApi.*` calls reach into the per-Windows-user LocalDB metadata, which is shared across all processes belonging to that user. Two processes both running `InnerStart` against the same instance race on `StopAndDelete` / `CreateInstance` / `StartInstance` and on the same master DB.

Both surfaces dissolve under one fix: serialize `InnerStart` per instance name with an in-process lock **and** a named cross-process mutex.
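The in-process half of that fix can be sketched as a lock keyed by instance name. This is a minimal sketch under stated assumptions, not the LocalDb implementation: `locksByInstance` and `RunSerializedAsync` are hypothetical names, and the semaphore lives in a process-wide dictionary so that two separate `Wrapper` objects for the same name share it. (If `Wrapper.semaphoreSlim` is a per-object field, it would only serialize calls made through the same `Wrapper`.) The demo counts how many bodies are inside the critical section at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical process-wide registry: one SemaphoreSlim per LocalDB instance name,
// shared by every caller in the process no matter which object it calls through.
var locksByInstance = new ConcurrentDictionary<string, SemaphoreSlim>();

// Hypothetical helper: runs `body` while holding the gate for `instanceName`.
async Task RunSerializedAsync(string instanceName, Func<Task> body)
{
    // GetOrAdd may invoke the factory more than once under contention,
    // but every caller receives the single instance that was stored.
    var gate = locksByInstance.GetOrAdd(instanceName, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();
    try
    {
        await body();
    }
    finally
    {
        gate.Release();
    }
}

// Demo: ten concurrent "starts" for one instance name, tracking the highest
// number of bodies observed inside the critical section at the same time.
var counterLock = new object();
var active = 0;
var maxActive = 0;

await Task.WhenAll(Enumerable.Range(0, 10).Select(_ =>
    RunSerializedAsync("race-demo", async () =>
    {
        lock (counterLock) { active++; maxActive = Math.Max(maxActive, active); }
        await Task.Delay(5); // stand-in for the StopAndDelete + CleanStart work
        lock (counterLock) { active--; }
    })));

Console.WriteLine($"max concurrent bodies: {maxActive}"); // prints "max concurrent bodies: 1"
```

Without the `WaitAsync`/`Release` pair, the same demo would typically report more than one concurrent body, which is the in-process overlap the first bullet describes. This covers only one process; the cross-process half still needs an OS-level primitive.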

## The three reproducer tests

| Test | Race surface | Failure surfaced |
|---|---|---|
| `ConcurrentStartTests.ConcurrentStartWithMissingTemplateShouldNotRace` | In-process (two `Wrapper` instances, one process, no helper exe) | SQL deadlock 1205 during `CREATE DATABASE [template]` |
| `MultiProcessConcurrentStartTests.MultiProcessConcurrentStartShouldNotRace` | Multi-process, symmetric (3 child processes all running `Wrapper.Start`) | SQL deadlock OR `template.mdf` not found OR `0x89C50107` (varies by timing) |
| `InstanceDoesNotExistRaceTests.KillerVsVictimSurfacesInstanceDoesNotExist` | Multi-process, asymmetric (one killer hammering `StopAndDelete`, one victim opening `SqlConnection`) | **Exact `0x89C50107` deterministically** — victim only exits 0 when it observes that specific code |

## Why each part exists

### `LocalDb.MultiProcessHelper` project

The asymmetric/multi-process tests need to spawn separate Windows processes via `Process.Start`. A Windows process needs an executable; an executable needs an entry point; that entry point lives in `Program.cs`.

We can't reuse `LocalDb.Tests.exe` for this — its entry point is owned by the test runner (NUnit + Microsoft.Testing.Platform), and we'd have to either fight the runner or invoke `dotnet test --filter` recursively (slow and awkward). A purpose-built console exe is simpler and faster.

### `Program.cs` with three modes (`wrapper-start`, `killer`, `victim`)

Different tests need different child behaviors. Rather than ship three executables, the same exe takes a mode argument:

- **`wrapper-start`** — full `Wrapper.Start` cycle. Used by the symmetric multi-process test where every child runs the same code path.
- **`killer`** — bare `LocalDbApi.StopAndDelete(name)` in a tight loop. Maximizes the chance of catching a victim mid-handshake.
- **`victim`** — `SqlConnection.OpenAsync` in a tight loop, walking exception chains for `Win32Exception.NativeErrorCode == 0x89C50107`. Exits 0 the first time it observes that exact code, exits 1/2 otherwise.

Splitting the killer and victim into separate processes is what makes `0x89C50107` reliably reproducible — symmetric children all running `Wrapper.Start` race on multiple things at once and surface a mix of error types; the asymmetric setup isolates the specific race window where the LocalDB API resolves the instance name as "does not exist."

### Strong-name signing (`SignAssembly` + `..\key.snk` in the .csproj)

`Wrapper`, `LocalDbApi`, and `DirectoryFinder` are `internal` types in the LocalDb assembly. The LocalDb assembly is strong-named and grants `InternalsVisibleTo` only to assemblies whose public key matches a specific `PublicKey=...` blob. For the helper to use those internal types, it must be signed with the same key. `..\key.snk` is the existing project-wide signing key (the same one Benchmark uses).

Alternative considered: drive the race entirely through `EfLocalDb.SqlInstance<T>` (a public API). That works but requires defining a `DbContext` and adds EF Core to the helper's dependency surface. Reaching for `Wrapper` directly keeps the helper minimal and exercises exactly the layer where the race lives.

### `InternalsVisibleTo` entry for `LocalDb.MultiProcessHelper`

Standard IVT plumbing — added next to the existing entries in `src/LocalDb/InternalsVisibleTo.cs`. Same `PublicKey=` blob as the others (it's the public half of `key.snk`).

### `<ProjectReference ... ReferenceOutputAssembly="false" Private="false" />` in `LocalDb.Tests.csproj`

The test project does **not** want to link the helper's assembly into its own output — it only wants the helper exe to exist on disk before tests run. `ReferenceOutputAssembly="false"` says "build it, but don't add a reference to its DLL in my compile inputs." `Private="false"` says "don't copy its outputs into my bin folder." With both set, the helper builds whenever the test project does (so a fresh `dotnet test` always finds an up-to-date helper), but there's no compile-time coupling between them.

The test resolves the helper path at runtime via `HelperExeResolver.cs`, which walks up from the test's `bin/<Config>/net10.0/` to find the sibling project's matching `bin/<Config>/net10.0/LocalDb.MultiProcessHelper.exe`.
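`HelperExeResolver.cs` itself is not part of this excerpt. As a rough sketch of the walk described above, assuming the standard `src/<Project>/bin/<Config>/<TFM>/` layout, a hypothetical `ResolveHelperExe` can take the configuration and target-framework folder names from the test's own output directory and swap in the helper project's name:

```csharp
using System;
using System.IO;

// Hypothetical sketch; the real HelperExeResolver.cs may differ.
// Maps  .../src/LocalDb.Tests/bin/<Config>/<TFM>/
// to    .../src/LocalDb.MultiProcessHelper/bin/<Config>/<TFM>/LocalDb.MultiProcessHelper.exe
static string ResolveHelperExe(string testOutputDirectory)
{
    // DirectoryInfo only parses the path here; nothing is read from disk.
    var tfmDir = new DirectoryInfo(Path.TrimEndingDirectorySeparator(testOutputDirectory)); // e.g. net10.0
    var configDir = tfmDir.Parent;                  // e.g. Release
    var testProjectDir = configDir?.Parent?.Parent; // skip "bin" to reach LocalDb.Tests
    var srcDir = testProjectDir?.Parent;            // src
    if (configDir is null || srcDir is null)
    {
        throw new InvalidOperationException($"Unexpected output layout: {testOutputDirectory}");
    }

    return Path.Combine(
        srcDir.FullName,
        "LocalDb.MultiProcessHelper",
        "bin",
        configDir.Name,
        tfmDir.Name,
        "LocalDb.MultiProcessHelper.exe");
}

// A fixed layout keeps the demo predictable; a test would presumably pass AppContext.BaseDirectory.
var testOutput = Path.Combine(Path.GetTempPath(), "repo", "src", "LocalDb.Tests", "bin", "Release", "net10.0");
Console.WriteLine(ResolveHelperExe(testOutput));
```

Deriving `<Config>` and the TFM from the running test, rather than hard-coding `Release`/`net10.0`, keeps Debug and Release runs each pointed at the helper built in the same configuration.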

### `LocalDb.slnx` entry

Nothing surprising — registers the new project so tooling (Rider, Visual Studio, `dotnet sln` operations) sees it. Without this the project still builds via the test project's `ProjectReference`, but it won't appear in solution-level views.

### Signal-file barrier (`signalFile` argument)

`Process.Start` spin-up jitter is on the order of 100–300 ms, far wider than the actual race window for `0x89C50107`, which is microseconds. If children started their work immediately, they would reach the racy code hundreds of milliseconds apart, in whatever order they happened to finish starting, so they would rarely overlap inside the race window and the test would be flaky.

The barrier flips this around: each child spawns, waits in a polling loop for a signal file to appear, and only proceeds once the parent test creates that file. The parent waits 750 ms after spawning all children (giving them enough time to load their CLR and reach the wait loop), then writes the signal — releasing them within a few ms of each other. That's tight enough to land the children in the actual race window reliably.
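A self-contained sketch of the barrier follows. To keep it runnable in one process, tasks stand in for the spawned helper processes (an assumption of the sketch; the real tests launch the helper exe). The wait loop mirrors the helper's `WaitForSignalAsync`, and the parent records the time just before writing the signal so every release can be checked against it.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Same polling loop as the helper's WaitForSignalAsync: check every 20 ms.
async Task WaitForSignalAsync(string path)
{
    while (!File.Exists(path))
    {
        await Task.Delay(20);
    }
}

var signalFile = Path.Combine(Path.GetTempPath(), $"localdb-race-signal-{Guid.NewGuid():N}");
var clock = Stopwatch.StartNew();

// Assumption for the sketch: tasks stand in for the spawned helper processes.
var children = Enumerable.Range(0, 4)
    .Select(async _ =>
    {
        await WaitForSignalAsync(signalFile);
        return clock.Elapsed; // moment this "child" was released
    })
    .ToArray();

// Give every child time to reach its wait loop (the real tests use 750 ms
// to cover CLR start-up), then release them all by creating the file.
await Task.Delay(200);
var signalledAt = clock.Elapsed;
File.WriteAllText(signalFile, "");

var releasedAt = await Task.WhenAll(children);
File.Delete(signalFile);

var skew = releasedAt.Max() - releasedAt.Min();
Console.WriteLine($"released {releasedAt.Length} waiters, skew {skew.TotalMilliseconds:F0} ms");
```

Because each waiter polls on its own 20 ms cycle, the release skew is bounded by roughly one poll interval plus timer resolution rather than being zero; that is still far tighter than the 100–300 ms spawn jitter the barrier replaces.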

### `HelperExeResolver` (shared lookup)

Both multi-process tests need to find the helper exe at runtime, and the path resolution is non-trivial enough to want one place to update if the build layout changes. Pulling it out also avoids duplicate logic that could drift between the two tests.

## Suggested fix in `LocalDb`

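Wire up the existing `Wrapper.semaphoreSlim` field around the body of `InnerStart` to handle the in-process race, and add a named OS mutex keyed on the instance name (e.g. `Global\LocalDb_Wrapper_InnerStart_{instanceName}`) around the entire `InnerStart` operation to handle the cross-process race. Both `ShouldNotRace` tests (`ConcurrentStartWithMissingTemplateShouldNotRace` and `MultiProcessConcurrentStartShouldNotRace`) should pass once that lock is in place; if either still fails, the lock isn't covering the right span. `KillerVsVictimSurfacesInstanceDoesNotExist` is unaffected by the lock, because neither its killer nor its victim goes through `InnerStart`.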
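A minimal sketch of that locking shape, under stated assumptions: `RunExclusive` and `inProcessLocks` are hypothetical names, the critical section is passed in as a synchronous `Action`, and the mutex name follows the example in the paragraph above. The body stays synchronous because a `Mutex` is owned by the thread that acquired it; if an `await` resumed on another thread, `ReleaseMutex` would throw. `AbandonedMutexException` is caught because when a process dies while holding the mutex, the next waiter still acquires it and is merely notified by that exception.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical: one in-process gate per instance name.
var inProcessLocks = new ConcurrentDictionary<string, SemaphoreSlim>();

// Hypothetical helper: serialize `body` per instance name, first within this
// process (SemaphoreSlim) and then across processes (named OS mutex).
void RunExclusive(string instanceName, Action body)
{
    var gate = inProcessLocks.GetOrAdd(instanceName, _ => new SemaphoreSlim(1, 1));
    gate.Wait();
    try
    {
        using var mutex = new Mutex(false, $@"Global\LocalDb_Wrapper_InnerStart_{instanceName}");
        try
        {
            mutex.WaitOne();
        }
        catch (AbandonedMutexException)
        {
            // The previous owner exited without releasing; this thread now owns the mutex.
        }

        try
        {
            // Must stay synchronous: the mutex is released by the thread that acquired it.
            body();
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }
    finally
    {
        gate.Release();
    }
}

// Demo: eight thread-pool callers for one instance name; count overlapping bodies.
var counterLock = new object();
var active = 0;
var maxActive = 0;
var completed = 0;

await Task.WhenAll(Enumerable.Range(0, 8).Select(_ => Task.Run(() =>
    RunExclusive("sketch-demo", () =>
    {
        lock (counterLock) { active++; maxActive = Math.Max(maxActive, active); }
        Thread.Sleep(10); // stand-in for StopAndDelete + CleanStart
        lock (counterLock) { active--; completed++; }
    }))));

Console.WriteLine($"completed {completed}, max concurrent bodies {maxActive}"); // prints "completed 8, max concurrent bodies 1"
```

Since LocalDB instances are per Windows user, a mutex name that also includes the user (for example the user name or SID) may be worth considering, so that different users' processes neither contend on nor are denied access to one another's `Global\` mutex.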

## Running the tests

```powershell
dotnet test src/LocalDb.Tests/LocalDb.Tests.csproj `
    --configuration Release `
    --filter "FullyQualifiedName~ConcurrentStart|FullyQualifiedName~MultiProcessConcurrentStart|FullyQualifiedName~KillerVsVictim"
```

The deterministic `KillerVsVictimSurfacesInstanceDoesNotExist` finishes in ~8 s. The symmetric `MultiProcessConcurrentStartShouldNotRace` finishes in ~15-30 s. The in-process `ConcurrentStartWithMissingTemplateShouldNotRace` finishes in ~2 minutes (it intentionally rebuilds the template 5× for a non-flaky signal).