diff --git a/README.md b/README.md
index a3a24990..bf335b0b 100644
--- a/README.md
+++ b/README.md
@@ -32,24 +32,28 @@
 
 ## Performance at a Glance
 
-| 1.67M gets/sec | 10.77M reads/sec | 798.25K rows/sec | 1.04K commits/sec |
+| 1.99M gets/sec | 9.68M COUNTs/sec | 799.29K rows/sec | 890 commits/sec |
 |:-:|:-:|:-:|:-:|
-| Collection point reads | Concurrent reader burst (8x reused) | Durable InsertBatch B10000 | Concurrent durable writes |
+| Collection point reads | Concurrent reader burst (8x reused) | Durable `InsertBatch` B10000 | Concurrent durable writes |
 
-Intel i9-11900K, .NET SDK 10.0.202, .NET runtime 10.0.6, Windows 10.0.26300. Snapshot promoted from the April 21, 2026 release-core run with release guardrails at PASS=185, WARN=0, SKIP=0, FAIL=0. Full results live in the benchmark suite.
+Intel i9-11900K, 16 logical cores, Windows 10.0.26300, .NET SDK 10.0.203, .NET runtime 10.0.7. Snapshot promoted from the May 6, 2026 release-core suite; latest release guardrail compare passed May 6, 2026 with PASS=187, WARN=0, SKIP=0, FAIL=0. Full results live in the benchmark suite.
 
 ---
 
-## Write Durability Modes
+## Durable API Top Lines
 
-Default CSharpDB benchmarks run in fully durable mode. CSharpDB also supports a less-durable buffered mode for workloads that want much higher write throughput and can tolerate a larger crash-loss window.
+Default CSharpDB file-backed benchmarks are fully durable: WAL fsync-on-commit unless a row explicitly says otherwise. In-memory rows show the same API paths without disk durability.
 
-| Mode | SQL Single INSERT | SQL Batch x100 | Collection Single PUT | Collection Batch x100 |
-|------|------------------:|---------------:|----------------------:|----------------------:|
-| Durable (default) | 279.4 ops/sec | 26.71K rows/sec | 273.5 ops/sec | 25.92K docs/sec |
-| Buffered | 21.17K ops/sec | 456.63K rows/sec | 19.30K ops/sec | 399.76K docs/sec |
+| Surface | Single write | Batch x100 | Point read | Concurrent read |
+|---|---:|---:|---:|---:|
+| SQL file-backed | 267.1 ops/sec | 25.56K rows/sec | 1.48M ops/sec | 9.68M COUNTs/sec |
+| SQL hybrid incremental-durable | 276.1 ops/sec | 26.55K rows/sec | 1.47M ops/sec | 10.04M COUNTs/sec |
+| SQL in-memory | 259.48K ops/sec | 934.22K rows/sec | 1.49M ops/sec | 10.26M COUNTs/sec |
+| Collection file-backed | 265.7 ops/sec | 24.53K docs/sec | 1.99M ops/sec | - |
+| Collection hybrid incremental-durable | 276.9 ops/sec | 25.75K docs/sec | 2.02M ops/sec | - |
+| Collection in-memory | 262.14K ops/sec | 969.55K docs/sec | 2.02M ops/sec | - |
 
-`Durable` is fsync-on-commit. `Buffered` is less durable and analogous to SQLite WAL `synchronous=NORMAL`. The durable row is from the April 21, 2026 release-core snapshot; the buffered row remains from the April 7, 2026 buffered rerun because release-core currently promotes durable numbers. Full methodology and the complete matrix live in the benchmark suite README.
+Source: `master-table-20260506-024609-median-of-3.csv` from the May 6, 2026 release-core snapshot. Full methodology and storage-mode detail live in the benchmark suite README.
 
 ---
 
@@ -59,12 +63,50 @@ The current release-core concurrent write rows measure the intended shared-inser
 
 | Workload | Writers | Commit window | Durable Commits/sec | Commits/flush | Notes |
 |----------|--------:|--------------:|--------------------:|--------------:|-------|
-| Shared auto-commit `INSERT` | 4 | `0` | 270.5 | 1.00 | One durable flush per commit |
-| Shared auto-commit `INSERT` | 4 | `250us` | 517.4 | 1.99 | Group commit roughly doubles throughput |
-| Shared auto-commit `INSERT` | 8 | `0` | 266.2 | 1.00 | Still flush-bound with no commit window |
-| Shared auto-commit `INSERT` | 8 | `250us` | 1.04K | 3.98 | Current release-core headline row |
+| Shared auto-commit `INSERT` | 4 | `0` | 247.0 | 1.00 | One durable flush per commit |
+| Shared auto-commit `INSERT` | 4 | `250us` | 463.4 | 1.99 | Group commit roughly doubles throughput |
+| Shared auto-commit `INSERT` | 8 | `0` | 239.2 | 1.00 | Still flush-bound with no commit window |
+| Shared auto-commit `INSERT` | 8 | `250us` | 890.1 | 3.94 | Current release-core headline row |
 
-Source: `concurrent-write-diagnostics-20260421-220505-median-of-3.csv` from the April 21, 2026 release-core run. Full methodology and tuning notes live in the benchmark suite README.
+Focused hot insert fan-in diagnostics cover the newer right-edge and auto-ID shapes that are not part of the release-core scorecard yet:
+
+| Insert shape | Writers/window | Durable Commits/sec | Commits/flush |
+|---|---:|---:|---:|
+| Serialized explicit hot right-edge | `W8 + 250us` | 278.4 | 1.00 |
+| Concurrent explicit hot right-edge | `W8 + 250us` | 910.3 | 3.33 |
+| Concurrent auto-ID hot right-edge | `W8 + 250us` | 913.1 | 3.34 |
+| Concurrent explicit disjoint ranges | `W8 + 250us` | 1,049.6 | 3.96 |
+
+Sources: release-core `concurrent-write-diagnostics-20260506-032735-median-of-3.csv`; focused insert fan-in diagnostic `insert-fan-in-diagnostics-20260505-233424.csv`. Focused rows remain diagnostic until the release-core suite includes those shapes directly. Full methodology and tuning notes live in the benchmark suite README.
+
+---
+
+## Local SQLite Reference
+
+Same-runner SQLite rows use Microsoft.Data.Sqlite 10.0.7 with WAL + `synchronous=FULL`. They are comparison points, not universal claims.
+
+| Workload | CSharpDB | SQLite WAL+FULL |
+|---|---:|---:|
+| Durable prepared bulk insert B1000 | 211.99K rows/sec | 155.66K rows/sec |
+| SQL point lookup | 1.48M ops/sec | 93.91K ops/sec |
+
+Source: `sqlite-compare-20260506-035128-median-of-3.csv` from the May 6, 2026 release-core snapshot.
+
+---
+
+## Generated Collection Fast Path
+
+The source-generated collection path is opt-in through `GetGeneratedCollectionAsync(...)`. It mainly reduces CPU cost for collection payload encoding and decoding, direct field extraction, and index-reader paths; single-row durable writes can still be bound by WAL flushes.
+
+| Path | Source-gen JSON | Generated binary | Gain | Allocation |
+|---|---:|---:|---:|---|
+| Encode payload | 600.1 ns | 306.2 ns | 1.96x | 552 B to 136 B |
+| Decode payload | 2,277.9 ns | 371.9 ns | 6.12x | 1,240 B to 480 B |
+| Indexed int field read | 187.23 ns | 29.74 ns | 6.30x | 0 B to 0 B |
+| Text field UTF-8 read | 185.82 ns | 27.26 ns | 6.82x | 56 B to 0 B |
+| Key match | 21.48 ns | 19.91 ns | 1.08x | 0 B to 0 B |
+
+Source: `BenchmarkDotNet.Artifacts/results/CSharpDB.Benchmarks.Micro.GeneratedCollection*Benchmarks-report.csv`. These rows are diagnostic microbenchmarks, not release-core scorecard rows.
 
 ---
 
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index 4a614773..288e03b5 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -1,442 +1,149 @@
 # What's New
 
-## v3.6.0
-
-v3.6.0 adds trusted, in-process C# scalar functions and commands across
-CSharpDB's user-facing expression and automation surfaces. Host applications
-can now register C# callbacks when opening or hosting a database, then call
-those callbacks from SQL, SQL-backed triggers and procedures, Admin Forms
-formulas/events/actions, Admin Reports calculated text and preview lifecycle
-events, and pipeline filter/derive/hook expressions.
-
-The release also adds tableless scalar `SELECT` support, common built-in scalar
-functions, Admin callback catalog metadata, SQL autocomplete for built-ins and
-tableless-safe host callbacks, and local Admin artifact cleanup to keep
-incremental builds fast.
-
-### Trusted C# Scalar Functions
-
-- Added the shared `DbFunctionRegistry`, `DbFunctionRegistryBuilder`,
-  `DbScalarFunctionDelegate`, and `DbScalarFunctionOptions` public model in
-  `CSharpDB.Primitives`.
-- Added `DatabaseOptions.Functions` plus `ConfigureFunctions(...)` so embedded
-  hosts can register scalar functions when opening file-backed, in-memory, or
-  hybrid databases.
-- SQL expression evaluation now resolves registered scalar functions in
-  projections, filters, ordering expressions, `INSERT`/`UPDATE` expressions,
-  trigger bodies, and stored SQL procedure bodies.
-- Direct clients can pass trusted functions through `DirectDatabaseOptions`;
-  HTTP and gRPC clients still do not serialize delegates and can only call
-  functions registered inside the remote host process.
-- Admin Forms formulas and Admin Reports calculated expressions can use the
-  same registry while preserving existing arithmetic and aggregate behavior.
-- Pipeline filter and derived-column expressions can call registered functions;
-  package definitions store expressions plus generated automation metadata, but
-  never C# function bodies.
-- Scalar callback registration now carries `CanRunWithoutFrom` metadata so
-  hosts can identify functions that are safe to discover in tableless
-  `SELECT ...` contexts.
-- Added the usage guide at `docs/trusted-csharp-functions/README.md`.
- -### Tableless SELECT And Built-In Scalar Functions - -- SQL now supports scalar `SELECT` statements without a `FROM` clause through a - single-row planner source. -- Tableless statements such as `SELECT Date();`, `SELECT abs(1123.34);`, and - `SELECT Slugify('Hello World');` can execute without inventing a dummy table - when the expression does not need row context. -- Added a central built-in scalar dispatcher for common text, date/time, - numeric, conversion, and null helpers, including functions such as `ABS`, - `DATE`, `DATESERIAL`, `DATEADD`, `DATEDIFF`, `LEN`, `UCASE`, `LCASE`, - `ROUND`, `IFNULL`, and `NZ`. -- Query planning now infers built-in scalar return types where possible. -- Query paging and Admin result serialization now handle the internal - tableless single-row source. -- BLOB procedure parameters can now round-trip through tableless - `SELECT @payload;` rather than failing on the old unsupported-path - assumption. - -### Admin Callback Catalog And Formula UX - -- The Admin navigation now groups callbacks under `Callbacks / Internal` and - `Callbacks / External`. -- Internal callbacks show built-in formula functions separately from registered - host callbacks, so the list remains navigable as the built-in surface grows. -- External callbacks show host-registered/user-created callbacks such as sample - functions and automation commands. -- Callback details now surface whether a scalar callback is marked for - tableless `SELECT`. -- SQL editor completion now suggests built-in scalar functions and host - callbacks marked with `CanRunWithoutFrom`. -- Admin Forms formulas now have an Access-style function catalog/helper and - domain-function support for common form expressions. - -### Trusted Commands And Form Events - -- Added the shared `DbCommandRegistry`, `DbCommandRegistryBuilder`, - `DbCommandDelegate`, `DbCommandContext`, `DbCommandResult`, and - `DbCommandOptions` public model in `CSharpDB.Primitives`. 
-- `DbCommandOptions` now includes `Timeout` and `IsLongRunning`, and - `DbCommandRegistryBuilder.AddAsyncCommand(...)` registers `Task`-based host - callbacks without manual `ValueTask` wrapping. -- Command timeouts cancel the command invocation token and surface as command - failures through the existing Forms, Reports, and Pipelines dispatch paths; - external cancellation is still propagated as cancellation. -- Admin Forms can now store form-level event bindings that reference trusted - command names instead of storing C# source. -- The Forms data-entry runtime dispatches `OnOpen`, `OnLoad`, `BeforeInsert`, - `AfterInsert`, `BeforeUpdate`, `AfterUpdate`, `BeforeDelete`, and - `AfterDelete`. -- `BeforeInsert`, `BeforeUpdate`, and `BeforeDelete` can cancel the requested - write by returning `DbCommandResult.Failure(...)`; after-events report errors - without attempting to roll back a completed write. -- Command context arguments include current record fields converted to - `DbValue`; metadata includes the Forms surface, form id/name, table name, and - event name. -- `AddCSharpDbAdminForms(...)` now has a command-registration overload for - trusted host applications. -- The Admin Forms designer preserves and edits form-level event bindings - instead of dropping automation metadata during save. -- Added a command button control that invokes a trusted host command on click, - passing current record fields, optional configured arguments, and form - metadata to the command callback. -- Added control-level Admin Forms event bindings for `OnClick`, `OnChange`, - `OnGotFocus`, and `OnLostFocus`, so ordinary controls can invoke trusted - host commands without being command buttons. -- The Forms property inspector now edits selected-control event bindings using - the same registered-command picker and JSON argument editor as form-level - events. 
-- Added shared declarative action sequence metadata with `RunCommand`, - `SetFieldValue`, `ShowMessage`, and `Stop` steps for Admin Forms automation. - Form and control event bindings can now be command-only, - action-sequence-only, or a command followed by an action sequence. -- Added built-in rendered-form actions for `NewRecord`, `SaveRecord`, - `DeleteRecord`, `RefreshRecords`, `PreviousRecord`, `NextRecord`, and - `GoToRecord`, so command buttons and control events can drive common form - workflows without host C# callbacks. -- Action sequence steps can now include a simple condition such as - `Status = 'Ready'`, `Amount > 0`, or `IsActive`; false conditions skip that - step, while malformed conditions fail through the normal step failure path. -- Forms can now store reusable named action sequences on `FormDefinition` and - invoke them from event/button sequences with `RunActionSequence`, including - optional per-call arguments and a nesting guard for recursive loops. -- The form-event and selected-control event editors now include a visual - action-sequence editor for adding, ordering, removing, and configuring - command, reusable sequence, field, message, stop, built-in record actions, - and per-step conditions. -- The Forms property inspector now includes a reusable action-sequence library - editor at the form level, and event action editors can pick those named - sequences while preserving missing names for portable metadata. -- The action-sequence editor uses registered-command pickers when commands are - available, preserves missing command names for portable form metadata, and - keeps JSON editing limited to optional argument payloads. -- Action sequences store names, arguments, field targets, and literal values - only. They do not store C# source, serialize delegates, or run untrusted code. -- Added shared command argument conversion helpers so Forms, Reports, and - Pipelines pass host command arguments with the same `DbValue` conversion - rules. 
-- Admin Reports can now bind `OnOpen`, `BeforeRender`, and `AfterRender` - preview lifecycle events to trusted commands. The preview service passes - report/source metadata plus row, truncation, page, and schema-drift metrics. -- `AddCSharpDbAdminReports(...)` now has a command-registration overload for - trusted host applications. -- Pipeline packages can now include trusted command hooks for `OnRunStarted`, - `OnBatchCompleted`, `OnRunSucceeded`, and `OnRunFailed`. Package JSON stores - hook names, arguments, and generated automation metadata only; command bodies - remain host-registered code. -- Pipeline hook failures fail the run through `PipelineRunResult`; failure-hook - errors are appended to the failed run summary instead of recursively - dispatching more failure hooks. -- Admin Forms command buttons now refresh their executing/disabled state before - and after async command work, so long-running trusted commands give visible - runtime feedback in the form surface. - -### Stored Automation Metadata - -- Added shared `DbAutomationMetadata`, command references, and scalar-function - references so portable definitions can declare the trusted host callbacks - they expect without storing C# code. -- Admin Forms, Admin Reports, and pipeline packages now regenerate automation - metadata during repository save/load or package serialization/deserialization. - Older JSON without automation metadata is backfilled on read. -- Form metadata captures trusted form events, command buttons, selected-control - events, reusable action sequences, action-sequence `RunCommand` steps, and - computed-formula scalar functions. -- Report metadata captures preview lifecycle command bindings and calculated - text scalar functions. -- Pipeline package metadata captures command hooks and scalar functions used by - filter and derived-column expressions; package validation reports stale - automation manifests so packages can be re-exported. 
- -### Developer Experience - -- Added `samples/trusted-csharp-host`, a VS Code-ready C# host project for - writing and debugging trusted C# callbacks in ordinary application code. -- The sample registers a trusted scalar function, calls it from SQL, registers - a trusted command, and runs an Admin Forms action sequence that sets a field - before invoking that command. -- The sample includes local `.vscode` launch/tasks files so developers can open - the sample folder, press `F5`, and set breakpoints inside callback code. -- The direct Admin launcher now cleans stale Admin artifact snapshots, builds - once, and runs with `--no-build` so old generated artifacts do not slow - startup. -- Added repo-level MSBuild cleanup for `src/CSharpDB.Admin/artifacts` and - default excludes for generated artifact folders. -- Added the async I/O batching follow-up note at - `docs/query-and-durable-write-performance/async-io-batching-follow-up.md`. - -### Behavior And Safety - -- Function names are case-insensitive SQL identifiers, and registration rejects - duplicate user names or collisions with reserved built-ins such as `TEXT`, - `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX`. -- Arity is validated before invocation, missing SQL functions fail with the - existing unknown scalar function path, and thrown delegate exceptions are - wrapped with the function name before normal statement/transaction rollback. -- `NullPropagating = true` returns `NULL` without invoking the delegate when - any argument is `NULL`; otherwise `DbValue.Null` is passed explicitly. -- V1 remains scalar-only, synchronous, trusted, and in-process. It does not - persist C# source, sandbox code, load database-owned plugin assemblies, - marshal delegates over HTTP/gRPC, or add aggregate/table-valued/procedure - UDFs. -- Query planning keeps custom functions on the residual expression path in V1: - no index pushdown, generated columns, constant folding, or cost assumptions - are inferred from user functions. 
+## v3.7.0
+
+v3.7.0 focuses on query-planner observability, opt-in adaptive join
+reoptimization, faster paged view browsing, and the benchmark/documentation
+close-out work around the current optimizer and async I/O roadmap phases. It
+also carries smaller but important polish for SQL result metadata, fulfillment
+sample lookup indexes, DataGen direct-load throughput, and benchmark regression
+analysis.
+
+### Planner Observability
+
+- Added SQL-first planner diagnostics through `EXPLAIN ESTIMATE FOR SELECT`,
+  `EXPLAIN ESTIMATE FOR WITH`, and compound query estimate support.
+- Added public `sys.planner_*` virtual catalogs for planner histograms, heavy
+  hitters, and composite index prefix statistics.
+- Added bounded estimate diagnostics for stats freshness, lookup and filter
+  estimates, index choices, hash build-side selection, and join reordering.
+- Added an Admin Query tab Estimate action and Plan tab rendering for planner
+  diagnostic rowsets.
+- Documented how to read plan output, debug missing or stale stats, and spot
+  common query-planning red flags.
+
+### Adaptive Join Reoptimization
+
+- Added opt-in phase-one adaptive query reoptimization through
+  `DatabaseOptions.AdaptiveQueryReoptimization` and
+  `EnableAdaptiveQueryReoptimization(...)`.
+- Added ADO.NET direct embedded connection-string support for
+  `Adaptive Query Reoptimization=true`; remote endpoint connections reject the
+  key so hosts enable the feature server-side.
+- Added adaptive join wrappers that can switch eligible index nested-loop joins
+  to hash joins before rows are emitted when observed outer cardinality
+  diverges from the estimate.
+- Added adaptive hash join build-side flipping for eligible inner joins when
+  the planned build side is materially larger than estimated.
+- Added internal diagnostics for eligible queries, attempts, successful
+  switches, rejected switches, divergence events, buffered rows, and fail-closed
+  fallback reasons.
+- Kept default query behavior unchanged and suppressed adaptation for risky + shapes such as compound query children, correlated subqueries, cross/right + joins, and `SELECT *` cases where visible column order could change. + +### View And Lookup Planning + +- Taught row-goal planning to reorder eligible simple view join chains before + building the view operator tree, so bounded `LIMIT`/`OFFSET` view queries can + use the same streaming lookup plans as equivalent inline SQL. +- Updated the Admin DataGrid view path to page views with bounded + `LIMIT`/`OFFSET` instead of opening unbounded forward-only view cursors. +- Fixed the Query tab grid layout so the row grid is the only scroll container + and the pagination bar stays fixed below the rows. +- Improved lookup-join planning by using indexed local predicate estimates when + cardinality stats are unavailable or weaker. +- Preserved right-side local predicates as residual join filters for lookup + joins and passed estimated row counts into index scans as capacity hints. +- Added an `orders(customer_id)` lookup index to the fulfillment sample schema + for customer-filtered order views. + +### SQL Result Metadata + +- Propagated query column types through engine transport, HTTP API/client DTOs, + and the Admin DataGrid. +- View/query result metadata now preserves `ColumnTypes` across local and + remote SQL execution paths. +- Updated client SQL execution coverage so column names and column types are + both asserted for query results. +- Updated the API package reference for `Scalar.AspNetCore` from `2.14.10` to + `2.14.11`. + +### DataGen And Collection Fast Path Close-Out + +- Closed the current generated collection fast-path roadmap phase and split + package ergonomics and broader generator coverage into future roadmap items. +- Refreshed benchmark-facing docs with the current release-core snapshot and + generated collection codec diagnostics. 
+- Moved CSharpDB.DataGen direct loads onto the write-optimized storage preset. +- Reused SQL row buffers before `InsertBatch` copies. +- Documented the DataGen fast-path choices and capped direct collection + document target sizes to stay within the current inline collection payload + envelope. + +### Benchmarks And Roadmap Close-Out + +- Marked the current advanced cost-based query optimizer and async I/O batching + roadmap phases as completed current-phase work. +- Kept adaptive reoptimization and public histogram inspection documented as + separate planned/follow-up items where appropriate. +- Added optimizer close-out diagnostics for heavy-hitter equality, histogram + range estimates, composite-prefix correlation, and bounded join reordering. +- Added async I/O close-out diagnostics for save/backup/restore, vacuum and FK + logical rewrites, database inspector scans, and live WAL inspector scans. +- Refreshed roadmap, query/durable-write docs, async I/O audit notes, benchmark + catalog, README performance tables, and release-core manifest metadata with + the May 6 benchmark baseline. +- Added a BenchmarkDotNet WAL point-read benchmark for primary-key reads across + WAL-backed and checkpointed states at 100, 1k, 5k, and 10k target frames. +- Updated `Compare-Baseline.ps1` so performance checks can compare a + configurable time metric such as `P95` through `metricColumn` while keeping + `Mean` as the default. ### Tests And Benchmarks -- Added registry and SQL coverage for case-insensitive lookup, duplicate and - built-in collision rejection, null propagation, deterministic metadata, - missing functions, thrown functions, rollback behavior, triggers, and stored - SQL procedures. -- Added direct-client, Admin Forms, Admin Reports, pipeline validation, and - pipeline runtime tests for registered scalar functions. 
-- Added command-registry, form-event dispatcher, event JSON round-trip, and - Forms data-entry tests for create/update/delete event dispatch and - before-event cancellation. -- Added designer-state tests for action sequences, plus command-button and - control-event tests covering event binding preservation and registered - command invocation from rendered forms. -- Added Forms action-sequence tests for event dispatch, mutable record updates, - command button action-only clicks, and JSON round-tripping. -- Added report-event dispatcher and preview lifecycle tests, pipeline hook - serialization/validation/orchestrator tests, and shared command argument - conversion tests. -- Added automation metadata tests covering manifest extraction, JSON - round-tripping, repository persistence/backfill, pipeline package - import/export, and stale package metadata validation. -- Added async command and timeout coverage for the command registry, Admin - Forms dispatcher, Admin Reports dispatcher, and pipeline hook orchestration. -- Added Forms built-in action tests covering rendered command-button dispatch, - next/previous/go-to navigation, and create/save/refresh/delete workflows. -- Added conditional action tests for skip/run behavior, condition failure, - rendered built-in action skipping, metadata propagation, and JSON - round-tripping. -- Added parser, planner, SQL execution, direct-client, procedure, query paging, - and Admin completion tests for tableless `SELECT`, built-in scalar functions, - BLOB parameter round-tripping, and tableless-safe callback autocomplete. -- Added Admin callback catalog tests for tableless callback metadata. -- Added Admin Forms formula evaluator tests for the built-in function catalog - and Access-style formula helpers. 
-- Same-machine affected benchmark comparison against the pre-feature HEAD - baseline showed no material regression in the main write/query guardrails: - -| Suite | Worst current change | Best current change | -|-------|---------------------:|--------------------:| -| Insert | `+3.76%` | `-3.38%` | -| Join | `+6.65%` | `-6.93%` | -| PointLookup | `+5.15%` | `-9.25%` | -| QueryPlanCache | `+1.62%` | `-4.45%` | -| ScanProjection | `+0.20%` | `-18.12%` | -| TriggerDispatch | `+0.77%` | `-4.52%` | -| BatchEvaluation | `+10.53%` | `-10.36%` | - -The one notable row was the synthetic BatchEvaluation delegate -filter/projection case at `+10.53%`; its paired specialized path improved by -`-10.36%`, allocations were unchanged, and the affected guardrail suites were -otherwise neutral to improved. +- Added parser, catalog, planner, client, HTTP, and benchmark coverage for + planner diagnostics and `EXPLAIN ESTIMATE`. +- Added adaptive reoptimization engine/operator tests, ADO.NET option tests, + and the `AdaptiveReoptimizationBenchmark` diagnostic suite. +- Added regression coverage for bounded simple-view row-goal planning, late + unindexed detail joins, purchase-order style join chains, and Admin DataGrid + view paging SQL generation. +- Added tests for right-side join predicates, unique text-filter lookup joins, + and SQL result column type metadata. +- Added benchmark suites for optimizer close-out, async I/O close-out, planner + catalog diagnostics, and WAL point-read regression analysis. ### Validation -- `git status --short --branch` -- `dotnet restore CSharpDB.slnx` -- `.\scripts\Test-NoLegacyCoreReferences.ps1` - - Passed through the script's PowerShell fallback after the local packaged - `rg.exe` could not be launched normally in this desktop environment. -- `dotnet build CSharpDB.slnx -c Release --no-restore` - - Passed with `0` warnings and `0` errors. 
-- `dotnet test CSharpDB.slnx -c Release --no-build -m:1 -- RunConfiguration.DisableParallelization=true` - - Non-parallel unit test run passed with `1,663` tests. -- Phase 5 local validation used `dotnet build CSharpDB.slnx --no-restore -m:1` - and `dotnet test CSharpDB.slnx --no-build -m:1 -- RunConfiguration.DisableParallelization=true` - - Debug non-parallel unit test run passed with `1,703` tests after adding - automation metadata coverage. -- Phase 6A async-command hardening validation used - `dotnet build CSharpDB.slnx --no-restore -m:1` and - `dotnet test CSharpDB.slnx --no-build -m:1 -- RunConfiguration.DisableParallelization=true` - - Debug non-parallel unit test run passed with `1,709` tests. -- Phase 6B built-in form action validation used - `dotnet build CSharpDB.slnx --no-restore -m:1` and - `dotnet test CSharpDB.slnx --no-build -m:1 -- RunConfiguration.DisableParallelization=true` - - Debug non-parallel unit test run passed with `1,712` tests. -- Phase 6C conditional form action validation used - `dotnet build CSharpDB.slnx --no-restore -m:1` and - `dotnet test CSharpDB.slnx --no-build -m:1 -- RunConfiguration.DisableParallelization=true` - - Debug non-parallel unit test run passed with `1,715` tests. -- `dotnet pack` smoke for the release workflow packages with - `-p:Version=3.6.0` - - Produced `11` local packages: - `CSharpDB`, `CSharpDB.Client`, `CSharpDB.Data`, `CSharpDB.Engine`, - `CSharpDB.EntityFrameworkCore`, `CSharpDB.Execution`, - `CSharpDB.Pipelines`, `CSharpDB.Primitives`, `CSharpDB.Sql`, - `CSharpDB.Storage`, and `CSharpDB.Storage.Diagnostics`. -- `.\scripts\Publish-CSharpDbDaemonRelease.ps1 -Version 3.6.0 -Runtime win-x64 -OutputRoot artifacts\daemon-release-local` - - Produced `csharpdb-daemon-v3.6.0-win-x64.zip` and `SHA256SUMS.txt`. 
-- Latest tableless/callback stabilization validation used - `dotnet test .\CSharpDB.slnx -m:1 --no-restore -v:minimal /nr:false /p:UseSharedCompilation=false /p:TestTfmsInParallel=false -- RunConfiguration.DisableParallelization=true` - - Debug non-parallel unit test run passed with `1,877` tests. - -### Review Notes - -- The highest-risk runtime changes are in expression evaluation and planner - plumbing: custom functions are intentionally kept off the index-pushdown and - batch-fast-path planning assumptions in V1. -- Remote hosts must register functions in the daemon/API host process; direct - clients can register functions locally through `DirectDatabaseOptions`, but - callback delegates are never serialized over HTTP or gRPC. -- Admin Forms and Reports use the shared registries, but their formula and - automation surfaces remain narrower than SQL or stored macro systems: - formulas stay expression-focused, command hooks invoke host-owned code by - name, and declarative action sequences store only limited action metadata - rather than executable scripts in database metadata. -- `SELECT ...` without `FROM` is represented internally as a single-row source - and is intended for scalar expressions that do not need row context. -- `CanRunWithoutFrom` is currently discovery metadata for the Admin catalog and - SQL editor autocomplete; it is not yet a hard runtime denial gate for manually - typed tableless callback calls. - -## v3.5.0 - -v3.5.0 focuses on the collection binary payload fast path, generated -collection codec performance, targeted UTF-8 span plumbing, and refreshed -release benchmark publishing. It also includes the confirmed CSharpDB Studio -admin UI access-parity notes and static mockups that are part of this branch. - -### Collection Binary Payload Fast Path - -- Added the opt-in source-generated collection fast path for fixed generated - field order, compact type/null metadata, and raw value payloads. 
-- Existing non-generated collection paths continue to use their current JSON - and binary document behavior. -- Generated collection models can now use direct binary record encode/decode - instead of routing through the slower document-shaped path. -- `CollectionDocumentCodec` now parses direct binary payload headers once - and decodes the key/document from that parsed header. -- `CollectionBinaryDocumentCodec` now has single-segment - `ReadOnlySpan` lookup overloads for top-level binary document fields. -- `CollectionPayloadCodec` fast header parsing now favors the common binary - payload marker path. -- Collection field and index bindings now expose generated accessors to faster - direct field-reader paths where available. - -### UTF-8 Text Index And Compare Paths - -- Added targeted UTF-8 span plumbing for collection text index/read/compare - paths. -- Reduced transient allocations in top-level string property reads by avoiding - per-call `byte[]` and `byte[][]` path materialization where the - single-segment path applies. -- Updated ordered text index key comparison coverage. - -### Tests And Benchmarks - -- Added `GeneratedCollectionCodecBenchmarks`. -- Expanded generated collection model tests around binary payload support. -- Expanded binary document codec tests for direct field access. -- Added ordered text index key codec coverage. -- Refreshed `tests/CSharpDB.Benchmarks/README.md` and - `release-core-manifest.json` from the April 26, 2026 release-core artifacts. -- Recorded the collection binary payload investigation, noisy initial guardrail - compare, focused retry, and final passing release guardrail in - `tests/CSharpDB.Benchmarks/HISTORY.md`. 
- -Focused collection investigation: - -| Metric | Result | -|--------|-------:| -| Matched rows vs same-machine HEAD baseline | `60` | -| Faster matched rows | `50` | -| Slower matched rows | `10` | -| Median matched speedup | `+4.1%` | -| Mean matched speedup | `+4.8%` | - -Focused recovery highlights: - -| Benchmark | Before | After | Change | -|-----------|-------:|------:|-------:| -| Collection field read, missing field | `223.22 ns` | `136.73 ns` | `+38.7%` | -| Collection decode, direct payload | `333.80 ns` | `155.20 ns` | `+53.5%` | -| Collection field read, early field | `108.60 ns` | `49.77 ns` | `+54.2%` | -| Collection field compare, late text field | `159.55 ns` | `97.60 ns` | `+38.8%` | -| Collection field compare, bound accessor | `128.03 ns` | `92.83 ns` | `+27.5%` | - -Path-index follow-up highlights: - -| Benchmark | Final vs same-machine HEAD baseline | -|-----------|------------------------------------:| -| Nested path equality via `FindByIndex` | `+53.6%` | -| Nested path equality via `FindByPath` | `+55.9%` | -| Array path equality via `FindByIndex` | `+52.1%` | -| Text range path lookup | `+42.7%` | -| Guid equality path lookup | `+59.4%` | - -Published scorecard examples from the refreshed benchmark README: - -| Area | Result | -|------|-------:| -| SQL file-backed single insert | `450.4 ops/sec` | -| SQL file-backed batch x100 | `41.88K rows/sec` | -| Collection file-backed put | `447.3 ops/sec` | -| Collection file-backed batch x100 | `42.28K docs/sec` | -| Collection hot point get | `1.60M ops/sec` | -| CSharpDB InsertBatch B1000 | `233.06K rows/sec` | - -### CSharpDB Studio Admin UI Notes - -- Added admin forms access-parity notes. -- Added admin reports access-parity notes. -- Added static CSharpDB Studio admin UI mockups under `www/admin-ui-mockups`. -- Included dashboard, data, query, heavy operations, reports designer, mobile - forms/reports, command palette, sidebar, and shared styling mockups. 
- -### Validation - -- `dotnet build CSharpDB.slnx -c Release --no-restore` -- `dotnet test CSharpDB.slnx -c Release --no-build -m:1 -- RunConfiguration.DisableParallelization=true` -- Non-parallel unit test run passed with `1,652` tests. -- `dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --release-core --repeat 3 --repro` +- `dotnet test .\CSharpDB.slnx -c Release` +- `dotnet build .\CSharpDB.slnx -c Release --no-restore` +- `dotnet test .\CSharpDB.slnx -c Release --no-build` +- `dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --optimizer-closeout --repro` +- `dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --async-io-closeout --repro` - `pwsh -NoProfile .\tests\CSharpDB.Benchmarks\scripts\Run-Perf-Guardrails.ps1 -Mode release` -- `pwsh -NoProfile .\tests\CSharpDB.Benchmarks\scripts\Compare-Baseline.ps1 -ThresholdsPath .\tests\CSharpDB.Benchmarks\perf-thresholds.json -CurrentMicroResultsDir .\tests\CSharpDB.Benchmarks\results\.tmp-current-micro-run -ReportPath .\tests\CSharpDB.Benchmarks\results\perf-guardrails-last.md` -- `pwsh -NoProfile .\tests\CSharpDB.Benchmarks\scripts\Update-BenchmarkReadme.ps1 -RunManifest .\tests\CSharpDB.Benchmarks\release-core-manifest.json` -- `Get-Content -Raw .\tests\CSharpDB.Benchmarks\release-core-manifest.json | ConvertFrom-Json` -- `pwsh -NoProfile .\tests\CSharpDB.Benchmarks\scripts\Update-BenchmarkReadme.ps1 -RunManifest .\tests\CSharpDB.Benchmarks\release-core-manifest.json -DryRun` -- `git diff --check -- tests\CSharpDB.Benchmarks\README.md tests\CSharpDB.Benchmarks\HISTORY.md tests\CSharpDB.Benchmarks\release-core-manifest.json` - -Release benchmark guardrail result: - -```text -Compared 185 rows against baseline. PASS=185, WARN=0, SKIP=0, FAIL=0 -``` + - Reported `PASS=187, WARN=0, SKIP=0, FAIL=0`. 
+- `dotnet build .\src\CSharpDB.Admin\CSharpDB.Admin.csproj -c Release`
+- `dotnet test .\tests\CSharpDB.Admin.Forms.Tests\CSharpDB.Admin.Forms.Tests.csproj -c Release --filter FullyQualifiedName~DataGridTests`
+- `dotnet test .\tests\CSharpDB.Tests\CSharpDB.Tests.csproj -c Release --filter "FullyQualifiedName~SimpleViewLateUnindexedDetailJoinWithLimit|FullyQualifiedName~JoinChainWithLimit|FullyQualifiedName~PurchaseOrderLineJoinChainWithLimit"`
+- Targeted generated collection tests, trim smoke publish, DataGen Release
+  build, relational direct-load smoke, and document direct-load smoke passed.
+- `dotnet build tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -c Release --no-restore`
+- `dotnet run -c Release --project tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --micro --filter *WalPointReadBenchmarks* --job Dry`
+- Focused lookup-join and SQL column type tests passed for:
+  `Join_WithWhereOnRightPrimaryKeyLookupSide_AppliesPredicate`,
+  `Join_WithUniqueTextFilterAndIndexedDependentSide_UsesLookupJoins`, and
+  `ExecuteSqlAsync_ReturnsQueryColumnTypes`.
 
 ### Review Notes
 
-- The highest-risk runtime changes are in the generated collection model and
-  collection payload codec paths: `CollectionModelGenerator`,
-  `CollectionPayloadCodec`, `CollectionBinaryDocumentCodec`,
-  `CollectionDocumentCodec`, `CollectionIndexedFieldReader`, and collection
-  index binding.
-- The generator diff is intentionally large because generated code now owns the
-  binary record encode/decode shape for opt-in generated models.
-- The benchmark README generated region should be edited through
-  `release-core-manifest.json` plus `scripts/Update-BenchmarkReadme.ps1`, not
-  manually.
+- Adaptive query reoptimization is opt-in and intentionally leaves default
+  planning behavior unchanged.
+- `EXPLAIN ESTIMATE` and `sys.planner_*` are diagnostic surfaces; normal select
+  planning should not depend on the diagnostic rowset materialization path.
+- Simple view row-goal reordering is scoped to eligible join chains and bounded + paging shapes. +- The SQL result DTO column type addition is additive for API/client callers. +- The new WAL point-read benchmark provides a stable current signal, but it + needs a captured historical baseline before it can be used as a release + regression gate. diff --git a/docs/architecture.md b/docs/architecture.md index 981ce3a0..a7a0f8cf 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -333,7 +333,7 @@ On database open, all schemas are loaded into in-memory dictionaries for fast lo | **CTEs** | `WITH ... AS (...) SELECT ...` | | **Set operations** | `UNION`, `INTERSECT`, `EXCEPT` (inside top-level queries, views, and CTE bodies) | | **Subqueries** | Scalar subqueries, `IN (SELECT ...)`, `EXISTS (SELECT ...)`, correlated evaluation in `WHERE`, non-aggregate projection, `UPDATE`/`DELETE` expressions | -| **Statistics** | `ANALYZE [table]` — refreshes `sys.table_stats` and `sys.column_stats` | +| **Statistics** | `ANALYZE [table]` — refreshes `sys.table_stats`, `sys.column_stats`, and public planner-stat diagnostics under `sys.planner_*` | | **Identity** | `INTEGER PRIMARY KEY IDENTITY` — auto-increment columns with persisted high-water mark | | **Distinct** | `SELECT DISTINCT`, `DISTINCT` inside aggregates | diff --git a/docs/performance.md b/docs/performance.md index c2e94b03..875886ff 100644 --- a/docs/performance.md +++ b/docs/performance.md @@ -189,7 +189,7 @@ This is the "real SQL" scenario: non-trivial joins, selective predicates, and pl - Create indexes on join keys and on selective filter columns. - Run `ANALYZE` after bulk loads and after major distribution changes. -- Inspect `sys.table_stats` and `sys.column_stats` when a plan is not behaving as expected. +- Inspect `sys.table_stats`, `sys.column_stats`, `sys.planner_histograms`, `sys.planner_heavy_hitters`, and `sys.planner_index_prefix_stats` when a plan is not behaving as expected. 
Use `EXPLAIN ESTIMATE FOR ` to see which stats were used or ignored without executing the query. See [Debugging Slow Queries With EXPLAIN ESTIMATE](query-execution-pipeline.md#debugging-slow-queries-with-explain-estimate) for how to interpret the Plan rows. ```sql CREATE INDEX idx_orders_customer ON orders(customer_id); @@ -295,6 +295,7 @@ This is narrower than normal ingest: several writer tasks are hitting the same s - Keep the baseline at `UseWriteOptimizedPreset()`. - For `8` in-process writers, benchmark WAL preallocation first. - For `4` in-process writers, benchmark a small durable batch window first. +- For concurrent one-row insert commits that cannot be application-batched, benchmark `ImplicitInsertExecutionMode.ConcurrentWriteTransactions` with a small durable batch window. - Do not cargo-cult batch-window tuning from the shared-writer case into the single-writer case. ### Why @@ -305,6 +306,12 @@ The current concurrent write diagnostics say: - close followers: `W8_Batch250us` at `1070.4`, `W8_Batch0` at `1068.2` - best `4`-writer row: `W4_Batch250us` at about `553.4` commits/sec +The insert fan-in diagnostics now split the insert shapes: + +- disjoint explicit keys remain the easiest case: `AutoCommitConcurrent_ExplicitIdDisjoint_W8_Batch250us` reached about `1,754 commits/sec` with `commitsPerFlush = 3.99` +- hot explicit right-edge rows: `AutoCommitConcurrent_ExplicitId_W8_Batch250us` reached about `1,433 commits/sec` with `commitsPerFlush = 3.29` +- hot auto-generated IDs: `AutoCommitConcurrent_AutoId_W8_Batch250us` reached about `1,441 commits/sec` with `commitsPerFlush = 3.32` + But the single-writer durable diagnostics say the opposite for batch windows: - `BatchWindow(250us)`: about `267.2` ops/sec @@ -314,6 +321,7 @@ So: - shared-writer contention can justify small wait windows or preallocation - single-writer durable paths usually cannot +- `InsertBatch` and explicit transaction batching remain the high-throughput ingest path when the 
application can group rows ## Scenario 7: Cold File Reads, Tooling, And Cache-Pressured Workloads diff --git a/docs/query-and-durable-write-performance/README.md b/docs/query-and-durable-write-performance/README.md index 5f14e77c..a4ebf0fd 100644 --- a/docs/query-and-durable-write-performance/README.md +++ b/docs/query-and-durable-write-performance/README.md @@ -19,30 +19,50 @@ This note tracks the combined optimizer phase-2 and durable-write completion wor - Raw page-copy batching is shared for snapshot/export-style paths, and logical table-copy loops now share one reusable B-tree copy utility instead of each maintenance path carrying its own copy loop. - Opt-in durable group commit remains exposed through `UseDurableGroupCommit(...)`; this round keeps it expert-only and documentation-led rather than changing defaults. - Shared auto-commit non-insert SQL writes on one `Database` can now use the same isolated commit path as explicit `WriteTransaction` work, so low-conflict `UPDATE` / `DELETE` contention can build real pending WAL commit fan-in instead of stalling above the queue. +- The current advanced-optimizer phase is closed: heavy-hitter equality, histogram range, composite-prefix correlation, non-unique lookup costing, hash build-side choice, and bounded small-chain join reordering are covered by implementation, tests, and diagnostic close-out benchmarks. +- Opt-in adaptive join re-optimization is available for the current phase. When enabled, eligible joins can adapt before rows are emitted: index nested-loop joins can switch to hash joins after large observed outer-cardinality divergence, and inner hash joins can flip the build side when the planned build side is materially larger than expected. +- The current async I/O batching phase is closed: WAL, checkpoint, snapshot/export, backup/restore staging, logical rewrite, and inspector scan paths have been audited and covered by diagnostic close-out benchmarks. 
## Public Surface - `sys.table_stats.row_count` keeps its existing numeric meaning. - `sys.table_stats.row_count_is_exact` is the new explicit exactness bit. -- Histogram and prefix stats remain internal in this round; there are still no public histogram system tables. +- `sys.planner_histograms`, `sys.planner_heavy_hitters`, and `sys.planner_index_prefix_stats` expose stable SQL projections over the internal planner statistics. +- `EXPLAIN ESTIMATE FOR ` returns a bounded diagnostic rowset showing estimate sources, stale/missing-stat fallbacks, lookup decisions, join estimates, and join-reorder choices without executing the target query. The practical debugging guide is in [Debugging Slow Queries With EXPLAIN ESTIMATE](../query-execution-pipeline.md#debugging-slow-queries-with-explain-estimate). +- `DatabaseOptions.AdaptiveQueryReoptimization` is disabled by default. Enable it explicitly with `EnableAdaptiveQueryReoptimization(...)` for direct embedded hosts that have measured stale-stat or parameter-sensitive join regressions. +- ADO.NET direct embedded connections can opt in with `Adaptive Query Reoptimization=true` when no explicit `DirectDatabaseOptions` is supplied. Remote `Endpoint` connections reject that key because remote hosts must enable adaptive behavior server-side. - `UseWriteOptimizedPreset()` remains the default recommendation for durable file-backed workloads. - `UseLowLatencyDurableWritePreset()` and `UseDurableGroupCommit(...)` remain opt-in measure-first knobs. 
- Shared-`Database` implicit auto-commit is now split by workload shape: - non-insert SQL writes can queue behind the WAL pending-commit path and benefit from `UseDurableGroupCommit(...)` - - hot insert loops still use the legacy serialized path today, so they should still be benchmarked as insert-specific workloads rather than assumed to coalesce + - one-row insert loops can opt into `ImplicitInsertExecutionMode.ConcurrentWriteTransactions`; hot right-edge and auto-ID rows now use shared row-id reservation plus pending leaf-page rebases to build WAL fan-in -## Remaining Work +## Close-Out Validation -- Adaptive runtime re-optimization is still future work. -- Histogram inspection remains internal; there is no SQL surface for planner histogram dumps yet. +The current advanced-optimizer and async I/O batching phases are backed by diagnostic benchmark suites rather than new public API: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --optimizer-closeout --repro +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --adaptive-reoptimization --repro +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --async-io-closeout --repro +``` + +The May 5, 2026 optimizer close-out run showed `ANALYZE`-driven plans improving the targeted shapes by `1.06x-1.89x` on the local runner. The async I/O close-out run classified save/backup/restore as already batched, vacuum/FK migration as intentionally row-logical through `BTreeCopyUtility`, and inspector/WAL scans as specialized diagnostics. + +The adaptive re-optimization benchmark is diagnostic rather than a release-core scorecard row. 
It reports eligible query count, attempts, successful switches, rejected switches, divergence events, buffered rows, and fail-closed reasons for disabled baseline, enabled no-switch, stale-stat fan-out, parameter-sensitive skew, hash build-side shapes, and synthetic switch-counter rows. + +## Future Work + +- `EXPLAIN ANALYZE`, runtime actual-row plan output, adaptive stats persistence, automatic `ANALYZE`, and arbitrary mid-plan reordering remain future work. +- Raw histogram/prefix storage payloads remain internal; future diagnostics should extend stable SQL projections or add typed DTOs deliberately rather than exposing storage encodings. - Durable group-commit guidance should keep following benchmark evidence, especially: - single-writer no-regression checks - 4-writer and 8-writer shared-`Database` contention runs - queue-depth, commits-per-flush, and latency percentile diagnostics - The remaining phase-4 write-path question is now narrower than "shared auto-commit in general": - non-insert shared auto-commit fan-in is working - - hot insert auto-commit still needs a dedicated design decision if we want it to coalesce without reopening structural conflict costs -- Async I/O batching still has room for more auditing outside the WAL hot path, but the main write-path batching pieces are already in place. See [Async I/O Batching Follow-Up](async-io-batching-follow-up.md). + - hot insert auto-commit now has an opt-in concurrent path, but bulk ingest guidance still starts with application-level batching +- Async I/O batching is done for the current phase; future work should be limited to specialized diagnostics or maintenance-path tuning when benchmark data justifies it. See [Async I/O Batching Follow-Up](async-io-batching-follow-up.md). ## Phase 4 Status @@ -55,26 +75,25 @@ Current measured status: - The same rerun showed shared auto-commit `W8` at about `743 commits/sec` with `commitsPerFlush = 3.37`. 
- That is now effectively in the same commit-fan-in band as the explicit `WriteTransaction` disjoint-update rows on the same runner. -2. The gain is intentionally scoped. - - Hot auto-commit inserts still use the legacy serialized path. - - The focused `insert-fan-in-diagnostics-20260411-165557.csv` rerun still kept every insert scenario at `commitsPerFlush = 1.00`. - - Shared auto-commit explicit-id inserts were about `458 commits/sec` at `W8`, auto-generated-id inserts were about `449 commits/sec`, and explicit `WriteTransaction` inserts were still only about `438 commits/sec` with explicit ids and about `413 commits/sec` with auto-generated ids. - - The shared row-id reservation pass removed the earlier duplicate-key failures from the explicit auto-generated-id rows, but it did not unlock any update-style fan-in. The remaining insert-side limitation is still structural rather than just "missing WAL fan-in." - - A rebuilt April 12, 2026 spot-check of `ExplicitTx_AutoId_W8_Batch250us` also landed at `441-445 commits/sec` with `extraAttempts = 0` and `dirtyParentRecoveries = 0` across `insert-fan-in-scenario-ExplicitTx_AutoId_W8_Batch250us-20260412-034728.csv` and `insert-fan-in-scenario-ExplicitTx_AutoId_W8_Batch250us-20260412-034745.csv`, so the earlier retry tail no longer reproduces on the current binaries. - - The current phase-4 result should therefore be read as "shared non-insert auto-commit fan-in works" rather than "every auto-commit workload now coalesces." +2. Insert fan-in is now opt-in and shape-specific. + - `ConcurrentWriteTransactions` keeps serialized inserts as the default but lets shared auto-commit insert loops use explicit write-transaction state. + - The shared row-id reservation path reserves monotonic in-memory ranges, publishes only committed high-water metadata, and tolerates rollback/retry gaps without duplicates. 
+   - Pending leaf-page rebases let hot right-edge writers merge insert-only deltas against staged WAL images instead of waiting for the earlier commit to publish.
+   - A May 5, 2026 spot-check of `AutoCommitConcurrent_AutoId_W8_Batch250us` reached about `1,441 commits/sec` with `commitsPerFlush = 3.32`.
+   - The same spot-check put `AutoCommitConcurrent_ExplicitId_W8_Batch250us` at about `1,433 commits/sec` with `commitsPerFlush = 3.29`, and `AutoCommitConcurrent_ExplicitIdDisjoint_W8_Batch250us` at about `1,754 commits/sec` with `commitsPerFlush = 3.99`.
 
 3. Defaults and presets should still stay where they are for now.
-   - The engine now has a real measured win for shared non-insert contention.
-   - It does not yet have a blanket insert-path win that would justify changing default batch-window guidance.
-   - `UseDurableGroupCommit(...)` should remain an opt-in measured knob until the insert-side question is settled.
+   - The engine now has measured wins for shared non-insert contention and opt-in concurrent one-row insert commits.
+   - This still does not replace `InsertBatch` or explicit transaction batching for bulk ingest.
+   - `UseDurableGroupCommit(...)` and `ConcurrentWriteTransactions` should remain opt-in measured knobs.
 
 Next clean steps:
 
-1. Keep hot inserts on the current path until there is a durable row-id reservation plus right-edge insert strategy for concurrent implicit inserts.
+1. Keep serialized inserts as the default until the broader release guardrails consistently show no single-writer or batch-ingest regression.
 2. Keep the compact validation matrix small:
    - single-writer no-regression
    - shared non-insert auto-commit `W4` / `W8`
    - explicit `WriteTransaction` disjoint updates
    - hot insert auto-commit contention
-3. If insert-side fan-in is revisited later, keep the new shared reservation correctness path and start the next pass with durable row-id reservation plus a right-edge insert strategy before touching defaults.
-4.
Do not change default batch windows or preset recommendations until the insert-side behavior is intentionally resolved. +3. Keep measuring disjoint explicit keys, hot explicit right-edge rows, and hot auto-generated IDs as separate shapes. +4. Do not change default batch windows or preset recommendations based on the shared-writer case alone. diff --git a/docs/query-and-durable-write-performance/async-io-batching-follow-up.md b/docs/query-and-durable-write-performance/async-io-batching-follow-up.md index 65c104bb..66e37e38 100644 --- a/docs/query-and-durable-write-performance/async-io-batching-follow-up.md +++ b/docs/query-and-durable-write-performance/async-io-batching-follow-up.md @@ -1,11 +1,14 @@ # Async I/O Batching Follow-Up -This note captures the remaining work behind the roadmap item currently marked -`In Progress`. +This note captures the close-out audit for the roadmap item now marked `Done` +for the current phase. It does not claim that every possible maintenance or +diagnostic path has been optimized forever; it records which broad storage paths +are covered and where future work should be limited. -## Current Shipped State +## Shipped State -The hot storage write path already has the main batching pieces in place: +The hot storage write path and the main large-copy paths have the batching +pieces needed for this phase: - WAL commits can append dirty pages through `AppendFramesAndCommitAsync(...)`. - Repeated `AppendFrameAsync(...)` calls inside a transaction are staged and emitted as chunked WAL writes during `CommitAsync(...)`. @@ -13,33 +16,46 @@ The hot storage write path already has the main batching pieces in place: - `SaveToFileAsync(...)` and backup-style snapshot copies use `StorageDeviceCopyBatcher`. - Vacuum and foreign-key migration rewrites share `BTreeCopyUtility` instead of each owning a separate row-copy loop. -## Remaining Work - -The remaining work is an audit and measurement pass, not a missing core WAL batching primitive. - -1. 
Audit non-hot rewrite/export paths: - - Backup and restore: `DatabaseBackupCoordinator`, `Database.SaveToFileAsync(...)`, `Pager.SaveToFileAsync(...)`. - - Vacuum: `DatabaseMaintenanceCoordinator.CopyDatabaseAsync(...)`. - - Foreign-key migration rewrites: `DatabaseForeignKeyMigrationCoordinator.ApplyPlanAsync(...)`. - - Storage diagnostics and inspectors: `WalInspector`, `InspectorEngine`, and large sequential read paths. - - Admin/UI export helpers only if they prove relevant to large database-file movement. - -2. Decide whether `BTreeCopyUtility` should stay a logical row-copy helper or grow batching behavior: - - Current behavior is cursor-read plus per-row `InsertAsync(...)`. - - Possible follow-up: add row/page-local batching, reusable buffers, or progress hooks. - - Avoid direct B-tree page copying unless catalog, root-page, schema, freelist, row-count, and index invariants are explicitly preserved. - -3. Add benchmark/diagnostic coverage before optimizing further: - - Large backup snapshot. - - Large restore staging. - - Large vacuum rewrite. - - Large foreign-key migration rewrite. - - Inspector/diagnostic scans over large DB/WAL files. - -4. Define completion criteria: - - The audit identifies every large sequential file-copy and logical table-copy path. - - Each path is classified as already batched, intentionally unbatched, or worth optimizing. - - Any optimized path has before/after timings and no crash-recovery or integrity regression. +## Close-Out Classification + +| Path | Classification | Notes | +|---|---|---| +| WAL commit frame writes | Already batched | Dirty pages are staged and emitted through chunked WAL writes before commit publication. | +| Checkpoint page copies | Already batched | Contiguous page writes are grouped when checkpointing back into the main database file. | +| `Database.SaveToFileAsync(...)` / `Pager.SaveToFileAsync(...)` | Already batched | Snapshot copies use `StorageDeviceCopyBatcher` for large sequential movement. 
| +| `DatabaseBackupCoordinator` backup and restore staging | Already batched | Backup/save and restore staging stay on the snapshot-copy path rather than bespoke small writes. | +| Vacuum rewrite | Intentionally logical/unbatched | Uses `BTreeCopyUtility` to preserve catalog, root-page, schema, freelist, row-count, and index invariants. | +| Foreign-key migration rewrite | Intentionally logical/unbatched | Keeps row-logical copying so FK metadata and index rebuild invariants stay explicit. | +| `DatabaseInspector` / `WalInspector` | Benchmarked diagnostic path | Kept outside the WAL hot path; future work can tune specialized scans if diagnostics become a bottleneck. | +| Admin/UI export helpers | Already covered or out of scope | File-copy style exports should use the existing snapshot-copy helpers; UI orchestration is not a storage batching primitive. | + +## Benchmark Coverage + +The diagnostic close-out suite is available through: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --async-io-closeout --repro +``` + +The suite covers large save/backup, restore staging, vacuum/FK logical rewrite, +checkpoint-adjacent maintenance data movement, and inspector/WAL scan paths. The +rows remain diagnostic rather than release-blocking because they are sensitive to +local storage and dataset size. + +The May 5, 2026 close-out run is recorded in +`tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/async-io-closeout-20260505-204638.csv`. +It measured save/backup/restore as `52,762`, `8,136`, and `9,996` pages/sec, +vacuum logical rewrite at `3,365` pages/sec, FK migration rewrite at `42,749` +rows/sec, database inspector scan at `18,600` pages/sec, and WAL inspector scan +at `2,310` frames/sec over a live 20-frame WAL. + +## Future Boundaries + +Future work should stay narrow: + +- Tune specialized diagnostic or maintenance scans only when benchmark data shows material small-read overhead. 
+- Keep `BTreeCopyUtility` row-logical unless a specific path can preserve catalog, root-page, schema, freelist, row-count, and index invariants without special-case risk. +- Do not treat async I/O batching as a durable group-commit or insert fan-in substitute. ## Non-Goals diff --git a/docs/query-execution-pipeline.md b/docs/query-execution-pipeline.md index 906e105c..35228a04 100644 --- a/docs/query-execution-pipeline.md +++ b/docs/query-execution-pipeline.md @@ -126,6 +126,12 @@ to estimate result sizes and guide operator selection. | Histogram buckets | Per column | 16 quantile-based buckets for range estimation | | Prefix distinct counts | Per index | Distinct values for each prefix of a composite index | +These planner statistics are inspectable through stable SQL projections: +`sys.planner_histograms`, `sys.planner_heavy_hitters`, and +`sys.planner_index_prefix_stats`. `EXPLAIN ESTIMATE FOR ` reports +which of those sources were used or ignored for a selected query shape without +executing the query. + **Estimation methods:** - **Lookup selectivity:** `tableRows / distinctCount` for uniform distribution, adjusted @@ -137,6 +143,158 @@ to estimate result sizes and guide operator selection. Without statistics, the planner uses heuristic fallback estimates. +### Debugging Slow Queries With EXPLAIN ESTIMATE + +The Admin query tab's **Estimate** button runs `EXPLAIN ESTIMATE FOR ` and shows +the returned diagnostic rows in the **Plan** tab. You can also run the statement directly: + +```sql +EXPLAIN ESTIMATE FOR +SELECT * +FROM orders o +JOIN customers c ON c.id = o.customer_id +WHERE o.status = 'open' +LIMIT 25; +``` + +This is a planner diagnostic, not a runtime profile. It does not execute the target query, +scan user rows to measure actual cardinality, or report per-operator wall-clock time. It +explains what information the planner used, what it ignored, and why it preferred or +rejected common access paths. 
+ +**Result columns:** + +| Column | Meaning | +|--------|---------| +| `node_id` | Diagnostic row identifier. | +| `parent_node_id` | Parent diagnostic row, so related decisions can be grouped into a tree. | +| `node_kind` | Planner area being described, such as `source`, `filter`, `join`, `lookup`, `join-reorder`, or `estimate-source`. | +| `target` | Table, index, predicate, join, or query fragment for the row. | +| `decision` | The planner choice or finding. Examples include `table`, `index-lookup`, `hash-join`, `bounded-dp`, `heavy-hitter-equality`, `histogram-range`, and `ignored-stale-stats`. | +| `estimated_rows` | Estimated rows produced by this source, predicate, join, or decision. | +| `estimated_cost` | Relative planner cost used for comparison. It is not elapsed time. | +| `stats_source` | Statistics or metadata source behind the estimate, such as `sys.table_stats`, `sys.column_stats`, `sys.planner_heavy_hitters`, `sys.planner_histograms`, or `sys.planner_index_prefix_stats`. | +| `stats_state` | Whether the source was `fresh`, `exact`, `estimated`, `missing`, `stale-ignored`, `unsupported`, or otherwise informational. | +| `detail` | Stable text with the extra inputs behind the decision. | + +**What a healthy estimate usually looks like:** + +- Important base tables have `sys.table_stats` with `stats_state` of `exact` or + `estimated`. +- Selective equality predicates use `sys.column_stats` or `sys.planner_heavy_hitters`. +- Numeric range predicates use `sys.planner_histograms`. +- Correlated multi-column filters or joins use `sys.planner_index_prefix_stats`. +- Large joins use `hash-join`, `index-lookup`, `bounded-dp`, or another intentional + strategy rather than falling all the way to an unconstrained nested-loop fallback. +- `estimated_rows` drops after selective filters and after joins that should narrow the + result. 
+
+**Red flags:**
+
+| Symptom in Plan rows | Likely meaning | First action |
+|----------------------|----------------|--------------|
+| `stats_state = missing` on important tables or columns | The planner is guessing. | Run `ANALYZE` for the table or database. |
+| `stats_state = stale-ignored` or `decision = ignored-stale-stats` | Stats exist but were not trusted after table changes. | Run `ANALYZE table_name;`. |
+| Large table source followed by `table-scan` when the predicate should be selective | Missing index, unsupported predicate shape, or missing stats. | Check indexes and rewrite the predicate into a simple equality/range shape if possible. |
+| Large join estimated with `nested-loop` fallback | No usable equi-join keys, no usable lookup index, or hash analysis could not extract keys. | Check join predicates and indexes on join keys. |
+| `estimated_rows` stays close to the base table size after a selective predicate | Stats do not capture the selectivity, the predicate is unsupported, or values are stale. | Run `ANALYZE`, inspect heavy hitters/histograms, and check the predicate shape. |
+| `stats_source` is missing or heuristic on the expensive part of the query | Planner has little evidence for that choice. | Add stats with `ANALYZE`; consider an index if the predicate is selective. |
+| Join reorder rows show fallback/greedy when a small inner-join chain was expected | Query shape is outside bounded reorder support or estimates are missing. | Check for outer joins, views/CTEs, unsupported predicates, or missing stats. |
+
+**Debugging workflow:**
+
+1. Run the query normally and note elapsed time and row count.
+2. Click **Estimate** in Admin, or run `EXPLAIN ESTIMATE FOR ` manually.
+3. Check `stats_state` first. If important rows are `missing` or `stale-ignored`, run:
+
+   ```sql
+   ANALYZE;
+   ```
+
+   or target the affected table:
+
+   ```sql
+   ANALYZE orders;
+   ```
+
+4.
Re-run **Estimate** and check whether the planner now uses column stats, + heavy hitters, histograms, or composite-prefix stats. +5. Inspect large sources and joins: + - Is a large table scanned even though the predicate is selective? + - Does each large join use a hash join or an indexed lookup? + - Does `estimated_rows` shrink where the query should become selective? +6. Inspect the backing stats directly when an estimate looks wrong: + + ```sql + SELECT * FROM sys.table_stats WHERE table_name = 'orders'; + SELECT * FROM sys.column_stats WHERE table_name = 'orders'; + SELECT * FROM sys.planner_heavy_hitters WHERE table_name = 'orders'; + SELECT * FROM sys.planner_histograms WHERE table_name = 'orders'; + SELECT * FROM sys.planner_index_prefix_stats WHERE table_name = 'orders'; + ``` + +7. If stats are fresh but the plan still looks wrong, check the query shape and indexes: + simple equality/range predicates and equi-joins are the easiest for the planner to cost. + Expressions around indexed columns, non-equality joins, and unsupported predicates can + force fallback estimates or operators. + +### Public Planner Observability Phases + +The current `EXPLAIN ESTIMATE` and `sys.planner_*` surfaces are phase 1 of public planner +observability. They make the optimizer's existing statistics and costing choices visible, +but they intentionally do not change normal query planning or execution behavior. + +| Phase | Scope | Runtime performance impact | +|-------|-------|----------------------------| +| Phase 1: SQL-first planner diagnostics | Stable `sys.planner_histograms`, `sys.planner_heavy_hitters`, `sys.planner_index_prefix_stats`, and `EXPLAIN ESTIMATE FOR ` rowsets. | No direct gain expected. The value is explaining slow or surprising plans without adding overhead to ordinary queries. | +| Phase 2: Admin query debugger UX | Plan-tab warnings for missing/stale stats, clearer index chosen/rejected explanations, join-order visualization, copyable reports, and guided `ANALYZE` actions.
| No direct engine gain expected. It shortens diagnosis time and reduces guesswork. | +| Phase 3: Runtime actuals / `EXPLAIN ANALYZE` | Execute the query and report actual rows, elapsed time, and estimate-vs-actual gaps per plan node. | Small overhead while profiling. The main value is identifying where estimates diverge from reality. | +| Phase 4: Adaptive join re-optimization | Opt-in adaptive wrappers for eligible joins. The current phase can switch lookup fan-out to hash join and flip inner hash build sides before rows are emitted. | Direct gains are possible for stale-stat, skewed, and parameter-sensitive joins. Default execution has no intended overhead because the feature is disabled unless the host opts in. | +| Phase 5: Broader public stats management | Typed .NET APIs, richer stats health reports, explicit stats refresh helpers, and expanded multi-column statistics inspection. | Mostly operational value. Performance gains come indirectly from keeping stats healthy and making bad plans easier to prevent. | + +Phase 1 is complete for the current public surface. Phase 4 is implemented for the current +join-focused opt-in scope. The next observability phase is still runtime actuals, because +`EXPLAIN ANALYZE` would make estimate-vs-actual gaps visible instead of relying on internal +adaptive counters. + +### Adaptive Join Re-Optimization + +Adaptive re-optimization is disabled by default. Direct embedded hosts can opt in: + +```csharp +var options = new DatabaseOptions() + .EnableAdaptiveQueryReoptimization(builder => builder + .WithDivergenceFactor(8) + .WithMinimumObservedRows(4096) + .WithMaxBufferedRows(65536) + .WithMaxReoptimizationsPerQuery(1)); +``` + +ADO.NET direct embedded connections can also use: + +```text +Data Source=app.db;Adaptive Query Reoptimization=true +``` + +Remote `Endpoint` connections reject that connection-string key because remote hosts must +enable adaptive behavior server-side. 
+ +The current implementation adapts only at safe join boundaries before rows are emitted from +the adaptive join. Eligible shapes are inner and left joins with extractable equi-join keys +where both the original join and a semantic-preserving alternative are available. Unsupported +or risky shapes fail closed and keep the original plan, including cross joins, right joins, +compound queries, correlated subquery execution, and `SELECT *` cases where join reordering +could change visible column order. + +Use it when `EXPLAIN ESTIMATE` shows a plausible join plan but the real workload is known to +be stale-stat or parameter-sensitive. The expected wins are workload-shaped: stable plans and +single-table lookups should not improve, while bad join fan-out or wrong hash build-side cases +can improve materially. Benchmark diagnostics for this phase live behind: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --adaptive-reoptimization --repro +``` + ### Index Selection For each AND-separated predicate in the WHERE clause, the planner checks whether an diff --git a/docs/releases/v3.7.0-pr-notes.md b/docs/releases/v3.7.0-pr-notes.md new file mode 100644 index 00000000..4c9fb4b4 --- /dev/null +++ b/docs/releases/v3.7.0-pr-notes.md @@ -0,0 +1,116 @@ +## Summary + +This `version3.7.0` release completes the current optimizer and async I/O +close-out work, adds public planner observability, introduces opt-in adaptive +join reoptimization, and fixes the view/query paths that motivated the +customer-filtered fulfillment investigation. + +The query-planning work adds `EXPLAIN ESTIMATE` and `sys.planner_*` diagnostic +catalogs, then exposes those diagnostics in the Admin Query tab. It also adds +phase-one adaptive join reoptimization behind explicit opt-in settings so +eligible index nested-loop and hash joins can correct bad cardinality +assumptions before rows are emitted while default query behavior remains +unchanged. 
+ +The view/admin work makes paged view browsing use bounded `LIMIT`/`OFFSET` +plans, lets simple view row-goal planning reorder eligible join chains before +the view operator tree is built, fixes Query tab grid scrolling, and preserves +SQL column type metadata across engine, API, client, and Admin result surfaces. + +The release also closes the current generated collection fast-path and DataGen +direct-load phase, refreshes release benchmark docs and scorecards, adds +optimizer/async I/O close-out diagnostics, and adds a dedicated WAL point-read +benchmark plus configurable guardrail metric comparison for sub-microsecond +latency rows. + +## Type of Change + +- [x] Bug fix +- [x] New feature +- [ ] Breaking change +- [x] Documentation update +- [x] Refactor / maintenance +- [ ] Tests only + +## Related Issues + +No issue numbers were linked for this release branch. Included work: + +- Added public planner diagnostics through `EXPLAIN ESTIMATE FOR SELECT`, WITH, + and compound queries. +- Added `sys.planner_*` virtual catalogs for histograms, heavy hitters, and + composite index prefix statistics. +- Added Admin Query tab estimate execution and Plan tab rendering. +- Added opt-in adaptive query reoptimization through `DatabaseOptions`, + `EnableAdaptiveQueryReoptimization(...)`, and direct embedded ADO.NET + connection strings. +- Added adaptive join diagnostics and fail-closed handling for unsupported or + risky shapes. +- Added row-goal reordering for eligible simple view join chains. +- Updated Admin DataGrid view paging to issue bounded `LIMIT`/`OFFSET` SQL. +- Fixed Query tab grid scrolling and pager layout. +- Improved lookup-join planning for indexed local predicates, residual + right-side filters, and index scan capacity hints. +- Propagated SQL result `ColumnTypes` through local engine transport, HTTP API, + HTTP client, and Admin DataGrid. +- Added a fulfillment sample `orders(customer_id)` lookup index. 
+- Closed the current generated collection fast-path phase and refreshed related + roadmap/static pages. +- Moved DataGen direct loads onto the write-optimized storage preset and + reduced direct-load row-buffer overhead. +- Added optimizer close-out and async I/O close-out diagnostic benchmark suites. +- Refreshed release benchmark README tables, benchmark catalog entries, and the + release-core manifest with the May 6 baseline. +- Added a WAL point-read BenchmarkDotNet suite and guardrail support for + comparing latency columns such as `P95` instead of only `Mean`. + +## Testing + +- [x] `dotnet build CSharpDB.slnx` +- [x] Relevant tests executed +- [x] Failure-path tests executed (if applicable: cancellation, invalid/unsupported inputs, non-`DbException` paths) +- [x] Manual verification performed (if applicable) + +Validation reported across the commit range: + +- `dotnet test .\CSharpDB.slnx -c Release` +- `dotnet build .\CSharpDB.slnx -c Release --no-restore` +- `dotnet test .\CSharpDB.slnx -c Release --no-build` +- `dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --optimizer-closeout --repro` +- `dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --async-io-closeout --repro` +- `pwsh -NoProfile .\tests\CSharpDB.Benchmarks\scripts\Run-Perf-Guardrails.ps1 -Mode release` + - Reported `PASS=187, WARN=0, SKIP=0, FAIL=0`. 
+- `dotnet build .\src\CSharpDB.Admin\CSharpDB.Admin.csproj -c Release` +- `dotnet test .\tests\CSharpDB.Admin.Forms.Tests\CSharpDB.Admin.Forms.Tests.csproj -c Release --filter FullyQualifiedName~DataGridTests` +- `dotnet test .\tests\CSharpDB.Tests\CSharpDB.Tests.csproj -c Release --filter FullyQualifiedName~SimpleViewLateUnindexedDetailJoinWithLimit|FullyQualifiedName~JoinChainWithLimit|FullyQualifiedName~PurchaseOrderLineJoinChainWithLimit` +- Targeted generated collection tests, trim smoke publish, DataGen Release + build, relational direct-load smoke, and document direct-load smoke. +- `dotnet build tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -c Release --no-restore` +- `dotnet run -c Release --project tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --micro --filter *WalPointReadBenchmarks* --job Dry` +- Focused lookup-join and SQL column type tests for: + `Join_WithWhereOnRightPrimaryKeyLookupSide_AppliesPredicate`, + `Join_WithUniqueTextFilterAndIndexedDependentSide_UsesLookupJoins`, and + `ExecuteSqlAsync_ReturnsQueryColumnTypes`. + +## Checklist + +- [x] I followed the project style and conventions. +- [x] I added or updated tests for behavior changes. +- [x] I covered both success and failure paths for changed behavior. +- [x] I updated docs for user-facing changes. +- [x] I verified no sensitive data was added. + +## Notes for Reviewers + +- Adaptive reoptimization is opt-in. Default query planning and execution remain + unchanged unless a host enables the feature. +- Remote endpoint connection strings reject the adaptive reoptimization key; + remote hosts must enable the feature server-side. +- `EXPLAIN ESTIMATE` and `sys.planner_*` are public diagnostic surfaces and + should stay isolated from normal select execution side effects. +- Simple view row-goal planning is intentionally scoped to eligible join chains + and bounded browsing shapes. 
+- The SQL result `ColumnTypes` DTO addition is additive, but client/UI consumers + should handle missing values from older servers. +- The WAL point-read benchmark gives future releases a cleaner signal, but it + does not yet have a historical baseline captured as a guardrail. diff --git a/docs/roadmap.md b/docs/roadmap.md index ea8c92ab..d47e660f 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -77,15 +77,19 @@ Advanced features and fundamental architecture enhancements. | **JSON path querying** | Query into JSON document fields in the Collection API (e.g., `$.address.city`) via `FindByPathAsync` / `FindByPathRangeAsync` | Done | | **Advanced collection storage path** | Binary direct-payload format with direct binary hydration, path-based field extraction, and richer expression/path indexes | Done | | **SQL batched row transport** | Internal row-batch transport serves as the batch-first SQL execution foundation across batch-capable result boundaries, scans, joins, and generic aggregates | Done | -| **Source-generated collection fast path** | In progress: `GetGeneratedCollectionAsync(...)`, generated field descriptors/index bindings, analyzer-packaged collection model/codecs, trim/NativeAOT smoke coverage, and a dedicated sample are now in place while broader package ergonomics and remaining generator coverage continue | In Progress | +| **Source-generated collection fast path** | Done for the current phase: opt-in generated collection models now provide `GetGeneratedCollectionAsync(...)`, generated field descriptors/index bindings, analyzer-packaged collection model/codecs, generated binary direct-payload encode/decode for supported document graphs, source-generated JSON fallback for unsupported shapes, trim/NativeAOT smoke coverage, and a dedicated sample | Done | +| **Generated collection package ergonomics** | Streamline NuGet/analyzer packaging, templates, onboarding docs, and generated-collection setup so consumers can adopt the opt-in path with less 
project wiring | Planned | +| **Broader generated collection model coverage** | Expand generator support beyond the current scalar, scalar collection, nested scalar, and nested collection-scalar shapes; unsupported shapes currently warn and fall back to source-generated JSON instead of binary direct payloads | Planned | | **Page-level compression** | Deep engine/page compression remains planned; application-level payload compression is available as a sample/SDK pattern without changing the storage format | Planned | | **At-rest encryption** | Encrypt database and WAL files with passphrase-based key management and explicit plaintext/encrypted migration/export paths; implementation must meet the database-encryption plan entry criteria before shipping | Research | -| **Advanced cost-based query optimizer** | In progress: phase-2 stats-guided costing is now in place through internal equi-depth histograms, heavy hitters, composite-index prefix distinct-count summaries, skew-aware lookup/filter estimates, correlation-aware composite equality filters/joins, and bounded DP reordering for small inner-join chains; adaptive re-optimization and public histogram inspection remain future work | In Progress | -| **Async I/O batching** | In progress: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, and reusable B-tree copy utilities now cover the main storage and maintenance write paths; remaining auditing is outside the WAL hot path | In Progress | +| **Advanced cost-based query optimizer** | Done for the current phase: `ANALYZE`-driven stats-guided costing now uses internal equi-depth histograms, heavy hitters, composite-index prefix distinct-count summaries, skew-aware lookup/filter estimates, correlation-aware composite equality filters/joins, non-unique lookup costing, hash build-side choice, and bounded DP reordering for small inner-join chains | Done | +| **Adaptive query re-optimization** | Done for the current phase: opt-in adaptive 
join execution can switch eligible index nested-loop joins to hash joins and can flip inner hash build sides at safe pre-emission boundaries when observed rows diverge materially from estimates. Broader `EXPLAIN ANALYZE`, runtime actual-row reporting, adaptive stats persistence, and arbitrary mid-plan reordering remain future work | Done | +| **Public planner histogram inspection** | Stable SQL-first diagnostics now expose `sys.planner_histograms`, `sys.planner_heavy_hitters`, `sys.planner_index_prefix_stats`, and `EXPLAIN ESTIMATE FOR ` while keeping raw histogram/prefix storage payloads internal | Done | +| **Async I/O batching** | Done for the current phase: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, reusable B-tree copy utilities, and the close-out audit now cover the main storage and maintenance write paths; future work is limited to specialized diagnostics or maintenance-path tuning | Done | | **Low-latency durable writes** | Done in `v2.9.0`: advisory planner-stat persistence can stay deferred without weakening committed-row durability, and `sys.table_stats.row_count_is_exact` now makes exact versus estimated row-count semantics explicit to planner and `COUNT(*)` fast paths | Done | | **Group commit / deferred WAL flush** | Done in `v2.9.0`: opt-in `UseDurableCommitBatchWindow(...)` batches durable WAL flushes across contending in-process transactions and remains an expert measure-first knob rather than default behavior | Done | | **Initial multi-writer support** | Explicit `WriteTransaction` conflict-detected retry flow, shared auto-commit non-insert isolation, and opt-in `ConcurrentWriteTransactions` for shared implicit inserts | Done | -| **Broader multi-writer insert optimization** | Improve hot insert fan-in, row-id reservation, and other high-contention patterns beyond the current initial multi-writer path | Research | +| **Broader multi-writer insert optimization** | Opt-in `ConcurrentWriteTransactions` 
now reserves shared row-id ranges and rebases hot right-edge insert pages against pending WAL images, improving concurrent one-row auto-ID and explicit-ID insert fan-in while keeping serialized inserts as the default | Done | | **API-level sharding** | Route API/daemon requests across multiple warm CSharpDB database files so independent tenants or shard keys can use separate WAL and commit paths, with v1 focused on single-shard writes and point reads | Research | | **Replication / change feed** | Stream committed changes for read replicas or event-driven architectures | Research | | **WebAssembly sandboxed UDFs** | Execute untrusted user-submitted functions in a WASM sandbox with resource limits (fuel, memory caps) via Wasmtime | Research | @@ -105,7 +109,7 @@ These are known simplifications in the current implementation: | **Schema** | No SQL `DEFAULT` column values or `CHECK` constraints yet. Foreign keys are currently v1 only: single-column, column-level `REFERENCES` with optional `ON DELETE CASCADE`; table-level/composite/deferred foreign keys and `ON UPDATE` actions are not implemented | | **Indexes** | Equality lookups support current `INTEGER`/`TEXT` indexes, but ordered range-scan pushdown is still limited to single-column `INTEGER` index paths | | **RowId** | Legacy table schemas without persisted high-water metadata may pay a one-time key scan on first insert | -| **Collections** | `FindByIndexAsync` supports declared field-equality lookups; `FindByPathAsync` and `FindByPathRangeAsync` support path-based queries on indexed paths; `FindAsync` remains a full scan for unindexed predicates | +| **Collections** | `FindByIndexAsync` supports declared field-equality lookups; `FindByPathAsync` and `FindByPathRangeAsync` support path-based queries on indexed paths; `FindAsync` remains a full scan for unindexed predicates. 
Generated collections require registered descriptors for existing collection indexes; unsupported generated model shapes warn and use the source-generated JSON fallback instead of binary direct payloads | | **Networking** | `CSharpDB.Daemon` now hosts both REST and gRPC from one process; named pipes remain reserved but are not implemented end to end today | | **Security** | Remote REST and daemon gRPC support opt-in API-key authentication, defaulting to `None` for backward compatibility. JWT, RBAC, mTLS helpers, TLS-specific configuration, and at-rest encryption are not implemented | | **Admin Forms** | The Forms designer/runtime supports the core generated-form and data-entry path plus initial trusted command-backed automation, including lifecycle events, command buttons, selected control events, and declarative action sequences, but still needs Access-parity work for responsive runtime rendering, complete inferred validation, richer form modes, broader built-in actions, additional events, advanced filtering/sorting, and broader controls | @@ -116,7 +120,7 @@ These are known simplifications in the current implementation: | **Storage** | No at-rest encryption for database/WAL files; on-disk storage is plaintext only | | **Storage** | Memory-mapped reads are opt-in and currently apply only to clean main-file pages; WAL-backed reads still rely on the WAL/cache path | | **Storage** | By default, durable auto-commit single-row writes still pay a physical WAL flush per commit; opt-in `UseDurableCommitBatchWindow(...)` can trade some commit latency for higher throughput across contending in-process writers, but default behavior remains per-commit durable | -| **Query** | Phase-2 cost-based planning is largely in place: `ANALYZE`, `sys.table_stats`, `sys.column_stats`, internal histograms/heavy hitters/prefix stats, and bounded small-chain join reordering now feed join/access-path costing; remaining work is adaptive re-optimization and public histogram/diagnostic 
surfacing rather than missing core stats-guided costing | +| **Query** | Phase-2 cost-based planning is in place: `ANALYZE`, `sys.table_stats`, `sys.column_stats`, public planner-stat diagnostics, histogram/heavy-hitter/prefix estimates, and bounded small-chain join reordering now feed join/access-path costing. Opt-in adaptive join re-optimization can react to stale-stat or parameter-sensitive join cardinality misses, while broader runtime actuals, `EXPLAIN ANALYZE`, and full mid-plan reordering remain future work | | **Query** | Internal row-batch transport is now the default scan-heavy execution foundation across batch-capable scans, joins, aggregates, and result boundaries; remaining work is broader kernel specialization and optional SIMD-style tuning rather than missing core batch coverage | --- @@ -139,6 +143,7 @@ Major features already implemented: - Ordered integer index range scans (`<`, `<=`, `>`, `>=`, `BETWEEN`) in the fast lookup path - `ANALYZE`, persisted `sys.table_stats` / `sys.column_stats`, and stale-aware column-stat refresh - Phase-2 cost-based query planning: statistics-guided access-path selection, join method choice, hash build-side choice, histogram/heavy-hitter/cardinality estimation, composite-prefix correlation modeling, and bounded small-chain inner-join reordering +- Opt-in adaptive join re-optimization behind `DatabaseOptions.AdaptiveQueryReoptimization` - SQL statement and SELECT plan caching - First-class `IDENTITY` / `AUTOINCREMENT` support for `INTEGER PRIMARY KEY` columns - Persisted table `NextRowId` high-water mark with compatibility fallback for legacy metadata @@ -170,7 +175,7 @@ Major features already implemented: - Binary direct-payload collection storage with direct hydration and field/path extraction - Collection path indexes: nested scalar, array-element, nested array-object, Guid, temporal, ordered text - Collection path query APIs: `FindByPathAsync` and `FindByPathRangeAsync` -- Source-generated typed collection fast 
path foundations: generated collection models/codecs/field descriptors, trim-safe `GetGeneratedCollectionAsync(...)`, generator diagnostics, NativeAOT trim-smoke validation, and a dedicated sample +- Source-generated typed collection fast path: generated collection models/codecs/field descriptors, generated binary direct payloads for supported shapes, trim-safe `GetGeneratedCollectionAsync(...)`, generator diagnostics, NativeAOT trim-smoke validation, and a dedicated sample - Full-text search with tokenization, stemming, and relevance ranking - Hybrid storage mode with lazy-resident durable storage and gRPC tunable file-cache - Client-wide `BackupAsync` / `RestoreAsync` across direct, HTTP, gRPC, CLI, and Admin diff --git a/samples/fulfillment-hub/schema.sql b/samples/fulfillment-hub/schema.sql index db946be8..c9d0b471 100644 --- a/samples/fulfillment-hub/schema.sql +++ b/samples/fulfillment-hub/schema.sql @@ -216,6 +216,7 @@ CREATE INDEX idx_inventory_positions_watch ON inventory_positions (warehouse_id, CREATE INDEX idx_purchase_orders_status_expected ON purchase_orders (status, expected_date); CREATE INDEX idx_purchase_order_lines_purchase_order ON purchase_order_lines (purchase_order_id, product_id); CREATE INDEX idx_orders_status_required ON orders (status, required_ship_date, priority_code); +CREATE INDEX idx_orders_customer_lookup ON orders (customer_id); CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date); CREATE INDEX idx_order_lines_product ON order_lines (product_id, order_id); CREATE INDEX idx_shipments_order_status ON shipments (order_id, status, shipped_date); diff --git a/src/CSharpDB.Admin/Components/Layout/NavMenu.razor b/src/CSharpDB.Admin/Components/Layout/NavMenu.razor index 6f4edd38..5ebae4d0 100644 --- a/src/CSharpDB.Admin/Components/Layout/NavMenu.razor +++ b/src/CSharpDB.Admin/Components/Layout/NavMenu.razor @@ -646,6 +646,9 @@ new SystemCatalogItem("sys.indexes", "indexes", "SELECT * FROM sys.indexes ORDER BY 
table_name, index_name, ordinal_position;"), new SystemCatalogItem("sys.table_stats", "table stats", "SELECT * FROM sys.table_stats ORDER BY table_name;"), new SystemCatalogItem("sys.column_stats", "column stats", "SELECT * FROM sys.column_stats ORDER BY table_name, ordinal_position;"), + new SystemCatalogItem("sys.planner_histograms", "planner histograms", "SELECT * FROM sys.planner_histograms ORDER BY table_name, ordinal_position, bucket_index;"), + new SystemCatalogItem("sys.planner_heavy_hitters", "planner heavy hitters", "SELECT * FROM sys.planner_heavy_hitters ORDER BY table_name, ordinal_position, row_count DESC;"), + new SystemCatalogItem("sys.planner_index_prefix_stats", "planner prefix stats", "SELECT * FROM sys.planner_index_prefix_stats ORDER BY table_name, index_name, prefix_length;"), new SystemCatalogItem("sys.views", "views", "SELECT * FROM sys.views ORDER BY view_name;"), new SystemCatalogItem("sys.triggers", "triggers", "SELECT * FROM sys.triggers ORDER BY trigger_name;"), new SystemCatalogItem("sys.objects", "all objects", "SELECT * FROM sys.objects ORDER BY object_type, object_name;"), diff --git a/src/CSharpDB.Admin/Components/Shared/DataGrid.razor b/src/CSharpDB.Admin/Components/Shared/DataGrid.razor index 5b39e3dc..2f65746b 100644 --- a/src/CSharpDB.Admin/Components/Shared/DataGrid.razor +++ b/src/CSharpDB.Admin/Components/Shared/DataGrid.razor @@ -9,180 +9,182 @@ @inject ModalService Modal @implements IAsyncDisposable -
- @if (_loading) - { -
-
-
- } - else if (_loadError is not null) - { -
- @_loadError -
- } - else if (Columns is null || Columns.Length == 0) - { -
- -

No data to display

-
- } - else - { - - - - - @for (int c = 0; c < Columns.Length; c++) - { - var colIdx = c; - var sortIndicator = _sortCol == colIdx ? (_sortAsc ? " \u2191" : " \u2193") : ""; - var isPrimaryKeyColumn = PrimaryKeyColumn is not null && Columns[colIdx] == PrimaryKeyColumn; - - } - - @if (ShowFilters) - { - - +
+
+ @if (_loading) + { +
+
+
+ } + else if (_loadError is not null) + { +
+ @_loadError +
+ } + else if (Columns is null || Columns.Length == 0) + { +
+ +

No data to display

+
+ } + else + { +
# - - @Columns[colIdx] - @if (ColumnTypes is not null && colIdx < ColumnTypes.Length) - { - @ColumnTypes[colIdx] - } - @if (isPrimaryKeyColumn) - { - PK - } - @sortIndicator -
+ + + @for (int c = 0; c < Columns.Length; c++) { var colIdx = c; - + var sortIndicator = _sortCol == colIdx ? (_sortAsc ? " \u2191" : " \u2193") : ""; + var isPrimaryKeyColumn = PrimaryKeyColumn is not null && Columns[colIdx] == PrimaryKeyColumn; + } - } - - - @if (_rows.Count == 0) - { - - - - } - else - { - @for (int r = 0; r < _rows.Count; r++) + @if (ShowFilters) { - var rowIdx = r; - var row = _rows[rowIdx]; - var rowClass = GetRowClass(row); - var isSelected = _selectedRows.Contains(row); - - + + @for (int c = 0; c < Columns.Length; c++) { var colIdx = c; - var isPk = PrimaryKeyColumn is not null && Columns[colIdx] == PrimaryKeyColumn; - var cellValue = row.CurrentValues[colIdx]; - var isEditing = _editingCell?.RowIndex == rowIdx && _editingCell?.ColIndex == colIdx; - - @if (isEditing) - { - - } - else - { - - } + } } - } - -
# -
- - -
-
+ + @Columns[colIdx] + @if (ColumnTypes is not null && colIdx < ColumnTypes.Length) + { + @ColumnTypes[colIdx] + } + @if (isPrimaryKeyColumn) + { + PK + } + @sortIndicator +
No rows found
- @((_page - 1) * _pageSize + rowIdx + 1) -
- - - @if (cellValue is null) - { - NULL - } - else if (cellValue is byte[] bytes) - { - [@bytes.Length bytes] - } - else if (IsStatusColumn(colIdx)) - { - @cellValue - } - else - { - @cellValue.ToString() - } - +
+ + +
+
- } -
+ + + @if (_rows.Count == 0) + { + + No rows found + + } + else + { + @for (int r = 0; r < _rows.Count; r++) + { + var rowIdx = r; + var row = _rows[rowIdx]; + var rowClass = GetRowClass(row); + var isSelected = _selectedRows.Contains(row); + + + @((_page - 1) * _pageSize + rowIdx + 1) + + @for (int c = 0; c < Columns.Length; c++) + { + var colIdx = c; + var isPk = PrimaryKeyColumn is not null && Columns[colIdx] == PrimaryKeyColumn; + var cellValue = row.CurrentValues[colIdx]; + var isEditing = _editingCell?.RowIndex == rowIdx && _editingCell?.ColIndex == colIdx; + + @if (isEditing) + { + + + + } + else + { + + @if (cellValue is null) + { + NULL + } + else if (cellValue is byte[] bytes) + { + [@bytes.Length bytes] + } + else if (IsStatusColumn(colIdx)) + { + @cellValue + } + else + { + @cellValue.ToString() + } + + } + } + + } + } + + + } + -@if (ShouldShowPaginationBar()) -{ -
-
- + @if (ShouldShowPaginationBar()) + { +
+
+ +
+
+ + + @GetPageInfoText() + + +
+
@GetRowCountText()
-
- - - @GetPageInfoText() - - -
-
@GetRowCountText()
-
-} + } - + +
@code { [Parameter] public string? TableName { get; set; } @@ -975,11 +977,10 @@ if (ViewName is not null) { string viewName = RequireIdentifier(ViewName); - if (!await TryLoadViewCursorPageAsync(viewName)) - await LoadUnknownTotalSourcePageAsync(viewName, captureColumnNames: true); + await DisposeViewCursorAsync(); + await LoadUnknownTotalSourcePageAsync(viewName, captureColumnNames: true); Columns ??= Array.Empty(); - ColumnTypes = null; PrimaryKeyColumn = null; return; } @@ -1020,6 +1021,8 @@ if (captureColumnNames && (Columns is null || Columns.Length == 0)) Columns = queryResult.ColumnNames ?? Array.Empty(); + if (captureColumnNames && (ColumnTypes is null || ColumnTypes.Length == 0)) + ColumnTypes = queryResult.ColumnTypes; var rows = (queryResult.Rows ?? []).ToList(); if (_page > 1 && rows.Count == 0) @@ -1119,7 +1122,7 @@ if (Columns is null || Columns.Length == 0) Columns = pageResult.ColumnNames ?? Array.Empty(); - ColumnTypes = null; + ColumnTypes = pageResult.ColumnTypes; PrimaryKeyColumn = null; var rows = (pageResult.Rows ?? []).ToList(); diff --git a/src/CSharpDB.Admin/Components/Tabs/QueryTab.razor b/src/CSharpDB.Admin/Components/Tabs/QueryTab.razor index d40ac96d..f7db3340 100644 --- a/src/CSharpDB.Admin/Components/Tabs/QueryTab.razor +++ b/src/CSharpDB.Admin/Components/Tabs/QueryTab.razor @@ -21,6 +21,12 @@ + @@ -119,7 +125,8 @@
Stats
- @if (_elapsed is not null) + @if (GetVisibleElapsed() is { } visibleElapsed) { - @_elapsed.Value.TotalMilliseconds.ToString("F1") ms + @visibleElapsed.TotalMilliseconds.ToString("F1") ms } @if (_loading) { @@ -255,9 +262,65 @@ } else { -
-

Plan

-

Planner output is not generated automatically from this panel. Use the stats shortcuts above for persisted table and column statistics before comparing plans.

+
+ @if (_loading) + { +
+
+
+ } + else if (_planError is not null) + { +
+ @_planError +
+ } + else if (_planColumns is not null && _planRows is not null) + { + + + + + @foreach (var col in _planColumns) + { + + } + + + + @for (int r = 0; r < _planRows.Count; r++) + { + + + @foreach (var val in _planRows[r]) + { + + } + + } + +
#@col
@(r + 1) + @if (val is null) + { + NULL + } + else if (val is byte[] bytes) + { + [@bytes.Length bytes] + } + else + { + @Convert.ToString(val, CultureInfo.InvariantCulture) + } +
+            }
+            else
+            {
+
+ +

Run Estimate to inspect the current query plan inputs

+
+ }
}
@@ -311,10 +374,24 @@
     private void OnDesignerCopySql(string sql)
     {
         _sqlText = sql;
+        ResetPlanState();
         if (Tab is not null)
             Tab.SqlText = sql;
         SetMode(QueryMode.Sql);
     }
+    private Task OnSqlTextChanged(string sql)
+    {
+        if (string.Equals(_sqlText, sql, StringComparison.Ordinal))
+            return Task.CompletedTask;
+
+        _sqlText = sql;
+        ResetPlanState();
+        if (Tab is not null)
+            Tab.SqlText = sql;
+
+        return Task.CompletedTask;
+    }
+
     private static readonly Regex s_execRegex = new(
         @"^\s*EXEC(?:UTE)?\s+(?[A-Za-z_][A-Za-z0-9_]*)(?[\s\S]*)$",
         RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);
@@ -332,6 +409,9 @@
         "sys.indexes",
         "sys.table_stats",
         "sys.column_stats",
+        "sys.planner_histograms",
+        "sys.planner_heavy_hitters",
+        "sys.planner_index_prefix_stats",
         "sys.views",
         "sys.triggers",
         "sys.objects",
@@ -356,6 +436,10 @@
     private string? _activeQuerySql;
     private int _queryReloadVersion;
     private QueryResultsStatus _queryResultsStatus = new();
+    private string[]? _planColumns;
+    private List? _planRows;
+    private string? _planError;
+    private TimeSpan? _planElapsed;
     private int _editorHeightPx = DefaultEditorHeightPx;
     private bool _editorHeightInitialized;
     private ElementReference _queryTabRef;
@@ -484,6 +568,7 @@
     private void FormatSql()
     {
         _sqlText = SqlFormatter.Format(_sqlText);
+        ResetPlanState();
     }
 
     private async Task RunAnalyzeAsync()
@@ -492,6 +577,52 @@
         await RunQuery();
     }
 
+    private async Task RunPlanEstimateAsync()
+    {
+        if (string.IsNullOrWhiteSpace(_sqlText))
+        {
+            Toast.Warning("Enter a SELECT query before estimating a plan.");
+            return;
+        }
+
+        _loading = true;
+        _resultView = QueryResultView.Plan;
+        ResetPlanState();
+        StateHasChanged();
+
+        try
+        {
+            string explainSql = BuildExplainEstimateSql(_sqlText);
+
+            var result = await DbClient.ExecuteSqlAsync(explainSql);
+            _planElapsed = result.Elapsed;
+
+            if (result.Error is not null)
+            {
+                _planError = result.Error;
+            }
+            else if (result.IsQuery)
+            {
+                _planColumns = result.ColumnNames;
+                _planRows = result.Rows;
+            }
+            else
+            {
+                _planError = "Planner estimate did not return a diagnostic rowset.";
+            }
+        }
+        catch (Exception ex)
+        {
+            _planError = ex.Message;
+        }
+        finally
+        {
+            _loading = false;
+            if (Tab is not null) Tab.SqlText = _sqlText;
+            StateHasChanged();
+        }
+    }
+
     private void LoadTableStatsQuery() => LoadSql(TableStatsSql);
     private void LoadColumnStatsQuery() => LoadSql(ColumnStatsSql);
@@ -721,6 +852,12 @@
         return _resultRows is { Count: > 0 } rows ? rows.Count.ToString(CultureInfo.InvariantCulture) : string.Empty;
     }
 
+    private string GetPlanBadge()
+        => _planRows is { Count: > 0 } rows ? rows.Count.ToString(CultureInfo.InvariantCulture) : string.Empty;
+
+    private TimeSpan? GetVisibleElapsed()
+        => _resultView == QueryResultView.Plan ? _planElapsed : _elapsed;
+
     private string GetMessageText()
     {
         if (_error is not null)
@@ -790,10 +927,19 @@
         _nonQueryMessage = null;
         _elapsed = null;
         _queryResultsStatus = new QueryResultsStatus();
+        ResetPlanState();
         if (resetActiveQuery)
             _activeQuerySql = null;
     }
 
+    private void ResetPlanState()
+    {
+        _planColumns = null;
+        _planRows = null;
+        _planError = null;
+        _planElapsed = null;
+    }
+
     private async Task StartEditorResizeAsync(MouseEventArgs e)
     {
         if (_dotNetRef is null)
@@ -905,6 +1051,40 @@
         }
     }
 
+    private static string BuildExplainEstimateSql(string sql)
+    {
+        string trimmedSql = TrimTrailingSemicolons(sql.Trim());
+        if (trimmedSql.Length == 0)
+            throw new InvalidOperationException("Plan/Estimate requires a SELECT, WITH, or compound SELECT query.");
+
+        Statement statement;
+        try
+        {
+            statement = Parser.Parse(trimmedSql);
+        }
+        catch (Exception ex)
+        {
+            throw new InvalidOperationException(
+                $"Plan/Estimate requires a single SELECT, WITH, or compound SELECT query: {ex.Message}",
+                ex);
+        }
+
+        return statement switch
+        {
+            ExplainEstimateStatement => trimmedSql,
+            QueryStatement or WithStatement => $"EXPLAIN ESTIMATE FOR {trimmedSql}",
+            _ => throw new InvalidOperationException("Plan/Estimate supports SELECT, WITH, and compound SELECT queries only."),
+        };
+    }
+
+    private static string TrimTrailingSemicolons(string sql)
+    {
+        while (sql.EndsWith(';'))
+            sql = sql[..^1].TrimEnd();
+
+        return sql;
+    }
+
     private static bool TryParseExecuteCommand(
         string sql,
         out string? procedureName,
diff --git a/src/CSharpDB.Admin/wwwroot/css/app.css b/src/CSharpDB.Admin/wwwroot/css/app.css
index 5075c46e..f25bf99e 100644
--- a/src/CSharpDB.Admin/wwwroot/css/app.css
+++ b/src/CSharpDB.Admin/wwwroot/css/app.css
@@ -778,10 +778,21 @@ body {
 .grid-container {
     flex: 1;
+    min-width: 0;
+    min-height: 0;
     overflow: auto;
     position: relative;
 }
 
+.data-grid-shell {
+    display: flex;
+    flex: 1;
+    flex-direction: column;
+    min-width: 0;
+    min-height: 0;
+    overflow: hidden;
+}
+
 .data-grid {
     width: 100%;
     border-collapse: collapse;
@@ -1537,7 +1548,8 @@ body {
     overflow: hidden;
 }
 
-.query-results-panel > .grid-container {
+.query-results-panel > .grid-container,
+.query-results-panel > .data-grid-shell {
     flex: 1;
     min-height: 0;
 }
@@ -1670,6 +1682,7 @@ body {
     display: flex;
     flex-direction: column;
     height: 100%;
+    min-height: 0;
 }
 
 .storage-tab {
@@ -4738,16 +4751,24 @@ body {
 }
 
 .query-result-body {
+    display: flex;
+    flex-direction: column;
     flex: 1;
     min-height: 0;
-    overflow: auto;
+    overflow: hidden;
 }
 
-.query-result-body > .grid-container {
-    height: 100%;
+.query-result-body > .grid-container,
+.query-result-body > .data-grid-shell {
+    flex: 1;
+    min-width: 0;
+    min-height: 0;
 }
 
 .query-detail-pane {
+    flex: 1;
+    min-height: 0;
+    overflow: auto;
     padding: 18px;
     color: var(--text-secondary);
 }
@@ -4769,7 +4790,10 @@ body {
 .query-stat-grid {
     display: grid;
     grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
+    flex: 1;
     gap: 10px;
+    min-height: 0;
+    overflow: auto;
     padding: 14px;
 }
diff --git a/src/CSharpDB.Api/CSharpDB.Api.csproj b/src/CSharpDB.Api/CSharpDB.Api.csproj
index 708769ca..054d9e35 100644
--- a/src/CSharpDB.Api/CSharpDB.Api.csproj
+++ b/src/CSharpDB.Api/CSharpDB.Api.csproj
@@ -10,7 +10,7 @@
-
+
diff --git a/src/CSharpDB.Api/Dtos/Responses.cs b/src/CSharpDB.Api/Dtos/Responses.cs
index 2d6ce584..5920e983 100644
--- a/src/CSharpDB.Api/Dtos/Responses.cs
+++ b/src/CSharpDB.Api/Dtos/Responses.cs
@@ -49,6 +49,7 @@ public sealed record TriggerResponse(string TriggerName, string TableName, strin
 public sealed record SqlResultResponse(
     bool IsQuery,
     string[]? ColumnNames,
+    string[]? ColumnTypes,
     IReadOnlyList>? Rows,
     int RowsAffected,
     string? Error,
diff --git a/src/CSharpDB.Api/Endpoints/SqlEndpoints.cs b/src/CSharpDB.Api/Endpoints/SqlEndpoints.cs
index 7f405773..afea5022 100644
--- a/src/CSharpDB.Api/Endpoints/SqlEndpoints.cs
+++ b/src/CSharpDB.Api/Endpoints/SqlEndpoints.cs
@@ -32,6 +32,7 @@ internal static SqlResultResponse ToResponse(SqlExecutionResult result)
         return new SqlResultResponse(
             result.IsQuery,
             result.ColumnNames,
+            result.ColumnTypes,
             namedRows,
             result.RowsAffected,
             result.Error,
diff --git a/src/CSharpDB.Client/Internal/EngineTransportClient.SqlAndDiagnostics.cs b/src/CSharpDB.Client/Internal/EngineTransportClient.SqlAndDiagnostics.cs
index dd64de28..09bbac87 100644
--- a/src/CSharpDB.Client/Internal/EngineTransportClient.SqlAndDiagnostics.cs
+++ b/src/CSharpDB.Client/Internal/EngineTransportClient.SqlAndDiagnostics.cs
@@ -205,6 +205,7 @@ private async Task ExecuteSqlCoreAsync(string sql, Cancellat
             {
                 IsQuery = true,
                 ColumnNames = lastResult.ColumnNames,
+                ColumnTypes = lastResult.ColumnTypes,
                 Rows = lastResult.Rows,
                 RowsAffected = lastResult.RowsAffected,
                 Elapsed = stopwatch.Elapsed,
diff --git a/src/CSharpDB.Client/Internal/EngineTransportClient.cs b/src/CSharpDB.Client/Internal/EngineTransportClient.cs
index 213a90e2..dfd044bf 100644
--- a/src/CSharpDB.Client/Internal/EngineTransportClient.cs
+++ b/src/CSharpDB.Client/Internal/EngineTransportClient.cs
@@ -850,6 +850,7 @@ private static async Task ExecuteQueryAsync(Database db, str
         {
             IsQuery = result.IsQuery,
             ColumnNames = result.IsQuery ? result.Schema.Select(column => column.Name).ToArray() : null,
+            ColumnTypes = result.IsQuery ? result.Schema.Select(column => column.Type.ToString().ToUpperInvariant()).ToArray() : null,
             Rows = result.IsQuery ? rows.Select(ToObjects).ToList() : null,
             RowsAffected = result.IsQuery ? rows.Count : result.RowsAffected,
         };
diff --git a/src/CSharpDB.Client/Internal/HttpTransportClient.cs b/src/CSharpDB.Client/Internal/HttpTransportClient.cs
index 47d7cf20..65c108bd 100644
--- a/src/CSharpDB.Client/Internal/HttpTransportClient.cs
+++ b/src/CSharpDB.Client/Internal/HttpTransportClient.cs
@@ -912,6 +912,7 @@ private static SqlExecutionResult MapSqlResult(ApiSqlResultResponse payload)
         {
             IsQuery = payload.IsQuery,
             ColumnNames = payload.ColumnNames,
+            ColumnTypes = payload.ColumnTypes,
             Rows = payload.ColumnNames is null ? null : MapRows(payload.ColumnNames, payload.Rows),
             RowsAffected = payload.RowsAffected,
             Error = payload.Error,
@@ -1018,7 +1019,7 @@ private sealed record ApiCollectionCountResponse(string CollectionName, int Coun
     private sealed record ApiIndexResponse(string IndexName, string TableName, IReadOnlyList Columns, bool IsUnique, IReadOnlyList ColumnCollations);
     private sealed record ApiViewResponse(string ViewName, string Sql);
     private sealed record ApiTriggerResponse(string TriggerName, string TableName, string Timing, string Event, string BodySql);
-    private sealed record ApiSqlResultResponse(bool IsQuery, string[]? ColumnNames, IReadOnlyList>? Rows, int RowsAffected, string? Error, double ElapsedMs);
+    private sealed record ApiSqlResultResponse(bool IsQuery, string[]? ColumnNames, string[]? ColumnTypes, IReadOnlyList>? Rows, int RowsAffected, string? Error, double ElapsedMs);
     private sealed record ApiProcedureDetailResponse(string Name, string BodySql, IReadOnlyList Parameters, string? Description, bool IsEnabled, DateTime CreatedUtc, DateTime UpdatedUtc);
     private sealed record ApiProcedureParameterResponse(string Name, string Type, bool Required, object? Default, string? Description);
     private sealed record ApiProcedureExecutionResponse(string ProcedureName, bool Succeeded, IReadOnlyList Statements, string? Error, int? FailedStatementIndex, double ElapsedMs);
diff --git a/src/CSharpDB.Client/Models/DataModels.cs b/src/CSharpDB.Client/Models/DataModels.cs
index c403b789..c30d20e7 100644
--- a/src/CSharpDB.Client/Models/DataModels.cs
+++ b/src/CSharpDB.Client/Models/DataModels.cs
@@ -26,6 +26,7 @@ public sealed class SqlExecutionResult
 {
     public bool IsQuery { get; init; }
     public string[]? ColumnNames { get; init; }
+    public string[]? ColumnTypes { get; init; }
     public List? Rows { get; init; }
     public int RowsAffected { get; init; }
     public string? Error { get; init; }
diff --git a/src/CSharpDB.Data/CSharpDbConnection.cs b/src/CSharpDB.Data/CSharpDbConnection.cs
index ccaf69bc..4b777323 100644
--- a/src/CSharpDB.Data/CSharpDbConnection.cs
+++ b/src/CSharpDB.Data/CSharpDbConnection.cs
@@ -317,6 +317,12 @@ private async ValueTask OpenConfiguredSessionAsync(
         if (!string.IsNullOrWhiteSpace(builder.Endpoint))
         {
+            if (builder.AdaptiveQueryReoptimization)
+            {
+                throw new InvalidOperationException(
+                    "Adaptive Query Reoptimization is only supported for direct embedded connections; enable it on the remote host instead.");
+            }
+
             if (hasEmbeddedTuning)
             {
                 throw new InvalidOperationException(
@@ -484,6 +490,7 @@ private static PoolKey CreatePoolKey(
             maxPoolSize,
             configuration.EffectiveOpenMode,
             configuration.EffectiveStoragePreset,
+            configuration.EffectiveAdaptiveQueryReoptimization,
             configuration.ExplicitDirectDatabaseOptions,
             configuration.ExplicitHybridDatabaseOptions);
     }
diff --git a/src/CSharpDB.Data/CSharpDbConnectionPool.cs b/src/CSharpDB.Data/CSharpDbConnectionPool.cs
index 89d15531..bc40f89f 100644
--- a/src/CSharpDB.Data/CSharpDbConnectionPool.cs
+++ b/src/CSharpDB.Data/CSharpDbConnectionPool.cs
@@ -66,6 +66,7 @@ internal readonly record struct PoolKey(
     int MaxPoolSize,
     CSharpDbEmbeddedOpenMode EffectiveOpenMode,
     CSharpDbStoragePreset? EffectiveStoragePreset,
+    bool EffectiveAdaptiveQueryReoptimization,
     object? ExplicitDirectDatabaseOptions,
     object? ExplicitHybridDatabaseOptions);
diff --git a/src/CSharpDB.Data/CSharpDbConnectionStringBuilder.cs b/src/CSharpDB.Data/CSharpDbConnectionStringBuilder.cs
index 95ee7e99..901cd4c9 100644
--- a/src/CSharpDB.Data/CSharpDbConnectionStringBuilder.cs
+++ b/src/CSharpDB.Data/CSharpDbConnectionStringBuilder.cs
@@ -13,6 +13,7 @@ public sealed class CSharpDbConnectionStringBuilder : DbConnectionStringBuilder
     private const string MaxPoolSizeKey = "Max Pool Size";
     private const string StoragePresetKey = "Storage Preset";
     private const string EmbeddedOpenModeKey = "Embedded Open Mode";
+    private const string AdaptiveQueryReoptimizationKey = "Adaptive Query Reoptimization";
 
     internal const bool DefaultPooling = false;
     internal const int DefaultMaxPoolSize = 32;
@@ -70,6 +71,12 @@ public CSharpDbEmbeddedOpenMode? EmbeddedOpenMode
         set => SetNullableEnum(EmbeddedOpenModeKey, value);
     }
 
+    public bool AdaptiveQueryReoptimization
+    {
+        get => GetBoolean(AdaptiveQueryReoptimizationKey, defaultValue: false);
+        set => this[AdaptiveQueryReoptimizationKey] = value;
+    }
+
     public CSharpDbConnectionStringBuilder() { }
 
     public CSharpDbConnectionStringBuilder(string connectionString)
diff --git a/src/CSharpDB.Data/CSharpDbEmbeddedConfigurationResolver.cs b/src/CSharpDB.Data/CSharpDbEmbeddedConfigurationResolver.cs
index ea6e62df..4cfdeae1 100644
--- a/src/CSharpDB.Data/CSharpDbEmbeddedConfigurationResolver.cs
+++ b/src/CSharpDB.Data/CSharpDbEmbeddedConfigurationResolver.cs
@@ -14,7 +14,8 @@ internal static bool HasRequestedTuning(
         return directDatabaseOptions is not null
             || hybridDatabaseOptions is not null
             || builder.StoragePreset is not null
-            || builder.EmbeddedOpenMode is not null;
+            || builder.EmbeddedOpenMode is not null
+            || builder.AdaptiveQueryReoptimization;
     }
 
     internal static ResolvedEmbeddedConfiguration Resolve(
@@ -30,7 +31,7 @@ internal static ResolvedEmbeddedConfiguration Resolve(
         CSharpDbEmbeddedOpenMode? requestedOpenMode = embeddedOpenModeOverride ?? builder.EmbeddedOpenMode;
 
         DatabaseOptions effectiveDirectDatabaseOptions = directDatabaseOptions
-            ?? CreateDirectDatabaseOptions(requestedStoragePreset);
+            ?? CreateDirectDatabaseOptions(requestedStoragePreset, builder.AdaptiveQueryReoptimization);
         HybridDatabaseOptions? effectiveHybridDatabaseOptions = hybridDatabaseOptions
             ?? CreateHybridDatabaseOptions(requestedOpenMode);
@@ -47,7 +48,9 @@ effectiveHybridDatabaseOptions is null
             directDatabaseOptions is not null
             || hybridDatabaseOptions is not null
             || requestedStoragePreset is not null
-            || requestedOpenMode is not null);
+            || requestedOpenMode is not null
+            || builder.AdaptiveQueryReoptimization,
+            builder.AdaptiveQueryReoptimization && directDatabaseOptions is null);
     }
 
     internal static CSharpDbEmbeddedOpenMode GetEffectiveOpenMode(HybridDatabaseOptions hybridDatabaseOptions)
@@ -63,12 +66,18 @@ internal static CSharpDbEmbeddedOpenMode GetEffectiveOpenMode(HybridDatabaseOpti
         };
     }
 
-    private static DatabaseOptions CreateDirectDatabaseOptions(CSharpDbStoragePreset? storagePreset)
+    private static DatabaseOptions CreateDirectDatabaseOptions(
+        CSharpDbStoragePreset? storagePreset,
+        bool adaptiveQueryReoptimization)
     {
+        DatabaseOptions options = adaptiveQueryReoptimization
+            ? new DatabaseOptions().EnableAdaptiveQueryReoptimization()
+            : new DatabaseOptions();
+
         if (storagePreset is null)
-            return new DatabaseOptions();
+            return options;
 
-        return new DatabaseOptions().ConfigureStorageEngine(builder =>
+        return options.ConfigureStorageEngine(builder =>
         {
             switch (storagePreset.Value)
             {
@@ -118,4 +127,5 @@ internal readonly record struct ResolvedEmbeddedConfiguration(
     CSharpDbStoragePreset? EffectiveStoragePreset,
     DatabaseOptions? ExplicitDirectDatabaseOptions,
     HybridDatabaseOptions? ExplicitHybridDatabaseOptions,
-    bool HasRequestedTuning);
+    bool HasRequestedTuning,
+    bool EffectiveAdaptiveQueryReoptimization);
diff --git a/src/CSharpDB.Data/PreparedStatementTemplate.cs b/src/CSharpDB.Data/PreparedStatementTemplate.cs
index feec0362..d693dd81 100644
--- a/src/CSharpDB.Data/PreparedStatementTemplate.cs
+++ b/src/CSharpDB.Data/PreparedStatementTemplate.cs
@@ -229,6 +229,23 @@ private static Statement BindStatement(Statement statement, CSharpDbParameterCol
                 };
             }
 
+            case ExplainEstimateStatement explain:
+            {
+                Statement target = explain.Target switch
+                {
+                    QueryStatement query => BindQueryStatement(query, parameters, out bool queryChanged) is { } boundQuery && queryChanged
+                        ? boundQuery
+                        : query,
+                    WithStatement with => BindStatement(with, parameters),
+                    _ => explain.Target,
+                };
+
+                if (ReferenceEquals(target, explain.Target))
+                    return explain;
+
+                return new ExplainEstimateStatement { Target = target };
+            }
+
             case CreateTriggerStatement trigger:
             {
                 Expression? whenCondition = BindOptionalExpression(trigger.WhenCondition, parameters, out bool whenChanged);
diff --git a/src/CSharpDB.Data/SharedMemoryDatabaseHost.cs b/src/CSharpDB.Data/SharedMemoryDatabaseHost.cs
index ede4670c..79d1c7d4 100644
--- a/src/CSharpDB.Data/SharedMemoryDatabaseHost.cs
+++ b/src/CSharpDB.Data/SharedMemoryDatabaseHost.cs
@@ -524,7 +524,7 @@ private void ThrowIfBusyForIntrospection(long sessionId)
     }
 
     private static bool IsReadOnly(Statement statement)
-        => statement is QueryStatement or WithStatement;
+        => statement is QueryStatement or WithStatement or ExplainEstimateStatement;
 
     private static async ValueTask DetachQueryResultAsync(QueryResult query, CancellationToken cancellationToken)
     {
diff --git a/src/CSharpDB.Engine/Database.cs b/src/CSharpDB.Engine/Database.cs
index e357faca..b876b727 100644
--- a/src/CSharpDB.Engine/Database.cs
+++ b/src/CSharpDB.Engine/Database.cs
@@ -14,6 +14,21 @@ namespace CSharpDB.Engine;
+internal readonly record struct RowIdReservationDiagnosticsSnapshot(
+    long ReservationCount,
+    long ReservedRowIdCount);
+
+internal readonly record struct AdaptiveQueryReoptimizationDiagnosticsSnapshot(
+    long EligibleQueryCount,
+    long AttemptCount,
+    long SuccessfulSwitchCount,
+    long RejectedSwitchCount,
+    long DivergenceEventCount,
+    long BufferedRowCount,
+    long MaxBufferedFallbackCount,
+    long UnsupportedFallbackCount,
+    long ReoptimizationLimitFallbackCount);
+
 ///
 /// Top-level entry point for the CSharpDB embedded database engine.
 ///
@@ -45,6 +60,8 @@ public sealed class Database : IAsyncDisposable
     private readonly object _sharedNextRowIdGate = new();
     private readonly SemaphoreSlim _writeOperationGate = new(1, 1);
     private readonly SemaphoreSlim _sharedStateGate = new(1, 1);
+    private long _rowIdReservationCount;
+    private long _rowIdReservedRowCount;
     private long _observedSchemaVersion;
     private ImplicitInsertExecutionMode _implicitInsertExecutionMode;
     private bool _inTransaction;
@@ -82,6 +99,35 @@ internal CommitPathDiagnosticsSnapshot GetCommitPathDiagnosticsSnapshot() =>
     internal void ResetCommitPathDiagnostics() =>
         _pager.ResetCommitPathDiagnostics();
 
+    internal RowIdReservationDiagnosticsSnapshot GetRowIdReservationDiagnosticsSnapshot() =>
+        new(
+            Interlocked.Read(ref _rowIdReservationCount),
+            Interlocked.Read(ref _rowIdReservedRowCount));
+
+    internal void ResetRowIdReservationDiagnostics()
+    {
+        Interlocked.Exchange(ref _rowIdReservationCount, 0);
+        Interlocked.Exchange(ref _rowIdReservedRowCount, 0);
+    }
+
+    internal AdaptiveQueryReoptimizationDiagnosticsSnapshot GetAdaptiveQueryReoptimizationDiagnosticsSnapshot()
+    {
+        var snapshot = _planner.GetAdaptiveQueryReoptimizationDiagnosticsSnapshot();
+        return new AdaptiveQueryReoptimizationDiagnosticsSnapshot(
+            snapshot.EligibleQueryCount,
+            snapshot.AttemptCount,
+            snapshot.SuccessfulSwitchCount,
+            snapshot.RejectedSwitchCount,
+            snapshot.DivergenceEventCount,
+            snapshot.BufferedRowCount,
+            snapshot.MaxBufferedFallbackCount,
+            snapshot.UnsupportedFallbackCount,
+            snapshot.ReoptimizationLimitFallbackCount);
+    }
+
+    internal void ResetAdaptiveQueryReoptimizationDiagnostics() =>
+        _planner.ResetAdaptiveQueryReoptimizationDiagnostics();
+
     private Database(
         Pager pager,
         SchemaCatalog catalog,
@@ -91,6 +137,7 @@ private Database(
         ICatalogStore catalogStore,
         AdvisoryStatisticsPersistenceMode advisoryStatisticsPersistenceMode,
         ImplicitInsertExecutionMode implicitInsertExecutionMode = ImplicitInsertExecutionMode.Serialized,
+        AdaptiveQueryReoptimizationOptions? adaptiveQueryReoptimization = null,
         DbFunctionRegistry? functions = null,
         HybridDatabasePersistenceCoordinator? hybridPersistenceCoordinator = null)
     {
@@ -108,10 +155,14 @@ private Database(
             pager,
             catalog,
             _recordSerializer,
+            tableRowCountProvider: null,
             nextRowIdHintProvider: TryGetSharedNextRowIdHint,
-            nextRowIdReservationProvider: ReserveSharedNextRowId,
+            nextRowIdReservationProvider: null,
+            nextRowIdRangeReservationProvider: ReserveSharedNextRowIdRange,
             nextRowIdObservationProvider: ObserveSharedNextRowId,
-            functions: _functions);
+            useTransientNextRowIdHints: false,
+            functions: _functions,
+            adaptiveQueryReoptimization: adaptiveQueryReoptimization);
         _statementCache = new StatementCache(DefaultStatementCacheCapacity);
         _observedSchemaVersion = catalog.SchemaVersion;
         RefreshSharedNextRowIdHintsFromCatalog();
@@ -140,11 +191,14 @@ public async ValueTask BeginWriteTransactionAsync(Cancellation
             _pager,
             transactionCatalog,
             _recordSerializer,
+            tableRowCountProvider: null,
             nextRowIdHintProvider: TryGetSharedNextRowIdHint,
-            nextRowIdReservationProvider: ReserveSharedNextRowId,
+            nextRowIdReservationProvider: null,
+            nextRowIdRangeReservationProvider: ReserveSharedNextRowIdRange,
             nextRowIdObservationProvider: ObserveSharedNextRowId,
             useTransientNextRowIdHints: true,
-            functions: _functions)
+            functions: _functions,
+            adaptiveQueryReoptimization: _planner.AdaptiveQueryReoptimization)
         {
             PreferSyncPointLookups = PreferSyncPointLookups,
         };
@@ -221,8 +275,17 @@ private void ApplyCommittedNextRowIdHints(IReadOnlyCollection 0 ? minimumNextRowId : 1;
         lock (_sharedNextRowIdGate)
@@ -231,8 +294,11 @@ private long ReserveSharedNextRowId(string tableName, long minimumNextRowId)
                 ? Math.Max(existing, normalizedMinimum)
                 : normalizedMinimum;
 
-            _sharedNextRowIdHints[tableName] = checked(currentNextRowId + 1);
-            return currentNextRowId;
+            long endExclusive = checked(currentNextRowId + reservationCount);
+            _sharedNextRowIdHints[tableName] = endExclusive;
+            Interlocked.Increment(ref _rowIdReservationCount);
+            Interlocked.Add(ref _rowIdReservedRowCount, reservationCount);
+            return (currentNextRowId, endExclusive);
         }
     }
@@ -468,6 +534,7 @@ public static async ValueTask OpenInMemoryAsync(
             context.CatalogStore,
             context.AdvisoryStatisticsPersistenceMode,
             options.ImplicitInsertExecutionMode,
+            options.AdaptiveQueryReoptimization,
             options.Functions);
     }
@@ -531,6 +598,7 @@ public static async ValueTask OpenHybridAsync(
                 snapshotContext.CatalogStore,
                 snapshotContext.AdvisoryStatisticsPersistenceMode,
                 options.ImplicitInsertExecutionMode,
+                options.AdaptiveQueryReoptimization,
                 options.Functions,
                 new HybridDatabasePersistenceCoordinator(fullPath, hybridOptions.PersistenceTriggers));
             return snapshotDatabase;
@@ -546,6 +614,7 @@ public static async ValueTask OpenHybridAsync(
             context.CatalogStore,
             context.AdvisoryStatisticsPersistenceMode,
             options.ImplicitInsertExecutionMode,
+            options.AdaptiveQueryReoptimization,
             options.Functions);
         try
         {
@@ -605,6 +674,7 @@ public static async ValueTask LoadIntoMemoryAsync(
             context.CatalogStore,
             context.AdvisoryStatisticsPersistenceMode,
             options.ImplicitInsertExecutionMode,
+            options.AdaptiveQueryReoptimization,
             options.Functions);
     }
@@ -628,6 +698,7 @@ public static async ValueTask OpenAsync(
             context.CatalogStore,
             context.AdvisoryStatisticsPersistenceMode,
             options.ImplicitInsertExecutionMode,
+            options.AdaptiveQueryReoptimization,
             options.Functions);
     }
@@ -1025,6 +1096,7 @@ public ReaderSession CreateReaderSession()
             snapshot,
             _statementCache,
             _functions,
+            _planner.AdaptiveQueryReoptimization,
             snapshotRowCounts);
     }
@@ -1930,6 +2002,7 @@ public sealed class ReaderSession : IDisposable
     private readonly StatementCache _statementCache;
     private readonly WalSnapshot _snapshot;
     private readonly IReadOnlyDictionary _snapshotRowCounts;
+    private readonly AdaptiveQueryReoptimizationOptions _adaptiveQueryReoptimization;
     private Pager? _snapshotPager;
     private QueryPlanner? _planner;
     private string? _lastSql;
@@ -1944,6 +2017,7 @@ internal ReaderSession(
         WalSnapshot snapshot,
         StatementCache statementCache,
         DbFunctionRegistry functions,
+        AdaptiveQueryReoptimizationOptions adaptiveQueryReoptimization,
         IReadOnlyDictionary snapshotRowCounts)
     {
         _pager = pager;
@@ -1956,6 +2030,7 @@ internal ReaderSession(
         _releaseActiveQueryCallback = ReleaseActiveQueryAsync;
         _statementCache = statementCache;
         _snapshot = snapshot;
+        _adaptiveQueryReoptimization = adaptiveQueryReoptimization;
         _snapshotRowCounts = snapshotRowCounts;
     }
@@ -2027,7 +2102,12 @@ public ValueTask ExecuteReadAsync(Statement stmt, CancellationToken
             }
         }
 
-        _planner ??= new QueryPlanner(GetOrCreateSnapshotPager(), _catalog, _recordSerializer, functions: _functions);
+        _planner ??= new QueryPlanner(
+            GetOrCreateSnapshotPager(),
+            _catalog,
+            _recordSerializer,
+            functions: _functions,
+            adaptiveQueryReoptimization: _adaptiveQueryReoptimization);
         ValueTask plannerTask = _planner.ExecuteAsync(stmt, ct);
         if (plannerTask.IsCompletedSuccessfully)
         {
@@ -2075,7 +2155,12 @@ private async ValueTask CompleteReadWithPrimaryKeyFastPathAsync(
             return fastLookupResult;
         }
 
-        _planner ??= new QueryPlanner(GetOrCreateSnapshotPager(), _catalog, _recordSerializer, functions: _functions);
+        _planner ??= new QueryPlanner(
+            GetOrCreateSnapshotPager(),
+            _catalog,
+            _recordSerializer,
+            functions: _functions,
+            adaptiveQueryReoptimization: _adaptiveQueryReoptimization);
         QueryResult plannerResult = await _planner.ExecuteAsync(stmt, ct);
         plannerResult.SetDisposeCallback(_releaseActiveQueryCallback);
         return plannerResult;
@@ -2367,7 +2452,13 @@ private static bool IsSystemCatalogTable(string tableName) =>
         string.Equals(tableName, "sys.table_stats", StringComparison.OrdinalIgnoreCase) ||
         string.Equals(tableName, "sys_table_stats", StringComparison.OrdinalIgnoreCase) ||
         string.Equals(tableName, "sys.column_stats", StringComparison.OrdinalIgnoreCase) ||
-        string.Equals(tableName, "sys_column_stats", StringComparison.OrdinalIgnoreCase);
+        string.Equals(tableName, "sys_column_stats", StringComparison.OrdinalIgnoreCase) ||
+        string.Equals(tableName, "sys.planner_histograms", StringComparison.OrdinalIgnoreCase) ||
+        string.Equals(tableName, "sys_planner_histograms", StringComparison.OrdinalIgnoreCase) ||
+        string.Equals(tableName, "sys.planner_heavy_hitters", StringComparison.OrdinalIgnoreCase) ||
+        string.Equals(tableName, "sys_planner_heavy_hitters", StringComparison.OrdinalIgnoreCase) ||
+        string.Equals(tableName, "sys.planner_index_prefix_stats", StringComparison.OrdinalIgnoreCase) ||
+        string.Equals(tableName, "sys_planner_index_prefix_stats", StringComparison.OrdinalIgnoreCase);
 }
 
 ///
diff --git a/src/CSharpDB.Engine/DatabaseOptions.cs b/src/CSharpDB.Engine/DatabaseOptions.cs
index 36304f59..cdaa4258 100644
--- a/src/CSharpDB.Engine/DatabaseOptions.cs
+++ b/src/CSharpDB.Engine/DatabaseOptions.cs
@@ -18,6 +18,11 @@ public sealed class DatabaseOptions
     ///
     public ImplicitInsertExecutionMode ImplicitInsertExecutionMode { get; init; } = ImplicitInsertExecutionMode.Serialized;
 
+    ///
+    /// Opt-in adaptive join re-optimization for SELECT queries. Disabled by default.
+    ///
+    public AdaptiveQueryReoptimizationOptions AdaptiveQueryReoptimization { get; init; } = new();
+
     ///
     /// Trusted in-process scalar functions available to SQL and embedded expression surfaces.
     ///
diff --git a/src/CSharpDB.Engine/DatabaseOptionsExtensions.cs b/src/CSharpDB.Engine/DatabaseOptionsExtensions.cs
index 11988a5e..f57b7ade 100644
--- a/src/CSharpDB.Engine/DatabaseOptionsExtensions.cs
+++ b/src/CSharpDB.Engine/DatabaseOptionsExtensions.cs
@@ -17,6 +17,7 @@ public static DatabaseOptions ConfigureStorageEngine(
         return new DatabaseOptions
         {
+            AdaptiveQueryReoptimization = options.AdaptiveQueryReoptimization,
             Functions = options.Functions,
             ImplicitInsertExecutionMode = options.ImplicitInsertExecutionMode,
             StorageEngineFactory = options.StorageEngineFactory,
@@ -36,10 +37,87 @@ public static DatabaseOptions ConfigureFunctions(
         return new DatabaseOptions
         {
+            AdaptiveQueryReoptimization = options.AdaptiveQueryReoptimization,
             Functions = DbFunctionRegistry.Create(configure),
             ImplicitInsertExecutionMode = options.ImplicitInsertExecutionMode,
             StorageEngineFactory = options.StorageEngineFactory,
             StorageEngineOptions = options.StorageEngineOptions,
         };
     }
+
+    ///
+    /// Enables opt-in adaptive join re-optimization and returns a new DatabaseOptions instance.
+    ///
+    public static DatabaseOptions EnableAdaptiveQueryReoptimization(
+        this DatabaseOptions options,
+        Action? configure = null)
+    {
+        ArgumentNullException.ThrowIfNull(options);
+
+        var builder = new AdaptiveQueryReoptimizationOptionsBuilder();
+        configure?.Invoke(builder);
+
+        return new DatabaseOptions
+        {
+            AdaptiveQueryReoptimization = builder.Build(enabled: true),
+            Functions = options.Functions,
+            ImplicitInsertExecutionMode = options.ImplicitInsertExecutionMode,
+            StorageEngineFactory = options.StorageEngineFactory,
+            StorageEngineOptions = options.StorageEngineOptions,
+        };
+    }
+}
+
+public sealed class AdaptiveQueryReoptimizationOptionsBuilder
+{
+    private int _divergenceFactor = 8;
+    private int _minimumObservedRows = 4096;
+    private int _maxBufferedRows = 65536;
+    private int _maxReoptimizationsPerQuery = 1;
+
+    public AdaptiveQueryReoptimizationOptionsBuilder WithDivergenceFactor(int value)
+    {
+        if (value < 2)
+            throw new ArgumentOutOfRangeException(nameof(value), "Divergence factor must be at least 2.");
+
+        _divergenceFactor = value;
+        return this;
+    }
+
+    public AdaptiveQueryReoptimizationOptionsBuilder WithMinimumObservedRows(int value)
+    {
+        if (value < 1)
+            throw new ArgumentOutOfRangeException(nameof(value), "Minimum observed rows must be greater than 0.");
+
+        _minimumObservedRows = value;
+        return this;
+    }
+
+    public AdaptiveQueryReoptimizationOptionsBuilder WithMaxBufferedRows(int value)
+    {
+        if (value < 1)
+            throw new ArgumentOutOfRangeException(nameof(value), "Max buffered rows must be greater than 0.");
+
+        _maxBufferedRows = value;
+        return this;
+    }
+
+    public AdaptiveQueryReoptimizationOptionsBuilder WithMaxReoptimizationsPerQuery(int value)
+    {
+        if (value < 0)
+            throw new ArgumentOutOfRangeException(nameof(value), "Max reoptimizations per query cannot be negative.");
+
+        _maxReoptimizationsPerQuery = value;
+        return this;
+    }
+
+    internal AdaptiveQueryReoptimizationOptions Build(bool enabled)
+        => new()
+        {
+            Enabled = enabled,
+            DivergenceFactor = _divergenceFactor,
+            MinimumObservedRows = _minimumObservedRows,
+            MaxBufferedRows = _maxBufferedRows,
+            MaxReoptimizationsPerQuery = _maxReoptimizationsPerQuery,
+        };
 }
diff --git a/src/CSharpDB.Engine/README.md b/src/CSharpDB.Engine/README.md
index 2a0ce3f3..d8fdb474 100644
--- a/src/CSharpDB.Engine/README.md
+++ b/src/CSharpDB.Engine/README.md
@@ -13,7 +13,7 @@ Lightweight embedded SQL database engine for .NET with single-file storage, WAL
 ## Features
 
-- **SQL engine**: DDL, DML, JOINs, aggregates, GROUP BY, HAVING, CTEs, `UNION` / `INTERSECT` / `EXCEPT`, scalar subqueries, `IN (SELECT ...)`, `EXISTS (SELECT ...)`, views, triggers, indexes, `ANALYZE`, and `sys.*` catalogs including `sys.table_stats` and `sys.column_stats`
+- **SQL engine**: DDL, DML, JOINs, aggregates, GROUP BY, HAVING, CTEs, `UNION` / `INTERSECT` / `EXCEPT`, scalar subqueries, `IN (SELECT ...)`, `EXISTS (SELECT ...)`, views, triggers, indexes, `ANALYZE`, `EXPLAIN ESTIMATE FOR `, and `sys.*` catalogs including `sys.table_stats`, `sys.column_stats`, and `sys.planner_*`
 - **NoSQL Collection API**: Typed `Collection` with `Put`/`Get`/`Delete`/`Scan`/`Find`
 - **Single-file storage**: All data in one `.db` file with 4 KB B+tree pages
 - **In-memory mode**: Open empty in memory, load an existing `.db` + `.wal` into memory, then save back to disk
@@ -22,7 +22,7 @@ Lightweight embedded SQL database engine for .NET with single-file storage, WAL
 - **Concurrent readers**: Snapshot-isolated readers alongside a single writer
 - **Statement + plan caching**: bounded caches for parsed SQL statements and SELECT plan reuse
 - **Fast-path lookups**: Direct B+tree access for `SELECT ... WHERE pk = value`
-- **Persisted statistics**: Exact row counts maintained on write, explicit `row_count_is_exact` semantics in `sys.table_stats`, `ANALYZE`-refreshed column distinct/min/max plus internal histograms/heavy hitters/composite-prefix stats, stale tracking after writes, and reuse of fresh stats for `COUNT(*)`, selective lookup planning, join method choice, and bounded small-chain inner-join reordering
+- **Persisted statistics**: Exact row counts maintained on write, explicit `row_count_is_exact` semantics in `sys.table_stats`, `ANALYZE`-refreshed column distinct/min/max plus histogram/heavy-hitter/composite-prefix diagnostics, stale tracking after writes, and reuse of fresh stats for `COUNT(*)`, selective lookup planning, join method choice, and bounded small-chain inner-join reordering
 - **Async-first**: All APIs are `async`/`await` from top to bottom
 
 Current boundary:
diff --git a/src/CSharpDB.Execution/AdaptiveQueryReoptimizationRuntimeDiagnostics.cs b/src/CSharpDB.Execution/AdaptiveQueryReoptimizationRuntimeDiagnostics.cs
new file mode 100644
index 00000000..0b4f4e6d
--- /dev/null
+++ b/src/CSharpDB.Execution/AdaptiveQueryReoptimizationRuntimeDiagnostics.cs
@@ -0,0 +1,66 @@
+using CSharpDB.Primitives;
+
+namespace CSharpDB.Execution;
+
+internal enum AdaptiveQueryReoptimizationFallbackReason
+{
+    None = 0,
+    MaxBufferedRows,
+    ReoptimizationLimit,
+    Unsupported,
+}
+
+internal sealed class AdaptiveQueryExecutionLease
+{
+    private int _remainingReoptimizations;
+
+    public AdaptiveQueryExecutionLease(AdaptiveQueryReoptimizationOptions options)
+    {
+        Options = options;
+        _remainingReoptimizations = options.MaxReoptimizationsPerQuery;
+    }
+
+    public AdaptiveQueryReoptimizationOptions Options { get; }
+
+    public bool TryConsumeReoptimization()
+    {
+        while (true)
+        {
+            int current = Volatile.Read(ref _remainingReoptimizations);
+            if (current <= 0)
+                return false;
+
+            if (Interlocked.CompareExchange(ref _remainingReoptimizations, current - 1, current) ==
current) + return true; + } + } +} + +internal sealed class AdaptiveQueryReoptimizationRuntimeDiagnostics +{ + private readonly Action _recordAttempt; + private readonly Action _recordSuccessfulSwitch; + private readonly Action<AdaptiveQueryReoptimizationFallbackReason> _recordRejectedSwitch; + private readonly Action _recordDivergence; + private readonly Action<long> _recordBufferedRows; + + public AdaptiveQueryReoptimizationRuntimeDiagnostics( + Action recordAttempt, + Action recordSuccessfulSwitch, + Action<AdaptiveQueryReoptimizationFallbackReason> recordRejectedSwitch, + Action recordDivergence, + Action<long> recordBufferedRows) + { + _recordAttempt = recordAttempt; + _recordSuccessfulSwitch = recordSuccessfulSwitch; + _recordRejectedSwitch = recordRejectedSwitch; + _recordDivergence = recordDivergence; + _recordBufferedRows = recordBufferedRows; + } + + public void RecordAttempt() => _recordAttempt(); + public void RecordSuccessfulSwitch() => _recordSuccessfulSwitch(); + public void RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason reason) => _recordRejectedSwitch(reason); + public void RecordDivergence() => _recordDivergence(); + public void RecordBufferedRows(long count) => _recordBufferedRows(count); +} diff --git a/src/CSharpDB.Execution/Operators.cs b/src/CSharpDB.Execution/Operators.cs index b85e160a..89339608 100644 --- a/src/CSharpDB.Execution/Operators.cs +++ b/src/CSharpDB.Execution/Operators.cs @@ -8588,6 +8588,611 @@ public int GetHashCode(HashJoinKey obj) } } +internal sealed class BufferedReplayOperator : IOperator, IBatchOperator, IBatchBackedRowOperator, IRowBufferReuseController, IBatchBufferReuseController, IEstimatedRowCountProvider +{ + private const int DefaultBatchSize = 64; + + private readonly List<DbValue[]> _bufferedRows; + private readonly IOperator _continuation; + private readonly bool _continuationAlreadyOpen; + private readonly bool _continuationExhausted; + private readonly int? _estimatedRowCount; + private IBatchOperator?
_continuationBatchSource; + private int _bufferedIndex = -1; + private bool _opened; + private bool _bufferedComplete; + private bool _reuseCurrentBatch = true; + private RowBatch _currentBatch; + + public BufferedReplayOperator( + ColumnDefinition[] outputSchema, + List<DbValue[]> bufferedRows, + IOperator continuation, + bool continuationAlreadyOpen, + bool continuationExhausted) + { + OutputSchema = outputSchema; + _bufferedRows = bufferedRows; + _continuation = continuation; + _continuationAlreadyOpen = continuationAlreadyOpen; + _continuationExhausted = continuationExhausted; + _estimatedRowCount = continuation is IEstimatedRowCountProvider estimated && + estimated.EstimatedRowCount is int continuationEstimate + ? Math.Max(bufferedRows.Count, continuationEstimate) + : bufferedRows.Count; + _currentBatch = new RowBatch(outputSchema.Length, DefaultBatchSize); + } + + public ColumnDefinition[] OutputSchema { get; } + public bool ReusesCurrentRowBuffer => true; + public int? EstimatedRowCount => _estimatedRowCount; + public DbValue[] Current { get; private set; } = Array.Empty<DbValue>(); + IBatchOperator IBatchBackedRowOperator.BatchSource => this; + bool IBatchOperator.ReusesCurrentBatch => _reuseCurrentBatch; + RowBatch IBatchOperator.CurrentBatch => _currentBatch; + + public async ValueTask OpenAsync(CancellationToken ct = default) + { + if (_opened) + return; + + _opened = true; + _bufferedIndex = -1; + _bufferedComplete = _bufferedRows.Count == 0; + + if (!_continuationAlreadyOpen && !_continuationExhausted) + await _continuation.OpenAsync(ct); + + _continuationBatchSource = !_continuationExhausted + ?
BatchSourceHelper.TryGetBatchSource(_continuation) + : null; + } + + public async ValueTask<bool> MoveNextAsync(CancellationToken ct = default) + { + if (!_bufferedComplete) + { + int next = _bufferedIndex + 1; + if (next < _bufferedRows.Count) + { + _bufferedIndex = next; + Current = _bufferedRows[next]; + return true; + } + + _bufferedComplete = true; + } + + if (_continuationExhausted) + return false; + + if (await _continuation.MoveNextAsync(ct)) + { + Current = _continuation.Current; + return true; + } + + return false; + } + + public async ValueTask<bool> MoveNextBatchAsync(CancellationToken ct = default) + { + var batch = _reuseCurrentBatch ? EnsureBatch(OutputSchema.Length) : new RowBatch(OutputSchema.Length, DefaultBatchSize); + batch.Reset(); + + while (batch.Count < batch.Capacity && !_bufferedComplete) + { + int next = _bufferedIndex + 1; + if (next >= _bufferedRows.Count) + { + _bufferedComplete = true; + break; + } + + _bufferedIndex = next; + _bufferedRows[next].CopyTo(batch.GetWritableRowSpan(batch.Count)); + batch.CommitWrittenRow(batch.Count); + } + + if (batch.Count > 0) + { + _currentBatch = batch; + return true; + } + + if (_continuationExhausted) + { + _currentBatch = batch; + return false; + } + + if (_continuationBatchSource != null) + { + bool hasBatch = await _continuationBatchSource.MoveNextBatchAsync(ct); + _currentBatch = hasBatch ?
_continuationBatchSource.CurrentBatch : batch; + return hasBatch; + } + + while (batch.Count < batch.Capacity && await _continuation.MoveNextAsync(ct)) + { + _continuation.Current.CopyTo(batch.GetWritableRowSpan(batch.Count)); + batch.CommitWrittenRow(batch.Count); + } + + _currentBatch = batch; + return batch.Count > 0; + } + + public void SetReuseCurrentRowBuffer(bool reuse) + { + if (_continuation is IRowBufferReuseController controller) + controller.SetReuseCurrentRowBuffer(reuse); + } + + public void SetReuseCurrentBatch(bool reuse) + { + _reuseCurrentBatch = reuse; + if (_continuation is IBatchBufferReuseController controller) + controller.SetReuseCurrentBatch(reuse); + if (!reuse) + _currentBatch = new RowBatch(OutputSchema.Length, DefaultBatchSize); + } + + public ValueTask DisposeAsync() => _continuation.DisposeAsync(); + + private RowBatch EnsureBatch(int columnCount) + { + if (_currentBatch.ColumnCount != columnCount) + _currentBatch = new RowBatch(columnCount, DefaultBatchSize); + + return _currentBatch; + } +} + +internal sealed class AdaptiveIndexNestedLoopJoinOperator : IOperator, IBatchOperator, IBatchBackedRowOperator, IProjectionPushdownTarget, IEstimatedRowCountProvider, IBatchBufferReuseController +{ + private const int DefaultBatchSize = 64; + + private readonly IOperator _outer; + private readonly IOperator _unusedHashRight; + private readonly Func _createLookupJoin; + private readonly Func _createHashJoin; + private readonly AdaptiveQueryExecutionLease _lease; + private readonly AdaptiveQueryReoptimizationRuntimeDiagnostics _diagnostics; + private readonly long _estimatedOuterRows; + private readonly int? _estimatedRowCount; + private IOperator? _active; + private IBatchOperator? _activeBatchSource; + private int[]? 
_projectionColumnIndices; + private bool _reuseCurrentBatch = true; + private RowBatch _currentBatch; + + public AdaptiveIndexNestedLoopJoinOperator( + IOperator outer, + IOperator unusedHashRight, + ColumnDefinition[] outputSchema, + Func createLookupJoin, + Func createHashJoin, + AdaptiveQueryExecutionLease lease, + AdaptiveQueryReoptimizationRuntimeDiagnostics diagnostics, + long estimatedOuterRows, + int? estimatedRowCount) + { + _outer = outer; + _unusedHashRight = unusedHashRight; + OutputSchema = outputSchema; + _createLookupJoin = createLookupJoin; + _createHashJoin = createHashJoin; + _lease = lease; + _diagnostics = diagnostics; + _estimatedOuterRows = Math.Max(estimatedOuterRows, 1); + _estimatedRowCount = estimatedRowCount; + _currentBatch = new RowBatch(outputSchema.Length, DefaultBatchSize); + } + + public ColumnDefinition[] OutputSchema { get; private set; } + public bool ReusesCurrentRowBuffer => _active?.ReusesCurrentRowBuffer ?? false; + public int? EstimatedRowCount => _estimatedRowCount; + public DbValue[] Current => _active?.Current ?? 
throw new InvalidOperationException("Operator is not open."); + IBatchOperator IBatchBackedRowOperator.BatchSource => this; + bool IBatchOperator.ReusesCurrentBatch => _reuseCurrentBatch; + RowBatch IBatchOperator.CurrentBatch => _currentBatch; + + public async ValueTask OpenAsync(CancellationToken ct = default) + { + if (_active != null) + return; + + _diagnostics.RecordAttempt(); + var options = _lease.Options; + long threshold = ComputeThreshold(_estimatedOuterRows, options); + var buffered = new List<DbValue[]>(Math.Min(options.MaxBufferedRows, (int)Math.Min(threshold + 1, int.MaxValue))); + bool exhausted = false; + bool switchToHash = false; + bool disposeUnusedHashRight = true; + + if (_outer is IRowBufferReuseController reuseController) + reuseController.SetReuseCurrentRowBuffer(false); + + await _outer.OpenAsync(ct); + while (buffered.Count <= threshold && buffered.Count < options.MaxBufferedRows) + { + if (!await _outer.MoveNextAsync(ct)) + { + exhausted = true; + break; + } + + buffered.Add(CloneRow(_outer.Current)); + } + + _diagnostics.RecordBufferedRows(buffered.Count); + + if (!exhausted && buffered.Count > threshold) + { + _diagnostics.RecordDivergence(); + if (_lease.TryConsumeReoptimization()) + { + switchToHash = true; + _diagnostics.RecordSuccessfulSwitch(); + } + else + { + _diagnostics.RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason.ReoptimizationLimit); + } + } + else if (!exhausted && buffered.Count >= options.MaxBufferedRows) + { + _diagnostics.RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason.MaxBufferedRows); + } + + var replay = new BufferedReplayOperator( + _outer.OutputSchema, + buffered, + _outer, + continuationAlreadyOpen: true, + continuationExhausted: exhausted); + + _active = switchToHash + ?
_createHashJoin(replay) + : _createLookupJoin(replay); + disposeUnusedHashRight = !switchToHash; + + if (_projectionColumnIndices != null && + _active is IProjectionPushdownTarget projectionTarget && + !projectionTarget.TrySetOutputProjection(_projectionColumnIndices, OutputSchema)) + { + _diagnostics.RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason.Unsupported); + } + + if (_active is IBatchBufferReuseController batchController) + batchController.SetReuseCurrentBatch(_reuseCurrentBatch); + + try + { + await _active.OpenAsync(ct); + _activeBatchSource = BatchSourceHelper.TryGetBatchSource(_active); + } + finally + { + if (disposeUnusedHashRight) + await _unusedHashRight.DisposeAsync(); + } + } + + public ValueTask<bool> MoveNextAsync(CancellationToken ct = default) + => _active?.MoveNextAsync(ct) ?? ValueTask.FromResult(false); + + public async ValueTask<bool> MoveNextBatchAsync(CancellationToken ct = default) + { + if (_activeBatchSource != null) + { + bool hasBatch = await _activeBatchSource.MoveNextBatchAsync(ct); + _currentBatch = hasBatch + ? _activeBatchSource.CurrentBatch + : EnsureBatch(OutputSchema.Length); + return hasBatch; + } + + var batch = _reuseCurrentBatch ?
EnsureBatch(OutputSchema.Length) : new RowBatch(OutputSchema.Length, DefaultBatchSize); + batch.Reset(); + while (batch.Count < batch.Capacity && _active != null && await _active.MoveNextAsync(ct)) + { + _active.Current.CopyTo(batch.GetWritableRowSpan(batch.Count)); + batch.CommitWrittenRow(batch.Count); + } + + _currentBatch = batch; + return batch.Count > 0; + } + + public bool TrySetOutputProjection(int[] columnIndices, ColumnDefinition[] outputSchema) + { + _projectionColumnIndices = (int[])columnIndices.Clone(); + OutputSchema = outputSchema; + _currentBatch = new RowBatch(OutputSchema.Length, DefaultBatchSize); + return true; + } + + public void SetReuseCurrentBatch(bool reuse) + { + _reuseCurrentBatch = reuse; + if (_active is IBatchBufferReuseController controller) + controller.SetReuseCurrentBatch(reuse); + if (!reuse) + _currentBatch = new RowBatch(OutputSchema.Length, DefaultBatchSize); + } + + public async ValueTask DisposeAsync() + { + if (_active != null) + { + await _active.DisposeAsync(); + } + else + { + await _outer.DisposeAsync(); + await _unusedHashRight.DisposeAsync(); + } + } + + private RowBatch EnsureBatch(int columnCount) + { + if (_currentBatch.ColumnCount != columnCount) + _currentBatch = new RowBatch(columnCount, DefaultBatchSize); + + return _currentBatch; + } + + private static long ComputeThreshold(long estimatedRows, AdaptiveQueryReoptimizationOptions options) + { + long divergenceRows = estimatedRows > long.MaxValue / options.DivergenceFactor + ? long.MaxValue + : estimatedRows * options.DivergenceFactor; + return Math.Max(options.MinimumObservedRows, divergenceRows); + } + + private static DbValue[] CloneRow(DbValue[] row) => row.Length == 0 ? 
Array.Empty<DbValue>() : (DbValue[])row.Clone(); +} + +internal sealed class AdaptiveHashJoinOperator : IOperator, IBatchOperator, IBatchBackedRowOperator, IProjectionPushdownTarget, IEstimatedRowCountProvider, IBatchBufferReuseController +{ + private const int DefaultBatchSize = 64; + + private readonly IOperator _left; + private readonly IOperator _right; + private readonly JoinType _joinType; + private readonly Expression? _residualCondition; + private readonly TableSchema _compositeSchema; + private readonly int _leftColCount; + private readonly int _rightColCount; + private readonly int[] _leftKeyIndices; + private readonly int[] _rightKeyIndices; + private readonly bool _plannedBuildRightSide; + private readonly long _estimatedLeftRows; + private readonly long _estimatedRightRows; + private readonly int? _estimatedRowCount; + private readonly DbFunctionRegistry _functions; + private readonly AdaptiveQueryExecutionLease _lease; + private readonly AdaptiveQueryReoptimizationRuntimeDiagnostics _diagnostics; + private IOperator? _active; + private IBatchOperator? _activeBatchSource; + private int[]? _projectionColumnIndices; + private bool _reuseCurrentBatch = true; + private RowBatch _currentBatch; + + public AdaptiveHashJoinOperator( + IOperator left, + IOperator right, + JoinType joinType, + Expression? residualCondition, + TableSchema compositeSchema, + int leftColCount, + int rightColCount, + int[] leftKeyIndices, + int[] rightKeyIndices, + bool plannedBuildRightSide, + long estimatedLeftRows, + long estimatedRightRows, + int?
estimatedRowCount, + DbFunctionRegistry functions, + AdaptiveQueryExecutionLease lease, + AdaptiveQueryReoptimizationRuntimeDiagnostics diagnostics) + { + _left = left; + _right = right; + _joinType = joinType; + _residualCondition = residualCondition; + _compositeSchema = compositeSchema; + _leftColCount = leftColCount; + _rightColCount = rightColCount; + _leftKeyIndices = leftKeyIndices; + _rightKeyIndices = rightKeyIndices; + _plannedBuildRightSide = plannedBuildRightSide; + _estimatedLeftRows = Math.Max(estimatedLeftRows, 1); + _estimatedRightRows = Math.Max(estimatedRightRows, 1); + _estimatedRowCount = estimatedRowCount; + _functions = functions; + _lease = lease; + _diagnostics = diagnostics; + OutputSchema = compositeSchema.Columns as ColumnDefinition[] ?? compositeSchema.Columns.ToArray(); + _currentBatch = new RowBatch(OutputSchema.Length, DefaultBatchSize); + } + + public ColumnDefinition[] OutputSchema { get; private set; } + public bool ReusesCurrentRowBuffer => _active?.ReusesCurrentRowBuffer ?? false; + public int? EstimatedRowCount => _estimatedRowCount; + public DbValue[] Current => _active?.Current ?? 
throw new InvalidOperationException("Operator is not open."); + IBatchOperator IBatchBackedRowOperator.BatchSource => this; + bool IBatchOperator.ReusesCurrentBatch => _reuseCurrentBatch; + RowBatch IBatchOperator.CurrentBatch => _currentBatch; + + public async ValueTask OpenAsync(CancellationToken ct = default) + { + if (_active != null) + return; + + _diagnostics.RecordAttempt(); + var options = _lease.Options; + var leftRows = new List<DbValue[]>(Math.Min(options.MaxBufferedRows, 1024)); + var rightRows = new List<DbValue[]>(Math.Min(options.MaxBufferedRows, 1024)); + + bool leftExhausted = await BufferSourceAsync(_left, leftRows, options.MaxBufferedRows, ct); + bool rightExhausted = await BufferSourceAsync(_right, rightRows, options.MaxBufferedRows, ct); + _diagnostics.RecordBufferedRows(leftRows.Count + rightRows.Count); + + bool buildRightSide = _plannedBuildRightSide; + if (_joinType == JoinType.Inner && leftExhausted && rightExhausted) + { + long plannedBuildEstimate = _plannedBuildRightSide ? _estimatedRightRows : _estimatedLeftRows; + long actualBuildRows = _plannedBuildRightSide ? rightRows.Count : leftRows.Count; + long actualProbeRows = _plannedBuildRightSide ?
leftRows.Count : rightRows.Count; + if (actualBuildRows > ComputeThreshold(plannedBuildEstimate, options) && + actualProbeRows < actualBuildRows) + { + _diagnostics.RecordDivergence(); + if (_lease.TryConsumeReoptimization()) + { + buildRightSide = !_plannedBuildRightSide; + _diagnostics.RecordSuccessfulSwitch(); + } + else + { + _diagnostics.RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason.ReoptimizationLimit); + } + } + } + else if (!leftExhausted || !rightExhausted) + { + _diagnostics.RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason.MaxBufferedRows); + } + + var leftReplay = new BufferedReplayOperator(_left.OutputSchema, leftRows, _left, true, leftExhausted); + var rightReplay = new BufferedReplayOperator(_right.OutputSchema, rightRows, _right, true, rightExhausted); + _active = new HashJoinOperator( + leftReplay, + rightReplay, + _joinType, + _residualCondition, + _compositeSchema, + _leftColCount, + _rightColCount, + _leftKeyIndices, + _rightKeyIndices, + buildRightSide, + buildRightSide ? rightRows.Count : leftRows.Count, + _estimatedRowCount, + _functions); + + if (_projectionColumnIndices != null && + _active is IProjectionPushdownTarget projectionTarget) + { + projectionTarget.TrySetOutputProjection(_projectionColumnIndices, OutputSchema); + } + + if (_active is IBatchBufferReuseController batchController) + batchController.SetReuseCurrentBatch(_reuseCurrentBatch); + + await _active.OpenAsync(ct); + _activeBatchSource = BatchSourceHelper.TryGetBatchSource(_active); + } + + public ValueTask<bool> MoveNextAsync(CancellationToken ct = default) + => _active?.MoveNextAsync(ct) ?? ValueTask.FromResult(false); + + public async ValueTask<bool> MoveNextBatchAsync(CancellationToken ct = default) + { + if (_activeBatchSource != null) + { + bool hasBatch = await _activeBatchSource.MoveNextBatchAsync(ct); + _currentBatch = hasBatch + ?
_activeBatchSource.CurrentBatch + : EnsureBatch(OutputSchema.Length); + return hasBatch; + } + + var batch = _reuseCurrentBatch ? EnsureBatch(OutputSchema.Length) : new RowBatch(OutputSchema.Length, DefaultBatchSize); + batch.Reset(); + while (batch.Count < batch.Capacity && _active != null && await _active.MoveNextAsync(ct)) + { + _active.Current.CopyTo(batch.GetWritableRowSpan(batch.Count)); + batch.CommitWrittenRow(batch.Count); + } + + _currentBatch = batch; + return batch.Count > 0; + } + + public bool TrySetOutputProjection(int[] columnIndices, ColumnDefinition[] outputSchema) + { + _projectionColumnIndices = (int[])columnIndices.Clone(); + OutputSchema = outputSchema; + _currentBatch = new RowBatch(OutputSchema.Length, DefaultBatchSize); + return true; + } + + public void SetReuseCurrentBatch(bool reuse) + { + _reuseCurrentBatch = reuse; + if (_active is IBatchBufferReuseController controller) + controller.SetReuseCurrentBatch(reuse); + if (!reuse) + _currentBatch = new RowBatch(OutputSchema.Length, DefaultBatchSize); + } + + public async ValueTask DisposeAsync() + { + if (_active != null) + { + await _active.DisposeAsync(); + } + else + { + await _left.DisposeAsync(); + await _right.DisposeAsync(); + } + } + + private RowBatch EnsureBatch(int columnCount) + { + if (_currentBatch.ColumnCount != columnCount) + _currentBatch = new RowBatch(columnCount, DefaultBatchSize); + + return _currentBatch; + } + + private static async ValueTask<bool> BufferSourceAsync( + IOperator source, + List<DbValue[]> rows, + int maxRows, + CancellationToken ct) + { + if (source is IRowBufferReuseController reuseController) + reuseController.SetReuseCurrentRowBuffer(false); + + await source.OpenAsync(ct); + while (rows.Count < maxRows) + { + if (!await source.MoveNextAsync(ct)) + return true; + + rows.Add(CloneRow(source.Current)); + } + + return false; + } + + private static long ComputeThreshold(long estimatedRows, AdaptiveQueryReoptimizationOptions options) + { + long divergenceRows =
estimatedRows > long.MaxValue / options.DivergenceFactor + ? long.MaxValue + : estimatedRows * options.DivergenceFactor; + return Math.Max(options.MinimumObservedRows, divergenceRows); + } + + private static DbValue[] CloneRow(DbValue[] row) => row.Length == 0 ? Array.Empty<DbValue>() : (DbValue[])row.Clone(); +} + /// <summary> /// Index nested-loop join operator. /// Uses a right-side PRIMARY KEY or unique single-column index for lookup joins. @@ -11317,7 +11922,8 @@ public IndexScanOperator( int[]? expectedKeyColumnIndices = null, DbValue[]? expectedKeyComponents = null, string?[]? expectedKeyCollations = null, - bool usesOrderedTextPayload = false) + bool usesOrderedTextPayload = false, + int? estimatedRowCount = null) { _indexStore = indexStore; _tableTree = tableTree; @@ -11328,6 +11934,7 @@ public IndexScanOperator( _expectedKeyComponents = expectedKeyComponents; _expectedKeyCollations = expectedKeyCollations; _usesOrderedTextPayload = usesOrderedTextPayload; + _estimatedRowCount = estimatedRowCount > 0 ?
estimatedRowCount : null; if (expectedKeyColumnIndices is { Length: > 0 } && expectedKeyComponents is { Length: > 0 }) { _expectedKeyAccessors = BoundColumnAccessHelper.CreateAccessors(_recordSerializer, expectedKeyColumnIndices); diff --git a/src/CSharpDB.Execution/QueryPlanner.cs b/src/CSharpDB.Execution/QueryPlanner.cs index 90bc3821..11d1d6c2 100644 --- a/src/CSharpDB.Execution/QueryPlanner.cs +++ b/src/CSharpDB.Execution/QueryPlanner.cs @@ -1,4 +1,5 @@ using System.Buffers.Binary; +using System.Globalization; using System.Security.Cryptography; using System.Runtime.CompilerServices; using System.Text; @@ -18,6 +19,89 @@ public sealed class QueryPlanner private const int AnalyzeHistogramBucketCount = 16; private const int AnalyzeFrequentValueCount = 8; private const int MaxJoinReorderDpLeafCount = 6; + private const int MaxJoinOrderRowGoalRows = 4096; + private const int MaxExplainEstimateRows = 500; + + internal readonly record struct AdaptiveQueryReoptimizationDiagnosticsSnapshot( + long EligibleQueryCount, + long AttemptCount, + long SuccessfulSwitchCount, + long RejectedSwitchCount, + long DivergenceEventCount, + long BufferedRowCount, + long MaxBufferedFallbackCount, + long UnsupportedFallbackCount, + long ReoptimizationLimitFallbackCount); + + private sealed record PlannerEstimateDiagnostic( + int NodeId, + int? ParentNodeId, + string NodeKind, + string? Target, + string Decision, + long? EstimatedRows, + long? EstimatedCost, + string? StatsSource, + string StatsState, + string? Detail); + + private readonly record struct ExplainEstimateResult(TableSchema? Schema, bool HasRows, long Rows); + + private sealed class PlannerEstimateDiagnostics + { + private readonly List<PlannerEstimateDiagnostic> _rows = new(); + private int _nextNodeId = 1; + private bool _truncated; + + public IReadOnlyList<PlannerEstimateDiagnostic> Rows => _rows; + + public int Add( + int? parentNodeId, + string nodeKind, + string? target, + string decision, + long? estimatedRows = null, + long? estimatedCost = null, + string?
statsSource = null, + string statsState = "not_applicable", + string? detail = null) + { + int nodeId = _nextNodeId++; + if (_rows.Count >= MaxExplainEstimateRows) + { + if (!_truncated && _rows.Count > 0) + { + _rows[^1] = new PlannerEstimateDiagnostic( + nodeId, + parentNodeId, + "truncation", + "EXPLAIN ESTIMATE", + "truncated", + null, + null, + null, + "bounded", + $"Diagnostic output was capped at {MaxExplainEstimateRows} rows."); + _truncated = true; + } + + return nodeId; + } + + _rows.Add(new PlannerEstimateDiagnostic( + nodeId, + parentNodeId, + nodeKind, + target, + decision, + estimatedRows, + estimatedCost, + statsSource, + statsState, + detail)); + return nodeId; + } + } private sealed class AnalyzedTableStatisticsResult { @@ -180,6 +264,55 @@ public override int GetHashCode() => new ColumnDefinition { Name = "is_stale", Type = DbType.Integer, Nullable = false }, ]; + private static readonly ColumnDefinition[] SystemPlannerHistogramsColumns = + [ + new ColumnDefinition { Name = "table_name", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "column_name", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "ordinal_position", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "bucket_index", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "lower_bound", Type = DbType.Null, Nullable = true }, + new ColumnDefinition { Name = "upper_bound", Type = DbType.Null, Nullable = true }, + new ColumnDefinition { Name = "row_count", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "non_null_count", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "is_stale", Type = DbType.Integer, Nullable = false }, + ]; + + private static readonly ColumnDefinition[] SystemPlannerHeavyHittersColumns = + [ + new ColumnDefinition { Name = "table_name", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "column_name", 
Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "ordinal_position", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "value", Type = DbType.Null, Nullable = true }, + new ColumnDefinition { Name = "row_count", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "frequency_ppm", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "is_stale", Type = DbType.Integer, Nullable = false }, + ]; + + private static readonly ColumnDefinition[] SystemPlannerIndexPrefixStatsColumns = + [ + new ColumnDefinition { Name = "table_name", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "index_name", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "prefix_length", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "prefix_columns", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "distinct_count", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "table_row_count", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "is_stale", Type = DbType.Integer, Nullable = false }, + ]; + + private static readonly ColumnDefinition[] ExplainEstimateColumns = + [ + new ColumnDefinition { Name = "node_id", Type = DbType.Integer, Nullable = false }, + new ColumnDefinition { Name = "parent_node_id", Type = DbType.Integer, Nullable = true }, + new ColumnDefinition { Name = "node_kind", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "target", Type = DbType.Text, Nullable = true }, + new ColumnDefinition { Name = "decision", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "estimated_rows", Type = DbType.Integer, Nullable = true }, + new ColumnDefinition { Name = "estimated_cost", Type = DbType.Integer, Nullable = true }, + new ColumnDefinition { Name = "stats_source", Type = DbType.Text, Nullable = true }, + new 
ColumnDefinition { Name = "stats_state", Type = DbType.Text, Nullable = false }, + new ColumnDefinition { Name = "detail", Type = DbType.Text, Nullable = true }, + ]; + private static readonly ColumnDefinition[] SystemSavedQueriesColumns = [ new ColumnDefinition { Name = "id", Type = DbType.Integer, Nullable = false, IsPrimaryKey = true, IsIdentity = true }, @@ -220,6 +353,7 @@ public override int GetHashCode() => /// Cache of parsed trigger bodies to avoid re-parsing on every row. private readonly Dictionary> _triggerBodyCache = new(StringComparer.OrdinalIgnoreCase); private readonly Dictionary _nextRowIdCache = new(StringComparer.OrdinalIgnoreCase); + private readonly Dictionary _rowIdReservationLeases = new(StringComparer.OrdinalIgnoreCase); private readonly HashSet _dirtyNextRowIdTables = new(StringComparer.OrdinalIgnoreCase); private readonly Dictionary> _compiledExpressionCache = new(); private readonly Dictionary _compiledSpanExpressionCache = new(); @@ -250,13 +384,24 @@ public override int GetHashCode() => private long _selectPlanCacheMissCount; private long _selectPlanCacheReclassificationCount; private long _selectPlanCacheStoreCount; + private long _adaptiveEligibleQueryCount; + private long _adaptiveAttemptCount; + private long _adaptiveSuccessfulSwitchCount; + private long _adaptiveRejectedSwitchCount; + private long _adaptiveDivergenceEventCount; + private long _adaptiveBufferedRowCount; + private long _adaptiveMaxBufferedFallbackCount; + private long _adaptiveUnsupportedFallbackCount; + private long _adaptiveReoptimizationLimitFallbackCount; private readonly record struct CorrelationScope(DbValue[] Row, TableSchema Schema); private long _observedSchemaVersion; private readonly Func? _nextRowIdHintProvider; private readonly Func? _nextRowIdReservationProvider; + private readonly Func? _nextRowIdRangeReservationProvider; private readonly Action? 
_nextRowIdObservationProvider; private readonly bool _useTransientNextRowIdHints; + private readonly AdaptiveQueryReoptimizationRuntimeDiagnostics _adaptiveRuntimeDiagnostics; private const int MaxCompiledExpressionCacheEntries = 4096; private const int MaxSelectPlanCacheEntries = 1024; @@ -324,6 +469,42 @@ private sealed class ReusableInsertEncodingBuffer public byte[] Buffer { get; set; } = Array.Empty<byte>(); } + private sealed class RowIdReservationLease + { + public RowIdReservationLease(long nextRowId, long endExclusive) + { + NextRowId = nextRowId; + EndExclusive = endExclusive; + } + + public long NextRowId { get; private set; } + public long EndExclusive { get; } + + public bool TryReserve(long minimumNextRowId, out long rowId) + { + if (NextRowId < minimumNextRowId) + NextRowId = minimumNextRowId; + + if (NextRowId >= EndExclusive) + { + rowId = 0; + return false; + } + + rowId = NextRowId; + NextRowId = checked(NextRowId + 1); + return true; + } + + public bool AdvanceTo(long nextRowId) + { + if (NextRowId < nextRowId) + NextRowId = nextRowId; + + return NextRowId < EndExclusive; + } + } + private sealed class ResolvedInsertIndexMutationPlan { public required IndexSchema Index { get; init; } @@ -351,6 +532,8 @@ internal readonly record struct SelectPlanCacheDiagnostics( /// </summary> public bool PreferSyncPointLookups { get; set; } = true; + public AdaptiveQueryReoptimizationOptions AdaptiveQueryReoptimization { get; } + public QueryPlanner( Pager pager, SchemaCatalog catalog, @@ -360,7 +543,35 @@ public QueryPlanner( Func? nextRowIdReservationProvider = null, Action? nextRowIdObservationProvider = null, bool useTransientNextRowIdHints = false, - DbFunctionRegistry? functions = null) + DbFunctionRegistry? functions = null, + AdaptiveQueryReoptimizationOptions?
adaptiveQueryReoptimization = null) + : this( + pager, + catalog, + recordSerializer, + tableRowCountProvider, + nextRowIdHintProvider, + nextRowIdReservationProvider, + nextRowIdRangeReservationProvider: null, + nextRowIdObservationProvider, + useTransientNextRowIdHints, + functions, + adaptiveQueryReoptimization) + { + } + + internal QueryPlanner( + Pager pager, + SchemaCatalog catalog, + IRecordSerializer? recordSerializer, + Func? tableRowCountProvider, + Func? nextRowIdHintProvider, + Func? nextRowIdReservationProvider, + Func? nextRowIdRangeReservationProvider, + Action? nextRowIdObservationProvider, + bool useTransientNextRowIdHints, + DbFunctionRegistry? functions, + AdaptiveQueryReoptimizationOptions? adaptiveQueryReoptimization = null) { _pager = pager; _catalog = catalog; @@ -372,9 +583,17 @@ public QueryPlanner( _tableRowCountProvider = tableRowCountProvider; _nextRowIdHintProvider = nextRowIdHintProvider; _nextRowIdReservationProvider = nextRowIdReservationProvider; + _nextRowIdRangeReservationProvider = nextRowIdRangeReservationProvider; _nextRowIdObservationProvider = nextRowIdObservationProvider; _observedSchemaVersion = catalog.SchemaVersion; _useTransientNextRowIdHints = useTransientNextRowIdHints; + AdaptiveQueryReoptimization = NormalizeAdaptiveQueryReoptimizationOptions(adaptiveQueryReoptimization); + _adaptiveRuntimeDiagnostics = new AdaptiveQueryReoptimizationRuntimeDiagnostics( + RecordAdaptiveAttempt, + RecordAdaptiveSuccessfulSwitch, + RecordAdaptiveRejectedSwitch, + RecordAdaptiveDivergence, + RecordAdaptiveBufferedRows); } public IReadOnlyCollection> GetCommittedNextRowIdHints() @@ -436,6 +655,7 @@ private async ValueTask ExecuteCoreAsync(Statement stmt, Cancellati token => ExecuteDropTriggerAsync(dropTrig, token), ct), AnalyzeStatement analyze => await ExecuteAnalyzeAsync(analyze, ct), + ExplainEstimateStatement explain => ExecuteExplainEstimate(explain), _ => throw new CSharpDbException(ErrorCode.Unknown, $"Unknown statement type: 
{stmt.GetType().Name}"), }; } @@ -450,27 +670,33 @@ private async ValueTask ExecuteSchemaMutationAsync( return await action(ct); } - private ValueTask ExecuteQueryAsync(QueryStatement stmt, CancellationToken ct) + private ValueTask ExecuteQueryAsync( + QueryStatement stmt, + CancellationToken ct, + bool suppressAdaptiveReoptimization = false) { if (ContainsSubqueries(stmt)) - return ExecuteQueryWithSubqueriesAsync(stmt, ct); + return ExecuteQueryWithSubqueriesAsync(stmt, ct, suppressAdaptiveReoptimization); return stmt switch { - SelectStatement select => ValueTask.FromResult(ExecuteSelect(select)), + SelectStatement select => ValueTask.FromResult(ExecuteSelect(select, suppressAdaptiveReoptimization)), CompoundSelectStatement compound => ExecuteCompoundSelectAsync(compound, ct), _ => throw new CSharpDbException(ErrorCode.Unknown, $"Unknown query type: {stmt.GetType().Name}"), }; } - private async ValueTask ExecuteQueryWithSubqueriesAsync(QueryStatement stmt, CancellationToken ct) + private async ValueTask ExecuteQueryWithSubqueriesAsync( + QueryStatement stmt, + CancellationToken ct, + bool suppressAdaptiveReoptimization = false) { var lowered = await RewriteSubqueriesInQueryAsync(stmt, ct); if (!ContainsSubqueries(lowered)) { return lowered switch { - SelectStatement select => ExecuteSelect(select), + SelectStatement select => ExecuteSelect(select, suppressAdaptiveReoptimization), CompoundSelectStatement compound => await ExecuteCompoundSelectAsync(compound, ct), _ => throw new CSharpDbException(ErrorCode.Unknown, $"Unknown query type: {lowered.GetType().Name}"), }; @@ -503,6 +729,81 @@ internal void ResetSelectPlanCacheDiagnostics() _selectPlanCacheStoreCount = 0; } + internal AdaptiveQueryReoptimizationDiagnosticsSnapshot GetAdaptiveQueryReoptimizationDiagnosticsSnapshot() + => new( + Interlocked.Read(ref _adaptiveEligibleQueryCount), + Interlocked.Read(ref _adaptiveAttemptCount), + Interlocked.Read(ref _adaptiveSuccessfulSwitchCount), + Interlocked.Read(ref 
_adaptiveRejectedSwitchCount), + Interlocked.Read(ref _adaptiveDivergenceEventCount), + Interlocked.Read(ref _adaptiveBufferedRowCount), + Interlocked.Read(ref _adaptiveMaxBufferedFallbackCount), + Interlocked.Read(ref _adaptiveUnsupportedFallbackCount), + Interlocked.Read(ref _adaptiveReoptimizationLimitFallbackCount)); + + internal void ResetAdaptiveQueryReoptimizationDiagnostics() + { + Interlocked.Exchange(ref _adaptiveEligibleQueryCount, 0); + Interlocked.Exchange(ref _adaptiveAttemptCount, 0); + Interlocked.Exchange(ref _adaptiveSuccessfulSwitchCount, 0); + Interlocked.Exchange(ref _adaptiveRejectedSwitchCount, 0); + Interlocked.Exchange(ref _adaptiveDivergenceEventCount, 0); + Interlocked.Exchange(ref _adaptiveBufferedRowCount, 0); + Interlocked.Exchange(ref _adaptiveMaxBufferedFallbackCount, 0); + Interlocked.Exchange(ref _adaptiveUnsupportedFallbackCount, 0); + Interlocked.Exchange(ref _adaptiveReoptimizationLimitFallbackCount, 0); + } + + private static AdaptiveQueryReoptimizationOptions NormalizeAdaptiveQueryReoptimizationOptions( + AdaptiveQueryReoptimizationOptions? 
options) + { + options ??= new AdaptiveQueryReoptimizationOptions(); + + return new AdaptiveQueryReoptimizationOptions + { + Enabled = options.Enabled, + DivergenceFactor = Math.Max(2, options.DivergenceFactor), + MinimumObservedRows = Math.Max(1, options.MinimumObservedRows), + MaxBufferedRows = Math.Max(1, options.MaxBufferedRows), + MaxReoptimizationsPerQuery = Math.Max(0, options.MaxReoptimizationsPerQuery), + }; + } + + private void RecordAdaptiveEligibleQuery() => + Interlocked.Increment(ref _adaptiveEligibleQueryCount); + + private void RecordAdaptiveAttempt() => + Interlocked.Increment(ref _adaptiveAttemptCount); + + private void RecordAdaptiveSuccessfulSwitch() => + Interlocked.Increment(ref _adaptiveSuccessfulSwitchCount); + + private void RecordAdaptiveRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason reason) + { + Interlocked.Increment(ref _adaptiveRejectedSwitchCount); + switch (reason) + { + case AdaptiveQueryReoptimizationFallbackReason.MaxBufferedRows: + Interlocked.Increment(ref _adaptiveMaxBufferedFallbackCount); + break; + case AdaptiveQueryReoptimizationFallbackReason.ReoptimizationLimit: + Interlocked.Increment(ref _adaptiveReoptimizationLimitFallbackCount); + break; + case AdaptiveQueryReoptimizationFallbackReason.Unsupported: + Interlocked.Increment(ref _adaptiveUnsupportedFallbackCount); + break; + } + } + + private void RecordAdaptiveDivergence() => + Interlocked.Increment(ref _adaptiveDivergenceEventCount); + + private void RecordAdaptiveBufferedRows(long count) + { + if (count > 0) + Interlocked.Add(ref _adaptiveBufferedRowCount, count); + } + private void InvalidateSchemaSensitiveCachesIfNeeded() { long currentVersion = _catalog.SchemaVersion; @@ -511,6 +812,7 @@ private void InvalidateSchemaSensitiveCachesIfNeeded() _triggerBodyCache.Clear(); _nextRowIdCache.Clear(); + _rowIdReservationLeases.Clear(); _dirtyNextRowIdTables.Clear(); _compiledExpressionCache.Clear(); _compiledSpanExpressionCache.Clear(); @@ -602,6 +904,7 @@ private 
async ValueTask ExecuteCreateTableAsync(CreateTableStatemen for (int i = 0; i < foreignKeys.Length; i++) await CreateForeignKeySupportIndexAsync(schema, foreignKeys[i], ct); _nextRowIdCache.Remove(stmt.TableName); + _rowIdReservationLeases.Remove(stmt.TableName); _dirtyNextRowIdTables.Remove(stmt.TableName); return new QueryResult(0); } @@ -624,6 +927,7 @@ private async ValueTask ExecuteDropTableAsync(DropTableStatement st await _catalog.DropTableAsync(stmt.TableName, ct); _nextRowIdCache.Remove(stmt.TableName); + _rowIdReservationLeases.Remove(stmt.TableName); _dirtyNextRowIdTables.Remove(stmt.TableName); return new QueryResult(0); } @@ -776,6 +1080,8 @@ private async ValueTask ExecuteAlterTableAsync(AlterTableStatement await RenameTableWithDependenciesAsync(stmt.TableName, rename.NewTableName, schema, ct); if (_nextRowIdCache.Remove(stmt.TableName, out long nextRowId)) _nextRowIdCache[rename.NewTableName] = nextRowId; + if (_rowIdReservationLeases.Remove(stmt.TableName, out RowIdReservationLease? 
rowIdLease)) + _rowIdReservationLeases[rename.NewTableName] = rowIdLease; if (_dirtyNextRowIdTables.Remove(stmt.TableName)) _dirtyNextRowIdTables.Add(rename.NewTableName); break; @@ -1990,45 +2296,1042 @@ private async ValueTask ExecuteCreateTriggerAsync(CreateTriggerStat // Validate the target table exists GetSchema(stmt.TableName); - // Serialize the trigger body statements back to SQL text for storage - string bodySql = SerializeTriggerBody(stmt); + // Serialize the trigger body statements back to SQL text for storage + string bodySql = SerializeTriggerBody(stmt); + + var schema = new TriggerSchema + { + TriggerName = stmt.TriggerName, + TableName = stmt.TableName, + Timing = stmt.Timing, + Event = stmt.Event, + BodySql = bodySql, + }; + + await _catalog.CreateTriggerAsync(schema, ct); + _triggerBodyCache.Remove(stmt.TriggerName); // invalidate cache + return new QueryResult(0); + } + + private async ValueTask ExecuteDropTriggerAsync(DropTriggerStatement stmt, CancellationToken ct) + { + if (stmt.IfExists && _catalog.GetTrigger(stmt.TriggerName) == null) + return new QueryResult(0); + + await _catalog.DropTriggerAsync(stmt.TriggerName, ct); + _triggerBodyCache.Remove(stmt.TriggerName); + return new QueryResult(0); + } + + private async ValueTask ExecuteAnalyzeAsync(AnalyzeStatement stmt, CancellationToken ct) + { + if (!string.IsNullOrWhiteSpace(stmt.TableName)) + { + await AnalyzeTableAsync(stmt.TableName, ct); + return new QueryResult(0); + } + + foreach (string tableName in _catalog.GetTableNames().ToArray()) + await AnalyzeTableAsync(tableName, ct); + + return new QueryResult(0); + } + + private QueryResult ExecuteExplainEstimate(ExplainEstimateStatement stmt) + { + var diagnostics = new PlannerEstimateDiagnostics(); + int rootNode = diagnostics.Add( + parentNodeId: null, + nodeKind: "statement", + target: "EXPLAIN ESTIMATE", + decision: "diagnostic-only", + statsState: "not_executed", + detail: "The target query was inspected for planner estimates only and 
was not executed."); + + switch (stmt.Target) + { + case QueryStatement query: + ExplainQueryStatement(query, diagnostics, rootNode, "query"); + break; + case WithStatement with: + ExplainWithStatement(with, diagnostics, rootNode); + break; + default: + throw new CSharpDbException( + ErrorCode.SyntaxError, + "EXPLAIN ESTIMATE FOR supports SELECT, WITH, and compound SELECT queries only."); + } + + return QueryResult.FromMaterializedRows(ExplainEstimateColumns, BuildExplainEstimateRows(diagnostics.Rows)); + } + + private static List BuildExplainEstimateRows(IReadOnlyList diagnostics) + { + var rows = new List(diagnostics.Count); + for (int i = 0; i < diagnostics.Count; i++) + { + PlannerEstimateDiagnostic d = diagnostics[i]; + rows.Add( + [ + DbValue.FromInteger(d.NodeId), + d.ParentNodeId.HasValue ? DbValue.FromInteger(d.ParentNodeId.Value) : DbValue.Null, + DbValue.FromText(d.NodeKind), + d.Target is { Length: > 0 } target ? DbValue.FromText(target) : DbValue.Null, + DbValue.FromText(d.Decision), + d.EstimatedRows.HasValue ? DbValue.FromInteger(d.EstimatedRows.Value) : DbValue.Null, + d.EstimatedCost.HasValue ? DbValue.FromInteger(d.EstimatedCost.Value) : DbValue.Null, + d.StatsSource is { Length: > 0 } source ? DbValue.FromText(source) : DbValue.Null, + DbValue.FromText(d.StatsState), + d.Detail is { Length: > 0 } detail ? 
DbValue.FromText(detail) : DbValue.Null, + ]); + } + + return rows; + } + + private void ExplainWithStatement(WithStatement stmt, PlannerEstimateDiagnostics diagnostics, int parentNode) + { + int withNode = diagnostics.Add( + parentNode, + "with", + "WITH", + "cte-scope", + statsState: "not_executed", + detail: $"{stmt.Ctes.Count} CTE definition(s); CTE row counts are not materialized by EXPLAIN ESTIMATE."); + + for (int i = 0; i < stmt.Ctes.Count; i++) + { + CteDefinition cte = stmt.Ctes[i]; + int cteNode = diagnostics.Add( + withNode, + "cte", + cte.Name, + "definition", + statsState: "not_executed", + detail: "CTE query inspected without materialization."); + ExplainQueryStatement(cte.Query, diagnostics, cteNode, cte.Name); + } + + ExplainQueryStatement(stmt.MainQuery, diagnostics, withNode, "main"); + } + + private ExplainEstimateResult ExplainQueryStatement( + QueryStatement stmt, + PlannerEstimateDiagnostics diagnostics, + int parentNode, + string target) + { + return stmt switch + { + SelectStatement select => ExplainSelectStatement(select, diagnostics, parentNode, target), + CompoundSelectStatement compound => ExplainCompoundSelectStatement(compound, diagnostics, parentNode, target), + _ => new ExplainEstimateResult(null, false, 0), + }; + } + + private ExplainEstimateResult ExplainCompoundSelectStatement( + CompoundSelectStatement stmt, + PlannerEstimateDiagnostics diagnostics, + int parentNode, + string target) + { + int compoundNode = diagnostics.Add( + parentNode, + "compound", + target, + SetOperationToSql(stmt.Operation), + statsState: "metadata", + detail: "Compound query estimate is derived from child query estimates."); + + ExplainEstimateResult left = ExplainQueryStatement(stmt.Left, diagnostics, compoundNode, "left"); + ExplainEstimateResult right = ExplainQueryStatement(stmt.Right, diagnostics, compoundNode, "right"); + + bool hasRows = left.HasRows && right.HasRows; + long rows = 0; + if (hasRows) + { + rows = stmt.Operation switch + { + 
SetOperationKind.Union => SafeAdd(left.Rows, right.Rows), + SetOperationKind.Intersect => Math.Min(left.Rows, right.Rows), + SetOperationKind.Except => left.Rows, + _ => SafeAdd(left.Rows, right.Rows), + }; + + ApplyLimitOffsetToEstimate(stmt.Limit, stmt.Offset, ref rows); + } + + diagnostics.Add( + compoundNode, + "compound-estimate", + target, + "output-cardinality", + hasRows ? rows : null, + hasRows ? rows : null, + "child-estimates", + hasRows ? "estimated" : "missing", + "UNION assumes additive inputs before distinct elimination; INTERSECT uses the smaller child; EXCEPT keeps the left child estimate."); + + return new ExplainEstimateResult(null, hasRows, rows); + } + + private ExplainEstimateResult ExplainSelectStatement( + SelectStatement stmt, + PlannerEstimateDiagnostics diagnostics, + int parentNode, + string target) + { + int selectNode = diagnostics.Add( + parentNode, + "select", + target, + "plan-estimate", + statsState: "metadata", + detail: QueryToSql(stmt)); + + if (stmt.From is JoinTableRef join) + ExplainJoinReorder(join, stmt.Where, diagnostics, selectNode); + + ExplainEstimateResult from = ExplainTableRef(stmt.From, stmt.Where, diagnostics, selectNode); + long rows = from.Rows; + bool hasRows = from.HasRows; + + if (stmt.Where != null && stmt.From is not JoinTableRef and not SimpleTableRef) + { + diagnostics.Add( + selectNode, + "filter", + ExprToSql(stmt.Where), + "not-estimated", + statsState: "unsupported", + detail: "Filter estimate is available for base-table and join sources in this diagnostic surface."); + } + + if (hasRows && (stmt.GroupBy is { Count: > 0 } || stmt.Columns.Any(static c => c.Expression != null && ContainsAggregate(c.Expression)))) + { + rows = stmt.GroupBy is { Count: > 0 } + ? Math.Max(1, Math.Min(rows, rows / 2)) + : 1; + + diagnostics.Add( + selectNode, + "aggregate", + stmt.GroupBy is { Count: > 0 } ? 
"GROUP BY" : "scalar", + "output-cardinality", + rows, + rows, + "metadata", + "estimated", + "Aggregate output estimate is bounded because this diagnostic does not execute the aggregate."); + } + + if (hasRows) + ApplyLimitOffsetToEstimate(stmt.Limit, stmt.Offset, ref rows); + + if (stmt.Limit.HasValue || stmt.Offset.HasValue) + { + diagnostics.Add( + selectNode, + "row-goal", + "LIMIT/OFFSET", + "applied", + hasRows ? rows : null, + hasRows ? rows : null, + "query-shape", + hasRows ? "estimated" : "missing", + $"LIMIT={stmt.Limit?.ToString(CultureInfo.InvariantCulture) ?? "none"}, OFFSET={stmt.Offset?.ToString(CultureInfo.InvariantCulture) ?? "none"}."); + } + + return new ExplainEstimateResult(from.Schema, hasRows, rows); + } + + private ExplainEstimateResult ExplainTableRef( + TableRef tableRef, + Expression? outerWhere, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + switch (tableRef) + { + case SingleRowTableRef: + { + diagnostics.Add(parentNode, "source", "single-row", "constant-row", 1, 1, "query-shape", "exact"); + return new ExplainEstimateResult(CreateSingleRowSchema(), true, 1); + } + case SimpleTableRef simple: + return ExplainSimpleTableRef(simple, outerWhere, diagnostics, parentNode); + case JoinTableRef join: + return ExplainJoinTableRef(join, outerWhere, diagnostics, parentNode); + default: + diagnostics.Add(parentNode, "source", tableRef.GetType().Name, "unsupported", statsState: "unsupported"); + return new ExplainEstimateResult(null, false, 0); + } + } + + private ExplainEstimateResult ExplainSimpleTableRef( + SimpleTableRef simple, + Expression? outerWhere, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + string target = simple.Alias is { Length: > 0 } + ? $"{simple.TableName} AS {simple.Alias}" + : simple.TableName; + + if (TryBuildSystemCatalogSource(simple, out var systemSource)) + { + long systemRowCount = TryNormalizeSystemCatalogTableName(simple.TableName, out string normalizedSystemName) + ? 
CountSystemCatalogRows(normalizedSystemName) + : 0; + diagnostics.Add(parentNode, "source", target, "system-catalog", systemRowCount, systemRowCount, "sys.catalog", "exact"); + return new ExplainEstimateResult(systemSource.schema, true, systemRowCount); + } + + if (_cteData != null && _cteData.TryGetValue(simple.TableName, out var cteInfo)) + { + var cteSchema = CreateQualifiedSchema(cteInfo.Schema.TableName, cteInfo.Schema.Columns, simple.Alias ?? simple.TableName); + diagnostics.Add(parentNode, "source", target, "cte-materialized", cteInfo.Rows.Count, cteInfo.Rows.Count, "cte", "exact"); + return new ExplainEstimateResult(cteSchema, true, cteInfo.Rows.Count); + } + + string? viewSql = _catalog.GetViewSql(simple.TableName); + if (viewSql != null) + { + int viewNode = diagnostics.Add( + parentNode, + "source", + target, + "view-expanded", + statsSource: "sys.views", + statsState: "metadata"); + + if (Parser.Parse(viewSql) is QueryStatement viewQuery) + { + ExplainEstimateResult viewEstimate = ExplainQueryStatement(viewQuery, diagnostics, viewNode, simple.TableName); + ColumnDefinition[] outputColumns = ResolveCorrelationQueryOutputSchema(viewQuery); + return new ExplainEstimateResult( + CreateQualifiedSchema(simple.TableName, outputColumns, simple.Alias ?? simple.TableName), + viewEstimate.HasRows, + viewEstimate.Rows); + } + + diagnostics.Add(viewNode, "view", simple.TableName, "unsupported", statsState: "unsupported", detail: "View definition is not a query statement."); + return new ExplainEstimateResult(null, false, 0); + } + + TableSchema baseSchema = GetSchema(simple.TableName); + TableSchema schema = CreateQualifiedSchema(baseSchema.TableName, baseSchema.Columns, simple.Alias ?? simple.TableName); + TableStatistics? tableStats = _catalog.GetTableStatistics(simple.TableName); + bool hasRows = TryGetTableRowCount(simple.TableName, out long rowCount); + string statsState = tableStats == null + ? "missing" + : tableStats.HasStaleColumns ? 
"stale-columns" : tableStats.RowCountIsExact ? "exact" : "estimated"; + + int tableNode = diagnostics.Add( + parentNode, + "source", + target, + "table", + hasRows ? rowCount : null, + hasRows ? rowCount : null, + tableStats == null ? null : "sys.table_stats", + statsState, + tableStats == null + ? "No persisted table row-count statistics are available." + : $"row_count_is_exact={(tableStats.RowCountIsExact ? 1 : 0)}, has_stale_columns={(tableStats.HasStaleColumns ? 1 : 0)}."); + + List? localPredicates = null; + if (outerWhere != null) + { + if (tableRefIsOnlySimpleSource(outerWhere, simple, schema)) + { + localPredicates = new List(); + CollectAndConjuncts(outerWhere, localPredicates); + } + else if (TryCollectLocalJoinLeafPredicates(outerWhere, simple, schema, out var joinLocalPredicates)) + { + localPredicates = joinLocalPredicates; + } + } + + if (hasRows && localPredicates is { Count: > 0 }) + { + ExplainSimpleTableFilter(simple.TableName, schema, localPredicates, rowCount, diagnostics, tableNode, out long filteredRows); + rowCount = filteredRows; + } + + return new ExplainEstimateResult(schema, hasRows, rowCount); + + static bool tableRefIsOnlySimpleSource(Expression predicate, SimpleTableRef simpleRef, TableSchema tableSchema) + { + var leaves = new[] + { + new ReorderableJoinLeaf( + simpleRef, + tableSchema, + 1, + 0, + simpleRef.Alias ?? simpleRef.TableName, + simpleRef.Alias is { Length: > 0 } ? [simpleRef.Alias, simpleRef.TableName] : [simpleRef.TableName]) + }; + + var conjuncts = new List(); + CollectAndConjuncts(predicate, conjuncts); + return conjuncts.All(conjunct => + !TryResolveReferencedJoinTables(conjunct, leaves, out var referenced) || + referenced.Count <= 1); + } + } + + private ExplainEstimateResult ExplainJoinTableRef( + JoinTableRef join, + Expression? 
outerWhere, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + int joinNode = diagnostics.Add( + parentNode, + "join", + JoinTypeToSql(join.JoinType), + "join-source", + statsState: "metadata", + detail: join.Condition != null ? ExprToSql(join.Condition) : null); + + ExplainEstimateResult left = ExplainTableRef(join.Left, outerWhere, diagnostics, joinNode); + ExplainEstimateResult right = ExplainTableRef(join.Right, outerWhere, diagnostics, joinNode); + if (left.Schema == null || right.Schema == null) + return new ExplainEstimateResult(null, false, 0); + + TableSchema compositeSchema = TableSchema.CreateJoinSchema(left.Schema, right.Schema); + int[] leftKeyIndices = Array.Empty(); + int[] rightKeyIndices = Array.Empty(); + if (join.Condition != null && + TryAnalyzeHashJoinCondition(join.Condition, compositeSchema, left.Schema.Columns.Count, out leftKeyIndices, out rightKeyIndices, out _)) + { + ExplainJoinPredicateStats(join, left.Schema, right.Schema, leftKeyIndices, rightKeyIndices, left, right, diagnostics, joinNode); + } + + bool hasEstimate = TryEstimateJoinOutputRows( + join, + left.Schema, + right.Schema, + leftKeyIndices, + rightKeyIndices, + left.HasRows, + left.Rows, + right.HasRows, + right.Rows, + out long estimatedRows); + + if (!hasEstimate && left.HasRows && right.HasRows) + { + estimatedRows = CardinalityEstimator.EstimateFallbackJoinRowCount( + join.JoinType, + hasLeftEstimate: true, + left.Rows, + hasRightEstimate: true, + right.Rows); + hasEstimate = true; + } + + diagnostics.Add( + joinNode, + "join-estimate", + JoinTypeToSql(join.JoinType), + hasEstimate ? "output-cardinality" : "not-estimated", + hasEstimate ? estimatedRows : null, + hasEstimate ? estimatedRows : null, + leftKeyIndices.Length > 0 ? "sys.column_stats" : "query-shape", + hasEstimate ? "estimated" : "missing", + leftKeyIndices.Length > 0 + ? $"Equi-join keys={leftKeyIndices.Length}." 
+ : "No equi-join key estimate was available."); + + if (left.HasRows && right.HasRows && join.JoinType == JoinType.Inner) + { + bool buildRight = ShouldBuildHashRightSide(left.Rows, right.Rows); + diagnostics.Add( + joinNode, + "join-decision", + "hash-build-side", + buildRight ? "build-right" : "build-left", + buildRight ? right.Rows : left.Rows, + SafeAdd(left.Rows, right.Rows), + "row-count-estimates", + "estimated", + $"left_rows={left.Rows}, right_rows={right.Rows}."); + } + + return new ExplainEstimateResult(compositeSchema, hasEstimate, estimatedRows); + } + + private void ExplainJoinReorder( + JoinTableRef join, + Expression? outerWhere, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + var leaves = new List(); + var predicates = new List(); + int leafIndex = 0; + int predicateIndex = 0; + if (!TryCollectReorderableInnerJoinChain(join, leaves, predicates, ref leafIndex, ref predicateIndex) || + leaves.Count < 3) + { + return; + } + + ApplyLocalPredicateRowEstimates(leaves, predicates, outerWhere); + string originalOrder = string.Join(" -> ", leaves.OrderBy(static l => l.OriginalIndex).Select(static l => l.Identifier)); + + if (TryChooseBoundedInnerJoinOrder(leaves, predicates, out var boundedLeaves)) + { + string boundedOrder = string.Join(" -> ", boundedLeaves.Select(static l => l.Identifier)); + diagnostics.Add( + parentNode, + "join-reorder", + "inner-chain", + "bounded-dp", + boundedLeaves.Count > 0 ? boundedLeaves[^1].RowCount : null, + boundedLeaves.Sum(static l => l.RowCount), + "sys.table_stats/sys.column_stats", + "estimated", + $"original={originalOrder}; reordered={boundedOrder}; leaves={leaves.Count}."); + return; + } + + if (TryChooseGreedyInnerJoinOrder(leaves, predicates, out var greedyLeaves)) + { + string greedyOrder = string.Join(" -> ", greedyLeaves.Select(static l => l.Identifier)); + diagnostics.Add( + parentNode, + "join-reorder", + "inner-chain", + "greedy", + greedyLeaves.Count > 0 ? 
greedyLeaves[^1].RowCount : null, + greedyLeaves.Sum(static l => l.RowCount), + "sys.table_stats/sys.column_stats", + "estimated", + $"original={originalOrder}; reordered={greedyOrder}; leaves={leaves.Count}."); + return; + } + + diagnostics.Add( + parentNode, + "join-reorder", + "inner-chain", + "rejected", + statsSource: "query-shape", + statsState: "unsupported", + detail: $"original={originalOrder}; leaves={leaves.Count}."); + } + + private void ExplainSimpleTableFilter( + string tableName, + TableSchema schema, + IReadOnlyList predicates, + long inputRows, + PlannerEstimateDiagnostics diagnostics, + int parentNode, + out long filteredRows) + { + filteredRows = inputRows; + bool estimated = CardinalityEstimator.TryEstimateFilteredRowCount( + _catalog, + schema, + inputRows, + predicates, + out long estimatedRows); + + bool indexEstimated = TryEstimateIndexedLocalPredicateRows( + tableName, + schema, + predicates, + inputRows, + out long indexedRows); + + if (indexEstimated && (!estimated || indexedRows < estimatedRows)) + { + estimated = true; + estimatedRows = indexedRows; + } + + if (estimated) + filteredRows = estimatedRows; + + string target = string.Join(" AND ", predicates.Select(ExprToSql)); + diagnostics.Add( + parentNode, + "filter", + target, + estimated ? "estimated" : "fallback", + estimated ? estimatedRows : null, + estimated ? estimatedRows : null, + estimated + ? indexEstimated && estimatedRows == indexedRows + ? "index-metadata" + : "sys.column_stats/sys.planner_*" + : null, + estimated ? "estimated" : "missing", + estimated + ? $"input_rows={inputRows}, estimated_selectivity={(double)estimatedRows / Math.Max(inputRows, 1):0.######}." 
+ : "No fresh usable statistics matched this filter; normal planning falls back to structural heuristics."); + + ExplainIndexLookupCandidates(tableName, schema, predicates, diagnostics, parentNode); + ExplainPredicateStatistics(tableName, schema, predicates, inputRows, diagnostics, parentNode); + ExplainCompositePrefixFilter(tableName, schema, predicates, inputRows, diagnostics, parentNode); + } + + private void ExplainIndexLookupCandidates( + string tableName, + TableSchema schema, + IReadOnlyList predicates, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + IReadOnlyList indexes = _catalog.GetSqlIndexesForTable(tableName); + if (indexes.Count == 0) + return; + + bool hasIntegerPk = schema.PrimaryKeyColumnIndex >= 0 && + schema.PrimaryKeyColumnIndex < schema.Columns.Count && + schema.Columns[schema.PrimaryKeyColumnIndex].Type == DbType.Integer; + + foreach (Expression predicate in predicates) + { + if (!TryPickLookupCandidate( + tableName, + predicate, + schema, + indexes, + hasIntegerPk, + schema.PrimaryKeyColumnIndex, + out LookupCandidate candidate)) + { + continue; + } + + bool selected = ShouldUseLookupCandidate(candidate); + string target = candidate.IsPrimaryKey + ? $"{tableName}.PRIMARY_KEY" + : candidate.Index?.IndexName ?? tableName; + string statsState = candidate.IsPrimaryKey || candidate.Index?.IsUnique == true + ? "exact" + : candidate.EstimatedRows.HasValue ? "estimated" : "missing"; + string detail = candidate.IsPrimaryKey + ? "Primary-key equality lookup is exact." + : candidate.Index?.IsUnique == true + ? "Unique index equality lookup is exact." + : candidate.EstimatedRows.HasValue && candidate.TableRowCount.HasValue + ? $"estimated_rows={candidate.EstimatedRows.Value}, table_rows={candidate.TableRowCount.Value}, threshold={Math.Max(1, candidate.TableRowCount.Value / 4)}." 
+ : "No fresh lookup statistics were available; normal planning keeps the lookup candidate eligible."; + + diagnostics.Add( + parentNode, + "index-lookup", + target, + selected ? "selected" : "rejected", + candidate.EstimatedRows, + candidate.EstimatedRows, + candidate.EstimatedRows.HasValue ? "sys.column_stats/sys.planner_heavy_hitters" : null, + statsState, + detail); + } + } + + private void ExplainPredicateStatistics( + string tableName, + TableSchema schema, + IReadOnlyList predicates, + long inputRows, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + foreach (Expression predicate in predicates) + { + if (TryExtractIndexEqualityLookupTerm(predicate, schema, out int equalityColumn, out DbValue equalityValue, out _)) + { + ExplainDiscreteValueStats(tableName, schema, equalityColumn, [equalityValue], inputRows, diagnostics, parentNode, "equality"); + continue; + } + + if (TryExtractInList(predicate, schema, out int inColumn, out var values)) + { + ExplainDiscreteValueStats(tableName, schema, inColumn, values, inputRows, diagnostics, parentNode, "in-list"); + continue; + } + + if (TryGetRangePredicateColumn(predicate, schema, out int rangeColumn)) + ExplainRangeStats(tableName, schema, rangeColumn, predicate, inputRows, diagnostics, parentNode); + } + } + + private void ExplainDiscreteValueStats( + string tableName, + TableSchema schema, + int columnIndex, + IReadOnlyList values, + long inputRows, + PlannerEstimateDiagnostics diagnostics, + int parentNode, + string decision) + { + string columnName = schema.Columns[columnIndex].Name; + ColumnStatistics? 
stats = _catalog.GetColumnStatistics(tableName, columnName); + if (stats == null) + { + diagnostics.Add(parentNode, "estimate-source", $"{tableName}.{columnName}", decision, statsState: "missing", detail: "No ANALYZE column statistics are available."); + return; + } + + if (stats.IsStale) + { + diagnostics.Add( + parentNode, + "estimate-source", + $"{tableName}.{columnName}", + "ignored-stale-stats", + statsSource: "sys.column_stats", + statsState: "stale-ignored", + detail: "Column statistics exist but are stale; normal planning ignores histogram/heavy-hitter data for this column."); + return; + } + + long? estimatedRows = null; + if (values.Count == 1 && + TryEstimateLookupRowCount(tableName, columnName, values[0], out long lookupRows, out _)) + { + estimatedRows = lookupRows; + } + + bool matchedHeavyHitter = false; + long matchedHeavyRows = 0; + if (_catalog.TryGetFreshColumnDistributionStatistics(tableName, columnName, out var distribution)) + { + foreach (FrequentValueStatistics heavy in distribution.FrequentValues) + { + if (values.Any(value => DbValue.Compare(value, heavy.Value) == 0)) + { + matchedHeavyHitter = true; + matchedHeavyRows += heavy.RowCount; + } + } + } + + diagnostics.Add( + parentNode, + "estimate-source", + $"{tableName}.{columnName}", + matchedHeavyHitter ? "heavy-hitter" : "distinct-average", + estimatedRows ?? (matchedHeavyHitter ? matchedHeavyRows : null), + estimatedRows ?? (matchedHeavyHitter ? matchedHeavyRows : null), + matchedHeavyHitter ? "sys.planner_heavy_hitters" : "sys.column_stats", + "fresh", + matchedHeavyHitter + ? $"matched_heavy_rows={matchedHeavyRows}, requested_values={values.Count}, input_rows={inputRows}." 
+ : $"distinct_count={stats.DistinctCount}, non_null_count={stats.NonNullCount}, requested_values={values.Count}."); + } + + private void ExplainRangeStats( + string tableName, + TableSchema schema, + int columnIndex, + Expression predicate, + long inputRows, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + string columnName = schema.Columns[columnIndex].Name; + ColumnStatistics? stats = _catalog.GetColumnStatistics(tableName, columnName); + if (stats == null) + { + diagnostics.Add(parentNode, "estimate-source", $"{tableName}.{columnName}", "range", statsState: "missing", detail: "No ANALYZE column statistics are available."); + return; + } + + if (stats.IsStale) + { + diagnostics.Add( + parentNode, + "estimate-source", + $"{tableName}.{columnName}", + "ignored-stale-stats", + statsSource: "sys.column_stats", + statsState: "stale-ignored", + detail: "Column statistics exist but are stale; normal planning ignores histogram range data for this column."); + return; + } + + bool hasHistogram = _catalog.TryGetFreshColumnDistributionStatistics(tableName, columnName, out var distribution) && + distribution.HistogramBuckets.Count > 0; + long? estimatedRows = null; + if (CardinalityEstimator.TryEstimateFilteredRowCount(_catalog, schema, inputRows, [predicate], out long rangeRows)) + estimatedRows = rangeRows; + + diagnostics.Add( + parentNode, + "estimate-source", + $"{tableName}.{columnName}", + hasHistogram ? "histogram-range" : "min-max-range", + estimatedRows, + estimatedRows, + hasHistogram ? "sys.planner_histograms" : "sys.column_stats", + "fresh", + hasHistogram + ? $"histogram_buckets={distribution.HistogramBuckets.Count}, predicate={ExprToSql(predicate)}." + : $"predicate={ExprToSql(predicate)}."); + } + + private void ExplainCompositePrefixFilter( + string tableName, + TableSchema schema, + IReadOnlyList predicates, + long inputRows, + PlannerEstimateDiagnostics diagnostics, + int parentNode) + { + Dictionary? 
discreteCountsByColumn = null;
+        foreach (Expression predicate in predicates)
+        {
+            if (TryExtractIndexEqualityLookupTerm(predicate, schema, out int equalityColumn, out _, out _))
+            {
+                (discreteCountsByColumn ??= new Dictionary())[equalityColumn] = 1;
+                continue;
+            }
+
+            if (TryExtractInList(predicate, schema, out int inColumn, out var values))
+                (discreteCountsByColumn ??= new Dictionary())[inColumn] = Math.Max(values.Length, 1);
+        }
+
+        if (discreteCountsByColumn is not { Count: >= 2 })
+            return;
+
+        foreach (IndexSchema index in _catalog.GetSqlIndexesForTable(tableName))
+        {
+            if (index.Columns.Count < 2 ||
+                !_catalog.TryGetFreshIndexPrefixStatistics(index.IndexName, out var prefixStats))
+            {
+                TableStatistics? staleTableStats = _catalog.GetTableStatistics(tableName);
+                if (staleTableStats?.HasStaleColumns == true && index.Columns.Count >= 2)
+                {
+                    diagnostics.Add(
+                        parentNode,
+                        "estimate-source",
+                        index.IndexName,
+                        "ignored-stale-prefix-stats",
+                        statsSource: "sys.planner_index_prefix_stats",
+                        statsState: "stale-ignored",
+                        detail: "Composite-prefix statistics exist for the table but are stale.");
+                }
+
+                continue;
+            }
+
+            int maxPrefixLength = Math.Min(index.Columns.Count, prefixStats.PrefixDistinctCounts.Count);
+            long combinationCount = 1;
+            for (int prefixLength = 1; prefixLength <= maxPrefixLength; prefixLength++)
+            {
+                int columnIndex = schema.GetColumnIndex(index.Columns[prefixLength - 1]);
+                if (columnIndex < 0 || !discreteCountsByColumn.TryGetValue(columnIndex, out int discreteCount))
+                    break;
+
+                combinationCount = SafeMultiply(combinationCount, discreteCount);
+                if (prefixLength < 2)
+                    continue;
+
+                long distinctCount = prefixStats.PrefixDistinctCounts[prefixLength - 1];
+                if (distinctCount <= 0)
+                    continue;
+
+                long estimatedRows = Math.Clamp(
+                    DivideRoundUp(SafeMultiply(inputRows, Math.Max(combinationCount, 1)), distinctCount),
+                    1,
+                    Math.Max(inputRows, 1));
+
+                diagnostics.Add(
+                    parentNode,
+                    "estimate-source",
+                    index.IndexName,
+                    "composite-prefix-filter",
+                    estimatedRows,
+                    estimatedRows,
+                    "sys.planner_index_prefix_stats",
+                    "fresh",
+                    $"prefix_columns={string.Join(",", index.Columns.Take(prefixLength))}, distinct_count={distinctCount}, combinations={combinationCount}.");
+            }
+        }
+    }
+
+    private void ExplainJoinPredicateStats(
+        JoinTableRef join,
+        TableSchema leftSchema,
+        TableSchema rightSchema,
+        ReadOnlySpan leftKeyIndices,
+        ReadOnlySpan rightKeyIndices,
+        ExplainEstimateResult left,
+        ExplainEstimateResult right,
+        PlannerEstimateDiagnostics diagnostics,
+        int parentNode)
+    {
+        if (leftKeyIndices.Length == 0 || leftKeyIndices.Length != rightKeyIndices.Length)
+            return;
+
+        IndexPrefixStatistics? leftPrefix = null;
+        IndexPrefixStatistics? rightPrefix = null;
+        bool usedCompositePrefix = leftKeyIndices.Length > 1 &&
+            TryFindMatchingPrefixStats(leftSchema, leftKeyIndices, out leftPrefix) &&
+            TryFindMatchingPrefixStats(rightSchema, rightKeyIndices, out rightPrefix);
+
+        if (CardinalityEstimator.TryEstimateEqualityJoinRowCount(
+            _catalog,
+            join.JoinType,
+            leftSchema,
+            rightSchema,
+            leftKeyIndices,
+            rightKeyIndices,
+            left.HasRows ? left.Rows : 0,
+            right.HasRows ? right.Rows : 0,
+            out long estimatedRows))
+        {
+            diagnostics.Add(
+                parentNode,
+                "estimate-source",
+                "join-equality",
+                usedCompositePrefix ? "composite-prefix-join" : "column-distinct-join",
+                estimatedRows,
+                estimatedRows,
+                usedCompositePrefix ? "sys.planner_index_prefix_stats" : "sys.column_stats",
+                "fresh",
+                usedCompositePrefix
+                    ? $"left_index={leftPrefix!.IndexName}, right_index={rightPrefix!.IndexName}, key_count={leftKeyIndices.Length}."
+                    : $"key_count={leftKeyIndices.Length}.");
+        }
+    }
+
+    private bool TryFindMatchingPrefixStats(
+        TableSchema schema,
+        ReadOnlySpan keyIndices,
+        out IndexPrefixStatistics prefixStats)
+    {
+        prefixStats = null!;
+        foreach (IndexSchema index in _catalog.GetIndexesForTable(schema.TableName))
+        {
+            if (!_catalog.TryGetFreshIndexPrefixStatistics(index.IndexName, out var stats) ||
+                stats.PrefixColumns.Count < keyIndices.Length)
+            {
+                continue;
+            }
+
+            bool matches = true;
+            for (int i = 0; i < keyIndices.Length; i++)
+            {
+                int keyIndex = keyIndices[i];
+                if (keyIndex < 0 ||
+                    keyIndex >= schema.Columns.Count ||
+                    !string.Equals(stats.PrefixColumns[i], schema.Columns[keyIndex].Name, StringComparison.OrdinalIgnoreCase))
+                {
+                    matches = false;
+                    break;
+                }
+            }
+
+            if (!matches)
+                continue;
+
+            prefixStats = stats;
+            return true;
+        }
+
+        return false;
+    }
+
+    private static void ApplyLimitOffsetToEstimate(int? limit, int? offset, ref long rows)
+    {
+        if (offset.HasValue)
+            rows = Math.Max(0, rows - Math.Max(offset.Value, 0));
+        if (limit.HasValue)
+            rows = Math.Min(rows, Math.Max(limit.Value, 0));
+    }
+
+    private static bool TryExtractInList(
+        Expression expression,
+        TableSchema schema,
+        out int columnIndex,
+        out DbValue[] values)
+    {
+        columnIndex = -1;
+        values = Array.Empty();
+        if (expression is not InExpression { Negated: false } inExpression)
+            return false;
+
+        Expression operand = CollationSupport.StripCollation(inExpression.Operand);
+        if (operand is not ColumnRefExpression column)
+            return false;
+
+        int resolvedIndex = column.TableAlias != null
+            ?
schema.GetQualifiedColumnIndex(column.TableAlias, column.ColumnName)
+            : schema.GetColumnIndex(column.ColumnName);
+        if (resolvedIndex < 0 || resolvedIndex >= schema.Columns.Count)
+            return false;
+
+        var parsedValues = new List(inExpression.Values.Count);
+        for (int i = 0; i < inExpression.Values.Count; i++)
+        {
+            Expression valueExpression = CollationSupport.StripCollation(inExpression.Values[i]);
+            if (valueExpression is not LiteralExpression literal ||
+                !TryConvertLiteral(literal, out DbValue value) ||
+                value.IsNull ||
+                value.Type != schema.Columns[resolvedIndex].Type)
+            {
+                return false;
+            }
+
+            parsedValues.Add(value);
+        }

-        var schema = new TriggerSchema
-        {
-            TriggerName = stmt.TriggerName,
-            TableName = stmt.TableName,
-            Timing = stmt.Timing,
-            Event = stmt.Event,
-            BodySql = bodySql,
-        };
+        if (parsedValues.Count == 0)
+            return false;

-        await _catalog.CreateTriggerAsync(schema, ct);
-        _triggerBodyCache.Remove(stmt.TriggerName); // invalidate cache
-        return new QueryResult(0);
+        columnIndex = resolvedIndex;
+        values = parsedValues.ToArray();
+        return true;
     }

-    private async ValueTask ExecuteDropTriggerAsync(DropTriggerStatement stmt, CancellationToken ct)
+    private static bool TryGetRangePredicateColumn(Expression expression, TableSchema schema, out int columnIndex)
     {
-        if (stmt.IfExists && _catalog.GetTrigger(stmt.TriggerName) == null)
-            return new QueryResult(0);
+        columnIndex = -1;
+        if (expression is BetweenExpression { Negated: false } between)
+            return TryResolveRangeColumn(between.Operand, schema, out columnIndex);

-        await _catalog.DropTriggerAsync(stmt.TriggerName, ct);
-        _triggerBodyCache.Remove(stmt.TriggerName);
-        return new QueryResult(0);
+        if (expression is not BinaryExpression binary ||
+            binary.Op is not (BinaryOp.LessThan or BinaryOp.LessOrEqual or BinaryOp.GreaterThan or BinaryOp.GreaterOrEqual))
+        {
+            return false;
+        }
+
+        return TryResolveRangeColumn(binary.Left, schema, out columnIndex) ||
+            TryResolveRangeColumn(binary.Right,
schema, out columnIndex);
     }

-    private async ValueTask ExecuteAnalyzeAsync(AnalyzeStatement stmt, CancellationToken ct)
+    private static bool TryResolveRangeColumn(Expression expression, TableSchema schema, out int columnIndex)
     {
-        if (!string.IsNullOrWhiteSpace(stmt.TableName))
-        {
-            await AnalyzeTableAsync(stmt.TableName, ct);
-            return new QueryResult(0);
-        }
+        columnIndex = -1;
+        expression = CollationSupport.StripCollation(expression);
+        if (expression is not ColumnRefExpression column)
+            return false;

-        foreach (string tableName in _catalog.GetTableNames().ToArray())
-            await AnalyzeTableAsync(tableName, ct);
+        int resolvedIndex = column.TableAlias != null
+            ? schema.GetQualifiedColumnIndex(column.TableAlias, column.ColumnName)
+            : schema.GetColumnIndex(column.ColumnName);
+        if (resolvedIndex < 0 || resolvedIndex >= schema.Columns.Count)
+            return false;

-        return new QueryResult(0);
+        DbType type = schema.Columns[resolvedIndex].Type;
+        if (type is not (DbType.Integer or DbType.Real))
+            return false;
+
+        columnIndex = resolvedIndex;
+        return true;
     }

     private async ValueTask AnalyzeTableAsync(string tableName, CancellationToken ct)
@@ -2728,6 +4031,7 @@ internal async ValueTask ExecuteInsertAsync(
         foreach (var valueRow in stmt.ValueRows)
         {
             var row = ResolveInsertRow(schema, stmt.ColumnNames, valueRow);
+            int rowIdReservationCountHint = Math.Max(1, stmt.ValueRows.Count - inserted);
             InsertRowResult insertRow = await ExecuteResolvedInsertRowAsync(
                 stmt.TableName,
                 schema,
@@ -2737,6 +4041,7 @@ internal async ValueTask ExecuteInsertAsync(
                 mutationContext,
                 adjustTableRowCount: false,
                 reusableEncodingBuffer,
+                rowIdReservationCountHint,
                 ct);
             if (stmt.ValueRows.Count == 1)
                 generatedIntegerKey = insertRow.GeneratedIntegerIdentity;
@@ -2761,8 +4066,8 @@ await FinalizeInsertStatementAsync(

     private async ValueTask ExecuteCompoundSelectAsync(CompoundSelectStatement stmt, CancellationToken ct)
     {
-        await using var leftResult = await ExecuteQueryAsync(stmt.Left, ct);
-        await using
var rightResult = await ExecuteQueryAsync(stmt.Right, ct);
+        await using var leftResult = await ExecuteQueryAsync(stmt.Left, ct, suppressAdaptiveReoptimization: true);
+        await using var rightResult = await ExecuteQueryAsync(stmt.Right, ct, suppressAdaptiveReoptimization: true);

         var outputSchema = MergeCompoundSchemas(leftResult.Schema, rightResult.Schema);
         var leftRows = await leftResult.ToListAsync(ct);
@@ -3442,25 +4747,28 @@ private static CorrelationScope[] CreateCorrelationScopes(
         return scopes;
     }

-    private QueryResult ExecuteSelect(SelectStatement stmt)
+    private QueryResult ExecuteSelect(SelectStatement stmt, bool suppressAdaptiveReoptimization = false)
     {
         if (_cteData != null)
-            return ExecuteSelectGeneral(stmt);
+            return ExecuteSelectGeneral(stmt, suppressAdaptiveReoptimization);

         if (_selectPlanCache.TryGetValue(stmt, out var cachedPlan))
         {
             _selectPlanCacheHitCount++;
-            return ExecuteSelectWithCachedPlan(stmt, cachedPlan);
+            return ExecuteSelectWithCachedPlan(stmt, cachedPlan, suppressAdaptiveReoptimization);
         }

         _selectPlanCacheMissCount++;
-        var result = ClassifyAndExecuteSelect(stmt, out var selectedPlan);
+        var result = ClassifyAndExecuteSelect(stmt, out var selectedPlan, suppressAdaptiveReoptimization);
         CacheSelectPlan(stmt, selectedPlan);
         return result;
     }

-    private QueryResult ExecuteSelectWithCachedPlan(SelectStatement stmt, SelectPlanKind cachedPlan)
+    private QueryResult ExecuteSelectWithCachedPlan(
+        SelectStatement stmt,
+        SelectPlanKind cachedPlan,
+        bool suppressAdaptiveReoptimization)
     {
         switch (cachedPlan)
         {
@@ -3502,7 +4810,7 @@ private QueryResult ExecuteSelectWithCachedPlan(SelectStatement stmt, SelectPlan
                     return constantGroupAggResult;
                 break;
             case SelectPlanKind.General:
-                return ExecuteSelectGeneral(stmt);
+                return ExecuteSelectGeneral(stmt, suppressAdaptiveReoptimization);
             default:
                 throw new InvalidOperationException($"Unknown select plan kind: {cachedPlan}");
         }
@@ -3510,12 +4818,15 @@ private QueryResult
ExecuteSelectWithCachedPlan(SelectStatement stmt, SelectPlan
         // Plan assumptions no longer hold (typically after cache invalidation edge cases).
         // Reclassify and refresh the cache entry.
         _selectPlanCacheReclassificationCount++;
-        var result = ClassifyAndExecuteSelect(stmt, out var updatedPlan);
+        var result = ClassifyAndExecuteSelect(stmt, out var updatedPlan, suppressAdaptiveReoptimization);
         CacheSelectPlan(stmt, updatedPlan);
         return result;
     }

-    private QueryResult ClassifyAndExecuteSelect(SelectStatement stmt, out SelectPlanKind selectedPlan)
+    private QueryResult ClassifyAndExecuteSelect(
+        SelectStatement stmt,
+        out SelectPlanKind selectedPlan,
+        bool suppressAdaptiveReoptimization)
     {
         if (!stmt.IsDistinct)
         {
@@ -3572,9 +4883,54 @@ private QueryResult ClassifyAndExecuteSelect(SelectStatement stmt, out SelectPla
         }

         selectedPlan = SelectPlanKind.General;
-        return ExecuteSelectGeneral(stmt);
+        return ExecuteSelectGeneral(stmt, suppressAdaptiveReoptimization);
+    }
+
+    private AdaptiveQueryExecutionLease?
TryCreateAdaptiveQueryExecutionLease(SelectStatement stmt)
+    {
+        var options = AdaptiveQueryReoptimization;
+        if (!options.Enabled || options.MaxReoptimizationsPerQuery <= 0)
+            return null;
+
+        if (_cteData != null ||
+            stmt.From is not JoinTableRef join ||
+            stmt.Columns.Any(static c => c.IsStar) ||
+            ContainsSubqueries(stmt) ||
+            !IsAdaptiveJoinCandidate(join))
+        {
+            return null;
+        }
+
+        RecordAdaptiveEligibleQuery();
+        return new AdaptiveQueryExecutionLease(options);
+    }
+
+    private static bool IsAdaptiveJoinCandidate(TableRef tableRef)
+    {
+        if (tableRef is not JoinTableRef join)
+            return true;
+
+        if (join.JoinType is not (JoinType.Inner or JoinType.LeftOuter) ||
+            join.Condition == null ||
+            !ContainsEqualityPredicate(join.Condition))
+        {
+            return false;
+        }
+
+        return IsAdaptiveJoinCandidate(join.Left) &&
+            IsAdaptiveJoinCandidate(join.Right);
     }

+    private static bool ContainsEqualityPredicate(Expression expression)
+        => expression switch
+        {
+            BinaryExpression { Op: BinaryOp.Equals } => true,
+            BinaryExpression { Op: BinaryOp.And } binary =>
+                ContainsEqualityPredicate(binary.Left) ||
+                ContainsEqualityPredicate(binary.Right),
+            _ => false,
+        };
+
     private void CacheSelectPlan(SelectStatement stmt, SelectPlanKind kind)
     {
         _selectPlanCacheStoreCount++;
@@ -3597,8 +4953,25 @@ private void CacheSelectPlan(SelectStatement stmt, SelectPlanKind kind)
         }
     }

-    private TableRef GetOrCreateCachedJoinReorder(SelectStatement stmt)
+    private TableRef GetOrCreateCachedJoinReorder(SelectStatement stmt, bool preserveJoinOrderForRowGoal)
     {
+        // Reordered joins change the physical composite row shape. Keep SELECT *
+        // projections in declared FROM order unless a later projection restores
+        // the public column order.
+        if (stmt.Columns.Any(static c => c.IsStar))
+            return stmt.From;
+
+        if (preserveJoinOrderForRowGoal)
+        {
+            if (stmt.From is JoinTableRef join &&
+                TryReorderInnerJoinChainForRowGoal(join, stmt.Where, out var reordered))
+            {
+                return reordered;
+            }
+
+            return stmt.From;
+        }
+
         if (!_selectJoinReorderCache.TryGetValue(stmt, out var cached))
         {
             cached = new CachedSelectJoinReorder();
@@ -3618,9 +4991,13 @@ stmt.From is JoinTableRef join &&
         return cached.ReorderedFrom ?? stmt.From;
     }

-    private QueryResult ExecuteSelectGeneral(SelectStatement stmt)
+    private QueryResult ExecuteSelectGeneral(SelectStatement stmt, bool suppressAdaptiveReoptimization = false)
     {
-        TableRef fromRef = GetOrCreateCachedJoinReorder(stmt);
+        AdaptiveQueryExecutionLease? adaptiveLease = suppressAdaptiveReoptimization
+            ? null
+            : TryCreateAdaptiveQueryExecutionLease(stmt);
+        bool preserveJoinOrderForRowGoal = ShouldPreserveJoinOrderForRowGoal(stmt);
+        TableRef fromRef = GetOrCreateCachedJoinReorder(stmt, preserveJoinOrderForRowGoal);
         HashSet? consumedOuterPredicates = fromRef is JoinTableRef
             ? new HashSet(ReferenceEqualityComparer.Instance)
             : null;
@@ -3631,7 +5008,9 @@ private QueryResult ExecuteSelectGeneral(SelectStatement stmt)
             stmt.Where,
             pushDownOuterLocalPredicates: fromRef is JoinTableRef,
             allowJoinReorder: false,
-            consumedOuterPredicates: consumedOuterPredicates);
+            consumedOuterPredicates: consumedOuterPredicates,
+            preserveJoinOrderForRowGoal: preserveJoinOrderForRowGoal,
+            adaptiveLease: adaptiveLease);

         bool hasAggregates = stmt.GroupBy != null || stmt.Having != null ||
             stmt.Columns.Any(c => c.Expression != null && ContainsAggregate(c.Expression));
@@ -3852,6 +5231,25 @@ op is IndexScanOperator hashedLookup &&
         return CreateQueryResult(op);
     }

+    private static bool ShouldPreserveJoinOrderForRowGoal(SelectStatement stmt)
+    {
+        if (!stmt.Limit.HasValue || stmt.Limit.Value <= 0)
+            return false;
+
+        long rowGoal = (long)stmt.Limit.Value + Math.Max(stmt.Offset ??
0, 0);
+        if (rowGoal > MaxJoinOrderRowGoalRows ||
+            stmt.Where != null ||
+            stmt.IsDistinct ||
+            stmt.GroupBy != null ||
+            stmt.Having != null ||
+            stmt.OrderBy is { Count: > 0 })
+        {
+            return false;
+        }
+
+        return !stmt.Columns.Any(c => c.Expression != null && ContainsAggregate(c.Expression));
+    }
+
     private static IOperator ApplyOffsetAndLimit(IOperator op, int? offset, int? limit)
     {
         if (offset.HasValue)
@@ -4471,6 +5869,7 @@ internal async ValueTask ExecuteSimpleInsertAsync(
         for (int i = 0; i < insert.RowCount; i++)
         {
             DbValue[] row = ResolveSimpleInsertRow(schema, explicitColumnIndices, insert.ValueRows[i]);
+            int rowIdReservationCountHint = Math.Max(1, insert.RowCount - inserted);

             InsertRowResult insertRow = await ExecuteResolvedInsertRowAsync(
                 insert.TableName,
@@ -4481,6 +5880,7 @@ internal async ValueTask ExecuteSimpleInsertAsync(
                 mutationContext,
                 adjustTableRowCount: false,
                 reusableEncodingBuffer,
+                rowIdReservationCountHint,
                 ct);
             if (insert.RowCount == 1)
                 generatedIntegerKey = insertRow.GeneratedIntegerIdentity;
@@ -4546,6 +5946,7 @@ private async ValueTask ExecuteBareSimpleInsertAsync(
                 insertTraversalPath,
                 insertTraversalSet,
                 reusableEncodingBuffer,
+                rowIdReservationCountHint: Math.Max(1, insert.RowCount - inserted),
                 ct);
             if (insert.RowCount == 1)
                 generatedIntegerKey = insertRow.GeneratedIntegerIdentity;
@@ -4599,9 +6000,16 @@ private async ValueTask ExecuteBareInsertRowAsync(
         List? traversalPath,
         HashSet? traversalSet,
         ReusableInsertEncodingBuffer? reusableEncodingBuffer,
+        int rowIdReservationCountHint,
         CancellationToken ct)
     {
-        var (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync(tableName, schema, tree, row, ct);
+        var (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync(
+            tableName,
+            schema,
+            tree,
+            row,
+            rowIdReservationCountHint,
+            ct);
         long?
generatedIntegerIdentity = autoGeneratedRowId &&
             schema.PrimaryKeyColumnIndex >= 0 &&
             schema.Columns[schema.PrimaryKeyColumnIndex].IsIdentity &&
@@ -4623,7 +6031,13 @@ private async ValueTask ExecuteBareInsertRowAsync(
         catch (CSharpDbException ex) when (autoGeneratedRowId && ex.Code == ErrorCode.DuplicateKey)
         {
             InvalidateRowIdCache(tableName);
-            (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync(tableName, schema, tree, row, ct);
+            (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync(
+                tableName,
+                schema,
+                tree,
+                row,
+                rowIdReservationCountHint,
+                ct);
             generatedIntegerIdentity = autoGeneratedRowId &&
                 schema.PrimaryKeyColumnIndex >= 0 &&
                 schema.Columns[schema.PrimaryKeyColumnIndex].IsIdentity &&
@@ -4951,6 +6365,9 @@ private bool TryBuildSimpleSystemCatalogCountStarQuery(SelectStatement stmt, out
             "sys.objects" => CountSystemObjects(),
             "sys.table_stats" => _catalog.GetTableStatistics().Count,
             "sys.column_stats" => _catalog.GetColumnStatistics().Count,
+            "sys.planner_histograms" => CountPlannerHistogramRows(),
+            "sys.planner_heavy_hitters" => CountPlannerHeavyHitterRows(),
+            "sys.planner_index_prefix_stats" => CountPlannerIndexPrefixRows(),
             _ => 0,
         };

@@ -6346,7 +7763,9 @@ private async ValueTask ExecuteUpdateAsync(UpdateStatement stmt, Ca
         Expression? outerWhere = null,
         bool pushDownOuterLocalPredicates = false,
         bool allowJoinReorder = true,
-        HashSet? consumedOuterPredicates = null)
+        HashSet? consumedOuterPredicates = null,
+        bool preserveJoinOrderForRowGoal = false,
+        AdaptiveQueryExecutionLease? adaptiveLease = null)
     {
         if (tableRef is SingleRowTableRef)
         {
@@ -6395,10 +7814,24 @@ private async ValueTask ExecuteUpdateAsync(UpdateStatement stmt, Ca
                 if (viewQuery is SelectStatement viewStmt && !ContainsSubqueries(viewStmt))
                 {
                     Expression?
pushedOuterViewPredicate = TryRewriteOuterPredicateForSimpleView(simple, viewStmt, outerWhere);
+                    bool preserveViewJoinOrderForRowGoal =
+                        preserveJoinOrderForRowGoal &&
+                        CanPropagateJoinOrderRowGoalIntoSimpleView(viewStmt);
+                    TableRef viewFrom = viewStmt.From;
+                    if (preserveViewJoinOrderForRowGoal &&
+                        viewFrom is JoinTableRef viewJoin &&
+                        TryReorderInnerJoinChainForRowGoal(viewJoin, pushedOuterViewPredicate, out var rowGoalViewFrom))
+                    {
+                        viewFrom = rowGoalViewFrom;
+                    }
+
                     (viewOp, viewSchema) = BuildFromOperator(
-                        viewStmt.From,
+                        viewFrom,
                         pushedOuterViewPredicate,
-                        pushDownOuterLocalPredicates: pushedOuterViewPredicate != null);
+                        pushDownOuterLocalPredicates: pushedOuterViewPredicate != null,
+                        allowJoinReorder: !viewStmt.Columns.Any(static c => c.IsStar),
+                        preserveJoinOrderForRowGoal: preserveViewJoinOrderForRowGoal,
+                        adaptiveLease: adaptiveLease);

                     bool hasAggregates = viewStmt.GroupBy != null ||
                         viewStmt.Having != null ||
@@ -6558,13 +7991,35 @@ private async ValueTask ExecuteUpdateAsync(UpdateStatement stmt, Ca
         if (tableRef is JoinTableRef join)
         {
             if (allowJoinReorder &&
+                !preserveJoinOrderForRowGoal &&
                 TryReorderInnerJoinChain(join, outerWhere, out var reordered))
             {
-                return BuildFromOperator(reordered, outerWhere, pushDownOuterLocalPredicates, allowJoinReorder: false, consumedOuterPredicates);
+                return BuildFromOperator(
+                    reordered,
+                    outerWhere,
+                    pushDownOuterLocalPredicates,
+                    allowJoinReorder: false,
+                    consumedOuterPredicates,
+                    preserveJoinOrderForRowGoal,
+                    adaptiveLease);
             }

-            var (leftOp, leftSchema) = BuildFromOperator(join.Left, outerWhere, pushDownOuterLocalPredicates, allowJoinReorder, consumedOuterPredicates);
-            var (rightOp, rightSchema) = BuildFromOperator(join.Right, outerWhere, pushDownOuterLocalPredicates, allowJoinReorder, consumedOuterPredicates);
+            var (leftOp, leftSchema) = BuildFromOperator(
+                join.Left,
+                outerWhere,
+                pushDownOuterLocalPredicates,
+                allowJoinReorder,
+                consumedOuterPredicates,
+
preserveJoinOrderForRowGoal,
+                adaptiveLease);
+            var (rightOp, rightSchema) = BuildFromOperator(
+                join.Right,
+                outerWhere,
+                pushDownOuterLocalPredicates,
+                allowJoinReorder,
+                consumedOuterPredicates,
+                preserveJoinOrderForRowGoal,
+                adaptiveLease);

             // Build composite schema that inherits all qualified mappings
             var compositeSchema = TableSchema.CreateJoinSchema(leftSchema, rightSchema);
@@ -6590,6 +8045,8 @@ private async ValueTask ExecuteUpdateAsync(UpdateStatement stmt, Ca
                     rightSchema,
                     leftSchema,
                     swappedCompositeSchema,
+                    outerWhere,
+                    adaptiveLease: null,
                     out var swappedIndexNestedJoinOp))
                 {
                     swappedJoinOp = swappedIndexNestedJoinOp!;
@@ -6601,6 +8058,7 @@ private async ValueTask ExecuteUpdateAsync(UpdateStatement stmt, Ca
                     rightSchema,
                     leftSchema,
                     swappedCompositeSchema,
+                    adaptiveLease: null,
                     out var swappedHashJoinOp))
                 {
                     swappedJoinOp = swappedHashJoinOp!;
@@ -6643,6 +8101,8 @@ private async ValueTask ExecuteUpdateAsync(UpdateStatement stmt, Ca
                 leftSchema,
                 rightSchema,
                 compositeSchema,
+                outerWhere,
+                adaptiveLease,
                 out var indexNestedJoinOp))
             {
                 return (indexNestedJoinOp!, compositeSchema);
@@ -6655,6 +8115,7 @@ private async ValueTask ExecuteUpdateAsync(UpdateStatement stmt, Ca
                 leftSchema,
                 rightSchema,
                 compositeSchema,
+                adaptiveLease,
                 out var hashJoinOp))
             {
                 return (hashJoinOp!, compositeSchema);
@@ -6719,6 +8180,25 @@ private static bool CanPushOuterPredicateIntoSimpleView(SelectStatement viewStmt
         return !viewStmt.Columns.Any(c => c.Expression != null && ContainsAggregate(c.Expression));
     }

+    private static bool CanPropagateJoinOrderRowGoalIntoSimpleView(SelectStatement viewStmt)
+    {
+        if (viewStmt.IsDistinct ||
+            viewStmt.Where != null ||
+            viewStmt.GroupBy != null ||
+            viewStmt.Having != null ||
+            viewStmt.OrderBy != null ||
+            viewStmt.Limit != null ||
+            viewStmt.Offset != null)
+        {
+            return false;
+        }
+
+        if (viewStmt.Columns.Any(static c => c.IsStar))
+            return false;
+
+        return !viewStmt.Columns.Any(c => c.Expression != null &&
ContainsAggregate(c.Expression));
+    }
+
     private static bool TryBuildSimpleViewOutputExpressionMap(
         SelectStatement viewStmt,
         TableSchema sourceSchema,
@@ -7030,6 +8510,43 @@ private bool TryReorderInnerJoinChain(JoinTableRef join, Expression? outerWhere,
         if (originalOrder.SequenceEqual(reorderedOrder, StringComparer.OrdinalIgnoreCase))
             return false;

+        return TryBuildInnerJoinTreeFromOrder(orderedLeaves, predicates, out reordered);
+    }
+
+    private bool TryReorderInnerJoinChainForRowGoal(JoinTableRef join, Expression? outerWhere, out TableRef reordered)
+    {
+        reordered = join;
+
+        var leaves = new List();
+        var predicates = new List();
+        int leafIndex = 0;
+        int predicateIndex = 0;
+        if (!TryCollectReorderableInnerJoinChain(join, leaves, predicates, ref leafIndex, ref predicateIndex))
+            return false;
+
+        if (leaves.Count < 2)
+            return false;
+
+        ApplyLocalPredicateRowEstimates(leaves, predicates, outerWhere);
+
+        var originalOrder = leaves.OrderBy(static l => l.OriginalIndex).Select(static l => l.Identifier).ToArray();
+        if (!TryChooseStreamingLookupInnerJoinOrder(leaves, predicates, out var orderedLeaves))
+            return false;
+
+        var reorderedOrder = orderedLeaves.Select(static l => l.Identifier).ToArray();
+        if (originalOrder.SequenceEqual(reorderedOrder, StringComparer.OrdinalIgnoreCase))
+            return false;
+
+        return TryBuildInnerJoinTreeFromOrder(orderedLeaves, predicates, out reordered);
+    }
+
+    private static bool TryBuildInnerJoinTreeFromOrder(
+        IReadOnlyList orderedLeaves,
+        IReadOnlyList predicates,
+        out TableRef reordered)
+    {
+        reordered = orderedLeaves[0].TableRef;
+
         var orderedPredicates = predicates
             .OrderBy(static p => p.OriginalIndex)
             .ToList();
@@ -7157,6 +8674,20 @@ private static void TrackConsumedOuterPredicates(
         return CombineConjuncts(remainingConjuncts);
     }

+    private static Expression? AddResidualConjuncts(
+        Expression?
residualCondition,
+        IReadOnlyList conjuncts)
+    {
+        if (conjuncts.Count == 0)
+            return residualCondition;
+
+        var residualTerms = new List(conjuncts.Count + (residualCondition == null ? 0 : 1));
+        if (residualCondition != null)
+            residualTerms.Add(residualCondition);
+        residualTerms.AddRange(conjuncts);
+        return CombineConjuncts(residualTerms);
+    }
+
     private Expression? RemoveRedundantInnerJoinLocalLeafPredicates(
         JoinTableRef join,
         Expression? outerWhere)
@@ -7221,12 +8752,27 @@ private void ApplyLocalPredicateRowEstimates(
             if (localPredicates.Count == 0)
                 continue;

-            if (!CardinalityEstimator.TryEstimateFilteredRowCount(
+            bool estimated = CardinalityEstimator.TryEstimateFilteredRowCount(
                 _catalog,
                 leaf.Schema,
                 leaf.RowCount,
                 localPredicates,
-                out long estimatedRows))
+                out long estimatedRows);
+
+            bool indexEstimated = TryEstimateIndexedLocalPredicateRows(
+                leaf.TableRef.TableName,
+                leaf.Schema,
+                localPredicates,
+                leaf.RowCount,
+                out long indexedRows);
+
+            if (indexEstimated && (!estimated || indexedRows < estimatedRows))
+            {
+                estimated = true;
+                estimatedRows = indexedRows;
+            }
+
+            if (!estimated)
             {
                 continue;
             }
@@ -7235,6 +8781,50 @@ private void ApplyLocalPredicateRowEstimates(
         }
     }

+    private bool TryEstimateIndexedLocalPredicateRows(
+        string tableName,
+        TableSchema schema,
+        IReadOnlyList predicates,
+        long tableRowCount,
+        out long estimatedRows)
+    {
+        estimatedRows = 0;
+        if (tableRowCount <= 0 || predicates.Count == 0)
+            return false;
+
+        IReadOnlyList indexes = _catalog.GetSqlIndexesForTable(tableName);
+        bool hasIntegerPk = schema.PrimaryKeyColumnIndex >= 0 &&
+            schema.PrimaryKeyColumnIndex < schema.Columns.Count &&
+            schema.Columns[schema.PrimaryKeyColumnIndex].Type == DbType.Integer;
+
+        long?
bestRows = null;
+        for (int i = 0; i < predicates.Count; i++)
+        {
+            if (!TryPickLookupCandidate(
+                    tableName,
+                    predicates[i],
+                    schema,
+                    indexes,
+                    hasIntegerPk,
+                    schema.PrimaryKeyColumnIndex,
+                    out var candidate) ||
+                !ShouldUseLookupCandidate(candidate) ||
+                !candidate.EstimatedRows.HasValue)
+            {
+                continue;
+            }
+
+            long candidateRows = Math.Clamp(candidate.EstimatedRows.Value, 1, tableRowCount);
+            bestRows = bestRows.HasValue ? Math.Min(bestRows.Value, candidateRows) : candidateRows;
+        }
+
+        if (!bestRows.HasValue)
+            return false;
+
+        estimatedRows = bestRows.Value;
+        return true;
+    }
+
     private bool TryCollectReorderableInnerJoinChain(
         TableRef tableRef,
         List leaves,
@@ -7282,44 +8872,231 @@ private bool TryCollectReorderableInnerJoinChain(
             });
         }

-        return true;
+        return true;
+    }
+
+    private bool TryCreateReorderableJoinLeaf(SimpleTableRef simple, int originalIndex, out ReorderableJoinLeaf leaf)
+    {
+        leaf = default;
+
+        if (_cteData != null && _cteData.ContainsKey(simple.TableName))
+            return false;
+
+        if (IsSystemCatalogTable(simple.TableName) || _catalog.GetViewSql(simple.TableName) != null)
+            return false;
+
+        var schema = GetSchema(simple.TableName);
+        if (!TryEstimateTableRefRowCount(simple, out long rowCount) || rowCount <= 0)
+            return false;
+
+        string identifier = simple.Alias ?? simple.TableName;
+        string[] referenceNames = simple.Alias is { Length: > 0 }
+            ?
[simple.Alias, simple.TableName]
+            : [simple.TableName];
+
+        var qualifiedMappings = new Dictionary(StringComparer.OrdinalIgnoreCase);
+        foreach (string referenceName in referenceNames)
+        {
+            for (int i = 0; i < schema.Columns.Count; i++)
+                qualifiedMappings[$"{referenceName}.{schema.Columns[i].Name}"] = i;
+        }
+
+        var qualifiedSchema = new TableSchema
+        {
+            TableName = schema.TableName,
+            Columns = schema.Columns,
+            QualifiedMappings = qualifiedMappings,
+        };
+
+        leaf = new ReorderableJoinLeaf(simple, qualifiedSchema, rowCount, originalIndex, identifier, referenceNames);
+        return true;
+    }
+
+    private bool TryChooseStreamingLookupInnerJoinOrder(
+        IReadOnlyList leaves,
+        IReadOnlyList predicates,
+        out List orderedLeaves)
+    {
+        orderedLeaves = [];
+
+        List? bestOrder = null;
+        int bestLookupScore = int.MinValue;
+        int bestStartOriginalIndex = int.MaxValue;
+
+        foreach (var start in leaves.OrderBy(static l => l.OriginalIndex))
+        {
+            var candidateOrder = new List { start };
+            var selectedIds = new HashSet(StringComparer.OrdinalIgnoreCase) { start.Identifier };
+            var remaining = leaves
+                .Where(leaf => !string.Equals(leaf.Identifier, start.Identifier, StringComparison.OrdinalIgnoreCase))
+                .OrderBy(static leaf => leaf.OriginalIndex)
+                .ToList();
+
+            int lookupScore = 0;
+            bool completeLookupChain = true;
+
+            while (remaining.Count > 0)
+            {
+                if (!TryFindNextStreamingLookupLeaf(
+                        candidateOrder,
+                        selectedIds,
+                        remaining,
+                        predicates,
+                        out var next,
+                        out int stepScore))
+                {
+                    completeLookupChain = false;
+                    break;
+                }
+
+                candidateOrder.Add(next);
+                selectedIds.Add(next.Identifier);
+                remaining.Remove(next);
+                lookupScore += stepScore;
+            }
+
+            if (!completeLookupChain)
+                continue;
+
+            if (bestOrder == null ||
+                lookupScore > bestLookupScore ||
+                (lookupScore == bestLookupScore && start.OriginalIndex < bestStartOriginalIndex))
+            {
+                bestOrder = candidateOrder;
+                bestLookupScore = lookupScore;
+                bestStartOriginalIndex = start.OriginalIndex;
+            }
+        }
+
+        if (bestOrder
== null)
+            return false;
+
+        orderedLeaves = bestOrder;
+        return true;
+    }
+
+    private bool TryFindNextStreamingLookupLeaf(
+        IReadOnlyList selectedLeaves,
+        HashSet selectedIds,
+        IReadOnlyList remainingLeaves,
+        IReadOnlyList predicates,
+        out ReorderableJoinLeaf next,
+        out int score)
+    {
+        next = default;
+        score = 0;
+        bool found = false;
+        int bestOriginalIndex = int.MaxValue;
+
+        for (int i = 0; i < remainingLeaves.Count; i++)
+        {
+            var candidate = remainingLeaves[i];
+            if (!TryScoreStreamingLookupLeaf(selectedLeaves, selectedIds, candidate, predicates, out int candidateScore))
+                continue;
+
+            if (!found ||
+                candidateScore > score ||
+                (candidateScore == score && candidate.OriginalIndex < bestOriginalIndex))
+            {
+                next = candidate;
+                score = candidateScore;
+                bestOriginalIndex = candidate.OriginalIndex;
+                found = true;
+            }
+        }
+
+        return found;
     }

-    private bool TryCreateReorderableJoinLeaf(SimpleTableRef simple, int originalIndex, out ReorderableJoinLeaf leaf)
+    private bool TryScoreStreamingLookupLeaf(
+        IReadOnlyList selectedLeaves,
+        HashSet selectedIds,
+        ReorderableJoinLeaf candidate,
+        IReadOnlyList predicates,
+        out int score)
     {
-        leaf = default;
+        score = 0;
+        var nextSelectedIds = new HashSet(selectedIds, StringComparer.OrdinalIgnoreCase)
+        {
+            candidate.Identifier
+        };

-        if (_cteData != null && _cteData.ContainsKey(simple.TableName))
+        var attachablePredicates = predicates
+            .Where(p =>
+                p.ReferencedTables.Contains(candidate.Identifier) &&
+                ShouldAttachInnerJoinPredicate(p, selectedIds, nextSelectedIds))
+            .OrderBy(static p => p.OriginalIndex)
+            .ToList();
+
+        if (attachablePredicates.Count == 0)
             return false;

-        if (IsSystemCatalogTable(simple.TableName) || _catalog.GetViewSql(simple.TableName) != null)
+        var condition = CombineConjuncts(attachablePredicates.Select(static p => p.Expression).ToList());
+        if (condition == null)
             return false;

-        var schema = GetSchema(simple.TableName);
-        if (!TryEstimateTableRefRowCount(simple, out long rowCount)
|| rowCount <= 0) + var selectedSchema = BuildJoinSchemaForLeaves(selectedLeaves); + var compositeSchema = TableSchema.CreateJoinSchema(selectedSchema, candidate.Schema); + if (!TryAnalyzeHashJoinCondition( + condition, + compositeSchema, + selectedSchema.Columns.Count, + out var leftKeyIndices, + out var rightKeyIndices, + out _)) + { return false; + } - string identifier = simple.Alias ?? simple.TableName; - string[] referenceNames = simple.Alias is { Length: > 0 } - ? [simple.Alias, simple.TableName] - : [simple.TableName]; + if (leftKeyIndices.Length == 0 || leftKeyIndices.Length != rightKeyIndices.Length) + return false; - var qualifiedMappings = new Dictionary(StringComparer.OrdinalIgnoreCase); - foreach (string referenceName in referenceNames) + int rightPkIndex = candidate.Schema.PrimaryKeyColumnIndex; + if (rightKeyIndices.Length == 1 && + rightPkIndex == rightKeyIndices[0] && + candidate.Schema.Columns[rightPkIndex].Type == DbType.Integer) { - for (int i = 0; i < schema.Columns.Count; i++) - qualifiedMappings[$"{referenceName}.{schema.Columns[i].Name}"] = i; + score = 100; + return true; } - var qualifiedSchema = new TableSchema + var indexes = _catalog.GetSqlIndexesForTable(candidate.TableRef.TableName); + for (int i = 0; i < indexes.Count; i++) { - TableName = schema.TableName, - Columns = schema.Columns, - QualifiedMappings = qualifiedMappings, - }; + var index = indexes[i]; + if (!TryMatchJoinLookupIndex( + index, + selectedSchema, + candidate.Schema, + leftKeyIndices, + rightKeyIndices, + out _, + out var orderedRightKeyIndices, + out _)) + { + continue; + } - leaf = new ReorderableJoinLeaf(simple, qualifiedSchema, rowCount, originalIndex, identifier, referenceNames); - return true; + bool directIntegerKey = + orderedRightKeyIndices.Length == 1 && + candidate.Schema.Columns[orderedRightKeyIndices[0]].Type == DbType.Integer; + int indexScore = index.IsUnique ? 
90 : 70; + if (directIntegerKey) + indexScore += 5; + + score = Math.Max(score, indexScore); + } + + return score > 0; + } + + private static TableSchema BuildJoinSchemaForLeaves(IReadOnlyList leaves) + { + var schema = leaves[0].Schema; + for (int i = 1; i < leaves.Count; i++) + schema = TableSchema.CreateJoinSchema(schema, leaves[i].Schema); + + return schema; } private bool TryChooseGreedyInnerJoinOrder( @@ -7771,6 +9548,8 @@ private bool TryBuildIndexNestedLoopJoinOperator( TableSchema leftSchema, TableSchema rightSchema, TableSchema compositeSchema, + Expression? outerWhere, + AdaptiveQueryExecutionLease? adaptiveLease, out IOperator? indexNestedJoinOp) { indexNestedJoinOp = null; @@ -7803,6 +9582,12 @@ private bool TryBuildIndexNestedLoopJoinOperator( return false; } + if (join.JoinType == JoinType.Inner && + TryCollectLocalJoinLeafPredicates(outerWhere, rightSimple, rightSchema, out var rightLocalConjuncts)) + { + residualCondition = AddResidualConjuncts(residualCondition, rightLocalConjuncts); + } + if (leftKeyIndices.Length == 0 || leftKeyIndices.Length != rightKeyIndices.Length) return false; @@ -7939,10 +9724,11 @@ private bool TryBuildIndexNestedLoopJoinOperator( if (orderedLeftKeyIndices == null || orderedRightKeyIndices == null) return false; + Func createLookupJoin; if (usesDirectIntegerLookup) { - indexNestedJoinOp = new IndexNestedLoopJoinOperator( - leftOp, + createLookupJoin = outer => new IndexNestedLoopJoinOperator( + outer, rightTableTree, rightIndexStore, join.JoinType, @@ -7957,8 +9743,8 @@ private bool TryBuildIndexNestedLoopJoinOperator( } else { - indexNestedJoinOp = new HashedIndexNestedLoopJoinOperator( - leftOp, + createLookupJoin = outer => new HashedIndexNestedLoopJoinOperator( + outer, rightTableTree, rightIndexStore!, join.JoinType, @@ -7977,6 +9763,39 @@ private bool TryBuildIndexNestedLoopJoinOperator( functions: _functions); } + if (adaptiveLease != null && hasOuterEstimate && hasInnerEstimate) + { + Func createHashJoin = outer 
=> new HashJoinOperator( + outer, + rightOp, + join.JoinType, + residualCondition, + compositeSchema, + leftSchema.Columns.Count, + rightSchema.Columns.Count, + orderedLeftKeyIndices, + orderedRightKeyIndices, + buildRightSide: true, + buildRowCapacityHint: ToCapacityHint(innerRows), + estimatedOutputRowCount, + _functions); + + indexNestedJoinOp = new AdaptiveIndexNestedLoopJoinOperator( + leftOp, + rightOp, + compositeSchema.Columns as ColumnDefinition[] ?? compositeSchema.Columns.ToArray(), + createLookupJoin, + createHashJoin, + adaptiveLease, + _adaptiveRuntimeDiagnostics, + outerRows, + estimatedOutputRowCount); + } + else + { + indexNestedJoinOp = createLookupJoin(leftOp); + } + return true; } @@ -8179,6 +9998,7 @@ private bool TryBuildHashJoinOperator( TableSchema leftSchema, TableSchema rightSchema, TableSchema compositeSchema, + AdaptiveQueryExecutionLease? adaptiveLease, out IOperator? hashJoinOp) { hashJoinOp = null; @@ -8229,20 +10049,46 @@ private bool TryBuildHashJoinOperator( hasRightEstimate, rightRows); - hashJoinOp = new HashJoinOperator( - leftOp, - rightOp, - join.JoinType, - residualCondition, - compositeSchema, - leftSchema.Columns.Count, - rightSchema.Columns.Count, - leftKeyIndices, - rightKeyIndices, - buildRightSide, - buildRowCapacityHint, - estimatedOutputRowCount, - _functions); + if (adaptiveLease != null && + join.JoinType == JoinType.Inner && + hasLeftEstimate && + hasRightEstimate) + { + hashJoinOp = new AdaptiveHashJoinOperator( + leftOp, + rightOp, + join.JoinType, + residualCondition, + compositeSchema, + leftSchema.Columns.Count, + rightSchema.Columns.Count, + leftKeyIndices, + rightKeyIndices, + buildRightSide, + leftRows, + rightRows, + estimatedOutputRowCount, + _functions, + adaptiveLease, + _adaptiveRuntimeDiagnostics); + } + else + { + hashJoinOp = new HashJoinOperator( + leftOp, + rightOp, + join.JoinType, + residualCondition, + compositeSchema, + leftSchema.Columns.Count, + rightSchema.Columns.Count, + leftKeyIndices, + 
rightKeyIndices, + buildRightSide, + buildRowCapacityHint, + estimatedOutputRowCount, + _functions); + } return true; } @@ -8667,7 +10513,8 @@ private bool TryBuildIndexScanPlan( Index: compositeIndex, LookupValue: compositeLookupKey, KeyColumnIndices: compositeColumnIndices, - KeyComponents: compositeKeyComponents); + KeyComponents: compositeKeyComponents, + EstimatedRows: compositeIndex!.IsUnique ? 1 : null); return true; } @@ -8678,7 +10525,10 @@ private bool TryBuildIndexScanPlan( selectedCandidate.Index, selectedCandidate.LookupValue, selectedCandidate.KeyColumnIndices, - selectedCandidate.KeyComponents); + selectedCandidate.KeyComponents, + selectedCandidate.EstimatedRows.HasValue + ? ToCapacityHint(selectedCandidate.EstimatedRows.Value) + : null); if (selectedCandidate.RequiresResidualPredicate) { @@ -8721,7 +10571,8 @@ private readonly record struct LookupPlan( IndexSchema? Index, long LookupValue, int[]? KeyColumnIndices, - DbValue[]? KeyComponents); + DbValue[]? KeyComponents, + int? EstimatedRows); private bool TryPickLookupCandidate( string tableName, @@ -9036,7 +10887,8 @@ private IOperator BuildLookupOperator( lookupPlan.Index, lookupPlan.LookupValue, lookupPlan.KeyColumnIndices, - lookupPlan.KeyComponents); + lookupPlan.KeyComponents, + lookupPlan.EstimatedRows); } private IOperator BuildLookupOperator( @@ -9046,7 +10898,8 @@ private IOperator BuildLookupOperator( IndexSchema? index, long lookupValue, int[]? expectedKeyColumnIndices = null, - DbValue[]? expectedKeyComponents = null) + DbValue[]? expectedKeyComponents = null, + int? 
estimatedRows = null) { var tableTree = _catalog.GetTableTree(tableName, _pager); if (isPrimaryKey) @@ -9068,7 +10921,8 @@ private IOperator BuildLookupOperator( expectedKeyColumnIndices, expectedKeyComponents, expectedKeyCollations, - usesOrderedTextPayload); + usesOrderedTextPayload, + estimatedRows); } private static bool UsesDirectIntegerIndexKey(IndexSchema index, TableSchema schema) @@ -13136,6 +14990,27 @@ private static bool TryNormalizeSystemCatalogTableName(string tableName, out str return true; } + if (string.Equals(tableName, "sys.planner_histograms", StringComparison.OrdinalIgnoreCase) || + string.Equals(tableName, "sys_planner_histograms", StringComparison.OrdinalIgnoreCase)) + { + normalized = "sys.planner_histograms"; + return true; + } + + if (string.Equals(tableName, "sys.planner_heavy_hitters", StringComparison.OrdinalIgnoreCase) || + string.Equals(tableName, "sys_planner_heavy_hitters", StringComparison.OrdinalIgnoreCase)) + { + normalized = "sys.planner_heavy_hitters"; + return true; + } + + if (string.Equals(tableName, "sys.planner_index_prefix_stats", StringComparison.OrdinalIgnoreCase) || + string.Equals(tableName, "sys_planner_index_prefix_stats", StringComparison.OrdinalIgnoreCase)) + { + normalized = "sys.planner_index_prefix_stats"; + return true; + } + normalized = string.Empty; return false; } @@ -13218,6 +15093,21 @@ private bool TryBuildSystemCatalogSource(SimpleTableRef tableRef, out (IOperator rows = BuildSystemColumnStatsRows(); break; + case "sys.planner_histograms": + columns = SystemPlannerHistogramsColumns; + rows = BuildSystemPlannerHistogramsRows(); + break; + + case "sys.planner_heavy_hitters": + columns = SystemPlannerHeavyHittersColumns; + rows = BuildSystemPlannerHeavyHittersRows(); + break; + + case "sys.planner_index_prefix_stats": + columns = SystemPlannerIndexPrefixStatsColumns; + rows = BuildSystemPlannerIndexPrefixStatsRows(); + break; + default: return false; } @@ -13558,6 +15448,160 @@ private List 
BuildSystemColumnStatsRows() return rows; } + private long CountPlannerHistogramRows() + { + long count = 0; + foreach (var distribution in _catalog.GetColumnDistributionStatistics()) + count += distribution.HistogramBuckets.Count; + return count; + } + + private long CountSystemCatalogRows(string normalized) + { + if (string.Equals(normalized, "sys.saved_queries", StringComparison.Ordinal)) + return TryGetTableRowCount(InternalSavedQueriesTableName, out long savedQueryRows) ? savedQueryRows : 0; + + return normalized switch + { + "sys.tables" => _catalog.GetTableNames().Count, + "sys.columns" => CountSystemColumns(), + "sys.indexes" => CountSystemIndexes(), + "sys.foreign_keys" => CountSystemForeignKeys(), + "sys.views" => _catalog.GetViewNames().Count, + "sys.triggers" => _catalog.GetTriggers().Count, + "sys.objects" => CountSystemObjects(), + "sys.table_stats" => _catalog.GetTableStatistics().Count, + "sys.column_stats" => _catalog.GetColumnStatistics().Count, + "sys.planner_histograms" => CountPlannerHistogramRows(), + "sys.planner_heavy_hitters" => CountPlannerHeavyHitterRows(), + "sys.planner_index_prefix_stats" => CountPlannerIndexPrefixRows(), + _ => 0, + }; + } + + private long CountPlannerHeavyHitterRows() + { + long count = 0; + foreach (var distribution in _catalog.GetColumnDistributionStatistics()) + count += distribution.FrequentValues.Count; + return count; + } + + private long CountPlannerIndexPrefixRows() + { + long count = 0; + foreach (var stats in _catalog.GetIndexPrefixStatistics()) + count += stats.PrefixDistinctCounts.Count; + return count; + } + + private List BuildSystemPlannerHistogramsRows() + { + var distributions = _catalog.GetColumnDistributionStatistics(); + var rows = new List((int)Math.Min(CountPlannerHistogramRows(), int.MaxValue)); + + foreach (var distribution in distributions + .OrderBy(item => item.TableName, StringComparer.OrdinalIgnoreCase) + .ThenBy(item => GetColumnOrdinal(item.TableName, item.ColumnName)) + .ThenBy(item => 
item.ColumnName, StringComparer.OrdinalIgnoreCase)) + { + ColumnStatistics? columnStats = _catalog.GetColumnStatistics(distribution.TableName, distribution.ColumnName); + long nonNullCount = columnStats?.NonNullCount ?? 0; + bool isStale = columnStats?.IsStale ?? true; + int ordinal = GetColumnOrdinal(distribution.TableName, distribution.ColumnName); + + for (int i = 0; i < distribution.HistogramBuckets.Count; i++) + { + HistogramBucketStatistics bucket = distribution.HistogramBuckets[i]; + rows.Add( + [ + DbValue.FromText(distribution.TableName), + DbValue.FromText(distribution.ColumnName), + DbValue.FromInteger(ordinal), + DbValue.FromInteger(i + 1), + bucket.LowerBound, + bucket.UpperBound, + DbValue.FromInteger(bucket.RowCount), + DbValue.FromInteger(nonNullCount), + DbValue.FromInteger(isStale ? 1 : 0), + ]); + } + } + + return rows; + } + + private List BuildSystemPlannerHeavyHittersRows() + { + var distributions = _catalog.GetColumnDistributionStatistics(); + var rows = new List((int)Math.Min(CountPlannerHeavyHitterRows(), int.MaxValue)); + + foreach (var distribution in distributions + .OrderBy(item => item.TableName, StringComparer.OrdinalIgnoreCase) + .ThenBy(item => GetColumnOrdinal(item.TableName, item.ColumnName)) + .ThenBy(item => item.ColumnName, StringComparer.OrdinalIgnoreCase)) + { + ColumnStatistics? columnStats = _catalog.GetColumnStatistics(distribution.TableName, distribution.ColumnName); + long nonNullCount = columnStats?.NonNullCount ?? 0; + bool isStale = columnStats?.IsStale ?? true; + int ordinal = GetColumnOrdinal(distribution.TableName, distribution.ColumnName); + + foreach (FrequentValueStatistics frequentValue in distribution.FrequentValues + .OrderByDescending(item => item.RowCount) + .ThenBy(item => item.Value, Comparer.Create(static (left, right) => DbValue.Compare(left, right)))) + { + long frequencyPpm = nonNullCount > 0 + ? 
Math.Clamp((frequentValue.RowCount * 1_000_000L) / nonNullCount, 0, 1_000_000L) + : 0; + + rows.Add( + [ + DbValue.FromText(distribution.TableName), + DbValue.FromText(distribution.ColumnName), + DbValue.FromInteger(ordinal), + frequentValue.Value, + DbValue.FromInteger(frequentValue.RowCount), + DbValue.FromInteger(frequencyPpm), + DbValue.FromInteger(isStale ? 1 : 0), + ]); + } + } + + return rows; + } + + private List BuildSystemPlannerIndexPrefixStatsRows() + { + var prefixStats = _catalog.GetIndexPrefixStatistics(); + var rows = new List((int)Math.Min(CountPlannerIndexPrefixRows(), int.MaxValue)); + + foreach (var stats in prefixStats + .OrderBy(item => item.TableName, StringComparer.OrdinalIgnoreCase) + .ThenBy(item => item.IndexName, StringComparer.OrdinalIgnoreCase)) + { + TableStatistics? tableStats = _catalog.GetTableStatistics(stats.TableName); + long tableRowCount = tableStats?.RowCount ?? 0; + bool isStale = tableStats?.HasStaleColumns ?? true; + int maxPrefixLength = Math.Min(stats.PrefixColumns.Count, stats.PrefixDistinctCounts.Count); + + for (int prefixLength = 1; prefixLength <= maxPrefixLength; prefixLength++) + { + rows.Add( + [ + DbValue.FromText(stats.TableName), + DbValue.FromText(stats.IndexName), + DbValue.FromInteger(prefixLength), + DbValue.FromText(string.Join(",", stats.PrefixColumns.Take(prefixLength))), + DbValue.FromInteger(stats.PrefixDistinctCounts[prefixLength - 1]), + DbValue.FromInteger(tableRowCount), + DbValue.FromInteger(isStale ? 1 : 0), + ]); + } + } + + return rows; + } + private int GetColumnOrdinal(string tableName, string columnName) { var schema = _catalog.GetTable(tableName); @@ -14807,12 +16851,19 @@ private async ValueTask ExecuteResolvedInsertRowAsync( ForeignKeyMutationContext? mutationContext, bool adjustTableRowCount, ReusableInsertEncodingBuffer? 
reusableEncodingBuffer, + int rowIdReservationCountHint, CancellationToken ct) { // BEFORE INSERT triggers await FireTriggersAsync(tableName, TriggerTiming.Before, TriggerEvent.Insert, null, row, schema, ct); - var (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync(tableName, schema, tree, row, ct); + var (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync( + tableName, + schema, + tree, + row, + rowIdReservationCountHint, + ct); long? generatedIntegerIdentity = autoGeneratedRowId && schema.PrimaryKeyColumnIndex >= 0 && schema.Columns[schema.PrimaryKeyColumnIndex].IsIdentity && @@ -14833,7 +16884,13 @@ private async ValueTask ExecuteResolvedInsertRowAsync( { // Another writer may have advanced rowids; reload the high-water mark once and retry. InvalidateRowIdCache(tableName); - (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync(tableName, schema, tree, row, ct); + (rowId, autoGeneratedRowId) = await ResolveRowIdForInsertAsync( + tableName, + schema, + tree, + row, + rowIdReservationCountHint, + ct); generatedIntegerIdentity = autoGeneratedRowId && schema.PrimaryKeyColumnIndex >= 0 && schema.Columns[schema.PrimaryKeyColumnIndex].IsIdentity && @@ -14883,7 +16940,12 @@ private ReadOnlyMemory EncodeRowForInsert( } private async ValueTask<(long RowId, bool AutoGenerated)> ResolveRowIdForInsertAsync( - string tableName, TableSchema schema, BTree tree, DbValue[] row, CancellationToken ct) + string tableName, + TableSchema schema, + BTree tree, + DbValue[] row, + int rowIdReservationCountHint, + CancellationToken ct) { int pkIdx = schema.PrimaryKeyColumnIndex; if (pkIdx >= 0 && @@ -14898,7 +16960,7 @@ private ReadOnlyMemory EncodeRowForInsert( return (explicitRowId, false); } - long rowId = await AllocateRowIdAsync(tableName, schema, tree, ct); + long rowId = await AllocateRowIdAsync(tableName, schema, tree, rowIdReservationCountHint, ct); row[pkIdx] = DbValue.FromInteger(rowId); return (rowId, true); } @@ -14916,25 +16978,61 @@ private 
ReadOnlyMemory EncodeRowForInsert( return (rowId, false); } - return (await AllocateRowIdAsync(tableName, schema, tree, ct), true); + return (await AllocateRowIdAsync(tableName, schema, tree, rowIdReservationCountHint, ct), true); } - private async ValueTask AllocateRowIdAsync(string tableName, TableSchema schema, BTree tree, CancellationToken ct) + private async ValueTask AllocateRowIdAsync( + string tableName, + TableSchema schema, + BTree tree, + int rowIdReservationCountHint, + CancellationToken ct) { if (!_nextRowIdCache.TryGetValue(tableName, out long nextRowId)) nextRowId = await LoadNextRowIdAsync(tableName, schema, tree, ct); - long rowId = _nextRowIdReservationProvider is not null - ? _nextRowIdReservationProvider(tableName, nextRowId) - : nextRowId; + long rowId; + if (_nextRowIdRangeReservationProvider is not null) + rowId = ReserveRowIdFromLease(tableName, nextRowId, rowIdReservationCountHint); + else if (_nextRowIdReservationProvider is not null) + rowId = _nextRowIdReservationProvider(tableName, nextRowId); + else + rowId = nextRowId; UpdateNextRowIdState(tableName, schema, checked(rowId + 1)); return rowId; } + private long ReserveRowIdFromLease(string tableName, long minimumNextRowId, int rowIdReservationCountHint) + { + if (_rowIdReservationLeases.TryGetValue(tableName, out RowIdReservationLease? 
lease) && + lease.TryReserve(minimumNextRowId, out long leasedRowId)) + { + return leasedRowId; + } + + _rowIdReservationLeases.Remove(tableName); + + int reservationCount = Math.Max(1, rowIdReservationCountHint); + var (start, endExclusive) = _nextRowIdRangeReservationProvider!(tableName, minimumNextRowId, reservationCount); + if (start < minimumNextRowId || endExclusive <= start) + { + throw new CSharpDbException( + ErrorCode.Unknown, + $"Invalid row-id reservation range [{start}, {endExclusive}) for table '{tableName}'."); + } + + long nextLeaseRowId = checked(start + 1); + if (nextLeaseRowId < endExclusive) + _rowIdReservationLeases[tableName] = new RowIdReservationLease(nextLeaseRowId, endExclusive); + + return start; + } + private void InvalidateRowIdCache(string tableName) { _nextRowIdCache.Remove(tableName); + _rowIdReservationLeases.Remove(tableName); } private async ValueTask LoadNextRowIdAsync(string tableName, TableSchema schema, BTree tree, CancellationToken ct) @@ -15005,6 +17103,11 @@ private void ObserveExplicitRowId(string tableName, TableSchema schema, long nex _dirtyNextRowIdTables.Add(tableName); _nextRowIdObservationProvider?.Invoke(tableName, nextRowId); + if (_rowIdReservationLeases.TryGetValue(tableName, out RowIdReservationLease? lease) && + !lease.AdvanceTo(nextRowId)) + { + _rowIdReservationLeases.Remove(tableName); + } // Explicit rowid inserts can push the durable high-water mark forward, but persisting // every bump rewrites the schema catalog row on each commit. 
Mark the persisted hint as diff --git a/src/CSharpDB.Execution/README.md b/src/CSharpDB.Execution/README.md index 2bf860fb..04f28d25 100644 --- a/src/CSharpDB.Execution/README.md +++ b/src/CSharpDB.Execution/README.md @@ -16,13 +16,13 @@ Query planner, operator tree, and expression evaluator for the [CSharpDB](https: ### Query Planner - Translates AST statements into physical operator trees - Dispatches to type-specific handlers for all DDL and DML statements -- System catalog virtual tables: `sys.tables`, `sys.columns`, `sys.indexes`, `sys.views`, `sys.triggers`, `sys.objects`, `sys.saved_queries`, `sys.table_stats`, `sys.column_stats` +- System catalog virtual tables: `sys.tables`, `sys.columns`, `sys.indexes`, `sys.views`, `sys.triggers`, `sys.objects`, `sys.saved_queries`, `sys.table_stats`, `sys.column_stats`, `sys.planner_histograms`, `sys.planner_heavy_hitters`, `sys.planner_index_prefix_stats` - Compound query execution for `UNION`, `INTERSECT`, and `EXCEPT` - Subquery lowering for uncorrelated forms plus a correlated fallback path for supported contexts - Compiled expression cache (up to 4096 entries) for repeated queries - Trigger body caching with schema-sensitive invalidation - Sync point-lookup fast path for `SELECT ... 
WHERE pk = value` -- Persisted row-count reuse for `COUNT(*)`, planner cardinality estimates, skew-aware equality/`IN` estimation from internal heavy hitters, histogram-backed numeric range estimates, composite-prefix correlation modeling for multi-column filters/joins, and bounded small-chain inner-join reordering +- Persisted row-count reuse for `COUNT(*)`, planner cardinality estimates, skew-aware equality/`IN` estimation from heavy hitters, histogram-backed numeric range estimates, composite-prefix correlation modeling for multi-column filters/joins, public `EXPLAIN ESTIMATE FOR ` diagnostics, and bounded small-chain inner-join reordering ### Operator Tree (Iterator Model) - `IOperator` interface: `OpenAsync`, `MoveNextAsync`, `Current`, `OutputSchema` diff --git a/src/CSharpDB.Primitives/AdaptiveQueryReoptimizationOptions.cs b/src/CSharpDB.Primitives/AdaptiveQueryReoptimizationOptions.cs new file mode 100644 index 00000000..3de31969 --- /dev/null +++ b/src/CSharpDB.Primitives/AdaptiveQueryReoptimizationOptions.cs @@ -0,0 +1,13 @@ +namespace CSharpDB.Primitives; + +/// +/// Controls opt-in adaptive query re-optimization for SELECT joins. +/// +public sealed class AdaptiveQueryReoptimizationOptions +{ + public bool Enabled { get; init; } + public int DivergenceFactor { get; init; } = 8; + public int MinimumObservedRows { get; init; } = 4096; + public int MaxBufferedRows { get; init; } = 65536; + public int MaxReoptimizationsPerQuery { get; init; } = 1; +} diff --git a/src/CSharpDB.Sql/Ast.cs b/src/CSharpDB.Sql/Ast.cs index 11aed7d6..ab85a809 100644 --- a/src/CSharpDB.Sql/Ast.cs +++ b/src/CSharpDB.Sql/Ast.cs @@ -230,6 +230,11 @@ public sealed class AnalyzeStatement : Statement public string? 
TableName { get; init; } } +public sealed class ExplainEstimateStatement : Statement +{ + public required Statement Target { get; init; } +} + // ============ Common Table Expressions (CTEs) ============ public sealed class CteDefinition diff --git a/src/CSharpDB.Sql/Parser.cs b/src/CSharpDB.Sql/Parser.cs index aa4b19e4..a5d808ad 100644 --- a/src/CSharpDB.Sql/Parser.cs +++ b/src/CSharpDB.Sql/Parser.cs @@ -190,6 +190,7 @@ public Statement ParseStatement() TokenType.Alter => ParseAlterTable(), TokenType.With => ParseWith(), TokenType.Analyze => ParseAnalyze(), + TokenType.Explain => ParseExplain(), _ => throw Error($"Unexpected token '{token.Value}', expected a statement."), }; @@ -2028,6 +2029,22 @@ private AnalyzeStatement ParseAnalyze() return new AnalyzeStatement { TableName = tableName }; } + private ExplainEstimateStatement ParseExplain() + { + Expect(TokenType.Explain); + Expect(TokenType.Estimate); + Expect(TokenType.For); + + Statement target = Peek().Type switch + { + TokenType.Select => ParseQueryExpression(), + TokenType.With => ParseWith(), + _ => throw Error("EXPLAIN ESTIMATE FOR supports SELECT, WITH, and compound SELECT queries only."), + }; + + return new ExplainEstimateStatement { Target = target }; + } + private UpdateStatement ParseUpdate() { Expect(TokenType.Update); diff --git a/src/CSharpDB.Sql/README.md b/src/CSharpDB.Sql/README.md index 34acf3a6..39262c4d 100644 --- a/src/CSharpDB.Sql/README.md +++ b/src/CSharpDB.Sql/README.md @@ -107,7 +107,7 @@ The execution layer uses the same recursive pattern — `BuildFromOperator(Table ## AST Hierarchy -**Statements**: `CreateTableStatement`, `DropTableStatement`, `InsertStatement`, `SelectStatement`, `CompoundSelectStatement`, `UpdateStatement`, `DeleteStatement`, `AlterTableStatement`, `CreateIndexStatement`, `DropIndexStatement`, `CreateViewStatement`, `DropViewStatement`, `CreateTriggerStatement`, `DropTriggerStatement`, `AnalyzeStatement`, `WithStatement` +**Statements**: `CreateTableStatement`, 
`DropTableStatement`, `InsertStatement`, `SelectStatement`, `CompoundSelectStatement`, `UpdateStatement`, `DeleteStatement`, `AlterTableStatement`, `CreateIndexStatement`, `DropIndexStatement`, `CreateViewStatement`, `DropViewStatement`, `CreateTriggerStatement`, `DropTriggerStatement`, `AnalyzeStatement`, `ExplainEstimateStatement`, `WithStatement` **Expressions**: `LiteralExpression`, `ParameterExpression`, `ColumnRefExpression`, `BinaryExpression`, `UnaryExpression`, `LikeExpression`, `InExpression`, `InSubqueryExpression`, `ScalarSubqueryExpression`, `ExistsExpression`, `BetweenExpression`, `IsNullExpression`, `FunctionCallExpression` diff --git a/src/CSharpDB.Sql/SqlStatementClassifier.cs b/src/CSharpDB.Sql/SqlStatementClassifier.cs index bfcdc24e..d0891721 100644 --- a/src/CSharpDB.Sql/SqlStatementClassifier.cs +++ b/src/CSharpDB.Sql/SqlStatementClassifier.cs @@ -22,6 +22,6 @@ public static SqlStatementClassification Classify(string sql) public static bool IsReadOnly(Statement statement) { ArgumentNullException.ThrowIfNull(statement); - return statement is QueryStatement or WithStatement; + return statement is QueryStatement or WithStatement or ExplainEstimateStatement; } } diff --git a/src/CSharpDB.Sql/TokenType.cs b/src/CSharpDB.Sql/TokenType.cs index 70cea04e..7aa63c5d 100644 --- a/src/CSharpDB.Sql/TokenType.cs +++ b/src/CSharpDB.Sql/TokenType.cs @@ -85,6 +85,8 @@ public enum TokenType With, Recursive, Analyze, + Explain, + Estimate, Trigger, Before, After, diff --git a/src/CSharpDB.Sql/Tokenizer.cs b/src/CSharpDB.Sql/Tokenizer.cs index 807bc433..d83065e4 100644 --- a/src/CSharpDB.Sql/Tokenizer.cs +++ b/src/CSharpDB.Sql/Tokenizer.cs @@ -82,6 +82,8 @@ public sealed class Tokenizer ["WITH"] = TokenType.With, ["RECURSIVE"] = TokenType.Recursive, ["ANALYZE"] = TokenType.Analyze, + ["EXPLAIN"] = TokenType.Explain, + ["ESTIMATE"] = TokenType.Estimate, ["TRIGGER"] = TokenType.Trigger, ["BEFORE"] = TokenType.Before, ["AFTER"] = TokenType.After, diff --git 
a/src/CSharpDB.Storage/Catalog/CatalogService.cs b/src/CSharpDB.Storage/Catalog/CatalogService.cs index bd880e1d..59a1792a 100644 --- a/src/CSharpDB.Storage/Catalog/CatalogService.cs +++ b/src/CSharpDB.Storage/Catalog/CatalogService.cs @@ -615,6 +615,22 @@ public IReadOnlyCollection GetDirtyColumnStatistics() return dirty; } + internal IReadOnlyCollection GetColumnDistributionStatistics() + { + if (_columnDistributionStatsCache.Count == 0) + return Array.Empty(); + + return _columnDistributionStatsCache.Values.ToArray(); + } + + internal IReadOnlyCollection GetIndexPrefixStatistics() + { + if (_indexPrefixStatsCache.Count == 0) + return Array.Empty(); + + return _indexPrefixStatsCache.Values.ToArray(); + } + public void ApplyCommittedAdvisoryStatisticsSnapshot( IReadOnlyCollection tableStatistics, IReadOnlyCollection columnStatistics, diff --git a/src/CSharpDB.Storage/Catalog/SchemaCatalog.cs b/src/CSharpDB.Storage/Catalog/SchemaCatalog.cs index 55db0fda..6334c133 100644 --- a/src/CSharpDB.Storage/Catalog/SchemaCatalog.cs +++ b/src/CSharpDB.Storage/Catalog/SchemaCatalog.cs @@ -111,6 +111,12 @@ public bool TryGetFreshColumnDistributionStatistics(string tableName, string col public bool TryGetFreshIndexPrefixStatistics(string indexName, out IndexPrefixStatistics stats) => _service.TryGetFreshIndexPrefixStatistics(indexName, out stats); + internal IReadOnlyCollection GetColumnDistributionStatistics() => + _service.GetColumnDistributionStatistics(); + + internal IReadOnlyCollection GetIndexPrefixStatistics() => + _service.GetIndexPrefixStatistics(); + public bool TryGetTableRowCount(string tableName, out long rowCount) => _service.TryGetEstimatedTableRowCount(tableName, out rowCount); diff --git a/src/CSharpDB.Storage/Paging/CommitPathDiagnosticsSnapshot.cs b/src/CSharpDB.Storage/Paging/CommitPathDiagnosticsSnapshot.cs index 6923fb46..5f6af947 100644 --- a/src/CSharpDB.Storage/Paging/CommitPathDiagnosticsSnapshot.cs +++ 
b/src/CSharpDB.Storage/Paging/CommitPathDiagnosticsSnapshot.cs
@@ -17,6 +17,9 @@ public readonly record struct CommitPathDiagnosticsSnapshot(
 long ExplicitLeafRebaseSuccessCount,
 long ExplicitLeafRebaseStructuralRejectCount,
 long ExplicitLeafRebaseCapacityRejectCount,
+ long ExplicitPendingLeafRebaseAttemptCount,
+ long ExplicitPendingLeafRebaseSuccessCount,
+ long ExplicitPendingLeafRebaseRejectCount,
 long ExplicitLeafRebaseRejectNonInsertOnlyCount,
 long ExplicitLeafRebaseRejectDuplicateKeyCount,
 long ExplicitLeafRebaseRejectSplitFallbackPreconditionCount,
diff --git a/src/CSharpDB.Storage/Paging/Pager.cs b/src/CSharpDB.Storage/Paging/Pager.cs
index 07fd2378..2d82d68e 100644
--- a/src/CSharpDB.Storage/Paging/Pager.cs
+++ b/src/CSharpDB.Storage/Paging/Pager.cs
@@ -64,6 +64,9 @@ private sealed class BTreeResourceDiagnosticsCounter(string resourceName)
 private long _explicitLeafRebaseSuccessCount;
 private long _explicitLeafRebaseStructuralRejectCount;
 private long _explicitLeafRebaseCapacityRejectCount;
+ private long _explicitPendingLeafRebaseAttemptCount;
+ private long _explicitPendingLeafRebaseSuccessCount;
+ private long _explicitPendingLeafRebaseRejectCount;
 private long _explicitLeafRebaseRejectNonInsertOnlyCount;
 private long _explicitLeafRebaseRejectDuplicateKeyCount;
 private long _explicitLeafRebaseRejectSplitFallbackPreconditionCount;
@@ -113,6 +116,7 @@ private sealed class BTreeResourceDiagnosticsCounter(string resourceName)
 private long _hashedIndexDeferredFlushCount;
 private readonly ConcurrentDictionary _btreeResourceDiagnostics = new();
 private readonly object _explicitCommitStateGate = new();
+ private readonly Dictionary _explicitCachedPageVersions = new();
 private uint _scheduledExplicitPageCount;
 private uint _scheduledExplicitSchemaRootPage;
 private uint _scheduledExplicitFreelistHead;
@@ -230,6 +234,9 @@ internal CommitPathDiagnosticsSnapshot GetCommitPathDiagnosticsSnapshot()
 ExplicitLeafRebaseSuccessCount: Interlocked.Read(ref _explicitLeafRebaseSuccessCount),
 ExplicitLeafRebaseStructuralRejectCount: Interlocked.Read(ref _explicitLeafRebaseStructuralRejectCount),
 ExplicitLeafRebaseCapacityRejectCount: Interlocked.Read(ref _explicitLeafRebaseCapacityRejectCount),
+ ExplicitPendingLeafRebaseAttemptCount: Interlocked.Read(ref _explicitPendingLeafRebaseAttemptCount),
+ ExplicitPendingLeafRebaseSuccessCount: Interlocked.Read(ref _explicitPendingLeafRebaseSuccessCount),
+ ExplicitPendingLeafRebaseRejectCount: Interlocked.Read(ref _explicitPendingLeafRebaseRejectCount),
 ExplicitLeafRebaseRejectNonInsertOnlyCount: Interlocked.Read(ref _explicitLeafRebaseRejectNonInsertOnlyCount),
 ExplicitLeafRebaseRejectDuplicateKeyCount: Interlocked.Read(ref _explicitLeafRebaseRejectDuplicateKeyCount),
 ExplicitLeafRebaseRejectSplitFallbackPreconditionCount: Interlocked.Read(ref _explicitLeafRebaseRejectSplitFallbackPreconditionCount),
@@ -311,6 +318,9 @@ internal void ResetCommitPathDiagnostics()
 Interlocked.Exchange(ref _explicitLeafRebaseSuccessCount, 0);
 Interlocked.Exchange(ref _explicitLeafRebaseStructuralRejectCount, 0);
 Interlocked.Exchange(ref _explicitLeafRebaseCapacityRejectCount, 0);
+ Interlocked.Exchange(ref _explicitPendingLeafRebaseAttemptCount, 0);
+ Interlocked.Exchange(ref _explicitPendingLeafRebaseSuccessCount, 0);
+ Interlocked.Exchange(ref _explicitPendingLeafRebaseRejectCount, 0);
 Interlocked.Exchange(ref _explicitLeafRebaseRejectNonInsertOnlyCount, 0);
 Interlocked.Exchange(ref _explicitLeafRebaseRejectDuplicateKeyCount, 0);
 Interlocked.Exchange(ref _explicitLeafRebaseRejectSplitFallbackPreconditionCount, 0);
@@ -1010,6 +1020,18 @@ private void RecordExplicitLeafRebaseDiagnostics(InsertOnlyRebaseResult result)
 }
 }
 
+ private void RecordExplicitPendingLeafRebaseDiagnostics(InsertOnlyRebaseResult result)
+ {
+ if (result == InsertOnlyRebaseResult.NotApplicable)
+ return;
+
+ Interlocked.Increment(ref _explicitPendingLeafRebaseAttemptCount);
+ if (result == InsertOnlyRebaseResult.Success)
+ Interlocked.Increment(ref _explicitPendingLeafRebaseSuccessCount);
+ else
+ Interlocked.Increment(ref _explicitPendingLeafRebaseRejectCount);
+ }
+
 private void RecordExplicitLeafStructuralRejectReason(ExplicitLeafStructuralRejectReason reason)
 {
 switch (reason)
@@ -1757,6 +1779,17 @@ private async ValueTask BeginExplicitCommitAsync(PagerTransac
 {
 if (conflictVersion > _transactions.CurrentCommitVersion)
 {
+ if (await TryResolveExplicitPendingWriteConflictAsync(tx, conflictPageId, conflictVersion, ct))
+ {
+ if (HaveExplicitWritePageIdsChanged(writePageIds, tx.DirtyPages))
+ {
+ refreshWritePageIds = true;
+ break;
+ }
+
+ continue;
+ }
+
 waitForCommitStateChange = true;
 break;
 }
@@ -1853,7 +1886,11 @@ private async ValueTask BeginExplicitCommitAsync(PagerTransac
 RecordWalAppendDiagnostics(walAppendStartTicks);
 
 long pendingCommitReservationStartTicks = Stopwatch.GetTimestamp();
- pendingCommitReservation = _transactions?.ReservePendingCommit(writePageIds, tx.LogicalWriteKeys)
+ pendingCommitReservation = _transactions?.ReservePendingCommit(
+ writePageIds,
+ writePageIds.Length,
+ tx.ModifiedPages,
+ tx.LogicalWriteKeys)
 ?? throw new InvalidOperationException("Explicit write transactions require a transaction coordinator.");
 RecordExplicitPendingCommitReservationDiagnostics(pendingCommitReservationStartTicks);
 commitQueued = true;
@@ -2001,7 +2038,8 @@ private void PublishExplicitCommitState(
 {
 lock (_explicitCommitStateGate)
 {
- if (commitVersion > _lastAppliedExplicitCommitVersion)
+ bool applyCommitState = commitVersion > _lastAppliedExplicitCommitVersion;
+ if (applyCommitState)
 {
 _pageCount = headerReservation.PageCount;
 _schemaRootPage = headerReservation.SchemaRootPage;
@@ -2013,7 +2051,17 @@ private void PublishExplicitCommitState(
 }
 
 for (int i = 0; i < writePageIds.Length; i++)
- _buffers.SetCached(writePageIds[i], tx.ModifiedPages[writePageIds[i]]);
+ {
+ uint pageId = writePageIds[i];
+ if (_explicitCachedPageVersions.TryGetValue(pageId, out long appliedVersion) &&
+ appliedVersion >= commitVersion)
+ {
+ continue;
+ }
+
+ _buffers.SetCached(pageId, tx.ModifiedPages[pageId]);
+ _explicitCachedPageVersions[pageId] = commitVersion;
+ }
 
 if (_pendingExplicitCommitCount > 0)
 _pendingExplicitCommitCount--;
@@ -2192,6 +2240,8 @@ private void ReadFileHeaderFrom(byte[] page0)
 private async ValueTask ResetPagerStateFromCommittedStorageAsync(CancellationToken ct)
 {
 _buffers.ClearAll();
+ lock (_explicitCommitStateGate)
+ _explicitCachedPageVersions.Clear();
 
 // Re-read header from DB file (WAL may have committed data, so check WAL too)
 if (_device.Length >= PageConstants.PageSize)
@@ -2743,6 +2793,42 @@ private bool TryFindExplicitWriteConflict(
 return false;
 }
 
+ private async ValueTask TryResolveExplicitPendingWriteConflictAsync(
+ PagerTransactionState tx,
+ uint conflictPageId,
+ long conflictVersion,
+ CancellationToken ct)
+ {
+ if (_transactions is null ||
+ !tx.ModifiedPages.TryGetValue(conflictPageId, out byte[]? transactionPage) ||
+ !_transactions.TryGetPendingPageImage(conflictPageId, conflictVersion, out ReadOnlyMemory pendingPage))
+ {
+ return false;
+ }
+
+ PageReadBuffer basePage = await _buffers.GetSnapshotPageReadAsync(conflictPageId, tx.Snapshot, ct);
+ if (transactionPage.AsSpan().SequenceEqual(pendingPage.Span))
+ {
+ tx.ResolvedWriteConflictVersions[conflictPageId] = conflictVersion;
+ return true;
+ }
+
+ InsertOnlyRebaseResult leafRebaseResult = LeafInsertRebaseHelper.TryRebaseInsertOnlyLeafPage(
+ conflictPageId,
+ basePage.Memory,
+ pendingPage,
+ transactionPage,
+ out byte[]? rebasedPage);
+ RecordExplicitPendingLeafRebaseDiagnostics(leafRebaseResult);
+ if (leafRebaseResult != InsertOnlyRebaseResult.Success)
+ return false;
+
+ RecordExplicitLeafRebaseDiagnostics(InsertOnlyRebaseResult.Success);
+ tx.ModifiedPages[conflictPageId] = rebasedPage!;
+ tx.ResolvedWriteConflictVersions[conflictPageId] = conflictVersion;
+ return true;
+ }
+
 private async ValueTask TryResolveExplicitWriteConflictAsync(
 PagerTransactionState tx,
 uint conflictPageId,
diff --git a/src/CSharpDB.Storage/Transactions/TransactionCoordinator.cs b/src/CSharpDB.Storage/Transactions/TransactionCoordinator.cs
index 0d4f2b3a..a7e48429 100644
--- a/src/CSharpDB.Storage/Transactions/TransactionCoordinator.cs
+++ b/src/CSharpDB.Storage/Transactions/TransactionCoordinator.cs
@@ -16,6 +16,7 @@ internal sealed class TransactionCoordinator : IDisposable
 private readonly SemaphoreSlim _writerLock = new(1, 1);
 private readonly SemaphoreSlim _commitLock = new(1, 1);
 private readonly Dictionary _pageLastWriteVersion = new();
+ private readonly Dictionary> _pendingPageImages = new();
 private readonly Dictionary _logicalLastWriteVersion = new();
 private readonly Dictionary _activeExplicitTransactions = new();
 private long _currentTransactionId;
@@ -395,6 +396,28 @@ public bool TryGetPageLastWriteVersion(uint pageId, out long lastWriteVersion)
 return _pageLastWriteVersion.TryGetValue(pageId, out lastWriteVersion);
 }
 
+ internal bool TryGetPendingPageImage(uint pageId, long commitVersion, out ReadOnlyMemory page)
+ {
+ lock (_pageVersionGate)
+ {
+ if (_pendingPageImages.TryGetValue(pageId, out List? images))
+ {
+ for (int i = images.Count - 1; i >= 0; i--)
+ {
+ PendingPageImage image = images[i];
+ if (image.CommitVersion == commitVersion)
+ {
+ page = image.Page;
+ return true;
+ }
+ }
+ }
+ }
+
+ page = default;
+ return false;
+ }
+
 public async ValueTask WaitForCommitStateChangeAsync(CancellationToken ct = default)
 {
 Task waitTask;
@@ -477,6 +500,13 @@ internal PendingCommitReservation ReservePendingCommit(
 uint[] pageIds,
 int pageCount,
 HashSet logicalWriteKeys)
+ => ReservePendingCommit(pageIds, pageCount, pageImages: null, logicalWriteKeys: logicalWriteKeys);
+
+ internal PendingCommitReservation ReservePendingCommit(
+ uint[] pageIds,
+ int pageCount,
+ IReadOnlyDictionary? pageImages,
+ HashSet logicalWriteKeys)
 {
 ArgumentNullException.ThrowIfNull(pageIds);
 ArgumentNullException.ThrowIfNull(logicalWriteKeys);
@@ -486,14 +516,20 @@ internal PendingCommitReservation ReservePendingCommit(
 (PendingPageVersion[] previousVersions, int previousVersionCount) = BuildPendingPageVersionBuffer(pageIds, pageCount);
 (PendingLogicalVersion[] previousLogicalVersions, int previousLogicalVersionCount) = BuildPendingLogicalVersionBuffer(logicalWriteKeys);
 
- return ReservePendingCommitCore(previousVersions, previousVersionCount, previousLogicalVersions, previousLogicalVersionCount);
+ return ReservePendingCommitCore(
+ previousVersions,
+ previousVersionCount,
+ previousLogicalVersions,
+ previousLogicalVersionCount,
+ pageImages);
 }
 
 private PendingCommitReservation ReservePendingCommitCore(
 PendingPageVersion[] previousVersions,
 int previousVersionCount,
 PendingLogicalVersion[] previousLogicalVersions,
- int previousLogicalVersionCount)
+ int previousLogicalVersionCount,
+ IReadOnlyDictionary? pageImages = null)
 {
 long commitVersion;
 lock (_pageVersionGate)
@@ -510,6 +546,8 @@ private PendingCommitReservation ReservePendingCommitCore(
 _pageLastWriteVersion[pageId] = commitVersion;
 }
 
+ TrackPendingPageImages_NoLock(commitVersion, previousVersions, previousVersionCount, pageImages);
+
 for (int i = 0; i < previousLogicalVersionCount; i++)
 {
 LogicalConflictKey logicalWriteKey = previousLogicalVersions[i].LogicalWriteKey;
@@ -557,6 +595,7 @@ public void PublishPendingCommit(PendingCommitReservation reservation)
 }
 finally
 {
+ RemovePendingPageImages(reservation);
 reservation.ReleaseBuffers();
 }
 }
@@ -604,6 +643,8 @@ public void RevertPendingCommit(PendingCommitReservation reservation)
 else
 _logicalLastWriteVersion.Remove(logicalWriteKey);
 }
+
+ RemovePendingPageImages_NoLock(reservation);
 }
 }
 finally
@@ -648,6 +689,52 @@ private void ReleasePendingCommitWindow()
 _beginBarrier.Release();
 }
 
+ private void TrackPendingPageImages_NoLock(
+ long commitVersion,
+ PendingPageVersion[] previousVersions,
+ int previousVersionCount,
+ IReadOnlyDictionary? pageImages)
+ {
+ if (pageImages is null)
+ return;
+
+ for (int i = 0; i < previousVersionCount; i++)
+ {
+ uint pageId = previousVersions[i].PageId;
+ if (!pageImages.TryGetValue(pageId, out byte[]? pageImage))
+ continue;
+
+ if (!_pendingPageImages.TryGetValue(pageId, out List? images))
+ {
+ images = [];
+ _pendingPageImages[pageId] = images;
+ }
+
+ images.Add(new PendingPageImage(commitVersion, pageImage.ToArray()));
+ }
+ }
+
+ private void RemovePendingPageImages(PendingCommitReservation reservation)
+ {
+ lock (_pageVersionGate)
+ RemovePendingPageImages_NoLock(reservation);
+ }
+
+ private void RemovePendingPageImages_NoLock(PendingCommitReservation reservation)
+ {
+ PendingPageVersion[] previousPageVersions = reservation.PreviousPageVersions;
+ for (int i = 0; i < reservation.PreviousPageVersionCount; i++)
+ {
+ uint pageId = previousPageVersions[i].PageId;
+ if (!_pendingPageImages.TryGetValue(pageId, out List? images))
+ continue;
+
+ images.RemoveAll(image => image.CommitVersion == reservation.CommitVersion);
+ if (images.Count == 0)
+ _pendingPageImages.Remove(pageId);
+ }
+ }
+
 private static (PendingPageVersion[] Buffer, int Count) BuildPendingPageVersionBuffer(IEnumerable pageIds)
 {
 if (pageIds.TryGetNonEnumeratedCount(out int count))
@@ -918,4 +1005,6 @@ internal void ReleaseBuffers()
 
 internal readonly record struct PendingPageVersion(uint PageId, long? PreviousVersion);
 internal readonly record struct PendingLogicalVersion(LogicalConflictKey LogicalWriteKey, long? PreviousVersion);
+
+ private readonly record struct PendingPageImage(long CommitVersion, byte[] Page);
 }
diff --git a/src/CSharpDB.Storage/Wal/MemoryWriteAheadLog.cs b/src/CSharpDB.Storage/Wal/MemoryWriteAheadLog.cs
index 199be061..434f79a7 100644
--- a/src/CSharpDB.Storage/Wal/MemoryWriteAheadLog.cs
+++ b/src/CSharpDB.Storage/Wal/MemoryWriteAheadLog.cs
@@ -110,6 +110,9 @@ void IWalRuntimeDiagnosticsProvider.ResetWalFlushDiagnostics()
 ExplicitLeafRebaseSuccessCount: 0,
 ExplicitLeafRebaseStructuralRejectCount: 0,
 ExplicitLeafRebaseCapacityRejectCount: 0,
+ ExplicitPendingLeafRebaseAttemptCount: 0,
+ ExplicitPendingLeafRebaseSuccessCount: 0,
+ ExplicitPendingLeafRebaseRejectCount: 0,
 ExplicitLeafRebaseRejectNonInsertOnlyCount: 0,
 ExplicitLeafRebaseRejectDuplicateKeyCount: 0,
 ExplicitLeafRebaseRejectSplitFallbackPreconditionCount: 0,
diff --git a/src/CSharpDB.Storage/Wal/WriteAheadLog.cs b/src/CSharpDB.Storage/Wal/WriteAheadLog.cs
index 2e306194..5f7e37a3 100644
--- a/src/CSharpDB.Storage/Wal/WriteAheadLog.cs
+++ b/src/CSharpDB.Storage/Wal/WriteAheadLog.cs
@@ -210,6 +210,9 @@ CommitPathDiagnosticsSnapshot ICommitPathDiagnosticsProvider.GetCommitPathDiagno
 ExplicitLeafRebaseSuccessCount: 0,
 ExplicitLeafRebaseStructuralRejectCount: 0,
 ExplicitLeafRebaseCapacityRejectCount: 0,
+ ExplicitPendingLeafRebaseAttemptCount: 0,
+ ExplicitPendingLeafRebaseSuccessCount: 0,
+ ExplicitPendingLeafRebaseRejectCount: 0,
 ExplicitLeafRebaseRejectNonInsertOnlyCount: 0,
 ExplicitLeafRebaseRejectDuplicateKeyCount: 0,
 ExplicitLeafRebaseRejectSplitFallbackPreconditionCount: 0,
diff --git a/tests/CSharpDB.Admin.Forms.Tests/Components/Shared/DataGridTests.cs b/tests/CSharpDB.Admin.Forms.Tests/Components/Shared/DataGridTests.cs
index 88428733..c6a8957b 100644
--- a/tests/CSharpDB.Admin.Forms.Tests/Components/Shared/DataGridTests.cs
+++ b/tests/CSharpDB.Admin.Forms.Tests/Components/Shared/DataGridTests.cs
@@ -241,6 +241,18 @@ public void ApplyOverfetchedPageRows_OnLastPage_ComputesExactTotal()
 Assert.Equal(7, GetAllRows(component).Count);
 }
 
+ [Fact]
+ public void BuildSelectSql_UsesBoundedLimitOffsetForUnknownTotalViewPages()
+ {
+ var component = new DataGrid();
+ SetField(component, "_page", 7);
+ SetField(component, "_pageSize", 50);
+
+ string sql = (string)InvokeNonPublic(component, "BuildSelectSql", "low_stock_watch", (int?)51)!;
+
+ Assert.Equal("SELECT * FROM low_stock_watch LIMIT 51 OFFSET 300", sql);
+ }
+
 private static Dictionary GetFilters(DataGrid component)
 => GetField>(component, "_filters");
 
diff --git a/tests/CSharpDB.Api.Tests/HttpTransportClientTests.cs b/tests/CSharpDB.Api.Tests/HttpTransportClientTests.cs
index 2b855141..a82a52ab 100644
--- a/tests/CSharpDB.Api.Tests/HttpTransportClientTests.cs
+++ b/tests/CSharpDB.Api.Tests/HttpTransportClientTests.cs
@@ -119,6 +119,32 @@ public async Task HttpTransport_SupportsTransactionsCollectionsSavedQueriesAndCh
 Assert.Equal(0, info.CollectionCount);
 }
 
+ [Fact]
+ public async Task HttpTransport_ExecutesPublicPlannerDiagnostics()
+ {
+ Assert.Null((await _client.ExecuteSqlAsync(
+ "CREATE TABLE http_planner_diag (id INTEGER PRIMARY KEY, value INTEGER);",
+ Ct)).Error);
+ Assert.Null((await _client.ExecuteSqlAsync(
+ "INSERT INTO http_planner_diag VALUES (1, 4), (2, 4), (3, 9);",
+ Ct)).Error);
+ Assert.Null((await _client.ExecuteSqlAsync("ANALYZE http_planner_diag;", Ct)).Error);
+
+ SqlExecutionResult catalog = await _client.ExecuteSqlAsync(
+ "SELECT COUNT(*) FROM sys.planner_histograms WHERE table_name = 'http_planner_diag';",
+ Ct);
+ Assert.Null(catalog.Error);
+ Assert.NotNull(catalog.Rows);
+ Assert.True(Convert.ToInt64(Assert.Single(catalog.Rows)[0]) > 0);
+
+ SqlExecutionResult explain = await _client.ExecuteSqlAsync(
+ "EXPLAIN ESTIMATE FOR SELECT * FROM http_planner_diag WHERE value = 4;",
+ Ct);
+ Assert.Null(explain.Error);
+ Assert.NotNull(explain.Rows);
+ Assert.Contains(explain.Rows, row => string.Equals(Convert.ToString(row[4]), "heavy-hitter", StringComparison.Ordinal));
+ }
+
 [Fact]
 public async Task RestApi_DefaultNoAuthModeStillWorks()
 {
diff --git a/tests/CSharpDB.Benchmarks/BENCHMARK_CATALOG.md b/tests/CSharpDB.Benchmarks/BENCHMARK_CATALOG.md
index 68fed6c3..3a2f55ef 100644
--- a/tests/CSharpDB.Benchmarks/BENCHMARK_CATALOG.md
+++ b/tests/CSharpDB.Benchmarks/BENCHMARK_CATALOG.md
@@ -58,6 +58,9 @@ These harnesses stay available, but they do not feed the main README unless they
 | `--commit-fan-in-diagnostics` | `diagnostic` | Shared auto-commit vs explicit transaction fan-in detail. |
 | `--insert-fan-in-diagnostics` | `diagnostic` | Disjoint key, hot right-edge, and auto-id insert fan-in detail. |
 | `--checkpoint-retention-diagnostics` | `diagnostic` | Background checkpoint progress while writers hold transactions. |
+| `--optimizer-closeout` | `diagnostic` | Advanced optimizer close-out coverage for heavy hitters, histogram ranges, composite-prefix correlation, and bounded join reordering. |
+| `--adaptive-reoptimization` | `diagnostic` | Opt-in adaptive join re-optimization coverage for disabled baseline, no-switch overhead, stale-stat fan-out, parameter-sensitive skew, hash build-side diagnostics, and synthetic switch counters. |
+| `--async-io-closeout` | `diagnostic` | Async I/O batching close-out audit for save/backup/restore, logical rewrites, and inspector/WAL scan paths. |
 | `--concurrent-sqlite-capi-compare` | `diagnostic` | CSharpDB/SQLite C API concurrent insert comparisons. |
 | `--concurrent-adonet-compare` | `diagnostic` | CSharpDB/SQLite ADO.NET concurrent insert comparisons. |
 | `--direct-file-cache-transport` | `diagnostic` | Direct client and tuned file-cache comparison. |
@@ -85,6 +88,7 @@ Scenario-specific commands such as `--durable-sql-batching-scenario`, `--concurr
 | `CollectionLookupFallbackBenchmarks` | `diagnostic` |
 | `CollectionPayloadBenchmarks` | `diagnostic` |
 | `CollectionSchemaBreadthBenchmarks` | `diagnostic` |
+| `GeneratedCollectionCodecBenchmarks` | `diagnostic` |
 | `CompositeGroupedIndexBenchmarks` | `diagnostic` |
 | `CoveringIndexBenchmarks` | `diagnostic` |
 | `DistinctBenchmarks` | `diagnostic` |
diff --git a/tests/CSharpDB.Benchmarks/CSharpDB.Benchmarks.csproj b/tests/CSharpDB.Benchmarks/CSharpDB.Benchmarks.csproj
index fab30757..fc65dd10 100644
--- a/tests/CSharpDB.Benchmarks/CSharpDB.Benchmarks.csproj
+++ b/tests/CSharpDB.Benchmarks/CSharpDB.Benchmarks.csproj
@@ -26,6 +26,7 @@
+
diff --git a/tests/CSharpDB.Benchmarks/Infrastructure/CommitPathDiagnosticsFormatter.cs b/tests/CSharpDB.Benchmarks/Infrastructure/CommitPathDiagnosticsFormatter.cs
index 12ca4891..92bbafc3 100644
--- a/tests/CSharpDB.Benchmarks/Infrastructure/CommitPathDiagnosticsFormatter.cs
+++ b/tests/CSharpDB.Benchmarks/Infrastructure/CommitPathDiagnosticsFormatter.cs
@@ -13,7 +13,7 @@ public static string BuildSummary(CommitPathDiagnosticsSnapshot diagnostics)
 string hotBTreeResources = BuildHotBTreeResourceSummary(diagnostics.BTreeResourceDiagnostics);
 return string.Create(
 CultureInfo.InvariantCulture,
- $"walAppends={diagnostics.WalAppendCount}, avgWalAppendMs={AverageMilliseconds(diagnostics.WalAppendTicks, diagnostics.WalAppendCount):F3}, explicitCommitLockWaits={diagnostics.ExplicitCommitLockWaitCount}, avgExplicitCommitLockWaitMs={AverageMilliseconds(diagnostics.ExplicitCommitLockWaitTicks, diagnostics.ExplicitCommitLockWaitCount):F3}, explicitCommitLockHolds={diagnostics.ExplicitCommitLockHoldCount}, avgExplicitCommitLockHoldMs={AverageMilliseconds(diagnostics.ExplicitCommitLockHoldTicks, diagnostics.ExplicitCommitLockHoldCount):F3}, explicitConflictResolutions={diagnostics.ExplicitConflictResolutionCount},
avgExplicitConflictResolutionMs={AverageMilliseconds(diagnostics.ExplicitConflictResolutionTicks, diagnostics.ExplicitConflictResolutionCount):F3}, leafRebases={diagnostics.ExplicitLeafRebaseAttemptCount}/{diagnostics.ExplicitLeafRebaseSuccessCount}/{diagnostics.ExplicitLeafRebaseStructuralRejectCount}/{diagnostics.ExplicitLeafRebaseCapacityRejectCount}, leafRejectReasons={diagnostics.ExplicitLeafRebaseRejectNonInsertOnlyCount}/{diagnostics.ExplicitLeafRebaseRejectDuplicateKeyCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackPreconditionCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackShapeCount}/{diagnostics.ExplicitLeafRebaseRejectOtherCount}, leafSplitPreconditions={diagnostics.ExplicitLeafRebaseRejectSplitFallbackMissingTraversalCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackDirtyAncestorCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackParentBoundaryCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackTargetPageDirtyCount}, dirtyParentRecoveries={diagnostics.ExplicitLeafRebaseRejectDirtyParentMissingParentPageCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentTransactionLeafNotSplitCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentBaseBoundaryMissingCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentInsertionShapeCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentInsertionMismatchCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentMissingLocalRightPageCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentLocalSplitShapeCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentRebaseFailureCount}, dirtyParentDescribeMatches={diagnostics.ExplicitLeafRebaseRejectDirtyParentDescribedInsertionMatchCount}, interiorRebases={diagnostics.ExplicitInteriorRebaseAttemptCount}/{diagnostics.ExplicitInteriorRebaseSuccessCount}/{diagnostics.ExplicitInteriorRebaseStructuralRejectCount}/{diagnostics.ExplicitInteriorRebaseCapacityRejectCount}, explicitPendingCommitWaits={diagnostics.ExplicitPendingCommitWaitCount}, 
avgExplicitPendingCommitWaitMs={AverageMilliseconds(diagnostics.ExplicitPendingCommitWaitTicks, diagnostics.ExplicitPendingCommitWaitCount):F3}, explicitHeaderPreparations={diagnostics.ExplicitHeaderPreparationCount}, avgExplicitHeaderPreparationMs={AverageMilliseconds(diagnostics.ExplicitHeaderPreparationTicks, diagnostics.ExplicitHeaderPreparationCount):F3}, explicitPendingCommitReservations={diagnostics.ExplicitPendingCommitReservationCount}, avgExplicitPendingCommitReservationMs={AverageMilliseconds(diagnostics.ExplicitPendingCommitReservationTicks, diagnostics.ExplicitPendingCommitReservationCount):F3}, durableBatchWindowWaits={diagnostics.DurableBatchWindowWaitCount}, avgDurableBatchWindowWaitMs={AverageMilliseconds(diagnostics.DurableBatchWindowWaitTicks, diagnostics.DurableBatchWindowWaitCount):F3}, pendingCommitWrites={diagnostics.PendingCommitWriteCount}, avgPendingCommitWriteMs={AverageMilliseconds(diagnostics.PendingCommitWriteTicks, diagnostics.PendingCommitWriteCount):F3}, pendingCommitDrains={diagnostics.PendingCommitDrainCount}, avgPendingCommitDrainMs={AverageMilliseconds(diagnostics.PendingCommitDrainTicks, diagnostics.PendingCommitDrainCount):F3}, bufferedFlushes={diagnostics.BufferedFlushCount}, avgBufferedFlushMs={AverageMilliseconds(diagnostics.BufferedFlushTicks, diagnostics.BufferedFlushCount):F3}, durableFlushes={diagnostics.DurableFlushCount}, avgDurableFlushMs={AverageMilliseconds(diagnostics.DurableFlushTicks, diagnostics.DurableFlushCount):F3}, publishBatches={diagnostics.PublishBatchCount}, avgPublishBatchMs={AverageMilliseconds(diagnostics.PublishBatchTicks, diagnostics.PublishBatchCount):F3}, finalizations={diagnostics.FinalizeCommitCount}, avgFinalizeMs={AverageMilliseconds(diagnostics.FinalizeCommitTicks, diagnostics.FinalizeCommitCount):F3}, checkpointDecisions={diagnostics.CheckpointDecisionCount}, avgCheckpointDecisionMs={AverageMilliseconds(diagnostics.CheckpointDecisionTicks, diagnostics.CheckpointDecisionCount):F3}, 
backgroundCheckpointStarts={diagnostics.BackgroundCheckpointStartCount}, btreeLeafSplits={diagnostics.BTreeLeafSplitCount}, btreeRightEdgeLeafSplits={diagnostics.BTreeRightEdgeLeafSplitCount}, btreeInteriorInserts={diagnostics.BTreeInteriorInsertCount}, btreeRightEdgeInteriorInserts={diagnostics.BTreeRightEdgeInteriorInsertCount}, btreeInteriorSplits={diagnostics.BTreeInteriorSplitCount}, btreeRightEdgeInteriorSplits={diagnostics.BTreeRightEdgeInteriorSplitCount}, btreeRootSplits={diagnostics.BTreeRootSplitCount}, hotBTreeResources={hotBTreeResources}, hashedAppendContext={diagnostics.HashedIndexAppendContextHitCount}/{diagnostics.HashedIndexAppendContextMissCount}, hashedAppendExternalMetadataReads={diagnostics.HashedIndexAppendExternalMetadataReadCount}, hashedAppendPromotions={diagnostics.HashedIndexAppendPromotionCount}, hashedAppendNotApplicable={diagnostics.HashedIndexAppendNotApplicableCount}, hashedAppendDeferred={diagnostics.HashedIndexDeferredAppendCount}/{diagnostics.HashedIndexDeferredFlushCount}, maxPendingCommits={diagnostics.MaxPendingCommitCount}, maxPendingCommitKiB={maxPendingCommitKiB:F1}");
+ $"walAppends={diagnostics.WalAppendCount}, avgWalAppendMs={AverageMilliseconds(diagnostics.WalAppendTicks, diagnostics.WalAppendCount):F3}, explicitCommitLockWaits={diagnostics.ExplicitCommitLockWaitCount}, avgExplicitCommitLockWaitMs={AverageMilliseconds(diagnostics.ExplicitCommitLockWaitTicks, diagnostics.ExplicitCommitLockWaitCount):F3}, explicitCommitLockHolds={diagnostics.ExplicitCommitLockHoldCount}, avgExplicitCommitLockHoldMs={AverageMilliseconds(diagnostics.ExplicitCommitLockHoldTicks, diagnostics.ExplicitCommitLockHoldCount):F3}, explicitConflictResolutions={diagnostics.ExplicitConflictResolutionCount}, avgExplicitConflictResolutionMs={AverageMilliseconds(diagnostics.ExplicitConflictResolutionTicks, diagnostics.ExplicitConflictResolutionCount):F3},
leafRebases={diagnostics.ExplicitLeafRebaseAttemptCount}/{diagnostics.ExplicitLeafRebaseSuccessCount}/{diagnostics.ExplicitLeafRebaseStructuralRejectCount}/{diagnostics.ExplicitLeafRebaseCapacityRejectCount}, pendingLeafRebases={diagnostics.ExplicitPendingLeafRebaseAttemptCount}/{diagnostics.ExplicitPendingLeafRebaseSuccessCount}/{diagnostics.ExplicitPendingLeafRebaseRejectCount}, leafRejectReasons={diagnostics.ExplicitLeafRebaseRejectNonInsertOnlyCount}/{diagnostics.ExplicitLeafRebaseRejectDuplicateKeyCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackPreconditionCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackShapeCount}/{diagnostics.ExplicitLeafRebaseRejectOtherCount}, leafSplitPreconditions={diagnostics.ExplicitLeafRebaseRejectSplitFallbackMissingTraversalCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackDirtyAncestorCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackParentBoundaryCount}/{diagnostics.ExplicitLeafRebaseRejectSplitFallbackTargetPageDirtyCount}, dirtyParentRecoveries={diagnostics.ExplicitLeafRebaseRejectDirtyParentMissingParentPageCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentTransactionLeafNotSplitCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentBaseBoundaryMissingCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentInsertionShapeCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentInsertionMismatchCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentMissingLocalRightPageCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentLocalSplitShapeCount}/{diagnostics.ExplicitLeafRebaseRejectDirtyParentRebaseFailureCount}, dirtyParentDescribeMatches={diagnostics.ExplicitLeafRebaseRejectDirtyParentDescribedInsertionMatchCount}, interiorRebases={diagnostics.ExplicitInteriorRebaseAttemptCount}/{diagnostics.ExplicitInteriorRebaseSuccessCount}/{diagnostics.ExplicitInteriorRebaseStructuralRejectCount}/{diagnostics.ExplicitInteriorRebaseCapacityRejectCount}, 
explicitPendingCommitWaits={diagnostics.ExplicitPendingCommitWaitCount}, avgExplicitPendingCommitWaitMs={AverageMilliseconds(diagnostics.ExplicitPendingCommitWaitTicks, diagnostics.ExplicitPendingCommitWaitCount):F3}, explicitHeaderPreparations={diagnostics.ExplicitHeaderPreparationCount}, avgExplicitHeaderPreparationMs={AverageMilliseconds(diagnostics.ExplicitHeaderPreparationTicks, diagnostics.ExplicitHeaderPreparationCount):F3}, explicitPendingCommitReservations={diagnostics.ExplicitPendingCommitReservationCount}, avgExplicitPendingCommitReservationMs={AverageMilliseconds(diagnostics.ExplicitPendingCommitReservationTicks, diagnostics.ExplicitPendingCommitReservationCount):F3}, durableBatchWindowWaits={diagnostics.DurableBatchWindowWaitCount}, avgDurableBatchWindowWaitMs={AverageMilliseconds(diagnostics.DurableBatchWindowWaitTicks, diagnostics.DurableBatchWindowWaitCount):F3}, pendingCommitWrites={diagnostics.PendingCommitWriteCount}, avgPendingCommitWriteMs={AverageMilliseconds(diagnostics.PendingCommitWriteTicks, diagnostics.PendingCommitWriteCount):F3}, pendingCommitDrains={diagnostics.PendingCommitDrainCount}, avgPendingCommitDrainMs={AverageMilliseconds(diagnostics.PendingCommitDrainTicks, diagnostics.PendingCommitDrainCount):F3}, bufferedFlushes={diagnostics.BufferedFlushCount}, avgBufferedFlushMs={AverageMilliseconds(diagnostics.BufferedFlushTicks, diagnostics.BufferedFlushCount):F3}, durableFlushes={diagnostics.DurableFlushCount}, avgDurableFlushMs={AverageMilliseconds(diagnostics.DurableFlushTicks, diagnostics.DurableFlushCount):F3}, publishBatches={diagnostics.PublishBatchCount}, avgPublishBatchMs={AverageMilliseconds(diagnostics.PublishBatchTicks, diagnostics.PublishBatchCount):F3}, finalizations={diagnostics.FinalizeCommitCount}, avgFinalizeMs={AverageMilliseconds(diagnostics.FinalizeCommitTicks, diagnostics.FinalizeCommitCount):F3}, checkpointDecisions={diagnostics.CheckpointDecisionCount}, 
avgCheckpointDecisionMs={AverageMilliseconds(diagnostics.CheckpointDecisionTicks, diagnostics.CheckpointDecisionCount):F3}, backgroundCheckpointStarts={diagnostics.BackgroundCheckpointStartCount}, btreeLeafSplits={diagnostics.BTreeLeafSplitCount}, btreeRightEdgeLeafSplits={diagnostics.BTreeRightEdgeLeafSplitCount}, btreeInteriorInserts={diagnostics.BTreeInteriorInsertCount}, btreeRightEdgeInteriorInserts={diagnostics.BTreeRightEdgeInteriorInsertCount}, btreeInteriorSplits={diagnostics.BTreeInteriorSplitCount}, btreeRightEdgeInteriorSplits={diagnostics.BTreeRightEdgeInteriorSplitCount}, btreeRootSplits={diagnostics.BTreeRootSplitCount}, hotBTreeResources={hotBTreeResources}, hashedAppendContext={diagnostics.HashedIndexAppendContextHitCount}/{diagnostics.HashedIndexAppendContextMissCount}, hashedAppendExternalMetadataReads={diagnostics.HashedIndexAppendExternalMetadataReadCount}, hashedAppendPromotions={diagnostics.HashedIndexAppendPromotionCount}, hashedAppendNotApplicable={diagnostics.HashedIndexAppendNotApplicableCount}, hashedAppendDeferred={diagnostics.HashedIndexDeferredAppendCount}/{diagnostics.HashedIndexDeferredFlushCount}, maxPendingCommits={diagnostics.MaxPendingCommitCount}, maxPendingCommitKiB={maxPendingCommitKiB:F1}");
 }
 
 private static string BuildHotBTreeResourceSummary(CommitPathBTreeResourceDiagnosticsSnapshot[]? resources)
diff --git a/tests/CSharpDB.Benchmarks/Macro/AdaptiveReoptimizationBenchmark.cs b/tests/CSharpDB.Benchmarks/Macro/AdaptiveReoptimizationBenchmark.cs
new file mode 100644
index 00000000..ec35f926
--- /dev/null
+++ b/tests/CSharpDB.Benchmarks/Macro/AdaptiveReoptimizationBenchmark.cs
@@ -0,0 +1,523 @@
+using System.Diagnostics;
+using CSharpDB.Benchmarks.Infrastructure;
+using CSharpDB.Engine;
+using CSharpDB.Execution;
+using CSharpDB.Primitives;
+using CSharpDB.Sql;
+
+namespace CSharpDB.Benchmarks.Macro;
+
+///
+/// Focused diagnostics for opt-in adaptive join re-optimization.
+/// Rows report internal adaptive counters so the benchmark is useful even when
+/// a workload is intentionally stable and should not switch plans.
+///
+public static class AdaptiveReoptimizationBenchmark
+{
+ private const int Iterations = 80;
+
+ private static readonly AdaptiveScenario[] s_scenarios =
+ [
+ new(
+ "DisabledStableJoin",
+ false,
+ AdaptiveThresholdMode.Default,
+ "SELECT COUNT(*) FROM adaptive_orders o JOIN adaptive_customers c ON c.id = o.customer_id WHERE o.status = 2",
+ "default-disabled baseline for a stable selective join"),
+ new(
+ "EnabledStableNoSwitch",
+ true,
+ AdaptiveThresholdMode.High,
+ "SELECT COUNT(*) FROM adaptive_orders o JOIN adaptive_customers c ON c.id = o.customer_id WHERE o.status = 2",
+ "enabled wrapper overhead when thresholds avoid adaptation"),
+ new(
+ "StaleStatJoinFanout",
+ true,
+ AdaptiveThresholdMode.Low,
+ "SELECT COUNT(*) FROM adaptive_orders o JOIN adaptive_customers c ON c.id = o.customer_id WHERE o.amount > 0",
+ "large post-ANALYZE fan-out shape that exposes stale range-stat divergence when the plan is lookup-driven"),
+ new(
+ "ParameterSensitiveSmall",
+ true,
+ AdaptiveThresholdMode.Low,
+ "SELECT COUNT(*) FROM adaptive_orders o JOIN adaptive_customers c ON c.id = o.customer_id WHERE o.status = 3",
+ "small selective value from the same cached query family"),
+ new(
+ "WrongHashBuildSide",
+ true,
+ AdaptiveThresholdMode.Low,
+ "SELECT COUNT(*) FROM adaptive_hash_left l JOIN adaptive_hash_right r ON r.code = l.code WHERE l.keep = 1 AND r.flag > 0",
+ "hash-build-side diagnostics for a large observed build candidate"),
+ ];
+
+ public static async Task> RunAsync()
+ {
+ string seedPath = await CreateSeedDatabaseAsync();
+ try
+ {
+ var results = new List(s_scenarios.Length + 2);
+ foreach (AdaptiveScenario scenario in s_scenarios)
+ results.Add(await RunScenarioAsync(seedPath, scenario));
+
+ results.Add(await RunSyntheticIndexSwitchAsync());
+ results.Add(await RunSyntheticHashBuildSwitchAsync());
+
+            return results;
+        }
+        finally
+        {
+            DeleteDatabaseFiles(seedPath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunScenarioAsync(string seedPath, AdaptiveScenario scenario)
+    {
+        string databasePath = CloneDatabaseFiles(seedPath, "adaptive-reoptimization");
+        try
+        {
+            DatabaseOptions options = BenchmarkDurability.Apply();
+            if (scenario.Enabled)
+                options = options.EnableAdaptiveQueryReoptimization(builder => ConfigureThresholds(builder, scenario.ThresholdMode));
+
+            await using var db = await Database.OpenAsync(databasePath, options);
+            db.ResetAdaptiveQueryReoptimizationDiagnostics();
+
+            for (int i = 0; i < 6; i++)
+                _ = await ExecuteScalarCountAsync(db, scenario.Sql);
+
+            var histogram = new LatencyHistogram();
+            long checksum = 0;
+            var sw = Stopwatch.StartNew();
+            for (int i = 0; i < Iterations; i++)
+            {
+                var querySw = Stopwatch.StartNew();
+                checksum += await ExecuteScalarCountAsync(db, scenario.Sql);
+                querySw.Stop();
+                histogram.Record(querySw.Elapsed.TotalMilliseconds);
+            }
+
+            sw.Stop();
+            var diagnostics = db.GetAdaptiveQueryReoptimizationDiagnosticsSnapshot();
+
+            var result = new BenchmarkResult
+            {
+                Name = $"AdaptiveReoptimization_{scenario.Name}",
+                TotalOps = Iterations,
+                ElapsedMs = sw.Elapsed.TotalMilliseconds,
+                P50Ms = histogram.Percentile(0.50),
+                P90Ms = histogram.Percentile(0.90),
+                P95Ms = histogram.Percentile(0.95),
+                P99Ms = histogram.Percentile(0.99),
+                P999Ms = histogram.Percentile(0.999),
+                MinMs = histogram.Min,
+                MaxMs = histogram.Max,
+                MeanMs = histogram.Mean,
+                StdDevMs = histogram.StdDev,
+                ExtraInfo =
+                    $"enabled={scenario.Enabled}, mode={scenario.ThresholdMode}, checksum={checksum}, " +
+                    $"eligible={diagnostics.EligibleQueryCount}, attempts={diagnostics.AttemptCount}, " +
+                    $"switches={diagnostics.SuccessfulSwitchCount}, rejected={diagnostics.RejectedSwitchCount}, " +
+                    $"divergence={diagnostics.DivergenceEventCount}, bufferedRows={diagnostics.BufferedRowCount}, " +
+                    $"maxBufferedFallback={diagnostics.MaxBufferedFallbackCount}, unsupportedFallback={diagnostics.UnsupportedFallbackCount}, " +
+                    $"limitFallback={diagnostics.ReoptimizationLimitFallbackCount}, focus={scenario.Focus}",
+            };
+
+            Console.WriteLine(
+                $" {result.Name}: {result.OpsPerSecond:N0} queries/sec, P50={result.P50Ms:F4}ms, P99={result.P99Ms:F4}ms");
+            Console.WriteLine($" {result.ExtraInfo}");
+            return result;
+        }
+        finally
+        {
+            DeleteDatabaseFiles(databasePath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunSyntheticIndexSwitchAsync()
+    {
+        var rows = CreateSingleColumnRows(4_096);
+        ColumnDefinition[] schema = [new() { Name = "value", Type = DbType.Integer }];
+        var counters = new AdaptiveRuntimeCounters();
+
+        for (int i = 0; i < 6; i++)
+            _ = await ExecuteSyntheticIndexSwitchOnceAsync(rows, schema, counters);
+
+        var histogram = new LatencyHistogram();
+        long checksum = 0;
+        var sw = Stopwatch.StartNew();
+        for (int i = 0; i < Iterations; i++)
+        {
+            var querySw = Stopwatch.StartNew();
+            checksum += await ExecuteSyntheticIndexSwitchOnceAsync(rows, schema, counters);
+            querySw.Stop();
+            histogram.Record(querySw.Elapsed.TotalMilliseconds);
+        }
+
+        sw.Stop();
+        var result = CreateSyntheticResult(
+            "AdaptiveReoptimization_SyntheticIndexSwitch",
+            sw.Elapsed.TotalMilliseconds,
+            histogram,
+            checksum,
+            counters,
+            "operator-level index-to-hash switch validation with a deliberately low outer estimate");
+        PrintResult(result);
+        return result;
+    }
+
+    private static async Task<BenchmarkResult> RunSyntheticHashBuildSwitchAsync()
+    {
+        ColumnDefinition[] leftSchema =
+        [
+            new() { Name = "id", Type = DbType.Integer },
+            new() { Name = "code", Type = DbType.Integer },
+        ];
+        ColumnDefinition[] rightSchema =
+        [
+            new() { Name = "code", Type = DbType.Integer },
+            new() { Name = "payload", Type = DbType.Integer },
+        ];
+        var leftRows = Enumerable.Range(1, 64)
+            .Select(i => new[] { DbValue.FromInteger(i), DbValue.FromInteger(i) })
+            .ToList();
+        var rightRows = Enumerable.Range(1, 4_096)
+            .Select(i => new[] { DbValue.FromInteger(((i - 1) % 64) + 1), DbValue.FromInteger(i * 10) })
+            .ToList();
+        var counters = new AdaptiveRuntimeCounters();
+
+        for (int i = 0; i < 6; i++)
+            _ = await ExecuteSyntheticHashSwitchOnceAsync(leftRows, rightRows, leftSchema, rightSchema, counters);
+
+        var histogram = new LatencyHistogram();
+        long checksum = 0;
+        var sw = Stopwatch.StartNew();
+        for (int i = 0; i < Iterations; i++)
+        {
+            var querySw = Stopwatch.StartNew();
+            checksum += await ExecuteSyntheticHashSwitchOnceAsync(leftRows, rightRows, leftSchema, rightSchema, counters);
+            querySw.Stop();
+            histogram.Record(querySw.Elapsed.TotalMilliseconds);
+        }
+
+        sw.Stop();
+        var result = CreateSyntheticResult(
+            "AdaptiveReoptimization_SyntheticHashBuildSwitch",
+            sw.Elapsed.TotalMilliseconds,
+            histogram,
+            checksum,
+            counters,
+            "operator-level hash build-side switch validation with a deliberately low build estimate");
+        PrintResult(result);
+        return result;
+    }
+
+    private static async Task<long> ExecuteSyntheticIndexSwitchOnceAsync(
+        List<DbValue[]> rows,
+        ColumnDefinition[] schema,
+        AdaptiveRuntimeCounters counters)
+    {
+        var lease = CreateSyntheticLease();
+        var op = new AdaptiveIndexNestedLoopJoinOperator(
+            new MaterializedOperator(rows, schema),
+            new MaterializedOperator([], schema),
+            schema,
+            source => source,
+            source => source,
+            lease,
+            counters.Diagnostics,
+            estimatedOuterRows: 64,
+            estimatedRowCount: rows.Count);
+
+        long count = 0;
+        await op.OpenAsync();
+        try
+        {
+            while (await op.MoveNextAsync())
+                count++;
+        }
+        finally
+        {
+            await op.DisposeAsync();
+        }
+
+        return count;
+    }
+
+    private static async Task<long> ExecuteSyntheticHashSwitchOnceAsync(
+        List<DbValue[]> leftRows,
+        List<DbValue[]> rightRows,
+        ColumnDefinition[] leftSchema,
+        ColumnDefinition[] rightSchema,
+        AdaptiveRuntimeCounters counters)
+    {
+        var compositeSchema = new TableSchema
+        {
+            TableName = "synthetic_hash_join",
+            Columns = leftSchema.Concat(rightSchema).ToArray(),
+        };
+        var lease = CreateSyntheticLease();
+        var op = new AdaptiveHashJoinOperator(
+            new MaterializedOperator(leftRows, leftSchema),
+            new MaterializedOperator(rightRows, rightSchema),
+            JoinType.Inner,
+            residualCondition: null,
+            compositeSchema,
+            leftColCount: leftSchema.Length,
+            rightColCount: rightSchema.Length,
+            leftKeyIndices: [1],
+            rightKeyIndices: [0],
+            plannedBuildRightSide: true,
+            estimatedLeftRows: leftRows.Count,
+            estimatedRightRows: 64,
+            estimatedRowCount: rightRows.Count,
+            DbFunctionRegistry.Empty,
+            lease,
+            counters.Diagnostics);
+
+        long count = 0;
+        await op.OpenAsync();
+        try
+        {
+            while (await op.MoveNextAsync())
+                count++;
+        }
+        finally
+        {
+            await op.DisposeAsync();
+        }
+
+        return count;
+    }
+
+    private static BenchmarkResult CreateSyntheticResult(
+        string name,
+        double elapsedMs,
+        LatencyHistogram histogram,
+        long checksum,
+        AdaptiveRuntimeCounters counters,
+        string focus)
+    {
+        return new BenchmarkResult
+        {
+            Name = name,
+            TotalOps = Iterations,
+            ElapsedMs = elapsedMs,
+            P50Ms = histogram.Percentile(0.50),
+            P90Ms = histogram.Percentile(0.90),
+            P95Ms = histogram.Percentile(0.95),
+            P99Ms = histogram.Percentile(0.99),
+            P999Ms = histogram.Percentile(0.999),
+            MinMs = histogram.Min,
+            MaxMs = histogram.Max,
+            MeanMs = histogram.Mean,
+            StdDevMs = histogram.StdDev,
+            ExtraInfo =
+                $"enabled=True, mode=Synthetic, checksum={checksum}, eligible=0, " +
+                $"attempts={counters.AttemptCount}, switches={counters.SuccessfulSwitchCount}, " +
+                $"rejected={counters.RejectedSwitchCount}, divergence={counters.DivergenceCount}, " +
+                $"bufferedRows={counters.BufferedRowCount}, maxBufferedFallback={counters.MaxBufferedFallbackCount}, " +
+                $"unsupportedFallback={counters.UnsupportedFallbackCount}, limitFallback={counters.ReoptimizationLimitFallbackCount}, " +
+                $"focus={focus}",
+        };
+    }
+
+    private static AdaptiveQueryExecutionLease CreateSyntheticLease()
+        => new(new AdaptiveQueryReoptimizationOptions
+        {
+            Enabled = true,
+            DivergenceFactor = 2,
+            MinimumObservedRows = 64,
+            MaxBufferedRows = 65_536,
+            MaxReoptimizationsPerQuery = 1,
+        });
+
+    private static AdaptiveQueryReoptimizationOptionsBuilder ConfigureThresholds(
+        AdaptiveQueryReoptimizationOptionsBuilder builder,
+        AdaptiveThresholdMode mode)
+    {
+        return mode switch
+        {
+            AdaptiveThresholdMode.Low => builder
+                .WithDivergenceFactor(2)
+                .WithMinimumObservedRows(32)
+                .WithMaxBufferedRows(65_536)
+                .WithMaxReoptimizationsPerQuery(1),
+            AdaptiveThresholdMode.High => builder
+                .WithDivergenceFactor(128)
+                .WithMinimumObservedRows(65_536)
+                .WithMaxBufferedRows(65_536)
+                .WithMaxReoptimizationsPerQuery(1),
+            _ => builder,
+        };
+    }
+
+    private static async Task<long> ExecuteScalarCountAsync(Database db, string sql)
+    {
+        await using var result = await db.ExecuteAsync(sql);
+        if (!await result.MoveNextAsync())
+            throw new InvalidOperationException($"Query did not return a row: {sql}");
+
+        return result.Current[0].AsInteger;
+    }
+
+    private static async Task<string> CreateSeedDatabaseAsync()
+    {
+        string filePath = Path.Combine(Path.GetTempPath(), $"adaptive_reoptimization_seed_{Guid.NewGuid():N}.db");
+        await using var db = await Database.OpenAsync(filePath, BenchmarkDurability.Apply());
+
+        await db.ExecuteAsync("CREATE TABLE adaptive_customers (id INTEGER PRIMARY KEY, region INTEGER NOT NULL, segment INTEGER NOT NULL)");
+        await db.ExecuteAsync("CREATE TABLE adaptive_regions (id INTEGER PRIMARY KEY, name TEXT NOT NULL)");
+        await db.ExecuteAsync("CREATE TABLE adaptive_orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, region INTEGER NOT NULL, status INTEGER NOT NULL, amount INTEGER NOT NULL)");
+        await db.ExecuteAsync("CREATE TABLE adaptive_hash_left (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, keep INTEGER NOT NULL)");
+        await db.ExecuteAsync("CREATE TABLE adaptive_hash_right (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, flag INTEGER NOT NULL, payload INTEGER NOT NULL)");
+        await db.ExecuteAsync("CREATE INDEX idx_adaptive_orders_status ON adaptive_orders(status)");
+        await db.ExecuteAsync("CREATE INDEX idx_adaptive_orders_customer ON adaptive_orders(customer_id)");
+        await db.ExecuteAsync("CREATE INDEX idx_adaptive_orders_region ON adaptive_orders(region)");
+
+        await db.BeginTransactionAsync();
+        for (int region = 1; region <= 16; region++)
+            await db.ExecuteAsync($"INSERT INTO adaptive_regions VALUES ({region}, 'Region {region}')");
+
+        for (int customer = 1; customer <= 2_000; customer++)
+        {
+            int region = ((customer - 1) % 16) + 1;
+            int segment = customer <= 64 ? 1 : 2;
+            await db.ExecuteAsync($"INSERT INTO adaptive_customers VALUES ({customer}, {region}, {segment})");
+        }
+
+        int orderId = 1;
+        for (; orderId <= 256; orderId++)
+        {
+            int customerId = ((orderId - 1) % 2_000) + 1;
+            int region = ((customerId - 1) % 16) + 1;
+            int status = orderId <= 32 ? 1 : 3;
+            await db.ExecuteAsync($"INSERT INTO adaptive_orders VALUES ({orderId}, {customerId}, {region}, {status}, {orderId * 7})");
+        }
+
+        for (int id = 1; id <= 64; id++)
+            await db.ExecuteAsync($"INSERT INTO adaptive_hash_left VALUES ({id}, {id}, 1)");
+
+        for (int id = 1; id <= 16; id++)
+            await db.ExecuteAsync($"INSERT INTO adaptive_hash_right VALUES ({id}, {id}, 1, {id * 11})");
+
+        await db.CommitAsync();
+
+        await db.ExecuteAsync("ANALYZE adaptive_customers");
+        await db.ExecuteAsync("ANALYZE adaptive_regions");
+        await db.ExecuteAsync("ANALYZE adaptive_orders");
+        await db.ExecuteAsync("ANALYZE adaptive_hash_left");
+        await db.ExecuteAsync("ANALYZE adaptive_hash_right");
+
+        await db.BeginTransactionAsync();
+        for (; orderId <= 24_000; orderId++)
+        {
+            int customerId = ((orderId - 1) % 2_000) + 1;
+            int region = ((customerId - 1) % 16) + 1;
+            int status = orderId <= 20_000 ? 1 : 2;
+            await db.ExecuteAsync($"INSERT INTO adaptive_orders VALUES ({orderId}, {customerId}, {region}, {status}, {orderId * 7})");
+        }
+
+        for (int id = 17; id <= 24_000; id++)
+        {
+            int code = ((id - 1) % 64) + 1;
+            await db.ExecuteAsync($"INSERT INTO adaptive_hash_right VALUES ({id}, {code}, 1, {id * 11})");
+        }
+
+        await db.CommitAsync();
+        await db.CheckpointAsync();
+
+        return filePath;
+    }
+
+    private static string CloneDatabaseFiles(string sourceFilePath, string prefix)
+    {
+        string destinationFilePath = Path.Combine(Path.GetTempPath(), $"{prefix}_{Guid.NewGuid():N}.db");
+        File.Copy(sourceFilePath, destinationFilePath, overwrite: true);
+
+        string sourceWalPath = sourceFilePath + ".wal";
+        if (File.Exists(sourceWalPath))
+            File.Copy(sourceWalPath, destinationFilePath + ".wal", overwrite: true);
+
+        return destinationFilePath;
+    }
+
+    private static void DeleteDatabaseFiles(string? filePath)
+    {
+        if (string.IsNullOrWhiteSpace(filePath))
+            return;
+
+        try { if (File.Exists(filePath)) File.Delete(filePath); } catch { }
+        try { if (File.Exists(filePath + ".wal")) File.Delete(filePath + ".wal"); } catch { }
+    }
+
+    private static List<DbValue[]> CreateSingleColumnRows(int count)
+    {
+        var rows = new List<DbValue[]>(count);
+        for (int i = 1; i <= count; i++)
+            rows.Add([DbValue.FromInteger(i)]);
+
+        return rows;
+    }
+
+    private static void PrintResult(BenchmarkResult result)
+    {
+        Console.WriteLine(
+            $" {result.Name}: {result.OpsPerSecond:N0} ops/sec, P50={result.P50Ms:F4}ms, P99={result.P99Ms:F4}ms");
+        Console.WriteLine($" {result.ExtraInfo}");
+    }
+
+    private sealed record AdaptiveScenario(
+        string Name,
+        bool Enabled,
+        AdaptiveThresholdMode ThresholdMode,
+        string Sql,
+        string Focus);
+
+    private enum AdaptiveThresholdMode
+    {
+        Default,
+        Low,
+        High,
+    }
+
+    private sealed class AdaptiveRuntimeCounters
+    {
+        public long AttemptCount { get; private set; }
+        public long SuccessfulSwitchCount { get; private set; }
+        public long RejectedSwitchCount { get; private set; }
+        public long DivergenceCount { get; private set; }
+        public long BufferedRowCount { get; private set; }
+        public long MaxBufferedFallbackCount { get; private set; }
+        public long ReoptimizationLimitFallbackCount { get; private set; }
+        public long UnsupportedFallbackCount { get; private set; }
+
+        public AdaptiveQueryReoptimizationRuntimeDiagnostics Diagnostics { get; }
+
+        public AdaptiveRuntimeCounters()
+        {
+            Diagnostics = new AdaptiveQueryReoptimizationRuntimeDiagnostics(
+                () => AttemptCount++,
+                () => SuccessfulSwitchCount++,
+                RecordRejectedSwitch,
+                () => DivergenceCount++,
+                count => BufferedRowCount += count);
+        }
+
+        private void RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason reason)
+        {
+            RejectedSwitchCount++;
+            switch (reason)
+            {
+                case AdaptiveQueryReoptimizationFallbackReason.MaxBufferedRows:
+                    MaxBufferedFallbackCount++;
+                    break;
+                case AdaptiveQueryReoptimizationFallbackReason.ReoptimizationLimit:
+                    ReoptimizationLimitFallbackCount++;
+                    break;
+                case AdaptiveQueryReoptimizationFallbackReason.Unsupported:
+                    UnsupportedFallbackCount++;
+                    break;
+            }
+        }
+    }
+}
diff --git a/tests/CSharpDB.Benchmarks/Macro/AsyncIoCloseOutBenchmark.cs b/tests/CSharpDB.Benchmarks/Macro/AsyncIoCloseOutBenchmark.cs
new file mode 100644
index 00000000..883d1a3f
--- /dev/null
+++ b/tests/CSharpDB.Benchmarks/Macro/AsyncIoCloseOutBenchmark.cs
@@ -0,0 +1,397 @@
+using System.Diagnostics;
+using CSharpDB.Benchmarks.Infrastructure;
+using CSharpDB.Engine;
+using CSharpDB.Primitives;
+using CSharpDB.Storage.Checkpointing;
+using CSharpDB.Storage.Diagnostics;
+using CSharpDB.Storage.Paging;
+
+namespace CSharpDB.Benchmarks.Macro;
+
+/// <summary>
+/// Measures the audited non-hot async I/O paths used to close the batching phase.
+/// Rows are diagnostic: they validate coverage and track throughput, but do not
+/// imply these paths should block releases without an explicit threshold.
+/// </summary>
+public static class AsyncIoCloseOutBenchmark
+{
+    private const int MaintenanceRows = 12_000;
+    private const int ForeignKeyRows = 8_000;
+    private const int WalRows = 1_200;
+
+    public static async Task<List<BenchmarkResult>> RunAsync()
+    {
+        string maintenanceSeedPath = await CreateMaintenanceSeedDatabaseAsync();
+        try
+        {
+            return
+            [
+                await RunSaveToFileAsync(maintenanceSeedPath),
+                await RunBackupAsync(maintenanceSeedPath),
+                await RunRestoreStagingAsync(maintenanceSeedPath),
+                await RunVacuumAsync(maintenanceSeedPath),
+                await RunForeignKeyMigrationAsync(),
+                await RunDatabaseInspectorAsync(maintenanceSeedPath),
+                await RunWalInspectorAsync(),
+            ];
+        }
+        finally
+        {
+            DeleteDatabaseFiles(maintenanceSeedPath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunSaveToFileAsync(string seedPath)
+    {
+        string sourcePath = CloneDatabaseFiles(seedPath, "async-io-save-source");
+        string destinationPath = Path.Combine(Path.GetTempPath(), $"async_io_save_{Guid.NewGuid():N}.db");
+        try
+        {
+            await using var db = await Database.OpenAsync(sourcePath, BenchmarkDurability.Apply());
+            var sw = Stopwatch.StartNew();
+            await db.SaveToFileAsync(destinationPath);
+            sw.Stop();
+
+            long bytes = GetFileLength(destinationPath);
+            return PrintFileResult(
+                "AsyncIoCloseOut_SaveToFile",
+                bytes,
+                sw.Elapsed.TotalMilliseconds,
+                "alreadyBatched=StorageDeviceCopyBatcher, path=Database.SaveToFileAsync");
+        }
+        finally
+        {
+            DeleteDatabaseFiles(sourcePath);
+            DeleteDatabaseFiles(destinationPath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunBackupAsync(string seedPath)
+    {
+        string sourcePath = CloneDatabaseFiles(seedPath, "async-io-backup-source");
+        string destinationPath = Path.Combine(Path.GetTempPath(), $"async_io_backup_{Guid.NewGuid():N}.db");
+        try
+        {
+            await using var db = await Database.OpenAsync(sourcePath, BenchmarkDurability.Apply());
+            var sw = Stopwatch.StartNew();
+            DatabaseBackupResult backup = await DatabaseBackupCoordinator.BackupAsync(
+                db,
+                sourcePath,
+                destinationPath,
+                withManifest: false);
+            sw.Stop();
+
+            return PrintFileResult(
+                "AsyncIoCloseOut_Backup",
+                backup.DatabaseFileBytes,
+                sw.Elapsed.TotalMilliseconds,
+                $"alreadyBatched=StorageDeviceCopyBatcher, path=DatabaseBackupCoordinator.BackupAsync, pages={backup.PhysicalPageCount}");
+        }
+        finally
+        {
+            DeleteDatabaseFiles(sourcePath);
+            DeleteDatabaseFiles(destinationPath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunRestoreStagingAsync(string seedPath)
+    {
+        string sourcePath = CloneDatabaseFiles(seedPath, "async-io-restore-source");
+        string destinationPath = Path.Combine(Path.GetTempPath(), $"async_io_restore_{Guid.NewGuid():N}.db");
+        try
+        {
+            var sw = Stopwatch.StartNew();
+            DatabaseRestoreResult restore = await DatabaseBackupCoordinator.RestoreAsync(
+                sourcePath,
+                destinationPath,
+                static _ => ValueTask.CompletedTask);
+            sw.Stop();
+
+            long bytes = GetFileLength(destinationPath);
+            return PrintFileResult(
+                "AsyncIoCloseOut_RestoreStaging",
+                bytes,
+                sw.Elapsed.TotalMilliseconds,
+                $"alreadyBatched=LoadIntoMemory+SaveToFile staging, path=DatabaseBackupCoordinator.RestoreAsync, pages={restore.PhysicalPageCount}");
+        }
+        finally
+        {
+            DeleteDatabaseFiles(sourcePath);
+            DeleteDatabaseFiles(destinationPath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunVacuumAsync(string seedPath)
+    {
+        string sourcePath = CloneDatabaseFiles(seedPath, "async-io-vacuum-source");
+        try
+        {
+            var sw = Stopwatch.StartNew();
+            DatabaseVacuumResult vacuum = await DatabaseMaintenanceCoordinator.VacuumAsync(sourcePath);
+            sw.Stop();
+
+            int totalOps = Math.Max(1, vacuum.PhysicalPageCountBefore);
+            var result = BuildSingleOperationResult(
+                "AsyncIoCloseOut_VacuumLogicalRewrite",
+                totalOps,
+                sw.Elapsed.TotalMilliseconds,
+                $"intentionallyLogical=BTreeCopyUtility, path=DatabaseMaintenanceCoordinator.VacuumAsync, pagesBefore={vacuum.PhysicalPageCountBefore}, pagesAfter={vacuum.PhysicalPageCountAfter}, bytesBefore={vacuum.DatabaseFileBytesBefore}, bytesAfter={vacuum.DatabaseFileBytesAfter}");
+
+            PrintResult(result, "pages/sec");
+            return result;
+        }
+        finally
+        {
+            DeleteDatabaseFiles(sourcePath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunForeignKeyMigrationAsync()
+    {
+        string sourcePath = await CreateForeignKeyMigrationSeedDatabaseAsync();
+        string backupPath = Path.Combine(Path.GetTempPath(), $"async_io_fk_backup_{Guid.NewGuid():N}.db");
+        try
+        {
+            var request = new DatabaseForeignKeyMigrationRequest
+            {
+                BackupDestinationPath = backupPath,
+                Constraints =
+                [
+                    new DatabaseForeignKeyMigrationConstraintSpec
+                    {
+                        TableName = "child_rows",
+                        ColumnName = "parent_id",
+                        ReferencedTableName = "parent_rows",
+                        ReferencedColumnName = "id",
+                        OnDelete = ForeignKeyOnDeleteAction.Restrict,
+                    },
+                ],
+            };
+
+            var sw = Stopwatch.StartNew();
+            DatabaseForeignKeyMigrationResult migration =
+                await DatabaseMaintenanceCoordinator.MigrateForeignKeysAsync(sourcePath, request);
+            sw.Stop();
+
+            int copiedRows = (int)Math.Clamp(migration.CopiedRows, 1, int.MaxValue);
+            var result = BuildSingleOperationResult(
+                "AsyncIoCloseOut_ForeignKeyMigrationRewrite",
+                copiedRows,
+                sw.Elapsed.TotalMilliseconds,
+                $"intentionallyLogical=BTreeCopyUtility, path=DatabaseMaintenanceCoordinator.MigrateForeignKeysAsync, affectedTables={migration.AffectedTables}, appliedForeignKeys={migration.AppliedForeignKeys}, copiedRows={migration.CopiedRows}, backup=true");
+
+            PrintResult(result, "rows/sec");
+            return result;
+        }
+        finally
+        {
+            DeleteDatabaseFiles(sourcePath);
+            DeleteDatabaseFiles(backupPath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunDatabaseInspectorAsync(string seedPath)
+    {
+        string sourcePath = CloneDatabaseFiles(seedPath, "async-io-inspect-source");
+        try
+        {
+            var sw = Stopwatch.StartNew();
+            DatabaseInspectReport report = await DatabaseInspector.InspectAsync(
+                sourcePath,
+                new DatabaseInspectOptions { IncludePages = true });
+            sw.Stop();
+
+            var result = BuildSingleOperationResult(
+                "AsyncIoCloseOut_DatabaseInspectorScan",
+                Math.Max(1, report.Header.PhysicalPageCount),
+                sw.Elapsed.TotalMilliseconds,
+                $"auditStatus=specializedDiagnostics, path=DatabaseInspector.InspectAsync, pages={report.Header.PhysicalPageCount}, issues={report.Issues.Count}");
+
+            PrintResult(result, "pages/sec");
+            return result;
+        }
+        finally
+        {
+            DeleteDatabaseFiles(sourcePath);
+        }
+    }
+
+    private static async Task<BenchmarkResult> RunWalInspectorAsync()
+    {
+        string sourcePath = Path.Combine(Path.GetTempPath(), $"async_io_wal_seed_{Guid.NewGuid():N}.db");
+        try
+        {
+            var options = BenchmarkDurability.Apply(new DatabaseOptions().ConfigureStorageEngine(builder =>
+            {
+                builder.UsePagerOptions(new PagerOptions
+                {
+                    CheckpointPolicy = new FrameCountCheckpointPolicy(int.MaxValue),
+                });
+            }));
+
+            await using var db = await Database.OpenAsync(sourcePath, options);
+            await db.ExecuteAsync("CREATE TABLE wal_rows (id INTEGER PRIMARY KEY, payload TEXT)");
+
+            await db.BeginTransactionAsync();
+            for (int id = 1; id <= WalRows; id++)
+                await db.ExecuteAsync($"INSERT INTO wal_rows VALUES ({id}, 'wal_payload_{id}')");
+            await db.CommitAsync();
+
+            var sw = Stopwatch.StartNew();
+            WalInspectReport report = await WalInspector.InspectAsync(sourcePath, options: null);
+            sw.Stop();
+
+            var result = BuildSingleOperationResult(
+                "AsyncIoCloseOut_WalInspectorScan",
+                Math.Max(1, report.FullFrameCount),
+                sw.Elapsed.TotalMilliseconds,
+                $"auditStatus=specializedDiagnostics, path=WalInspector.InspectAsync, frames={report.FullFrameCount}, commitFrames={report.CommitFrameCount}, walBytes={report.FileLengthBytes}, issues={report.Issues.Count}");
+
+            PrintResult(result, "frames/sec");
+            return result;
+        }
+        finally
+        {
+            DeleteDatabaseFiles(sourcePath);
+        }
+    }
+
+    private static async Task<string> CreateMaintenanceSeedDatabaseAsync()
+    {
+        string filePath = Path.Combine(Path.GetTempPath(), $"async_io_closeout_seed_{Guid.NewGuid():N}.db");
+        await using var db = await Database.OpenAsync(filePath, BenchmarkDurability.Apply());
+        await db.ExecuteAsync("CREATE TABLE bench (id INTEGER PRIMARY KEY, value INTEGER, text_col TEXT, category TEXT)");
+        await db.ExecuteAsync("CREATE INDEX idx_bench_category_value ON bench(category, value)");
+
+        const int batchSize = 500;
+        for (int batchStart = 1; batchStart <= MaintenanceRows; batchStart += batchSize)
+        {
+            int batchEnd = Math.Min(MaintenanceRows, batchStart + batchSize - 1);
+            await db.BeginTransactionAsync();
+            for (int id = batchStart; id <= batchEnd; id++)
+            {
+                string category = GetCategory(id);
+                await db.ExecuteAsync(
+                    $"INSERT INTO bench VALUES ({id}, {id * 10L}, 'maintenance_payload_{id}', '{category}')");
+            }
+
+            await db.CommitAsync();
+        }
+
+        await db.BeginTransactionAsync();
+        for (int id = 4; id <= MaintenanceRows; id += 4)
+            await db.ExecuteAsync($"DELETE FROM bench WHERE id = {id}");
+        await db.CommitAsync();
+        await db.CheckpointAsync();
+
+        return filePath;
+    }
+
+    private static async Task<string> CreateForeignKeyMigrationSeedDatabaseAsync()
+    {
+        string filePath = Path.Combine(Path.GetTempPath(), $"async_io_fk_seed_{Guid.NewGuid():N}.db");
+        await using var db = await Database.OpenAsync(filePath, BenchmarkDurability.Apply());
+        await db.ExecuteAsync("CREATE TABLE parent_rows (id INTEGER PRIMARY KEY, payload TEXT)");
+        await db.ExecuteAsync("CREATE TABLE child_rows (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, payload TEXT)");
+        await db.ExecuteAsync("CREATE INDEX idx_child_rows_parent_id ON child_rows(parent_id)");
+
+        const int batchSize = 500;
+        for (int batchStart = 1; batchStart <= ForeignKeyRows; batchStart += batchSize)
+        {
+            int batchEnd = Math.Min(ForeignKeyRows, batchStart + batchSize - 1);
+            await db.BeginTransactionAsync();
+            for (int id = batchStart; id <= batchEnd; id++)
+            {
+                await db.ExecuteAsync($"INSERT INTO parent_rows VALUES ({id}, 'parent_{id}')");
+                await db.ExecuteAsync($"INSERT INTO child_rows VALUES ({id}, {id}, 'child_{id}')");
+            }
+
+            await db.CommitAsync();
+        }
+
+        await db.CheckpointAsync();
+        return filePath;
+    }
+
+    private static BenchmarkResult PrintFileResult(
+        string name,
+        long bytes,
+        double elapsedMs,
+        string extraInfo)
+    {
+        int pages = Math.Max(1, (int)Math.Ceiling(bytes / (double)PageConstants.PageSize));
+        var result = BuildSingleOperationResult(
+            name,
+            pages,
+            elapsedMs,
+            $"{extraInfo}, bytes={bytes}, pages={pages}");
+
+        PrintResult(result, "pages/sec");
+        return result;
+    }
+
+    private static BenchmarkResult BuildSingleOperationResult(
+        string name,
+        int totalOps,
+        double elapsedMs,
+        string extraInfo)
+    {
+        return new BenchmarkResult
+        {
+            Name = name,
+            TotalOps = totalOps,
+            ElapsedMs = elapsedMs,
+            P50Ms = elapsedMs,
+            P90Ms = elapsedMs,
+            P95Ms = elapsedMs,
+            P99Ms = elapsedMs,
+            P999Ms = elapsedMs,
+            MinMs = elapsedMs,
+            MaxMs = elapsedMs,
+            MeanMs = elapsedMs,
+            StdDevMs = 0,
+            ExtraInfo = extraInfo,
+        };
+    }
+
+    private static void PrintResult(BenchmarkResult result, string unit)
+    {
+        Console.WriteLine(
+            $" {result.Name}: {result.OpsPerSecond:N0} {unit}, elapsed={result.ElapsedMs:F3}ms");
+        Console.WriteLine($" {result.ExtraInfo}");
+    }
+
+    private static string CloneDatabaseFiles(string sourceFilePath, string prefix)
+    {
+        string destinationFilePath = Path.Combine(Path.GetTempPath(), $"{prefix}_{Guid.NewGuid():N}.db");
+        File.Copy(sourceFilePath, destinationFilePath, overwrite: true);
+
+        string sourceWalPath = sourceFilePath + ".wal";
+        if (File.Exists(sourceWalPath))
+            File.Copy(sourceWalPath, destinationFilePath + ".wal", overwrite: true);
+
+        return destinationFilePath;
+    }
+
+    private static void DeleteDatabaseFiles(string? filePath)
+    {
+        if (string.IsNullOrWhiteSpace(filePath))
+            return;
+
+        try { if (File.Exists(filePath)) File.Delete(filePath); } catch { }
+        try { if (File.Exists(filePath + ".wal")) File.Delete(filePath + ".wal"); } catch { }
+        try { if (File.Exists(filePath + ".manifest.json")) File.Delete(filePath + ".manifest.json"); } catch { }
+    }
+
+    private static long GetFileLength(string path)
+        => File.Exists(path) ? new FileInfo(path).Length : 0;
+
+    private static string GetCategory(int id)
+        => (id % 4) switch
+        {
+            0 => "Alpha",
+            1 => "Beta",
+            2 => "Gamma",
+            _ => "Delta",
+        };
+}
diff --git a/tests/CSharpDB.Benchmarks/Macro/InsertFanInDiagnosticsBenchmark.cs b/tests/CSharpDB.Benchmarks/Macro/InsertFanInDiagnosticsBenchmark.cs
index 41e726ec..c5188fa4 100644
--- a/tests/CSharpDB.Benchmarks/Macro/InsertFanInDiagnosticsBenchmark.cs
+++ b/tests/CSharpDB.Benchmarks/Macro/InsertFanInDiagnosticsBenchmark.cs
@@ -85,7 +85,7 @@ private static InsertFanInScenario[] CreateScenarios()
             s_primaryWriterCounts,
             s_batchWindows);
 
-        // Shared auto-commit concurrent path: primary success case is disjoint explicit IDs.
+        // Shared auto-commit concurrent path: disjoint IDs plus hot right-edge and auto-ID coverage.
         AddScenarioMatrix(
             scenarios,
             CommitMode.AutoCommit,
@@ -104,7 +104,7 @@ private static InsertFanInScenario[] CreateScenarios()
             s_primaryWriterCounts,
             s_batchWindows);
 
-        // Hot right-edge remains the failure-boundary case for concurrent insert writers.
+        // Hot right-edge concurrent writers should build pending-commit fan-in instead of serial retry pressure.
         AddScenarioMatrix(
             scenarios,
             CommitMode.AutoCommit,
@@ -121,7 +121,7 @@ private static InsertFanInScenario[] CreateScenarios()
             s_boundaryWriterCounts,
             s_batchWindows);
 
-        // Auto-generated ID rows validate correctness/retry behavior, not best-case throughput.
+        // Auto-generated ID rows cover the row-ID reservation coordinator under the same fan-in pressure.
AddScenarioMatrix( scenarios, CommitMode.AutoCommit, @@ -241,9 +241,11 @@ private static async Task RunScenarioAsync(InsertFanInScenario bench.Db.ResetWalFlushDiagnostics(); bench.Db.ResetCommitPathDiagnostics(); + bench.Db.ResetRowIdReservationDiagnostics(); RunStats stats = await RunPhaseAsync(bench.Db, scenario, TimeSpan.FromSeconds(5), recordLatencies: true); WalFlushDiagnosticsSnapshot walDiagnostics = bench.Db.GetWalFlushDiagnosticsSnapshot(); CommitPathDiagnosticsSnapshot commitDiagnostics = bench.Db.GetCommitPathDiagnosticsSnapshot(); + RowIdReservationDiagnosticsSnapshot rowIdDiagnostics = bench.Db.GetRowIdReservationDiagnosticsSnapshot(); var histogram = new LatencyHistogram(); foreach (double latencyMs in stats.CommitLatenciesMs) @@ -276,7 +278,7 @@ private static async Task RunScenarioAsync(InsertFanInScenario MeanMs = histogram.Mean, StdDevMs = histogram.StdDev, ExtraInfo = - $"mode={scenario.Mode}, implicitInsertMode={scenario.ReportedImplicitInsertMode}, idMode={scenario.IdMode}, keyPattern={scenario.KeyPattern}, writers={scenario.WriterCount}, rowsPerCommit=1, batchWindow={FormatBatchWindow(scenario.BatchWindow)}, retryBudget={s_retryOptions.MaxRetries}, rowsPerSec={rowsPerSecond:F1}, successfulCommits={stats.SuccessfulCommits}, attempts={stats.AttemptCount}, extraAttempts={extraAttempts}, exhaustedConflicts={stats.ExhaustedConflictCount}, duplicateKeyFailures={stats.DuplicateKeyCount}, terminalFailures={stats.TerminalFailureCount}, flushes={walDiagnostics.FlushCount}, flushesPerSec={flushesPerSecond:F1}, commitsPerFlush={commitsPerFlush:F2}, KiBPerFlush={kibPerFlush:F1}, batchWindowWaits={walDiagnostics.BatchWindowWaitCount}, {commitSummary}", + $"mode={scenario.Mode}, implicitInsertMode={scenario.ReportedImplicitInsertMode}, idMode={scenario.IdMode}, keyPattern={scenario.KeyPattern}, writers={scenario.WriterCount}, rowsPerCommit=1, batchWindow={FormatBatchWindow(scenario.BatchWindow)}, retryBudget={s_retryOptions.MaxRetries}, 
rowsPerSec={rowsPerSecond:F1}, successfulCommits={stats.SuccessfulCommits}, attempts={stats.AttemptCount}, extraAttempts={extraAttempts}, exhaustedConflicts={stats.ExhaustedConflictCount}, duplicateKeyFailures={stats.DuplicateKeyCount}, terminalFailures={stats.TerminalFailureCount}, flushes={walDiagnostics.FlushCount}, flushesPerSec={flushesPerSecond:F1}, commitsPerFlush={commitsPerFlush:F2}, KiBPerFlush={kibPerFlush:F1}, batchWindowWaits={walDiagnostics.BatchWindowWaitCount}, rowIdReservations={rowIdDiagnostics.ReservationCount}, reservedRowIds={rowIdDiagnostics.ReservedRowIdCount}, {commitSummary}", }; Console.WriteLine( diff --git a/tests/CSharpDB.Benchmarks/Macro/OptimizerCloseOutBenchmark.cs b/tests/CSharpDB.Benchmarks/Macro/OptimizerCloseOutBenchmark.cs new file mode 100644 index 00000000..827e71f1 --- /dev/null +++ b/tests/CSharpDB.Benchmarks/Macro/OptimizerCloseOutBenchmark.cs @@ -0,0 +1,250 @@ +using System.Diagnostics; +using CSharpDB.Benchmarks.Infrastructure; +using CSharpDB.Engine; + +namespace CSharpDB.Benchmarks.Macro; + +/// +/// Focused close-out coverage for the phase-2 stats-guided optimizer work. +/// The rows intentionally compare the same query shape before and after ANALYZE. 
+/// +public static class OptimizerCloseOutBenchmark +{ + private const int Iterations = 120; + + private static readonly OptimizerScenario[] s_scenarios = + [ + new( + "HeavyHitterEquality", + "SELECT COUNT(*) FROM optimizer_skew WHERE hot_code = 1", + "heavy hitters suppress unselective non-unique lookup choices"), + new( + "HistogramColdRange", + "SELECT COUNT(*) FROM optimizer_hist WHERE value BETWEEN 1000 AND 1099", + "equi-depth histograms guide skewed numeric range estimates"), + new( + "CompositeCorrelation", + "SELECT COUNT(*) FROM optimizer_corr WHERE region = 'East' AND city = 'EastCity'", + "composite-prefix distinct counts preserve correlated equality selectivity"), + new( + "BoundedJoinReorder", + "SELECT COUNT(*) FROM optimizer_big b JOIN optimizer_mid m ON b.code = m.code JOIN optimizer_small s ON m.code = s.code JOIN optimizer_tiny t ON s.code = t.code WHERE b.id BETWEEN 1 AND 25", + "bounded DP reorders small inner-join chains with selective predicates"), + ]; + + public static async Task> RunAsync() + { + string seedPath = await CreateSeedDatabaseAsync(); + string noAnalyzePath = CloneDatabaseFiles(seedPath, "optimizer-closeout-no-analyze"); + string analyzedPath = CloneDatabaseFiles(seedPath, "optimizer-closeout-analyzed"); + + try + { + await using var noAnalyzeDb = await Database.OpenAsync(noAnalyzePath, BenchmarkDurability.Apply()); + await using var analyzedDb = await Database.OpenAsync(analyzedPath, BenchmarkDurability.Apply()); + await AnalyzeCloseOutTablesAsync(analyzedDb); + + var results = new List(s_scenarios.Length * 2); + foreach (OptimizerScenario scenario in s_scenarios) + { + results.Add(await RunScenarioAsync(noAnalyzeDb, scenario, analyzed: false)); + results.Add(await RunScenarioAsync(analyzedDb, scenario, analyzed: true)); + } + + return results; + } + finally + { + DeleteDatabaseFiles(seedPath); + DeleteDatabaseFiles(noAnalyzePath); + DeleteDatabaseFiles(analyzedPath); + } + } + + private static async Task RunScenarioAsync( + 
Database db, + OptimizerScenario scenario, + bool analyzed) + { + for (int i = 0; i < 8; i++) + _ = await ExecuteScalarCountAsync(db, scenario.Sql); + + var histogram = new LatencyHistogram(); + long checksum = 0; + var sw = Stopwatch.StartNew(); + for (int i = 0; i < Iterations; i++) + { + var querySw = Stopwatch.StartNew(); + checksum += await ExecuteScalarCountAsync(db, scenario.Sql); + querySw.Stop(); + histogram.Record(querySw.Elapsed.TotalMilliseconds); + } + + sw.Stop(); + + string phase = analyzed ? "Analyzed" : "NoAnalyze"; + var result = new BenchmarkResult + { + Name = $"OptimizerCloseOut_{phase}_{scenario.Name}", + TotalOps = Iterations, + ElapsedMs = sw.Elapsed.TotalMilliseconds, + P50Ms = histogram.Percentile(0.50), + P90Ms = histogram.Percentile(0.90), + P95Ms = histogram.Percentile(0.95), + P99Ms = histogram.Percentile(0.99), + P999Ms = histogram.Percentile(0.999), + MinMs = histogram.Min, + MaxMs = histogram.Max, + MeanMs = histogram.Mean, + StdDevMs = histogram.StdDev, + ExtraInfo = $"analyzed={analyzed}, iterations={Iterations}, checksum={checksum}, focus={scenario.Focus}", + }; + + Console.WriteLine( + $" {result.Name}: {result.OpsPerSecond:N0} queries/sec, P50={result.P50Ms:F4}ms, P99={result.P99Ms:F4}ms"); + Console.WriteLine($" {result.ExtraInfo}"); + return result; + } + + private static async Task ExecuteScalarCountAsync(Database db, string sql) + { + await using var result = await db.ExecuteAsync(sql); + if (!await result.MoveNextAsync()) + throw new InvalidOperationException($"Query did not return a row: {sql}"); + + return result.Current[0].AsInteger; + } + + private static async Task AnalyzeCloseOutTablesAsync(Database db) + { + string[] tableNames = + [ + "optimizer_skew", + "optimizer_hist", + "optimizer_corr", + "optimizer_big", + "optimizer_mid", + "optimizer_small", + "optimizer_tiny", + ]; + + foreach (string tableName in tableNames) + await db.ExecuteAsync($"ANALYZE {tableName}"); + } + + private static async Task<string>
CreateSeedDatabaseAsync() + { + string filePath = Path.Combine(Path.GetTempPath(), $"optimizer_closeout_seed_{Guid.NewGuid():N}.db"); + await using var db = await Database.OpenAsync(filePath, BenchmarkDurability.Apply()); + + await CreateSkewTableAsync(db); + await CreateHistogramTableAsync(db); + await CreateCorrelationTableAsync(db); + await CreateJoinTablesAsync(db); + await db.CheckpointAsync(); + + return filePath; + } + + private static async Task CreateSkewTableAsync(Database db) + { + await db.ExecuteAsync("CREATE TABLE optimizer_skew (id INTEGER PRIMARY KEY, hot_code INTEGER NOT NULL, payload INTEGER NOT NULL)"); + await db.ExecuteAsync("CREATE INDEX idx_optimizer_skew_hot_code ON optimizer_skew(hot_code)"); + + await db.BeginTransactionAsync(); + for (int i = 1; i <= 9_000; i++) + await db.ExecuteAsync($"INSERT INTO optimizer_skew VALUES ({i}, 1, {i * 3})"); + + for (int i = 9_001; i <= 10_000; i++) + await db.ExecuteAsync($"INSERT INTO optimizer_skew VALUES ({i}, {i}, {i * 3})"); + + await db.CommitAsync(); + } + + private static async Task CreateHistogramTableAsync(Database db) + { + await db.ExecuteAsync("CREATE TABLE optimizer_hist (id INTEGER PRIMARY KEY, value INTEGER NOT NULL, payload INTEGER NOT NULL)"); + await db.ExecuteAsync("CREATE INDEX idx_optimizer_hist_value ON optimizer_hist(value)"); + + await db.BeginTransactionAsync(); + int id = 1; + for (int value = 1; value <= 10; value++) + { + for (int repeat = 0; repeat < 700; repeat++, id++) + await db.ExecuteAsync($"INSERT INTO optimizer_hist VALUES ({id}, {value}, {id * 5})"); + } + + for (int value = 1000; value < 1100; value++, id++) + await db.ExecuteAsync($"INSERT INTO optimizer_hist VALUES ({id}, {value}, {id * 5})"); + + await db.CommitAsync(); + } + + private static async Task CreateCorrelationTableAsync(Database db) + { + await db.ExecuteAsync("CREATE TABLE optimizer_corr (id INTEGER PRIMARY KEY, region TEXT NOT NULL, city TEXT NOT NULL, payload INTEGER NOT NULL)"); + await 
db.ExecuteAsync("CREATE INDEX idx_optimizer_corr_region_city ON optimizer_corr(region, city)"); + + await db.BeginTransactionAsync(); + for (int i = 1; i <= 4_000; i++) + await db.ExecuteAsync($"INSERT INTO optimizer_corr VALUES ({i}, 'East', 'EastCity', {i})"); + + for (int i = 4_001; i <= 8_000; i++) + await db.ExecuteAsync($"INSERT INTO optimizer_corr VALUES ({i}, 'West', 'WestCity', {i})"); + + await db.CommitAsync(); + } + + private static async Task CreateJoinTablesAsync(Database db) + { + await db.ExecuteAsync("CREATE TABLE optimizer_big (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, payload INTEGER NOT NULL)"); + await db.ExecuteAsync("CREATE TABLE optimizer_mid (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, payload INTEGER NOT NULL)"); + await db.ExecuteAsync("CREATE TABLE optimizer_small (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, payload INTEGER NOT NULL)"); + await db.ExecuteAsync("CREATE TABLE optimizer_tiny (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, payload INTEGER NOT NULL)"); + await db.ExecuteAsync("CREATE INDEX idx_optimizer_big_code ON optimizer_big(code)"); + await db.ExecuteAsync("CREATE INDEX idx_optimizer_mid_code ON optimizer_mid(code)"); + await db.ExecuteAsync("CREATE INDEX idx_optimizer_small_code ON optimizer_small(code)"); + await db.ExecuteAsync("CREATE INDEX idx_optimizer_tiny_code ON optimizer_tiny(code)"); + + await db.BeginTransactionAsync(); + for (int i = 1; i <= 5_000; i++) + { + int code = ((i - 1) % 200) + 1; + await db.ExecuteAsync($"INSERT INTO optimizer_big VALUES ({i}, {code}, {i * 3})"); + } + + for (int i = 1; i <= 200; i++) + await db.ExecuteAsync($"INSERT INTO optimizer_mid VALUES ({i}, {i}, {i * 5})"); + + for (int i = 1; i <= 10; i++) + await db.ExecuteAsync($"INSERT INTO optimizer_small VALUES ({i}, {i}, {i * 7})"); + + for (int i = 1; i <= 2; i++) + await db.ExecuteAsync($"INSERT INTO optimizer_tiny VALUES ({i}, {i}, {i * 11})"); + + await db.CommitAsync(); + } + + private static string 
CloneDatabaseFiles(string sourceFilePath, string prefix) + { + string destinationFilePath = Path.Combine(Path.GetTempPath(), $"{prefix}_{Guid.NewGuid():N}.db"); + File.Copy(sourceFilePath, destinationFilePath, overwrite: true); + + string sourceWalPath = sourceFilePath + ".wal"; + if (File.Exists(sourceWalPath)) + File.Copy(sourceWalPath, destinationFilePath + ".wal", overwrite: true); + + return destinationFilePath; + } + + private static void DeleteDatabaseFiles(string? filePath) + { + if (string.IsNullOrWhiteSpace(filePath)) + return; + + try { if (File.Exists(filePath)) File.Delete(filePath); } catch { } + try { if (File.Exists(filePath + ".wal")) File.Delete(filePath + ".wal"); } catch { } + } + + private sealed record OptimizerScenario(string Name, string Sql, string Focus); +} diff --git a/tests/CSharpDB.Benchmarks/Micro/SystemCatalogBenchmarks.cs b/tests/CSharpDB.Benchmarks/Micro/SystemCatalogBenchmarks.cs index 989c50b9..7788aaff 100644 --- a/tests/CSharpDB.Benchmarks/Micro/SystemCatalogBenchmarks.cs +++ b/tests/CSharpDB.Benchmarks/Micro/SystemCatalogBenchmarks.cs @@ -20,6 +20,8 @@ public async Task GlobalSetup() { _bench = await BenchmarkDatabase.CreateWithSchemaAsync("CREATE TABLE seed (id INTEGER PRIMARY KEY, v INTEGER)"); await _bench.Db.ExecuteAsync("CREATE TABLE audit (id INTEGER)"); + await _bench.Db.ExecuteAsync("CREATE TABLE planner_diag (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, v INTEGER)"); + await _bench.Db.ExecuteAsync("CREATE INDEX idx_planner_diag_ab ON planner_diag(a, b)"); for (int i = 0; i < TableCount; i++) { @@ -27,6 +29,17 @@ public async Task GlobalSetup() await _bench.Db.ExecuteAsync($"CREATE INDEX idx_t{i}_v ON t{i}(v)"); } + await _bench.Db.BeginTransactionAsync(); + for (int i = 1; i <= 200; i++) + { + int a = i % 5; + int b = i % 10; + int v = i <= 150 ? 
1 : i; + await _bench.Db.ExecuteAsync($"INSERT INTO planner_diag VALUES ({i}, {a}, {b}, {v})"); + } + await _bench.Db.CommitAsync(); + await _bench.Db.ExecuteAsync("ANALYZE planner_diag"); + await _bench.Db.ExecuteAsync("CREATE VIEW v_seed AS SELECT id FROM seed"); await _bench.Db.ExecuteAsync( "CREATE TRIGGER trg_seed AFTER INSERT ON seed BEGIN INSERT INTO audit VALUES (NEW.id); END"); @@ -73,6 +86,35 @@ public async Task Sql_SysTriggersCount() await result.ToListAsync(); } + [Benchmark(Description = "SQL: SELECT COUNT(*) FROM sys.planner_histograms")] + public async Task Sql_PlannerHistogramsCount() + { + await using var result = await _bench.Db.ExecuteAsync("SELECT COUNT(*) FROM sys.planner_histograms"); + await result.ToListAsync(); + } + + [Benchmark(Description = "SQL: SELECT COUNT(*) FROM sys.planner_heavy_hitters")] + public async Task Sql_PlannerHeavyHittersCount() + { + await using var result = await _bench.Db.ExecuteAsync("SELECT COUNT(*) FROM sys.planner_heavy_hitters"); + await result.ToListAsync(); + } + + [Benchmark(Description = "SQL: SELECT COUNT(*) FROM sys.planner_index_prefix_stats")] + public async Task Sql_PlannerIndexPrefixStatsCount() + { + await using var result = await _bench.Db.ExecuteAsync("SELECT COUNT(*) FROM sys.planner_index_prefix_stats"); + await result.ToListAsync(); + } + + [Benchmark(Description = "SQL: EXPLAIN ESTIMATE skewed lookup")] + public async Task Sql_ExplainEstimateSkewedLookup() + { + await using var result = await _bench.Db.ExecuteAsync( + "EXPLAIN ESTIMATE FOR SELECT * FROM planner_diag WHERE v = 1"); + await result.ToListAsync(); + } + [Benchmark(Description = "API: GetIndexes().Count")] public int Api_GetIndexesCount() { diff --git a/tests/CSharpDB.Benchmarks/Micro/WalPointReadBenchmarks.cs b/tests/CSharpDB.Benchmarks/Micro/WalPointReadBenchmarks.cs new file mode 100644 index 00000000..e72d4dd9 --- /dev/null +++ b/tests/CSharpDB.Benchmarks/Micro/WalPointReadBenchmarks.cs @@ -0,0 +1,169 @@ +using 
BenchmarkDotNet.Attributes; +using CSharpDB.Benchmarks.Infrastructure; +using CSharpDB.Engine; +using CSharpDB.Sql; +using CSharpDB.Storage.Checkpointing; +using CSharpDB.Storage.Paging; +using CSharpDB.Storage.StorageEngine; + +namespace CSharpDB.Benchmarks.Micro; + +[MemoryDiagnoser] +[SimpleJob(warmupCount: 3, iterationCount: 10)] +public class WalPointReadBenchmarks +{ + private const int LookupBatchSize = 256; + private const int ProbeSequenceLength = 32_768; + private const int WalCheckpointThreshold = 1_000_000; + + private readonly SelectStatement[] _probeStatements = new SelectStatement[ProbeSequenceLength]; + + private string _dbPath = null!; + private Database _db = null!; + private int _cursor; + private long _sink; + + [Params(100, 1_000, 5_000, 10_000)] + public int TargetFrames { get; set; } + + [Params(WalPointReadState.WalBacked, WalPointReadState.Checkpointed)] + public WalPointReadState State { get; set; } + + [GlobalSetup] + public void GlobalSetup() + => GlobalSetupAsync().GetAwaiter().GetResult(); + + [GlobalCleanup] + public void GlobalCleanup() + => GlobalCleanupAsync().GetAwaiter().GetResult(); + + [Benchmark(OperationsPerInvoke = LookupBatchSize, Description = "SQL primary-key point read")] + public async Task SqlPrimaryKeyPointRead() + { + int offset = NextBatchOffset(); + for (int i = 0; i < LookupBatchSize; i++) + { + await using var result = await _db.ExecuteAsync(_probeStatements[(offset + i) % ProbeSequenceLength]); + if (await result.MoveNextAsync()) + _sink ^= result.Current[0].AsInteger; + } + } + + private async Task GlobalSetupAsync() + { + _dbPath = Path.Combine(Path.GetTempPath(), $"csharpdb_wal_point_read_bench_{Guid.NewGuid():N}.db"); + _db = await Database.OpenAsync(_dbPath, CreateOptions()); + await _db.ExecuteAsync("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER, data TEXT)"); + + int rowCount = Math.Max(1, TargetFrames / 2); + await SeedRowsAsync(rowCount); + + if (State == WalPointReadState.Checkpointed) + await 
_db.CheckpointAsync(); + + var lookupRng = new Random(123); + for (int i = 0; i < ProbeSequenceLength; i++) + _probeStatements[i] = CreatePointLookupStatement(lookupRng.Next(0, rowCount)); + + await WarmupAsync(); + } + + private async Task GlobalCleanupAsync() + { + if (_db != null) + await _db.DisposeAsync(); + + InMemoryBenchmarkDatabaseFactory.DeleteDatabaseFiles(_dbPath); + } + + private async Task SeedRowsAsync(int rowCount) + { + const int batchSize = 500; + var rng = new Random(42); + + for (int batchStart = 0; batchStart < rowCount; batchStart += batchSize) + { + await _db.BeginTransactionAsync(); + int batchEnd = Math.Min(batchStart + batchSize, rowCount); + for (int id = batchStart; id < batchEnd; id++) + { + string text = DataGenerator.RandomString(rng, 50); + await _db.ExecuteAsync($"INSERT INTO t VALUES ({id}, {rng.Next()}, '{text}')"); + } + + await _db.CommitAsync(); + } + } + + private async Task WarmupAsync() + { + for (int i = 0; i < LookupBatchSize; i++) + { + await using var result = await _db.ExecuteAsync(_probeStatements[i]); + if (await result.MoveNextAsync()) + _sink ^= result.Current[0].AsInteger; + } + } + + private static DatabaseOptions CreateOptions() + { + return BenchmarkDurability.Apply(new DatabaseOptions + { + StorageEngineOptions = new StorageEngineOptions + { + PagerOptions = new PagerOptions + { + CheckpointPolicy = new FrameCountCheckpointPolicy(WalCheckpointThreshold), + }, + }, + }); + } + + private static SelectStatement CreatePointLookupStatement(int id) + { + return new SelectStatement + { + IsDistinct = false, + Columns = + [ + new SelectColumn + { + IsStar = false, + Expression = new ColumnRefExpression { ColumnName = "val" }, + Alias = null, + }, + ], + From = new SimpleTableRef { TableName = "t" }, + Where = new BinaryExpression + { + Op = BinaryOp.Equals, + Left = new ColumnRefExpression { ColumnName = "id" }, + Right = new LiteralExpression + { + LiteralType = TokenType.IntegerLiteral, + Value = (long)id, + }, + }, 
+ GroupBy = null, + Having = null, + OrderBy = null, + Limit = null, + Offset = null, + }; + } + + private int NextBatchOffset() + { + int offset = _cursor; + _cursor += LookupBatchSize; + if (_cursor >= ProbeSequenceLength) + _cursor = 0; + return offset; + } +} + +public enum WalPointReadState +{ + WalBacked, + Checkpointed, +} diff --git a/tests/CSharpDB.Benchmarks/Program.cs b/tests/CSharpDB.Benchmarks/Program.cs index 483de1a4..deda01cb 100644 --- a/tests/CSharpDB.Benchmarks/Program.cs +++ b/tests/CSharpDB.Benchmarks/Program.cs @@ -130,6 +130,21 @@ await RunSuiteWithRepeatsAsync( repeatCount); return; + case "--optimizer-closeout": + EnsureReproConfigured(); + await RunSuiteWithRepeatsAsync("optimizer-closeout", RunOptimizerCloseOutOnceAsync, repeatCount); + return; + + case "--adaptive-reoptimization": + EnsureReproConfigured(); + await RunSuiteWithRepeatsAsync("adaptive-reoptimization", RunAdaptiveReoptimizationOnceAsync, repeatCount); + return; + + case "--async-io-closeout": + EnsureReproConfigured(); + await RunSuiteWithRepeatsAsync("async-io-closeout", RunAsyncIoCloseOutOnceAsync, repeatCount); + return; + case "--write-transaction-scenario": EnsureReproConfigured(); await RunSuiteWithRepeatsAsync( @@ -291,6 +306,15 @@ await RunBenchmarkPlanAsync( Console.WriteLine("=== Checkpoint Retention Benchmark ==="); await RunSuiteWithRepeatsAsync("checkpoint-retention-diagnostics", RunCheckpointRetentionDiagnosticsOnceAsync, repeatCount); Console.WriteLine(); + Console.WriteLine("=== Optimizer Close-Out Benchmark ==="); + await RunSuiteWithRepeatsAsync("optimizer-closeout", RunOptimizerCloseOutOnceAsync, repeatCount); + Console.WriteLine(); + Console.WriteLine("=== Adaptive Reoptimization Benchmark ==="); + await RunSuiteWithRepeatsAsync("adaptive-reoptimization", RunAdaptiveReoptimizationOnceAsync, repeatCount); + Console.WriteLine(); + Console.WriteLine("=== Async I/O Close-Out Benchmark ==="); + await RunSuiteWithRepeatsAsync("async-io-closeout", 
RunAsyncIoCloseOutOnceAsync, repeatCount); + Console.WriteLine(); Console.WriteLine("=== Hybrid Storage Mode Benchmark ==="); await RunSuiteWithRepeatsAsync("hybrid-storage-mode", RunHybridStorageModeOnceAsync, repeatCount); Console.WriteLine(); @@ -370,6 +394,30 @@ await RunBenchmarkPlanAsync( ranAny = true; } + if (requestedModes.Contains("--optimizer-closeout")) + { + EnsureReproConfigured(); + if (ranAny) Console.WriteLine(); + await RunSuiteWithRepeatsAsync("optimizer-closeout", RunOptimizerCloseOutOnceAsync, repeatCount); + ranAny = true; + } + + if (requestedModes.Contains("--adaptive-reoptimization")) + { + EnsureReproConfigured(); + if (ranAny) Console.WriteLine(); + await RunSuiteWithRepeatsAsync("adaptive-reoptimization", RunAdaptiveReoptimizationOnceAsync, repeatCount); + ranAny = true; + } + + if (requestedModes.Contains("--async-io-closeout")) + { + EnsureReproConfigured(); + if (ranAny) Console.WriteLine(); + await RunSuiteWithRepeatsAsync("async-io-closeout", RunAsyncIoCloseOutOnceAsync, repeatCount); + ranAny = true; + } + if (requestedModes.Contains("--concurrent-write-diagnostics")) { EnsureReproConfigured(); @@ -701,6 +749,24 @@ private static async Task> RunCheckpointRetentionScenarioO return [await CheckpointRetentionDiagnosticsBenchmark.RunNamedScenarioAsync(scenarioName)]; } + private static async Task> RunOptimizerCloseOutOnceAsync() + { + Console.WriteLine("--- Optimizer Close-Out Benchmark ---"); + return await OptimizerCloseOutBenchmark.RunAsync(); + } + + private static async Task> RunAdaptiveReoptimizationOnceAsync() + { + Console.WriteLine("--- Adaptive Reoptimization Benchmark ---"); + return await AdaptiveReoptimizationBenchmark.RunAsync(); + } + + private static async Task> RunAsyncIoCloseOutOnceAsync() + { + Console.WriteLine("--- Async I/O Close-Out Benchmark ---"); + return await AsyncIoCloseOutBenchmark.RunAsync(); + } + private static async Task> RunWriteTransactionScenarioOnceAsync(string scenarioName) { 
Console.WriteLine($"--- Explicit WriteTransaction Scenario: {scenarioName} ---"); @@ -872,6 +938,9 @@ private static Task RunSuiteByKeyAsync(string suiteKey, int repeatCount) "commit-fan-in-diagnostics" => RunSuiteWithRepeatsAsync("commit-fan-in-diagnostics", RunCommitFanInDiagnosticsOnceAsync, repeatCount), "insert-fan-in-diagnostics" => RunSuiteWithRepeatsAsync("insert-fan-in-diagnostics", RunInsertFanInDiagnosticsOnceAsync, repeatCount), "checkpoint-retention-diagnostics" => RunSuiteWithRepeatsAsync("checkpoint-retention-diagnostics", RunCheckpointRetentionDiagnosticsOnceAsync, repeatCount), + "optimizer-closeout" => RunSuiteWithRepeatsAsync("optimizer-closeout", RunOptimizerCloseOutOnceAsync, repeatCount), + "adaptive-reoptimization" => RunSuiteWithRepeatsAsync("adaptive-reoptimization", RunAdaptiveReoptimizationOnceAsync, repeatCount), + "async-io-closeout" => RunSuiteWithRepeatsAsync("async-io-closeout", RunAsyncIoCloseOutOnceAsync, repeatCount), "concurrent-write-diagnostics" => RunSuiteWithRepeatsAsync("concurrent-write-diagnostics", RunConcurrentWriteDiagnosticsOnceAsync, repeatCount), "concurrent-sqlite-capi-compare" => RunSuiteWithRepeatsAsync("concurrent-sqlite-capi-compare", RunConcurrentSqliteCApiComparisonOnceAsync, repeatCount), "concurrent-adonet-compare" => RunSuiteWithRepeatsAsync("concurrent-adonet-compare", RunConcurrentAdoNetComparisonOnceAsync, repeatCount), @@ -1171,6 +1240,9 @@ private static void PrintHelp() Console.WriteLine(" dotnet run -- --insert-fan-in-scenario AutoCommit_ExplicitId_W8_Batch250us Run one insert fan-in scenario"); Console.WriteLine(" dotnet run -- --checkpoint-retention-diagnostics Run focused background-checkpoint retention diagnostics"); Console.WriteLine(" dotnet run -- --checkpoint-retention-scenario W8_Blocker3s_Batch250us Run one checkpoint-retention scenario"); + Console.WriteLine(" dotnet run -- --optimizer-closeout Run focused advanced optimizer close-out diagnostics"); + Console.WriteLine(" dotnet run -- 
--adaptive-reoptimization Run focused opt-in adaptive join reoptimization diagnostics"); + Console.WriteLine(" dotnet run -- --async-io-closeout Run focused async I/O batching close-out diagnostics"); Console.WriteLine(" dotnet run -- --write-transaction-scenario UpdateDisjoint_W8_Rows1_Batch250us_Prealloc1MiB Run one explicit WriteTransaction scenario"); Console.WriteLine(" dotnet run -- --concurrent-write-diagnostics Run focused multi-writer durable commit diagnostics"); Console.WriteLine(" dotnet run -- --concurrent-write-scenario W8_Batch250us_Prealloc1MiB Run one concurrent durable-write scenario"); @@ -1195,7 +1267,7 @@ private static void PrintHelp() Console.WriteLine(" dotnet run -- --release-core --repeat 3 --repro Run only the suites that feed published README tables"); Console.WriteLine(" dotnet run -- --stress Run stress & durability tests"); Console.WriteLine(" dotnet run -- --scaling Run scaling experiments"); - Console.WriteLine(" dotnet run -- --macro --stress --scaling --write-diagnostics --durable-sql-batching --write-transaction-diagnostics --commit-fan-in-diagnostics --insert-fan-in-diagnostics --checkpoint-retention-diagnostics --concurrent-write-diagnostics --concurrent-sqlite-capi-compare --direct-file-cache-transport --hybrid-storage-mode --master-table --sqlite-compare --strict-insert-compare --native-aot-insert-compare --efcore-compare --efcore-compare-hybrid-shared-connection --efcore-compare-auto-open-close --hybrid-cold-open --hybrid-hot-set-read --hybrid-post-checkpoint Run non-micro suites in one invocation"); + Console.WriteLine(" dotnet run -- --macro --stress --scaling --write-diagnostics --durable-sql-batching --write-transaction-diagnostics --commit-fan-in-diagnostics --insert-fan-in-diagnostics --checkpoint-retention-diagnostics --optimizer-closeout --adaptive-reoptimization --async-io-closeout --concurrent-write-diagnostics --concurrent-sqlite-capi-compare --direct-file-cache-transport --hybrid-storage-mode --master-table 
--sqlite-compare --strict-insert-compare --native-aot-insert-compare --efcore-compare --efcore-compare-hybrid-shared-connection --efcore-compare-auto-open-close --hybrid-cold-open --hybrid-hot-set-read --hybrid-post-checkpoint Run non-micro suites in one invocation"); Console.WriteLine(" dotnet run -- --macro --repeat 3 Repeat suite and emit median-of-N CSV"); Console.WriteLine(" dotnet run -- --master-table --repeat 3 --repro Run a stable median master comparison refresh"); Console.WriteLine(" dotnet run -- --sqlite-compare --repeat 3 --repro Run a stable local SQLite median comparison capture"); diff --git a/tests/CSharpDB.Benchmarks/README.md b/tests/CSharpDB.Benchmarks/README.md index 1a2d3c0e..7a88f9e1 100644 --- a/tests/CSharpDB.Benchmarks/README.md +++ b/tests/CSharpDB.Benchmarks/README.md @@ -25,7 +25,7 @@ Current release health: | Item | Status | |---|---| | Latest release guardrail | `PASS` | -| Latest compare | `PASS=185, WARN=0, SKIP=0, FAIL=0` | +| Latest compare | `PASS=187, WARN=0, SKIP=0, FAIL=0` | -| Promotion state | Current published tables are promoted from the April 26, 2026 release-core suite | +| Promotion state | Current published tables are promoted from the May 6, 2026 release-core suite | | Durability default | CSharpDB values are durable unless a row explicitly says otherwise | @@ -40,25 +40,25 @@ The generated block below contains the scorecard first, then the current core re | Field | Value | |---|---| -| Published snapshot | April 26, 2026 release-core snapshot | -| Run date | Release-core artifacts captured April 26, 2026 PT; release guardrail compare captured April 27, 2026 UTC | -| Promotion status | Promoted after release-core suite and release guardrail compare passed; focused micro retry replaced volatile guardrail samples | -| Latest release guardrail | PASS=185, WARN=0, SKIP=0, FAIL=0 | +| Published snapshot | May 6, 2026 release-core snapshot | +| Run date | Release-core artifacts captured May 6, 2026 UTC; latest release guardrail compare captured May 6, 2026 UTC | +| Promotion status | Published tables are promoted from the
May 6 release-core suite; latest release guardrail compare passed after focused close-out validation | +| Latest release guardrail | PASS=187, WARN=0, SKIP=0, FAIL=0 | | Runner | Intel i9-11900K, 16 logical cores, Windows 10.0.26300, .NET SDK 10.0.203, .NET runtime 10.0.7 | | Repro mode | priority=High, affinity=0xFF when captured with --repro | -| Commit | b7cb52ee2c30f31538e96480b6d055ff52439c26 plus uncommitted collection binary payload and benchmark updates | +| Commit | 47a700950a150669ce404294c594dd845550f460 | ### Approved Source Artifacts | Artifact | Command | Source CSV | |---|---|---| -| `master` | `--master-table --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/master-table-20260426-215529-median-of-3.csv` | -| `batching` | `--durable-sql-batching --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/durable-sql-batching-20260426-221413-median-of-3.csv` | -| `concurrent` | `--concurrent-write-diagnostics --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/concurrent-write-diagnostics-20260426-223659-median-of-3.csv` | -| `storage` | `--hybrid-storage-mode --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-storage-mode-20260426-224331-median-of-3.csv` | -| `hotset` | `--hybrid-hot-set-read --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-hot-set-read-20260426-225908-median-of-3.csv` | -| `coldopen` | `--hybrid-cold-open --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-cold-open-20260426-225949-median-of-3.csv` | -| `sqlite` | `--sqlite-compare --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/sqlite-compare-20260426-230045-median-of-3.csv` | +| `master` | `--master-table --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/master-table-20260506-024609-median-of-3.csv` | +| `batching` | `--durable-sql-batching --repeat 3 --repro` 
| `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/durable-sql-batching-20260506-030458-median-of-3.csv` | +| `concurrent` | `--concurrent-write-diagnostics --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/concurrent-write-diagnostics-20260506-032735-median-of-3.csv` | +| `storage` | `--hybrid-storage-mode --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-storage-mode-20260506-033407-median-of-3.csv` | +| `hotset` | `--hybrid-hot-set-read --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-hot-set-read-20260506-034948-median-of-3.csv` | +| `coldopen` | `--hybrid-cold-open --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-cold-open-20260506-035030-median-of-3.csv` | +| `sqlite` | `--sqlite-compare --repeat 3 --repro` | `tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/sqlite-compare-20260506-035128-median-of-3.csv` | ### Scorecard @@ -66,16 +66,16 @@ These are the headline rows readers should use first. 
Detailed tables below map | Area | Metric | Result | Source | |---|---|---|---| -| Release health | Latest guardrail compare | PASS: PASS=185, FAIL=0 | `Compare-Baseline.ps1` after focused micro retry | -| SQL durable write | Single INSERT | 450.4 ops/sec | `master` | -| SQL durable write | Batch x100 | 41.88K rows/sec | `master` | -| SQL hot read | Point lookup | 1.27M ops/sec | `master` | -| SQL concurrent read | 8 readers, reused snapshots x32 | 9.97M COUNT(*) ops/sec | `master` | -| Collection hot read | Point Get | 1.60M ops/sec | `master` | -| Single-writer ingest | InsertBatch B1000 | 233.06K rows/sec | `batching` | -| Concurrent durable write | W8, 250us commit window | 891.0 commits/sec | `concurrent` | -| Resident hot set | Hybrid hot-set SQL burst | 311.62K ops/sec | `hotset` | -| Local SQLite reference | SQLite WAL+FULL B1000 | 203.30K rows/sec | `sqlite` | +| Release health | Latest guardrail compare | PASS: PASS=187, FAIL=0 | `Run-Perf-Guardrails.ps1 -Mode release` after close-out validation | +| SQL durable write | Single INSERT | 267.1 ops/sec | `master` | +| SQL durable write | Batch x100 | 25.56K rows/sec | `master` | +| SQL hot read | Point lookup | 1.48M ops/sec | `master` | +| SQL concurrent read | 8 readers, reused snapshots x32 | 9.68M COUNT(*) ops/sec | `master` | +| Collection hot read | Point Get | 1.99M ops/sec | `master` | +| Single-writer ingest | InsertBatch B1000 | 211.99K rows/sec | `batching` | +| Concurrent durable write | W8, 250us commit window | 890.1 commits/sec | `concurrent` | +| Resident hot set | Hybrid hot-set SQL burst | 383.87K ops/sec | `hotset` | +| Local SQLite reference | SQLite WAL+FULL B1000 | 155.66K rows/sec | `sqlite` | ## Current Core Results @@ -85,21 +85,21 @@ These detailed tables are generated from the approved source artifacts listed ab | Surface | Single write | Batch x100 | Point read | Concurrent read | |---|---|---|---|---| -| SQL file-backed | 450.4 ops/sec | 41.88K rows/sec | 1.27M ops/sec | 9.97M 
COUNT(*) ops/sec | -| SQL hybrid incremental-durable | 449.3 ops/sec | 41.77K rows/sec | 1.29M ops/sec | 10.27M COUNT(*) ops/sec | -| SQL in-memory | 194.98K ops/sec | 708.75K rows/sec | 1.24M ops/sec | 10.09M COUNT(*) ops/sec | -| Collection file-backed | 447.3 ops/sec | 42.28K docs/sec | 1.60M ops/sec | - | -| Collection hybrid incremental-durable | 450.8 ops/sec | 42.34K docs/sec | 1.66M ops/sec | - | -| Collection in-memory | 205.25K ops/sec | 872.89K docs/sec | 1.59M ops/sec | - | +| SQL file-backed | 267.1 ops/sec | 25.56K rows/sec | 1.48M ops/sec | 9.68M COUNT(*) ops/sec | +| SQL hybrid incremental-durable | 276.1 ops/sec | 26.55K rows/sec | 1.47M ops/sec | 10.04M COUNT(*) ops/sec | +| SQL in-memory | 259.48K ops/sec | 934.22K rows/sec | 1.49M ops/sec | 10.26M COUNT(*) ops/sec | +| Collection file-backed | 265.7 ops/sec | 24.53K docs/sec | 1.99M ops/sec | - | +| Collection hybrid incremental-durable | 276.9 ops/sec | 25.75K docs/sec | 2.02M ops/sec | - | +| Collection in-memory | 262.14K ops/sec | 969.55K docs/sec | 2.02M ops/sec | - | ### Single-Writer Durable Ingest | Batch shape | Rows/sec | P50 | P99 | |---|---|---|---| -| InsertBatch B1 | 358.4 rows/sec | 2.7074 ms | 4.1941 ms | -| InsertBatch B100 | 33.92K rows/sec | 2.7875 ms | 4.2581 ms | -| InsertBatch B1000 | 233.06K rows/sec | 3.5123 ms | 7.7143 ms | -| InsertBatch B10000 | 618.81K rows/sec | 10.9217 ms | 162.9867 ms | +| InsertBatch B1 | 271.1 rows/sec | 3.5697 ms | 6.0705 ms | +| InsertBatch B100 | 26.04K rows/sec | 3.6636 ms | 7.7177 ms | +| InsertBatch B1000 | 211.99K rows/sec | 4.0197 ms | 8.0891 ms | +| InsertBatch B10000 | 799.29K rows/sec | 8.7654 ms | 118.3244 ms | ### Concurrent Durable Writes @@ -107,47 +107,175 @@ Each row is total successful commits/sec across one shared engine. 
The intended | Scenario | Commits/sec | Commits/flush | P50 | P99 | |---|---|---|---|---| -| W4, window 0 | 231.1 commits/sec | 1.00 | 17.0038 ms | 23.3867 ms | -| W4, window 250us | 449.6 commits/sec | 1.99 | 8.6405 ms | 15.8344 ms | -| W8, window 0 | 240.5 commits/sec | 1.00 | 32.8008 ms | 41.7800 ms | -| W8, window 250us | 891.0 commits/sec | 3.92 | 8.4358 ms | 16.9718 ms | +| W4, window 0 | 247.0 commits/sec | 1.00 | 15.8147 ms | 23.2742 ms | +| W4, window 250us | 463.4 commits/sec | 1.99 | 8.2404 ms | 16.5526 ms | +| W8, window 0 | 239.2 commits/sec | 1.00 | 32.7490 ms | 49.7798 ms | +| W8, window 250us | 890.1 commits/sec | 3.94 | 8.4327 ms | 17.7755 ms | ### Storage Mode Hot Steady State | Mode | SQL insert | SQL batch x100 | SQL point lookup | Collection put | Collection batch x100 | Collection get | |---|---|---|---|---|---|---| -| File-backed | 383.4 ops/sec | 33.66K rows/sec | 777.29K ops/sec | 387.5 ops/sec | 34.87K docs/sec | 862.93K ops/sec | -| Hybrid incremental-durable | 377.7 ops/sec | 33.97K rows/sec | 721.75K ops/sec | 379.1 ops/sec | 34.56K docs/sec | 864.55K ops/sec | -| In-memory | 123.41K ops/sec | 449.84K rows/sec | 711.22K ops/sec | 124.92K ops/sec | 519.03K docs/sec | 872.07K ops/sec | +| File-backed | 250.1 ops/sec | 22.99K rows/sec | 822.57K ops/sec | 250.8 ops/sec | 22.55K docs/sec | 1.16M ops/sec | +| Hybrid incremental-durable | 240.9 ops/sec | 22.15K rows/sec | 836.17K ops/sec | 231.9 ops/sec | 22.20K docs/sec | 1.18M ops/sec | +| In-memory | 147.33K ops/sec | 576.11K rows/sec | 870.15K ops/sec | 147.12K ops/sec | 606.84K docs/sec | 1.23M ops/sec | ### Resident Hot-Set Reads | Mode | SQL hot burst | SQL P50 | Collection hot burst | Collection P50 | |---|---|---|---|---| -| File-backed | 27.88K ops/sec | 0.0346 ms | 28.98K ops/sec | 0.0329 ms | -| Hybrid incremental-durable | 27.16K ops/sec | 0.0366 ms | 29.22K ops/sec | 0.0341 ms | -| Hybrid hot-set incremental-durable | 311.62K ops/sec | 0.0031 ms | 295.45K ops/sec | 0.0029 ms | -| 
In-memory | 84.82K ops/sec | 0.0049 ms | 122.39K ops/sec | 0.0044 ms | +| File-backed | 27.01K ops/sec | 0.0354 ms | 28.71K ops/sec | 0.0331 ms | +| Hybrid incremental-durable | 26.61K ops/sec | 0.0348 ms | 28.23K ops/sec | 0.0322 ms | +| Hybrid hot-set incremental-durable | 383.87K ops/sec | 0.0022 ms | 272.32K ops/sec | 0.0021 ms | +| In-memory | 104.73K ops/sec | 0.0033 ms | 112.58K ops/sec | 0.0033 ms | ### Cold Open And First Read | Mode | SQL open+first lookup P50 | Collection open+first get P50 | |---|---|---| -| File-backed | 18.1275 ms | 20.0627 ms | -| Hybrid incremental-durable | 19.9048 ms | 19.4950 ms | -| Hybrid hot-set incremental-durable | 89.5360 ms | 134.5282 ms | -| In-memory | 15.8719 ms | 21.7409 ms | +| File-backed | 18.7545 ms | 19.6066 ms | +| Hybrid incremental-durable | 19.7809 ms | 18.1360 ms | +| Hybrid hot-set incremental-durable | 85.5186 ms | 125.1785 ms | +| In-memory | 12.4251 ms | 18.6550 ms | ### Local SQLite Matched Rows | Engine / row | Throughput | P50 | P99 | |---|---|---|---| -| CSharpDB InsertBatch B1000 | 233.06K rows/sec | 3.5123 ms | 7.7143 ms | -| SQLite WAL+FULL prepared B1000 | 203.30K rows/sec | 4.6622 ms | 22.0984 ms | -| CSharpDB SQL point lookup | 1.27M ops/sec | 0.0005 ms | 0.0026 ms | -| SQLite WAL+FULL point lookup | 70.21K ops/sec | 0.0119 ms | 0.0369 ms | +| CSharpDB InsertBatch B1000 | 211.99K rows/sec | 4.0197 ms | 8.0891 ms | +| SQLite WAL+FULL prepared B1000 | 155.66K rows/sec | 5.9735 ms | 22.9219 ms | +| CSharpDB SQL point lookup | 1.48M ops/sec | 0.0005 ms | 0.0018 ms | +| SQLite WAL+FULL point lookup | 93.91K ops/sec | 0.0088 ms | 0.0282 ms | +## Focused Insert Fan-In Validation + +These rows are from the May 5, 2026 targeted validation run for the opt-in `ImplicitInsertExecutionMode.ConcurrentWriteTransactions` insert path. 
They are not promoted release-core scorecard rows yet; they document the current proof point for hot one-row concurrent inserts and are guarded by current-only release checks until the release-core suite includes these shapes directly. + +Command: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --insert-fan-in-diagnostics --repro +``` + +Source CSV: + +`tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/insert-fan-in-diagnostics-20260505-233424.csv` + +| Insert shape | Writers/window | Commits/sec | Commits/flush | Notes | +|---|---|---:|---:|---| +| Serialized explicit hot right-edge | W8, 250us | 278.4 | 1.00 | Default serialized control | +| Concurrent explicit hot right-edge | W8, 250us | 910.3 | 3.33 | Pending right-edge rebase path | +| Concurrent auto-ID hot right-edge | W8, 250us | 913.1 | 3.34 | Row-ID reservation plus pending rebase | +| Concurrent explicit disjoint ranges | W8, 250us | 1,049.6 | 3.96 | Existing best-case concurrent insert shape remains strong | + +Operational guidance: keep `Serialized` as the default. Use `ConcurrentWriteTransactions` only for workloads that can benefit from shared-engine one-row commit fan-in; `InsertBatch` remains the preferred bulk-ingest path. + +## Focused Optimizer And Async I/O Close-Out Validation + +These May 5, 2026 rows are diagnostic close-out proof for the current advanced optimizer and async I/O batching phases. They are not promoted release-core scorecard rows; they document workload-shaped gains and audited coverage while future adaptive re-optimization and specialized maintenance tuning remain separate roadmap items. Public planner-stat inspection is now covered by the `sys.planner_*` catalog and `EXPLAIN ESTIMATE` diagnostic benchmarks. 
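Judging by the published numbers, the Ratio column in the optimizer table is ANALYZE throughput divided by no-ANALYZE throughput, rounded to two decimals. The sketch below assumes that definition. It copies the four throughput pairs from the table and recomputes the ratios, and each one matches the published value. The helper name `analyze_ratio` is illustrative and is not part of CSharpDB or the benchmark harness.

```python
# Throughput pairs (queries/sec) copied from the optimizer close-out table,
# stored as (no ANALYZE, ANALYZE).
ROWS = {
    "Heavy-hitter equality": (11_671, 17_091),
    "Histogram cold range": (21_895, 23_175),
    "Composite correlation": (522, 987),
    "Bounded join reorder": (9_628, 11_247),
}


def analyze_ratio(no_analyze: float, analyze: float) -> float:
    """Speedup from ANALYZE, rounded to two decimals like the Ratio column.

    Assumption: Ratio = ANALYZE throughput / no-ANALYZE throughput.
    """
    return round(analyze / no_analyze, 2)


for shape, (before, after) in ROWS.items():
    print(f"{shape}: {analyze_ratio(before, after):.2f}x")
```

A ratio near 1.0x, as in the histogram cold-range row, fits the table's own description that the stats mainly help the planner avoid worse plans rather than produce a large speedup.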
+ +Optimizer command: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --optimizer-closeout --repro +``` + +Optimizer source CSV: + +`tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/optimizer-closeout-20260505-204536.csv` + +| Optimizer shape | No ANALYZE | ANALYZE | Ratio | What it validates | +|---|---:|---:|---:|---| +| Heavy-hitter equality | 11,671 queries/sec | 17,091 queries/sec | 1.46x | Skew-aware equality and non-unique lookup costing | +| Histogram cold range | 21,895 queries/sec | 23,175 queries/sec | 1.06x | Equi-depth range estimates avoid worse plans | +| Composite correlation | 522 queries/sec | 987 queries/sec | 1.89x | Composite-prefix stats preserve correlated equality selectivity | +| Bounded join reorder | 9,628 queries/sec | 11,247 queries/sec | 1.17x | Small inner-join chains use bounded DP reordering | + +Public planner diagnostics smoke command: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --micro --filter *SystemCatalogBenchmarks*Planner* --job Dry +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --micro --filter *ExplainEstimate* --job Dry +``` + +| Public planner diagnostic shape | TableCount | Mean | Allocation | What it validates | +|---|---:|---:|---:|---| +| `COUNT(*) FROM sys.planner_histograms` | 100 | 235.3 ns | 552 B | Virtual histogram catalog count fast path | +| `COUNT(*) FROM sys.planner_heavy_hitters` | 100 | 227.2 ns | 552 B | Virtual heavy-hitter catalog count fast path | +| `COUNT(*) FROM sys.planner_index_prefix_stats` | 100 | 203.2 ns | 528 B | Virtual composite-prefix catalog count fast path | +| `EXPLAIN ESTIMATE` skewed lookup | 100 | 345.8 us | 334.83 KB | Bounded estimate diagnostics without executing user rows | + +## Focused Adaptive Re-Optimization Validation + +These May 6, 2026 rows are diagnostic proof for opt-in adaptive query 
re-optimization. The SQL rows measure default-disabled behavior, enabled wrapper overhead, and eligible join shapes on the benchmark seed data. The synthetic rows force the adaptive operator switch paths directly so the run records divergence and switch counters even when the normal planner already avoids a bad plan. + +Command: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --adaptive-reoptimization +``` + +Source CSV: + +`tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/adaptive-reoptimization-20260506-073419.csv` + +| Adaptive shape | Throughput | P50 | P99 | Switches | What it validates | +|---|---:|---:|---:|---:|---| +| Disabled stable join | 251 queries/sec | 3.681 ms | 9.002 ms | 0 | Default path stays silent with no adaptive counters | +| Enabled stable no-switch | 191 queries/sec | 4.666 ms | 9.588 ms | 0 | Opt-in wrapper overhead when thresholds avoid adaptation | +| Stale-stat fan-out diagnostic | 48 queries/sec | 18.324 ms | 43.867 ms | 0 | Eligible stale/range workload shape on the current planner | +| Parameter-sensitive small | 910 queries/sec | 0.747 ms | 4.448 ms | 0 | Small selective value with adaptive eligibility enabled | +| Hash build-side diagnostic | 77 queries/sec | 12.687 ms | 22.376 ms | 0 | Eligible hash-build workload shape on the current planner | +| Synthetic index switch | 8,651 ops/sec | 0.115 ms | 0.121 ms | 86 | Index-to-hash switch path, buffered replay, and divergence counters | +| Synthetic hash build switch | 1,403 ops/sec | 0.594 ms | 2.246 ms | 86 | Hash build-side flip path and divergence counters | + +Interpretation: this feature is not a universal speedup. It should be enabled only for workloads where stale statistics or parameter-sensitive joins produce materially wrong join choices; stable, well-analyzed plans should expect no gain and some opt-in wrapper cost. 
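The "some opt-in wrapper cost" can be quantified from the adaptive table. The sketch below assumes that the "Disabled stable join" and "Enabled stable no-switch" rows run the same stable join and differ only in whether adaptive re-optimization is enabled. It computes the percent change for each metric. The names `pct_change`, `disabled`, and `enabled` are illustrative only.

```python
def pct_change(baseline: float, current: float) -> float:
    """Percent change from baseline to current, rounded to one decimal."""
    return round((current - baseline) / baseline * 100.0, 1)


# Values copied from the adaptive re-optimization table. The assumption is
# that both rows measure the same stable join shape.
disabled = {"qps": 251, "p50_ms": 3.681, "p99_ms": 9.002}
enabled = {"qps": 191, "p50_ms": 4.666, "p99_ms": 9.588}

for metric in ("qps", "p50_ms", "p99_ms"):
    # A negative qps change is lost throughput. Positive latency changes are slowdowns.
    print(metric, pct_change(disabled[metric], enabled[metric]))
```

Under that assumption, enabling the feature on a plan that never switches costs roughly 24% of throughput, adds about 27% to P50 latency, and adds about 6.5% to P99 latency on this run. That gap is why the interpretation recommends enabling it only where joins are demonstrably mis-planned.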
+ +Async I/O command: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --async-io-closeout --repro +``` + +Async I/O source CSV: + +`tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/async-io-closeout-20260505-204638.csv` + +| Async I/O shape | Throughput | Classification | +|---|---:|---| +| SaveToFile snapshot copy | 52,762 pages/sec | Already batched through `StorageDeviceCopyBatcher` | +| Backup snapshot copy | 8,136 pages/sec | Already batched through backup/snapshot copy helpers | +| Restore staging | 9,996 pages/sec | Already batched through load/save staging | +| Vacuum logical rewrite | 3,365 pages/sec | Intentionally logical through `BTreeCopyUtility` | +| FK migration rewrite | 42,749 rows/sec | Intentionally logical through `BTreeCopyUtility` | +| Database inspector scan | 18,600 pages/sec | Specialized diagnostic path | +| WAL inspector scan | 2,310 frames/sec | Specialized diagnostic path over a live 20-frame WAL | + +## Focused Generated Collection Fast-Path Validation + +These April 26, 2026 rows are diagnostic proof for the opt-in source-generated collection fast path. They compare source-generated JSON payloads with generated binary direct payloads for supported document graphs. They are not promoted release-core scorecard rows; durable single-row collection writes can still be flush-bound, so these numbers mainly describe CPU and allocation wins in encode/decode and field/index-reader paths. 
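"Flush-bound" can be illustrated with the concurrent durable-write table earlier in this diff. If each flush is one WAL fsync shared by all commits grouped into it, then commits/sec ≈ flushes/sec × commits/flush. That relationship is an assumption read off the column definitions, not taken from the engine's code. Dividing the published columns gives the implied flush rate. The helper `flushes_per_sec` is hypothetical.

```python
# (scenario, commits/sec, commits/flush) copied from the concurrent
# durable-write table.
ROWS = [
    ("W4, window 0", 247.0, 1.00),
    ("W4, window 250us", 463.4, 1.99),
    ("W8, window 0", 239.2, 1.00),
    ("W8, window 250us", 890.1, 3.94),
]


def flushes_per_sec(commits_per_sec: float, commits_per_flush: float) -> float:
    """Implied WAL flush rate: commit throughput divided by commits grouped per flush."""
    return commits_per_sec / commits_per_flush


for name, cps, cpf in ROWS:
    print(f"{name}: {flushes_per_sec(cps, cpf):.1f} flushes/sec")
```

Every scenario lands at roughly 225 to 250 flushes/sec. Commit throughput rises only when more commits share each flush. This is why faster payload encoding alone cannot lift one-row durable write throughput.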
+ +Command: + +```powershell +dotnet run -c Release --project .\tests\CSharpDB.Benchmarks\CSharpDB.Benchmarks.csproj -- --micro --filter *GeneratedCollection* +``` + +Source CSVs: + +`BenchmarkDotNet.Artifacts/results/CSharpDB.Benchmarks.Micro.GeneratedCollection*Benchmarks-report.csv` + +| Path | Source-gen JSON | Generated binary | Gain | Allocation | +|---|---:|---:|---:|---| +| Encode payload | 600.1 ns | 306.2 ns | 1.96x | 552 B to 136 B | +| Decode payload | 2,277.9 ns | 371.9 ns | 6.12x | 1,240 B to 480 B | +| Indexed int field read | 187.23 ns | 29.74 ns | 6.30x | 0 B to 0 B | +| Text field UTF-8 read | 185.82 ns | 27.26 ns | 6.82x | 56 B to 0 B | +| Key match | 21.48 ns | 19.91 ns | 1.08x | 0 B to 0 B | + +Interpretation: generated collections are worth using when collection payload CPU, direct field extraction, or index-reader cost is visible in the profile. They should not be sold as a durable commit-throughput feature, because WAL flush policy still dominates one-row durable writes. + ## Core Benchmark Map | Performance question | Published surface | Benchmark source | @@ -155,6 +283,12 @@ Each row is total successful commits/sec across one shared engine. 
The intended | Durable SQL and collection top-line API speed | Single insert/put, batch x100, point lookup, concurrent reads | `--master-table --repeat 3 --repro` | | Single-writer durable ingest | `B1`, `B100`, `B1000`, optional `B10000` batch rows | `--durable-sql-batching --repeat 3 --repro` | | Concurrent durable writes | `W4` and `W8`, `0` vs `250us`, disjoint explicit-key auto-commit | `--concurrent-write-diagnostics --repeat 3 --repro` | +| Concurrent insert fan-in | Serialized controls, disjoint explicit keys, hot explicit right-edge, hot auto-ID | `--insert-fan-in-diagnostics --repro` | +| Advanced optimizer close-out | Heavy hitters, histogram ranges, composite-prefix correlation, bounded join reorder | `--optimizer-closeout --repro` | +| Public planner diagnostics | Planner histogram/heavy-hitter/prefix catalogs and bounded estimate explanations | `--micro --filter *SystemCatalogBenchmarks*Planner*`; `--micro --filter *ExplainEstimate*` | +| Adaptive query re-optimization | Default-disabled baseline, enabled wrapper overhead, eligible join shapes, synthetic switch counters | `--adaptive-reoptimization` | +| Async I/O batching close-out | Save/backup/restore, vacuum/FK logical rewrites, inspector/WAL scans | `--async-io-closeout --repro` | +| Generated collection fast path | Generated binary payload encode/decode, direct field reads, UTF-8 text field reads, key matching | `--micro --filter *GeneratedCollection*` | | Storage mode tradeoffs | file-backed, hybrid incremental, in-memory hot steady-state | `--hybrid-storage-mode --repeat 3 --repro` | | Resident hot-set behavior | file-backed vs hybrid hot-set vs in-memory hot burst | `--hybrid-hot-set-read --repeat 3 --repro` | | Cold open / first read | startup cost and first lookup/get latency | `--hybrid-cold-open --repeat 3 --repro` | diff --git a/tests/CSharpDB.Benchmarks/perf-thresholds.json b/tests/CSharpDB.Benchmarks/perf-thresholds.json index bf8ad821..389c9822 100644 --- 
a/tests/CSharpDB.Benchmarks/perf-thresholds.json +++ b/tests/CSharpDB.Benchmarks/perf-thresholds.json @@ -701,6 +701,38 @@ ], "maxMeanRegressionPercent": 25.0, "skipAllocationComparison": true, + "requiredCurrentRows": [ + { + "match": { + "Name": "InsertFanIn_AutoCommitConcurrent_ExplicitId_W8_Batch250us_5s" + }, + "extraInfoChecks": [ + { + "key": "rowsPerSec", + "minValue": 700.0 + }, + { + "key": "commitsPerFlush", + "minValue": 2.5 + } + ] + }, + { + "match": { + "Name": "InsertFanIn_AutoCommitConcurrent_AutoId_W8_Batch250us_5s" + }, + "extraInfoChecks": [ + { + "key": "rowsPerSec", + "minValue": 700.0 + }, + { + "key": "commitsPerFlush", + "minValue": 2.5 + } + ] + } + ], "overrides": [ { "match": { diff --git a/tests/CSharpDB.Benchmarks/release-core-manifest.json b/tests/CSharpDB.Benchmarks/release-core-manifest.json index a41cab4e..2967d470 100644 --- a/tests/CSharpDB.Benchmarks/release-core-manifest.json +++ b/tests/CSharpDB.Benchmarks/release-core-manifest.json @@ -1,49 +1,49 @@ { "schemaVersion": 1, "metadata": { - "Published snapshot": "April 26, 2026 release-core snapshot", - "Run date": "Release-core artifacts captured April 26, 2026 PT; release guardrail compare captured April 27, 2026 UTC", - "Promotion status": "Promoted after release-core suite and release guardrail compare passed; focused micro retry replaced volatile guardrail samples", - "Latest release guardrail": "PASS=185, WARN=0, SKIP=0, FAIL=0", + "Published snapshot": "May 6, 2026 release-core snapshot", + "Run date": "Release-core artifacts captured May 6, 2026 UTC; latest release guardrail compare captured May 6, 2026 UTC", + "Promotion status": "Published tables are promoted from the May 6 release-core suite; latest release guardrail compare passed after focused close-out validation", + "Latest release guardrail": "PASS=187, WARN=0, SKIP=0, FAIL=0", "Runner": "Intel i9-11900K, 16 logical cores, Windows 10.0.26300, .NET SDK 10.0.203, .NET runtime 10.0.7", "Repro mode": "priority=High,
affinity=0xFF when captured with --repro", - "Commit": "b7cb52ee2c30f31538e96480b6d055ff52439c26 plus uncommitted collection binary payload and benchmark updates" + "Commit": "47a700950a150669ce404294c594dd845550f460" }, "artifacts": [ { "id": "master", "command": "--master-table --repeat 3 --repro", - "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/master-table-20260426-215529-median-of-3.csv" + "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/master-table-20260506-024609-median-of-3.csv" }, { "id": "batching", "command": "--durable-sql-batching --repeat 3 --repro", - "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/durable-sql-batching-20260426-221413-median-of-3.csv" + "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/durable-sql-batching-20260506-030458-median-of-3.csv" }, { "id": "concurrent", "command": "--concurrent-write-diagnostics --repeat 3 --repro", - "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/concurrent-write-diagnostics-20260426-223659-median-of-3.csv" + "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/concurrent-write-diagnostics-20260506-032735-median-of-3.csv" }, { "id": "storage", "command": "--hybrid-storage-mode --repeat 3 --repro", - "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-storage-mode-20260426-224331-median-of-3.csv" + "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-storage-mode-20260506-033407-median-of-3.csv" }, { "id": "hotset", "command": "--hybrid-hot-set-read --repeat 3 --repro", - "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-hot-set-read-20260426-225908-median-of-3.csv" + "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-hot-set-read-20260506-034948-median-of-3.csv" }, { "id": "coldopen", "command": "--hybrid-cold-open --repeat 3 --repro", - "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-cold-open-20260426-225949-median-of-3.csv" 
+ "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/hybrid-cold-open-20260506-035030-median-of-3.csv" }, { "id": "sqlite", "command": "--sqlite-compare --repeat 3 --repro", - "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/sqlite-compare-20260426-230045-median-of-3.csv" + "path": "tests/CSharpDB.Benchmarks/bin/Release/net10.0/results/sqlite-compare-20260506-035128-median-of-3.csv" } ], "sections": [ @@ -58,8 +58,8 @@ "cells": [ { "value": "Release health" }, { "value": "Latest guardrail compare" }, - { "value": "PASS: PASS=185, FAIL=0" }, - { "value": "`Compare-Baseline.ps1` after focused micro retry" } + { "value": "PASS: PASS=187, FAIL=0" }, + { "value": "`Run-Perf-Guardrails.ps1 -Mode release` after close-out validation" } ] }, { diff --git a/tests/CSharpDB.Benchmarks/scripts/Compare-Baseline.ps1 b/tests/CSharpDB.Benchmarks/scripts/Compare-Baseline.ps1 index ffed5518..d589f557 100644 --- a/tests/CSharpDB.Benchmarks/scripts/Compare-Baseline.ps1 +++ b/tests/CSharpDB.Benchmarks/scripts/Compare-Baseline.ps1 @@ -442,6 +442,127 @@ function Test-ExtraInfoChecks $notes.Add("$key OK (baseline=$baselineRaw, current=$currentRaw)") | Out-Null } + + $minValueText = [string](Get-OptionalProperty -Object $check -Name "minValue" -DefaultValue "") + if (-not [string]::IsNullOrWhiteSpace($minValueText)) + { + $minValue = [double]$minValueText + if ($currentValue -lt $minValue) + { + $regressed = $true + $formattedMinValue = "{0:N2}" -f $minValue + $notes.Add("$key>=$formattedMinValue (current=$currentRaw)") | Out-Null + continue + } + + $formattedMinValue = "{0:N2}" -f $minValue + $notes.Add("$key>=$formattedMinValue OK (current=$currentRaw)") | Out-Null + } + + $maxValueText = [string](Get-OptionalProperty -Object $check -Name "maxValue" -DefaultValue "") + if (-not [string]::IsNullOrWhiteSpace($maxValueText)) + { + $maxValue = [double]$maxValueText + if ($currentValue -gt $maxValue) + { + $regressed = $true + $formattedMaxValue = "{0:N2}" -f 
$maxValue + $notes.Add("$key<=$formattedMaxValue (current=$currentRaw)") | Out-Null + continue + } + + $formattedMaxValue = "{0:N2}" -f $maxValue + $notes.Add("$key<=$formattedMaxValue OK (current=$currentRaw)") | Out-Null + } + } + + return [pscustomobject]@{ + Regressed = $regressed + Notes = [string[]]$notes.ToArray() + } +} + +function Test-CurrentExtraInfoChecks +{ + param( + [Parameter(Mandatory = $true)]$CurrentRow, + [Parameter(Mandatory = $false)]$Checks + ) + + $notes = New-Object System.Collections.Generic.List[string] + $regressed = $false + + if ($null -eq $Checks) + { + return [pscustomobject]@{ + Regressed = $false + Notes = [string[]]@() + } + } + + foreach ($check in @($Checks)) + { + if ($null -eq $check) + { + continue + } + + $key = [string](Get-OptionalProperty -Object $check -Name "key" -DefaultValue "") + if ([string]::IsNullOrWhiteSpace($key)) + { + throw "requiredCurrentRows extraInfoChecks entries must define a key." + } + + $currentRaw = Get-ExtraInfoValue -Row $CurrentRow -Key $key + if ($null -eq $currentRaw) + { + $regressed = $true + $notes.Add("ExtraInfo $key missing") | Out-Null + continue + } + + $currentValue = Convert-ComparableNumber -Value $currentRaw -Description "current ExtraInfo '$key'" + + $minRatioText = [string](Get-OptionalProperty -Object $check -Name "minRatio" -DefaultValue "") + $maxRatioText = [string](Get-OptionalProperty -Object $check -Name "maxRatio" -DefaultValue "") + if (-not [string]::IsNullOrWhiteSpace($minRatioText) -or -not [string]::IsNullOrWhiteSpace($maxRatioText)) + { + $regressed = $true + $notes.Add("$key ratio check requires a baseline row") | Out-Null + continue + } + + $minValueText = [string](Get-OptionalProperty -Object $check -Name "minValue" -DefaultValue "") + if (-not [string]::IsNullOrWhiteSpace($minValueText)) + { + $minValue = [double]$minValueText + if ($currentValue -lt $minValue) + { + $regressed = $true + $formattedMinValue = "{0:N2}" -f $minValue + 
$notes.Add("$key>=$formattedMinValue (current=$currentRaw)") | Out-Null + continue + } + + $formattedMinValue = "{0:N2}" -f $minValue + $notes.Add("$key>=$formattedMinValue OK (current=$currentRaw)") | Out-Null + } + + $maxValueText = [string](Get-OptionalProperty -Object $check -Name "maxValue" -DefaultValue "") + if (-not [string]::IsNullOrWhiteSpace($maxValueText)) + { + $maxValue = [double]$maxValueText + if ($currentValue -gt $maxValue) + { + $regressed = $true + $formattedMaxValue = "{0:N2}" -f $maxValue + $notes.Add("$key<=$formattedMaxValue (current=$currentRaw)") | Out-Null + continue + } + + $formattedMaxValue = "{0:N2}" -f $maxValue + $notes.Add("$key<=$formattedMaxValue OK (current=$currentRaw)") | Out-Null + } } return [pscustomobject]@{ @@ -634,6 +755,7 @@ function Add-ComparisonResult [Parameter(Mandatory = $true)]$Results, [Parameter(Mandatory = $true)][string]$Csv, [Parameter(Mandatory = $true)][string]$Key, + [string]$Metric = "Mean", [string]$BaselineMean = "", [string]$CurrentMean = "", [string]$MeanDeltaPct = "", @@ -649,6 +771,7 @@ function Add-ComparisonResult $Results.Add([pscustomobject]@{ Csv = $Csv Key = $Key + Metric = $Metric BaselineMean = $BaselineMean CurrentMean = $CurrentMean MeanDeltaPct = $MeanDeltaPct @@ -956,6 +1079,72 @@ foreach ($check in $config.checks) $baselineRows = Import-Csv -Path $baselineFile $currentRows = Import-Csv -Path $currentFile + $requiredCurrentRows = Get-OptionalProperty -Object $check -Name "requiredCurrentRows" -DefaultValue @() + foreach ($requiredCurrent in @($requiredCurrentRows)) + { + if ($null -eq $requiredCurrent) + { + continue + } + + $criteria = Get-OptionalProperty -Object $requiredCurrent -Name "match" -DefaultValue $requiredCurrent + $currentMatch = $currentRows | + Where-Object { Test-RowMatch -Row $_ -Criteria $criteria } | + Select-Object -First 1 + $requiredCurrentKey = Build-RowKey -Row $criteria -KeyColumns @($criteria.PSObject.Properties.Name) + if ($null -eq $currentMatch) + { + 
Add-ComparisonResult -Results $results ` + -Csv $csvName ` + -Key $requiredCurrentKey ` + -Enforcement $machineCompatibility.Summary ` + -Status "FAIL" ` + -Notes "Required current row missing" + $failureCount++ + continue + } + + $currentOnlyExtraInfoChecks = Get-OptionalProperty -Object $requiredCurrent -Name "extraInfoChecks" -DefaultValue @() + $currentOnlyExtraInfoEvaluation = Test-CurrentExtraInfoChecks -CurrentRow $currentMatch -Checks $currentOnlyExtraInfoChecks + $currentOnlyStatus = + if ($machineCompatibility.Classification -eq "different") + { + "SKIP" + } + elseif ($currentOnlyExtraInfoEvaluation.Regressed) + { + $(if ($machineCompatibility.Classification -eq "compatible") { "WARN" } else { "FAIL" }) + } + else + { + "PASS" + } + + if ($currentOnlyStatus -eq "FAIL") + { + $failureCount++ + } + + $currentOnlyNotes = + if ($currentOnlyExtraInfoEvaluation.Notes.Count -gt 0) + { + "Current row present ; $($currentOnlyExtraInfoEvaluation.Notes -join '; ')" + } + else + { + "Current row present" + } + + Add-ComparisonResult -Results $results ` + -Csv $csvName ` + -Key $requiredCurrentKey ` + -CurrentMean ([string](Get-OptionalProperty -Object $currentMatch -Name "Mean" -DefaultValue "")) ` + -CurrentAlloc ([string](Get-OptionalProperty -Object $currentMatch -Name "Allocated" -DefaultValue "")) ` + -Enforcement $machineCompatibility.Summary ` + -Status $currentOnlyStatus ` + -Notes $currentOnlyNotes + } + $rowsToCompare = New-Object System.Collections.Generic.List[object] $requiredRows = Get-OptionalProperty -Object $check -Name "requiredRows" -DefaultValue $null if ($null -eq $requiredRows) @@ -1005,12 +1194,51 @@ foreach ($check in $config.checks) Select-Object -First 1 $rowKey = Build-RowKey -Row $baselineRow -KeyColumns $keyColumns + + $maxMeanRegression = [double](Get-OptionalProperty -Object $check -Name "maxMeanRegressionPercent" -DefaultValue $defaultMeanRegression) + $maxAllocRegressionPct = [double](Get-OptionalProperty -Object $check -Name 
"maxAllocRegressionPercent" -DefaultValue $defaultAllocRegressionPct) + $maxAllocRegressionBytes = [double](Get-OptionalProperty -Object $check -Name "maxAllocRegressionBytes" -DefaultValue $defaultAllocRegressionBytes) + $skipMeanComparison = [bool](Get-OptionalProperty -Object $check -Name "skipMeanComparison" -DefaultValue $false) + $skipAllocationComparison = [bool](Get-OptionalProperty -Object $check -Name "skipAllocationComparison" -DefaultValue $false) + $metricColumn = [string](Get-OptionalProperty -Object $check -Name "metricColumn" -DefaultValue "Mean") + if ([string]::IsNullOrWhiteSpace($metricColumn)) + { + throw "Check '$csvName' has an empty metricColumn value." + } + + $extraInfoChecks = Get-OptionalProperty -Object $check -Name "extraInfoChecks" -DefaultValue @() + + $overrides = Get-OptionalProperty -Object $check -Name "overrides" -DefaultValue @() + foreach ($override in $overrides) + { + if (Test-RowMatch -Row $baselineRow -Criteria (Get-OptionalProperty -Object $override -Name "match" -DefaultValue $null)) + { + $maxMeanRegression = [double](Get-OptionalProperty -Object $override -Name "maxMeanRegressionPercent" -DefaultValue $maxMeanRegression) + $maxAllocRegressionPct = [double](Get-OptionalProperty -Object $override -Name "maxAllocRegressionPercent" -DefaultValue $maxAllocRegressionPct) + $maxAllocRegressionBytes = [double](Get-OptionalProperty -Object $override -Name "maxAllocRegressionBytes" -DefaultValue $maxAllocRegressionBytes) + $skipMeanComparison = [bool](Get-OptionalProperty -Object $override -Name "skipMeanComparison" -DefaultValue $skipMeanComparison) + $skipAllocationComparison = [bool](Get-OptionalProperty -Object $override -Name "skipAllocationComparison" -DefaultValue $skipAllocationComparison) + $metricColumn = [string](Get-OptionalProperty -Object $override -Name "metricColumn" -DefaultValue $metricColumn) + if ([string]::IsNullOrWhiteSpace($metricColumn)) + { + throw "Check '$csvName' override for '$rowKey' has an empty 
metricColumn value." + } + + $overrideExtraInfoChecks = Get-OptionalProperty -Object $override -Name "extraInfoChecks" -DefaultValue $null + if ($null -ne $overrideExtraInfoChecks) + { + $extraInfoChecks = $overrideExtraInfoChecks + } + } + } + if ($null -eq $currentRow) { Add-ComparisonResult -Results $results ` -Csv $csvName ` -Key $rowKey ` - -BaselineMean ([string](Get-OptionalProperty -Object $baselineRow -Name "Mean" -DefaultValue "")) ` + -Metric $metricColumn ` + -BaselineMean ([string](Get-OptionalProperty -Object $baselineRow -Name $metricColumn -DefaultValue "")) ` -BaselineAlloc ([string](Get-OptionalProperty -Object $baselineRow -Name "Allocated" -DefaultValue "")) ` -Enforcement $machineCompatibility.Summary ` -Status "FAIL" ` @@ -1019,20 +1247,21 @@ foreach ($check in $config.checks) continue } - $baselineMeanRaw = [string](Get-OptionalProperty -Object $baselineRow -Name "Mean" -DefaultValue "") - $currentMeanRaw = [string](Get-OptionalProperty -Object $currentRow -Name "Mean" -DefaultValue "") + $baselineMeanRaw = [string](Get-OptionalProperty -Object $baselineRow -Name $metricColumn -DefaultValue "") + $currentMeanRaw = [string](Get-OptionalProperty -Object $currentRow -Name $metricColumn -DefaultValue "") if ((Test-MetricUnavailable -Value $baselineMeanRaw) -or (Test-MetricUnavailable -Value $currentMeanRaw)) { Add-ComparisonResult -Results $results ` -Csv $csvName ` -Key $rowKey ` + -Metric $metricColumn ` -BaselineMean $baselineMeanRaw ` -CurrentMean $currentMeanRaw ` -BaselineAlloc ([string](Get-OptionalProperty -Object $baselineRow -Name "Allocated" -DefaultValue "")) ` -CurrentAlloc ([string](Get-OptionalProperty -Object $currentRow -Name "Allocated" -DefaultValue "")) ` -Enforcement $machineCompatibility.Summary ` -Status "FAIL" ` - -Notes "Benchmark did not produce a numeric Mean value" + -Notes "Benchmark did not produce a numeric $metricColumn value" $failureCount++ continue } @@ -1041,8 +1270,6 @@ foreach ($check in $config.checks) 
$currentMeanNs = Convert-TimeToNanoseconds -Value $currentMeanRaw $meanDeltaPct = (($currentMeanNs - $baselineMeanNs) / $baselineMeanNs) * 100.0 - $skipAllocationComparison = [bool](Get-OptionalProperty -Object $check -Name "skipAllocationComparison" -DefaultValue $false) - $baselineAlloc = [string](Get-OptionalProperty -Object $baselineRow -Name "Allocated" -DefaultValue "") $currentAlloc = [string](Get-OptionalProperty -Object $currentRow -Name "Allocated" -DefaultValue "") $allocComparisonEnabled = -not $skipAllocationComparison @@ -1051,6 +1278,7 @@ foreach ($check in $config.checks) Add-ComparisonResult -Results $results ` -Csv $csvName ` -Key $rowKey ` + -Metric $metricColumn ` -BaselineMean (Format-Nanoseconds -Nanoseconds $baselineMeanNs) ` -CurrentMean (Format-Nanoseconds -Nanoseconds $currentMeanNs) ` -MeanDeltaPct ("{0:N2}" -f $meanDeltaPct) ` @@ -1067,6 +1295,7 @@ foreach ($check in $config.checks) Add-ComparisonResult -Results $results ` -Csv $csvName ` -Key $rowKey ` + -Metric $metricColumn ` -BaselineMean (Format-Nanoseconds -Nanoseconds $baselineMeanNs) ` -CurrentMean (Format-Nanoseconds -Nanoseconds $currentMeanNs) ` -MeanDeltaPct ("{0:N2}" -f $meanDeltaPct) ` @@ -1102,23 +1331,6 @@ foreach ($check in $config.checks) $allocDeltaPct = 0.0 } - $maxMeanRegression = [double](Get-OptionalProperty -Object $check -Name "maxMeanRegressionPercent" -DefaultValue $defaultMeanRegression) - $maxAllocRegressionPct = [double](Get-OptionalProperty -Object $check -Name "maxAllocRegressionPercent" -DefaultValue $defaultAllocRegressionPct) - $maxAllocRegressionBytes = [double](Get-OptionalProperty -Object $check -Name "maxAllocRegressionBytes" -DefaultValue $defaultAllocRegressionBytes) - - $overrides = Get-OptionalProperty -Object $check -Name "overrides" -DefaultValue @() - foreach ($override in $overrides) - { - if (Test-RowMatch -Row $baselineRow -Criteria (Get-OptionalProperty -Object $override -Name "match" -DefaultValue $null)) - { - $maxMeanRegression = 
[double](Get-OptionalProperty -Object $override -Name "maxMeanRegressionPercent" -DefaultValue $maxMeanRegression) - $maxAllocRegressionPct = [double](Get-OptionalProperty -Object $override -Name "maxAllocRegressionPercent" -DefaultValue $maxAllocRegressionPct) - $maxAllocRegressionBytes = [double](Get-OptionalProperty -Object $override -Name "maxAllocRegressionBytes" -DefaultValue $maxAllocRegressionBytes) - } - } - - $skipMeanComparison = [bool](Get-OptionalProperty -Object $check -Name "skipMeanComparison" -DefaultValue $false) - $extraInfoChecks = Get-OptionalProperty -Object $check -Name "extraInfoChecks" -DefaultValue @() $extraInfoEvaluation = Test-ExtraInfoChecks -BaselineRow $baselineRow -CurrentRow $currentRow -Checks $extraInfoChecks $meanRegressed = (-not $skipMeanComparison) -and ($meanDeltaPct -gt $maxMeanRegression) @@ -1144,14 +1356,15 @@ foreach ($check in $config.checks) $failureCount++ } + $metricThresholdNote = if ($skipMeanComparison) { "$metricColumn skipped" } else { "$metricColumn<=${maxMeanRegression}%" } $thresholdNote = if ($allocComparisonEnabled) { - "$(if ($skipMeanComparison) { "Mean skipped" } else { "Mean<=${maxMeanRegression}%" }) ; Alloc<=${maxAllocRegressionPct}% or +${maxAllocRegressionBytes}B" + "$metricThresholdNote ; Alloc<=${maxAllocRegressionPct}% or +${maxAllocRegressionBytes}B" } else { - "$(if ($skipMeanComparison) { "Mean skipped" } else { "Mean<=${maxMeanRegression}%" }) ; Alloc skipped" + "$metricThresholdNote ; Alloc skipped" } if ($extraInfoEvaluation.Notes.Count -gt 0) @@ -1170,6 +1383,7 @@ foreach ($check in $config.checks) Add-ComparisonResult -Results $results ` -Csv $csvName ` -Key $rowKey ` + -Metric $metricColumn ` -BaselineMean (Format-Nanoseconds -Nanoseconds $baselineMeanNs) ` -CurrentMean (Format-Nanoseconds -Nanoseconds $currentMeanNs) ` -MeanDeltaPct ("{0:N2}" -f $meanDeltaPct) ` @@ -1189,7 +1403,7 @@ $skipCount = @($results | Where-Object Status -eq "SKIP").Count $rowCount = $results.Count $summary = 
"Compared $rowCount rows against baseline. PASS=$passCount, WARN=$warnCount, SKIP=$skipCount, FAIL=$failureCount" Write-Host $summary -$results | Sort-Object Csv, Key | Format-Table Csv, Key, MeanDeltaPct, AllocDeltaPct, AllocDeltaBytes, Enforcement, Status -AutoSize +$results | Sort-Object Csv, Key | Format-Table Csv, Key, Metric, MeanDeltaPct, AllocDeltaPct, AllocDeltaBytes, Enforcement, Status -AutoSize if (-not [string]::IsNullOrWhiteSpace($ReportPath)) { @@ -1220,13 +1434,13 @@ if (-not [string]::IsNullOrWhiteSpace($ReportPath)) $lines.Add("") $lines.Add($summary) $lines.Add("") - $lines.Add("| CSV | Key | Mean Delta% | Alloc Delta% | Alloc Delta B | Enforcement | Status | Notes |") - $lines.Add("|---|---|---:|---:|---:|---|---|---|") + $lines.Add("| CSV | Key | Metric | Metric Delta% | Alloc Delta% | Alloc Delta B | Enforcement | Status | Notes |") + $lines.Add("|---|---|---|---:|---:|---:|---|---|---|") foreach ($result in ($results | Sort-Object Csv, Key)) { $notes = [string]$result.Notes $notes = $notes -replace "\|", "\\|" - $lines.Add("| $($result.Csv) | $($result.Key) | $($result.MeanDeltaPct) | $($result.AllocDeltaPct) | $($result.AllocDeltaBytes) | $($result.Enforcement) | $($result.Status) | $notes |") + $lines.Add("| $($result.Csv) | $($result.Key) | $($result.Metric) | $($result.MeanDeltaPct) | $($result.AllocDeltaPct) | $($result.AllocDeltaBytes) | $($result.Enforcement) | $($result.Status) | $notes |") } Set-Content -Path $ReportPath -Value $lines -Encoding UTF8 diff --git a/tests/CSharpDB.Data.Tests/AdaptiveQueryReoptimizationConnectionTests.cs b/tests/CSharpDB.Data.Tests/AdaptiveQueryReoptimizationConnectionTests.cs new file mode 100644 index 00000000..562c8842 --- /dev/null +++ b/tests/CSharpDB.Data.Tests/AdaptiveQueryReoptimizationConnectionTests.cs @@ -0,0 +1,59 @@ +using CSharpDB.Engine; + +namespace CSharpDB.Data.Tests; + +public sealed class AdaptiveQueryReoptimizationConnectionTests +{ + private static CancellationToken Ct => 
TestContext.Current.CancellationToken; + + [Fact] + public void ConnectionStringBuilder_ParsesAdaptiveQueryReoptimization() + { + var builder = new CSharpDbConnectionStringBuilder( + "Data Source=bench.db;Adaptive Query Reoptimization=true"); + + Assert.Equal("bench.db", builder.DataSource); + Assert.True(builder.AdaptiveQueryReoptimization); + } + + [Fact] + public void EmbeddedResolver_AppliesAdaptiveQueryReoptimizationWhenNoDirectOptionsAreSupplied() + { + var builder = new CSharpDbConnectionStringBuilder( + "Data Source=bench.db;Adaptive Query Reoptimization=true"); + + ResolvedEmbeddedConfiguration configuration = + CSharpDbEmbeddedConfigurationResolver.Resolve(builder, null, null); + + Assert.True(configuration.HasRequestedTuning); + Assert.True(configuration.EffectiveAdaptiveQueryReoptimization); + Assert.True(configuration.EffectiveDirectDatabaseOptions.AdaptiveQueryReoptimization.Enabled); + } + + [Fact] + public void EmbeddedResolver_ExplicitDirectOptionsTakePrecedence() + { + var builder = new CSharpDbConnectionStringBuilder( + "Data Source=bench.db;Adaptive Query Reoptimization=true"); + var explicitOptions = new DatabaseOptions(); + + ResolvedEmbeddedConfiguration configuration = + CSharpDbEmbeddedConfigurationResolver.Resolve(builder, explicitOptions, null); + + Assert.Same(explicitOptions, configuration.EffectiveDirectDatabaseOptions); + Assert.False(configuration.EffectiveAdaptiveQueryReoptimization); + Assert.False(configuration.EffectiveDirectDatabaseOptions.AdaptiveQueryReoptimization.Enabled); + } + + [Fact] + public async Task OpenAsync_RejectsAdaptiveQueryReoptimizationForRemoteConnections() + { + await using var connection = new CSharpDbConnection( + "Endpoint=http://localhost:5820;Transport=Grpc;Adaptive Query Reoptimization=true"); + + var ex = await Assert.ThrowsAsync( + () => connection.OpenAsync(Ct)); + + Assert.Contains("remote host", ex.Message, StringComparison.OrdinalIgnoreCase); + } +} diff --git 
a/tests/CSharpDB.DataGen/DataGenOptions.cs b/tests/CSharpDB.DataGen/DataGenOptions.cs index f6f6e529..a7e1ecfc 100644 --- a/tests/CSharpDB.DataGen/DataGenOptions.cs +++ b/tests/CSharpDB.DataGen/DataGenOptions.cs @@ -28,6 +28,7 @@ public sealed class DataGenOptions public double HotKeyRate { get; init; } = 0.20; public double RecentRate { get; init; } = 0.80; public int AvgDocSizeBytes { get; init; } = 1024; + public int MaxDirectDocumentSizeBytes { get; init; } = 2048; public int TenantCount { get; init; } = 250; public int DeviceCount { get; init; } = 100_000; public int OrdersPerCustomer { get; init; } = 5; @@ -115,6 +116,7 @@ public static DataGenOptions Parse(string[] args) HotKeyRate = ParseDouble(values, "hot-key-rate", 0.20), RecentRate = ParseDouble(values, "recent-rate", 0.80), AvgDocSizeBytes = ParseInt(values, "avg-size", ParseInt(values, "avg-doc-size", 1024)), + MaxDirectDocumentSizeBytes = ParseInt(values, "max-direct-doc-size", ParseInt(values, "max-direct-document-size", 2048)), TenantCount = ParseInt(values, "tenant-count", 250), DeviceCount = ParseInt(values, "device-count", defaultDeviceCount), OrdersPerCustomer = ParseInt(values, "orders-per-customer", 5), @@ -211,6 +213,9 @@ private static void Validate(DataGenOptions options) if (options.AvgDocSizeBytes <= 0) throw new DataGenUsageException("--avg-size must be a positive integer."); + if (options.MaxDirectDocumentSizeBytes <= 0) + throw new DataGenUsageException("--max-direct-doc-size must be a positive integer."); + if (options.TenantCount <= 0) throw new DataGenUsageException("--tenant-count must be a positive integer."); diff --git a/tests/CSharpDB.DataGen/Generators/SpecDataGenerator.cs b/tests/CSharpDB.DataGen/Generators/SpecDataGenerator.cs index 84a9a4af..ca0e8b05 100644 --- a/tests/CSharpDB.DataGen/Generators/SpecDataGenerator.cs +++ b/tests/CSharpDB.DataGen/Generators/SpecDataGenerator.cs @@ -279,6 +279,7 @@ public IDisposable PushVariable(string name, object? 
value) "hotkeyrate" => Options.HotKeyRate, "recentrate" => Options.RecentRate, "avgdocsizebytes" or "avgsize" => Options.AvgDocSizeBytes, + "maxdirectdocumentsizebytes" or "maxdirectdocsize" => Options.MaxDirectDocumentSizeBytes, "tenantcount" => Options.TenantCount, "devicecount" => Options.DeviceCount, "orderspercustomer" => Options.OrdersPerCustomer, @@ -862,6 +863,17 @@ private static int EvaluateTargetJsonSize(JsonElement rule, RuleExecutionContext ? EvaluateRequiredValues(bucketsRule, context, $"{path}.buckets").Select(value => ConvertToInt32(value, path)).ToArray() : s_defaultDocumentSizeBuckets; + if (context.Options.DirectLoad) + { + int maxDirectSize = context.Options.MaxDirectDocumentSizeBytes; + averageBytes = Math.Min(averageBytes, maxDirectSize); + buckets = buckets + .Where(bucket => bucket <= maxDirectSize) + .Append(maxDirectSize) + .Distinct() + .ToArray(); + } + if (buckets.Length == 0) throw new InvalidOperationException($"targetJsonSize rule at '{path}' requires at least one bucket."); @@ -882,6 +894,9 @@ private static int EvaluateTargetJsonSize(JsonElement rule, RuleExecutionContext } int targetSize = ConvertToInt32(EvaluateRequired(rule, "targetSize", context, path), path); + if (context.Options.DirectLoad) + targetSize = Math.Min(targetSize, context.Options.MaxDirectDocumentSizeBytes); + Random rng = CreateRandom(rule, context, path, "padObjectToSize"); string? containerPath = GetOptionalString(rule, "containerPath"); string blobField = GetOptionalString(rule, "blobField") ?? 
"blob"; diff --git a/tests/CSharpDB.DataGen/Output/BinaryDirectLoader.cs b/tests/CSharpDB.DataGen/Output/BinaryDirectLoader.cs index f84c67c0..e8303b7f 100644 --- a/tests/CSharpDB.DataGen/Output/BinaryDirectLoader.cs +++ b/tests/CSharpDB.DataGen/Output/BinaryDirectLoader.cs @@ -1,7 +1,8 @@ using System.Diagnostics.CodeAnalysis; +using System.Text.Json; using CSharpDB.DataGen.Specs; using CSharpDB.Engine; -using System.Text.Json; +using CSharpDB.Primitives; namespace CSharpDB.DataGen.Output; @@ -15,7 +16,7 @@ public static async Task LoadSqlTablesAsync( { string dbPath = PrepareDatabasePath(options); - await using var db = await Database.OpenAsync(dbPath, ct); + await using var db = await Database.OpenAsync(dbPath, CreateDirectLoadOptions(), ct); foreach (string statement in SqlSpecBuilder.BuildSchemaScript(tables, includeIndexes: false) .Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries)) { @@ -52,7 +53,7 @@ public static async Task LoadCollectionsAsync( { string dbPath = PrepareDatabasePath(options); - await using var db = await Database.OpenAsync(dbPath, ct); + await using var db = await Database.OpenAsync(dbPath, CreateDirectLoadOptions(), ct); foreach (CollectionSpec collectionSpec in collections) { GeneratedCollectionSource source = GetRequiredCollectionSource(sources, collectionSpec.GeneratorKey); @@ -85,10 +86,12 @@ private static async Task LoadSqlTableAsync( CancellationToken ct) { var batch = db.PrepareInsertBatch(table.Name, batchSize); + var rowValues = new DbValue[table.Columns.Count]; foreach (IReadOnlyDictionary row in rows) { ct.ThrowIfCancellationRequested(); - batch.AddRow(SqlSpecBuilder.BuildDbValues(table, row)); + SqlSpecBuilder.WriteDbValues(table, row, rowValues); + batch.AddRow((ReadOnlySpan)rowValues); if (batch.Count >= batchSize) await FlushInsertBatchAsync(db, batch, ct); } @@ -198,6 +201,10 @@ private static string PrepareDatabasePath(DataGenOptions options) return fullPath; } + private static DatabaseOptions 
CreateDirectLoadOptions() + => new DatabaseOptions() + .ConfigureStorageEngine(static builder => builder.UseWriteOptimizedPreset()); + private static GeneratedSqlTableSource GetRequiredSource( IReadOnlyDictionary sources, string generatorKey) diff --git a/tests/CSharpDB.DataGen/Program.cs b/tests/CSharpDB.DataGen/Program.cs index 731bc1c7..4140807e 100644 --- a/tests/CSharpDB.DataGen/Program.cs +++ b/tests/CSharpDB.DataGen/Program.cs @@ -33,6 +33,8 @@ public static async Task Main(string[] args) Console.WriteLine($"Batch size : {options.BatchSize:N0}"); Console.WriteLine($"Write files : {options.WriteFiles}"); Console.WriteLine($"Direct load : {options.DirectLoad}"); + if (options.DirectLoad && options.Dataset == DatasetKind.Documents) + Console.WriteLine($"Direct doc cap: {options.MaxDirectDocumentSizeBytes:N0} bytes"); RunSummary summary = options.Dataset switch { @@ -225,6 +227,7 @@ private static GeneratedCollectionSource GetRequiredCollectionSource( HotKeyRate: options.HotKeyRate, RecentRate: options.RecentRate, AvgDocSizeBytes: options.AvgDocSizeBytes, + MaxDirectDocumentSizeBytes: options.MaxDirectDocumentSizeBytes, TenantCount: options.TenantCount, DeviceCount: options.DeviceCount, OrdersPerCustomer: options.OrdersPerCustomer, @@ -254,6 +257,7 @@ private static void PrintHelp() Console.WriteLine(" --null-rate <0..1> Sparse/null field rate"); Console.WriteLine(" --hot-key-rate <0..1> Fraction of traffic hitting the hot key band"); Console.WriteLine(" --recent-rate <0..1> Fraction of rows skewed toward recent timestamps"); + Console.WriteLine(" --max-direct-doc-size Max target JSON bytes for direct collection loads (default 2048)"); Console.WriteLine(); Console.WriteLine("Dataset-specific options:"); Console.WriteLine(" relational: --orders-per-customer --items-per-order --tenant-count "); @@ -293,6 +297,7 @@ private sealed record SerializableOptions( double HotKeyRate, double RecentRate, int AvgDocSizeBytes, + int MaxDirectDocumentSizeBytes, int TenantCount, 
int DeviceCount, int OrdersPerCustomer, diff --git a/tests/CSharpDB.DataGen/README.md b/tests/CSharpDB.DataGen/README.md index 7d8b7005..10860897 100644 --- a/tests/CSharpDB.DataGen/README.md +++ b/tests/CSharpDB.DataGen/README.md @@ -164,6 +164,10 @@ The spec format is documented in the [data generation plan](../../docs/CSharpDB- **Direct load** (`--load-direct`): writes rows or documents directly into a CSharpDB database file. This bypasses CSV/JSONL parsing entirely, which is useful when you want to isolate storage-engine performance from import overhead. +Direct SQL loads use the current recommended bulk path: `Database.PrepareInsertBatch(...)`, one explicit transaction per batch, `UseWriteOptimizedPreset()`, monotonic generated keys when the spec defines them, and secondary indexes built after the hot ingest phase when `--build-indexes` is set. The loader also reuses its per-table row buffer before copying values into the reusable `InsertBatch` buffers, so it avoids allocating a new `DbValue[]` for every generated row. + +Direct collection loads use `Collection` with one explicit transaction per batch and deferred index creation. That is the fastest general-purpose public API for arbitrary spec-shaped JSON today. The source-generated `GetGeneratedCollectionAsync(...)` path is faster for known typed models, but DataGen specs intentionally emit dynamic `JsonElement` documents, so generated collection models are not a safe default for this project. When `--load-direct` is enabled, document target sizes are capped by `--max-direct-doc-size` so generated collection payloads stay within the current inline B-tree value envelope; file-only JSONL generation is not capped this way. + **Skip files** (`--no-files`): when combined with `--load-direct`, skips file output entirely so you only get the database. 
## Quick reference: all CLI options @@ -187,5 +191,6 @@ The spec format is documented in the [data generation plan](../../docs/CSharpDB- | `--items-per-order ` | 4 | Relational: average items per order | | `--tenant-count ` | 250 | Relational/docs: number of distinct tenants | | `--avg-size ` | 1024 | Docs: target average document size | +| `--max-direct-doc-size ` | 2048 | Docs/direct-load: max target JSON size for collection payloads | | `--source-database ` | -- | From-database: path to the existing CSharpDB to read schema from | | `--device-count ` | 100000 | Time-series: number of distinct devices | diff --git a/tests/CSharpDB.DataGen/Specs/SqlSpecBuilder.cs b/tests/CSharpDB.DataGen/Specs/SqlSpecBuilder.cs index c14ceff0..a2940001 100644 --- a/tests/CSharpDB.DataGen/Specs/SqlSpecBuilder.cs +++ b/tests/CSharpDB.DataGen/Specs/SqlSpecBuilder.cs @@ -36,15 +36,29 @@ public static DbValue[] BuildDbValues( IReadOnlyDictionary row) { var values = new DbValue[table.Columns.Count]; + WriteDbValues(table, row, values); + return values; + } + + public static void WriteDbValues( + SqlTableSpec table, + IReadOnlyDictionary row, + Span destination) + { + if (destination.Length < table.Columns.Count) + { + throw new ArgumentException( + $"Destination must have at least {table.Columns.Count} values.", + nameof(destination)); + } + for (int i = 0; i < table.Columns.Count; i++) { SqlColumnSpec column = table.Columns[i]; string sourceField = string.IsNullOrWhiteSpace(column.SourceField) ? column.Name : column.SourceField; row.TryGetValue(sourceField, out object? 
rawValue); - values[i] = ConvertToDbValue(column, rawValue); + destination[i] = ConvertToDbValue(column, rawValue); } - - return values; } public static IReadOnlyList GetCsvHeaders(SqlTableSpec table) diff --git a/tests/CSharpDB.Tests/AdaptiveQueryReoptimizationTests.cs b/tests/CSharpDB.Tests/AdaptiveQueryReoptimizationTests.cs new file mode 100644 index 00000000..7d63ecff --- /dev/null +++ b/tests/CSharpDB.Tests/AdaptiveQueryReoptimizationTests.cs @@ -0,0 +1,433 @@ +using System.Reflection; +using CSharpDB.Engine; +using CSharpDB.Execution; +using CSharpDB.Primitives; +using CSharpDB.Sql; + +namespace CSharpDB.Tests; + +public sealed class AdaptiveQueryReoptimizationTests +{ + private static CancellationToken Ct => TestContext.Current.CancellationToken; + + [Fact] + public void DatabaseOptions_DefaultsKeepAdaptiveReoptimizationDisabled() + { + var options = new DatabaseOptions(); + + Assert.False(options.AdaptiveQueryReoptimization.Enabled); + Assert.Equal(8, options.AdaptiveQueryReoptimization.DivergenceFactor); + Assert.Equal(4096, options.AdaptiveQueryReoptimization.MinimumObservedRows); + Assert.Equal(65536, options.AdaptiveQueryReoptimization.MaxBufferedRows); + Assert.Equal(1, options.AdaptiveQueryReoptimization.MaxReoptimizationsPerQuery); + } + + [Fact] + public void EnableAdaptiveQueryReoptimization_ConfiguresOptInOptions() + { + var original = new DatabaseOptions(); + + DatabaseOptions enabled = original.EnableAdaptiveQueryReoptimization(builder => builder + .WithDivergenceFactor(3) + .WithMinimumObservedRows(7) + .WithMaxBufferedRows(11) + .WithMaxReoptimizationsPerQuery(2)); + + Assert.False(original.AdaptiveQueryReoptimization.Enabled); + Assert.True(enabled.AdaptiveQueryReoptimization.Enabled); + Assert.Equal(3, enabled.AdaptiveQueryReoptimization.DivergenceFactor); + Assert.Equal(7, enabled.AdaptiveQueryReoptimization.MinimumObservedRows); + Assert.Equal(11, enabled.AdaptiveQueryReoptimization.MaxBufferedRows); + Assert.Equal(2, 
enabled.AdaptiveQueryReoptimization.MaxReoptimizationsPerQuery); + } + + [Fact] + public void EnableAdaptiveQueryReoptimization_RejectsInvalidThresholds() + { + var options = new DatabaseOptions(); + + Assert.Throws(() => + options.EnableAdaptiveQueryReoptimization(builder => builder.WithDivergenceFactor(1))); + Assert.Throws(() => + options.EnableAdaptiveQueryReoptimization(builder => builder.WithMinimumObservedRows(0))); + Assert.Throws(() => + options.EnableAdaptiveQueryReoptimization(builder => builder.WithMaxBufferedRows(0))); + Assert.Throws(() => + options.EnableAdaptiveQueryReoptimization(builder => builder.WithMaxReoptimizationsPerQuery(-1))); + } + + [Fact] + public async Task AdaptiveIndexNestedLoop_SwitchesToHashAlternativeBeforeRowsAreEmitted() + { + var counters = new AdaptiveRuntimeCounters(); + var rows = CreateSingleColumnRows(5); + var outputSchema = OneIntegerColumnSchema(); + bool lookupChosen = false; + bool hashChosen = false; + + var lease = new AdaptiveQueryExecutionLease(new AdaptiveQueryReoptimizationOptions + { + Enabled = true, + DivergenceFactor = 2, + MinimumObservedRows = 1, + MaxBufferedRows = 16, + MaxReoptimizationsPerQuery = 1, + }); + var op = new AdaptiveIndexNestedLoopJoinOperator( + new MaterializedOperator(rows, outputSchema), + new MaterializedOperator([], outputSchema), + outputSchema, + source => + { + lookupChosen = true; + return source; + }, + source => + { + hashChosen = true; + return source; + }, + lease, + counters.Diagnostics, + estimatedOuterRows: 1, + estimatedRowCount: null); + + List actualRows = await ReadAllRowsAsync(op); + + Assert.False(lookupChosen); + Assert.True(hashChosen); + Assert.Equal([1L, 2L, 3L, 4L, 5L], actualRows.Select(row => row[0].AsInteger).ToArray()); + Assert.Equal(1, counters.AttemptCount); + Assert.Equal(1, counters.DivergenceCount); + Assert.Equal(1, counters.SuccessfulSwitchCount); + Assert.Equal(3, counters.BufferedRowCount); + } + + [Fact] + public async Task 
AdaptiveIndexNestedLoop_FallsBackWhenBufferedRowCapIsReached() + { + var counters = new AdaptiveRuntimeCounters(); + var rows = CreateSingleColumnRows(5); + var outputSchema = OneIntegerColumnSchema(); + bool lookupChosen = false; + bool hashChosen = false; + + var lease = new AdaptiveQueryExecutionLease(new AdaptiveQueryReoptimizationOptions + { + Enabled = true, + DivergenceFactor = 2, + MinimumObservedRows = 1, + MaxBufferedRows = 2, + MaxReoptimizationsPerQuery = 1, + }); + var op = new AdaptiveIndexNestedLoopJoinOperator( + new MaterializedOperator(rows, outputSchema), + new MaterializedOperator([], outputSchema), + outputSchema, + source => + { + lookupChosen = true; + return source; + }, + source => + { + hashChosen = true; + return source; + }, + lease, + counters.Diagnostics, + estimatedOuterRows: 1, + estimatedRowCount: null); + + List actualRows = await ReadAllRowsAsync(op); + + Assert.True(lookupChosen); + Assert.False(hashChosen); + Assert.Equal([1L, 2L, 3L, 4L, 5L], actualRows.Select(row => row[0].AsInteger).ToArray()); + Assert.Equal(1, counters.AttemptCount); + Assert.Equal(1, counters.RejectedSwitchCount); + Assert.Equal(1, counters.MaxBufferedFallbackCount); + Assert.Equal(2, counters.BufferedRowCount); + } + + [Fact] + public async Task AdaptiveHashJoin_SwitchesBuildSideWhenObservedBuildSideDiverges() + { + var counters = new AdaptiveRuntimeCounters(); + ColumnDefinition[] leftSchema = + [ + new() { Name = "id", Type = DbType.Integer }, + new() { Name = "code", Type = DbType.Integer }, + ]; + ColumnDefinition[] rightSchema = + [ + new() { Name = "code", Type = DbType.Integer }, + new() { Name = "payload", Type = DbType.Integer }, + ]; + var compositeSchema = new TableSchema + { + TableName = "join", + Columns = leftSchema.Concat(rightSchema).ToArray(), + }; + + var leftRows = new List + { + new[] { DbValue.FromInteger(1), DbValue.FromInteger(1) }, + new[] { DbValue.FromInteger(2), DbValue.FromInteger(2) }, + }; + var rightRows = 
Enumerable.Range(1, 6) + .Select(i => new[] { DbValue.FromInteger(i), DbValue.FromInteger(i * 10) }) + .ToList(); + + var lease = new AdaptiveQueryExecutionLease(new AdaptiveQueryReoptimizationOptions + { + Enabled = true, + DivergenceFactor = 2, + MinimumObservedRows = 1, + MaxBufferedRows = 16, + MaxReoptimizationsPerQuery = 1, + }); + var op = new AdaptiveHashJoinOperator( + new MaterializedOperator(leftRows, leftSchema), + new MaterializedOperator(rightRows, rightSchema), + JoinType.Inner, + residualCondition: null, + compositeSchema, + leftColCount: 2, + rightColCount: 2, + leftKeyIndices: [1], + rightKeyIndices: [0], + plannedBuildRightSide: true, + estimatedLeftRows: 2, + estimatedRightRows: 1, + estimatedRowCount: null, + DbFunctionRegistry.Empty, + lease, + counters.Diagnostics); + + List actualRows = await ReadAllRowsAsync(op); + + Assert.Equal(2, actualRows.Count); + Assert.Equal([1L, 2L], actualRows.Select(row => row[0].AsInteger).Order().ToArray()); + Assert.Equal(1, counters.AttemptCount); + Assert.Equal(1, counters.DivergenceCount); + Assert.Equal(1, counters.SuccessfulSwitchCount); + } + + [Fact] + public async Task AdaptiveReoptimization_IsSilentWhenDisabled() + { + await using var db = await Database.OpenInMemoryAsync(Ct); + await SetupHashJoinTablesAsync(db); + + await using var result = await db.ExecuteAsync( + "SELECT l.id, r.payload FROM adaptive_left l JOIN adaptive_right r ON l.code = r.code", + Ct); + var rows = await result.ToListAsync(Ct); + var diagnostics = db.GetAdaptiveQueryReoptimizationDiagnosticsSnapshot(); + + Assert.Equal(3, rows.Count); + Assert.Equal(0, diagnostics.EligibleQueryCount); + Assert.Equal(0, diagnostics.AttemptCount); + Assert.DoesNotContain(EnumerateOperatorTree(GetRootOperator(result)), op => op is AdaptiveHashJoinOperator); + } + + [Fact] + public async Task AdaptiveReoptimization_WrapsEligibleHashJoinWhenEnabled() + { + await using var db = await Database.OpenInMemoryAsync( + new 
DatabaseOptions().EnableAdaptiveQueryReoptimization(builder => builder + .WithDivergenceFactor(2) + .WithMinimumObservedRows(1) + .WithMaxBufferedRows(64)), + Ct); + await SetupHashJoinTablesAsync(db); + db.ResetAdaptiveQueryReoptimizationDiagnostics(); + + await using var result = await db.ExecuteAsync( + "SELECT l.id, r.payload FROM adaptive_left l JOIN adaptive_right r ON l.code = r.code", + Ct); + var rows = await result.ToListAsync(Ct); + var diagnostics = db.GetAdaptiveQueryReoptimizationDiagnosticsSnapshot(); + + Assert.Equal(3, rows.Count); + Assert.Contains(EnumerateOperatorTree(GetRootOperator(result)), op => op is AdaptiveHashJoinOperator); + Assert.Equal(1, diagnostics.EligibleQueryCount); + Assert.Equal(1, diagnostics.AttemptCount); + Assert.True(diagnostics.BufferedRowCount > 0); + } + + [Fact] + public async Task AdaptiveReoptimization_PreservesLeftJoinNullExtension() + { + await using var db = await Database.OpenInMemoryAsync( + new DatabaseOptions().EnableAdaptiveQueryReoptimization(builder => builder + .WithDivergenceFactor(2) + .WithMinimumObservedRows(1) + .WithMaxBufferedRows(64)), + Ct); + + await db.ExecuteAsync("CREATE TABLE adaptive_left (id INTEGER PRIMARY KEY, code INTEGER NOT NULL)", Ct); + await db.ExecuteAsync("CREATE TABLE adaptive_right (code INTEGER PRIMARY KEY, payload INTEGER NOT NULL)", Ct); + await db.ExecuteAsync("INSERT INTO adaptive_left VALUES (1, 1)", Ct); + await db.ExecuteAsync("INSERT INTO adaptive_left VALUES (2, 2)", Ct); + await db.ExecuteAsync("INSERT INTO adaptive_right VALUES (1, 10)", Ct); + db.ResetAdaptiveQueryReoptimizationDiagnostics(); + + await using var result = await db.ExecuteAsync( + "SELECT l.id, r.payload FROM adaptive_left l LEFT JOIN adaptive_right r ON r.code = l.code ORDER BY l.id", + Ct); + var rows = await result.ToListAsync(Ct); + var diagnostics = db.GetAdaptiveQueryReoptimizationDiagnosticsSnapshot(); + + Assert.Equal(2, rows.Count); + Assert.Equal(1L, rows[0][0].AsInteger); + 
Assert.Equal(10L, rows[0][1].AsInteger); + Assert.Equal(2L, rows[1][0].AsInteger); + Assert.True(rows[1][1].IsNull); + Assert.Equal(1, diagnostics.EligibleQueryCount); + Assert.Equal(1, diagnostics.AttemptCount); + } + + [Theory] + [InlineData("SELECT * FROM adaptive_left l JOIN adaptive_right r ON l.code = r.code")] + [InlineData("SELECT l.id FROM adaptive_left l CROSS JOIN adaptive_right r")] + [InlineData("SELECT l.id FROM adaptive_left l JOIN adaptive_right r ON l.code = r.code UNION SELECT id FROM adaptive_left")] + public async Task AdaptiveReoptimization_DoesNotAdaptUnsupportedShapes(string sql) + { + await using var db = await Database.OpenInMemoryAsync( + new DatabaseOptions().EnableAdaptiveQueryReoptimization(), + Ct); + await SetupHashJoinTablesAsync(db); + db.ResetAdaptiveQueryReoptimizationDiagnostics(); + + await using var result = await db.ExecuteAsync(sql, Ct); + _ = await result.ToListAsync(Ct); + var diagnostics = db.GetAdaptiveQueryReoptimizationDiagnosticsSnapshot(); + + Assert.Equal(0, diagnostics.EligibleQueryCount); + Assert.Equal(0, diagnostics.AttemptCount); + } + + private static async ValueTask SetupHashJoinTablesAsync(Database db) + { + await db.ExecuteAsync("CREATE TABLE adaptive_left (id INTEGER PRIMARY KEY, code INTEGER NOT NULL)", Ct); + await db.ExecuteAsync("CREATE TABLE adaptive_right (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, payload INTEGER NOT NULL)", Ct); + + for (int i = 1; i <= 3; i++) + { + await db.ExecuteAsync($"INSERT INTO adaptive_left VALUES ({i}, {i})", Ct); + await db.ExecuteAsync($"INSERT INTO adaptive_right VALUES ({i}, {i}, {i * 10})", Ct); + } + } + + private static async Task> ReadAllRowsAsync(IOperator op) + { + var rows = new List(); + await op.OpenAsync(Ct); + try + { + while (await op.MoveNextAsync(Ct)) + rows.Add((DbValue[])op.Current.Clone()); + } + finally + { + await op.DisposeAsync(); + } + + return rows; + } + + private static List CreateSingleColumnRows(int count) + { + var rows = new 
List(count); + for (int i = 1; i <= count; i++) + rows.Add([DbValue.FromInteger(i)]); + + return rows; + } + + private static ColumnDefinition[] OneIntegerColumnSchema() + => [new ColumnDefinition { Name = "value", Type = DbType.Integer }]; + + private static IOperator GetRootOperator(QueryResult result) + { + IOperator storedOperator = GetStoredOperator(result); + return storedOperator is BatchToRowOperatorAdapter batchAdapter + ? batchAdapter.BatchSource as IOperator + ?? throw new InvalidOperationException("Batch adapter did not expose an operator root.") + : storedOperator; + } + + private static IOperator GetStoredOperator(QueryResult result) + { + var operatorField = typeof(QueryResult).GetField("_operator", BindingFlags.Instance | BindingFlags.NonPublic) + ?? throw new InvalidOperationException("QueryResult operator field not found."); + var storedOperator = (IOperator?)operatorField.GetValue(result); + if (storedOperator != null) + return storedOperator; + + var batchOperatorField = typeof(QueryResult).GetField("_batchOperator", BindingFlags.Instance | BindingFlags.NonPublic) + ?? throw new InvalidOperationException("QueryResult batch operator field not found."); + return (IOperator?)batchOperatorField.GetValue(result) + ?? throw new InvalidOperationException("QueryResult did not contain an operator."); + } + + private static IEnumerable EnumerateOperatorTree(IOperator? start) + { + for (IOperator? current = start; current != null;) + { + yield return current; + + if (current is BatchToRowOperatorAdapter batchAdapter && + batchAdapter.BatchSource is IOperator batchOperator) + { + current = batchOperator; + continue; + } + + current = current is IUnaryOperatorSource unary ? 
unary.Source : null; + } + } + + private sealed class AdaptiveRuntimeCounters + { + public long AttemptCount { get; private set; } + public long SuccessfulSwitchCount { get; private set; } + public long RejectedSwitchCount { get; private set; } + public long DivergenceCount { get; private set; } + public long BufferedRowCount { get; private set; } + public long MaxBufferedFallbackCount { get; private set; } + public long ReoptimizationLimitFallbackCount { get; private set; } + public long UnsupportedFallbackCount { get; private set; } + + public AdaptiveQueryReoptimizationRuntimeDiagnostics Diagnostics { get; } + + public AdaptiveRuntimeCounters() + { + Diagnostics = new AdaptiveQueryReoptimizationRuntimeDiagnostics( + () => AttemptCount++, + () => SuccessfulSwitchCount++, + RecordRejectedSwitch, + () => DivergenceCount++, + count => BufferedRowCount += count); + } + + private void RecordRejectedSwitch(AdaptiveQueryReoptimizationFallbackReason reason) + { + RejectedSwitchCount++; + switch (reason) + { + case AdaptiveQueryReoptimizationFallbackReason.MaxBufferedRows: + MaxBufferedFallbackCount++; + break; + case AdaptiveQueryReoptimizationFallbackReason.ReoptimizationLimit: + ReoptimizationLimitFallbackCount++; + break; + case AdaptiveQueryReoptimizationFallbackReason.Unsupported: + UnsupportedFallbackCount++; + break; + } + } + } +} diff --git a/tests/CSharpDB.Tests/ClientSqlExecutionTests.cs b/tests/CSharpDB.Tests/ClientSqlExecutionTests.cs index f0864003..f288af05 100644 --- a/tests/CSharpDB.Tests/ClientSqlExecutionTests.cs +++ b/tests/CSharpDB.Tests/ClientSqlExecutionTests.cs @@ -5,6 +5,48 @@ namespace CSharpDB.Tests; public sealed class ClientSqlExecutionTests { + [Fact] + public async Task ExecuteSqlAsync_PublicPlannerDiagnostics_WorkThroughDirectClient() + { + var ct = TestContext.Current.CancellationToken; + string dbPath = Path.Combine(Path.GetTempPath(), $"csharpdb_client_test_{Guid.NewGuid():N}.db"); + + try + { + await using var client = 
CSharpDbClient.Create(new CSharpDbClientOptions + { + DataSource = dbPath, + }); + + Assert.Null((await client.ExecuteSqlAsync("CREATE TABLE planner_client (id INTEGER PRIMARY KEY, value INTEGER);", ct)).Error); + Assert.Null((await client.ExecuteSqlAsync("INSERT INTO planner_client VALUES (1, 7), (2, 7), (3, 8);", ct)).Error); + Assert.Null((await client.ExecuteSqlAsync("ANALYZE planner_client;", ct)).Error); + + var catalog = await client.ExecuteSqlAsync( + "SELECT COUNT(*) FROM sys.planner_heavy_hitters WHERE table_name = 'planner_client' AND column_name = 'value';", + ct); + Assert.Null(catalog.Error); + Assert.True(catalog.IsQuery); + Assert.NotNull(catalog.Rows); + Assert.True(Convert.ToInt64(Assert.Single(catalog.Rows)[0], CultureInfo.InvariantCulture) > 0); + + var explain = await client.ExecuteSqlAsync( + "EXPLAIN ESTIMATE FOR SELECT * FROM planner_client WHERE value = 7;", + ct); + Assert.Null(explain.Error); + Assert.True(explain.IsQuery); + Assert.NotNull(explain.ColumnNames); + Assert.Contains("decision", explain.ColumnNames); + Assert.NotNull(explain.Rows); + Assert.Contains(explain.Rows, row => string.Equals(Convert.ToString(row[4], CultureInfo.InvariantCulture), "heavy-hitter", StringComparison.Ordinal)); + } + finally + { + DeleteIfExists(dbPath); + DeleteIfExists(dbPath + ".wal"); + } + } + [Fact] public async Task ExecuteSqlAsync_HandlesTriggerBodyAndFinalStatementWithoutSemicolon() { @@ -46,6 +88,38 @@ INSERT INTO orders VALUES (1, 3) } } + [Fact] + public async Task ExecuteSqlAsync_ReturnsQueryColumnTypes() + { + var ct = TestContext.Current.CancellationToken; + string dbPath = Path.Combine(Path.GetTempPath(), $"csharpdb_client_test_{Guid.NewGuid():N}.db"); + + try + { + await using var client = CSharpDbClient.Create(new CSharpDbClientOptions + { + DataSource = dbPath, + }); + + Assert.Null((await client.ExecuteSqlAsync("CREATE TABLE client_types (id INTEGER, code TEXT, amount REAL);", ct)).Error); + Assert.Null((await 
client.ExecuteSqlAsync("INSERT INTO client_types VALUES (1, 'A', 12.5);", ct)).Error); + + var result = await client.ExecuteSqlAsync("SELECT id, code, amount FROM client_types;", ct); + + Assert.Null(result.Error); + Assert.True(result.IsQuery); + Assert.NotNull(result.ColumnNames); + Assert.NotNull(result.ColumnTypes); + Assert.Equal(["id", "code", "amount"], result.ColumnNames); + Assert.Equal(["INTEGER", "TEXT", "REAL"], result.ColumnTypes); + } + finally + { + DeleteIfExists(dbPath); + DeleteIfExists(dbPath + ".wal"); + } + } + [Fact] public async Task ExecuteSqlAsync_SelectDateWithoutFrom_ReturnsDateText() { diff --git a/tests/CSharpDB.Tests/DatabaseConcurrencyTests.cs b/tests/CSharpDB.Tests/DatabaseConcurrencyTests.cs index 8c569434..d1f6cb28 100644 --- a/tests/CSharpDB.Tests/DatabaseConcurrencyTests.cs +++ b/tests/CSharpDB.Tests/DatabaseConcurrencyTests.cs @@ -298,6 +298,85 @@ public async Task ConcurrentExplicitWriteTransactions_ImplicitIdentityInsertsRes Assert.Equal(2L, rows[1][1].AsInteger); } + [Fact] + public async Task ConcurrentExplicitWriteTransactions_ImplicitIdentityMultiRowInsertsReserveRanges() + { + var ct = TestContext.Current.CancellationToken; + await _db.ExecuteAsync("CREATE TABLE identity_range_bench (id INTEGER PRIMARY KEY IDENTITY, writer INTEGER)", ct); + _db.ResetRowIdReservationDiagnostics(); + + await using var tx1 = await _db.BeginWriteTransactionAsync(ct); + await using var tx2 = await _db.BeginWriteTransactionAsync(ct); + + await tx1.ExecuteAsync("INSERT INTO identity_range_bench (writer) VALUES (1), (1), (1)", ct); + await tx2.ExecuteAsync("INSERT INTO identity_range_bench (writer) VALUES (2), (2)", ct); + + await tx1.CommitAsync(ct); + await tx2.CommitAsync(ct); + + await using var result = await _db.ExecuteAsync( + "SELECT id, writer FROM identity_range_bench ORDER BY id", + ct); + var rows = await result.ToListAsync(ct); + + Assert.Equal(5, rows.Count); + Assert.Equal(1L, rows[0][0].AsInteger); + Assert.Equal(3L, 
rows[2][0].AsInteger); + Assert.Equal(4L, rows[3][0].AsInteger); + Assert.Equal(5L, rows[4][0].AsInteger); + + RowIdReservationDiagnosticsSnapshot diagnostics = _db.GetRowIdReservationDiagnosticsSnapshot(); + Assert.Equal(2, diagnostics.ReservationCount); + Assert.Equal(5, diagnostics.ReservedRowIdCount); + } + + [Fact] + public async Task ConcurrentExplicitWriteTransactions_RolledBackIdentityRangeDoesNotDuplicateAfterReopen() + { + var ct = TestContext.Current.CancellationToken; + string dbPath = Path.Combine(Path.GetTempPath(), $"csharpdb_identity_range_reopen_{Guid.NewGuid():N}.db"); + + Database? db = null; + try + { + db = await Database.OpenAsync(dbPath, ct); + await db.ExecuteAsync("CREATE TABLE identity_range_reopen (id INTEGER PRIMARY KEY IDENTITY, writer INTEGER)", ct); + + await using (var rollbackTx = await db.BeginWriteTransactionAsync(ct)) + { + await rollbackTx.ExecuteAsync("INSERT INTO identity_range_reopen (writer) VALUES (1), (1), (1)", ct); + } + + await using (var commitTx = await db.BeginWriteTransactionAsync(ct)) + { + await commitTx.ExecuteAsync("INSERT INTO identity_range_reopen (writer) VALUES (2)", ct); + await commitTx.CommitAsync(ct); + } + + await db.DisposeAsync(); + db = await Database.OpenAsync(dbPath, ct); + + await db.ExecuteAsync("INSERT INTO identity_range_reopen (writer) VALUES (3)", ct); + + await using var result = await db.ExecuteAsync( + "SELECT id, writer FROM identity_range_reopen ORDER BY id", + ct); + var rows = await result.ToListAsync(ct); + + Assert.Equal(2, rows.Count); + Assert.Equal(4L, rows[0][0].AsInteger); + Assert.Equal(2L, rows[0][1].AsInteger); + Assert.Equal(5L, rows[1][0].AsInteger); + Assert.Equal(3L, rows[1][1].AsInteger); + } + finally + { + if (db is not null) + await db.DisposeAsync(); + await DeleteDatabaseFilesAsync(dbPath); + } + } + [Fact] public async Task ConcurrentExplicitWriteTransactions_ImplicitIdentityInsertBurst_CompletesUnderDurableGroupCommit() { @@ -367,6 +446,66 @@ await 
tx.ExecuteAsync( } } + [Fact] + public async Task ConcurrentExplicitWriteTransactions_PendingRightEdgeInsertRebasesBeforeDurablePublish() + { + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(TestContext.Current.CancellationToken); + linkedCts.CancelAfter(TimeSpan.FromSeconds(10)); + CancellationToken ct = linkedCts.Token; + + string dbPath = Path.Combine(Path.GetTempPath(), $"csharpdb_pending_leaf_rebase_{Guid.NewGuid():N}.db"); + var options = new DatabaseOptions() + .ConfigureStorageEngine(builder => builder.UseDurableGroupCommit(TimeSpan.FromMilliseconds(250))); + + Database? db = null; + try + { + db = await Database.OpenAsync(dbPath, options, ct); + await db.ExecuteAsync("CREATE TABLE pending_rebase (id INTEGER PRIMARY KEY, writer INTEGER)", ct); + db.ResetCommitPathDiagnostics(); + + await using var tx1 = await db.BeginWriteTransactionAsync(ct); + await using var tx2 = await db.BeginWriteTransactionAsync(ct); + + await tx1.ExecuteAsync("INSERT INTO pending_rebase VALUES (1, 1)", ct); + await tx2.ExecuteAsync("INSERT INTO pending_rebase VALUES (2, 2)", ct); + + Task tx1Commit = tx1.CommitAsync(ct).AsTask(); + await WaitForConditionAsync( + () => + { + CommitPathDiagnosticsSnapshot diagnostics = db.GetCommitPathDiagnosticsSnapshot(); + return diagnostics.ExplicitPendingCommitReservationCount >= 1 && !tx1Commit.IsCompleted; + }, + TimeSpan.FromSeconds(2), + ct); + + await tx2.CommitAsync(ct); + await tx1Commit; + + await using var result = await db.ExecuteAsync( + "SELECT id, writer FROM pending_rebase ORDER BY id", + ct); + var rows = await result.ToListAsync(ct); + + Assert.Equal(2, rows.Count); + Assert.Equal(1L, rows[0][0].AsInteger); + Assert.Equal(1L, rows[0][1].AsInteger); + Assert.Equal(2L, rows[1][0].AsInteger); + Assert.Equal(2L, rows[1][1].AsInteger); + + CommitPathDiagnosticsSnapshot finalDiagnostics = db.GetCommitPathDiagnosticsSnapshot(); + Assert.True(finalDiagnostics.ExplicitPendingLeafRebaseAttemptCount > 0); + 
Assert.True(finalDiagnostics.ExplicitPendingLeafRebaseSuccessCount > 0); + } + finally + { + if (db is not null) + await db.DisposeAsync(); + await DeleteDatabaseFilesAsync(dbPath); + } + } + [Fact] public async Task ConcurrentExplicitWriteTransactions_NonMergeableSameLeafUpdatesStillConflictWithoutRetry() { diff --git a/tests/CSharpDB.Tests/IntegrationTests.cs b/tests/CSharpDB.Tests/IntegrationTests.cs index e015e397..82995412 100644 --- a/tests/CSharpDB.Tests/IntegrationTests.cs +++ b/tests/CSharpDB.Tests/IntegrationTests.cs @@ -3180,6 +3180,159 @@ public async Task SimpleIndexNestedLoopJoinWithLimit_UsesBatchLimitOverIndexNest Assert.Equal(2L, rows[1][0].AsInteger); } + [Fact] + public async Task MultiLookupInnerJoinChainWithLimit_PreservesStreamingLookupOrder() + { + var ct = TestContext.Current.CancellationToken; + await CreateLowStockJoinShapeAsync("limit_lookup", createView: false, ct); + + var planner = GetPlanner(); + var statement = Parser.Parse( + """ + SELECT ip.id, w.name, p.sku, s.name + FROM limit_lookup_inventory_positions ip + INNER JOIN limit_lookup_warehouses w ON w.id = ip.warehouse_id + INNER JOIN limit_lookup_products p ON p.id = ip.product_id + INNER JOIN limit_lookup_suppliers s ON s.id = p.preferred_supplier_id + LIMIT 3 + """) as SelectStatement + ?? 
throw new InvalidOperationException("Expected SELECT statement."); + + await using var result = await planner.ExecuteAsync(statement, ct); + Assert.True(UsesDirectBatchStorage(result)); + + var rootOperator = Assert.IsType(GetRootOperator(result)); + var lookupJoin = FindOperatorInUnaryChain( + GetPrivateField(rootOperator, "_source")); + Assert.IsAssignableFrom(lookupJoin); + Assert.Null(FindOperatorInTree(rootOperator)); + + var rows = await result.ToListAsync(ct); + Assert.Equal(3, rows.Count); + Assert.Equal(1L, rows[0][0].AsInteger); + Assert.Equal("wh_1", rows[0][1].AsText); + Assert.Equal("sku_1", rows[0][2].AsText); + } + + [Fact] + public async Task SimpleViewJoinChainWithLimit_PreservesStreamingLookupOrder() + { + var ct = TestContext.Current.CancellationToken; + await CreateLowStockJoinShapeAsync("limit_view", createView: true, ct); + + var planner = GetPlanner(); + var statement = Parser.Parse("SELECT * FROM limit_view_low_stock_watch LIMIT 3") as SelectStatement + ?? throw new InvalidOperationException("Expected SELECT statement."); + + await using var result = await planner.ExecuteAsync(statement, ct); + Assert.True(UsesDirectBatchStorage(result)); + + var rootOperator = Assert.IsType(GetRootOperator(result)); + var lookupJoin = FindOperatorInUnaryChain( + GetPrivateField(rootOperator, "_source")); + Assert.IsAssignableFrom(lookupJoin); + Assert.Null(FindOperatorInTree(rootOperator)); + + var rows = await result.ToListAsync(ct); + Assert.Equal(3, rows.Count); + Assert.Equal(1L, rows[0][0].AsInteger); + Assert.Equal("wh_1", rows[0][1].AsText); + Assert.Equal("sku_1", rows[0][2].AsText); + } + + [Fact] + public async Task SimpleViewJoinChainWithSmallLimitOffset_PreservesStreamingLookupOrder() + { + var ct = TestContext.Current.CancellationToken; + await CreateLowStockJoinShapeAsync("limit_offset_view", createView: true, ct); + + var planner = GetPlanner(); + var statement = Parser.Parse("SELECT * FROM limit_offset_view_low_stock_watch LIMIT 3 OFFSET 6") 
as SelectStatement + ?? throw new InvalidOperationException("Expected SELECT statement."); + + await using var result = await planner.ExecuteAsync(statement, ct); + Assert.True(UsesDirectBatchStorage(result)); + + var rootOperator = Assert.IsType(GetRootOperator(result)); + var offsetOperator = Assert.IsType(GetPrivateField(rootOperator, "_source")); + var lookupJoin = FindOperatorInUnaryChain( + GetPrivateField(offsetOperator, "_source")); + Assert.IsAssignableFrom(lookupJoin); + Assert.Null(FindOperatorInTree(rootOperator)); + + var rows = await result.ToListAsync(ct); + Assert.Equal(3, rows.Count); + Assert.Equal(7L, rows[0][0].AsInteger); + Assert.Equal("wh_7", rows[0][1].AsText); + Assert.Equal("sku_7", rows[0][2].AsText); + } + + [Fact] + public async Task PurchaseOrderLineJoinChainWithLimit_ChoosesStreamingLookupOrder() + { + var ct = TestContext.Current.CancellationToken; + await CreatePurchaseOrderJoinShapeAsync("limit_po", ct); + + var planner = GetPlanner(); + var statement = Parser.Parse( + """ + SELECT po.id AS purchase_order_id, po.po_number, s.name AS supplier_name, w.warehouse_code, + p.sku, p.name AS product_name, pol.ordered_qty, pol.received_qty + FROM limit_po_purchase_orders po + INNER JOIN limit_po_suppliers s ON s.id = po.supplier_id + INNER JOIN limit_po_warehouses w ON w.id = po.warehouse_id + INNER JOIN limit_po_purchase_order_lines pol ON pol.purchase_order_id = po.id + INNER JOIN limit_po_products p ON p.id = pol.product_id + LIMIT 3 + """) as SelectStatement + ?? 
throw new InvalidOperationException("Expected SELECT statement."); + + await using var result = await planner.ExecuteAsync(statement, ct); + Assert.True(UsesDirectBatchStorage(result)); + + var rootOperator = Assert.IsType(GetRootOperator(result)); + var lookupJoin = FindOperatorInUnaryChain( + GetPrivateField(rootOperator, "_source")); + Assert.IsAssignableFrom(lookupJoin); + Assert.Null(FindOperatorInTree(rootOperator)); + + var rows = await result.ToListAsync(ct); + Assert.Equal(3, rows.Count); + Assert.Equal(1L, rows[0][0].AsInteger); + Assert.Equal("PO-1", rows[0][1].AsText); + Assert.Equal("supplier_1", rows[0][2].AsText); + Assert.Equal("WH-1", rows[0][3].AsText); + Assert.Equal("sku_1", rows[0][4].AsText); + } + + [Fact] + public async Task SimpleViewLateUnindexedDetailJoinWithLimit_ChoosesStreamingLookupOrder() + { + var ct = TestContext.Current.CancellationToken; + await CreateShipmentManifestJoinShapeAsync("limit_manifest", ct); + + var planner = GetPlanner(); + var statement = Parser.Parse("SELECT * FROM limit_manifest_shipment_manifest_report_source LIMIT 3 OFFSET 6") as SelectStatement + ?? 
throw new InvalidOperationException("Expected SELECT statement."); + + await using var result = await planner.ExecuteAsync(statement, ct); + Assert.True(UsesDirectBatchStorage(result)); + + var rootOperator = Assert.IsType(GetRootOperator(result)); + var offsetOperator = Assert.IsType(GetPrivateField(rootOperator, "_source")); + var lookupJoin = FindOperatorInUnaryChain( + GetPrivateField(offsetOperator, "_source")); + Assert.IsAssignableFrom(lookupJoin); + Assert.Null(FindOperatorInTree(rootOperator)); + + var rows = await result.ToListAsync(ct); + Assert.Equal(3, rows.Count); + Assert.Equal(7L, rows[0][0].AsInteger); + Assert.Equal("SHIP-7", rows[0][1].AsText); + Assert.Equal("carrier_7", rows[0][5].AsText); + Assert.Equal("sku_7", rows[0][9].AsText); + } + [Fact] public async Task IndexNestedLoopJoinedColumnProjection_UsesBatchJoinOperatorAfterLeafPushdown() { @@ -5468,6 +5621,76 @@ public async Task Join_WithWhere() Assert.Equal(2, rows.Count); // Alice+Widget, Bob+Widget } + [Fact] + public async Task Join_WithWhereOnRightPrimaryKeyLookupSide_AppliesPredicate() + { + var ct = TestContext.Current.CancellationToken; + await _db.ExecuteAsync("CREATE TABLE lookup_orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_number TEXT)", ct); + await _db.ExecuteAsync("CREATE TABLE lookup_customers (id INTEGER PRIMARY KEY, customer_code TEXT)", ct); + + await _db.ExecuteAsync("INSERT INTO lookup_customers VALUES (1, 'TARGET')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_customers VALUES (2, 'OTHER')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_orders VALUES (10, 2, 'SO-OTHER-1')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_orders VALUES (11, 1, 'SO-TARGET')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_orders VALUES (12, 2, 'SO-OTHER-2')", ct); + + await using var result = await _db.ExecuteAsync( + """ + SELECT o.id, c.customer_code + FROM lookup_orders AS o + JOIN lookup_customers AS c ON c.id = o.customer_id + WHERE c.customer_code = 
'TARGET' + ORDER BY o.id + """, + ct); + + var rows = await result.ToListAsync(ct); + var row = Assert.Single(rows); + Assert.Equal(11L, row[0].AsInteger); + Assert.Equal("TARGET", row[1].AsText); + } + + [Fact] + public async Task Join_WithUniqueTextFilterAndIndexedDependentSide_UsesLookupJoins() + { + var ct = TestContext.Current.CancellationToken; + await _db.ExecuteAsync("CREATE TABLE lookup_plan_orders (id INTEGER PRIMARY KEY, customer_id INTEGER, warehouse_id INTEGER)", ct); + await _db.ExecuteAsync("CREATE TABLE lookup_plan_customers (id INTEGER PRIMARY KEY, customer_code TEXT)", ct); + await _db.ExecuteAsync("CREATE TABLE lookup_plan_warehouses (id INTEGER PRIMARY KEY, warehouse_code TEXT)", ct); + await _db.ExecuteAsync("CREATE UNIQUE INDEX idx_lookup_plan_customers_code ON lookup_plan_customers(customer_code)", ct); + await _db.ExecuteAsync("CREATE INDEX idx_lookup_plan_orders_customer ON lookup_plan_orders(customer_id)", ct); + + await _db.ExecuteAsync("INSERT INTO lookup_plan_customers VALUES (1, 'TARGET')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_customers VALUES (2, 'OTHER-1')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_customers VALUES (3, 'OTHER-2')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_warehouses VALUES (10, 'WH-A')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_warehouses VALUES (11, 'WH-B')", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_orders VALUES (100, 2, 10)", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_orders VALUES (101, 1, 11)", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_orders VALUES (102, 3, 10)", ct); + await _db.ExecuteAsync("INSERT INTO lookup_plan_orders VALUES (103, 2, 11)", ct); + + await using var result = await _db.ExecuteAsync( + """ + SELECT o.id, c.customer_code, w.warehouse_code + FROM lookup_plan_orders AS o + JOIN lookup_plan_customers AS c ON c.id = o.customer_id + JOIN lookup_plan_warehouses AS w ON w.id = o.warehouse_id + WHERE 
c.customer_code = 'TARGET' + """, + ct); + + var root = GetRootOperator(result); + Assert.NotNull(FindOperatorInTree(root)); + Assert.NotNull(FindOperatorInTree(root)); + Assert.Null(FindOperatorInTree(root)); + + var row = Assert.Single(await result.ToListAsync(ct)); + Assert.Equal(101L, row[0].AsInteger); + Assert.Equal("TARGET", row[1].AsText); + Assert.Equal("WH-B", row[2].AsText); + } + [Fact] public async Task Join_WithResidualOnPredicate() { @@ -8058,6 +8281,148 @@ private async Task CreateInnerJoinChainAsync(int joinCount, bool createLookupInd } } + private async Task CreateLowStockJoinShapeAsync(string prefix, bool createView, CancellationToken ct) + { + string inventory = $"{prefix}_inventory_positions"; + string warehouses = $"{prefix}_warehouses"; + string products = $"{prefix}_products"; + string suppliers = $"{prefix}_suppliers"; + + await _db.ExecuteAsync($"CREATE TABLE {warehouses} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)", ct); + await _db.ExecuteAsync($"CREATE TABLE {suppliers} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)", ct); + await _db.ExecuteAsync( + $"CREATE TABLE {products} (id INTEGER PRIMARY KEY, sku TEXT NOT NULL, preferred_supplier_id INTEGER NOT NULL)", + ct); + await _db.ExecuteAsync( + $"CREATE TABLE {inventory} (id INTEGER PRIMARY KEY, warehouse_id INTEGER NOT NULL, product_id INTEGER NOT NULL)", + ct); + + for (int i = 1; i <= 32; i++) + { + await _db.ExecuteAsync($"INSERT INTO {warehouses} VALUES ({i}, 'wh_{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {suppliers} VALUES ({i}, 'supplier_{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {products} VALUES ({i}, 'sku_{i}', {i})", ct); + await _db.ExecuteAsync($"INSERT INTO {inventory} VALUES ({i}, {i}, {i})", ct); + } + + await _db.ExecuteAsync($"ANALYZE {warehouses}", ct); + await _db.ExecuteAsync($"ANALYZE {suppliers}", ct); + await _db.ExecuteAsync($"ANALYZE {products}", ct); + await _db.ExecuteAsync($"ANALYZE {inventory}", ct); + + if (createView) + { + await 
_db.ExecuteAsync( + $""" + CREATE VIEW {prefix}_low_stock_watch AS + SELECT ip.id AS inventory_position_id, w.name AS warehouse_name, p.sku, s.name AS supplier_name + FROM {inventory} ip + INNER JOIN {warehouses} w ON w.id = ip.warehouse_id + INNER JOIN {products} p ON p.id = ip.product_id + INNER JOIN {suppliers} s ON s.id = p.preferred_supplier_id + """, + ct); + } + } + + private async Task CreatePurchaseOrderJoinShapeAsync(string prefix, CancellationToken ct) + { + string purchaseOrders = $"{prefix}_purchase_orders"; + string purchaseOrderLines = $"{prefix}_purchase_order_lines"; + string suppliers = $"{prefix}_suppliers"; + string warehouses = $"{prefix}_warehouses"; + string products = $"{prefix}_products"; + + await _db.ExecuteAsync($"CREATE TABLE {suppliers} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)", ct); + await _db.ExecuteAsync($"CREATE TABLE {warehouses} (id INTEGER PRIMARY KEY, warehouse_code TEXT NOT NULL)", ct); + await _db.ExecuteAsync($"CREATE TABLE {products} (id INTEGER PRIMARY KEY, sku TEXT NOT NULL, name TEXT NOT NULL)", ct); + await _db.ExecuteAsync( + $"CREATE TABLE {purchaseOrders} (id INTEGER PRIMARY KEY, po_number TEXT NOT NULL, supplier_id INTEGER NOT NULL, warehouse_id INTEGER NOT NULL)", + ct); + await _db.ExecuteAsync( + $"CREATE TABLE {purchaseOrderLines} (id INTEGER PRIMARY KEY, purchase_order_id INTEGER NOT NULL, product_id INTEGER NOT NULL, ordered_qty INTEGER NOT NULL, received_qty INTEGER NOT NULL, unit_cost REAL NOT NULL)", + ct); + await _db.ExecuteAsync( + $"CREATE INDEX idx_{prefix}_purchase_order_lines_order_product ON {purchaseOrderLines}(purchase_order_id, product_id)", + ct); + + for (int i = 1; i <= 32; i++) + { + await _db.ExecuteAsync($"INSERT INTO {suppliers} VALUES ({i}, 'supplier_{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {warehouses} VALUES ({i}, 'WH-{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {products} VALUES ({i}, 'sku_{i}', 'product_{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO 
{purchaseOrders} VALUES ({i}, 'PO-{i}', {i}, {i})", ct); + await _db.ExecuteAsync($"INSERT INTO {purchaseOrderLines} VALUES ({i}, {i}, {i}, {10 + i}, {i % 3}, {1.5 + i})", ct); + } + + await _db.ExecuteAsync($"ANALYZE {suppliers}", ct); + await _db.ExecuteAsync($"ANALYZE {warehouses}", ct); + await _db.ExecuteAsync($"ANALYZE {products}", ct); + await _db.ExecuteAsync($"ANALYZE {purchaseOrders}", ct); + await _db.ExecuteAsync($"ANALYZE {purchaseOrderLines}", ct); + } + + private async Task CreateShipmentManifestJoinShapeAsync(string prefix, CancellationToken ct) + { + string shipments = $"{prefix}_shipments"; + string shipmentLines = $"{prefix}_shipment_lines"; + string carriers = $"{prefix}_carriers"; + string orders = $"{prefix}_orders"; + string customers = $"{prefix}_customers"; + string warehouses = $"{prefix}_warehouses"; + string products = $"{prefix}_products"; + + await _db.ExecuteAsync($"CREATE TABLE {carriers} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)", ct); + await _db.ExecuteAsync($"CREATE TABLE {customers} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)", ct); + await _db.ExecuteAsync($"CREATE TABLE {warehouses} (id INTEGER PRIMARY KEY, warehouse_code TEXT NOT NULL)", ct); + await _db.ExecuteAsync($"CREATE TABLE {products} (id INTEGER PRIMARY KEY, sku TEXT NOT NULL, name TEXT NOT NULL)", ct); + await _db.ExecuteAsync( + $"CREATE TABLE {orders} (id INTEGER PRIMARY KEY, order_number TEXT NOT NULL, customer_id INTEGER NOT NULL)", + ct); + await _db.ExecuteAsync( + $"CREATE TABLE {shipments} (id INTEGER PRIMARY KEY, shipment_number TEXT NOT NULL, status TEXT NOT NULL, shipped_date TEXT NOT NULL, tracking_number TEXT NOT NULL, carrier_id INTEGER NOT NULL, order_id INTEGER NOT NULL, warehouse_id INTEGER NOT NULL)", + ct); + await _db.ExecuteAsync( + $"CREATE TABLE {shipmentLines} (id INTEGER PRIMARY KEY, shipment_id INTEGER NOT NULL, product_id INTEGER NOT NULL, quantity_shipped INTEGER NOT NULL, line_total REAL NOT NULL)", + ct); + + for (int i = 1; i <= 
32; i++) + { + await _db.ExecuteAsync($"INSERT INTO {carriers} VALUES ({i}, 'carrier_{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {customers} VALUES ({i}, 'customer_{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {warehouses} VALUES ({i}, 'WH-{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {products} VALUES ({i}, 'sku_{i}', 'product_{i}')", ct); + await _db.ExecuteAsync($"INSERT INTO {orders} VALUES ({i}, 'ORD-{i}', {i})", ct); + await _db.ExecuteAsync($"INSERT INTO {shipments} VALUES ({i}, 'SHIP-{i}', 'shipped', '2026-01-{(i % 28) + 1}', 'TRK-{i}', {i}, {i}, {i})", ct); + await _db.ExecuteAsync($"INSERT INTO {shipmentLines} VALUES ({i}, {i}, {i}, {i + 1}, {10.5 + i})", ct); + } + + await _db.ExecuteAsync($"ANALYZE {carriers}", ct); + await _db.ExecuteAsync($"ANALYZE {customers}", ct); + await _db.ExecuteAsync($"ANALYZE {warehouses}", ct); + await _db.ExecuteAsync($"ANALYZE {products}", ct); + await _db.ExecuteAsync($"ANALYZE {orders}", ct); + await _db.ExecuteAsync($"ANALYZE {shipments}", ct); + await _db.ExecuteAsync($"ANALYZE {shipmentLines}", ct); + + await _db.ExecuteAsync( + $""" + CREATE VIEW {prefix}_shipment_manifest_report_source AS + SELECT sh.id AS shipment_id, sh.shipment_number, sh.status AS shipment_status, + sh.shipped_date, sh.tracking_number, c2.name AS carrier_name, + o.order_number, cu.name AS customer_name, w.warehouse_code, + p.sku, p.name AS product_name, sl.quantity_shipped, sl.line_total + FROM {shipments} sh + INNER JOIN {carriers} c2 ON c2.id = sh.carrier_id + INNER JOIN {orders} o ON o.id = sh.order_id + INNER JOIN {customers} cu ON cu.id = o.customer_id + INNER JOIN {warehouses} w ON w.id = sh.warehouse_id + INNER JOIN {shipmentLines} sl ON sl.shipment_id = sh.id + INNER JOIN {products} p ON p.id = sl.product_id + """, + ct); + } + private static string BuildInnerJoinChainQuery(int joinCount, string whereClause) { var sql = new StringBuilder($"SELECT t1.id, t{joinCount}.lookup_key FROM t1"); diff --git 
a/tests/CSharpDB.Tests/ParserTests.cs b/tests/CSharpDB.Tests/ParserTests.cs index f08a2352..22120d9b 100644 --- a/tests/CSharpDB.Tests/ParserTests.cs +++ b/tests/CSharpDB.Tests/ParserTests.cs @@ -360,6 +360,41 @@ public void Parse_AnalyzeSingleTable() Assert.Equal("sys.table_stats", analyze.TableName); } + [Fact] + public void Parse_ExplainEstimateSelect() + { + var stmt = Parser.Parse("EXPLAIN ESTIMATE FOR SELECT * FROM users WHERE id = 1;"); + var explain = Assert.IsType(stmt); + var select = Assert.IsType(explain.Target); + Assert.IsType(select.Where); + } + + [Fact] + public void Parse_ExplainEstimateWithQuery() + { + var stmt = Parser.Parse("EXPLAIN ESTIMATE FOR WITH c AS (SELECT 1) SELECT * FROM c;"); + var explain = Assert.IsType(stmt); + var with = Assert.IsType(explain.Target); + Assert.Single(with.Ctes); + } + + [Fact] + public void Parse_ExplainEstimateCompoundSelect() + { + var stmt = Parser.Parse("EXPLAIN ESTIMATE FOR SELECT 1 UNION SELECT 2;"); + var explain = Assert.IsType(stmt); + Assert.IsType(explain.Target); + } + + [Fact] + public void Parse_ExplainEstimateRejectsMutationTarget() + { + var ex = Assert.Throws(() => + Parser.Parse("EXPLAIN ESTIMATE FOR INSERT INTO users VALUES (1);")); + + Assert.Contains("supports SELECT", ex.Message); + } + [Fact] public void Parse_ComplexExpression() { diff --git a/tests/CSharpDB.Tests/PlannerStatisticsTests.cs b/tests/CSharpDB.Tests/PlannerStatisticsTests.cs index 0a6eb617..1c285f09 100644 --- a/tests/CSharpDB.Tests/PlannerStatisticsTests.cs +++ b/tests/CSharpDB.Tests/PlannerStatisticsTests.cs @@ -216,6 +216,52 @@ public async Task FilteredRowEstimate_UsesCompositePrefixStatisticsForCorrelated Assert.Equal(500, estimatedRows); } + [Fact] + public async Task ExplainEstimate_ReportsHistogramHeavyHitterAndCompositePrefixSources() + { + var ct = TestContext.Current.CancellationToken; + await SetupSkewedEqualityTableAsync(ct); + await SetupHistogramRangeTableAsync(ct); + await 
SetupCompositePrefixCorrelationTableAsync(ct); + await _db.ExecuteAsync("ANALYZE planner_skew", ct); + await _db.ExecuteAsync("ANALYZE planner_hist", ct); + await _db.ExecuteAsync("ANALYZE planner_corr", ct); + + await using var heavy = await _db.ExecuteAsync( + "EXPLAIN ESTIMATE FOR SELECT * FROM planner_skew WHERE hot_code = 1", + ct); + var heavyRows = await heavy.ToListAsync(ct); + Assert.Contains(heavyRows, row => row[4].AsText == "heavy-hitter" && row[7].AsText == "sys.planner_heavy_hitters"); + + await using var histogram = await _db.ExecuteAsync( + "EXPLAIN ESTIMATE FOR SELECT * FROM planner_hist WHERE value BETWEEN 1 AND 10", + ct); + var histogramRows = await histogram.ToListAsync(ct); + Assert.Contains(histogramRows, row => row[4].AsText == "histogram-range" && row[7].AsText == "sys.planner_histograms"); + + await using var composite = await _db.ExecuteAsync( + "EXPLAIN ESTIMATE FOR SELECT * FROM planner_corr WHERE region = 'East' AND city = 'EastCity'", + ct); + var compositeRows = await composite.ToListAsync(ct); + Assert.Contains(compositeRows, row => row[4].AsText == "composite-prefix-filter" && row[7].AsText == "sys.planner_index_prefix_stats"); + } + + [Fact] + public async Task ExplainEstimate_ReportsStaleStatsAsIgnored() + { + var ct = TestContext.Current.CancellationToken; + await SetupSkewedEqualityTableAsync(ct); + await _db.ExecuteAsync("ANALYZE planner_skew", ct); + await _db.ExecuteAsync("INSERT INTO planner_skew VALUES (1001, 1)", ct); + + await using var explain = await _db.ExecuteAsync( + "EXPLAIN ESTIMATE FOR SELECT * FROM planner_skew WHERE hot_code = 1", + ct); + var rows = await explain.ToListAsync(ct); + + Assert.Contains(rows, row => row[4].AsText == "ignored-stale-stats" && row[8].AsText == "stale-ignored"); + } + [Fact] public async Task FreshColumnStats_NonUniqueJoin_PrefersIndexLookupWhenExpectedMatchesAreLow() { diff --git a/tests/CSharpDB.Tests/SqlScriptSplitterTests.cs b/tests/CSharpDB.Tests/SqlScriptSplitterTests.cs index 
42864560..9fb8d790 100644 --- a/tests/CSharpDB.Tests/SqlScriptSplitterTests.cs +++ b/tests/CSharpDB.Tests/SqlScriptSplitterTests.cs @@ -96,6 +96,7 @@ public void SplitExecutableStatements_IgnoresTrailingCommentOnlyRemainder() [Theory] [InlineData("SELECT * FROM t;", true)] + [InlineData("EXPLAIN ESTIMATE FOR SELECT * FROM t;", true)] [InlineData("INSERT INTO t VALUES (1);", false)] [InlineData("UPDATE t SET n = 1;", false)] [InlineData("DELETE FROM t WHERE id = 1;", false)] diff --git a/tests/CSharpDB.Tests/SystemCatalogTests.cs b/tests/CSharpDB.Tests/SystemCatalogTests.cs index 3faf52be..a914ab3e 100644 --- a/tests/CSharpDB.Tests/SystemCatalogTests.cs +++ b/tests/CSharpDB.Tests/SystemCatalogTests.cs @@ -526,6 +526,115 @@ ORDER BY ordinal_position Assert.Equal(0L, rows[2][6].AsInteger); } + [Fact] + public async Task SystemCatalog_PlannerStats_ExposeHistogramsHeavyHittersAndPrefixes() + { + var ct = TestContext.Current.CancellationToken; + + await _db.ExecuteAsync("CREATE TABLE planner_public_stats (id INTEGER PRIMARY KEY, region TEXT, city TEXT, score INTEGER)", ct); + await _db.ExecuteAsync("CREATE INDEX idx_public_region_city ON planner_public_stats(region, city)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (1, 'East', 'Seattle', 10)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (2, 'East', 'Seattle', 11)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (3, 'East', 'Bellevue', 12)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (4, 'West', 'Portland', 100)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (5, 'West', 'Portland', 101)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (6, 'West', 'Salem', 102)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (7, 'West', 'Salem', 103)", ct); + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (8, 'West', 'Salem', 104)", ct); + 
await _db.ExecuteAsync("ANALYZE planner_public_stats", ct); + + await using (var histogramCount = await _db.ExecuteAsync( + """ + SELECT COUNT(*) + FROM sys.planner_histograms + WHERE table_name = 'planner_public_stats' AND column_name = 'score' AND is_stale = 0 + """, + ct)) + { + Assert.True(Assert.Single(await histogramCount.ToListAsync(ct))[0].AsInteger > 0); + } + + await using (var heavyHitters = await _db.ExecuteAsync( + """ + SELECT value, row_count, frequency_ppm, is_stale + FROM sys.planner_heavy_hitters + WHERE table_name = 'planner_public_stats' AND column_name = 'region' + ORDER BY row_count DESC + LIMIT 1 + """, + ct)) + { + var row = Assert.Single(await heavyHitters.ToListAsync(ct)); + Assert.Equal("West", row[0].AsText); + Assert.Equal(5L, row[1].AsInteger); + Assert.Equal(625000L, row[2].AsInteger); + Assert.Equal(0L, row[3].AsInteger); + } + + await using (var prefixStats = await _db.ExecuteAsync( + """ + SELECT prefix_length, prefix_columns, distinct_count, table_row_count, is_stale + FROM sys_planner_index_prefix_stats + WHERE index_name = 'idx_public_region_city' + ORDER BY prefix_length + """, + ct)) + { + var rows = await prefixStats.ToListAsync(ct); + Assert.Equal(2, rows.Count); + Assert.Equal(1L, rows[0][0].AsInteger); + Assert.Equal("region", rows[0][1].AsText); + Assert.Equal(2L, rows[0][2].AsInteger); + Assert.Equal(8L, rows[0][3].AsInteger); + Assert.Equal(0L, rows[0][4].AsInteger); + Assert.Equal(2L, rows[1][0].AsInteger); + Assert.Equal("region,city", rows[1][1].AsText); + Assert.Equal(4L, rows[1][2].AsInteger); + } + + await _db.DisposeAsync(); + _db = await Database.OpenAsync(_dbPath, ct); + + await using (var persistedCount = await _db.ExecuteAsync( + "SELECT COUNT(*) FROM sys.planner_histograms WHERE table_name = 'planner_public_stats'", + ct)) + { + Assert.True(Assert.Single(await persistedCount.ToListAsync(ct))[0].AsInteger > 0); + } + + await _db.ExecuteAsync("INSERT INTO planner_public_stats VALUES (9, 'West', 'Salem', 105)", 
ct); + + await using (var staleRows = await _db.ExecuteAsync( + """ + SELECT + (SELECT is_stale FROM sys.planner_histograms WHERE table_name = 'planner_public_stats' LIMIT 1), + (SELECT is_stale FROM sys.planner_heavy_hitters WHERE table_name = 'planner_public_stats' LIMIT 1), + (SELECT is_stale FROM sys.planner_index_prefix_stats WHERE table_name = 'planner_public_stats' LIMIT 1) + """, + ct)) + { + var row = Assert.Single(await staleRows.ToListAsync(ct)); + Assert.Equal(1L, row[0].AsInteger); + Assert.Equal(1L, row[1].AsInteger); + Assert.Equal(1L, row[2].AsInteger); + } + + await _db.ExecuteAsync("DROP TABLE planner_public_stats", ct); + + await using var afterDrop = await _db.ExecuteAsync( + """ + SELECT + (SELECT COUNT(*) FROM sys.planner_histograms WHERE table_name = 'planner_public_stats'), + (SELECT COUNT(*) FROM sys.planner_heavy_hitters WHERE table_name = 'planner_public_stats'), + (SELECT COUNT(*) FROM sys.planner_index_prefix_stats WHERE table_name = 'planner_public_stats') + """, + ct); + var afterDropRow = Assert.Single(await afterDrop.ToListAsync(ct)); + Assert.Equal(0L, afterDropRow[0].AsInteger); + Assert.Equal(0L, afterDropRow[1].AsInteger); + Assert.Equal(0L, afterDropRow[2].AsInteger); + } + [Fact] public async Task SystemCatalog_ColumnStats_BecomeStaleOnWriteAndRollbackRestoresFreshState() { diff --git a/www/roadmap-reference.html b/www/roadmap-reference.html index c1dd0002..a1b5c01a 100644 --- a/www/roadmap-reference.html +++ b/www/roadmap-reference.html @@ -337,8 +337,18 @@

Long-Term

Source-generated collection fast path
-In progress: GetGeneratedCollectionAsync<T>(...), generated field descriptors/index bindings, analyzer-packaged collection model/codecs, trim/NativeAOT smoke coverage, and a dedicated sample are now in place while broader package ergonomics and remaining generator coverage continue
-In Progress
+Done for the current phase: opt-in generated collection models now provide GetGeneratedCollectionAsync<T>(...), generated field descriptors/index bindings, analyzer-packaged collection model/codecs, generated binary direct-payload encode/decode for supported document graphs, source-generated JSON fallback for unsupported shapes, trim/NativeAOT smoke coverage, and a dedicated sample
+Done
+Generated collection package ergonomics
+Streamline NuGet/analyzer packaging, templates, onboarding docs, and generated-collection setup so consumers can adopt the opt-in path with less project wiring
+Planned
+Broader generated collection model coverage
+Expand generator support beyond the current scalar, scalar collection, nested scalar, and nested collection-scalar shapes; unsupported shapes currently warn and fall back to source-generated JSON instead of binary direct payloads
+Planned
Page-level compression
@@ -352,13 +362,23 @@

Long-Term

Advanced cost-based query optimizer
-In progress: phase-2 stats-guided costing is now in place through internal equi-depth histograms, heavy hitters, composite-index prefix distinct-count summaries, skew-aware lookup/filter estimates, correlation-aware composite equality filters/joins, and bounded DP reordering for small inner-join chains; adaptive re-optimization and public histogram inspection remain future work
-In Progress
+Done for the current phase: ANALYZE-driven stats-guided costing now uses internal equi-depth histograms, heavy hitters, composite-index prefix distinct-count summaries, skew-aware lookup/filter estimates, correlation-aware composite equality filters/joins, non-unique lookup costing, hash build-side choice, and bounded DP reordering for small inner-join chains
+Done
+Adaptive query re-optimization
+Re-plan or adapt at runtime when observed cardinality diverges materially from persisted statistics, especially after stale stats or parameter-sensitive predicates
+Planned
+Public planner histogram inspection
+Expose histogram, heavy-hitter, composite-prefix, and estimate explanation diagnostics through a public surface; current histogram/prefix stats remain internal implementation details
+Planned
Async I/O batching
-In progress: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, and reusable B-tree copy utilities now cover the main storage and maintenance write paths; remaining auditing is outside the WAL hot path
-In Progress
+Done for the current phase: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, reusable B-tree copy utilities, and the close-out audit now cover the main storage and maintenance write paths; future work is limited to specialized diagnostics or maintenance-path tuning
+Done
Low-latency durable writes
@@ -438,7 +458,7 @@

Current Limitations

Collections
-FindByIndexAsync supports declared field-equality lookups; FindByPathAsync and FindByPathRangeAsync support path-based queries on indexed paths; FindAsync remains a full scan for unindexed predicates
+FindByIndexAsync supports declared field-equality lookups; FindByPathAsync and FindByPathRangeAsync support path-based queries on indexed paths; FindAsync remains a full scan for unindexed predicates. Generated collections require registered descriptors for existing collection indexes; unsupported generated model shapes warn and use the source-generated JSON fallback instead of binary direct payloads
Networking
@@ -530,7 +550,7 @@

Completed Milestones

  • Binary direct-payload collection storage with direct hydration and field/path extraction
  • Collection path indexes: nested scalar, array-element, nested array-object, Guid, temporal, ordered text
  • Collection path query APIs: FindByPathAsync and FindByPathRangeAsync
-  • Source-generated typed collection fast path foundations: generated collection models/codecs/field descriptors, trim-safe GetGeneratedCollectionAsync<T>(...), generator diagnostics, NativeAOT trim-smoke validation, and a dedicated sample
+  • Source-generated typed collection fast path: generated collection models/codecs/field descriptors, generated binary direct payloads for supported shapes, trim-safe GetGeneratedCollectionAsync<T>(...), generator diagnostics, NativeAOT trim-smoke validation, and a dedicated sample
  • Full-text search with tokenization, stemming, and relevance ranking
  • Hybrid storage mode with lazy-resident durable storage and gRPC tunable file-cache
  • Client-wide BackupAsync / RestoreAsync across direct, HTTP, gRPC, CLI, and Admin
diff --git a/www/roadmap.html b/www/roadmap.html
index a2863c36..5ca9612d 100644
--- a/www/roadmap.html
+++ b/www/roadmap.html
@@ -150,7 +150,7 @@

    Source-Generated Collections

    Done
-    No-reflection, trim-safe typed collection API via CSharpDB.Generators with GetGeneratedCollectionAsync<T>(), GeneratedCollection<T>, generated field metadata, and NativeAOT-friendly model registration.

+    No-reflection, trim-safe typed collection API via CSharpDB.Generators with GetGeneratedCollectionAsync<T>(), GeneratedCollection<T>, generated field metadata, binary direct payloads for supported shapes, and NativeAOT-friendly model registration.

    @@ -391,7 +391,21 @@

    Full-Text Search

    Source-Generated Collections

    Done
-    No-reflection, trim-safe typed collection API via CSharpDB.Generators with generated field metadata and NativeAOT-friendly model registration.

+    Current phase is complete: opt-in generated models provide GetGeneratedCollectionAsync<T>, generated descriptors/index bindings, binary direct payloads for supported shapes, JSON fallback for unsupported shapes, and trim/NativeAOT smoke coverage.

+    Generated Collection Package Ergonomics

+    Planned

+    Streamline NuGet/analyzer packaging, templates, onboarding docs, and project setup for the opt-in generated collection path.

+    Broader Generated Model Coverage

+    Planned

+    Expand generator support beyond the current scalar, scalar collection, nested scalar, and nested collection-scalar shapes.
    @@ -417,16 +431,30 @@

    At-Rest Encryption

    Cost-Based Query Optimizer

-    In Progress

+    Done

-    Phase-2 stats-guided costing is in place: internal histograms, heavy hitters, composite-index prefix summaries, skew-aware estimates, correlation-aware filters/joins, and bounded DP join reordering. Adaptive re-optimization and public histogram inspection remain future work.

+    Current phase is complete: ANALYZE-driven stats-guided costing uses internal histograms, heavy hitters, composite-prefix summaries, skew-aware estimates, correlation-aware filters/joins, non-unique lookup costing, hash build-side choice, and bounded DP join reordering.

+    Adaptive Query Re-Optimization

+    Planned

+    Re-plan or adapt when observed cardinality diverges materially from persisted statistics, especially after stale stats or parameter-sensitive predicates.

+    Public Planner Histogram Inspection

+    Planned

+    Expose histogram, heavy-hitter, composite-prefix, and estimate diagnostics through a public surface while keeping current stats internals private.

    Async I/O Batching

-    In Progress

+    Done

-    WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, and reusable B-tree copy utilities now cover the main storage and maintenance write paths; remaining auditing is outside the WAL hot path.

+    Current phase is complete: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, reusable B-tree copy utilities, and the close-out audit cover the main storage and maintenance write paths.