Draft
Commits
27 commits
30d8ee2
refactor: rename mongo→data-api, MongoBSONTypes→BSONTypes + fix array…
tnaum-ms Feb 16, 2026
50dabfb
refactor: convert SchemaAnalyzer to class with addDocument/getSchema API
tnaum-ms Feb 16, 2026
bd70865
refactor: update JSONSchema interface with typed x- properties and fi…
tnaum-ms Feb 16, 2026
99c0234
feat: enhance FieldEntry with bsonType, bsonTypes, isOptional, arrayI…
tnaum-ms Feb 16, 2026
4891463
feat: add transformers (generateDescriptions, toTypeScriptDefinition,…
tnaum-ms Feb 16, 2026
7a54b8c
test: add SchemaAnalyzer class method tests + update plan checklist
tnaum-ms Feb 16, 2026
d4380f4
feat: add version-based caching to SchemaAnalyzer + trace logging
tnaum-ms Feb 16, 2026
633f0b4
Initial plan
Copilot Feb 17, 2026
74eeeac
refactor: remove debug console.log statements from tests
Copilot Feb 17, 2026
eb71916
refactor: remove debug console.log statements from SchemaAnalyzer tes…
tnaum-ms Feb 17, 2026
d8d0709
test: add comprehensive tests for SchemaAnalyzer versioning and cachi…
tnaum-ms Feb 17, 2026
ebdde30
refactor: remove console.log statements from test files for cleaner o…
tnaum-ms Feb 17, 2026
c23b604
refactor: enhance handling of special characters in field names for T…
tnaum-ms Feb 17, 2026
3ca6951
refactor: extract schema-analyzer into standalone npm workspace package
tnaum-ms Feb 17, 2026
2fec69d
docs: add README and bump schema-analyzer to v1.0.0
tnaum-ms Feb 17, 2026
a667c35
build: add prebuild and prejesttest scripts for workspace package builds
tnaum-ms Feb 17, 2026
cbaa573
chore: bump schema-analyzer version to 1.0.0 in package-lock.json
tnaum-ms Feb 17, 2026
f1d006d
Refactor code structure for improved readability and maintainability
tnaum-ms Feb 17, 2026
62a3e57
docs: add future work item for undeclared BSON type names in TS defin…
tnaum-ms Feb 17, 2026
4b3dd00
fix: toInterfaceName handles digit-leading and separator-only collect…
tnaum-ms Feb 17, 2026
37e64e5
docs: clarify boolean JSONSchemaRef safety in getKnownFields
tnaum-ms Feb 17, 2026
b17e9cc
fix: insertText escaping — use identifier check + escape quotes/backs…
tnaum-ms Feb 17, 2026
35a13a1
refactor: streamline TypeScript definition tests for improved readabi…
tnaum-ms Feb 17, 2026
1cb9e5b
docs: add terminology guidelines for DocumentDB and MongoDB API usage
tnaum-ms Feb 17, 2026
43915a5
refactor: replace 'console' assert with 'node:assert/strict' for impr…
tnaum-ms Feb 17, 2026
75536e9
refactor: update documentation to consistently reference DocumentDB A…
tnaum-ms Feb 17, 2026
5094ca6
refactor: SchemaAnalyzer class + enhanced FieldEntry + new schema tra…
tnaum-ms Feb 17, 2026
18 changes: 17 additions & 1 deletion .github/copilot-instructions.md
@@ -1,6 +1,6 @@
# GitHub Copilot Instructions for vscode-documentdb

VS Code Extension for Azure Cosmos DB and MongoDB. TypeScript (strict mode), React webviews, Jest testing.
VS Code Extension for Azure Cosmos DB and the MongoDB API. TypeScript (strict mode), React webviews, Jest testing.

## Critical Build Commands

@@ -178,6 +178,22 @@ For Discovery View, both `treeId` and `clusterId` are sanitized (all `/` replace

See `src/tree/models/BaseClusterModel.ts` and `docs/analysis/08-cluster-model-simplification-plan.md` for details.

## Terminology

This is a **DocumentDB** extension that uses the **MongoDB-compatible wire protocol**.

- Use **"DocumentDB"** when referring to the database service itself.
- Use **"MongoDB API"** or **"DocumentDB API"** when referring to the wire protocol, query language, or API compatibility layer.
- **Never use "MongoDB" alone** as a product name in code, comments, docs, or user-facing strings.

| ✅ Do | ❌ Don't |
| ---------------------------------------------------- | -------------------------------- |
| `// Query operators supported by the DocumentDB API` | `// MongoDB query operators` |
| `// BSON types per the MongoDB API spec` | `// Uses MongoDB's $match stage` |
| `documentdbQuery` (variable name) | `mongoQuery` |

This applies to: code comments, JSDoc/TSDoc, naming (prefer `documentdb` prefix), user-facing strings, docs, and test descriptions.

## Additional Patterns

For detailed patterns, see:
3 changes: 3 additions & 0 deletions .github/workflows/main.yml
@@ -55,6 +55,9 @@ jobs:
- name: 📦 Install Dependencies (npm ci)
run: npm ci --prefer-offline --no-audit --no-fund --progress=false --verbose

- name: 🔨 Build Workspace Packages
run: npm run build --workspaces --if-present

- name: 🌐 Check Localization Files
run: npm run l10n:check

7 changes: 7 additions & 0 deletions .gitignore
@@ -1,6 +1,9 @@
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.

/docs/analysis/
/docs/plan/

# User-specific files
*.suo
*.user
@@ -157,6 +160,9 @@ PublishScripts/
**/packages/*
# except build/, which is used as an MSBuild target.
!**/packages/build/
# Include our monorepo packages at the root
!/packages/
!/packages/**
# Uncomment if necessary however generally it will be regenerated when needed
#!**/packages/repositories.config
# NuGet v3's project.json files produces more ignoreable files
@@ -268,6 +274,7 @@ dist
stats.json
*.tgz
*.zip
*.tsbuildinfo

# Scrapbooks
*.mongo
16 changes: 11 additions & 5 deletions jest.config.js
@@ -1,11 +1,17 @@
/** @type {import('ts-jest').JestConfigWithTsJest} **/
module.exports = {
testEnvironment: 'node',
testMatch: ['<rootDir>/src/**/*.test.ts'],
transform: {
'^.+.tsx?$': ['ts-jest', {}],
},
// Limit workers to avoid OOM kills on machines with many cores.
// Each ts-jest worker loads the TypeScript compiler and consumes ~500MB+.
maxWorkers: '50%',
projects: [
{
displayName: 'extension',
testEnvironment: 'node',
testMatch: ['<rootDir>/src/**/*.test.ts'],
transform: {
'^.+\\.tsx?$': ['ts-jest', {}],
},
},
'<rootDir>/packages/schema-analyzer',
],
};
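
One change in this hunk is easy to miss: the `transform` key goes from `'^.+.tsx?$'` to `'^.+\\.tsx?$'`. Jest treats `transform` keys as regular expressions matched against file paths, so the unescaped dot in the old key matched any character instead of a literal `.`. The following is a minimal sketch of the difference; the sample paths are made up for illustration and do not come from the repository.

```typescript
// Old key: the dot before "ts" is unescaped, so it matches any character.
const unescapedPattern = new RegExp('^.+.tsx?$');
// New key: '\\.' in the string literal becomes '\.' in the regex, a literal dot.
const escapedPattern = new RegExp('^.+\\.tsx?$');

// Hypothetical paths, for illustration only.
const samplePaths = ['src/app.ts', 'src/view.tsx', 'docs/parts', 'src/app.js'];

const patternResults = samplePaths.map((path) => ({
    path,
    unescaped: unescapedPattern.test(path),
    escaped: escapedPattern.test(path),
}));

for (const row of patternResults) {
    console.log(`${row.path}: unescaped=${row.unescaped} escaped=${row.escaped}`);
}
```

Only `docs/parts` differs between the two patterns: the old key accepts it because `r` satisfies the wildcard dot, while the new key rejects it because there is no literal `.ts` suffix.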
1 change: 0 additions & 1 deletion l10n/bundle.l10n.json
@@ -721,7 +721,6 @@
"No matching resources found.": "No matching resources found.",
"No node selected.": "No node selected.",
"No parent folder selected.": "No parent folder selected.",
"No properties found in the schema at path \"{0}\"": "No properties found in the schema at path \"{0}\"",
"No public connectivity": "No public connectivity",
"No result returned from the MongoDB shell.": "No result returned from the MongoDB shell.",
"No results found": "No results found",
19 changes: 19 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

10 changes: 8 additions & 2 deletions package.json
@@ -46,6 +46,9 @@
"type": "git",
"url": "https://github.com/microsoft/vscode-documentdb"
},
"workspaces": [
"packages/*"
],
"main": "./main",
"l10n": "./l10n",
"activationEvents": [
@@ -55,8 +58,9 @@
"onUri"
],
"scripts": {
"prebuild": "npm run build --workspaces --if-present",
"build": "tsc",
"clean": "git clean -dfx",
"clean": "rimraf out dist coverage && npm run clean --workspaces --if-present",
"compile": "tsc -watch",
"package": "run-script-os",
"package:win32": "npm run webpack-prod && cd dist && npm pkg delete \"scripts.vscode:prepublish\" && npx vsce package --no-dependencies --out ../%npm_package_name%-%npm_package_version%.vsix",
@@ -67,9 +71,10 @@
"lint": "eslint --quiet .",
"lint-fix": "eslint . --fix",
"prettier": "prettier -c \"(src|test|l10n|grammar|docs)/**/*.@(js|ts|jsx|tsx|json)\" \"./*.@(js|ts|jsx|tsx|json)\"",
"prettier-fix": "prettier -w \"(src|test|l10n|grammar|docs)/**/*.@(js|ts|jsx|tsx|json)\" \"./*.@(js|ts|jsx|tsx|json)\"",
"prettier-fix": "prettier -w \"(src|test|l10n|grammar|docs|packages)/**/*.@(js|ts|jsx|tsx|json)\" \"./*.@(js|ts|jsx|tsx|json)\"",
"pretest": "npm run build",
"test": "vscode-test",
"prejesttest": "npm run build --workspaces --if-present",
"jesttest": "jest",
"update-grammar": "antlr4ts -visitor ./grammar/mongo.g4 -o src/documentdb/grammar",
"webpack-dev": "rimraf ./dist && npm run webpack-dev-ext && npm run webpack-dev-wv",
@@ -165,6 +170,7 @@
"@trpc/client": "~11.10.0",
"@trpc/server": "~11.10.0",
"@vscode/l10n": "~0.0.18",
"@vscode-documentdb/schema-analyzer": "*",
"antlr4ts": "^0.5.0-alpha.4",
"bson": "~7.0.0",
"denque": "~2.1.0",
43 changes: 43 additions & 0 deletions packages/schema-analyzer/README.md
@@ -0,0 +1,43 @@
# @vscode-documentdb/schema-analyzer

Incremental JSON Schema analyzer for DocumentDB API and MongoDB API documents. Processes documents one at a time (or in batches) and produces an extended JSON Schema with statistical metadata — field occurrence counts, BSON type distributions, min/max values, and array length stats.

> **Note:** This package is not yet published to npm. We plan to publish it once the API stabilizes. For now, it is consumed internally via npm workspaces within the [vscode-documentdb](https://github.com/microsoft/vscode-documentdb) repository.

## Overview

The `SchemaAnalyzer` incrementally builds a JSON Schema by inspecting DocumentDB API / MongoDB API documents. It is designed for scenarios where documents arrive over time (streaming, pagination) and the schema needs to evolve as new documents are observed.

Key capabilities:

- **Incremental analysis** — add documents one at a time or in batches; the schema updates in place.
- **BSON type awareness** — recognizes BSON types defined by the MongoDB API (`ObjectId`, `Decimal128`, `Binary`, `UUID`, etc.) and annotates them with `x-bsonType`.
- **Statistical extensions** — tracks field occurrence (`x-occurrence`), type frequency (`x-typeOccurrence`), min/max values, string lengths, array sizes, and document counts (`x-documentsInspected`).
- **Known fields extraction** — derives a flat list of known field paths with their types and occurrence probabilities, useful for autocomplete and UI rendering.
- **Version tracking & caching** — a monotonic version counter enables efficient cache invalidation for derived data like `getKnownFields()`.
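
The version-based cache invalidation described in the last bullet can be illustrated with a small sketch. This is not the package's implementation: `TinyAnalyzer`, its fields, and the `recomputations` counter are hypothetical names chosen for illustration, and the real `SchemaAnalyzer` may track versions differently.

```typescript
// Sketch only: `TinyAnalyzer` and its members are hypothetical, not the package's code.
class TinyAnalyzer {
    private version = 0; // bumped on every mutation, never decremented
    private readonly fieldCounts = new Map<string, number>();
    private cachedFields: string[] | undefined;
    private cachedAtVersion = -1;
    public recomputations = 0; // exposed only to make cache hits observable

    addDocument(doc: Record<string, unknown>): void {
        for (const key of Object.keys(doc)) {
            this.fieldCounts.set(key, (this.fieldCounts.get(key) ?? 0) + 1);
        }
        this.version++;
    }

    getKnownFields(): string[] {
        // Recompute only when the data changed since the cache was filled.
        if (this.cachedFields === undefined || this.cachedAtVersion !== this.version) {
            this.cachedFields = Array.from(this.fieldCounts.keys()).sort();
            this.cachedAtVersion = this.version;
            this.recomputations++;
        }
        return this.cachedFields;
    }
}

const tiny = new TinyAnalyzer();
tiny.addDocument({ name: 'a', age: 1 });
const firstFields = tiny.getKnownFields();
const secondFields = tiny.getKnownFields(); // same version: served from the cache
tiny.addDocument({ email: 'x' });
const thirdFields = tiny.getKnownFields(); // version changed: recomputed
```

Comparing a stored version number against the current one keeps the cache check cheap, regardless of how large the underlying schema grows.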

## Usage

```typescript
import { SchemaAnalyzer } from '@vscode-documentdb/schema-analyzer';

// Create an analyzer and feed it documents
const analyzer = new SchemaAnalyzer();
analyzer.addDocument(doc1);
analyzer.addDocuments([doc2, doc3, doc4]);

// Get the JSON Schema with statistical extensions
const schema = analyzer.getSchema();

// Get a flat list of known fields (cached, version-aware)
const fields = analyzer.getKnownFields();
```
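
As a rough illustration of how occurrence probabilities could be derived from the statistical extensions, the sketch below divides a per-field `x-occurrence` count by `x-documentsInspected`. The `SketchSchema` shape and the `occurrenceProbabilities` helper are assumptions made for this example; the schema actually returned by `getSchema()` may be structured differently.

```typescript
// Assumed shape for illustration; the real extended schema may differ.
interface SketchSchema {
    'x-documentsInspected': number;
    properties: Record<string, { 'x-occurrence': number }>;
}

// Hypothetical helper: fraction of inspected documents that contain each field.
function occurrenceProbabilities(schema: SketchSchema): Record<string, number> {
    const total = schema['x-documentsInspected'];
    const result: Record<string, number> = {};
    for (const name of Object.keys(schema.properties)) {
        const occurrence = schema.properties[name]['x-occurrence'];
        result[name] = total === 0 ? 0 : occurrence / total;
    }
    return result;
}

const sketchSchema: SketchSchema = {
    'x-documentsInspected': 4,
    properties: {
        _id: { 'x-occurrence': 4 },
        email: { 'x-occurrence': 1 },
    },
};

const probabilities = occurrenceProbabilities(sketchSchema);
console.log(probabilities);
```

A field with probability below 1 was missing from at least one inspected document, which is one way a consumer could decide to treat it as optional.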

## Requirements

- **Node.js** ≥ 18
- **mongodb** driver ≥ 6.0.0 (peer dependency)

## License

[MIT](../../LICENSE.md)
8 changes: 8 additions & 0 deletions packages/schema-analyzer/jest.config.js
@@ -0,0 +1,8 @@
/** @type {import('ts-jest').JestConfigWithTsJest} **/
module.exports = {
testEnvironment: 'node',
testMatch: ['<rootDir>/test/**/*.test.ts'],
transform: {
'^.+\\.tsx?$': ['ts-jest', {}],
},
};
27 changes: 27 additions & 0 deletions packages/schema-analyzer/package.json
@@ -0,0 +1,27 @@
{
"name": "@vscode-documentdb/schema-analyzer",
"version": "1.0.0",
"description": "Incremental JSON Schema analyzer for DocumentDB API / MongoDB API documents with statistical extensions",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"files": [
"dist"
],
"scripts": {
"build": "tsc -p .",
"clean": "rimraf dist tsconfig.tsbuildinfo",
"test": "jest --config jest.config.js"
},
"repository": {
"type": "git",
"url": "https://github.com/microsoft/vscode-documentdb",
"directory": "packages/schema-analyzer"
},
"license": "MIT",
"peerDependencies": {
"mongodb": ">=6.0.0"
},
"dependencies": {
"denque": "~2.1.0"
}
}