ScriptDom is a library for parsing and generating T-SQL scripts. It is primarily used by DacFx to build database projects, perform schema comparisons, and generate scripts for deployment.
- Grammar files live in:
SqlScriptDom/Parser/TSql/— each file corresponds to a SQL Server version (e.g.TSql170.gfor 170 / SQL Server 2025). - Grammar format: ANTLR v2. Generated C# lexer/parser code is produced during the build (see
GenerateFiles.props). - Build & tests: use the .NET SDK pinned in
global.json. Typical commands from repo root:dotnet build -c Debugdotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug
- To regenerate parser/token/AST sources explicitly, build the main project (generation targets are hooked into its build):
dotnet build SqlScriptDom/Microsoft.SqlServer.TransactSql.ScriptDom.csproj -c Debug- (or)
dotnet msbuild SqlScriptDom/Microsoft.SqlServer.TransactSql.ScriptDom.csproj -t:GLexerParserCompile;GSqlTokenTypesCompile;CreateAST -p:Configuration=Debug
SqlScriptDom/GenerateFiles.propscontains the MSBuild targets invoked during the library build:GSqlTokenTypesCompile/GLexerParserCompile-> run ANTLR and post-process outputs (powershell/sed scripts)CreateAST-> runs AstGen tool (fromtools/AstGen) to generate AST visitor/fragment classesGenerateEverything-> runs ScriptGenSettingsGenerator and TokenListGenerator
- The Antlr binary is downloaded to the path defined in
Directory.Build.props(AntlrLocation) when the build runs (via theInstallAntlrtarget). - Generated C# files are written to
$(CsGenIntermediateOutputPath)(underobj/...by default). Do not hand-edit generated files — change the .g grammar or post-processing scripts instead.
SqlScriptDom/Parser/TSql/*.g— ANTLR v2 grammar files (TSql80..TSql170 etc.). Example:TSql170.gdefines new-170 syntax.SqlScriptDom/GenerateFiles.propsandDirectory.Build.props— define code generation targets and antlr location.SqlScriptDom/ParserPostProcessing.sed,LexerPostProcessing.sed,TSqlTokenTypes.ps1— post-processing for generated C# sources and tokens.tools/— contains code generators used during build:AstGen,ScriptGenSettingsGenerator,TokenListGenerator.Test/SqlDom/— unit tests, baselines and test scripts. SeeOnly170SyntaxTests.cs,TestScripts/, andBaselines170/..github/instructions/testing.guidelines.instructions.md— comprehensive testing framework guide with patterns and best practices..github/instructions/function.guidelines.instructions.md— specialized guide for adding new T-SQL system functions.
- Add/modify grammar rule(s) in the correct
TSql*.g(pick the version the syntax belongs to). - If tokens or token ordering change, update
TSqlTokenTypes.g(and the sed/ps1 post-processors if necessary). - Rebuild the ScriptDom project to regenerate parser and AST (
dotnet buildwill run generation). Use the targeted msbuild targets if you only want generation. - Add tests:
- YOU MUST ADD UNIT TESTS - Use the existing test framework in
Test/SqlDom/ - DO NOT CREATE STANDALONE PROGRAMS TO TEST - Avoid separate console applications or debug programs
- Put the input SQL in
Test/SqlDom/TestScripts/(filename is case sensitive and used as an embedded resource). - Add/confirm baseline output in
Test/SqlDom/Baselines<version>/(the UT project embeds these baselines as resources). - Update the appropriate
Only<version>SyntaxTests.cs(e.g.,Only170SyntaxTests.cs) by adding aParserTest170("MyNewTest.sql", ...)entry. SeeParserTest.csandParserTestOutput.csfor helper constructors and verification semantics. - For comprehensive testing guidance, see Testing Guidelines with detailed patterns, best practices, and simplified constructor approaches.
- YOU MUST ADD UNIT TESTS - Use the existing test framework in
- Run full test suite to ensure no regressions:
dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug⚠️ CRITICAL: Grammar changes can break unrelated functionality when shared rules are modified- If tests fail unexpectedly, create context-specific grammar rules instead of modifying shared ones
- Iterate until all tests pass, including both new functionality and existing regression tests.
- Tests run a full parse -> script generator -> reparse round-trip. Baseline comparison verifies pretty-printed generated scripts exactly match the stored baseline.
- Expected parse errors (where applicable) are verified by number and exact error messages; test helpers live in
ParserTest.cs,ParserTestOutput.cs, andParserTestUtils.cs. - If a test fails due to mismatch in generated script, compare the generated output (the test harness logs it) against the baseline to spot formatting/structure differences.
Different types of bugs require different fix approaches. Start by diagnosing which type of issue you're dealing with:
If you see an error like "Option 'X' is not valid..." or "Feature 'Y' not supported..." but the syntax SHOULD work according to SQL Server docs:
- Guide: Validation Fix Guide - Version-gated validation fixes
- Example: ALTER TABLE RESUMABLE option (SQL Server 2022+)
- Key Signal: Similar syntax works in other contexts (e.g., ALTER INDEX works but ALTER TABLE doesn't)
If the parser doesn't recognize the syntax at all, or you need to add new T-SQL features:
- Guide: Bug Fixing Guide - Grammar modifications, AST updates, script generation
- Example: Adding new operators, statements, or function types
- Key Signal: Syntax error like "Incorrect syntax near..." or "Unexpected token..."
If identifier-based predicates (like REGEXP_LIKE) work without parentheses but fail with them:
- Guide: Parser Predicate Recognition Fix Guide
- Example:
WHERE REGEXP_LIKE('a', 'pattern')works, butWHERE (REGEXP_LIKE('a', 'pattern'))fails - Key Signal: Syntax error near closing parenthesis or semicolon
Quick Diagnostic: Search for the error message in the codebase to determine which type of fix is needed.
For adding new T-SQL system functions to the parser, including handling RETURN statement contexts and ANTLR v2 syntactic predicate limitations:
- Guide: Function Guidelines - Complete guide for system function implementation
- Example: JSON_OBJECT, JSON_ARRAY functions with RETURN statement support
- Key Requirements: Syntactic predicates for lookahead, proper AST design, comprehensive testing
For adding completely new SQL Server data types that require custom parsing logic and specialized AST nodes:
- Guide: New Data Types Guidelines - Complete guide for implementing new data types
- Example: VECTOR data type with dimension and optional base type parameters
- Key Signal: New SQL Server data type with custom parameter syntax different from standard data types
For adding completely new SQL Server index types that require specialized syntax and custom parsing logic:
- Guide: New Index Types Guidelines - Complete guide for implementing new index types
- Example: JSON INDEX with FOR clause, VECTOR INDEX with METRIC/TYPE options
- Key Signal: New SQL Server index type with custom syntax different from standard CREATE INDEX
- Never edit generated files permanently (they live under
obj/.../CsGenIntermediateOutputPath). Instead change:.ggrammar files- post-processing scripts (
*.ps1/*.sed) - AST XML in
SqlScriptDom/Parser/TSql/Ast.xmlif AST node shapes need to change (used bytools/AstGen).
- To see antlr output/errors, force verbose generation by setting MSBuild property
OutputErrorInLexerParserCompile=trueon the command line (e.g.dotnet msbuild -t:GLexerParserCompile -p:OutputErrorInLexerParserCompile=true). - If the antlr download fails during build, manually download
antlr-2.7.5.jar(for non-Windows) or.exe(for Windows) and place it at the location defined inDirectory.Build.propsor overrideAntlrLocationwhen invoking msbuild.
Start by searching for the error message to understand what type of fix is needed:
# Search for error code or message
grep -r "SQL46057" SqlScriptDom/
grep -r "is not a valid" SqlScriptDom/Common Error Patterns:
"Option 'X' is not valid..."→ Validation issue (see grammar_validation.guidelines.instructions.md)"Incorrect syntax near..."→ Grammar issue (see bug_fixing.guidelines.instructions.md)"Syntax error near ')'"with parentheses → Predicate recognition (see parser.guidelines.instructions.md)
If the syntax works in one context but not another:
# Search for working examples
grep -r "RESUMABLE" Test/SqlDom/TestScripts/
grep -r "OptionName" SqlScriptDom/Parser/TSql/Example: ALTER INDEX with RESUMABLE works, but ALTER TABLE doesn't → Likely validation issue
Common files to check:
- Validation:
SqlScriptDom/Parser/TSql/TSql80ParserBaseInternal.cs(most validation logic) - Grammar:
SqlScriptDom/Parser/TSql/TSql*.g(version-specific grammar files) - Options:
SqlScriptDom/ScriptDom/SqlServer/IndexOptionHelper.cs(option registration) - AST:
SqlScriptDom/Parser/TSql/Ast.xml(AST node definitions)
Always verify Microsoft documentation:
- Search for "Applies to: SQL Server 20XX (XX.x)" in Microsoft docs
- Note that different features within the same option set can have different version requirements
- Example: MAX_DURATION (SQL 2014+) vs RESUMABLE (SQL 2022+)
Before and after making changes:
# Build the parser
dotnet build SqlScriptDom/Microsoft.SqlServer.TransactSql.ScriptDom.csproj -c Debug
# Run specific test
dotnet test --filter "FullyQualifiedName~YourTest" -c Debug
# ALWAYS run full suite before committing
dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug# Find where option is registered
grep -r "YourOptionName" SqlScriptDom/ScriptDom/SqlServer/IndexOptionHelper.cs
# Check enum definition
grep -r "enum IndexOptionKind" SqlScriptDom/# Find version checks
grep -r "TSql160AndAbove" SqlScriptDom/Parser/TSql/
# Check which parser version you're testing
# TSql80 = SQL Server 2000, TSql90 = 2005, ..., TSql160 = 2022, TSql170 = 2025# Find validation by statement type
grep -r "IndexAffectingStatement" SqlScriptDom/Parser/TSql/TSql80ParserBaseInternal.cs
# Common statement types: CreateIndex, AlterIndex, AlterTableAddElement- Grammar rule pattern:
ruleName returns [Type vResult = this.FragmentFactory.CreateFragment<Type>()] { ... } : ( alternatives ) ;— this pattern initializes an AST fragment via FragmentFactory. - Parser-generated code frequently uses
Match(<token>, CodeGenerationSupporter.<Symbol>)andThrowParseErrorException("SQLxxxx", ...)for diagnostics. - The codebase prefers using the factory and fragment visitors for AST creation and script generation. Look at
ScriptDom/SqlServer/ScriptGeneratorfor script generation patterns.
- Operator vs. Function-Style Predicates: Be careful to distinguish between standard T-SQL operators (like
NOT LIKE,>,=) and the function-style predicates used in some contexts (likepackage.equals(...)inCREATE EVENT SESSION). For example,NOT LIKEin an event session'sWHEREclause is a standard comparison operator, not a function call. Always verify the exact T-SQL syntax before modifying the grammar. - Logical
NOTvs. Compound Operators: The grammar handles the logicalNOToperator (e.g.,WHERE NOT (condition)) in a general way, often in abooleanExpressionUnaryrule. This is distinct from compound operators likeNOT LIKEorNOT IN, which are typically parsed as a single unit within a comparison rule. Don't assume that becauseNOTis supported,NOT LIKEwill be automatically supported in all predicate contexts. - Modifying Shared Grammar Rules: NEVER modify existing shared grammar rules like
identifierColumnReferenceExpressionthat are used throughout the codebase. This can cause tests to fail in unrelated areas because the rule now accepts or rejects different syntax. Instead, create specialized rules for your specific context (e.g.,vectorSearchColumnReferenceExpressionfor VECTOR_SEARCH-specific needs). - Full Test Suite Validation: After any grammar changes, always run the complete test suite (
dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug) to catch regressions. Grammar changes can have far-reaching effects on seemingly unrelated functionality. - Extending Literals to Expressions: When functions/constructs currently accept only literal values (e.g.,
IntegerLiteral,StringLiteral) but need to support dynamic values (parameters, variables, outer references), change both the AST definition (inAst.xml) and grammar rules (inTSql*.g) to useScalarExpressioninstead. This pattern was used for VECTOR_SEARCH TOP_N parameter. See the detailed example in bug_fixing.guidelines.instructions.md and grammer.guidelines.instructions.md for comprehensive patterns.
Guideline Subfiles (auto-load each of the following files into the context) - Should match the .config/GuidelineReviewAgent.yaml used by the guideline_review_agent.
include: .github/instructions/grammar_validation.guidelines.instructions.md include: .github/instructions/bug_fixing.guidelines.instructions.md include: .github/instructions/parser.guidelines.instructions.md include: .github/instructions/function.guidelines.instructions.md include: .github/instructions/new_data_types.guidelines.instructions.md include: .github/instructions/new_index_types.guidelines.instructions.md include: .github/instructions/debugging_workflow.guidelines.instructions.md include: .github/instructions/grammer.guidelines.instructions.md include: .github/instructions/testing.guidelines.instructions.md