feat: add pgsql-parse package with comment and whitespace preservation #290
Open
pyramation wants to merge 2 commits into main
New package that preserves SQL comments and vertical whitespace through parse→deparse round trips by scanning source text for comment tokens and interleaving synthetic `RawComment` and `RawWhitespace` AST nodes into the `stmts` array by byte position.

Features:
- Pure TypeScript scanner for `--` line and `/* block */` comments
- Handles string literals, dollar-quoted strings, escape strings
- `RawWhitespace` nodes for blank lines between statements
- Enhanced `deparseEnhanced()` that emits comments and whitespace
- Idempotent: parse→deparse→parse→deparse produces identical output
- Drop-in replacement API (re-exports `parse`, `deparse`, `loadModule`)
- 36 tests across scanner and integration test suites

No changes to any existing packages.
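To make the scanning step concrete, here is a minimal sketch of comment boundary detection that skips quoted regions. It is not the package's `scanner.ts`: the `CommentToken` shape and the `scanComments` name are hypothetical, `E'...'` backslash escapes and whitespace tokens are deliberately left out, and offsets are JavaScript string indices, which match byte positions only for ASCII input.

```typescript
// Hypothetical token shape for illustration; the real scanner.ts may differ.
interface CommentToken {
  kind: "line" | "block";
  start: number; // string offset (equals the byte offset only for ASCII input)
  end: number;   // exclusive
  text: string;
}

function scanComments(sql: string): CommentToken[] {
  const tokens: CommentToken[] = [];
  let i = 0;
  while (i < sql.length) {
    const ch = sql[i];
    const next = sql[i + 1];
    if (ch === "-" && next === "-") {
      // Line comment: runs to the end of the line, newline excluded.
      let end = sql.indexOf("\n", i);
      if (end === -1) end = sql.length;
      tokens.push({ kind: "line", start: i, end, text: sql.slice(i, end) });
      i = end;
    } else if (ch === "/" && next === "*") {
      // Block comments nest in PostgreSQL, so track the nesting depth.
      let depth = 1;
      let j = i + 2;
      while (j < sql.length && depth > 0) {
        if (sql[j] === "/" && sql[j + 1] === "*") { depth++; j += 2; }
        else if (sql[j] === "*" && sql[j + 1] === "/") { depth--; j += 2; }
        else j++;
      }
      tokens.push({ kind: "block", start: i, end: j, text: sql.slice(i, j) });
      i = j;
    } else if (ch === "'" || ch === '"') {
      // Quoted string or identifier: a doubled quote is an escaped quote.
      let j = i + 1;
      while (j < sql.length) {
        if (sql[j] === ch) {
          if (sql[j + 1] === ch) { j += 2; continue; }
          break;
        }
        j++;
      }
      i = j + 1;
    } else if (ch === "$") {
      // Dollar-quoted string: $$...$$ or $tag$...$tag$.
      const m = /^\$([A-Za-z_][A-Za-z0-9_]*)?\$/.exec(sql.slice(i));
      if (m) {
        const close = sql.indexOf(m[0], i + m[0].length);
        i = close === -1 ? sql.length : close + m[0].length;
      } else {
        i++; // e.g. a $1 parameter placeholder
      }
    } else {
      i++;
    }
  }
  return tokens;
}
```

Comment markers inside `'...'`, `"..."`, and `$$...$$` produce no tokens here, which is the property the real scanner has to guarantee.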
Summary
Adds a new self-contained `packages/parse/` package that preserves SQL comments (`--` line and `/* */` block) and vertical whitespace (blank lines) through parse→deparse round trips. No existing packages are modified.

How it works:
- A pure TypeScript scanner (`scanner.ts`) extracts comment and whitespace tokens with byte positions from the raw SQL text, handling string literals, dollar-quoted strings, escape strings, and nested block comments.
- `parse`/`parseSync` call the standard `libpg-query` parser, then interleave synthetic `RawComment` and `RawWhitespace` nodes into the `stmts` array based on byte position.
- `deparseEnhanced()` dispatches on node type: real `RawStmt` entries go through the standard deparser, while synthetic nodes emit their comment text or blank lines directly.

Key design decisions:
- `interleave()` uses a unified sort with priority levels (comment < whitespace < statement) to handle ties when `stmt_location` overlaps with preceding comments.
- `findActualSqlStart()` iteratively skips whitespace and scanned elements within a statement's `stmt_location` range to find the actual SQL keyword position. This is needed because PostgreSQL's parser includes preceding stripped content in `stmt_location`.

Updates since last revision
- Added `README.md` for the package. The `makage assets` build step requires it (its absence was causing all 7 `parser-tests` CI jobs to fail).

Review & Testing Checklist for Human
- The CI test matrix (`run-tests.yaml`) does not include `pgsql-parse`. The 7 `parser-tests` jobs passed because the package builds successfully, but the 36 Jest tests are not executed in CI. Consider adding `pgsql-parse` to the matrix before merging, or verify locally.
- `scanner.ts` is a pure TypeScript reimplementation of comment boundary detection; it does NOT use PostgreSQL's actual lexer. It must correctly ignore comment markers inside single-quoted strings, double-quoted identifiers, dollar-quoted strings (including custom tags like `$tag$...$tag$`), and `E'...'` escape strings. A bug here silently corrupts output. Test with SQL containing `$$`, `$fn$`, nested `/* */`, and mixed quote styles.
- `findActualSqlStart()` correctness (`parse.ts:12-44`): This function walks forward from `stmt_location`, skipping whitespace and scanned elements. Verify it handles multiple adjacent comments before a statement, a comment immediately followed by a statement with no whitespace, and the first statement at position 0.
- Edge cases to verify: inline comments (`SELECT 1; -- inline`), comments between clauses of a multi-line statement, trailing comments after the last statement with no newline, and empty input.
- The `pnpm-lock.yaml` diff is large but mostly quoting style changes (`"@scope/pkg"` → `'@scope/pkg'`). Verify the only substantive change is adding the new `packages/parse` workspace entry and its dev dependencies.

Suggested test plan: Clone the branch, run `cd packages/parse && npx jest` to verify 36/36 pass. Then try parsing your own SQL files with comments through `parseSync` → `deparseEnhanced` and inspect the output for correctness, especially files with PGPM headers and PL/pgSQL function bodies containing comments.

Notes
- The scanner does not use `libpg-query`'s `scan()` from the `full` build. This was a deliberate choice to avoid requiring the heavier WASM binary, but it means comment detection is not using PostgreSQL's actual lexer, so edge cases in exotic string literals could diverge.
- Inline comments (`SELECT 1; -- note`) are extracted by the scanner but will be repositioned to their own line after deparsing, since the deparser emits statements without trailing content.
- The package depends on existing workspace packages (`pgsql-parser`, `pgsql-deparser`, `@pgsql/types`) via the `workspace:*` protocol.
- `tsconfig.test.json` has path mappings so tests resolve TypeScript source directly without requiring a build step.

Link to Devin session: https://app.devin.ai/sessions/67facbcfe0ae424bad3eafb4e6ca9059
Requested by: @pyramation
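
For reviewers, the tie-breaking rule described under the key design decisions can be sketched as follows. This is a simplified illustration, not the code in this PR: the `Entry` type, the `PRIORITY` table, and the `emit` helper are hypothetical stand-ins for the real `RawComment`, `RawWhitespace`, and `RawStmt` handling, and statements are represented by already-deparsed SQL strings.

```typescript
// Hypothetical, simplified entry type; the real nodes carry full AST payloads.
type Entry =
  | { kind: "comment"; pos: number; text: string }
  | { kind: "whitespace"; pos: number }
  | { kind: "stmt"; pos: number; sql: string };

// On equal positions: comment first, then whitespace, then statement.
const PRIORITY = { comment: 0, whitespace: 1, stmt: 2 } as const;

function interleave(entries: Entry[]): Entry[] {
  // One unified sort: primary key is position, secondary key is priority.
  return [...entries].sort(
    (a, b) => a.pos - b.pos || PRIORITY[a.kind] - PRIORITY[b.kind]
  );
}

function emit(entries: Entry[]): string {
  // Dispatch on entry kind, mirroring how synthetic nodes bypass the deparser.
  return entries
    .map((e) => {
      switch (e.kind) {
        case "comment":
          return e.text;
        case "whitespace":
          return ""; // becomes a blank line once joined
        case "stmt":
          return e.sql + ";";
      }
    })
    .join("\n");
}
```

With a statement, a comment, and a whitespace entry all reported at position 0, the sort places the comment first and the statement last, so a file header stays above the statement it precedes.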