feat: add oq pipeline query language for OpenAPI schema graphs#177
feat: add oq pipeline query language for OpenAPI schema graphs#177vishalg0wda wants to merge 17 commits intomainfrom
Conversation
Implement a domain-specific pipeline query language (oq) that enables agents and humans to construct ad-hoc structural queries over OpenAPI documents. The query engine operates over a pre-computed directed graph materialized from openapi.Index. New packages: - graph/: SchemaGraph type with node/edge types, Build() constructor, reachability/ancestor traversal, and pre-computed metrics - oq/expr/: Predicate expression parser and evaluator supporting ==, !=, >, <, >=, <=, and, or, not, has(), matches() - oq/: Pipeline parser, AST, executor with source/traversal/filter stages, and table/JSON formatters New CLI command: openapi spec query <file> '<pipeline>' Example queries: schemas.components | sort depth desc | take 10 | select name, depth schemas | where union_width > 0 | sort union_width desc | take 10 schemas.components | where in_degree == 0 | select name operations | sort schema_count desc | take 10 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The cmd/openapi module needs a replace directive pointing to the root module so that go mod tidy can resolve the new graph/ and oq/ packages that aren't yet published. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use require.Error for error assertions and assert.Positive for count checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace fmt.Errorf with errors.New where no format args (perfsprint) - Convert if-else chain to switch statement (gocritic) - Use assert.Len and assert.Positive in tests (testifylint) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use t.Context() instead of context.Background() in tests - Replace WriteString(fmt.Sprintf(...)) with fmt.Fprintf - Remove development replace directive from cmd/openapi/go.mod - Fix trailing newline for count results in table format Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
oq/oq.go
Outdated
| func formatGroups(result *Result) string { | ||
| var sb strings.Builder | ||
| for _, g := range result.Groups { | ||
| fmt.Fprintf(&sb, "%s: count=%d", g.Key, g.Count) |
There was a problem hiding this comment.
🔴 AGENTS.md violation: sb.WriteString(fmt.Sprintf(...)) instead of fmt.Fprintf in formatGroups
AGENTS.md's staticcheck QF1012 rule states: "Use fmt.Fprintf(w, ...) instead of w.WriteString(fmt.Sprintf(...))." Line 812 uses sb.WriteString(fmt.Sprintf("%s: count=%d", g.Key, g.Count)) which should use fmt.Fprintf.
| fmt.Fprintf(&sb, "%s: count=%d", g.Key, g.Count) | |
| fmt.Fprintf(&sb, "%s: count=%d", g.Key, g.Count) |
Was this helpful? React with 👍 or 👎 to provide feedback.
oq/oq.go
Outdated
| names = names[:5] | ||
| names = append(names, "...") | ||
| } | ||
| fmt.Fprintf(&sb, " names=[%s]", strings.Join(names, ", ")) |
There was a problem hiding this comment.
🔴 AGENTS.md violation: sb.WriteString(fmt.Sprintf(...)) instead of fmt.Fprintf in formatGroups names
AGENTS.md's staticcheck QF1012 rule states: "Use fmt.Fprintf(w, ...) instead of w.WriteString(fmt.Sprintf(...))." Line 819 uses sb.WriteString(fmt.Sprintf(" names=[%s]", strings.Join(names, ", "))) which should use fmt.Fprintf.
| fmt.Fprintf(&sb, " names=[%s]", strings.Join(names, ", ")) | |
| fmt.Fprintf(&sb, " names=[%s]", strings.Join(names, ", ")) |
Was this helpful? React with 👍 or 👎 to provide feedback.
oq/oq.go
Outdated
| if i > 0 { | ||
| sb.WriteString(",\n") | ||
| } | ||
| fmt.Fprintf(&sb, ` {"key": %q, "count": %d, "names": [`, g.Key, g.Count) |
There was a problem hiding this comment.
🔴 AGENTS.md violation: sb.WriteString(fmt.Sprintf(...)) instead of fmt.Fprintf in formatGroupsJSON
AGENTS.md's staticcheck QF1012 rule states: "Use fmt.Fprintf(w, ...) instead of w.WriteString(fmt.Sprintf(...))." Line 833 uses sb.WriteString(fmt.Sprintf(...)) which should use fmt.Fprintf.
| fmt.Fprintf(&sb, ` {"key": %q, "count": %d, "names": [`, g.Key, g.Count) | |
| fmt.Fprintf(&sb, ` {"key": %q, "count": %d, "names": [`, g.Key, g.Count) |
Was this helpful? React with 👍 or 👎 to provide feedback.
oq/oq.go
Outdated
| if j > 0 { | ||
| sb.WriteString(", ") | ||
| } | ||
| fmt.Fprintf(&sb, "%q", n) |
There was a problem hiding this comment.
🔴 AGENTS.md violation: sb.WriteString(fmt.Sprintf(...)) instead of fmt.Fprintf in formatGroupsJSON names
AGENTS.md's staticcheck QF1012 rule states: "Use fmt.Fprintf(w, ...) instead of w.WriteString(fmt.Sprintf(...))." Line 838 uses sb.WriteString(fmt.Sprintf("%q", n)) which should use fmt.Fprintf.
| fmt.Fprintf(&sb, "%q", n) | |
| fmt.Fprintf(&sb, "%q", n) |
Was this helpful? React with 👍 or 👎 to provide feedback.
📊 Test Coverage ReportCurrent Coverage: Coverage Change: 📉 -.6% (decreased) Coverage by Package
📋 Detailed Coverage by Function (click to expand)
Generated by GitHub Actions |
New stages: explain, fields, head (alias), sample, path, top, bottom, format New operation fields: tag, parameter_count, deprecated, description, summary New graph method: ShortestPath for BFS pathfinding New formatter: FormatMarkdown for markdown table output Restore replace directive in cmd/openapi/go.mod (required for CI) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
- Fix stdinOrFileArgs(2,2) -> (1,2) so -f flag works with 1 positional arg - Fix OOB panic in expr tokenizer on unterminated backslash-terminated strings - Add tests for refs-out, refs-in, items, format groups, field coverage, empty/count edge cases bringing oq coverage from 72% to 83% Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement FormatToon following the TOON (Token-Oriented Object Notation)
spec: tabular array syntax with header[N]{fields}: and comma-delimited
data rows. Includes proper string escaping per TOON quoting rules.
See https://github.com/toon-format/toon
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…panic Add `openapi spec query-reference` subcommand that prints the complete oq language reference. Add README.md for the oq package. Fix OOB panic in expr parser's expect() method when tokens are exhausted mid-parse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Edge annotations: 1-hop traversal stages (refs-out, refs-in, properties, union-members, items) now populate edge_kind, edge_label, and edge_from fields on result rows, making relationship types visible in query output. New traversal stages: connected, blast-radius, neighbors <n> New analysis stages: orphans, leaves, cycles, clusters, tag-boundary, shared-refs New schema fields: op_count, tag_count Graph layer additions: Neighbors (depth-limited bidirectional BFS), StronglyConnectedComponents (Tarjan's SCC), SchemaOpCount. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change `openapi spec query <file> <query>` to `openapi spec query <query> [file]`. The query is the primary argument; the input file is optional and defaults to stdin when omitted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TristanSpeakEasy
left a comment
There was a problem hiding this comment.
LGTM — well-structured, well-tested addition (135 tests pass). Clean pipeline language design with good composability. A few items flagged inline.
| Meta: explain, fields, format <table|json|markdown|toon> | ||
|
|
||
| Where expressions support: ==, !=, >, <, >=, <=, and, or, not, has(), matches`, | ||
| Args: stdinOrFileArgs(1, 2), |
There was a problem hiding this comment.
Bug: stdinOrFileArgs(1, 2) requires at least 1 positional arg, but when using -f to read the query from a file, there may be 0 positional args (e.g. openapi spec query -f query.oq reading spec from stdin). This will fail Cobra arg validation before runQuery even runs. The minimum should be 0 when -f is used, or use a custom args validator that accounts for the flag.
oq/oq.go
Outdated
| } | ||
| return cmp < 0 | ||
| }) | ||
| return result, nil |
There was a problem hiding this comment.
Bug risk: execSort sorts result.Rows in-place and returns the same *Result. Other stages (execWhere, execUnique, execGroupBy) create a new *Result. This inconsistency means callers holding a reference to the pre-sort result see it mutated. Same applies to execTake (line 492) which truncates the slice in-place. Consider creating a new *Result with a sorted/sliced copy of Rows for consistency.
graph/graph.go
Outdated
| // Detect circular nodes | ||
| circularNodes := make(map[NodeID]bool) | ||
| for i := range g.Schemas { | ||
| visited := make(map[NodeID]bool) |
There was a problem hiding this comment.
Perf: This allocates fresh visited and inStack maps for every schema node, giving O(V × (V+E)) worst case. The visited map should be shared across loop iterations — once a node is fully explored (popped from the recursion stack), it never needs re-exploration. A single shared visited map brings this down to the expected O(V+E) for DFS-based cycle detection.
oq/oq.go
Outdated
| } | ||
|
|
||
| // Add schema rows | ||
| for idx := range seenSchemas { |
There was a problem hiding this comment.
Suggestion: Iterating seenSchemas (a map) produces nondeterministic row ordering — repeated runs of the same query can yield different output. Consider sorting by SchemaIdx or collecting into a sorted slice. Same issue applies to the operation map iteration on lines 809-816, execSharedRefs (line 1041), and execClusters (line 926 iterating resultNodes).
| @@ -0,0 +1,1914 @@ | |||
| // Package oq implements a pipeline query language for OpenAPI schema graphs. | |||
There was a problem hiding this comment.
Suggestion: This file is ~1,900 lines containing the parser, executor, all stage implementations, field resolution, and four output formatters. Consider splitting into separate files for navigability:
parse.go—Parse,parseStage,splitPipeline,splitFirst,parseCSV,parseTwoArgsexec.go—run,execSource,execStage, and allexec*functionsformat.go—FormatTable,FormatJSON,FormatMarkdown,FormatToonand helpersfield.go—fieldValue,FieldValuePublic,compareValues,valueToString,rowAdapter
oq/oq.go
Outdated
| } | ||
| items := make([]keyed, len(result.Rows)) | ||
| for i, row := range result.Rows { | ||
| h := sha256.Sum256([]byte(rowKey(row))) |
There was a problem hiding this comment.
Suggestion: Using sha256.Sum256 per row for a deterministic shuffle is expensive for large result sets. A seeded math/rand PRNG with Fisher-Yates shuffle using a fixed seed (e.g. derived from row count) would be simpler and faster while remaining deterministic.
oq/expr/expr.go
Outdated
| Str string | ||
| Int int | ||
| Bool bool | ||
| isNull bool |
There was a problem hiding this comment.
Cleanup: The isNull field is redundant with Kind == KindNull. It's always set alongside Kind: KindNull (see NullVal() on line 208) and only checked in hasExpr.Eval (line 106). Since isNull is unexported, external code can't use it — consider removing it and checking v.Kind == KindNull instead to reduce confusion about the canonical null check.
There was a problem hiding this comment.
Done — removed the isNull bool field from Value and replaced all v.isNull checks with v.Kind == KindNull. NullVal() now just sets Kind: KindNull.
| if g.detectCycle(edge.To, visited, inStack, circular) { | ||
| circular[id] = true | ||
| found = true | ||
| } |
There was a problem hiding this comment.
🔴 detectCycle marks non-cycle ancestor nodes as circular
The detectCycle function at graph/graph.go:636-638 propagates circular[id] = true to every ancestor of a cycle, not just nodes that are actually part of a cycle. For example, given edges A → B → C → B, node A is NOT in the cycle (only B and C are), yet detectCycle(A) returns true from its child B, causing circular[A] = true to be set. This means SchemaNode.IsCircular will incorrectly report true for any schema that can reach a cycle, making the is_circular field and where is_circular queries produce false positives.
Trace of the bug
For graph A → B → C → B:
detectCycle(A)sets inStack[A]=true, recurses to BdetectCycle(B)sets inStack[B]=true, recurses to CdetectCycle(C)sets inStack[C]=true, recurses to BdetectCycle(B)finds inStack[B]=true → sets circular[B]=true, returns true- Back in C: circular[C]=true (correct), returns true
- Back in B: circular[B]=true (already set), returns true
- Back in A: circular[A]=true ← WRONG, A is not in the cycle
The fix should use the already-implemented StronglyConnectedComponents() (Tarjan's algorithm at line 770) to identify cycle membership, or modify the DFS to only mark nodes while backtracking within the cycle boundary.
Prompt for agents
In graph/graph.go, replace the detectCycle-based circular detection in computeMetrics() (lines 570-581) with SCC-based detection. The StronglyConnectedComponents() method at line 770 already correctly identifies cycle members using Tarjan's algorithm. Replace:
circularNodes := make(map[NodeID]bool)
visited := make(map[NodeID]bool)
inStack := make(map[NodeID]bool)
for i := range g.Schemas {
nid := NodeID(i)
if !visited[nid] {
if g.detectCycle(nid, visited, inStack, circularNodes) {
circularNodes[nid] = true
}
}
}
With:
circularNodes := make(map[NodeID]bool)
for _, scc := range g.StronglyConnectedComponents() {
for _, id := range scc {
circularNodes[id] = true
}
}
// Also detect self-loops (SCCs of size 1 with a self-edge)
for i := range g.Schemas {
nid := NodeID(i)
for _, edge := range g.outEdges[nid] {
if edge.To == nid {
circularNodes[nid] = true
}
}
}
The detectCycle method can then be removed as it is no longer called.
Was this helpful? React with 👍 or 👎 to provide feedback.
| } | ||
|
|
||
| if result.IsCount { | ||
| return "count: " + strconv.Itoa(result.Count) + "\n" |
There was a problem hiding this comment.
🟡 FormatToon count output includes trailing newline causing double newline from caller
FormatToon at oq/format.go:208 returns the count string with a trailing \n: "count: " + strconv.Itoa(result.Count) + "\n". However, the caller in cmd/openapi/commands/openapi/query.go:162-165 writes the output with fmt.Fprint and then adds another newline via fmt.Fprintln when result.IsCount is true. This produces a double newline (count: 42\n\n) for TOON format only. The other format functions (FormatTable, FormatJSON, FormatMarkdown) return count without a trailing newline, so they are correct.
| return "count: " + strconv.Itoa(result.Count) + "\n" | |
| return "count: " + strconv.Itoa(result.Count) |
Was this helpful? React with 👍 or 👎 to provide feedback.
| } | ||
|
|
||
| // Walk responses | ||
| for _, resp := range op.Responses.All() { |
There was a problem hiding this comment.
🟡 Nil pointer dereference when op.Responses map is nil and has no Default
In graph/graph.go:518, the code calls op.Responses.All() which is safe because the embedded *sequencedmap.Map's All() handles nil. However at line 533, op.Responses.Default is accessed without first checking if op.Responses itself was properly initialized. More critically, if the OpenAPI spec has an operation with no responses block at all, the Responses struct will be zero-valued with a nil embedded *sequencedmap.Map, and while All() is nil-safe, subsequent code at line 518 iterates op.Responses.All() which returns (string, *ReferencedResponse) pairs — but in practice this path is safe. The actual risk is minimal here since the nil checks inside are sufficient. I'm not reporting this as a bug.
However, looking more carefully at findOperationSchemas at graph/graph.go:477-544: the function walks op.Parameters where each parameter is *ReferencedParameter. The GetObject() call at line 497 may return a Parameter whose Schema field is a *oas3.JSONSchemaReferenceable that isn't registered in ptrToNode (e.g. inline parameter schemas from $ref parameters). This is a functional limitation but not a crash bug.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
|
||
| result, err := oq.Execute("schemas.components | select name", g) | ||
| require.NoError(t, err) | ||
| assert.NotEmpty(t, result.Rows) |
There was a problem hiding this comment.
🟡 Missing descriptive messages on test assertions violates AGENTS.md rule
AGENTS.md mandates: "Always include descriptive error messages" on assertions. Numerous assertions across the test files (oq/oq_test.go, graph/graph_test.go, oq/expr/expr_test.go) lack descriptive messages. For example, assert.NotEmpty(t, result.Rows) at oq/oq_test.go:115 and many others throughout the test files. This is a pervasive pattern throughout all test files in the PR — nearly every assertion is missing a descriptive message string.
Prompt for agents
Add descriptive error messages to all assertions in oq/oq_test.go, graph/graph_test.go, and oq/expr/expr_test.go. Per AGENTS.md: 'Always include descriptive error messages'. For example, change assert.NotEmpty(t, result.Rows) to assert.NotEmpty(t, result.Rows, "result should contain rows"). Apply this pattern to every assert.* and require.* call across all three test files.
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
Implements
oq— a pipeline query language for semantic traversal and analysis of OpenAPI documents. Agents and humans can construct ad-hoc structural queries over the schema reference graph at runtime.Docs: oq README · Full reference via
openapi spec query-referenceArchitecture
New packages
graph/—SchemaGraphtype withBuild()constructor, node/edge types, BFS traversal, shortest path, SCC (Tarjan's), connected componentsoq/— Pipeline query language: parser, AST, executor, formatters (table, JSON, markdown, TOON)oq/expr/— Predicate expression parser and evaluator forwhereclausesPipeline stages
schemas,schemas.components,schemas.inline,operationsrefs-out,refs-in,reachable,ancestors,properties,union-members,items,ops,schemas,path <from> <to>,connected,blast-radius,neighbors <n>orphans,leaves,cycles,clusters,tag-boundary,shared-refswhere <expr>,select <fields>,sort <field> [asc|desc],take/head <n>,sample <n>,top <n> <field>,bottom <n> <field>,unique,group-by <field>,countexplain,fields,format <table|json|markdown|toon>Edge annotations
1-hop traversals (
refs-out,refs-in,properties,union-members,items) populateedge_kind,edge_label, andedge_fromfields on result rows, making relationship types visible:openapi spec query 'schemas.components | where name == "Pet" | refs-out | select name, edge_kind, edge_label, edge_from' petstore.yamlSchema fields
name,type,depth,in_degree,out_degree,union_width,property_count,is_component,is_inline,is_circular,has_ref,hash,path,op_count,tag_countOperation fields
name,method,path,operation_id,schema_count,component_count,tag,parameter_count,deprecated,description,summaryExample queries
Scaling verification
Tested against real specs from the Speakeasy specs corpus:
Test plan
🤖 Generated with Claude Code