Add schema path heuristic to suggest correct location for misplaced frontmatter fields #18320

Merged
pelikhan merged 3 commits into main from copilot/add-heuristic-for-unknown-fields
Feb 25, 2026

Copilot AI commented Feb 25, 2026

  • Add schemaFieldLocation struct to pkg/parser/schema_suggestions.go
  • Add collectSchemaPropertyPaths() to recursively walk the schema and collect all (fieldName, parentPath) pairs
  • Add findFieldLocationsInSchema() for exact + high-confidence fuzzy (distance ≤ 2) search
  • Add generatePathLocationSuggestion() to format path suggestion messages
  • Update generateSchemaBasedSuggestions() to also search for correct field paths in addition to existing typo correction
  • Add formatSchemaPathForDisplay() helper to convert schema paths to user-friendly strings
  • Add tests: TestCollectSchemaPropertyPaths, TestFindFieldLocationsInSchema, TestGeneratePathLocationSuggestion, TestGenerateSchemaBasedSuggestionsWithPathHeuristic
  • Run make lint and make agent-finish to validate — all checks pass
  • Change discussions: false to discussions: true in .github/aw/github-agentic-workflows.md documentation example
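
The fuzzy search with distance ≤ 2 listed above can be illustrated with a plain Levenshtein edit distance. This is a minimal sketch, not the PR's code: `levenshtein` and `closeMatches` are hypothetical helpers, and the candidate names are taken from the example in the original prompt.

```go
package main

import (
	"fmt"
	"sort"
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the byte-wise edit distance between a and b.
// Byte-wise comparison is an assumption that suits ASCII field names.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(b)]
}

// closeMatches returns the exact match if one exists; otherwise every
// candidate within maxDist edits, ordered by distance and then by name.
func closeMatches(field string, candidates []string, maxDist int) []string {
	type scored struct {
		name string
		dist int
	}
	var hits []scored
	for _, c := range candidates {
		d := levenshtein(field, c)
		if d == 0 {
			return []string{c}
		}
		if d <= maxDist {
			hits = append(hits, scored{c, d})
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].dist != hits[j].dist {
			return hits[i].dist < hits[j].dist
		}
		return hits[i].name < hits[j].name
	})
	names := make([]string, 0, len(hits))
	for _, h := range hits {
		names = append(names, h.name)
	}
	return names
}

func main() {
	known := []string{"skip-if-match", "safe-outputs", "create-pull-request"}
	fmt.Println(closeMatches("skip-if-mach", known, 2))
}
```

A small threshold such as 2 keeps the suggestions high-confidence: one or two typos still match, while unrelated field names are dropped.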
Original prompt

When a syntax error is detected for an unknown field, add a heuristic to find that field in the JSON schema and suggest the path to the correct location for that field.

for example, if this is the code

safe-outputs:
  create-pull-request:
    skip-if-match: ...

suggest that skip-if-match belongs under /on in the yaml.

on:
  skip-if-match:

  • use fuzzy search if not direct match, show hits with high confidence

@pelikhan

@copilot change discussions to true

… validation

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Add heuristic to suggest correct path for unknown fields Add schema path heuristic to suggest correct location for misplaced frontmatter fields Feb 25, 2026
@pelikhan

@copilot format error message to help an agent fix the yml syntax

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Feb 25, 2026

@copilot change discussions to true

Done in 9128ae6 — changed discussions: false to discussions: true in the add-comment example in .github/aw/github-agentic-workflows.md.

pelikhan marked this pull request as ready for review February 25, 2026 12:22
Copilot AI review requested due to automatic review settings February 25, 2026 12:22
pelikhan merged commit 6867b81 into main Feb 25, 2026
75 of 78 checks passed
pelikhan deleted the copilot/add-heuristic-for-unknown-fields branch February 25, 2026 12:23

Copilot AI left a comment

Pull request overview

Adds a schema-walking heuristic to locate where unknown/misplaced frontmatter fields belong in the JSON schema and surface a “belongs under …” suggestion (including high-confidence fuzzy matches), plus accompanying unit tests and a small documentation example tweak.

Changes:

  • Introduces schema traversal utilities to collect property paths and find candidate field locations (exact + fuzzy).
  • Extends additional-properties suggestion generation to include “field belongs under /path” hints.
  • Adds unit tests covering schema path collection, location finding, and suggestion formatting; updates docs example discussions flag.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| pkg/parser/schema_suggestions.go | Implements schema path collection/search and integrates path-based suggestions into existing schema suggestion flow. |
| pkg/parser/schema_suggestions_test.go | Adds tests for schema traversal and the new path heuristic suggestion output. |
| .github/aw/github-agentic-workflows.md | Updates documentation example to set discussions: true. |

}

if len(suggestions) > 0 {
    return strings.Join(suggestions, ". ")

Copilot AI Feb 25, 2026

generateSchemaBasedSuggestions joins multiple suggestion strings with ". ", but generateFieldSuggestions often ends with ? (or no terminal punctuation). This can produce awkward output like "Did you mean 'x'?. 'x' belongs under ..." and then callers append another ". " prefix. Consider normalizing suggestion punctuation (e.g., join with a single space, or trim trailing punctuation and then join with ". ", or ensure each component is sentence-safe before concatenation).

Suggested change
- return strings.Join(suggestions, ". ")
+ return strings.Join(suggestions, " ")
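
One way to make each component "sentence-safe" before joining, as the comment suggests, is sketched below. `joinSuggestions` is a hypothetical helper, not a function from the PR, and it assumes callers are happy to receive fully punctuated sentences.

```go
package main

import (
	"fmt"
	"strings"
)

// joinSuggestions gives every non-empty part terminal punctuation and then
// joins the parts with a single space, so "?" and "." never end up adjacent.
func joinSuggestions(parts []string) string {
	cleaned := make([]string, 0, len(parts))
	for _, p := range parts {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		// Keep an existing "?", "!" or "."; otherwise close the sentence.
		if !strings.HasSuffix(p, "?") && !strings.HasSuffix(p, "!") && !strings.HasSuffix(p, ".") {
			p += "."
		}
		cleaned = append(cleaned, p)
	}
	return strings.Join(cleaned, " ")
}

func main() {
	fmt.Println(joinSuggestions([]string{
		"Did you mean 'x'?",
		"'x' belongs under 'on'",
	}))
}
```

Compared with the one-line suggested change, this also covers parts that arrive without any terminal punctuation, at the cost of a few extra lines.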

Comment on lines +503 to +523
// collectSchemaPropertyPaths recursively collects all (fieldName, parentPath) pairs from a JSON schema document.
// It traverses properties, oneOf/anyOf/allOf, and items to build a complete picture of valid fields across the schema.
func collectSchemaPropertyPaths(schemaDoc any, currentPath string, depth int) []schemaFieldLocation {
    if depth > schemaTraversalMaxDepth {
        return nil
    }

    schemaMap, ok := schemaDoc.(map[string]any)
    if !ok {
        return nil
    }

    var results []schemaFieldLocation

    // Collect fields from properties and recurse into each property's schema
    if properties, ok := schemaMap["properties"].(map[string]any); ok {
        for fieldName, fieldSchema := range properties {
            results = append(results, schemaFieldLocation{FieldName: fieldName, SchemaPath: currentPath})
            sub := collectSchemaPropertyPaths(fieldSchema, currentPath+"/"+fieldName, depth+1)
            results = append(results, sub...)
        }

Copilot AI Feb 25, 2026

collectSchemaPropertyPaths does not follow $ref schemas, so it will miss many valid field locations in schemas that use references (e.g. pkg/parser/schemas/main_workflow_schema.json contains many $ref). This will cause the new path heuristic to frequently fail to find the correct location. Consider resolving local $ref values (JSON Pointer #/...) during traversal (with cycle detection / visited set) so referenced subschemas contribute their properties to the collected paths.
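
The local `$ref` resolution with cycle detection proposed here could look roughly like the following. It is a simplified sketch under stated assumptions: `fieldLocation`, `resolveLocalRef`, `collect`, and `collectWithRefs` are hypothetical names, only `properties` and `#/...` references are handled, and the depth limit and oneOf/anyOf/allOf/items traversal of the real function are left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fieldLocation mirrors the FieldName/SchemaPath pair used in the PR.
type fieldLocation struct {
	FieldName  string
	SchemaPath string
}

// resolveLocalRef follows a local JSON Pointer reference ("#/a/b") from root.
func resolveLocalRef(root map[string]any, ref string) (map[string]any, bool) {
	if !strings.HasPrefix(ref, "#/") {
		return nil, false
	}
	var node any = root
	for _, token := range strings.Split(ref[2:], "/") {
		// JSON Pointer unescaping: "~1" means "/", "~0" means "~".
		token = strings.ReplaceAll(strings.ReplaceAll(token, "~1", "/"), "~0", "~")
		m, ok := node.(map[string]any)
		if !ok {
			return nil, false
		}
		node, ok = m[token]
		if !ok {
			return nil, false
		}
	}
	m, ok := node.(map[string]any)
	return m, ok
}

// collect records (field, parent path) pairs, expanding $ref targets in place.
// visiting holds the refs currently being expanded, which breaks cycles.
func collect(root, node map[string]any, path string, visiting map[string]bool, out *[]fieldLocation) {
	if ref, ok := node["$ref"].(string); ok {
		if visiting[ref] {
			return // already expanding this ref higher up: stop here
		}
		if target, ok := resolveLocalRef(root, ref); ok {
			visiting[ref] = true
			collect(root, target, path, visiting, out)
			delete(visiting, ref)
		}
	}
	props, _ := node["properties"].(map[string]any)
	for name, sub := range props {
		*out = append(*out, fieldLocation{FieldName: name, SchemaPath: path})
		if subMap, ok := sub.(map[string]any); ok {
			collect(root, subMap, path+"/"+name, visiting, out)
		}
	}
}

// collectWithRefs walks the whole schema and returns a sorted result.
func collectWithRefs(root map[string]any) []fieldLocation {
	var out []fieldLocation
	collect(root, root, "", map[string]bool{}, &out)
	sort.Slice(out, func(i, j int) bool {
		if out[i].SchemaPath != out[j].SchemaPath {
			return out[i].SchemaPath < out[j].SchemaPath
		}
		return out[i].FieldName < out[j].FieldName
	})
	return out
}

func main() {
	schema := map[string]any{
		"properties": map[string]any{
			"on": map[string]any{"$ref": "#/$defs/trigger"},
		},
		"$defs": map[string]any{
			"trigger": map[string]any{
				"properties": map[string]any{
					"skip-if-match": map[string]any{"type": "string"},
				},
			},
		},
	}
	fmt.Println(collectWithRefs(schema))
}
```

Tracking only the refs on the current recursion stack lets the same definition be expanded under several parents while still terminating on self-referential schemas.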

}
}

if len(exactMatches) > 0 {

Copilot AI Feb 25, 2026

Exact-match results are returned without sorting. Because collectSchemaPropertyPaths iterates over maps, the order of exactMatches (and thus the order of suggested paths when multiple locations exist) can be nondeterministic between runs. Sorting exactMatches (e.g., by SchemaPath) would make output stable and easier to test/debug.

Suggested change
- if len(exactMatches) > 0 {
+ if len(exactMatches) > 0 {
+     // Sort exact matches by schema path (and field name) for stable output
+     sort.Slice(exactMatches, func(i, j int) bool {
+         if exactMatches[i].SchemaPath != exactMatches[j].SchemaPath {
+             return exactMatches[i].SchemaPath < exactMatches[j].SchemaPath
+         }
+         return exactMatches[i].FieldName < exactMatches[j].FieldName
+     })

Comment on lines +625 to +627
var parts []string
for _, prop := range invalidProps {
    locations := findFieldLocationsInSchema(schemaDoc, prop, currentPath)

Copilot AI Feb 25, 2026

generatePathLocationSuggestion calls findFieldLocationsInSchema for each invalid property, and findFieldLocationsInSchema traverses the entire schema each time via collectSchemaPropertyPaths. For multiple invalid fields this becomes repeated full-schema walks. Consider collecting all property paths once per generatePathLocationSuggestion call (or building an index map[fieldName][]schemaFieldLocation) and reusing it for each invalid property.

Suggested change
- var parts []string
- for _, prop := range invalidProps {
-     locations := findFieldLocationsInSchema(schemaDoc, prop, currentPath)
+ // Cache field locations per property name within this call to avoid redundant schema walks
+ locationCache := make(map[string][]schemaFieldLocation)
+ var parts []string
+ for _, prop := range invalidProps {
+     locations, ok := locationCache[prop]
+     if !ok {
+         locations = findFieldLocationsInSchema(schemaDoc, prop, currentPath)
+         locationCache[prop] = locations
+     }
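
The alternative mentioned in the comment, an index of the form `map[fieldName][]schemaFieldLocation` built from a single schema walk, might be sketched as follows. `fieldLocation` and `buildFieldIndex` are hypothetical stand-ins, and the input slice stands for the result of one traversal.

```go
package main

import "fmt"

// fieldLocation mirrors the FieldName/SchemaPath pair used in the PR.
type fieldLocation struct {
	FieldName  string
	SchemaPath string
}

// buildFieldIndex groups locations by field name so that each exact lookup
// is a map access instead of another walk over the schema.
func buildFieldIndex(all []fieldLocation) map[string][]fieldLocation {
	index := make(map[string][]fieldLocation, len(all))
	for _, loc := range all {
		index[loc.FieldName] = append(index[loc.FieldName], loc)
	}
	return index
}

func main() {
	// Pretend this slice came from one full traversal of the schema.
	all := []fieldLocation{
		{FieldName: "on", SchemaPath: ""},
		{FieldName: "skip-if-match", SchemaPath: "/on"},
		{FieldName: "safe-outputs", SchemaPath: ""},
	}
	index := buildFieldIndex(all)
	for _, prop := range []string{"skip-if-match", "unknown-field"} {
		fmt.Println(prop, index[prop])
	}
}
```

Unlike the per-name cache in the suggested change, which still walks the schema once for every distinct invalid property, the index needs one walk in total. Fuzzy matching would still have to scan the index keys.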

Comment on lines +641 to +646
for _, loc := range locations {
    display := "'" + formatSchemaPathForDisplay(loc.SchemaPath) + "'"
    if !seenPaths[display] {
        seenPaths[display] = true
        pathNames = append(pathNames, display)
    }

Copilot AI Feb 25, 2026

formatSchemaPathForDisplay returns "the root level", but generatePathLocationSuggestion always wraps display strings in single quotes. This can produce messages like "... belongs under 'the root level'", which reads a bit oddly. Consider handling the root case specially (e.g., no quotes, or display /).
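
Special-casing the root could be as small as the sketch below. `displayLocation` is a hypothetical helper; it assumes the root is represented by an empty path (or "/") and that non-root paths are displayed without their leading slash, neither of which is confirmed by the diff shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// displayLocation quotes ordinary schema paths but leaves the root
// description unquoted so the message reads naturally.
func displayLocation(schemaPath string) string {
	if schemaPath == "" || schemaPath == "/" {
		return "the root level"
	}
	return "'" + strings.TrimPrefix(schemaPath, "/") + "'"
}

func main() {
	fmt.Println("'skip-if-match' belongs under " + displayLocation("/on"))
	fmt.Println("'on' belongs under " + displayLocation(""))
}
```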
