--ask can't index into array elements (FORMAT/AD[1], INFO/AF[0]) #2

@robertlangdonn

Problem

The filter expression language has no syntax for referencing a specific element of a Number=R or Number=A field, so --ask cannot produce a correct expression for queries that need one.

Example: "variants with exactly 20 alt reads" requires FORMAT/AD[1] == 20 (the second element of the allelic depths field). The current expression language only supports whole-field comparisons like FORMAT/AD == 20, which uses any-element semantics and matches if any allele depth equals 20.
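To make the difference concrete, here is a minimal Python sketch that contrasts the any-element semantics described above with a per-element comparison. The helper names `matches_any` and `matches_index` are hypothetical and are not part of the tool; the sketch only assumes that AD holds the ref depth at index 0 and the first alt depth at index 1, as Number=R fields do.

```python
from typing import Sequence


def matches_any(values: Sequence[int], target: int) -> bool:
    """Whole-field comparison: true if ANY element equals target."""
    return any(v == target for v in values)


def matches_index(values: Sequence[int], index: int, target: int) -> bool:
    """Hypothetical per-element comparison, i.e. FIELD[index] == target.

    An out-of-range index never matches (an assumption of this sketch).
    """
    return 0 <= index < len(values) and values[index] == target


# A record with 20 ref reads and 7 alt reads.
ad = [20, 7]

print(matches_any(ad, 20))       # True: the ref depth happens to be 20
print(matches_index(ad, 1, 20))  # False: the alt depth is 7
```

The record above is a false positive for "exactly 20 alt reads" under any-element matching, which is the failure this issue describes.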

The LLM currently handles this in one of two ways:

  • Produces a misleading expression (e.g. FORMAT/DP == 20) that approximates the intent but matches different records
  • Gates at low confidence with a caveat — the correct behavior, but the user has no -e workaround either

Failing queries from GiAB dogfood

  • "variants with exactly 20 reads supporting the alt" → FORMAT/AD has no indexing; the generated expression matches total depth, not alt depth
  • "biallelic SNPs" → cannot combine INFO/varType value check (unknown values) with genotype shape

Possible fixes

  1. Add array indexing to the filter expression language: FORMAT/AD[1] == 20, INFO/AF[0] < 0.01. This is the proper fix but requires parser + evaluator changes. Target v0.4.
  2. Document the limitation in --ask examples — explain in filter.mdx that per-element queries need -e with bcftools-style expressions.
  3. Teach the LLM to always gate on array-indexing queries — add a rule to the system prompt: if the query requires indexing into a multi-value field, set confidence < 0.5. Already partially handled by calibration rules added in v0.3.0-alpha.3.
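As a rough illustration of what option 1 involves, the sketch below parses and evaluates a single indexed comparison in Python. It is not the project's parser: the regex grammar, the `eval_indexed` name, the flat record mapping, and the choice to treat a missing element as a non-match are all assumptions made for the example.

```python
import operator
import re
from typing import Mapping, Optional, Sequence

# Hypothetical grammar: FIELD[INDEX] OP NUMBER, e.g. "FORMAT/AD[1] == 20".
_INDEXED = re.compile(
    r"^\s*(?P<field>[A-Za-z_]+/[A-Za-z0-9_.]+)\[(?P<index>\d+)\]\s*"
    r"(?P<op>==|!=|<=|>=|<|>)\s*(?P<value>-?\d+(?:\.\d+)?)\s*$"
)

_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def eval_indexed(
    expr: str, record: Mapping[str, Sequence[Optional[float]]]
) -> bool:
    """Evaluate one indexed comparison against a record's field values."""
    m = _INDEXED.match(expr)
    if m is None:
        raise ValueError(f"not an indexed comparison: {expr!r}")
    values = record.get(m["field"], ())
    idx = int(m["index"])
    # Assumption: an absent field, short array, or missing value never matches.
    if idx >= len(values) or values[idx] is None:
        return False
    return _OPS[m["op"]](values[idx], float(m["value"]))


record = {"FORMAT/AD": [12, 20], "INFO/AF": [0.005]}
print(eval_indexed("FORMAT/AD[1] == 20", record))  # True
print(eval_indexed("INFO/AF[0] < 0.01", record))   # True
print(eval_indexed("FORMAT/AD[0] == 20", record))  # False
```

A real implementation would also have to settle per-sample semantics for FORMAT fields and how indexing composes with the rest of the expression grammar, which this sketch leaves out.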

Workaround (current)

Use bcftools view for queries that need element-level access. In bcftools expression syntax, [0:1] selects the second AD value of the first sample:

bcftools view -i 'FORMAT/AD[0:1] == 20' input.vcf

Related

  • docs/known_differences.md §3: schema-based grounding limitations
  • drafts/phase3-dogfood-log.md: GiAB HG001 session analysis
