AGENTS.md - LLM Development Guide for audit-cli

This document provides essential context for LLMs performing development tasks in the audit-cli repository.

Repository Overview

Purpose: A Go CLI tool for auditing and analyzing MongoDB's reStructuredText (RST) documentation.

Key Capabilities:

Extract code examples and procedures from RST files
Search documentation for patterns
Analyze file dependencies and relationships
Analyze composable definitions and usage across projects
Compare files across documentation versions
Count documentation pages and tested code examples
Generate reports on testable code examples from analytics data

Target Users: MongoDB technical writers performing maintenance, scoping work, and reporting.

Project Structure

audit-cli/
├── main.go                    # CLI entry point using cobra
├── go.mod                     # Module: github.com/grove-platform/audit-cli
├── commands/                  # Command implementations (parent + subcommands)
│   ├── extract/              # Extract content from RST files
│   │   ├── extract.go        # Parent command
│   │   ├── code-examples/    # Extract code examples subcommand
│   │   └── procedures/       # Extract procedures subcommand
│   ├── search/               # Search through files
│   │   └── find-string/      # Find string subcommand
│   ├── analyze/              # Analyze RST structures
│   │   ├── includes/         # Analyze include relationships
│   │   ├── usage/            # Find file usages
│   │   ├── procedures/       # Analyze procedure variations
│   │   └── composables/      # Analyze composable definitions and usage
│   ├── compare/              # Compare files across versions
│   │   └── file-contents/    # Compare file contents
│   ├── count/                # Count documentation content
│   │   ├── tested-examples/  # Count tested code examples
│   │   └── pages/            # Count documentation pages
│   └── report/               # Generate reports from documentation data
│       └── testable-code/    # Analyze testable code examples from analytics
├── internal/                 # Internal packages (not importable externally)
│   ├── config/               # Configuration management
│   │   ├── config.go         # Config loading from file/env/args
│   │   ├── config_test.go    # Config tests
│   │   ├── url_mapping.go    # URL-to-source-file mapping via Snooty Data API
│   │   └── url_mapping_test.go # URL mapping tests
│   ├── language/             # Programming language utilities
│   │   ├── language.go       # Language normalization, extensions, products
│   │   └── language_test.go  # Language tests
│   ├── projectinfo/          # MongoDB docs project structure utilities
│   │   ├── pathresolver.go   # Path resolution
│   │   ├── products.go       # Content directory to product mapping
│   │   ├── source_finder.go  # Source directory detection
│   │   └── version_resolver.go # Version path resolution
│   ├── rst/                  # RST parsing utilities
│   │   ├── parser.go         # Generic parsing with includes
│   │   ├── directive_parser.go # Directive parsing with language resolution
│   │   ├── directive_regex.go  # Regex patterns for directives
│   │   ├── parse_procedures.go # Procedure parsing (core logic)
│   │   ├── get_procedure_variations.go # Variation extraction
│   │   ├── rstspec.go        # Fetch and parse canonical rstspec.toml
│   │   └── yaml_steps_parser.go # Parse YAML steps files for code examples
│   └── snooty/               # Snooty.toml parsing utilities
│       ├── snooty.go         # Parse snooty.toml, find project config
│       └── snooty_test.go    # Snooty tests
├── testdata/                 # Test fixtures (auto-ignored by Go build)
│   ├── input-files/source/   # Test RST files
│   ├── expected-output/      # Expected extraction results
│   ├── compare/              # Compare command test data
│   ├── count-test-monorepo/  # Count command test data
│   └── testable-code-test/   # Testable code report test data
├── bin/                      # Build output directory
├── docs/                     # Additional documentation
│   └── PROCEDURE_PARSING.md  # Detailed procedure parsing logic
└── README.md                 # Comprehensive user documentation

Technology Stack

Language: Go
CLI Framework: spf13/cobra
Diff Library: aymanbagabas/go-udiff
YAML Parsing: gopkg.in/yaml.vX
TOML Parsing: github.com/BurntSushi/toml v1.5.0
Testing: Go standard library (testing package)

Refer to go.mod for the exact dependency versions.

Domain Knowledge: MongoDB Documentation

RST Directives Supported

Code Example Directives:

.. literalinclude:: - Transclude code from external files
.. code-block:: - Inline code blocks
.. io-code-block:: - Input/output code examples with .. input:: and .. output:: sub-directives

Procedure Directives:

.. procedure:: with .. step:: - Structured procedures
Ordered lists (numbered 1. or lettered a.) - Simple procedures
#. - Continuation marker (auto-numbered)
YAML steps files - Converted to procedures during build

Variation Mechanisms:

.. composable-tutorial:: with .. selected-content:: - Content variations by selection
.. tabs:: with .. tab:: and :tabid: - Tabbed content variations

Content Inclusion:

.. include:: - Include RST content from other files
.. toctree:: - Table of contents (navigation, not content inclusion)

Composables:

Defined in snooty.toml files at project/version root
Canonical definitions also exist in rstspec.toml in the snooty-parser repository
Used in .. composable-tutorial:: directives with :options: parameter
Enable context-specific documentation (e.g., different languages, deployment types)
Each composable has an ID, title, default, and list of options
The internal/rst module provides FetchRstspec() to retrieve canonical definitions

MongoDB Documentation Structure

Versioned Projects: content/{project}/{version}/source/

Versions: manual, current, upcoming, v8.0, v7.0, etc.

Non-versioned Projects: content/{project}/source/

Tested Code Examples: content/code-examples/tested/{language}/{product}/

Products: pymongo, mongosh, go/driver, go/atlas-sdk, javascript/driver, java/driver-sync, csharp/driver

Configuration

Monorepo Path Configuration

Some commands require a monorepo path (analyze composables, count tested-examples, count pages). The path can be configured in three ways, with the following priority (highest to lowest):

Command-line argument - Passed directly to the command
Environment variable - AUDIT_CLI_MONOREPO_PATH
Config file - .audit-cli.yaml in current directory or home directory

Config File Format (.audit-cli.yaml):

monorepo_path: /path/to/docs-monorepo

Config File Locations (searched in order):

Current directory: ./.audit-cli.yaml
Home directory: ~/.audit-cli.yaml

Implementation:

Config loading is handled by internal/config package
Commands use config.GetMonorepoPath(cmdLineArg) to resolve the path
Commands accept 0 or 1 arguments using cobra.MaximumNArgs(1)
If no path is configured, a helpful error message is displayed

Example Usage:

// In command RunE function
var cmdLineArg string
if len(args) > 0 {
    cmdLineArg = args[0]
}
monorepoPath, err := config.GetMonorepoPath(cmdLineArg)
if err != nil {
    return err
}
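
The priority order described in this section can be sketched as follows. This is a minimal illustration, not the actual body of config.GetMonorepoPath: the resolveMonorepoPath helper and its three string parameters are assumptions made for the example, and the real function reads the environment variable and config file itself.

```go
package main

import (
    "errors"
    "fmt"
)

// resolveMonorepoPath is a hypothetical helper that applies the documented
// priority: command-line argument, then environment variable, then config file.
func resolveMonorepoPath(cmdLineArg, envValue, configFileValue string) (string, error) {
    // Candidates are listed from highest to lowest priority.
    for _, candidate := range []string{cmdLineArg, envValue, configFileValue} {
        if candidate != "" {
            return candidate, nil
        }
    }
    return "", errors.New("monorepo path not configured: pass it as an argument, " +
        "set AUDIT_CLI_MONOREPO_PATH, or add monorepo_path to .audit-cli.yaml")
}

func main() {
    // In real code, envValue would come from os.Getenv("AUDIT_CLI_MONOREPO_PATH")
    // and configFileValue from the parsed .audit-cli.yaml.
    path, err := resolveMonorepoPath("", "/env/docs-monorepo", "/cfg/docs-monorepo")
    fmt.Println(path, err)
}
```

Passing the three sources in as plain strings keeps the precedence rule easy to test in isolation.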

File Path Resolution

File-based commands support flexible path resolution through config.ResolveFilePath(). This allows users to specify paths in three ways:

Absolute path - Used as-is
Relative to monorepo root - If monorepo is configured and path exists there
Relative to current directory - Fallback if not found in monorepo

Priority Order:

If path is absolute → return as-is (after verifying it exists)
If monorepo is configured and path exists relative to monorepo → use monorepo-relative path
Otherwise → resolve relative to current directory

Implementation:

File path resolution is handled by config.ResolveFilePath(pathArg) in internal/config package
Commands that take file paths should use this function in their RunE function
The function returns an absolute path or an error if the path doesn't exist

Example Usage:

// In command RunE function for file-based commands
RunE: func(cmd *cobra.Command, args []string) error {
    // Resolve file path (supports absolute, monorepo-relative, or cwd-relative)
    filePath, err := config.ResolveFilePath(args[0])
    if err != nil {
        return err
    }
    return runCommand(filePath, ...)
},
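
The three-step priority order above might look roughly like the sketch below. It is an assumption-laden illustration rather than the real config.ResolveFilePath: the resolveFilePath name, the explicit monorepoRoot and cwd parameters, and the injected exists function are all hypothetical, chosen so the logic can run without touching the disk.

```go
package main

import (
    "fmt"
    "path/filepath"
)

// resolveFilePath is a hypothetical sketch of the documented priority order.
// Real code would typically check existence with os.Stat instead of a callback.
func resolveFilePath(pathArg, monorepoRoot, cwd string, exists func(string) bool) (string, error) {
    // 1. Absolute paths are returned as-is once verified.
    if filepath.IsAbs(pathArg) {
        if !exists(pathArg) {
            return "", fmt.Errorf("path does not exist: %s", pathArg)
        }
        return pathArg, nil
    }
    // 2. Prefer a match relative to the monorepo root, if one is configured.
    if monorepoRoot != "" {
        if candidate := filepath.Join(monorepoRoot, pathArg); exists(candidate) {
            return candidate, nil
        }
    }
    // 3. Fall back to the current directory.
    candidate := filepath.Join(cwd, pathArg)
    if !exists(candidate) {
        return "", fmt.Errorf("path does not exist: %s", pathArg)
    }
    return candidate, nil
}

func main() {
    onDisk := map[string]bool{"/repo/content/manual/source/index.rst": true}
    exists := func(p string) bool { return onDisk[p] }

    resolved, err := resolveFilePath("content/manual/source/index.rst", "/repo", "/work", exists)
    fmt.Println(resolved, err)
}
```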

Commands Using File Path Resolution:

extract code-examples
extract procedures
analyze includes
analyze usage
search find-string
compare file-contents

Building and Running

Build from Source

cd bin
go build ../
./audit-cli --help

Run Without Building

go run main.go [command] [flags]

Check Version

./audit-cli --version
# Output: audit-cli version 0.1.0

Run Tests

# All tests
go test ./...

# Specific package
go test ./commands/extract/code-examples -v

# Specific test
go test ./commands/extract/code-examples -run TestLiteralIncludeDirective -v

# With coverage
go test ./... -cover

Versioning

This project follows Semantic Versioning (SemVer):

MAJOR version (X.0.0): Incompatible API changes or breaking changes to command behavior
MINOR version (0.X.0): New functionality added in a backward-compatible manner
PATCH version (0.0.X): Backward-compatible bug fixes

Note: While in 0.x.x versions, breaking changes may occur in minor releases. Version 1.0.0 will signal a stable, production-ready release.

When to Increment Versions

MAJOR (e.g., 0.5.0 → 1.0.0):
- Breaking changes to command syntax or flags
- Removal of commands or features
- Changes to output format that break existing scripts
- First stable release (0.x.x → 1.0.0)

MINOR (e.g., 0.1.0 → 0.2.0):
- New commands or subcommands
- New flags or options
- New RST directive support
- New output formats (when existing formats remain unchanged)
- Significant new features

PATCH (e.g., 0.1.0 → 0.1.1):
- Bug fixes
- Performance improvements
- Documentation updates
- Internal refactoring with no user-facing changes

Releasing a New Version

When releasing a new version, follow these steps:

Update the version constant in main.go:

const version = "0.2.0"  // Update this line

Update CHANGELOG.md following the Keep a Changelog format:

## [0.2.0] - YYYY-MM-DD

### Added
- New feature descriptions

### Changed
- Modified behavior descriptions

### Fixed
- Bug fix descriptions

### Removed
- Removed feature descriptions (if any)

Test the version output:

go run main.go --version
# Should display: audit-cli version 0.2.0

Commit the changes:

git add main.go CHANGELOG.md
git commit -m "Release version 0.2.0"

Tag the release (optional but recommended):
```
git tag v0.2.0
git push origin v0.2.0
```

CHANGELOG Format

The CHANGELOG.md follows the Keep a Changelog format with these sections:

Added: New features, commands, or capabilities
Changed: Changes to existing functionality
Deprecated: Features that will be removed in future versions
Removed: Features that have been removed
Fixed: Bug fixes
Security: Security-related changes

Each version entry should include:

Version number in square brackets: [0.2.0]
Release date in ISO format: YYYY-MM-DD
Organized sections with bullet points describing changes
User-facing language (avoid technical jargon when possible)

Development Patterns

Command Structure

Parent-Subcommand Pattern: All commands follow a two-level hierarchy:

Parent command (e.g., extract, analyze) - defined in commands/{parent}/{parent}.go
Subcommand (e.g., code-examples, procedures) - defined in commands/{parent}/{subcommand}/{subcommand}.go

File Organization per Subcommand:

commands/{parent}/{subcommand}/
├── {subcommand}.go       # Command definition and RunE function
├── {subcommand}_test.go  # Tests
├── types.go              # Type definitions
├── parser.go or analyzer.go  # Core logic
├── output.go or report.go    # Output formatting
└── (other domain-specific files)

Command Registration: Parent commands register subcommands in their New{Parent}Command() function:

func NewExtractCommand() *cobra.Command {
    cmd := &cobra.Command{Use: "extract", Short: "..."}
    cmd.AddCommand(codeexamples.NewCodeExamplesCommand())
    cmd.AddCommand(procedures.NewProceduresCommand())
    return cmd
}

Error Handling

Use fmt.Errorf() with the %w verb to wrap errors with context
Return errors from functions; handle at command level
Print errors to stderr using fmt.Fprintf(os.Stderr, ...)
Exit with non-zero status on errors
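
As a rough illustration of these rules, the sketch below wraps a lower-level error with context, returns it up the call chain, and prints it to stderr only at the top level. The errNoSource sentinel and the findSource and runExtract functions are hypothetical names invented for the example, not code from this repository.

```go
package main

import (
    "errors"
    "fmt"
    "os"
)

// errNoSource is a hypothetical sentinel error used only for this illustration.
var errNoSource = errors.New("source directory not found")

// findSource stands in for any lower-level function that can fail.
func findSource(path string) error {
    if path == "" {
        return errNoSource
    }
    return nil
}

// runExtract adds context with %w and returns the error instead of printing it.
func runExtract(path string) error {
    if err := findSource(path); err != nil {
        return fmt.Errorf("extract %q: %w", path, err)
    }
    return nil
}

func main() {
    // Only the top level writes to stderr and chooses the exit status.
    if err := runExtract("content/manual/source"); err != nil {
        fmt.Fprintf(os.Stderr, "Error: %v\n", err)
        os.Exit(1)
    }
    fmt.Println("ok")
}
```

Because the error is wrapped with %w, callers can still test for the underlying cause with errors.Is.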

Testing Conventions

Test Data Location: testdata/ directory (auto-ignored by Go build)

Input files: testdata/input-files/source/
Expected output: testdata/expected-output/
Relative path from test: filepath.Join("..", "..", "..", "testdata")

Test Patterns:

Table-driven tests for multiple scenarios
Temporary directories for output: os.MkdirTemp("", "test-*")
Clean up with defer os.RemoveAll(tempDir)
Byte-for-byte content comparison for extracted files

Deterministic Testing: Critical for procedure parsing

All map iterations must use sorted keys (Go randomizes map iteration order)
Content hashing must use sorted keys
Run tests multiple times to verify determinism
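
A minimal sketch of the sorted-key rule, assuming a hypothetical hashSelections helper and a map of selection names to values (neither is taken from the actual parser):

```go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "sort"
)

// hashSelections is a hypothetical helper that hashes a map by visiting its
// keys in sorted order, so the digest does not depend on map iteration order.
func hashSelections(selections map[string]string) string {
    keys := make([]string, 0, len(selections))
    for k := range selections {
        keys = append(keys, k)
    }
    sort.Strings(keys)

    h := sha256.New()
    for _, k := range keys {
        // A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        fmt.Fprintf(h, "%s\x00%s\x00", k, selections[k])
    }
    return hex.EncodeToString(h.Sum(nil))
}

func main() {
    sel := map[string]string{"language": "python", "deployment": "atlas"}
    first := hashSelections(sel)
    // Repeating the call mimics running a test several times.
    for i := 0; i < 100; i++ {
        if hashSelections(sel) != first {
            panic("hash is not deterministic")
        }
    }
    fmt.Println("stable hash prefix:", first[:12])
}
```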

Code Patterns

Path Resolution:

Use filepath.Join() for cross-platform paths
Use filepath.Abs() to get absolute paths
Use internal/projectinfo for MongoDB-specific path resolution

RST Parsing:

Use internal/rst package for directive parsing
Use regex patterns from internal/rst/directive_regex.go
Handle include directives with ParseFileWithIncludes()

Output Formatting:

Separate output logic into output.go or report.go
Support multiple output formats (text, JSON) where applicable
Use consistent formatting (headers with = separators, indentation)
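
One way to keep formatting separate from core logic is a single function that switches on the requested format. The sketch below is illustrative only: the report type, its fields, and the formatReport function are assumptions, and the exact text layout used by existing commands may differ.

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// report is a hypothetical result type produced by a command's core logic.
type report struct {
    Title string `json:"title"`
    Total int    `json:"total"`
}

// formatReport renders the same data as text or JSON.
func formatReport(r report, format string) (string, error) {
    switch format {
    case "json":
        raw, err := json.MarshalIndent(r, "", "  ")
        if err != nil {
            return "", fmt.Errorf("failed to marshal report: %w", err)
        }
        return string(raw) + "\n", nil
    case "text":
        var b strings.Builder
        // Header underlined with "=" separators, followed by indented details.
        b.WriteString(r.Title + "\n")
        b.WriteString(strings.Repeat("=", len(r.Title)) + "\n")
        fmt.Fprintf(&b, "  Total: %d\n", r.Total)
        return b.String(), nil
    default:
        return "", fmt.Errorf("unsupported output format: %q", format)
    }
}

func main() {
    out, err := formatReport(report{Title: "Summary", Total: 3}, "text")
    if err != nil {
        panic(err)
    }
    fmt.Print(out)
}
```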

Network Request Caching

All network requests to external APIs should implement caching to avoid repeated requests and support offline usage. The caching pattern is implemented in internal/config/url_mapping.go (for Snooty Data API) and internal/rst/rstspec.go (for rstspec.toml).

Cache Location: ~/.audit-cli/ directory

URL mapping cache: ~/.audit-cli/url-mapping-cache.json
Rstspec cache: ~/.audit-cli/rstspec-cache.json

Cache TTL: 24 hours (configurable per cache type)

Implementation Pattern:

Define cache constants:

const CacheTTL = 24 * time.Hour
const CacheDir = ".audit-cli"
const CacheFileName = "my-cache.json"

Create cache struct with timestamp and data:

type MyCache struct {
    Timestamp time.Time `json:"timestamp"`
    Data      MyData    `json:"data"`
}

Implement cache functions:

// getCachePath returns the path to the cache file
func getCachePath() (string, error) {
    homeDir, err := os.UserHomeDir()
    if err != nil {
        return "", fmt.Errorf("failed to get home directory: %w", err)
    }
    return filepath.Join(homeDir, CacheDir, CacheFileName), nil
}

// loadCache loads from cache, returns error if missing or expired
func loadCache() (*MyData, error) {
    // Read file, unmarshal JSON, check TTL
}

// saveCache saves data to cache with current timestamp
func saveCache(data *MyData) error {
    // Create directory if needed, marshal JSON, write file
}

// fetchFromAPI fetches fresh data from the network
func fetchFromAPI() (*MyData, error) {
    // HTTP request, parse response
}

Main fetch function with fallback logic:

func FetchData() (*MyData, error) {
    // Try cache first
    data, err := loadCache()
    if err == nil {
        return data, nil
    }

    // Cache miss or expired, try network
    data, fetchErr := fetchFromAPI()
    if fetchErr != nil {
        // Network failed - try expired cache as offline fallback
        // (read cache file without TTL check)
        if expiredData := loadExpiredCache(); expiredData != nil {
            fmt.Fprintf(os.Stderr, "Warning: Using expired cache\n")
            return expiredData, nil
        }
        return nil, fetchErr
    }

    // Save to cache for next time
    if saveErr := saveCache(data); saveErr != nil {
        fmt.Fprintf(os.Stderr, "Warning: Could not save cache: %v\n", saveErr)
    }

    return data, nil
}

Key Behaviors:

Cache is stored in user's home directory for persistence across sessions
Expired cache is used as fallback when network is unavailable (offline support)
Cache save failures are logged as warnings but don't fail the operation
JSON format for easy debugging and human readability

When Adding New Network Calls:

Follow the pattern above
Add cache file name constant
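Implement the four cache functions (getCachePath, loadCache, saveCache, fetchFromAPI), plus the expired-cache reader used for the offline fallback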
Use the same ~/.audit-cli/ directory for consistency
Consider appropriate TTL (24 hours is default, adjust if data changes more/less frequently)
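
The loadCache and saveCache stubs in the pattern above could be filled in along these lines. This is a sketch under stated assumptions, not the code in url_mapping.go or rstspec.go: the saveCacheTo and loadCacheFrom names, the explicit path and ttl parameters, and the Items field on MyData are hypothetical.

```go
package main

import (
    "encoding/json"
    "errors"
    "fmt"
    "os"
    "path/filepath"
    "time"
)

// MyData and MyCache mirror the placeholder types in the pattern above.
type MyData struct {
    Items []string `json:"items"` // hypothetical payload field
}

type MyCache struct {
    Timestamp time.Time `json:"timestamp"`
    Data      MyData    `json:"data"`
}

// saveCacheTo writes data with the current timestamp, creating the directory if needed.
func saveCacheTo(path string, data *MyData) error {
    if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
        return fmt.Errorf("failed to create cache directory: %w", err)
    }
    raw, err := json.MarshalIndent(MyCache{Timestamp: time.Now(), Data: *data}, "", "  ")
    if err != nil {
        return fmt.Errorf("failed to marshal cache: %w", err)
    }
    return os.WriteFile(path, raw, 0o644)
}

// loadCacheFrom reads the cache and rejects entries older than ttl.
// A ttl of zero or less skips the age check, which is one way to
// implement the expired-cache offline fallback.
func loadCacheFrom(path string, ttl time.Duration) (*MyData, error) {
    raw, err := os.ReadFile(path)
    if err != nil {
        return nil, fmt.Errorf("failed to read cache: %w", err)
    }
    var c MyCache
    if err := json.Unmarshal(raw, &c); err != nil {
        return nil, fmt.Errorf("failed to parse cache: %w", err)
    }
    if ttl > 0 && time.Since(c.Timestamp) > ttl {
        return nil, errors.New("cache expired")
    }
    return &c.Data, nil
}

func main() {
    dir, err := os.MkdirTemp("", "cache-demo-*")
    if err != nil {
        panic(err)
    }
    defer os.RemoveAll(dir)

    path := filepath.Join(dir, ".audit-cli", "my-cache.json")
    if err := saveCacheTo(path, &MyData{Items: []string{"a", "b"}}); err != nil {
        panic(err)
    }
    data, err := loadCacheFrom(path, 24*time.Hour)
    if err != nil {
        panic(err)
    }
    fmt.Println(data.Items)
}
```

Taking the path and TTL as parameters lets tests point the cache at a temporary directory instead of the user's home directory.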

Key Design Decisions

RST Parsing Strategy

Incremental, Opportunistic Parsing: RST parsing capabilities are added incrementally as needed by various commands, rather than using a complete AST-based parser.

Rationale:

MongoDB documentation uses many custom directives not supported by standard reStructuredText parsing libraries
A complete parser converting RST to an AST would require significant operational overhead that is not needed at this time
Targeted parsing for specific directives is more maintainable and performant for this use case

Critical Rule: All new RST parsing functionality MUST be added to the internal/rst module, NOT to individual command modules. This ensures:

Parsing capabilities can be reused across commands
Functionality can be expanded incrementally
Parsing logic remains centralized and maintainable

Implementation Pattern:

Add regex patterns to internal/rst/directive_regex.go
Add directive types and parsing logic to internal/rst/directive_parser.go
Add specialized parsing functions (e.g., parse_procedures.go) as separate files in internal/rst
Commands import and use functions from internal/rst package

Procedure Parsing

Uniqueness: Procedures are grouped by heading + content hash

Same heading, different content → separate procedures
Same content, multiple selections → one procedure with multiple appearances

Analysis vs. Extraction:

Analysis: Groups procedures by heading, shows variations
Extraction: Creates one file per unique procedure (by content hash)

Determinism: All operations must produce consistent results

Sort map keys before iterating
Use sorted keys for hashing
Critical for testing and CI/CD

See docs/PROCEDURE_PARSING.md for comprehensive details.

Include Directive Handling

Context-Aware Expansion:

No composable tutorial → Expand all includes globally
Composable tutorial with selected-content in main file → Expand includes within blocks
Composable tutorial with includes in steps → Expand to detect selected-content blocks

Version Comparison

Auto-Discovery: Automatically detects product directory and available versions from file path

Pattern: /path/to/{product}/{version}/source/...
Discovers all sibling version directories unless --versions specified
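
The path pattern above could be split as in the following sketch. The splitVersionedPath function is a hypothetical name, and the sketch assumes the first path segment named "source" marks the version boundary. It does not handle non-versioned projects, where the segment before "source" is the project itself.

```go
package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// splitVersionedPath is a hypothetical helper that reads the product directory
// and version from a path shaped like /path/to/{product}/{version}/source/...
func splitVersionedPath(filePath string) (productDir, version string, err error) {
    parts := strings.Split(filepath.ToSlash(filePath), "/")
    // Start at index 2 so that both a product and a version segment precede "source".
    for i := 2; i < len(parts); i++ {
        if parts[i] == "source" {
            return strings.Join(parts[:i-1], "/"), parts[i-1], nil
        }
    }
    return "", "", fmt.Errorf("no {version}/source segment in %q", filePath)
}

func main() {
    dir, version, err := splitVersionedPath("/repo/content/manual/v8.0/source/tutorial/install.rst")
    if err != nil {
        panic(err)
    }
    fmt.Println(dir, version)
}
```

Sibling versions could then be found by listing the subdirectories of the returned product directory, for example with os.ReadDir.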

Adding New Commands

1. Create Subcommand Directory

mkdir -p commands/{parent}/{subcommand}

2. Create Command File

Create commands/{parent}/{subcommand}/{subcommand}.go:

package subcommand

import (
    "github.com/spf13/cobra"
)

func NewSubcommandCommand() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "subcommand",
        Short: "Brief description",
        Long:  `Detailed description`,
        Args:  cobra.ExactArgs(1), // or cobra.NoArgs, etc.
        RunE:  runSubcommand,
    }

    // Add flags
    cmd.Flags().StringP("output", "o", "./output", "Output directory")

    return cmd
}

func runSubcommand(cmd *cobra.Command, args []string) error {
    // Implementation
    return nil
}

3. Create Supporting Files

types.go - Type definitions
{subcommand}_test.go - Tests
Domain-specific files (parser, analyzer, output, etc.)

4. Register with Parent Command

In commands/{parent}/{parent}.go:

import "github.com/grove-platform/audit-cli/commands/{parent}/{subcommand}"

func New{Parent}Command() *cobra.Command {
    cmd := &cobra.Command{...}
    cmd.AddCommand(subcommand.NewSubcommandCommand())
    return cmd
}

5. Add Tests

Create test fixtures in testdata/ and write tests following existing patterns.

Common Tasks

Adding a New RST Directive

Add regex pattern to internal/rst/directive_regex.go:

var MyDirectiveRegex = regexp.MustCompile(`^\.\.\s+my-directive::\s*(.*)$`)

Add directive type to internal/rst/directive_parser.go:

const (
    MyDirective DirectiveType = "my-directive"
)

Add parsing logic in ParseDirectives() function
Add tests in appropriate test file
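
To show how such a pattern behaves before wiring it into ParseDirectives(), here is a small standalone sketch. The matchMyDirective helper is hypothetical, and my-directive is the same placeholder name used in the steps above.

```go
package main

import (
    "fmt"
    "regexp"
)

// MyDirectiveRegex is the placeholder pattern from the step above.
var MyDirectiveRegex = regexp.MustCompile(`^\.\.\s+my-directive::\s*(.*)$`)

// matchMyDirective is a hypothetical helper: it reports whether a line opens
// the directive and returns the argument that follows the double colon.
func matchMyDirective(line string) (argument string, ok bool) {
    m := MyDirectiveRegex.FindStringSubmatch(line)
    if m == nil {
        return "", false
    }
    return m[1], true
}

func main() {
    lines := []string{
        ".. my-directive:: /includes/example.rst",
        ".. code-block:: python",
    }
    for _, line := range lines {
        arg, ok := matchMyDirective(line)
        fmt.Printf("%q -> %q %v\n", line, arg, ok)
    }
}
```

Because the pattern is anchored with ^, an indented directive does not match unless the line is trimmed first, which matters for directives nested inside other directives.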

Updating Expected Test Output

When changing parsing logic:

# Regenerate expected output
./bin/audit-cli extract code-examples testdata/input-files/source/literalinclude-test.rst \
  -o testdata/expected-output

# Verify changes are correct before committing
git diff testdata/expected-output/

Adding Support for a New Product

For count tested-examples command:

Add product to valid products list in commands/count/tested-examples/counter.go
Update README.md with new product
Add test data to testdata/count-test-monorepo/content/code-examples/tested/
Update tests in tested_examples_test.go

Important Notes for LLMs

When Making Changes

Always check existing patterns - Look at similar commands/functions before implementing new ones
Maintain consistency - Follow the established file organization and naming conventions
Update tests - All changes should include corresponding test updates
Check determinism - For procedure parsing, verify output is consistent across runs
Update documentation - Keep README.md, AGENTS.md, and PROCEDURE_PARSING.md in sync with code changes

Common Pitfalls

Map iteration order - Always sort map keys before iterating (especially in procedure parsing)
Path separators - Use filepath.Join() instead of string concatenation
Relative paths in tests - Remember tests are 3 levels deep: commands/{parent}/{subcommand}/
Include directive resolution - Use internal/projectinfo for MongoDB-specific path conventions
Testdata directory - This is a special Go convention; files here are ignored during builds

Testing Requirements

Unit tests for all new functions
Integration tests for command execution
Determinism tests for procedure parsing
Table-driven tests for multiple scenarios
Test coverage should not decrease

Documentation Requirements

Package-level comments for all packages
Function comments for exported functions
README.md updates for user-facing changes
PROCEDURE_PARSING.md updates for procedure parsing logic changes
Inline comments for complex logic

Resources

README.md: Comprehensive user documentation and development guide
docs/PROCEDURE_PARSING.md: Detailed procedure parsing business logic
Go Cobra Documentation: https://github.com/spf13/cobra
MongoDB RST Conventions: See examples in testdata/input-files/source/

Quick Reference

File Naming Conventions

Commands: {subcommand}.go
Tests: {subcommand}_test.go
Types: types.go
Core logic: parser.go, analyzer.go, counter.go, etc.
Output: output.go, report.go

Import Path

import "github.com/grove-platform/audit-cli/{package}"

Running Specific Tests

# By package
go test ./internal/rst -v

# By function name
go test ./commands/extract/procedures -run TestParseFileDeterministic -v

# With race detection
go test ./... -race

Build Tags

None currently used in this project.

Environment Variables