This document provides essential context for LLMs performing development tasks in the audit-cli repository.
Purpose: A Go CLI tool for auditing and analyzing MongoDB's reStructuredText (RST) documentation.
Key Capabilities:
- Extract code examples and procedures from RST files
- Search documentation for patterns
- Analyze file dependencies and relationships
- Analyze composable definitions and usage across projects
- Compare files across documentation versions
- Count documentation pages and tested code examples
- Generate reports on testable code examples from analytics data
Target Users: MongoDB technical writers performing maintenance, scoping work, and reporting.
audit-cli/
├── main.go # CLI entry point using cobra
├── go.mod # Module: github.com/grove-platform/audit-cli
├── commands/ # Command implementations (parent + subcommands)
│ ├── extract/ # Extract content from RST files
│ │ ├── extract.go # Parent command
│ │ ├── code-examples/ # Extract code examples subcommand
│ │ └── procedures/ # Extract procedures subcommand
│ ├── search/ # Search through files
│ │ └── find-string/ # Find string subcommand
│ ├── analyze/ # Analyze RST structures
│ │ ├── includes/ # Analyze include relationships
│ │ ├── usage/ # Find file usages
│ │ ├── procedures/ # Analyze procedure variations
│ │ └── composables/ # Analyze composable definitions and usage
│ ├── compare/ # Compare files across versions
│ │ └── file-contents/ # Compare file contents
│ ├── count/ # Count documentation content
│ │ ├── tested-examples/ # Count tested code examples
│ │ └── pages/ # Count documentation pages
│ └── report/ # Generate reports from documentation data
│ └── testable-code/ # Analyze testable code examples from analytics
├── internal/ # Internal packages (not importable externally)
│ ├── config/ # Configuration management
│ │ ├── config.go # Config loading from file/env/args
│ │ ├── config_test.go # Config tests
│ │ ├── url_mapping.go # URL-to-source-file mapping via Snooty Data API
│ │ └── url_mapping_test.go # URL mapping tests
│ ├── language/ # Programming language utilities
│ │ ├── language.go # Language normalization, extensions, products
│ │ └── language_test.go # Language tests
│ ├── projectinfo/ # MongoDB docs project structure utilities
│ │ ├── pathresolver.go # Path resolution
│ │ ├── products.go # Content directory to product mapping
│ │ ├── source_finder.go # Source directory detection
│ │ └── version_resolver.go # Version path resolution
│ ├── rst/ # RST parsing utilities
│ │ ├── parser.go # Generic parsing with includes
│ │ ├── directive_parser.go # Directive parsing with language resolution
│ │ ├── directive_regex.go # Regex patterns for directives
│ │ ├── parse_procedures.go # Procedure parsing (core logic)
│ │ ├── get_procedure_variations.go # Variation extraction
│ │ ├── rstspec.go # Fetch and parse canonical rstspec.toml
│ │ └── yaml_steps_parser.go # Parse YAML steps files for code examples
│ └── snooty/ # Snooty.toml parsing utilities
│ ├── snooty.go # Parse snooty.toml, find project config
│ └── snooty_test.go # Snooty tests
├── testdata/ # Test fixtures (auto-ignored by Go build)
│ ├── input-files/source/ # Test RST files
│ ├── expected-output/ # Expected extraction results
│ ├── compare/ # Compare command test data
│ ├── count-test-monorepo/ # Count command test data
│ └── testable-code-test/ # Testable code report test data
├── bin/ # Build output directory
├── docs/ # Additional documentation
│ └── PROCEDURE_PARSING.md # Detailed procedure parsing logic
└── README.md # Comprehensive user documentation
- Language: Go
- CLI Framework: spf13/cobra
- Diff Library: aymanbagabas/go-udiff
- YAML Parsing: gopkg.in/yaml.vX
- TOML Parsing: github.com/BurntSushi/toml v1.5.0
- Testing: Go standard library (testing package)
Refer to the go.mod for version info.
Code Example Directives:
- .. literalinclude:: - Transclude code from external files
- .. code-block:: - Inline code blocks
- .. io-code-block:: - Input/output code examples with .. input:: and .. output:: sub-directives
Procedure Directives:
- .. procedure:: with .. step:: - Structured procedures
- Ordered lists (numbered 1. or lettered a.) - Simple procedures
- #. - Continuation marker (auto-numbered)
- YAML steps files - Converted to procedures during build
Variation Mechanisms:
- .. composable-tutorial:: with .. selected-content:: - Content variations by selection
- .. tabs:: with .. tab:: and :tabid: - Tabbed content variations
Content Inclusion:
- .. include:: - Include RST content from other files
- .. toctree:: - Table of contents (navigation, not content inclusion)
Composables:
- Defined in snooty.toml files at project/version root
- Canonical definitions also exist in rstspec.toml in the snooty-parser repository
- Used in .. composable-tutorial:: directives with the :options: parameter
- Enable context-specific documentation (e.g., different languages, deployment types)
- Each composable has an ID, title, default, and list of options
- The internal/rst module provides FetchRstspec() to retrieve canonical definitions
Versioned Projects: content/{project}/{version}/source/
- Versions: manual, current, upcoming, v8.0, v7.0, etc.
Non-versioned Projects: content/{project}/source/
Tested Code Examples: content/code-examples/tested/{language}/{product}/
- Products: pymongo, mongosh, go/driver, go/atlas-sdk, javascript/driver, java/driver-sync, csharp/driver
Some commands require a monorepo path (analyze composables, count tested-examples, count pages). The path can be configured in three ways, with the following priority (highest to lowest):
- Command-line argument - Passed directly to the command
- Environment variable - AUDIT_CLI_MONOREPO_PATH
- Config file - .audit-cli.yaml in current directory or home directory
Config File Format (.audit-cli.yaml):
monorepo_path: /path/to/docs-monorepo
Config File Locations (searched in order):
- Current directory: ./.audit-cli.yaml
- Home directory: ~/.audit-cli.yaml
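The priority order described above can be sketched as a small pure function. This is an illustration only: resolveMonorepoPath and its three string parameters are hypothetical stand-ins, and the real config.GetMonorepoPath may read the environment and config file differently.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveMonorepoPath is a hypothetical stand-in for config.GetMonorepoPath.
// It applies the documented priority: command-line argument first, then the
// environment variable value, then the value read from the config file.
func resolveMonorepoPath(cmdLineArg, envValue, configFileValue string) (string, error) {
	switch {
	case cmdLineArg != "":
		return cmdLineArg, nil
	case envValue != "":
		return envValue, nil
	case configFileValue != "":
		return configFileValue, nil
	}
	return "", errors.New("no monorepo path configured: pass a path argument, " +
		"set AUDIT_CLI_MONOREPO_PATH, or add monorepo_path to .audit-cli.yaml")
}

func main() {
	// The argument wins even when the other two sources are set.
	path, err := resolveMonorepoPath("/from/arg", "/from/env", "/from/file")
	fmt.Println(path, err)
}
```

Passing the three sources as plain strings keeps the priority logic testable without touching the real environment or home directory.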
Implementation:
- Config loading is handled by the internal/config package
- Commands use config.GetMonorepoPath(cmdLineArg) to resolve the path
- Commands accept 0 or 1 arguments using cobra.MaximumNArgs(1)
- If no path is configured, a helpful error message is displayed
Example Usage:
// In command RunE function
var cmdLineArg string
if len(args) > 0 {
	cmdLineArg = args[0]
}
monorepoPath, err := config.GetMonorepoPath(cmdLineArg)
if err != nil {
	return err
}
File-based commands support flexible path resolution through config.ResolveFilePath(). This allows users to specify paths in three ways:
- Absolute path - Used as-is
- Relative to monorepo root - If monorepo is configured and path exists there
- Relative to current directory - Fallback if not found in monorepo
Priority Order:
- If path is absolute → return as-is (after verifying it exists)
- If monorepo is configured and path exists relative to monorepo → use monorepo-relative path
- Otherwise → resolve relative to current directory
Implementation:
- File path resolution is handled by config.ResolveFilePath(pathArg) in the internal/config package
- Commands that take file paths should use this function in their RunE function
- The function returns an absolute path or an error if the path doesn't exist
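The three-step priority order above might look roughly like the following. This is a sketch under stated assumptions, not the actual config.ResolveFilePath: the function name, the explicit monorepoRoot and cwd parameters, and the injected exists callback are all hypothetical, chosen so the logic can be exercised without a real filesystem.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveFilePath is a hypothetical sketch of the documented priority order.
// exists reports whether a path exists; it is injected to keep this testable.
func resolveFilePath(pathArg, monorepoRoot, cwd string, exists func(string) bool) (string, error) {
	// 1. Absolute paths are returned as-is once they are known to exist.
	if filepath.IsAbs(pathArg) {
		if !exists(pathArg) {
			return "", fmt.Errorf("path does not exist: %s", pathArg)
		}
		return pathArg, nil
	}
	// 2. Prefer the monorepo-relative path when a monorepo is configured.
	if monorepoRoot != "" {
		candidate := filepath.Join(monorepoRoot, pathArg)
		if exists(candidate) {
			return candidate, nil
		}
	}
	// 3. Otherwise fall back to the current directory.
	candidate := filepath.Join(cwd, pathArg)
	if !exists(candidate) {
		return "", fmt.Errorf("path does not exist: %s", pathArg)
	}
	return candidate, nil
}

func main() {
	known := map[string]bool{"/repo/content/manual/source/index.rst": true}
	exists := func(p string) bool { return known[p] }
	path, err := resolveFilePath("content/manual/source/index.rst", "/repo", "/home/user", exists)
	fmt.Println(path, err)
}
```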
Example Usage:
// In command RunE function for file-based commands
RunE: func(cmd *cobra.Command, args []string) error {
	// Resolve file path (supports absolute, monorepo-relative, or cwd-relative)
	filePath, err := config.ResolveFilePath(args[0])
	if err != nil {
		return err
	}
	return runCommand(filePath, ...)
}
Commands Using File Path Resolution:
- extract code-examples
- extract procedures
- analyze includes
- analyze usage
- search find-string
- compare file-contents
cd bin
go build ../
./audit-cli --help
go run main.go [command] [flags]
./audit-cli --version
# Output: audit-cli version 0.1.0
# All tests
go test ./...
# Specific package
go test ./commands/extract/code-examples -v
# Specific test
go test ./commands/extract/code-examples -run TestLiteralIncludeDirective -v
# With coverage
go test ./... -cover
This project follows Semantic Versioning (SemVer):
- MAJOR version (X.0.0): Incompatible API changes or breaking changes to command behavior
- MINOR version (0.X.0): New functionality added in a backward-compatible manner
- PATCH version (0.0.X): Backward-compatible bug fixes
Note: While in 0.x.x versions, breaking changes may occur in minor releases. Version 1.0.0 will signal a stable, production-ready release.
- MAJOR (e.g., 0.5.0 → 1.0.0):
  - Breaking changes to command syntax or flags
  - Removal of commands or features
  - Changes to output format that break existing scripts
  - First stable release (0.x.x → 1.0.0)
- MINOR (e.g., 0.1.0 → 0.2.0):
  - New commands or subcommands
  - New flags or options
  - New RST directive support
  - New output formats (when existing formats remain unchanged)
  - Significant new features
- PATCH (e.g., 0.1.0 → 0.1.1):
  - Bug fixes
  - Performance improvements
  - Documentation updates
  - Internal refactoring with no user-facing changes
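To make the three bump types concrete, here is a minimal sketch of the arithmetic. bumpVersion is a hypothetical helper written for illustration; the project itself only stores the version as a constant in main.go and is not known to ship such a function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bumpVersion is a hypothetical helper; kind is "major", "minor", or "patch".
// Lower-order components reset to zero when a higher-order one is bumped.
func bumpVersion(version, kind string) (string, error) {
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", version)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return "", fmt.Errorf("invalid version component %q", p)
		}
		nums[i] = n
	}
	switch kind {
	case "major":
		nums[0], nums[1], nums[2] = nums[0]+1, 0, 0
	case "minor":
		nums[1], nums[2] = nums[1]+1, 0
	case "patch":
		nums[2]++
	default:
		return "", fmt.Errorf("unknown bump kind %q", kind)
	}
	return fmt.Sprintf("%d.%d.%d", nums[0], nums[1], nums[2]), nil
}

func main() {
	for _, kind := range []string{"major", "minor", "patch"} {
		next, err := bumpVersion("0.5.0", kind)
		fmt.Println(kind, next, err)
	}
}
```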
When releasing a new version, follow these steps:
- Update the version constant in main.go:
  const version = "0.2.0" // Update this line
- Update CHANGELOG.md following the Keep a Changelog format:
  ## [0.2.0] - YYYY-MM-DD
  ### Added
  - New feature descriptions
  ### Changed
  - Modified behavior descriptions
  ### Fixed
  - Bug fix descriptions
  ### Removed
  - Removed feature descriptions (if any)
- Test the version output:
  go run main.go --version
  # Should display: audit-cli version 0.2.0
- Commit the changes:
  git add main.go CHANGELOG.md
  git commit -m "Release version 0.2.0"
- Tag the release (optional but recommended):
  git tag v0.2.0
  git push origin v0.2.0
The CHANGELOG.md follows the Keep a Changelog format with these sections:
- Added: New features, commands, or capabilities
- Changed: Changes to existing functionality
- Deprecated: Features that will be removed in future versions
- Removed: Features that have been removed
- Fixed: Bug fixes
- Security: Security-related changes
Each version entry should include:
- Version number in square brackets: [0.2.0]
- Release date in ISO format: YYYY-MM-DD
- Organized sections with bullet points describing changes
- User-facing language (avoid technical jargon when possible)
Parent-Subcommand Pattern: All commands follow a two-level hierarchy:
- Parent command (e.g., extract, analyze) - defined in commands/{parent}/{parent}.go
- Subcommand (e.g., code-examples, procedures) - defined in commands/{parent}/{subcommand}/{subcommand}.go
File Organization per Subcommand:
commands/{parent}/{subcommand}/
├── {subcommand}.go # Command definition and RunE function
├── {subcommand}_test.go # Tests
├── types.go # Type definitions
├── parser.go or analyzer.go # Core logic
├── output.go or report.go # Output formatting
└── (other domain-specific files)
Command Registration: Parent commands register subcommands in their New{Parent}Command() function:
func NewExtractCommand() *cobra.Command {
	cmd := &cobra.Command{Use: "extract", Short: "..."}
	cmd.AddCommand(codeexamples.NewCodeExamplesCommand())
	cmd.AddCommand(procedures.NewProceduresCommand())
	return cmd
}
- Use fmt.Errorf() for error wrapping with context
- Return errors from functions; handle at command level
- Print errors to stderr using fmt.Fprintf(os.Stderr, ...)
- Exit with non-zero status on errors
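A minimal sketch of how these four rules fit together, using only the standard library. The names errNoInput, loadInput, and run are hypothetical and exist only for this illustration; in the real tool, cobra's RunE plays the role of run.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errNoInput is a hypothetical sentinel error used only in this sketch.
var errNoInput = errors.New("no input file given")

// loadInput returns errors to its caller instead of printing or exiting.
func loadInput(path string) (string, error) {
	if path == "" {
		return "", errNoInput
	}
	return "contents of " + path, nil
}

// run wraps lower-level errors with context. The %w verb keeps the original
// error available to errors.Is and errors.As further up the call chain.
func run(path string) error {
	if _, err := loadInput(path); err != nil {
		return fmt.Errorf("failed to load input %q: %w", path, err)
	}
	return nil
}

func main() {
	// Only the top level prints to stderr and sets a non-zero exit status.
	if err := run("example.rst"); err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		os.Exit(1)
	}
}
```

Calling run("") instead would produce the wrapped message on stderr and exit with status 1.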
Test Data Location: testdata/ directory (auto-ignored by Go build)
- Input files: testdata/input-files/source/
- Expected output: testdata/expected-output/
- Relative path from test: filepath.Join("..", "..", "..", "testdata")
Test Patterns:
- Table-driven tests for multiple scenarios
- Temporary directories for output: os.MkdirTemp("", "test-*")
- Clean up with defer os.RemoveAll(tempDir)
- Byte-for-byte content comparison for extracted files
Deterministic Testing: Critical for procedure parsing
- All map iterations must be sorted
- Content hashing must use sorted keys
- Run tests multiple times to verify determinism
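Go randomizes map iteration order, which is why the rules above matter. The sketch below shows one way to hash a map deterministically by sorting its keys first. hashSelections and its NUL-separated framing are assumptions made for illustration, not the project's actual hashing code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashSelections is a hypothetical helper that hashes a map in sorted key
// order so the result does not depend on Go's randomized map iteration.
func hashSelections(selections map[string]string) string {
	keys := make([]string, 0, len(selections))
	for k := range selections {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Simple framing for the sketch: key, NUL, value, NUL.
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(selections[k]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	selections := map[string]string{"language": "python", "deployment": "atlas"}
	first := hashSelections(selections)
	// Repeat the computation to confirm the hash is stable across iterations.
	for i := 0; i < 100; i++ {
		if hashSelections(selections) != first {
			panic("hash changed between runs")
		}
	}
	fmt.Println("stable hash prefix:", first[:12])
}
```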
Path Resolution:
- Use filepath.Join() for cross-platform paths
- Use filepath.Abs() to get absolute paths
- Use internal/projectinfo for MongoDB-specific path resolution
RST Parsing:
- Use the internal/rst package for directive parsing
- Use regex patterns from internal/rst/directive_regex.go
- Handle include directives with ParseFileWithIncludes()
Output Formatting:
- Separate output logic into output.go or report.go
- Support multiple output formats (text, JSON) where applicable
- Use consistent formatting (headers with = separators, indentation)
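As an illustration of the output guidelines above, the following sketch renders one result in either text or JSON. The summary type, the render function, and the exact text layout are hypothetical; the real commands define their own types and formats in output.go or report.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// summary is a hypothetical result type used only for this sketch.
type summary struct {
	File  string `json:"file"`
	Count int    `json:"count"`
}

// render formats a summary as "text" (header with = separator, indented
// fields) or "json" (indented JSON), and rejects any other format.
func render(s summary, format string) (string, error) {
	switch format {
	case "text":
		title := "Summary"
		var b strings.Builder
		b.WriteString(title + "\n")
		b.WriteString(strings.Repeat("=", len(title)) + "\n")
		fmt.Fprintf(&b, "  File:  %s\n", s.File)
		fmt.Fprintf(&b, "  Count: %d\n", s.Count)
		return b.String(), nil
	case "json":
		raw, err := json.MarshalIndent(s, "", "  ")
		if err != nil {
			return "", fmt.Errorf("failed to marshal summary: %w", err)
		}
		return string(raw) + "\n", nil
	default:
		return "", fmt.Errorf("unsupported output format %q", format)
	}
}

func main() {
	for _, format := range []string{"text", "json"} {
		out, err := render(summary{File: "index.rst", Count: 3}, format)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Print(out)
	}
}
```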
All network requests to external APIs should implement caching to avoid repeated requests and support offline usage. The caching pattern is implemented in internal/config/url_mapping.go (for Snooty Data API) and internal/rst/rstspec.go (for rstspec.toml).
Cache Location: ~/.audit-cli/ directory
- URL mapping cache: ~/.audit-cli/url-mapping-cache.json
- Rstspec cache: ~/.audit-cli/rstspec-cache.json
Cache TTL: 24 hours (configurable per cache type)
Implementation Pattern:
- Define cache constants:
const CacheTTL = 24 * time.Hour
const CacheDir = ".audit-cli"
const CacheFileName = "my-cache.json"
- Create cache struct with timestamp and data:
type MyCache struct {
	Timestamp time.Time `json:"timestamp"`
	Data      MyData    `json:"data"`
}
- Implement cache functions:
// getCachePath returns the path to the cache file
func getCachePath() (string, error) {
	homeDir, err := os.UserHomeDir()
	if err != nil {
		return "", fmt.Errorf("failed to get home directory: %w", err)
	}
	return filepath.Join(homeDir, CacheDir, CacheFileName), nil
}
// loadCache loads from cache, returns error if missing or expired
func loadCache() (*MyData, error) {
	// Read file, unmarshal JSON, check TTL
}
// saveCache saves data to cache with current timestamp
func saveCache(data *MyData) error {
	// Create directory if needed, marshal JSON, write file
}
// fetchFromAPI fetches fresh data from the network
func fetchFromAPI() (*MyData, error) {
	// HTTP request, parse response
}
- Main fetch function with fallback logic:
func FetchData() (*MyData, error) {
	// Try cache first
	data, err := loadCache()
	if err == nil {
		return data, nil
	}
	// Cache miss or expired, try network
	data, fetchErr := fetchFromAPI()
	if fetchErr != nil {
		// Network failed - try expired cache as offline fallback
		// (read cache file without TTL check)
		if expiredData := loadExpiredCache(); expiredData != nil {
			fmt.Fprintf(os.Stderr, "Warning: Using expired cache\n")
			return expiredData, nil
		}
		return nil, fetchErr
	}
	// Save to cache for next time
	if saveErr := saveCache(data); saveErr != nil {
		fmt.Fprintf(os.Stderr, "Warning: Could not save cache: %v\n", saveErr)
	}
	return data, nil
}
Key Behaviors:
- Cache is stored in user's home directory for persistence across sessions
- Expired cache is used as fallback when network is unavailable (offline support)
- Cache save failures are logged as warnings but don't fail the operation
- JSON format for easy debugging and human readability
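The load and save steps left as comments in the pattern above could be filled in along these lines. This is a sketch under stated assumptions: saveCacheTo, loadCacheFrom, the enforceTTL flag, and the map-based payload are hypothetical names and shapes, and the explicit path and now parameters exist only to make the TTL behavior easy to exercise. A single loader with the TTL check switched off covers the expired-cache fallback.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

const cacheTTL = 24 * time.Hour

// cacheEntry mirrors the timestamp-plus-data shape of the cache struct.
type cacheEntry struct {
	Timestamp time.Time         `json:"timestamp"`
	Data      map[string]string `json:"data"`
}

// saveCacheTo creates the cache directory if needed and writes indented JSON.
func saveCacheTo(path string, data map[string]string, now time.Time) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return fmt.Errorf("failed to create cache directory: %w", err)
	}
	raw, err := json.MarshalIndent(cacheEntry{Timestamp: now, Data: data}, "", "  ")
	if err != nil {
		return fmt.Errorf("failed to marshal cache: %w", err)
	}
	return os.WriteFile(path, raw, 0o644)
}

// loadCacheFrom reads the cache file. With enforceTTL false it also accepts
// expired entries, which is the offline fallback described above.
func loadCacheFrom(path string, now time.Time, enforceTTL bool) (map[string]string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read cache: %w", err)
	}
	var entry cacheEntry
	if err := json.Unmarshal(raw, &entry); err != nil {
		return nil, fmt.Errorf("failed to parse cache: %w", err)
	}
	if enforceTTL && now.Sub(entry.Timestamp) > cacheTTL {
		return nil, errors.New("cache expired")
	}
	return entry.Data, nil
}

// demo writes a 48-hour-old entry to a temporary directory and checks that
// it is rejected under the TTL but still readable as a fallback.
func demo() error {
	dir, err := os.MkdirTemp("", "cache-sketch-*")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, ".audit-cli", "my-cache.json")
	now := time.Now()

	if err := saveCacheTo(path, map[string]string{"k": "v"}, now.Add(-48*time.Hour)); err != nil {
		return err
	}
	if _, err := loadCacheFrom(path, now, true); err == nil {
		return errors.New("expected the 48-hour-old cache to be rejected")
	}
	data, err := loadCacheFrom(path, now, false)
	if err != nil {
		return err
	}
	if data["k"] != "v" {
		return errors.New("unexpected cache contents")
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(1)
	}
	fmt.Println("expired cache rejected under TTL, accepted as offline fallback")
}
```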
When Adding New Network Calls:
- Follow the pattern above
- Add cache file name constant
- Implement the four cache functions
- Use the same ~/.audit-cli/ directory for consistency
- Consider appropriate TTL (24 hours is default, adjust if data changes more/less frequently)
Incremental, Opportunistic Parsing: RST parsing capabilities are added incrementally as needed by various commands, rather than using a complete AST-based parser.
Rationale:
- MongoDB documentation uses many custom directives not supported by standard reStructuredText parsing libraries
- A complete parser converting RST to an AST would require significant operational overhead that is not needed at this time
- Targeted parsing for specific directives is more maintainable and performant for this use case
Critical Rule: All new RST parsing functionality MUST be added to the internal/rst module, NOT to individual command modules. This ensures:
- Parsing capabilities can be reused across commands
- Functionality can be expanded incrementally
- Parsing logic remains centralized and maintainable
Implementation Pattern:
- Add regex patterns to internal/rst/directive_regex.go
- Add directive types and parsing logic to internal/rst/directive_parser.go
- Add specialized parsing functions (e.g., parse_procedures.go) as separate files in internal/rst
- Commands import and use functions from the internal/rst package
Uniqueness: Procedures are grouped by heading + content hash
- Same heading, different content → separate procedures
- Same content, multiple selections → one procedure with multiple appearances
Analysis vs. Extraction:
- Analysis: Groups procedures by heading, shows variations
- Extraction: Creates one file per unique procedure (by content hash)
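The heading plus content hash rule can be illustrated with a small grouping function. The appearance and procedure types and the groupProcedures function are hypothetical and greatly simplified; see docs/PROCEDURE_PARSING.md for the real logic.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// appearance and procedure are hypothetical types for this sketch only.
type appearance struct {
	Heading   string
	Content   string
	Selection string
}

type procedure struct {
	Heading     string
	ContentHash string
	Selections  []string
}

// groupProcedures merges appearances that share both heading and content
// hash, and returns the groups in a deterministic (sorted) order.
func groupProcedures(found []appearance) []procedure {
	byKey := make(map[string]*procedure)
	for _, a := range found {
		sum := sha256.Sum256([]byte(a.Content))
		hash := hex.EncodeToString(sum[:])
		key := a.Heading + "\x00" + hash
		p, ok := byKey[key]
		if !ok {
			p = &procedure{Heading: a.Heading, ContentHash: hash}
			byKey[key] = p
		}
		p.Selections = append(p.Selections, a.Selection)
	}

	// Sort keys before iterating so output order never depends on the map.
	keys := make([]string, 0, len(byKey))
	for k := range byKey {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	result := make([]procedure, 0, len(keys))
	for _, k := range keys {
		p := byKey[k]
		sort.Strings(p.Selections)
		result = append(result, *p)
	}
	return result
}

func main() {
	found := []appearance{
		{Heading: "Install", Content: "Run the installer.", Selection: "python"},
		{Heading: "Install", Content: "Run the installer.", Selection: "nodejs"},
		{Heading: "Install", Content: "Download the archive.", Selection: "go"},
	}
	for _, p := range groupProcedures(found) {
		fmt.Printf("%s %s %v\n", p.Heading, p.ContentHash[:8], p.Selections)
	}
}
```

Here the first two appearances collapse into one procedure with two selections, while the third stays separate because its content differs.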
Determinism: All operations must produce consistent results
- Sort all map iterations
- Use sorted keys for hashing
- Critical for testing and CI/CD
See docs/PROCEDURE_PARSING.md for comprehensive details.
Context-Aware Expansion:
- No composable tutorial → Expand all includes globally
- Composable tutorial with selected-content in main file → Expand includes within blocks
- Composable tutorial with includes in steps → Expand to detect selected-content blocks
Auto-Discovery: Automatically detects product directory and available versions from file path
- Pattern: /path/to/{product}/{version}/source/...
- Discovers all sibling version directories unless --versions is specified
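The path pattern above can be split with plain string handling. splitVersionedPath is a hypothetical helper that assumes the file lives in a versioned project laid out as {product}/{version}/source/...; it does not scan sibling version directories, which the real command does on disk.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitVersionedPath is a hypothetical helper. It finds the first "source"
// segment and treats the two segments before it as product and version.
func splitVersionedPath(filePath string) (product, version string, err error) {
	parts := strings.Split(filepath.ToSlash(filePath), "/")
	for i := 2; i < len(parts); i++ {
		if parts[i] == "source" && parts[i-1] != "" && parts[i-2] != "" {
			return parts[i-2], parts[i-1], nil
		}
	}
	return "", "", fmt.Errorf("no {product}/{version}/source segment in %q", filePath)
}

func main() {
	product, version, err := splitVersionedPath("/repo/content/manual/v8.0/source/includes/steps.rst")
	fmt.Println(product, version, err)
}
```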
mkdir -p commands/{parent}/{subcommand}
Create commands/{parent}/{subcommand}/{subcommand}.go:
package subcommand
import (
	"github.com/spf13/cobra"
)
func NewSubcommandCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "subcommand",
		Short: "Brief description",
		Long:  `Detailed description`,
		Args:  cobra.ExactArgs(1), // or cobra.NoArgs, etc.
		RunE:  runSubcommand,
	}
	// Add flags
	cmd.Flags().StringP("output", "o", "./output", "Output directory")
	return cmd
}
func runSubcommand(cmd *cobra.Command, args []string) error {
	// Implementation
	return nil
}
- types.go - Type definitions
- {subcommand}_test.go - Tests
- Domain-specific files (parser, analyzer, output, etc.)
In commands/{parent}/{parent}.go:
import "github.com/grove-platform/audit-cli/commands/{parent}/{subcommand}"
func New{Parent}Command() *cobra.Command {
	cmd := &cobra.Command{...}
	cmd.AddCommand(subcommand.NewSubcommandCommand())
	return cmd
}
Create test fixtures in testdata/ and write tests following existing patterns.
- Add regex pattern to internal/rst/directive_regex.go:
var MyDirectiveRegex = regexp.MustCompile(`^\.\.\s+my-directive::\s*(.*)$`)
- Add directive type to internal/rst/directive_parser.go:
const (
	MyDirective DirectiveType = "my-directive"
)
- Add parsing logic in the ParseDirectives() function
- Add tests in the appropriate test file
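To show how a pattern like the one above behaves, here is a standalone sketch that scans content line by line. directiveMatch and findMyDirectives are hypothetical and are not the signature of ParseDirectives(). Because the pattern is anchored at the start of the line, the sketch trims surrounding whitespace first so that indented (nested) directives also match.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// myDirectiveRegex matches ".. my-directive:: <argument>" at line start.
var myDirectiveRegex = regexp.MustCompile(`^\.\.\s+my-directive::\s*(.*)$`)

// directiveMatch is a hypothetical result type for this sketch.
type directiveMatch struct {
	Line     int    // 1-based line number
	Argument string // text after the "::" marker
}

// findMyDirectives returns every my-directive occurrence in content.
func findMyDirectives(content string) []directiveMatch {
	var matches []directiveMatch
	for i, line := range strings.Split(content, "\n") {
		trimmed := strings.TrimSpace(line)
		if m := myDirectiveRegex.FindStringSubmatch(trimmed); m != nil {
			matches = append(matches, directiveMatch{Line: i + 1, Argument: m[1]})
		}
	}
	return matches
}

func main() {
	content := "Title\n=====\n\n.. my-directive:: first\n\n   .. my-directive:: nested\n"
	for _, m := range findMyDirectives(content) {
		fmt.Printf("line %d: %q\n", m.Line, m.Argument)
	}
}
```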
When changing parsing logic:
# Regenerate expected output
./bin/audit-cli extract code-examples testdata/input-files/source/literalinclude-test.rst \
-o testdata/expected-output
# Verify changes are correct before committing
git diff testdata/expected-output/
For count tested-examples command:
- Add product to valid products list in commands/count/tested-examples/counter.go
- Update README.md with new product
- Add test data to testdata/count-test-monorepo/content/code-examples/tested/
- Update tests in tested_examples_test.go
- Always check existing patterns - Look at similar commands/functions before implementing new ones
- Maintain consistency - Follow the established file organization and naming conventions
- Update tests - All changes should include corresponding test updates
- Check determinism - For procedure parsing, verify output is consistent across runs
- Update documentation - Keep README.md, AGENTS.md, and PROCEDURE_PARSING.md in sync with code changes
- Map iteration order - Always sort map keys before iterating (especially in procedure parsing)
- Path separators - Use filepath.Join() instead of string concatenation
- Relative paths in tests - Remember tests are 3 levels deep: commands/{parent}/{subcommand}/
- Include directive resolution - Use internal/projectinfo for MongoDB-specific path conventions
- Testdata directory - This is a special Go convention; files here are ignored during builds
- Unit tests for all new functions
- Integration tests for command execution
- Determinism tests for procedure parsing
- Table-driven tests for multiple scenarios
- Test coverage should not decrease
- Package-level comments for all packages
- Function comments for exported functions
- README.md updates for user-facing changes
- PROCEDURE_PARSING.md updates for procedure parsing logic changes
- Inline comments for complex logic
- README.md: Comprehensive user documentation and development guide
- docs/PROCEDURE_PARSING.md: Detailed procedure parsing business logic
- Go Cobra Documentation: https://github.com/spf13/cobra
- MongoDB RST Conventions: See examples in testdata/input-files/source/
- Commands: {subcommand}.go
- Tests: {subcommand}_test.go
- Types: types.go
- Core logic: parser.go, analyzer.go, counter.go, etc.
- Output: output.go, report.go
import "github.com/grove-platform/audit-cli/{package}"
# By package
go test ./internal/rst -v
# By function name
go test ./commands/extract/procedures -run TestParseFileDeterministic -v
# With race detection
go test ./... -race
None currently used in this project.
None currently used in this project.
The project uses GitHub Actions to automatically run tests on pull requests and pushes to main.
Workflow file: .github/workflows/run-tests.yml
What it does:
- Runs on all PRs to main and pushes to main
- Sets up Go 1.24
- Caches Go modules for faster builds
- Runs all tests with race detection: go test ./... -v -race -coverprofile=coverage.out
- Displays test coverage summary
Viewing results:
- Check the "Actions" tab in GitHub to see workflow runs
- Each PR will show a green checkmark or red X indicating test status
- Click on the workflow run to see detailed test output and coverage
Local testing before pushing:
# Run the same tests that CI runs
go test ./... -v -race -coverprofile=coverage.out
# View coverage summary
go tool cover -func=coverage.out | tail -1