ci: gate docstring quality and coverage in CI (#616)#3

Open
planetf1 wants to merge 61 commits into main from ci/doc-quality-gate-616

@planetf1
Owner

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Adds a hard-fail docstring quality gate to the docs-publish workflow (--quality --fail-on-quality --threshold 100). Both checks currently pass in CI (100% coverage, 0 quality issues).

Also adds a typeddict_mismatch scanner to audit_coverage.py — flags Attributes: sections on TypedDict classes that document phantom fields or omit declared ones (mirrors the existing param_mismatch logic for functions).
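
A minimal sketch of the kind of comparison described above, assuming Google-style `Attributes:` sections and runtime introspection. The helper names `documented_attributes` and `typeddict_mismatch` and the `TokenUsage` class are hypothetical illustrations, not the code in audit_coverage.py, which may work quite differently.

```python
import re
from typing import TypedDict


class TokenUsage(TypedDict):
    """Token counts for one request.

    Attributes:
        prompt_tokens: Tokens in the prompt.
        total: Phantom field, not declared below.
    """

    prompt_tokens: int
    completion_tokens: int


def documented_attributes(docstring: str) -> set[str]:
    """Return field names listed under a Google-style ``Attributes:`` section.

    Simplification: wrapped continuation lines containing a colon are not
    distinguished from new entries.
    """
    names: set[str] = set()
    in_section = False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == "Attributes:":
            in_section = True
            continue
        if in_section:
            # A bare capitalised header such as "Example:" ends the section.
            if re.fullmatch(r"[A-Z][A-Za-z ]*:", stripped):
                break
            match = re.match(r"(\w+)(?:\s*\(.*?\))?:", stripped)
            if match:
                names.add(match.group(1))
    return names


def typeddict_mismatch(cls: type) -> tuple[set[str], set[str]]:
    """Return (phantom, undocumented) field names for a TypedDict class."""
    declared = set(cls.__annotations__)
    documented = documented_attributes(cls.__doc__ or "")
    return documented - declared, declared - documented


phantom, undocumented = typeddict_mismatch(TokenUsage)
print(sorted(phantom), sorted(undocumented))
```

Here `total` is reported as a phantom field and `completion_tokens` as an undocumented one, which are the two failure modes the description names.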

Pre-commit hook updated to use --fail-on-quality; stays stages: [manual] since it requires pre-built docs. CI is the hard gate.

Contribution docs updated with TypedDict docstring requirements and the two new check kinds.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

ajbozarth and others added 30 commits March 10, 2026 13:16
…ive-computing#563)

* feat: add token usage counter metrics

Add mellea.llm.tokens.input/output counters following Gen-AI semantic conventions with zero overhead when disabled

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* feat: integrate token metrics into OpenAI, Ollama, WatsonX, and LiteLLM backends

Add record_token_usage_metrics() calls to all backend post_processing methods to track input/output tokens. Add get_value() helper in backends/utils.py to handle dict/object attribute extraction.
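
A helper of this kind might look roughly like the sketch below. Only the name `get_value` comes from the commit message; the signature, the `default` parameter, and the sample data are assumptions for illustration.

```python
from types import SimpleNamespace
from typing import Any


def get_value(obj: Any, key: str, default: Any = None) -> Any:
    """Read ``key`` from a dict or from an attribute-style object."""
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)


# Some client libraries return usage as a dict, others as an object.
usage_as_dict = {"prompt_tokens": 12, "completion_tokens": 30}
usage_as_object = SimpleNamespace(prompt_tokens=12, completion_tokens=30)

print(get_value(usage_as_dict, "prompt_tokens"))
print(get_value(usage_as_object, "completion_tokens"))
print(get_value(usage_as_dict, "total_tokens", default=0))
```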

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* feat: add token metrics to HuggingFace backend

Calculate token counts from input_ids and output sequences. Records to both tracing spans and metrics using helper function.
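
The arithmetic behind this can be sketched with plain lists. The function name `count_tokens` is hypothetical, and the sketch assumes a decoder-only model whose generated sequence begins with the prompt tokens; the backend's actual code may differ.

```python
def count_tokens(
    input_ids: list[list[int]], sequences: list[list[int]]
) -> tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) for the first batch item.

    Assumes each output sequence is the prompt followed by the new tokens.
    """
    prompt_tokens = len(input_ids[0])
    completion_tokens = len(sequences[0]) - prompt_tokens
    return prompt_tokens, completion_tokens


prompt = [[101, 7592, 2088]]
output = [[101, 7592, 2088, 999, 102]]
print(count_tokens(prompt, output))
```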

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: add token metrics integration tests for all backends

- Add integration tests for Ollama, OpenAI, LiteLLM, HuggingFace, WatsonX
- Tests revealed metrics were coupled with tracing (architectural issue)
- Fixed: Metrics now record independently of tracing spans
- WatsonX: Store full response to preserve usage information
- HuggingFace: Add zero-overhead guard, optimize test model

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* fix: use module-scoped fixture to prevent tracer provider reinitialization

Use MonkeyPatch for cleanup and update Watsonx to granite-4-h-small.

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* docs: add token usage metrics documentation and examples

- Add Token Usage Metrics section to docs/dev/telemetry.md with metric
  definitions, backend support table, and configuration examples
- Create metrics_example.py demonstrating token tracking with tested
  console output
- Update telemetry_example.py to reference new metrics example
- Update examples/telemetry/README.md with metrics quick start guide

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* fix: lazy import is_metrics_enabled in backends

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: add streaming token metrics test and document timing

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* refactor: consolidate duplicate get_value function

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* feat: add streaming token usage metrics support

Enable token metrics for streaming responses in OpenAI and LiteLLM backends.
Parametrize backend tests for streaming/non-streaming coverage.

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: update to non-deprecated Granite 4 hybrid models

- Replace llama3.2:1b with granite4:micro-h in telemetry tests
- Replace deprecated granite-4.0-micro with granite-4.0-h-micro in HF tests
- Use model constants instead of hardcoded strings
- Remove redundant gh_run checks (rely on pytest markers)

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* style: apply ruff formatting to test signatures

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: skip HuggingFace test in CI (requires model download)

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: add unit tests and reorganize telemetry tests

Add 4 unit tests for record_token_usage_metrics() in test_metrics_token.py.

Split test_backend_telemetry.py into focused modules:
- test_tracing_backend.py: backend tracing integration tests
- test_metrics_backend.py: backend token metrics integration tests
- test_metrics_token.py: unit tests for record_token_usage_metrics()

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* doc: addressed review comments

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
Co-authored-by: jakelorocco <59755218+jakelorocco@users.noreply.github.com>
…ting#569)

* use larger model and change jinja ref for decompose example

* llm as default constraint strategy if none

* add comment stating 8b is needed for tags

* adds default llm to constraint validation strategies
…#582)

* docs: add initial specification for extensibility hooks in Mellea

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: update hook system spec to factor component hooks and address design drifts

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: add clarifications for component hook payload fields and additional suggestions by maintainers

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: add implementation plan

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: update implementation plan

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: minor cleanups to implementation plan

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: update to reflect programmatic and functional-first design

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: specify hook payload write protection

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: add optional dependency for plugin framework

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: implemented hook system and initial set of hook types

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: add plugin examples

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: update examples to use MelleaHookType enum

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: add PluginMode enum

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: drop estimated_tokens from generation_pre_call payload

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: add context manager block support for plugins and plugin sets

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: update hook system specification to document with-block support

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: removed unused imports

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: implement tool hooks

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: update example for tool call hooks

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: tune internal log levels for clarity

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: tune internal log levels for clarity

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: updated spec with not implemented payload fields

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: minor implementation bugs and tests

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* Update hook_system.md

Added implementation priorities to Hook Table.

* refactor: use cpex package; update handling of modified_payloads

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: update lock file

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: bump cpex version to 0.1.0.dev2

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: mode semantics

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: implemented fire_and_forget mode

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: update execution mode map

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: update plugin modes and specs

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: update examples with concurrent hooks

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: update examples

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: refine has_plugins to accept hook type

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: update cpex version

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: cleanup

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: tool call hook types

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: tool hooks example; payload mutation handling

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: PR review comments

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: renamed dependency group from cpex to hooks

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: lint and formatting fixes

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: mypy and remaining lint issues

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: updated specs to reflect implementation changes

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: backend generate_from_context wrapper

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: generation pre call

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: generation_post_call hook placement

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: added modify result object, unregister function, other cleanups

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* feat: improve handling of generate_post_call

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: previously existing mypy issues (can be cherry picked to fix main branch)

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: improves invoke_hook function; deduplicate context and other objects passed to plugins

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: update hook specs

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: drop backend_kwargs from session_pre_init payload

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: improvements and bug fixes

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: remove unimplemented generation_stream_chunk from hook type enum

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: regression introduced with weakrefs

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: cleanup

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: add tutorial-style examples for plugins

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* chore: fix formatting issues in new examples

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: drop weakrefs, unwrap session in payloads, and refactor execution modes

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: clarify payload mutability approach in docstring

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: update hook spec to document payload mutability approach

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: set default to silence plugin errors; added acceptance tests for plugins

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* fix: minor regressions in examples

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: initial user docs for plugins

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: converted a few writable fields into observe-only fields in hook payloads

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* docs: minor updates and nits to the plugin docs

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

* refactor: move pre and post gen hooks

* refactor: modify plugin fixture and fix tests

* refactor: refactor tests for hook_call sites

* refactor: move to local imports for hooks; fix pre-commit issues

* fix: add back type error ignore

---------

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Co-authored-by: Hendrik Strobelt <HendrikStrobelt@users.noreply.github.com>
Co-authored-by: Jake LoRocco <jake.lorocco@ibm.com>
* fix: guarding optional imports for hooks

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

* fix: issues with mellea when hooks not installed

---------

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Co-authored-by: Jake LoRocco <jake.lorocco@ibm.com>
- astream() on computed MOT now raises error
- the last astream call does NOT contain the full text again. All astream() chunks concatenated will be the full text.

Related tests are added/modified.
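
The streaming contract described above can be illustrated with a toy class. `ToyThunk` is a hypothetical stand-in, not Mellea's ModelOutputThunk, and the choice of `RuntimeError` is an assumption; only the two behaviours (an error once computed, and chunks that concatenate to the full text) come from the notes above.

```python
import asyncio


class ToyThunk:
    """Hypothetical stand-in for a streamed model output."""

    def __init__(self, chunks: list[str]) -> None:
        self._pending = list(chunks)
        self.value = ""

    def is_computed(self) -> bool:
        return not self._pending

    async def astream(self) -> str:
        """Return only the text that arrived since the previous call."""
        if self.is_computed():
            raise RuntimeError("output is already fully computed")
        await asyncio.sleep(0)  # yield control, as a real backend would
        delta = self._pending.pop(0)
        self.value += delta
        return delta


async def collect(thunk: ToyThunk) -> str:
    """Concatenate every chunk; the last chunk does not repeat earlier text."""
    parts = []
    while not thunk.is_computed():
        parts.append(await thunk.astream())
    return "".join(parts)


thunk = ToyThunk(["Hel", "lo, ", "world"])
text = asyncio.run(collect(thunk))
print(text)
```

Callers written against the older behaviour, where the final call returned the whole text, would need to accumulate the chunks themselves as `collect` does.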
* fix: hf metrics tests run out of memory

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: add requires_heavy_ram marker to HuggingFace backend tests

Add @pytest.mark.requires_heavy_ram to tests that instantiate LocalHFBackend
to address memory leak issues when running these tests in pytest. This ensures
tests are skipped on systems without sufficient RAM.

Changes:
- test/telemetry/test_metrics_backend.py: Added marker to test_huggingface_token_metrics_integration
- test/stdlib/test_spans.py: Added marker to module-level pytestmark

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
…erative-computing#605)

- Add --isolate-heavy CLI flag for explicit GPU isolation
- Add @pytest.mark.requires_gpu_isolation marker
- Rewrite pytest_collection_finish with 4-guard architecture
- Fix test discovery (pytest --collect-only now works instantly)
- Apply markers to all 4 heavy GPU test files
- Fix failure propagation from subprocesses
- Update documentation for new markers and flags

Fixes generative-computing#604
…eus) (generative-computing#610)

* feat: add configurable OTLP metrics exporter

- Add MELLEA_METRICS_OTLP env var for explicit enablement
- Support metrics-specific endpoint via OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
- Add configurable export interval via MELLEA_METRICS_EXPORT_INTERVAL
- Add error handling and validation with helpful warnings

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* test: add unit tests for OTLP exporter enhancements

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* feat(telemetry): add Prometheus metrics exporter with tests

Add Prometheus exporter support with HTTP endpoint for metrics scraping.
Includes comprehensive unit tests and uses standard OpenTelemetry env vars.

Also updates previous OTLP implementation to use standard OTEL_METRIC_EXPORT_INTERVAL
(milliseconds) instead of custom MELLEA_METRICS_EXPORT_INTERVAL (seconds).
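
As a rough sketch of what reading the standard variable with validation could look like (the function name `export_interval_ms` and the fallback behaviour are assumptions, not the code in mellea/telemetry/metrics.py):

```python
import os
import warnings

# 60000 ms is the default given for this variable in the OpenTelemetry spec.
DEFAULT_INTERVAL_MS = 60_000


def export_interval_ms(env=None) -> int:
    """Read OTEL_METRIC_EXPORT_INTERVAL (milliseconds), falling back on bad input."""
    env = os.environ if env is None else env
    raw = env.get("OTEL_METRIC_EXPORT_INTERVAL")
    if raw is None:
        return DEFAULT_INTERVAL_MS
    try:
        value = int(raw)
    except ValueError:
        value = -1
    if value <= 0:
        warnings.warn(
            f"Ignoring invalid OTEL_METRIC_EXPORT_INTERVAL={raw!r}; "
            f"using {DEFAULT_INTERVAL_MS} ms"
        )
        return DEFAULT_INTERVAL_MS
    return value


print(export_interval_ms({"OTEL_METRIC_EXPORT_INTERVAL": "5000"}))
```

Note the unit change: a value that meant 5 seconds under the old custom variable would be written as 5000 here.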

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* docs: add comprehensive metrics exporter documentation

- Updated docs/dev/telemetry.md with detailed configuration for Console, OTLP, and Prometheus exporters
- Added troubleshooting section for common metrics issues
- Enhanced mellea/telemetry/metrics.py module docstring with exporter examples
- Updated docs/examples/telemetry/metrics_example.py with configuration examples for all exporters

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* fix: addressed review

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* fix(telemetry): remove HTTP server from library, use registry-based Prometheus

The library should not start an HTTP server for Prometheus scraping.
Instead, register metrics with the prometheus_client default registry
via PrometheusMetricReader and let the application expose the endpoint.

- Replace OTEL_EXPORTER_PROMETHEUS_PORT/HOST with MELLEA_METRICS_PROMETHEUS
- Remove start_http_server() call from metrics module
- Update example to show application-side server startup and keep-alive
- Update docs with registry-based approach and framework examples
- Add missing env vars to telemetry __init__.py docstring
- Clean up tests: remove port/server mocking, unused imports

* fix: update docs/examples/telemetry/metrics_example.py

Co-authored-by: Paul Schweigert <paul@paulschweigert.com>

* fix: github ui commits skip pre-format

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
Co-authored-by: Paul Schweigert <paul@paulschweigert.com>
…ng#609)

* Fix flaky test

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

* Add test cases for intrinsics formatters

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

* First step of manual merge

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

* xfail tests to work around known CI issues

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

* Make failing test produce useful output on CI

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

* More CI debugging

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

---------

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
* docs: API docs pipeline improvements

- Fix MDX parse errors: wrap bare doctest blocks, escape {/} outside fences
- Fix GitHub source links to use versioned tag format
- Richer landing page with prose bullet list and module descriptions
- Add --source-dir support to build.py and audit_coverage.py
- Add --quality docstring audit (8 issue categories, 322/322 pass)
- Add --orphans nav audit for MDX files absent from docs.json
- Add --fail-on-quality flag for CI/pre-commit hard gate
- Emit GitHub Actions annotations when run in CI (GITHUB_ACTIONS=true)
- Fix *args/**kwargs forwarder exemption using Griffe ParameterKind
- Remove generated API docs from version control; add to .gitignore
- Add poethepoet dev tasks (apidocs, apidocs-quality, apidocs-preview, etc)
- Disable CI workflow pending branch strategy decision (see PR generative-computing#611)
- Remove redundant files (requirements.txt, stray diagnostic output)
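
The brace escaping in the first bullet can be sketched as follows, reading `{/}` as the two brace characters (an assumption). The function name is hypothetical and the sketch ignores inline code spans and already-escaped braces, which the real pipeline may handle.

```python
FENCE = "`" * 3  # built this way so the sample contains no literal fence line


def escape_braces_outside_fences(text: str) -> str:
    """Backslash-escape { and } on lines that are not inside a fenced block."""
    out = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # fence lines toggle state, kept verbatim
            out.append(line)
        elif in_fence:
            out.append(line)  # code is left untouched
        else:
            out.append(line.replace("{", "\\{").replace("}", "\\}"))
    return "\n".join(out)


sample = "\n".join(
    ["Use {name} here.", FENCE + "python", "d = {'a': 1}", FENCE, "Done {ok}."]
)
result = escape_braces_outside_fences(sample)
print(result)
```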

Fixes generative-computing#532
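
For the GitHub Actions annotations mentioned in the list above, a minimal sketch of switching output format on `GITHUB_ACTIONS=true` might look like this. The `::error file=...,line=...::message` form is GitHub's workflow-command syntax; the function name `format_issue`, the plain-text fallback, and the sample path are assumptions.

```python
import os


def format_issue(path: str, line: int, message: str, env=None) -> str:
    """Format one finding as a GitHub annotation in CI, plain text otherwise."""
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") == "true":
        # Real messages may also need %, CR and LF escaped for workflow commands.
        return f"::error file={path},line={line}::{message}"
    return f"{path}:{line}: {message}"


print(format_issue("pkg/example.py", 42, "missing Returns section", env={}))
print(
    format_issue(
        "pkg/example.py", 42, "missing Returns section",
        env={"GITHUB_ACTIONS": "true"},
    )
)
```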

* chore: remove 'Made with Bob' attribution comments from docs-autogen tooling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…n repair messages (generative-computing#633)

* fix: update multiturnstrategy repair message

append validation failure reasons to the repair message for MultiTurnStrategy

Fixes generative-computing#631

Signed-off-by: va <va@us.ibm.com>

* docs: update docstring to be correct

---------

Signed-off-by: va <va@us.ibm.com>
Co-authored-by: jakelorocco <59755218+jakelorocco@users.noreply.github.com>
…-computing#619)

Adds return type and parameter annotations to ~50 functions across the
14 high-priority public API files identified in issue generative-computing#615.

Also fixes two latent type bugs uncovered during annotation:
- stdlib/requirements/md.py: _md_list/_md_table were returning bare bool
  where Requirement.validate() expected ValidationResult; wrap in
  ValidationResult(result=...)
- stdlib/tools/interpreter.py: add narrowing asserts for stdout/stderr
  to confirm the -> str return type that the top-level assert guarantees
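
The first fix has roughly the following shape. The `ValidationResult` dataclass here is a hypothetical stand-in (only the `result=` keyword comes from the commit message), and `looks_like_md_list` is an illustrative check rather than the logic in stdlib/requirements/md.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    """Hypothetical stand-in for the library's result type."""

    result: bool
    reason: Optional[str] = None


def looks_like_md_list(text: str) -> bool:
    """Illustrative check: every non-blank line is a bullet item."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return bool(lines) and all(ln.lstrip().startswith(("- ", "* ")) for ln in lines)


def validate_md_list(text: str) -> ValidationResult:
    # Before the fix the bare bool was returned; callers expecting a
    # ValidationResult would then fail on attribute access such as `.result`.
    return ValidationResult(result=looks_like_md_list(text))


print(validate_md_list("- one\n- two"))
print(validate_md_list("plain paragraph"))
```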

Closes generative-computing#615
) (generative-computing#601)

* docs: Phase 0 infrastructure + getting-started.md

- CONTRIBUTING.md: writing conventions, PR checklist, code block
  runability rule, Backend note callout type
- .markdownlint.json: fix MD025 front_matter_title so body H1 is
  allowed alongside YAML frontmatter title
- getting-started.md: full tutorial page — install, hello world,
  user variables, requirements, core concepts, troubleshooting
- glossary.md: skeleton in place

* docs: Phase 1.2 — the-instruction-model.md

Full how-to page covering instruct(), user variables, requirements,
custom validation functions (req/check/simple_validate), sampling
strategies + IVR loop, grounding context, images, ChatContext,
and chat() vs instruct() comparison. Imports verified against source.
One inline review note on icl_examples API pending verification.

* docs: Phase 1.3 — backends-and-configuration.md

Covers Ollama (default), OpenAI-compatible, LiteLLM, HuggingFace, and
WatsonX backends. ModelOption constants table, system prompt pattern,
direct backend construction. Backend note callouts on each provider.
Imports verified against source.

* docs: Phase 2.1 — generative-functions.md

Covers @generative decorator, Literal type constraints, Pydantic
structured output, pre/post-conditions (PreconditionException),
composing generative pipelines, and chain-of-thought pattern.
Imports verified against source.

* docs: Phase 2.2 — tools-and-agents.md

Covers @tool decorator, MelleaTool.from_callable/from_langchain/
from_smolagents, ModelOption.TOOLS, uses_tool, tool_arg_validator,
react() agentic loop with structured output, code_interpreter.
Incorporates agent definition and ReACT context from old agents.mdx.
Imports verified against source (react is async).

* docs: Phase 2.3 — working-with-data.md

Covers grounding context, RAG with FAISS + generative filtering,
@mify / MObject pattern (query/transform, ad-hoc mify, custom
stringify, funcs_include), and RichDocument with PDF parsing and
table extraction. Incorporates content from mobjects.mdx and
generative-slots.mdx. Imports verified against CI examples.

* docs: Phase 2.4 — intrinsics.md

Covers all RAG intrinsic operations: answerability, context relevance,
hallucination detection, answer relevance rewriting, query rewriting,
citations, and direct Intrinsic/GraniteCommonAdapter usage.
Backend note callout on HF requirement. Imports verified against source.
Note: adapters.mdx content (tool calling) already covered in tools-and-agents.md.

* docs: Phase 2.5 — sampling-strategies.md

Covers RejectionSamplingStrategy (with SamplingResult inspection),
validation feedback via ValidationResult.reason, SOFAISamplingStrategy
dual-model escalation with s2_solver_mode table, BudgetForcingSamplingStrategy,
and MajorityVotingStrategyForMath. Review notes on budget forcing and
majority voting exports/parameters.

* docs: Phase 2.6 — async-and-streaming.md

Covers async/sync method table, parallel generation with ModelOutputThunk,
wait_for_all_mots, streaming with ModelOption.STREAM + astream(), and
context warnings for concurrent ChatContext use. Imports verified.

* docs: Phase 2.7 — act-and-aact.md

Covers three abstraction levels (instruct/act/mfuncs), working with
Message and Document, validation + sampling strategies via act(),
structured output with format=, functional API (mfuncs.act/aact),
and aact() async usage. Fixed stale numeric cross-references.

* docs: Phase 4.1 — safety-and-validation.md

Covers GuardianCheck + GuardianRisk (full enum table), custom criteria,
groundedness detection, use as instruct() requirement, and input gate
pattern. Backend note on Guardian model independence. Verified against
CI example docs/examples/safety/guardian.py.

* docs: Phase 4.2 — mcp-integration.md

Covers FastMCP server creation, @mcp.tool decorator, mcp dev UI,
ModelOption in tools, multiple tools in one server. Imports verified
against mcp_example.ipynb CI notebook.

* docs: Phase 4.3 — telemetry.md

Covers two independent OTEL trace scopes (application + backend),
all configuration env vars, start_session() as context manager for
trace lifecycle, console debugging, Jaeger/OTLP export, programmatic
status checks, and metrics API (create_counter/create_histogram).
Verified against mellea/telemetry/__init__.py and telemetry_example.py.
Includes Gen-AI semantic convention attribute tables.

* docs: Phase 4.4 — custom-sessions.md

Covers SimpleContext vs ChatContext, ctx introspection helpers
(last_output/last_turn), session.clone() for context branching,
session.reset(), and extending MelleaSession with a ChatCheckingSession
example. Absorbs content from core-concept/context-management.mdx.
Verified against session.py and creating_a_new_type_of_session.py.

* docs: Phase 5.1 — generative-programming.md

Conceptual page explaining what generative programs are, the
deterministic/stochastic interleaving challenge, requirements as the
core reliability mechanism, failure handling, uncertainty compounding,
context management, and Mellea's position as execution layer (not
orchestrator). Absorbs content from overview/generative-programming.mdx
and overview/mellea-welcome.mdx.

* docs: Phase 5.2 — mellea-core-internals.md

Covers the three core data structures (CBlock, Component,
ModelOutputThunk), six abstraction layers from MelleaSession down to
direct backend.generate_from_context() with lazy thunks, composition
via SimpleComponent, and template/prompt engineering (TemplateFormatter,
TemplateRepresentation, Jinja2 template resolution, model-specific paths).
Verified imports against session_deepdive step files. Absorbs content
from prompt-engineering.mdx.

* docs: Phase 5.3 — troubleshooting.md

Covers installation errors (outlines/Rust, Intel Mac, missing extras),
Ollama connectivity, requirements/sampling diagnosis with
return_sampling_results=True, PreconditionException, react() loop
exhaustion, tool selection debugging, async/event-loop errors,
Jupyter nest_asyncio, and Guardian setup issues.

* docs: Phase B — restructure guide/ into target hierarchy

Reorganises the 18 flat guide/ pages written in Phase A into the
target Diataxis-aligned directory structure:

- getting-started/  installation.md + quickstart.md (split from getting-started.md)
- concepts/         generative-programming.md, instruct-validate-repair.md
- how-to/           use-async-and-streaming.md, use-context-and-sessions.md
- integrations/     mcp-and-m-serve.md
- evaluation-and-observability/  metrics-and-telemetry.md
- advanced/         intrinsics.md, inference-time-scaling.md,
                    security-and-taint-tracking.md, mellea-core-internals.md
- troubleshooting/  common-errors.md
- guide/            generative-functions, tools-and-agents, working-with-data,
                    backends-and-configuration, act-and-aact, glossary (in place)

Updates:
- docs.json: replaces old MDX nav with new hierarchy (9 groups)
- All cross-links updated to new relative paths
- Nav footers updated to match new linear order
- Navbar "Contribution Guide" link updated to /guide/CONTRIBUTING

Old MDX pages (overview/, core-concept/) removed from nav; files kept
on disk until Phase C content is verified complete.

* docs: Phase C.1 — concepts/requirements-system.md

Adds depth page on the Requirements system: Requirement class,
ValidationResult, simple_validate(), req()/check(), check_only/purple-elephant
effect, precondition_requirements + PreconditionException, SamplingResult
inspection, and LLM-as-judge vs custom validator trade-offs.

Updates instruct-validate-repair.md footer and docs.json nav.

* docs: Phase C.2 — concepts/architecture-vs-agents.md

Adds positioning page explaining Mellea as execution layer vs. orchestration
frameworks (LangChain, smolagents). Covers the three adoption paths
(greenfield, leaf-node injection, tool enrichment) with concrete code examples
showing how Mellea functions compose inside smolagents and LangChain.

Updates requirements-system.md footer and docs.json nav.

* docs: Phase C.3 — how-to/enforce-structured-output.md

Adds task-oriented guide for structured output covering @generative with
Literal/Pydantic return types, instruct(format=...) for dynamic prompts,
content validation on structured output (at_least_n pattern), and
guidance on choosing between the two approaches.

Updates docs.json nav.

* docs: Phase C.4 — how-to/write-custom-verifiers.md

Adds practical guide for writing custom validation functions: full
validation_fn signature, simple_validate shortcut, common patterns
(JSON, Pydantic schema, regex, external API), ValidationResult.score,
composing verifiers, and debugging with SamplingResult.sample_validations.

Updates docs.json nav.

* docs: Phase C.5 — integrations/ollama.md

Adds Ollama integration page covering installation, default setup
(granite4:micro), recommended models table, custom host configuration,
ModelOption usage, vision models, OpenAI-compatible endpoint, and
troubleshooting section.

Updates docs.json nav.

* docs: Phase C.6 — integrations/openai.md

Adds OpenAI integration page covering OpenAI API setup, OpenAI-compatible
local servers (LM Studio, Ollama endpoint, vLLM), vision/multimodal input,
structured output with format=, ModelOption usage, and troubleshooting.

Updates docs.json nav.

* docs: Phase C.7 — tutorials/01-your-first-generative-program.md

Adds the first tutorial: an 8-step walkthrough building a document analysis
pipeline from a single instruct() call through requirements, rejection
sampling, @generative with Literal and Pydantic, and composition.
Uses a consistent customer feedback example throughout.

Adds Tutorials group to docs.json nav.

* docs: Phase C.8 — concepts/context-and-sessions.md

Adds architecture explanation page covering the Component/Backend/Context/
Session four-layer architecture, SimpleContext vs ChatContext trade-offs,
context window management, session cloning, context inspection, and
why explicit context management matters.

Updates docs.json nav.

* docs: Phase C.9 — evaluation-and-observability/handling-exceptions.md

Adds error handling page covering SamplingResult.success=False patterns,
PreconditionException inspection, ComponentParseError, backend connection
errors, fallback patterns (simpler call, stronger model / SOFAI), and
logging failures.

Updates docs.json nav.

* docs: Phase C.10 — integrations/bedrock-and-watsonx.md

Adds cloud backends page: AWS Bedrock via create_bedrock_mantle_backend
and LiteLLM, IBM WatsonX with WatsonxAIBackend. Covers credentials,
region selection, available models, direct and environment-variable auth,
and troubleshooting for both providers.

Updates docs.json nav.

* docs: Phase C-review fixes — nav footers, code corrections, linting

Fix 9 nav footer mismatches caused by incremental page insertions not
updating adjacent pages: quickstart, generative-programming,
architecture-vs-agents, use-context-and-sessions, write-custom-verifiers,
ollama, openai, metrics-and-telemetry, tutorials/01.

Code fixes:
- requirements-system.md: add missing RejectionSamplingStrategy import
  in precondition example
- bedrock-and-watsonx.md: str(result) for consistency
- instruct-validate-repair.md: correct diataxis to explanation
- tutorials/01: fix stale Full example pointer, remove broken Next link
  to unwritten page 02
- use-context-and-sessions.md: add sidebarTitle to disambiguate from
  concepts page; fix over-heavy prerequisite

Linting:
- Add .markdownlint.json at docs/docs/ level so config covers all
  subdirectories (concepts/, how-to/, integrations/, etc.), not just guide/

* docs: fix README and add reader-facing index.md

README.md had a broken fenced code block (mismatched backticks), a
duplicate Getting Started section, emoji, and a wrong URL pointing to
mellea.ai instead of docs.mellea.ai. Rewritten as a clean contributor
setup guide.

index.md is a new reader-facing landing page for GitHub and non-Mintlify
browsing. Mintlify ignores it (root redirects to getting-started/installation
via docs.json) but GitHub renders it as the directory index.

* docs: expand index.md to show full section structure

* docs: port 4 missing pages from Hendrik's MDX — generative-functions, mobjects-and-mify, configure-model-options, template-formatting

* docs: fix convention violations in 4 new pages (US English, missing import, table spacing)

* docs: update index.md with 4 new pages

* docs: add Core Reference to index.md; cross-link tools-and-agents from generative-functions

* docs: add advanced/lora-and-alora-adapters.md — train and use custom adapters

* docs: fix import errors, deprecated model IDs, nav link, and add Mintlify redirects

- configure-model-options.md: fix ModelOption import path (backends.types → backends);
  replace deprecated IBM_GRANITE_3_2_8B/IBM_GRANITE_4_MICRO_3B with current models
- mobjects-and-mify.md: fix mify/MifiedProtocol import path (stdlib.mify → stdlib.components);
  fix ModelOption import path
- docs.json: fix CONTRIBUTING navbar href to GitHub URL (was unreachable /guide/CONTRIBUTING);
  add feedback.thumbsRating; add redirects for all removed MDX pages to new paths
- CONTRIBUTING.md: add docs writing guide link in Additional Resources

* docs: fix docs badge URL in README (mellea.ai → docs.mellea.ai)

* docs: add m serve section, fix landing page, add GitHub nav index

- mcp-and-m-serve.md: retitle to "MCP and m serve"; add m serve section
  (serve() signature, starting the server, calling the endpoint); fix
  deprecated model IDs; fix nav footer (Previous was wrong page); fix
  MD028/MD024 lint warnings
- index.mdx: new Mintlify landing page with CardGroup layout covering
  core concepts, integrations, and quick-start paths; replaces the plain
  list that was being served at /
- docs/index.md: move GitHub-only nav index out of Mintlify root (to
  docs/ parent) so it no longer overrides the landing page

* docs: revise landing page — closer to original style with updated content

* docs: align landing page with mellea.ai messaging and voice

* docs: fix landing page links — remove non-existent HuggingFace page, add How-To section

* docs: remove oversized logo from landing page — navbar logo is sufficient

* docs: split MCP page, add HuggingFace/vLLM integration, update landing page

Split integrations/mcp-and-m-serve.md into two focused pages:
- integrations/mcp.md — FastMCP tool wrapping for MCP clients
- integrations/m-serve.md — OpenAI-compatible serving with m serve

Add integrations/huggingface-and-vllm.md covering LocalHFBackend
(experimental features: aLoRA, constrained decoding; cuda/mps/cpu auto)
and LocalVLLMBackend (high-throughput batched inference; Linux only).

Update index.mdx: add HuggingFace/vLLM card to Backends section,
fix MCP card link, add subtle Mellea logo.

Update docs.json: nav uses new page slugs, redirect
/integrations/mcp-and-m-serve → /integrations/mcp.

* docs: remove redundant logo from landing page body (navbar logo sufficient)

* docs: fix logo CSS classes — dark/light were inverted

* docs: remove page-body logo (wordmark-only SVG; navbar already shows it)

* docs: add Mellea mushroom mascot to landing page

* docs: fix and expand glossary — correct 5 wrong definitions, add 7 missing terms

* docs: add m decompose guide page; expand glossary with 5 missing terms

* docs: add glossary links on first use; strengthen CONTRIBUTING standard

- Link Mellea-specific terms to glossary on first use across 8 pages:
  quickstart, tutorial/01, concepts/generative-programming,
  concepts/generative-functions, concepts/instruct-validate-repair,
  concepts/requirements-system, concepts/context-and-sessions
- Add external links for Jinja2 and Pydantic on first use
- Expand Requirement glossary entry to document req(), check(), and
  simple_validate() including the prompt-inclusion distinction
- Fix metrics-and-telemetry.md Previous footer (was mcp-and-m-serve, now m-serve)
- CONTRIBUTING.md: formalise glossary link rule with required-terms table
  and add checklist item for glossary links

* docs: add integrations/langchain-and-smolagents.md

Covers two integration patterns:
- MelleaTool.from_langchain() — wrap any LangChain BaseTool for use in Mellea
- MelleaTool.from_smolagents() — wrap smolagents tools (pip install 'mellea[smolagents]')
- Seeding ChatContext from LangChain message history via convert_to_openai_messages

Add to docs.json nav after m-serve; update m-serve and metrics-and-telemetry
nav footers to reflect new page position.

* docs: split bedrock-and-watsonx into separate bedrock.md and watsonx.md

AWS Bedrock and IBM WatsonX are distinct platforms with different auth,
packages, and model IDs. Each now has its own page.

Nav chain: openai → bedrock → watsonx → huggingface-and-vllm
Redirect: /integrations/bedrock-and-watsonx → /integrations/bedrock

* docs: add how-to/use-images-and-vision.md; fix nav footer chain

Covers PIL image input via instruct()/chat(), ImageBlock for OpenAI backend,
multi-turn vision with ChatContext, and backend support matrix. Sources verified
against vision_ollama_chat.py and vision_openai_examples.py examples.

Also fix pre-existing nav bug: ollama.md Previous was pointing to
write-custom-verifiers, skipping configure-model-options entirely.

Nav chain: configure-model-options → use-images-and-vision → ollama

* docs: fix landing page card, add ImageBlock to glossary, improve backend pages

- index.mdx: split single "Bedrock / watsonx" card into separate AWS Bedrock
  and IBM WatsonX cards pointing to the correct split pages
- glossary.md: add ImageBlock entry (used by use-images-and-vision.md)
- bedrock.md: add glossary links for Backend/MelleaSession on first prose use;
  add Vision support section noting image input works via OpenAI-compatible path
- watsonx.md: add glossary links for start_session/Backend on first prose use;
  add Vision support section noting WatsonxAIBackend does not support images

* docs: split huggingface-and-vllm into separate huggingface.md and vllm.md

- Create integrations/huggingface.md (LocalHFBackend, device selection, KV cache,
  aLoRA, vision, troubleshooting)
- Create integrations/vllm.md (LocalVLLMBackend, batched inference, vision, troubleshooting)
- Delete integrations/huggingface-and-vllm.md
- docs.json: replace combined entry with huggingface + vllm; add redirect for old URL
- index.mdx: split single card into separate HuggingFace and vLLM cards
- Update nav footers: watsonx.md Next, mcp.md Previous

* docs: split langchain-and-smolagents into separate langchain.md and smolagents.md

- Create integrations/langchain.md (tool bridging, message history bridge, comparison table)
- Create integrations/smolagents.md (tool bridging, comparison table)
- Delete integrations/langchain-and-smolagents.md
- docs.json: replace combined entry with langchain + smolagents; add redirect for old URL
- Update nav footers: m-serve.md Next, metrics-and-telemetry.md Previous

* docs: reorganise nav — rename Core Reference to Guides, co-locate m-serve, fix section assignments

- Rename "Core Reference" → "Guides" (all 6 pages were diataxis how-to, not reference)
- Move m-serve from Integrations → Guides alongside m-decompose (both first-party CLI tools)
- Move handling-exceptions from Evaluation and Observability → How-To (it's a coding how-to, not observability)
- Reorder Integrations: local (ollama, huggingface, vllm) → cloud (openai, bedrock, watsonx) → protocol/frameworks (mcp, langchain, smolagents)

All 102 nav pages verified to exist on disk.

* docs: float mascot logo left so intro paragraph wraps alongside it

* docs: remove redundant Previous/Next footer nav (Mintlify handles this)

* docs: remove Discord link from landing page

* docs: expand ModelOutputThunk glossary entry with value, async, and streaming details

* docs: remove .md extensions from internal links so Mintlify renders pages correctly

* chore: trigger Mintlify rebuild

* fix: use jsx styles on index.mdx

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

* docs: remove duplicate H1 headings — Mintlify renders frontmatter title automatically

* docs: add 10 new glossary entries and first-use cross-links

* docs: add prefix-caching-and-kv-blocks page, KV smashing + SimpleLRUCache glossary entries

* docs: add tutorials 02-03, LLM-as-a-judge how-to, and new glossary entries

Add three new content pages:
- tutorials/02-mifying-legacy-code: five-step tutorial on @mify: query
  and transform existing Python objects with m.query() and m.transform(),
  stringify_func, fields_include, funcs_include, and ad-hoc mify(obj)
- tutorials/03-using-generative-slots: five-step tutorial on @generative:
  Literal/Pydantic returns, pipeline composition, ChatContext injection,
  m.reset(), and pre/postcondition validation patterns
- evaluation-and-observability/evaluate-with-llm-as-a-judge: how-to
  covering default LLMaJ behavior, standalone m.validate(), GenerateLog
  capture, purple elephant effect with check(), simple_validate bypass,
  combined checks, and SamplingResult metadata

Also:
- Add all three pages to docs.json nav
- Add GenerateLog, LLM-as-a-judge, and Purple elephant effect to glossary
- Add first-use glossary cross-links and full example pointers in each page

* docs: add 14 new pages, fix nav, update AGENTS.md writing guide

New pages:
- tutorials/04-making-agents-reliable (ReAct, requirements, GuardianCheck)
- how-to/refactor-prompts-with-cli (m decompose workflow)
- how-to/unit-test-generative-code (pytest markers, TestBasedEval)
- integrations/vertex-ai (LiteLLMBackend, vertex_ai/ model strings)
- advanced/custom-components (Component protocol, TemplateRepresentation)
- evaluation-and-observability/opentelemetry-tracing (spans, OTLP, Jaeger)
- examples/index + 4 example pages (data-extraction, legacy-code, rag, telemetry)
- community/contributing-guide, building-extensions, code-of-conduct
- troubleshooting/faq (10 Q&A)

Fixes:
- tutorials/01: broken Next steps links; model-config review note added
- docs.json: handling-exceptions moved to Eval & Observability (was How-To)
- docs.json nav: all new pages registered
- glossary: ComponentParseError, GuardianRisk, GuardianCheck expanded
- AGENTS.md: Section 10 "Writing Docs" added with key conventions

* docs: fix lint, complete review items, add missing strategy docs

- Fix MD012 multiple-blank-lines in 20 files (trailing double blank lines)
- Fix MD028 blank-line-inside-blockquote in smolagents.md
- vertex-ai.md: replace "Hendrik please confirm" review note with
  verified LiteLLM docs — vertex_project/vertex_location keys are correct
  and override env vars at call time
- inference-time-scaling.md: remove two "review needed" notes on
  BudgetForcingSamplingStrategy and MajorityVotingStrategyForMath;
  add source-verified parameter docs for both
- inference-time-scaling.md: add sections for RepairTemplateStrategy,
  MultiTurnStrategy, and BaseSamplingStrategy (all in __all__ but
  previously undocumented)

* docs: add missing glossary entries for new sampling strategies and PythonExecutionReq

- Sampling strategy table: add RepairTemplateStrategy, MultiTurnStrategy,
  MBRDRougeLStrategy, BaseSamplingStrategy, correct MajorityVoting name to
  MajorityVotingStrategyForMath
- Requirement entry: document PythonExecutionReq (code execution validator)
  with import path and key parameters

* chore: delete legacy MDX files — replaced by new docs structure

* ci: add markdownlint docs-lint job to CI

* ci: add markdownlint pre-commit hook for docs/

* docs: refresh landing page cards; add index.mdx reminder to CONTRIBUTING checklist

- Key patterns: swap MCP card for Tools and agents (@tool, MelleaTool, react())
- How-to guides: swap Handling exceptions for Use images and vision
- Backends: add LiteLLM / Vertex AI card
- CONTRIBUTING.md checklist: add item to review landing page cards when adding a major page

* docs: separate contributor vs user content; fix internal references

unit-test-generative-code.md:
- Add single top-of-page callout directing Mellea contributors to
  contributing-guide#testing; remove all other contributor callouts
- Rewrite session fixture using plain OllamaModelBackend (no gh_run)
- Rewrite module markers section as generic user guidance with pyproject.toml snippet
- Rewrite CI strategy section with a user-owned conftest.py pattern (CI=true)
  instead of Mellea's internal CICD=1 convention

traced-generation-loop.md:
- Replace dead internal reference docs/dev/telemetry.md (deleted file)
  with link to user-facing OpenTelemetry Tracing page

mellea-core-internals.md:
- mfuncs async row: "Mellea contributors" → "Advanced users building async pipelines"

template-formatting.md:
- "contributors and advanced users" → "advanced users and library authors"

* docs: improve landing page key patterns and backends grid

- Key patterns: remove 'Generative slots' (concept already in How it works section)
  replace 'Intrinsics and adapters' (too advanced/niche) with:
  - Async and streaming (use-async-and-streaming)
  - Safety checks (GuardianCheck via tutorial 04)
- Backends: add LangChain as 8th card — makes even 4+4 grid

* docs: add streaming/async tutorial; promote to T02, demote mify to T05

Add 02-streaming-and-async.md covering ainstruct(), streaming with
ModelOption.STREAM/astream(), concurrent batch processing with
wait_for_all_mots, mixed parallel/sequential pipelines, and context
behaviour with async.

Rename 02-mifying-legacy-code.md → 05-mifying-legacy-code.md so the
main onboarding path (01 → 02 → 03 → 04) builds from universal async
patterns to agents before introducing the Mellea-specific @mify feature.

Update Tutorial 04 prerequisites to include Tutorial 02, since Step 7
introduces asyncio and react(). Update docs.json nav.

* docs: add RAG how-to; expand examples index to all categories

Add how-to/build-a-rag-pipeline.md covering the full RAG pattern:
embedding and indexing, vector search, @generative bool relevance filter,
grounding_context for grounded generation, IVR requirements on the answer,
and optional GuardianCheck groundedness verification. Includes a tuning
table and a complete worked example.

Expand examples/index.md from 4 documented examples to a comprehensive
catalogue of all example categories, grouped by area (core concepts, data,
agents, safety, integrations, performance, multimodal, observability,
experimental). Preserves the existing 4 walkthrough pages at the top.

Register build-a-rag-pipeline in docs.json How-To nav group.

* docs: add cross-linking guideline for paired explanation/how-to pages

When a feature has both a concepts/ explanation and a guide/ or how-to/
page, contributors should add a brief cross-link near the top of each so
readers who land on either page can find the other. Adds the guideline
under Diataxis classification and a PR checklist item.

* docs: merge Guides into How-To; rename Advanced to Deep Dives

Removes the Guides nav section — all 6 pages were how-to content
and are now merged into a single How-To section (15 pages total).
Core feature how-tos lead; task-specific how-tos follow.

Moves integrations/m-serve from Guides to the Integrations section
where its path already placed it logically.

Renames Advanced to Deep Dives to signal optional/technical depth
rather than implying a content type distinct from How-To.

* docs: revert Advanced rename — keep as Advanced pending discussion

* docs: add cross-links between paired explanation and how-to pages

Add concept overview / practical usage callouts to the two clear
paired page sets:
- concepts/generative-functions ↔ guide/generative-functions
- concepts/context-and-sessions ↔ how-to/use-context-and-sessions

* docs: nav reorder, glossary additions, and CONTRIBUTING fixes

Move Examples section to position 5 (after How-To, before Integrations)
so runnable code follows concept and how-to content in the learning path.

Glossary: add grounding_context and wait_for_all_mots entries.

Fix CONTRIBUTING guide violations in new pages:
- tutorials/02: add --- footer separator; link ModelOutputThunk,
  start_session()/SimpleContext/ChatContext on first use
- how-to/build-a-rag-pipeline: link @generative, GuardianCheck,
  grounding_context on first use; change Note to Backend note
  for the Ollama-specific GuardianCheck requirement
- examples/resilient-rag-fallback: link @generative and grounding_context
  on first use; add missing navigation footer

* docs: correct Navigation footer guideline — Mintlify generates prev/next automatically

* docs: fix footers, add cross-links, and standardise imports

- Fix how-to/build-a-rag-pipeline: ## See also H2 → **See also:** bold footer
- Add RAG how-to card to index.mdx (How-To section, 7 of 8 cards)
- Add paired explanation/how-to cross-links:
  - concepts/requirements-system ↔ how-to/write-custom-verifiers
  - concepts/mobjects-and-mify ↔ tutorials/05-mifying-legacy-code
- Add **See also:** footers to 12 pages missing them:
  guide/act-and-aact, guide/backends-and-configuration,
  guide/generative-functions, guide/m-decompose, guide/tools-and-agents,
  guide/working-with-data, how-to/use-async-and-streaming,
  tutorials/01, tutorials/03 (add --- separator), tutorials/04,
  examples/data-extraction-pipeline, examples/legacy-code-integration
- Convert ## Next steps / ## What to try next H2 headings to **See also:**
  inline format (tutorials/01, tutorials/04) or bold text (examples)
- Standardise import style in build-a-rag-pipeline to match example:
  import mellea → from mellea import generative, start_session

* docs: fix three code correctness issues found in review

1. tutorials/03-using-generative-slots: add missing from typing import Literal
   to step 2 code block — FeedbackAnalysis uses Literal but the import was absent,
   causing a NameError if the block was run standalone.

2. tutorials/02-streaming-and-async: remove dead code from step 4 — FeedbackIssues
   class and extract_issues @generative function were defined but never called; the
   pipeline used m.ainstruct() directly for extraction instead.

3. examples/resilient-rag-fallback: fix create_index() using global docs instead
   of parameter ds in the documentation page. Code worked by coincidence (always
   called with docs) but would silently ignore any other dataset passed in.
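A minimal sketch of the create_index() bug described in item 3, assuming a stand-in encode() helper (hypothetical; the real example uses an embedding model). It shows why the buggy version looked correct only when it was called with the global docs:

```python
# Module-level corpus, mirroring the global `docs` named in the commit message.
docs = ["alpha", "beta"]


def encode(items):
    # Hypothetical stand-in for an embedding model: one number per item.
    return [len(item) for item in items]


def create_index_buggy(ds):
    # Bug: the `ds` parameter is ignored and the global `docs` is encoded.
    return encode(docs)


def create_index_fixed(ds):
    # Fix: encode the dataset that was actually passed in.
    return encode(ds)


other = ["a much longer document"]

same_when_called_with_docs = create_index_buggy(docs) == create_index_fixed(docs)
buggy_result = create_index_buggy(other)  # silently indexes `docs`, not `other`
fixed_result = create_index_fixed(other)
```

Both versions agree when called with docs, which is why the bug went unnoticed; they diverge as soon as any other dataset is passed.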

Also removes spurious double --- separator from how-to/build-a-rag-pipeline footer.

Note: the same bug exists in docs/examples/rag/simple_rag_with_filter.py (source
file). That fix is tracked separately — committing the Python file here would
trigger the mypy hook which currently fails on pre-existing optional-dependency
import-not-found errors in mellea/backends/ that are unrelated to this change.

* docs: fix broken links, shell quoting, and add validation tooling

Address reviewer-reported issues from PR generative-computing#601 review:
- Convert 22 relative ../../examples/ links to absolute GitHub URLs
  (Mintlify only serves docs/docs/, so relative paths 404 on the site)
- Fix 5 other broken links (docs.json navbar, CONTRIBUTING placeholders,
  building-extensions API link, glossary docling URL, README escaping)
- Quote all [extras] in pip/uv install commands for zsh compatibility
  (26 instances across 12+ files)
- Fix simple_rag_with_filter.py: encode(docs) → encode(ds) parameter bug

Add review tooling:
- docs/scripts/check_docs.py: standalone validation script (stdlib only)
  checking links, Python code blocks, and shell quoting
- docs/PR601-REVIEW.md: review comment tracker

* docs: add missing imports to tutorial snippets and fix title casing

- tutorials/03: add `from mellea import generative` to Steps 1-3 code blocks
- tutorials/05: add `import mellea` and mify import to Steps 2-5 code blocks
- concepts/generative-functions.md: "functions" → "Functions" in title

Addresses reviewer comments M1-M4 and C3.

* docs: fix missing start_session import in FAQ code blocks

Two FAQ answers imported `generative` but used `start_session()` without
importing it. Found by check_docs.py, not flagged by reviewers.

* docs: fix 5 runtime errors found in PR review (E2, E4, E6, E7, E8)

- E2: Remove unnecessary MelleaTool.from_callable() wrapping — @tool
  decorated functions are already MelleaTool objects
- E4: Fix result.body → result.parsed_repr.body on ModelOutputThunk
- E6: Fix langchain.tools → langchain_core.tools import path
- E7: Fix mellea.stdlib.docs → mellea.stdlib.components.docs import path
- E8: Replace broken Document example with working Message approach;
  filed generative-computing#636 for the underlying Document.parts() bug

* docs: enhance import validation to check full mellea module paths

Previously check_docs.py only validated the top-level package name
(e.g. `mellea` exists), missing incorrect submodule paths like
`mellea.stdlib.docs` (should be `mellea.stdlib.components.docs`).

Now walks the filesystem to verify each dotted component resolves
to a real package directory or .py file. Would have caught E7 from
the PR review mechanically.
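A rough sketch of the filesystem walk described above, not the actual check_docs.py code. The function name module_path_exists and the throwaway directory tree are assumptions made for illustration:

```python
import tempfile
from pathlib import Path


def module_path_exists(root: Path, dotted: str) -> bool:
    """Return True if every component of a dotted module path resolves on disk."""
    current = root
    parts = dotted.split(".")
    for index, part in enumerate(parts):
        package_dir = current / part
        module_file = current / f"{part}.py"
        if package_dir.is_dir():
            # Intermediate or final component is a package directory.
            current = package_dir
        elif module_file.is_file() and index == len(parts) - 1:
            # Only the last component may be a plain .py module.
            return True
        else:
            return False
    return True


# Build a small fake package tree so the sketch runs without mellea installed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    docs_pkg = root / "mellea" / "stdlib" / "components" / "docs"
    docs_pkg.mkdir(parents=True)
    (docs_pkg / "richdocument.py").touch()

    good = module_path_exists(root, "mellea.stdlib.components.docs")
    bad = module_path_exists(root, "mellea.stdlib.docs")
    leaf = module_path_exists(root, "mellea.stdlib.components.docs.richdocument")
```

Checking only the top-level name would accept both paths; walking each component rejects the second one.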

* docs: address review comments — installation rewrite, WatsonX deprecation, tutorial cleanup

- C1: reword landing page intro per reviewer suggestion
- C4: remove dead blog link in requirements-system
- C5: document grounding_context arbitrary key convention
- C6: add sample output to tutorial 01 Step 1
- C7: remove duplicate @generative steps from tutorial 01 (covered in tutorial 03)
- C9: clarify ChatContext deprecation warning in tutorial 02
- E3: add ddgs + langchain-community install note
- E5: add smolagents install note
- E9: add mkdir and model-size warnings to m-decompose
- I2+I3: rewrite installation.md with pip/uv as equals, add mellea[all] note
- I6: replace WatsonX backend section with deprecation notice
- Add deprecation banner to integrations/watsonx.md
- Update PR601-REVIEW.md tracker with root cause analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: bump Python version references from 3.10+ to 3.11+

Upstream merged generative-computing#603 (move off python 3.10), pyproject.toml now requires >=3.11.
Update all doc references to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: fix angle-bracket email parsing error in code-of-conduct

Mintlify treats `<email@example.com>` as JSX/HTML tags, causing a parse
error at line 88. Use markdown link syntax instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Co-authored-by: Paul S. Schweigert <paul@paulschweigert.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add OTLP logging export

Adds OTLP logging handler that exports logs to OpenTelemetry collectors.
Configured via MELLEA_LOGS_OTLP and OTEL_EXPORTER_OTLP_LOGS_ENDPOINT
environment variables. Integrates with existing FancyLogger infrastructure.

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* refactor: rename get_otlp_handler to get_otlp_log_handler

Renamed function for clarity to indicate it's specifically for log handling.
Updated all references in telemetry module, core utils, and tests.

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
…ative-computing#646)

* docs: implement publishing pipeline (generative-computing#617)

Add docs-publish.yml workflow that builds, validates, and deploys
docs to orphan deployment branches (docs/staging from main,
docs/production from releases). Replaces docs-autogen-pr.yml.

Includes pre-commit hooks for MDX validation and docstring quality,
docs/PUBLISHING.md strategy document, and .gitignore entries for
generated API docs.

Validation is soft-fail by default, with a strict_validation toggle.
workflow_dispatch supports force_publish for testing.

* docs: use docs/preview as default workflow_dispatch target

Safer default for manual dispatch — testing deploys to docs/preview
rather than docs/staging. The automatic paths (push→staging,
release→production) are unaffected.

* docs: add label-gated preview deployment for PRs

PRs with the 'docs-preview' label deploy to docs/preview branch.
PRs without the label only run build + validation (no deploy).
Also triggers on 'labeled' event so adding the label fires the workflow.

* docs: document preview deployment and docs-preview label

Update PUBLISHING.md with docs/preview branch, label-based PR
deployment, and updated manual dispatch instructions.

* docs: document fork PR limitation and fix markdownlint warnings

Fork PRs can't deploy because GITHUB_TOKEN lacks write access to
upstream. Documented the workaround (manual dispatch or push to
upstream). Fixed MD032 blank line warnings.

* docs: clarify Mintlify must use root path on deployment branches

* docs: rename workflow to Docs, remove duplicate docs-lint from ci.yml

markdownlint is already in Docs/build-and-validate with proper path
filtering. Running it in ci.yml on every PR regardless of changed
files was wasteful and redundant.

* ci: add job summaries to Docs and code-checks workflows

Each validation step now tees its output to a temp log file. A final
"Write job summary" step in both workflows writes a markdown table to
$GITHUB_STEP_SUMMARY showing pass/fail per sub-check with key stats
(lint issues, coverage %, symbol counts, test pass/fail counts).

Collapsible <details> blocks include the full output for deeper
inspection without cluttering the summary view.
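The summary format described above can be sketched as follows. This is an illustration only: the workflows most likely do this in shell, and the row values, the render_summary and collapsible helpers, and the fallback file path are all assumptions. GITHUB_STEP_SUMMARY is the file path that GitHub Actions exposes to each step.

```python
import os
import tempfile


def render_summary(rows):
    """Render (check, passed, details) tuples as a markdown table."""
    lines = ["| Check | Result | Details |", "| --- | --- | --- |"]
    for check, passed, details in rows:
        result = "PASS" if passed else "FAIL"
        lines.append(f"| {check} | {result} | {details} |")
    return "\n".join(lines) + "\n"


def collapsible(title, log):
    """Wrap full log output in a collapsible <details> block."""
    return f"<details><summary>{title}</summary>\n<pre>{log}</pre>\n</details>\n"


# Hypothetical results for two sub-checks.
summary = render_summary(
    [
        ("Markdownlint", True, "0 issues"),
        ("API Coverage", False, "97% of symbols"),
    ]
)
details = collapsible("Markdownlint output", "no issues found")

# Outside GitHub Actions the variable is unset, so fall back to a temp file.
summary_path = os.environ.get("GITHUB_STEP_SUMMARY") or os.path.join(
    tempfile.gettempdir(), "step_summary.md"
)
with open(summary_path, "a", encoding="utf-8") as handle:
    handle.write(summary + "\n" + details)
```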

* ci: restore default pytest traceback (revert --tb=short)

* ci: add branch, trigger, PR number and run URL to deploy commit message

* docs: document deploy commit message format in PUBLISHING.md

* docs: rename 'deployment branches' to 'docs branches' per review feedback

* ci: opt into Node.js 24 for actions, add docstring quality to job summary

* ci: run docstring quality in CI, fix false-positive annotation, update action versions

audit_coverage.py emitted 'All documented symbols pass quality checks'
even when --quality was not passed (quality_issues always [] without
the flag). Now emits 'skipped' when --quality is absent, so the
annotation is accurate.
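The false positive can be illustrated with a small sketch. The function name and the wording of the "skipped" and "issue(s)" messages are assumptions; only the pass message is quoted from the description above:

```python
def quality_annotation(quality_requested: bool, quality_issues: list) -> str:
    """Pick the annotation text for the docstring quality check."""
    if not quality_requested:
        # Without --quality the issue list is always empty, so an empty list
        # must not be reported as a pass.
        return "Docstring quality: skipped"
    if quality_issues:
        return f"Docstring quality: {len(quality_issues)} issue(s)"
    return "All documented symbols pass quality checks"


skipped = quality_annotation(False, [])
passed = quality_annotation(True, [])
failed = quality_annotation(True, ["missing Args section"])
```

The key point is that the flag is checked before the issue list, so "no issues because nothing ran" and "no issues found" are reported differently.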

Add --quality to the CI audit step so quality is actually run and
surfaced in the job summary.

Update actions to latest versions (Node.js 24 compatible):
- actions/checkout v4 → v6
- astral-sh/setup-uv v5 → v7
- actions/upload-artifact v4 → v7
- actions/download-artifact v4 → v8

* ci: fix job summary layout and docstring quality presentation

- Split quality row into proper Result/Details columns matching table header
- Strip 'run locally' from annotation message (we run it in CI now)
- Add separate 'Docstring quality details' collapsible extracted from
  the coverage log so quality issues are easy to expand inline

* ci: add job summary to deploy step showing destination branch and status

* ci: remove per-item docstring quality annotations

Individual annotations per missing docstring don't point to diff lines
so they just create noise in the Annotations panel (14+ entries). The
single summary annotation and the job summary collapsible provide
sufficient visibility. Consistent with markdownlint and MDX validation
which only emit one top-level result, not per-issue annotations.

* ci: strip ANSI codes from job summary logs; defer docstring quality pre-commit hook

- Strip ANSI escape codes from captured logs before writing to GITHUB_STEP_SUMMARY
  so collapsible sections render as clean text rather than raw escape sequences
- Move docs-docstring-quality pre-commit hook to stages: [manual] with a TODO
  pointing to generative-computing#616 — Griffe loads the full package (~10s) which is too slow for
  normal commit flow; re-enable once quality issues reach 0
- CI audit still checks all symbols including methods (no --no-methods) so the
  job summary reports the full picture; hard-fail deferred to generative-computing#616
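As a sketch of the ANSI stripping in the first bullet (the workflow itself may well use a shell tool instead; this regex and the strip_ansi name are assumptions), CSI escape sequences such as color codes can be removed like this:

```python
import re

# Matches CSI sequences: ESC [ + parameter bytes + intermediate bytes + final byte.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences so logs render as plain text."""
    return ANSI_ESCAPE.sub("", text)


colored = "\x1b[31mFAIL\x1b[0m 3 issues"
clean = strip_ansi(colored)
```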

* ci: check all symbols including methods in docstring quality audit

* ci: improve job summary detail rows and fix duplicate quality output

- MDX Validation row now shows per-check error counts (e.g. "1 syntax error(s),
  2 broken link(s)") instead of just PASS/FAIL
- Markdownlint row already showed issue count; API Coverage row shows pct + symbols
- Fix: coverage log is split at the quality section boundary so the docstring
  quality details collapsible no longer duplicates the coverage output

* ci: raise collapsible log limits (100k for quality, 5k for others)

* ci: raise docstring quality collapsible cap to 1MB

* docs: clarify docs branches retain no history between deploys

* docs: add --quality to local audit command to match CI behaviour
* docs: add Qiskit code validation IVR example

Add comprehensive example demonstrating Instruct-Validate-Repair pattern
for Qiskit quantum computing code generation with external validation.

Features:
- Pre-condition validation (prompt and input code)
- Post-condition validation using flake8-qiskit-migration
- Automatic repair loop with detailed error feedback
- Code extraction from markdown blocks
- Real-world use case: fixing deprecated Qiskit APIs

Updated README with example description and requirements.

Co-authored-by: va <va@us.ibm.com>
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* refactor: review feedback

* fix: format code

run ruff format

Signed-off-by: va <va@us.ibm.com>

* docs: add script block and fix uv run command in qiskit example

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* refactor: extract validation helpers into separate module

- Move helper functions to validation_helpers.py
- Reorganize into qiskit_code_validation/ subdirectory
- Prepare structure for additional documentation

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* docs: add README and improve qiskit example configuration

- Add comprehensive README with example prompts and troubleshooting
- Move configuration inline for easier editing
- Update parent README to document subdirectory
- Add 'aer' to codespell ignore list for qiskit_aer package

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* docs: add Future Work section to Qiskit IVR example

Add concise documentation of planned enhancements:
- MultiTurnStrategy integration for conversation-based repair
- Grounding context to enable smaller models

Addresses remaining work items from PR generative-computing#576 review discussion.

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
Signed-off-by: va <va@us.ibm.com>
Co-authored-by: va <va@us.ibm.com>
Co-authored-by: //va <vabarbosa@users.noreply.github.com>
…mocks (generative-computing#567)

* test: isolate astream_incremental tests from CI

Fixes generative-computing#562

* test: add deterministic mock tests for astream incremental logic

Introduces `test_astream_mock.py` to test `ModelOutputThunk`'s async queue incremental streaming logic deterministically without relying on highly variable LLM backends.

* test: adapt astream mock tests to upstream incremental semantics

Update tests to match the astream() behavior change from PR generative-computing#618:
- astream() now always returns incremental content (including final call)
- astream() on a computed MOT raises RuntimeError

* chore: trigger CI rebuild
…generative-computing#652)

The dictionary keys were generator expressions, which hash by object identity rather than by content, so no two keys ever compared equal and duplicates were not detected. Wrapping each key in tuple() creates a hashable key that compares by value, so records are deduplicated properly.
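The deduplication problem can be reproduced in a few lines. The records and field names below are made up for illustration and are not taken from the affected code:

```python
records = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "a"},  # exact duplicate of the first record
    {"id": 2, "name": "b"},
]
fields = ("id", "name")

# Generator keys: each generator is a distinct object, so every key is unique.
by_generator = {}
for record in records:
    key = (record[field] for field in fields)
    by_generator[key] = record

# Tuple keys: equal contents give equal keys, so duplicates collapse.
by_tuple = {}
for record in records:
    key = tuple(record[field] for field in fields)
    by_tuple[key] = record

generator_count = len(by_generator)
tuple_count = len(by_tuple)
```

With generator keys all three records survive; with tuple keys only the two distinct ones remain.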

Closes generative-computing#651

Signed-off-by: Yannis Katsis <35782820+yannisk2@users.noreply.github.com>
…generative-computing#614)

* docs: expand module-level docstrings for API reference (generative-computing#612)

Replace one-liner and missing module docstrings across 53 files with
substantive 2-4 sentence descriptions covering each module's purpose,
key exports, and when to reach for it. Covers all three priority tiers
from the issue: __init__.py landing pages, undocumented modules, and
short stubs that merely restated the module name.

* fix: replace EN DASH with hyphen in docstrings (RUF002)

* docs: add Google-style Args/Returns to public functions (generative-computing#613)

Add missing ``Args:`` and ``Returns:`` sections to all public
functions and methods across ``mellea/`` and ``cli/`` that lacked
them. Also converts RST-style ``:param:``/``:returns:`` docstrings
inherited from the ``granite_io`` upstream into Google style to
match the project convention (``pyproject.toml`` ``convention =
"google"``).

* docs: expand docstrings for public classes and functions (generative-computing#613)

Add or expand Google-style docstrings across CLI and library modules:

- cli/alora/train.py: add docstrings to load_dataset_from_json,
  formatting_prompts_func, SaveBestModelCallback, SafeSaveTrainer,
  train_model
- cli/decompose/decompose.py: add class docstring to DecompVersion,
  expand run() with full Args section
- cli/decompose/pipeline.py: add class docstrings to ConstraintResult,
  DecompSubtasksResult, DecompPipelineResult, DecompBackend; add full
  docstring with Args/Returns to decompose()
- cli/decompose/utils.py: add Args/Returns to validate_filename
- cli/eval/commands.py: add full docstring with Args to eval_run
- cli/m.py: expand callback docstring
- mellea/backends/adapters/adapter.py: expand LocalHFAdapter docstring
- mellea/backends/cache.py: expand SimpleLRUCache docstring
- mellea/backends/tools.py: expand SubscriptableBaseModel docstring;
  fix json_extraction to use Returns: instead of Yields:
- mellea/core/backend.py: expand Backend docstring; add Args/Returns
  to generate_walk
- mellea/core/base.py: add Args/Returns to blockify and
  get_images_from_component
- mellea/core/utils.py: expand RESTHandler, JsonFormatter, FancyLogger
  class docstrings
- mellea/helpers/openai_compatible_helpers.py: add Args/Returns to
  message_to_openai_message and messages_to_docs
- mellea/stdlib/components/docs/richdocument.py: expand TableQuery
  and TableTransform class docstrings
- mellea/stdlib/components/genslot.py: expand Argument, Function,
  Arguments, GenerativeSlot class docstrings; add SyncGenerativeSlot
  class docstring
- mellea/stdlib/components/mobject.py: expand Query and Transform
  class docstrings
- mellea/stdlib/requirements/requirement.py: add Args sections to
  req() and check()
- mellea/stdlib/sampling/budget_forcing.py: expand
  BudgetForcingSamplingStrategy class docstring

* docs: add missing Args: sections to class-level docstrings (generative-computing#613)

Add Args: sections to class docstrings that had Attributes: but no
Args:, completing Google-style docstring coverage:

- core/requirement.py: ValidationResult, Requirement
- core/sampling.py: SamplingResult
- core/base.py: CBlock, ImageBlock, ModelOutputThunk, ContextTurn,
  TemplateRepresentation, GenerateLog, ModelToolCall
- formatters/template_formatter.py: TemplateFormatter
- formatters/granite/intrinsics/input.py: IntrinsicsRewriter
- formatters/granite/intrinsics/output.py: TransformationRule,
  TokenToFloat, DecodeSentences, MergeSpans
- formatters/granite/retrievers/embeddings.py: InMemoryRetriever
- stdlib/components/chat.py: Message, ToolMessage
- helpers/async_helpers.py: ClientCache

* docs: remove redundant Attributes: entries and drop TASK.md (generative-computing#613)

Remove Attributes: sections where fields duplicate constructor params
verbatim (same name, same type, same description), keeping only entries
that add genuinely new information (computed/derived attributes):

- mellea/backends/backend.py: FormatterBackend
- mellea/core/base.py: CBlock, ImageBlock, ModelOutputThunk, ContextTurn
- cli/eval/runner.py: InputEvalResult, TestEvalResult (keep computed
  properties passed_count, total_count, pass_rate)

Also removes TASK.md left over from development.

* chore: upgrade mypy to 1.19.1 to fix cpex import-not-found in CI

mypy 1.18.2 reports import-not-found for cpex.framework.* (which has
no py.typed marker). 1.19.1 handles this correctly and was the version
used when PR generative-computing#582 passed CI.

* docs: fix pipeline docstring - call count depends on constraints not fixed at five

* docs: apply Option C docstring convention — Args: on class only, clean Attributes: (generative-computing#613)

- Remove Args: from __init__ docstrings across 60 classes; Args: moved to (or
  already present on) the class docstring so the docs pipeline (which skips
  __init__) and IDE hover both show the full parameter list without duplication
- Review Attributes: on 46 classes: remove pure-echo entries that repeat Args:
  verbatim; retain sections where stored values differ in type or behaviour
  (type transforms such as str→CBlock, class constants such as YAML_NAME,
  computed values such as SamplingResult.result_index)
- audit_coverage.py: drop no_attributes check (now optional by design); add
  duplicate_init_args check to catch Args: appearing in both class and __init__
- CONTRIBUTING.md: add class/__init__ docstring placement section with example
- AGENTS.md: expand Google-style docstring bullet with Option C rule; remove
  duplicate line
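
As an illustration of the Option C placement rule above, here is a small sketch. `TaggedItem` is a made-up class, not one from the repository: `Args:` lives on the class docstring, `__init__` carries only a summary sentence, and `Attributes:` is kept only where the stored value differs from the constructor argument.

```python
class TaggedItem:
    """A named item with a set of tags (hypothetical example class).

    Args:
        name (str): Display name of the item.
        tags (list[str]): Tags to attach; duplicates are allowed.

    Attributes:
        tags (frozenset[str]): The tags after deduplication. Listed here
            because the stored type differs from the constructor argument.
    """

    def __init__(self, name: str, tags: list[str]) -> None:
        """Initialize the item."""
        self.name = name
        self.tags = frozenset(tags)
```

`name` is not repeated under `Attributes:` because it would only echo the `Args:` entry.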

* docs: remove isolated venv from generate-ast.py, fix audit_coverage.py (generative-computing#613)

- generate-ast.py: remove isolated .venv-docs-autogen — use the project venv
  (sys.executable) directly; mdxify and mellea are already installed via
  uv sync --all-extras --group dev; keep clean MDXIFY_CWD for import safety;
  remove --no-venv / --pypi-name / --pypi-version args and ensure_venv /
  pip_install helpers; update module docstring
- audit_coverage.py: fix no_class_args to filter variadic *args/**kwargs by
  ParameterKind (matching the existing function check) so SimpleComponent
  (**kwargs) is not falsely flagged; update find_documented_symbols to match
  mdxify 0.2.37 heading format (### `SymbolName`)
- mellea/core/utils.py: add Args: to RESTHandler class docstring (missed in
  Option C sweep); remove pure-echo Attributes: section
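
The variadic filter mentioned for `no_class_args` can be sketched as follows. This is an approximation using the standard library's `inspect` module rather than Griffe (which the audit script uses), and `concrete_params` and `KwargsOnly` are hypothetical names.

```python
import inspect

# Parameter kinds that do not need their own Args: entry.
_VARIADIC = {inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD}


def concrete_params(func) -> list[str]:
    """Return parameter names, skipping self/cls and *args/**kwargs."""
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind not in _VARIADIC and name not in ("self", "cls")
    ]


class KwargsOnly:
    """Stand-in for a class whose constructor takes only **kwargs."""

    def __init__(self, **kwargs):
        self.fields = kwargs


def mixed(a, *args, b=1, **kwargs):
    return a, args, b, kwargs
```

With this filter a `**kwargs`-only constructor has no concrete parameters, so a check for a missing `Args:` section would have nothing to flag.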

* docs: add docstring validation guidance to CONTRIBUTING.md (generative-computing#613)

Add a 'Validating docstrings' subsection under the class docstring
placement rule with:
- audit_coverage.py command to run after generate-ast.py
- Table of key quality checks (no_class_args, duplicate_init_args, etc.)
- Three real classes to hover over in VS Code to verify IDE UX

* docs: add class/__init__ docstring placement rule to docs guide (generative-computing#613)

The docs contribution guide only showed the function docstring pattern.
Add the class docstring placement rule (Option C): Args: on class only,
__init__ gets a summary sentence, Attributes: only for genuine transforms.
Link back to CONTRIBUTING.md for the full validation workflow.

* docs: update nav with plugins/telemetry groups (generative-computing#613)

- docs/docs/docs.json: add plugins and telemetry/logging groups to API
  Reference nav (generated by pipeline run)
- .gitignore: add .venv-docs-autogen/ (leftover from earlier failed run,
  no longer created by generate-ast.py)
- docs/PR614-review-summary.md: add full review summary for agents and
  reviewers covering problem, Option C rationale, changes made, and
  known issues

* fix: remove --no-venv arg from build.py calls to generate-ast.py

generate-ast.py no longer accepts --no-venv (removed when the isolated
venv was dropped in favour of sys.executable). build.py was still
passing it, breaking the docs CI build.

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

---------

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…tive-computing#654) (generative-computing#664)

* docs: fix missing docstring sections in plugins and telemetry (generative-computing#654)

Add Args/Returns/Raises sections to ~15 functions and Args to 4
classes in mellea/plugins/ and mellea/telemetry/tracing.py.
Expand short HookType docstring. Convert RST cross-references
to Google style. Audit now reports 0 quality issues.

* docs: remove spurious Returns from contextmanager generators

trace_application() and trace_backend() are @contextmanager generators
that already have Yields: sections. Remove incorrect Returns: sections
per Google docstring style.
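
For reference, a `@contextmanager` generator documented with `Yields:` and no `Returns:` might look like this. `trace_section` is a hypothetical function, not the real `trace_application()` or `trace_backend()`.

```python
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def trace_section(name: str) -> Iterator[dict]:
    """Open a trace section for the duration of a with-block.

    Args:
        name (str): Label for the section.

    Yields:
        dict: A mutable record describing the section.
    """
    record = {"name": name, "closed": False}
    try:
        yield record
    finally:
        # Runs when the with-block exits, normally or via an exception.
        record["closed"] = True
```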
…rative-computing#663)

- Move plugins.mdx from core-concept/ to concepts/ (matches existing
  convention; all other core-concept/ pages already migrated)
- Add to Concepts group in docs.json navigation
- Apply documentation standards: diataxis frontmatter, sentence-case
  headings, markdownlint fixes, See also footer, glossary links on
  first use of Component/Requirement/MelleaSession
- Fix broken internal link (/core-concept/interoperability → /integrations/mcp)
- Add glossary entries: Hook/HookType, Plugin, PluginSet
- Trim docs/dev/hook_system.md to internal design notes only, with
  pointer to user-facing page for usage docs
…inks (generative-computing#658)

RST-style ``Symbol`` in docstrings caused add_cross_references to
generate malformed link syntax: `[`Symbol`](url)` renders as inline
code rather than a clickable link in Mintlify.

- Add normalize_rst_backticks() pass in decorate_api_mdx.py (runs before
  add_cross_references) to convert ``x`` to `x` in MDX prose
- Add validate_rst_docstrings() in validate.py to scan source files and
  report occurrences as a warning (does not fail the build)

91 source files / 992 occurrences detected; fix is applied at build time.
Source cleanup tracked separately.
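
A rough sketch of what a normalisation pass of this kind could look like. The regular expression below is an assumption, not the code in `decorate_api_mdx.py`, and the real pass is described as applying to MDX prose only.

```python
import re

# Matches ``x`` where x contains no backticks and no newline.
_RST_LITERAL = re.compile(r"``([^`\n]+)``")


def normalize_rst_backticks(text: str) -> str:
    """Convert RST-style ``x`` literals into Markdown-style `x`."""
    return _RST_LITERAL.sub(r"`\1`", text)


print(normalize_rst_backticks("Returns a ``Backend`` or `None`."))
# Returns a `Backend` or `None`.
```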
…ncies) (generative-computing#665)

* docs: pre-release verification pass (generative-computing#645)

- Fix 7 missing Returns sections in core, plugins, stdlib docstrings
- Delete stale docs/index.md and docs/PR601-REVIEW.md artifacts
- Add stale-file detection to validate.py with tests
- Wire stale-file check into docs-publish workflow summary

* docs: add doc import validation and fix stale imports

- Add validate_doc_imports() to check mellea imports in doc code blocks
  resolve at import time (skips optional deps, handles submodule imports)
- Add validate_stale_files() integration with generate_report()
- Fix glossary.md: ChatContext/SimpleContext import paths
- Fix intrinsics.md: GraniteCommonAdapter → CustomIntrinsicAdapter
- Add 3 tests for doc import validation (17/17 pass)

* docs: distinguish generator Yields from Returns in docstring audit

The quality audit incorrectly flagged @contextmanager generators as
missing Returns sections. Generators (Generator, Iterator, etc.) should
use Yields, not Returns. Adds no_yields check kind and
_GENERATOR_RETURN_PATTERNS detection.

Fixes: trace_application/trace_backend no longer flagged for Returns.
New: json_extraction correctly flagged for missing Yields.
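
One way to picture the Yields/Returns distinction is a check on the return annotation text. The name `_GENERATOR_RETURN_PATTERNS` comes from the commit message, but its contents below and the `expected_section` helper are guesses for illustration only.

```python
import re
from typing import Optional

# Assumed pattern: annotations that indicate a generator-like return value.
_GENERATOR_RETURN_PATTERNS = re.compile(
    r"^(typing\.|collections\.abc\.)?(Async)?(Generator|Iterator|Iterable)\b"
)


def expected_section(return_annotation: Optional[str]) -> Optional[str]:
    """Name the docstring section a return annotation calls for, if any."""
    if return_annotation is None or return_annotation.strip() == "None":
        return None  # nothing to document
    if _GENERATOR_RETURN_PATTERNS.match(return_annotation.strip()):
        return "Yields"
    return "Returns"
```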

* docs: fix Yields section for json_extraction generator docstring
…g#551)

* Add uncertainty/certainty intrinsic support

Wire up the uncertainty intrinsic from ibm-granite/granite-lib-core-r1.0
with a high-level check_certainty() API. The intrinsic evaluates model
confidence in its response given a user question and assistant answer.

- Add check_certainty(context, backend) in core.py
- Extract shared call_intrinsic() helper into _util.py
- Update catalog to point uncertainty at granite-lib-core-r1.0
- Add test, example, and README entry

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* Fix uncertainty description in README

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* Rename _call_intrinsic to call_intrinsic

Drop the underscore prefix and alias — use call_intrinsic consistently
across _util.py, rag.py, core.py, and test_rag.py.

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* Rename GraniteCommonAdapter to IntrinsicAdapter in _util.py

Upstream PR generative-computing#571 renamed GraniteCommonAdapter to IntrinsicAdapter.
Update our _util.py to match before rebasing onto origin/main.

* Feat/requirement_check (#1)

* Add uncertainty/certainty intrinsic support

Wire up the uncertainty intrinsic from ibm-granite/granite-lib-core-r1.0
with a high-level check_certainty() API. The intrinsic evaluates model
confidence in its response given a user question and assistant answer.

- Add check_certainty(context, backend) in core.py
- Extract shared call_intrinsic() helper into _util.py
- Update catalog to point uncertainty at granite-lib-core-r1.0
- Add test, example, and README entry

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* Fix uncertainty description in README

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* Rename _call_intrinsic to call_intrinsic

Drop the underscore prefix and alias — use call_intrinsic consistently
across _util.py, rag.py, core.py, and test_rag.py.

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* test

* removed test file

* added req check intrinsic

* Update README.md

* Update _util.py

* updated files

---------

Co-authored-by: ink-pad <inkit.padhi@gmail.com>
Co-authored-by: manish-nagireddy <manish.nagireddy@ibm.com>

* fix: add comment on repo structures

* fix: linting

* fix rc repo name

* resolve rc references

* fix: field ref in req check intrinsic

---------

Co-authored-by: inkpad <inkit.padhi@ibm.com>
Co-authored-by: Manish Nagireddy <65432909+mnagired@users.noreply.github.com>
Co-authored-by: manish-nagireddy <manish.nagireddy@ibm.com>
Co-authored-by: jakelorocco <59755218+jakelorocco@users.noreply.github.com>
* docs: removed outdated tutorial.md

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

* dropping additional references to tutorial.md

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

---------

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
generative-computing#670)

* docs: fix MelleaPlugin/MelleaBasePayload missing from API coverage (generative-computing#667)

Replace dual if/else class definitions with dynamic base classes so
Griffe's static AST parser sees a single ClassDef node per class and
always picks up the authoritative docstring.
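
The difference between the two layouts can be seen with the standard `ast` module, used here as a stand-in for Griffe's static parser. The source strings are simplified illustrations (`HAS_FRAMEWORK` and `Plugin` are placeholder names) and are only parsed, never executed.

```python
import ast

DUAL = '''
if HAS_FRAMEWORK:
    class MelleaPlugin(Plugin):
        """Authoritative docstring."""
else:
    class MelleaPlugin:
        pass
'''

SINGLE = '''
_Base = Plugin if HAS_FRAMEWORK else object

class MelleaPlugin(_Base):
    """Authoritative docstring."""
'''


def class_defs(source: str, name: str) -> list[ast.ClassDef]:
    """Return every ClassDef node with the given name."""
    tree = ast.parse(source)
    return [
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef) and node.name == name
    ]


print(len(class_defs(DUAL, "MelleaPlugin")))  # 2
print(len(class_defs(SINGLE, "MelleaPlugin")))  # 1
```

With a single definition there is no ambiguity about which docstring a static tool should report.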

* docs: convert intrinsic core docstrings to Google style

The Sphinx :param:/:return: style is not recognised by Griffe's
quality audit. Convert to Google Args/Returns sections.
…generative-computing#669)

* feat: add codeowners specifically for the granite-common part of mellea intrinsics

* fix: empty commit to unstuck mergify
…e-computing#645) (generative-computing#672)

* docs: add missing example categories to examples catalogue (generative-computing#645)

Add plugins, m_decompose, tutorial, notebooks, and hello_world to the
examples index page so all runnable example directories are discoverable
from the published docs.

Also add a note to CONTRIBUTING.md reminding authors to update the
catalogue when adding new example directories.

* docs: add examples catalogue validation check to doc validator

Add validate_examples_catalogue() to validate.py that checks every
example directory under docs/examples/ (containing .py files) has a
corresponding entry in docs/docs/examples/index.md.

Also refine the CONTRIBUTING.md guideline to frame it as a check
rather than only an instruction for when adding examples.
…generative-computing#690)

* fix: add support back for older models and requirement_check adapter

* fix: fix model used in test_rag for intrinsics

* fix: remove answer_relevance_classifier and answer_relevance_rewriter
@planetf1 planetf1 force-pushed the ci/doc-quality-gate-616 branch from 19261e3 to 58a7f44 Compare March 18, 2026 16:26
dennislwei and others added 5 commits March 18, 2026 17:08
…e-computing#679)

* Add uncertainty/certainty intrinsic support

Wire up the uncertainty intrinsic from ibm-granite/granite-lib-core-r1.0
with a high-level check_certainty() API. The intrinsic evaluates model
confidence in its response given a user question and assistant answer.

- Add check_certainty(context, backend) in core.py
- Extract shared call_intrinsic() helper into _util.py
- Update catalog to point uncertainty at granite-lib-core-r1.0
- Add test, example, and README entry

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* Fix uncertainty description in README

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* Rename _call_intrinsic to call_intrinsic

Drop the underscore prefix and alias — use call_intrinsic consistently
across _util.py, rag.py, core.py, and test_rag.py.

Co-Authored-By: ink-pad <inkit.padhi@gmail.com>

* test

* removed test file

* added req check intrinsic

* feat: extend sentence boundary marking to conversation history with shared index

- Add `index` parameter to `mark_sentence_boundaries()` to allow callers to
  continue numbering across multiple calls; return the next available index
- Add `all_but_last_message` as a valid `sentence_boundaries` key
- Extend `_mark_sentence_boundaries()` to tag prior conversation turns when
  `all_but_last_message` is configured, using a shared running index with
  documents so that each context sentence has a globally unique tag
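
The shared running index can be sketched as below. This is a toy version: the tag format, the sentence splitter, and the `(text, next_index)` return shape are all assumptions, and only the idea of passing the next free index from one call to the next comes from the commit message.

```python
import re


def mark_sentence_boundaries(text: str, tag: str, index: int = 0) -> tuple[str, int]:
    """Prefix each sentence with <tagN> and return the next free index."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    marked = []
    for sentence in sentences:
        marked.append(f"<{tag}{index}> {sentence}")
        index += 1
    return " ".join(marked), index


# Documents first, then earlier conversation turns, sharing one counter.
doc_text, next_index = mark_sentence_boundaries("First fact. Second fact.", "c")
turn_text, next_index = mark_sentence_boundaries("An earlier reply.", "c", next_index)
print(doc_text)  # <c0> First fact. <c1> Second fact.
print(turn_text)  # <c2> An earlier reply.
```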

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

* feat: extend DecodeSentences to conversation history and multi-source decoding

- Accept `source: str | list[str]` to allow a single DecodeSentences rule to
  decode sentences from multiple locations in one pass
- Add `all_but_last_message` as a valid source, decoding prior conversation
  turns with a running sentence index shared across all sources
- Add optional `message_index` output field that records which conversation
  turn each attributed sentence came from

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

* feat: add context-attribution catalog entry and update core repo

- Update _CORE_REPO to "ibm-granite/granitelib-core-r1.0"
- Add context-attribution intrinsic pointing to _CORE_REPO

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

* test: add context-attribution test data and formatter tests

- Add input/test_canned_input/test_canned_output/expected_result JSON test data files
- Add YamlJsonCombo entry for context-attribution pointing to
  ibm-granite/granitelib-core-r1.0
- Exclude context-attribution from Ollama inference tests via _NO_OLLAMA_ADAPTER
  since an Ollama LoRA adapter is not yet available on the HF Hub

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

* test: update context-attribution test_run_transformers file for mellea

The model consistently produces {"r": 1, "c": [2, 0, 1, 19, 3]} with the
mellea codebase, yielding 7 attribution records rather than the 12 produced
on the granite-common side. Update the expected output accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

* feat: add find_context_attributions() API function

Add find_context_attributions() to core.py since the context-attribution
adapter lives in the ibm-granite/granitelib-core-r1.0 repo, but modelled
after find_citations() in rag.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

* docs: add context-attribution example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

* test: add test_find_context_attributions and test files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>

---------

Signed-off-by: Dennis Wei <dwei@us.ibm.com>
Co-authored-by: inkpad <inkit.padhi@ibm.com>
Co-authored-by: ink-pad <inkit.padhi@gmail.com>
Co-authored-by: manish-nagireddy <manish.nagireddy@ibm.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…puting#678)

* Feat/guardianlib (generative-computing#8)

* Update README.md

Added policy_guardrails intrinsic

* Update catalog.py

Added policy guardrails

* Create policy_guardrails.json

Initial checkin

* Create test_guardian.py

initial checkin

* Create policy_guardrails.py

Initial check in

* Create guardian.py

Adding policy_guardrails intrinsic

* Update guardian.py

Fixed method call (call_intrinsic instead of _call_intrinsic)

* feat: add guardian core intrinsic component

* Added factuality examples

* Added unit tests for factuality intrinsics

* Fixed the pre-commit errors

* fix guardian intrinsic names to match HF repo paths

* fix: lint fixes from pre-commit

* Removed duplicated code in correction example

* refactor: remove _call_guardian_intrinsic workaround, use call_intrinsic

* style: ruff format guardian.py

---------

Co-authored-by: Moninder Singh <39064734+monindersingh@users.noreply.github.com>
Co-authored-by: Subhajit Chaudhury <subhajit@ibm.com>
Co-authored-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

* Fixed factuality correction example

* fix: typo in factuality_correction example

* fix: typo in factuality_correction example

* fix: typo in factuality_correction example

---------

Co-authored-by: Moninder Singh <39064734+monindersingh@users.noreply.github.com>
Co-authored-by: Subhajit Chaudhury <subhajit@ibm.com>
Co-authored-by: Radu Marinescu <radu.marinescu@ie.ibm.com>
Co-authored-by: jakelorocco <jake.lorocco@ibm.com>
Co-authored-by: jakelorocco <59755218+jakelorocco@users.noreply.github.com>
…puting#694) (generative-computing#697)

Token count extraction in _post_process_async was gated behind
`span is not None or metrics_enabled`, so mot.usage was never
populated in plain (non-telemetry) runs. Now extracted unconditionally
— usage is a standard mot field, not a telemetry concern.
@planetf1 planetf1 force-pushed the ci/doc-quality-gate-616 branch from 58a7f44 to dfb95c4 Compare March 18, 2026 22:37
code4days and others added 20 commits March 20, 2026 11:17
…ests to run (generative-computing#674)

Adds a "Required models" section listing the Ollama models that the non-qualitative tests need in order to pass.
…ed module issues (generative-computing#676)

* upd: clean over long or result files

* add: validation code generation

* add: validation code icl

* add: prompt init

* add: validation decision

* add: decomp jinja

* refact: pipeline and primary stages

* refact: module logging

* fea: validation report

* fea: validation report icl

* fea: cli script with config

* add: examples

* add: examples

* add: README doc

* pre-commit: add test attribute

* upd: type annotations

* fix: add constraint type annotation

* upd: pre-commit format

* upd: pre-commit type annotations

* upd: pre-commit format

* add: m_decompose tests

* add: constraint retry

* upd: constraint

* upd: constraint

* upd: same logmode

* fix: a missed parse

* pre-commit format

* add: multi request support

* clean

* fix: type clean

* upd: input file arg

* fea: constraint retry

* test

* fix

* fix

* fix

* clean: final result

* add: decompose tests

* add: decompose tests

* clean: pre-commit format

* clean: pre-commit formatting on decompose tests
…erative-computing#706)

* fix: modify plugin logging to not print

* fix: increase test timeout
Add a hard-fail docstring quality gate to the docs-publish workflow:
- New 'Docstring quality gate' step runs --quality --fail-on-quality
  --threshold 100; fails if any quality issue is found or coverage
  drops below 100% (both currently pass in CI)
- Existing audit_coverage step (soft-fail, threshold 80) retained for
  the summary coverage metric

Add typeddict_mismatch checks to audit_coverage.py:
- typeddict_phantom: Attributes: documents a field not declared in the TypedDict
- typeddict_undocumented: declared field absent from Attributes: section
- Mirrors the existing param_mismatch logic for functions
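
To make the two check kinds concrete, here is a runtime approximation. The actual scanner in `audit_coverage.py` works on Griffe's static model; `Usage`, `documented_attributes`, and `typeddict_mismatch` below are hypothetical names written for this sketch.

```python
import inspect
import re
from typing import TypedDict


class Usage(TypedDict):
    """Token usage for one request (hypothetical example type).

    Attributes:
        prompt_tokens (int): Tokens in the prompt.
        total (int): Documented here but never declared below.
    """

    prompt_tokens: int
    completion_tokens: int


def documented_attributes(doc: str) -> set[str]:
    """Return the entry names listed under a Google-style Attributes: section."""
    names: set[str] = set()
    in_section = False
    entry_indent = None
    for line in inspect.cleandoc(doc).splitlines():
        if line.strip() == "Attributes:":
            in_section = True
            continue
        if not in_section or not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            break  # a new top-level section has started
        if entry_indent is None:
            entry_indent = indent
        match = re.match(r"(\w+)\s*(?:\([^)]*\))?:", line.strip())
        if match and indent == entry_indent:  # ignore continuation lines
            names.add(match.group(1))
    return names


def typeddict_mismatch(td: type) -> dict[str, list[str]]:
    """Compare declared TypedDict fields with the documented ones."""
    declared = set(td.__annotations__)
    documented = documented_attributes(td.__doc__ or "")
    return {
        "typeddict_phantom": sorted(documented - declared),
        "typeddict_undocumented": sorted(declared - documented),
    }


print(typeddict_mismatch(Usage))
# {'typeddict_phantom': ['total'], 'typeddict_undocumented': ['completion_tokens']}
```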

Pre-commit: enable --fail-on-quality on the manual-stage hook (CI is
the hard gate; hook remains stages: [manual] as docs must be pre-built).

Update CONTRIBUTING.md and docs/docs/guide/CONTRIBUTING.md with TypedDict
docstring requirements and the two new audit check kinds.
… and fix hints

audit_coverage.py:
- Add file/line fields to every issue dict (repo-relative path + def line)
- _print_quality_report now shows [file:line] per issue, per-kind Fix:/Ref:
  hints linking to CONTRIBUTING.md anchors, and emits ::error file=...,line=...
  GHA annotations so issues appear inline in PR diffs
- Cap GHA annotations at 10 per check kind with "N more in job log" notice
- Add _KIND_FIX_HINTS and _gha_file_annotation helpers; _CONTRIB_DOCS_URL constant
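
The annotation cap can be illustrated with a short sketch. The `::error file=...,line=...::message` form is the standard GitHub Actions workflow command; the issue dict keys, the function name, and the wording of the overflow notice are assumptions rather than the script's actual output.

```python
from collections import defaultdict

MAX_PER_KIND = 10  # cap described in the commit message


def gha_annotations(issues: list[dict]) -> list[str]:
    """Build workflow-command lines, at most MAX_PER_KIND per check kind."""
    by_kind: dict[str, list[dict]] = defaultdict(list)
    for issue in issues:
        by_kind[issue["kind"]].append(issue)

    lines = []
    for kind, items in by_kind.items():
        for issue in items[:MAX_PER_KIND]:
            lines.append(
                f"::error file={issue['file']},line={issue['line']}::"
                f"[{kind}] {issue['message']}"
            )
        extra = len(items) - MAX_PER_KIND
        if extra > 0:
            lines.append(f"::notice::{extra} more {kind} issue(s) in job log")
    return lines


sample = [
    {"kind": "no_args", "file": "a.py", "line": n, "message": "missing Args:"}
    for n in range(1, 13)
]
for line in gha_annotations(sample)[:2]:
    print(line)
```

Printing these lines to standard output inside a workflow step is what makes them appear as inline annotations.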

validate.py:
- Convert all check functions from list[str] to list[dict] errors (file/line/message)
- Add line-number tracking to validate_source_links, validate_internal_links,
  validate_anchor_collisions, and validate_doc_imports
- Emit per-error GHA annotations with file/line; shared 20-annotation budget
  across all checks so every category gets representation in PR diff
- Fix icon bug: summary rows now use correct pass/fail icon
- Group detailed errors by check type with section headers

docs-publish.yml:
- Add --orphans and --output /tmp/quality_report.json to quality gate step
- Upload quality_report.json as docstring-quality-report artifact (30-day retention)

pyproject.toml:
- cli/**/*.py: suppress only D2/D3/D4xx style rules; enable D1xx (missing
  docstrings) as a ruff-level complement to the audit_coverage quality gate

docs/docs/guide/CONTRIBUTING.md:
- Add CI docstring checks reference section with per-kind tables (fix instructions
  + anchors) for all 11 check kinds across 4 categories
- Add callout explaining GHA annotation cap (10 per kind) and where to find
  the full list (job log + JSON artifact)
…y links

audit_coverage.py (audit_nav_orphans):
- Probe docs/docs/docs.json before docs/mint.json so both Mintlify v1
  and v2 nav configs are supported
- Extend _extract to handle plain string page entries used by docs.json
  (v2 uses "pages": ["api/..."] strings; v1 used {"page": "api/..."} dicts)
- Previously mint.json was never found, nav_refs stayed empty, and every
  MDX file was reported as an orphan
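
A sketch of a nav walker that accepts both page shapes described above. The nesting used in the sample (`navigation` / `groups` / `pages`) is an assumed layout for illustration, and `extract_pages` is a hypothetical stand-in for the script's `_extract`.

```python
def extract_pages(node) -> set[str]:
    """Collect page references from a nested nav structure."""
    refs: set[str] = set()
    if isinstance(node, dict):
        page = node.get("page")
        if isinstance(page, str):
            refs.add(page)  # dict form: {"page": "api/..."}
        for entry in node.get("pages", []):
            if isinstance(entry, str):
                refs.add(entry)  # plain string form: "api/..."
            else:
                refs |= extract_pages(entry)
        for key, value in node.items():
            if key not in ("page", "pages"):
                refs |= extract_pages(value)
    elif isinstance(node, list):
        for item in node:
            refs |= extract_pages(item)
    return refs


nav = {
    "navigation": {
        "groups": [
            {
                "group": "API Reference",
                "pages": [
                    "api/core",
                    {"group": "Nested", "pages": ["api/plugins", {"page": "api/legacy"}]},
                ],
            }
        ]
    }
}
print(sorted(extract_pages(nav)))  # ['api/core', 'api/legacy', 'api/plugins']
```

Strings are only treated as pages when they sit inside a `pages` list, so group titles are not mistaken for page references.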

docs-publish.yml (Write job summary):
- When the quality gate fails, render a prominent markdown callout with a
  direct link to the CI docstring checks reference section in CONTRIBUTING.md
- Add a per-kind fix reference table with clickable anchor links to each
  category section (missing/short, args/returns, class Option C, TypedDict)
- Per-kind Ref: URLs in the raw log sit inside a text block and do
  not render as links in the step summary; this table shows them as clickable links
…ped notice

docs-publish.yml:
- Parse per-kind counts from _print_quality_report section headers in the
  quality gate log (e.g. "Missing docstrings (12)") and show them as a
  comma-separated breakdown in the Docstring Quality table cell instead of
  just the total — gives developers an immediate view of which categories
  are failing without expanding the log
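
Parsing the per-kind counts could look roughly like this. The header shape "Label (N)" is taken from the example in the commit message; the sample log, the regular expression, and the function names are assumptions, and the workflow itself may do this differently.

```python
import re

# A section header is a label followed by a count in parentheses.
_HEADER = re.compile(r"^([A-Za-z][\w :/-]*?) \((\d+)\)$")


def kind_counts(log: str) -> dict[str, int]:
    """Map each section label in the log to its issue count."""
    counts = {}
    for line in log.splitlines():
        match = _HEADER.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def breakdown(counts: dict[str, int]) -> str:
    """Render counts as a comma-separated summary for a table cell."""
    return ", ".join(f"{label}: {n}" for label, n in counts.items())


log = """Missing docstrings (12)
  [mellea/core/base.py:40] blockify
Missing Args section (3)
  [cli/m.py:10] callback
"""
print(breakdown(kind_counts(log)))  # Missing docstrings: 12, Missing Args section: 3
```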

audit_coverage.py:
- Remove the "skipped (pass --quality to enable)" GHA notice emitted by
  the coverage-only step; there is always a dedicated quality gate step
  immediately after so the notice was misleading and redundant
audit_coverage.py:
- Coverage miss section now shows a structured Fix:/Ref: block with the
  exact generate-ast.py command and a link to CONTRIBUTING.md#validating-docstrings
- Missing symbols listed one per line (symbol indented under module) for
  scannability instead of comma-joined on one long line
- Emit a ::error or ::warning GHA annotation with symbol/module counts
  when coverage symbols are undocumented

validate.py:
- Add _CHECK_FIX_HINTS dict mapping each check label to a (fix text, ref URL)
  pair, covering all 8 check types with specific fix instructions and links
  into CONTRIBUTING.md (root or guide as appropriate)
- _print_check_errors now prints Fix:/Ref: under each section header,
  matching the pattern established by _print_quality_report
audit_coverage.py:
- Add missing_param_type check: fires when Args: section exists but one
  or more concrete params lack Python type annotations; naturally
  non-overlapping with no_args (which fires when section is absent)
- Add missing_return_type check: fires when Returns: section is
  documented but the function has no return annotation; naturally
  non-overlapping with no_returns (annotation exists but section absent)
- Add fix hints and CONTRIBUTING.md anchors for both new check kinds
- Update kind_labels and iteration order in _print_quality_report
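
The two annotation checks can be approximated at runtime with `inspect`. The audit script itself works on Griffe's static model, so `type_annotation_issues` and `scale` are hypothetical and only mirror the conditions stated above.

```python
import inspect

_VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def type_annotation_issues(func) -> list[tuple[str, list[str]]]:
    """Report documented parameters or returns that lack annotations."""
    doc = inspect.getdoc(func) or ""
    sig = inspect.signature(func)
    issues = []
    if "Args:" in doc:
        untyped = [
            name
            for name, param in sig.parameters.items()
            if name not in ("self", "cls")
            and param.kind not in _VARIADIC
            and param.annotation is inspect.Parameter.empty
        ]
        if untyped:
            issues.append(("missing_param_type", untyped))
    if "Returns:" in doc and sig.return_annotation is inspect.Signature.empty:
        issues.append(("missing_return_type", []))
    return issues


def scale(value, factor: float):
    """Scale a value.

    Args:
        value: Number to scale.
        factor: Multiplier.

    Returns:
        The scaled value.
    """
    return value * factor


print(type_annotation_issues(scale))
# [('missing_param_type', ['value']), ('missing_return_type', [])]
```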

generate-ast.py:
- Add remove_internal_modules() post-generation filter step
- Uses AST-based import analysis: a submodule is internal when the
  parent __init__.py imports from at least one sibling submodule but
  not from this one (import-based visibility, not __all__ name-matching)
- Conservative: keeps module when parent imports nothing (indeterminate)
  or __init__.py cannot be parsed
- _CONFIRMED_INTERNAL_MODULES hardcoded set for known internals where
  parent imports nothing (json_util, backend_instrumentation); these
  should eventually be renamed with _ prefix per Python convention
- Package index files (stem == parent dir) are never filtered
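
The import-based visibility rule can be sketched with the `ast` module. This simplified version only looks at relative imports in the parent `__init__.py` (absolute imports of siblings are ignored), and the function names are hypothetical.

```python
import ast


def imported_siblings(init_source: str) -> set[str]:
    """Return sibling submodule names that an __init__.py imports from."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(init_source)):
        if isinstance(node, ast.ImportFrom) and node.level == 1:
            if node.module:
                names.add(node.module.split(".")[0])  # from .sub import X
            else:
                names.update(alias.name for alias in node.names)  # from . import sub
    return names


def is_internal(submodule: str, init_source: str) -> bool:
    """Treat a submodule as internal only when the evidence is clear."""
    try:
        siblings = imported_siblings(init_source)
    except SyntaxError:
        return False  # cannot parse: keep the module
    if not siblings:
        return False  # parent imports nothing: indeterminate, keep the module
    return submodule not in siblings


init_py = "from .backend import Backend\nfrom . import tools\n"
print(is_internal("json_util", init_py))  # True
print(is_internal("backend", init_py))  # False
```

The conservative defaults mean a module is dropped from the docs only when the parent package clearly exports some siblings and not this one.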

docs.json: nav regenerated by build; internal modules removed from nav
CONTRIBUTING.md: add missing_param_type / missing_return_type to CI
  docstring checks reference table
docs-publish.yml: add both new kinds to summary kind_short and
  kind_anchors fix-reference table
audit_coverage.py was walking all non-_-prefixed source modules via
Griffe, including internal modules (json_util, backend_instrumentation,
etc.) whose MDX files were removed by remove_internal_modules() in
generate-ast.py.  This caused coverage to drop because those symbols
were no longer 'documented' but were still counted in the denominator.

Apply the same import-based public-API filter in discover_public_symbols():
skip submodules that the parent __init__.py does not import from, mirroring
the generate-ast.py logic.  _CONFIRMED_INTERNAL_MODULES kept in sync.

Also drop the per-kind anchor table from the GHA job summary.  Anchors
in GitHub Actions summaries only navigate to the top of the referenced
document, so the table added noise without working links.
Adds two new quality check kinds that fire when the type explicitly
stated in an Args:/Returns: docstring entry disagrees with the Python
type annotation in the function signature:

  param_type_mismatch — 'param (OldType): ...' vs annotation 'NewType'
  return_type_mismatch — 'Returns: OldType: ...' vs annotation '-> NewType'

Both checks fire only when BOTH sides have an explicit type; one-sided
absence is already handled by missing_param_type / missing_return_type.

Type comparison uses _types_match() / _normalize_type() which handles:
  - typing aliases: List→list, Dict→dict, Optional→X|None, Union→A|B
  - typing. prefix stripping
  - pipe-union component ordering (str|None == None|str)
  - incidental whitespace

Known conservative suppressions (prefer false negatives over false positives,
since there is no per-site suppression mechanism):
  - Nested generics not fully expandable by regex (e.g. Optional[list[str]])
    are silently skipped — both sides must fully normalise to be compared
  - Union with bracket-containing members
  - Callable argument ordering
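
A simplified sketch of this kind of normalisation follows. It is not the script's `_normalize_type()` or `_types_match()`; the alias table and the rules for giving up are assumptions chosen to match the conservative behaviour described above, where anything that cannot be fully normalised is treated as a match.

```python
import re
from typing import Optional

_ALIASES = {"List": "list", "Dict": "dict", "Tuple": "tuple", "Set": "set"}


def normalize_type(text: str) -> Optional[str]:
    """Return a canonical spelling, or None when normalisation is unsafe."""
    t = re.sub(r"\s+", "", text).replace("typing.", "")
    optional = re.fullmatch(r"Optional\[([^\[\]]+)\]", t)
    if optional:
        t = f"{optional.group(1)}|None"
    union = re.fullmatch(r"Union\[([^\[\]]+)\]", t)
    if union:
        t = union.group(1).replace(",", "|")
    if "Optional[" in t or "Union[" in t:
        return None  # nested form that the regexes cannot expand
    t = re.sub(r"\b(List|Dict|Tuple|Set)\b", lambda m: _ALIASES[m.group(1)], t)
    if "|" in t:
        if "[" in t:
            return None  # a "|" might sit inside brackets
        t = "|".join(sorted(t.split("|")))
    return t


def types_match(doc_type: str, annotation: str) -> bool:
    """Compare two type spellings, suppressing anything not normalisable."""
    a, b = normalize_type(doc_type), normalize_type(annotation)
    if a is None or b is None:
        return True
    return a == b


print(types_match("Optional[str]", "None | str"))  # True
print(types_match("str", "int"))  # False
```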
…ubmodule

For a module file mellea/pkg/submodule.py, Griffe gives filepath ending in
.py (not __init__.py). The parent __init__.py is fp.parent/__init__.py.
The previous code used fp.parent.parent which is correct for packages
(whose filepath IS the __init__.py) but goes one level too far for plain
module files — it was checking the grandparent init instead of the parent.

Effect: genslot, react, unit_test_eval and similar non-exported modules
in stdlib/components were incorrectly counted as public symbols, inflating
the denominator and lowering the reported coverage percentage.
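
The path arithmetic behind this fix can be shown with `pathlib`. `parent_init` is a hypothetical helper; the two cases follow the description above.

```python
from pathlib import Path


def parent_init(filepath: Path) -> Path:
    """Locate the __init__.py of the package that contains a module."""
    if filepath.name == "__init__.py":
        # A package's own file is its __init__.py, so its parent package
        # is one directory further up.
        return filepath.parent.parent / "__init__.py"
    # A plain module file lives directly inside its parent package.
    return filepath.parent / "__init__.py"


print(parent_init(Path("mellea/pkg/submodule.py")).as_posix())
# mellea/pkg/__init__.py
print(parent_init(Path("mellea/pkg/sub/__init__.py")).as_posix())
# mellea/pkg/__init__.py
```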
@planetf1 planetf1 force-pushed the ci/doc-quality-gate-616 branch from 48b32b4 to 4bfcd0a Compare March 20, 2026 14:56
- Fix missing_param_type, missing_return_type, param_type_mismatch,
  return_type_mismatch, no_args, no_returns, and missing docstring issues
- Add TYPE_CHECKING imports for HuggingFace types in util.py with
  type: ignore[union-attr] for pre-existing None-safety gaps
- Add Granite3ChatCompletion import to granite32/33 input.py for
  correct sanitize() parent signature match
- Convert reST-style docstrings to Google style in intrinsics/input.py
- Document AST single-quote normalization for Literal types in
  CONTRIBUTING.md
Adding TYPE_CHECKING annotations to util.py made mypy check function
bodies it previously skipped (untyped params = implicit Any = no body
checking). This exposed a pre-existing Tensor-not-callable issue and
a dict-variance issue in mobject.py. Suppress with targeted type:
ignore comments — these are not new bugs, just newly visible ones.
Development

Successfully merging this pull request may close these issues.

ci: gate doc quality in CI and pre-commit to prevent future degradation