Skip to content

CI (docs): Reusable Mintlify docs checks for integrations repos#106258

Merged
Blargian merged 7 commits into
ClickHouse:masterfrom
Blargian:reusable-mintlify-docs-check
Jun 10, 2026
Merged

CI (docs): Reusable Mintlify docs checks for integrations repos#106258
Blargian merged 7 commits into
ClickHouse:masterfrom
Blargian:reusable-mintlify-docs-check

Conversation

@Blargian

@Blargian Blargian commented Jun 1, 2026

Copy link
Copy Markdown
Member

Structures the Mintlify docs check such that it can be easily run standalone without Praktika in other repos. The motivation for this is to enable us to easily run the same Mintlify docs checks that will run in this repo in any of the language client repositories.

Context

Our integrations team would like to keep docs close to the source code so they can make docs updates alongside code changes. Currently the docs are kept in clickhouse-docs repo and it's necessary to context switch to make docs updates.

With the Mintlify migration, beginning with clickhouse-connect: ClickHouse/clickhouse-connect#768, we will move language client docs back into their own repositories.

  1. Language client docs live in the local repo
  2. As part of the language client CI we will run the docs check via [WIP] add Mintlify docs verification workflow clickhouse-connect#778
  • Runs the check using the docs_builder image
  • Shallows clones the docs and any necessary scripts from ClickHouse/ClickHouse
  • Unifies the local docs together with the complete docs
  • Runs the docs check
  1. If all checks pass, then the PR merges
  2. GitHub version releases of the language client will trigger a workflow which will make a versioned copy in this repo for publication

We want the sync to be unidirectional, so there is a check added to this repository to fail the docs check with an error message pointing to the canonical source if someone tries to make the updates to the copies in ClickHouse/ClickHouse docs folder.

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

@Blargian Blargian changed the title Reusable mintlify docs check CI (docs): Reusable Mintlify docs checks for integrations repos Jun 1, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [1a5f6aa]

Summary:


AI Review

Summary

This PR factors the Mintlify docs checks into a reusable standalone driver, adds a guard for one-way copied docs, and updates changed-file collection so rename sources can be inspected. The earlier safety issues around stale overlay files, missing changed-file data, rename-outs, and escaped --replace destinations are fixed in the current code, but the new CI behavior still needs focused regression coverage before merge.

Missing context / blind spots
  • ⚠️ The direct S3 CI report for 1a5f6aac2dc527d07b770c305fb619f85a20ed91 returned 0 test results, so I could not use CI evidence to validate the new scripts.
Tests
  • ⚠️ [ci/jobs/scripts/docs/check_readonly_copies.py:43] The PR adds the enforcement boundary for one-way copied docs and destructive docs-tree replacement, but it does not add focused tests for the invariants it now relies on. Please add small Python tests for protected-path edits failing, unrelated paths passing, rename-out source paths being treated as violations, _readonly_copies_guard failing when changed_files is unavailable, and resolve_replace_dest rejecting absolute and .. destinations before replace can delete anything.
Final Verdict

Status: ⚠️ Request changes

Minimum required action: add focused regression tests for the read-only copied-docs guard and --replace destination validation.

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Jun 1, 2026
Comment thread ci/jobs/scripts/docs/mintlify_docs_check.py Outdated
Comment thread docs/en/sql-reference/data-types/nested-data-structures/index.md
Comment thread ci/jobs/scripts/docs/mintlify_docs_check.py Outdated
Comment thread ci/jobs/docs_job_mintlify.py Outdated
Blargian and others added 4 commits June 5, 2026 15:44
Decouple the Mintlify docs check logic from Praktika so another repository
(e.g. a language client whose docs are a slice of the larger site) can run the
same checks without vendoring Praktika.

- Add `ci/jobs/scripts/docs/mintlify_docs_check.py`, a self-contained driver
  that pulls the docs image, shallow/sparse-clones the aggregator docs repo
  (including `ci/jobs/scripts/docs` so shared check scripts come along),
  overlays one or more local folders, and runs the checks inside the image.
  The host only needs `docker`, `git` and `python3`; `mint` is provided by the
  image. The check definitions live in `DEFAULT_CHECKS` as the single source of
  truth.
- Rewrite `ci/jobs/docs_job_mintlify.py` as a thin Praktika adapter that imports
  `DEFAULT_CHECKS` and runs them natively (it already runs inside the image), so
  new checks are declared in one place.
- Add `ci/jobs/scripts/docs/check_readonly_copies.py` and
  `readonly_copies.json`, a one-way-sync guard that fails edits to docs folders
  whose canonical source lives in another repo and points contributors at that
  source. Wired into the Praktika job only (aggregator side); deliberately not
  part of `DEFAULT_CHECKS`.
- Track `ci/jobs/scripts/docs` in the docs job digest so changes to the shared
  check scripts re-run the job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The standalone check should validate the exact docs slice the consuming
repo would sync. The old `--overlay` defaulted to a merge, so a page
deleted or renamed in the source repo left the stale file from the cloned
aggregator under the destination, letting `mint validate` and
`mint broken-links` pass even though the next sync would remove that page.

Since the docs sync is one-way (the consuming repo is the source of
truth), there is never a need to merge. Rename the function and the flag
to `replace` and always wipe the destination before copying, so deletions
and renames are visible to the checks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`clone_aggregator` ran `git clone --filter=blob:none` with no `--depth`, so it
downloaded the aggregator's entire commit history (just without blobs) before
the shallow `fetch` -- enormous for a repo like ClickHouse, run on every docs
change.

Build the repo with `git init` + a single `git fetch --depth 1
--filter=blob:none` over the sparse paths instead, so only the tip commit and
only the docs slice are fetched. Measured against ClickHouse master: 3.6 s,
31 MB, 1 commit -- no src/ or contrib/ subtrees.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The driver orchestrated docker itself (`docker pull` + `docker run` per check),
which required docker on the host and could not run from inside a container.

Run the checks natively instead: the driver is now meant to run inside the
docs image (`clickhouse/docs-builder`), where `mint`, `git` and `python3` are
all present. It clones the aggregator, replaces the target folder, and runs
each check from the docs root via `sh -lc`. This lets a consuming repo run the
image as its CI job container, fetch this one file, and run it -- vendoring
nothing. The shallow/sparse clone and one-way replace are unchanged, and the
shared `DEFAULT_CHECKS` list is still the single source of truth with the
Praktika job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Blargian Blargian force-pushed the reusable-mintlify-docs-check branch from 9c08cb6 to 6a591d8 Compare June 5, 2026 15:42
Comment thread ci/jobs/scripts/docs/check_readonly_copies.py
Blargian and others added 3 commits June 10, 2026 09:59
`Info().get_changed_files` returns `None` when Praktika could not fetch or
store the PR file list; treating that as an empty list made the guard report
success without checking anything. Distinguish `None` from an empty list and
fail the job instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
`GH.get_changed_files` used `gh pr view --json files --jq '.files[].path'`,
which reports only the destination path of a renamed file. Renaming a file out
of a protected folder therefore bypassed the read-only docs guard: no path in
the list started with the protected prefix, although the rename deletes the
file from the copied tree. The same blind spot made job filtering miss renames
out of a watched directory.

Switch to the REST files API (`pulls/{n}/files` for PRs, `commits/{sha}`
for branches) and emit `previous_filename` alongside `filename`, so both
sides of a rename appear in the list. Rename sources do not exist in the
checkout at `HEAD`, exactly like deleted files, which were always included;
all consumers of the list either match strings or check file existence first.
The REST endpoint with `--paginate` also lifts the 100-file cap of
`gh pr view --json files`.

Also document in `check_readonly_copies` that the input must cover both
sides of renames, and that standalone callers should pipe
`git diff --name-only --no-renames`.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
`replace` wipes its destination with `rmtree` before copying, and the
destination was joined to the docs root without validation, so an absolute
path, a `..` component, or a symlink pointing out of the clone would delete
an unrelated directory on the host. Resolve the destination with `realpath`
against the docs root and fail unless it lands strictly inside it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Blargian Blargian marked this pull request as ready for review June 10, 2026 08:36
@Blargian Blargian requested a review from maxknv June 10, 2026 08:36
@Blargian

Copy link
Copy Markdown
Member Author

Hey @maxknv, could I please ask for a review when you get a chance? 🙏 Open to ideas here if there is a better way to do this

return path.strip().removeprefix("./")


def find_violations(changed_files, sources):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now the enforcement boundary for one-way copied docs, but the PR does not add focused coverage for the contract it depends on. The current series has fixed several edge cases by review (None changed-file data passing as success, rename sources escaping the guard, and unsafe --replace destinations reaching rmtree), so these invariants should be locked down before this becomes reusable CI.

Please add small Python tests covering at least: a protected-path edit failing with the canonical-source message, an unrelated path passing, a rename-out source path being treated as a violation, _readonly_copies_guard failing when changed_files is unavailable, and resolve_replace_dest rejecting absolute and .. destinations before replace can delete anything.

@Blargian Blargian added this pull request to the merge queue Jun 10, 2026
Merged via the queue into ClickHouse:master with commit 21475b7 Jun 10, 2026
166 checks passed
@Blargian Blargian deleted the reusable-mintlify-docs-check branch June 10, 2026 12:33
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants