Skip to content

Generate LLMs text during site build#52

Open
koriym wants to merge 2 commits into
masterfrom
codex/ci-generate-llms-full
Open

Generate LLMs text during site build#52
koriym wants to merge 2 commits into
masterfrom
codex/ci-generate-llms-full

Conversation

@koriym
Copy link
Copy Markdown
Member

@koriym koriym commented May 21, 2026

Summary

  • add a Pages workflow that generates llms-full.txt before the Jekyll build
  • add an llms-full.txt generator based on English manual markdown
  • ignore and stop tracking generated llms-full.txt

Verification

  • zsh -ic "sphp85; php -l bin/generate_llms_full.php"
  • zsh -ic "sphp85; php bin/generate_llms_full.php"
  • git diff --cached --check

Note: local Jekyll build was not run successfully because required bundle gems are not installed in this local environment.

Summary by CodeRabbit

  • Chores

    • Added automated site deployment to GitHub Pages.
    • Enabled automatic documentation generation as part of the build pipeline.
    • Streamlined local development to run pre-generation steps before serving.
  • Documentation

    • Cleared the existing consolidated full-documentation file (content removed; regeneration now part of the workflow).

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96de607f-e662-40ef-b367-f650e7aa0a10

📥 Commits

Reviewing files that changed from the base of the PR and between 7e338fe and 0fc9e06.

📒 Files selected for processing (3)
  • .github/workflows/jekyll.yml
  • bin/generate_llms_full.php
  • bin/serve_local.sh

📝 Walkthrough

Walkthrough

The PR replaces the committed llms-full.txt with a generated artifact produced from llms.txt plus markdown manuals, integrates generation into local serve script, and adds a GitHub Actions workflow to build and deploy the Jekyll site to GitHub Pages.

Changes

Documentation generation and deployment

Layer / File(s) Summary
PHP generation script and build artifact config
bin/generate_llms_full.php, .gitignore
New PHP script validates inputs, discovers and sorts markdown files from manuals/1.0/en while skipping specified basenames, strips YAML front-matter, concatenates with llms.txt (with per-file <!-- Source: ... --> comments), writes llms-full.txt, and .gitignore now excludes llms-full.txt.
Local development integration
bin/serve_local.sh
Script now enables strict error handling and runs ruby bin/merge_md_files.rb and php bin/generate_llms_full.php before starting bundle exec jekyll serve --watch so generated content is available locally.
GitHub Pages deployment workflow
.github/workflows/jekyll.yml
New workflow triggers on pushes to master and manual dispatch; build job sets up Ruby (3.2.2) with bundler caching and PHP (8.3), runs generation scripts, builds Jekyll using the Pages base path, and uploads the artifact; deploy job deploys via actions/deploy-pages@v4.

🐰 A document once static, now dances and glides,
Generated fresh from markdown tides,
From local dev to GitHub's grand page,
The llms-full.txt enters a new age!

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding automated generation of llms-full.txt during the Jekyll site build process via a new PHP script and GitHub Actions workflow.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/ci-generate-llms-full

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
.github/workflows/jekyll.yml (1)

22-22: ⚖️ Poor tradeoff

Consider pinning actions to commit SHAs.

The workflow uses version tags (e.g., @v4, @v1) instead of commit SHAs for action references. While version tags are more maintainable, pinning to commit SHAs provides stronger supply-chain security by preventing tag manipulation attacks. This is a common trade-off in GitHub Actions workflows.

Example for one action:

- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

Note: If you choose to pin, you'll need to maintain SHA references and update them when new versions are released. Many teams use Dependabot or Renovate to automate this.

Also applies to: 24-24, 29-29, 34-34, 44-44, 55-55

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/jekyll.yml at line 22, Replace floating version tags for
GitHub Actions with their corresponding commit SHAs: locate each "uses:"
reference such as "actions/checkout@v4" (and the other entries mentioned on
lines 24, 29, 34, 44, 55) and update them to the pinned commit SHA for the exact
release you want to lock (e.g., replace actions/checkout@v4 with
actions/checkout@<commit-sha>); ensure you fetch the authoritative commit SHA
from the action repo release/tag and keep a short comment noting the tag/version
for future updates, and consider using Dependabot/Renovate to automate SHA
bumping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/jekyll.yml:
- Around line 21-22: The Checkout step using actions/checkout@v4 should
explicitly disable credential persistence to avoid leaking Git credentials to
artifacts; update the checkout step (the action reference actions/checkout@v4)
to include persist-credentials: false so credentials are not stored in the
workspace or uploaded with Pages artifacts.
- Around line 8-11: The workflow-level permissions currently grant pages: write
and id-token: write globally; move those two keys into the deploy job's
permissions block and leave only minimal global permissions (e.g., contents:
read) at the top. Update the "deploy" job (job name "deploy") to include
permissions: pages: write and id-token: write so only that job has elevated
rights; remove those keys from the workflow-level permissions to minimize scope.

In `@bin/generate_llms_full.php`:
- Line 65: The file write using file_put_contents($outputFile, $content) is not
validated; check the return value (it returns number of bytes written or false)
and handle failures by logging an error with $outputFile and the returned/errno
info and then exiting or throwing; update the code around the file_put_contents
call to detect === false (or compare to strlen($content)) and produce a clear
error via error_log/process exit so failed writes (permissions/disk) are
reported rather than silently ignored.
- Line 49: The file read using file_get_contents($llmsFile) is being cast to
string which hides failures; change the code to first assign the result to a
variable (e.g. $llmsContent = file_get_contents($llmsFile)), check if
$llmsContent === false, and on failure log a clear error or throw/exit with
non-zero status; only when the read succeeds use rtrim($llmsContent) to build
$content and proceed. Ensure you reference $llmsFile and $content and update the
error handling so read failures are not silently ignored.
- Line 52: The file read using file_get_contents for $markdown can return false
and casting to (string) hides failures; change the code around the
file_get_contents($file) call so you check its return value !== false, handle
errors explicitly (e.g., log an error with the filename and exit/throw), and
only cast or use the content when the read succeeded; update the code path that
uses $markdown to expect a valid string or bail early if the read failed to
avoid silent corruption.

In `@bin/serve_local.sh`:
- Around line 6-8: The script currently runs ruby bin/merge_md_files.rb and php
bin/generate_llms_full.php then always starts jekyll; update serve_local.sh to
fail fast so jekyll is only invoked when generation succeeds: enable immediate
exit on any error (e.g., set -e or check each command's exit status) and log a
clear error message referencing the failing commands (merge_md_files.rb or
generate_llms_full.php) before exiting so developers won’t be served
stale/missing llms-full.txt.

---

Nitpick comments:
In @.github/workflows/jekyll.yml:
- Line 22: Replace floating version tags for GitHub Actions with their
corresponding commit SHAs: locate each "uses:" reference such as
"actions/checkout@v4" (and the other entries mentioned on lines 24, 29, 34, 44,
55) and update them to the pinned commit SHA for the exact release you want to
lock (e.g., replace actions/checkout@v4 with actions/checkout@<commit-sha>);
ensure you fetch the authoritative commit SHA from the action repo release/tag
and keep a short comment noting the tag/version for future updates, and consider
using Dependabot/Renovate to automate SHA bumping.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a1d8d63e-e9db-4d7b-a165-659f3a148caa

📥 Commits

Reviewing files that changed from the base of the PR and between 3df03aa and 7e338fe.

📒 Files selected for processing (5)
  • .github/workflows/jekyll.yml
  • .gitignore
  • bin/generate_llms_full.php
  • bin/serve_local.sh
  • llms-full.txt
💤 Files with no reviewable changes (1)
  • llms-full.txt

Comment thread .github/workflows/jekyll.yml Outdated
Comment thread .github/workflows/jekyll.yml
Comment thread bin/generate_llms_full.php Outdated
Comment thread bin/generate_llms_full.php Outdated
Comment thread bin/generate_llms_full.php Outdated
Comment thread bin/serve_local.sh
@koriym
Copy link
Copy Markdown
Member Author

koriym commented May 21, 2026

CodeRabbit nitpick about pinning GitHub Actions to commit SHAs is intentionally not applied here. This repository uses version tags in its Pages workflows for maintainability, and keeping that policy consistent is a better tradeoff for these manual sites. The security-sensitive actionable items were addressed in 0fc9e06.

Please close the non-blocking nitpick if this explanation is acceptable.

@koriym
Copy link
Copy Markdown
Member Author

koriym commented May 21, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant