Post-incident reviews (PIRs) and operational runbooks for the pgmac homelab Kubernetes infrastructure.
Published at https://incidents.pgmac.net.au/
- Incidents — PIRs documenting what went wrong, why, and how it was fixed
- Runbooks — Step-by-step recovery procedures for known failure modes
Requires mise and Python 3.13.
mise run install # create venv and install dependencies
mise run serve # serve at http://localhost:8000 with live reload
mise run build # build static site
mise run build-strict # strict build (matches CI)

To add an incident (PIR):
- Name: YYYY-MM-DD-brief-description.md
- Location: src/incidents/
- Add a row to the top of src/incidents/index.md (newest-first)
- Follow src/doc-templates/pir-template.md
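
The PIR naming convention above can be sketched as a small helper. This is only an illustration: `pir_filename` is a hypothetical function, not part of this repository, and the lowercase, hyphen-separated slug rule is an assumption inferred from the `YYYY-MM-DD-brief-description.md` pattern. The incident title in the example is made up.

```python
import re
from datetime import date


def pir_filename(incident_date: date, description: str) -> str:
    """Build a name of the form YYYY-MM-DD-brief-description.md."""
    # Assumption: lowercase the description and collapse every run of
    # non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"{incident_date.isoformat()}-{slug}.md"


# Made-up example title, for illustration only.
print(pir_filename(date(2025, 1, 15), "Ingress controller crash loop"))
# prints: 2025-01-15-ingress-controller-crash-loop.md
```

The resulting file would then be placed in src/incidents/ by hand; the helper only produces the name.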
To add a runbook:
- Name: <service>-<failure-description>.md
- Location: src/runbooks/
- Add a row to src/runbooks/index.md
- Follow src/doc-templates/runbook-template.md (simple or multi-mode pattern)
- Cross-link from the PIR that documented the failure
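
A quick check of a runbook file name against the `<service>-<failure-description>.md` pattern might look like the sketch below. `is_valid_runbook_name` and the regular expression are hypothetical. The README states only the general pattern, so the stricter rule used here (lowercase alphanumeric words joined by hyphens, at least two words) is an assumption.

```python
import re

# Assumed convention: lowercase alphanumeric words joined by hyphens,
# at least two words (service plus failure description), ending in .md.
RUNBOOK_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+\.md")


def is_valid_runbook_name(filename: str) -> bool:
    """Return True if filename fits the assumed runbook naming pattern."""
    return RUNBOOK_NAME.fullmatch(filename) is not None


# Made-up file names, for illustration only.
print(is_valid_runbook_name("postgres-disk-full.md"))   # prints: True
print(is_valid_runbook_name("Postgres Disk Full.md"))   # prints: False
```

A check like this could be run locally before opening a PR. It is not something the repository's CI is known to do.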
- validate.yml: MkDocs strict build on every PR
- deploy.yml: builds and deploys to GitHub Pages on merge to main