diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..8d7fc7f --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,24 @@ +# Changelog + +## v1.1.0 + +### Added + +- **Per-platform image sizes on the `size.json` badge.** The scraper now fetches compressed layer sizes from the GHCR OCI registry API (`ghcr.io/v2/`). Multi-arch images display per-platform breakdowns (e.g. `82.5 MB (amd64) | 81.2 MB (arm64)`). Single-arch images show a plain size. +- Version-aware manifest fetching: uses the scraped version tag first, falls back to `latest` if the tag is not found on the registry. +- Best-effort resilience: a transient failure fetching one platform's manifest does not lose data for the other platforms. +- New test fixtures and 6 new unit tests covering manifest parsing and badge formatting. + +### Changed + +- `PackageStats.SizeBytes` (int64, never populated) replaced with `PlatformSizes` (map[string]int64) to support per-platform sizes. +- Badge formatting for size: nil/empty map returns `"unknown"`, single platform with known arch shows `"82.5 MB (amd64)"`, multiple platforms show pipe-separated breakdown. +- Updated README, wiki/Home, wiki/Badge-Usage, and wiki/Troubleshooting to reflect new OCI-based size fetching and per-platform output format. + +### Fixed + +- Size badge no longer shows `"unknown"` for all packages (closes GiteaLN/pkgbadge#1). + +## v1.0.0 + +Initial release. HTML scraping for pull counts, versions, and platform architectures. Shields.io endpoint badge server. diff --git a/README.md b/README.md index 1fdad1f..63424fa 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,11 @@ # pkgbadge -Self-hosted badge server for GitHub Container Registry. Scrapes GHCR package pages and serves [shields.io endpoint badges](https://shields.io/badges/endpoint-badge) with pull counts, versions, image sizes, and platform info. 
+[![CI](https://github.com/Will-Luck/pkgbadge/actions/workflows/ci.yml/badge.svg)](https://github.com/Will-Luck/pkgbadge/actions/workflows/ci.yml) +[![Release](https://img.shields.io/github/v/release/Will-Luck/pkgbadge)](https://github.com/Will-Luck/pkgbadge/releases) +[![Licence](https://img.shields.io/github/license/Will-Luck/pkgbadge)](LICENSE) +[![Docker Pulls](https://img.shields.io/docker/pulls/willluck/pkgbadge)](https://hub.docker.com/r/willluck/pkgbadge) + +Self-hosted badge server for GitHub Container Registry. Scrapes GHCR package pages and the OCI registry API to serve [shields.io endpoint badges](https://shields.io/badges/endpoint-badge) with pull counts, versions, image sizes, and platform info. ## Features @@ -52,7 +57,7 @@ Then add badges to your README: |-------|----------|---------| | Pull count | `/owner/package/pulls.json` | `1.5k` | | Version | `/owner/package/version.json` | `2.11.1` | -| Image size | `/owner/package/size.json` | `12.4 MB` | +| Image size | `/owner/package/size.json` | `82.5 MB (amd64) \| 79.2 MB (arm64)` | | Platforms | `/owner/package/arch.json` | `amd64 \| arm64` | ## Configuration diff --git a/docs/superpowers/plans/2026-04-06-oci-manifest-size.md b/docs/superpowers/plans/2026-04-06-oci-manifest-size.md new file mode 100644 index 0000000..3e7b42b --- /dev/null +++ b/docs/superpowers/plans/2026-04-06-oci-manifest-size.md @@ -0,0 +1,744 @@ +# OCI Manifest Size Badge Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Fetch compressed image sizes from the GHCR OCI registry API and display per-platform sizes on the `size.json` badge. + +**Architecture:** Add `fetchImageSizes` to the scraper that hits `ghcr.io/v2/` for anonymous OCI manifest data after the HTML scrape. 
Extract a testable `parseManifestSizes` helper that takes raw JSON + a digest-fetcher callback so all parsing logic can be tested with fixtures. Update the badge formatter to render per-platform breakdowns. + +**Tech Stack:** Go stdlib only (net/http, encoding/json). No new dependencies. + +**Spec:** `docs/superpowers/specs/2026-04-06-oci-manifest-size-design.md` + +--- + +### Task 1: Update data model in types.go + +**Files:** +- Modify: `types.go:6-14` + +- [ ] **Step 1: Replace SizeBytes with PlatformSizes** + +Replace the `SizeBytes` field in `PackageStats`: + +```go +type PackageStats struct { + Owner string + Package string + TotalPulls int + LatestVersion string + Architectures []string + PlatformSizes map[string]int64 // key: "linux/amd64" or "" for unknown platform; value: compressed bytes + ScrapedAt int64 // unix timestamp +} +``` + +- [ ] **Step 2: Run existing tests to confirm nothing breaks** + +Run: `cd /home/lns/pkgbadge && go test ./...` +Expected: PASS. The existing `seedCache()` in `server_test.go` does not set `SizeBytes`, so removing the field is safe. The `scraper_test.go` tests only check pulls/version/arch. 
+ +- [ ] **Step 3: Commit** + +```bash +cd /home/lns/pkgbadge && git add types.go && git commit -m "refactor: replace SizeBytes with PlatformSizes map" +``` + +--- + +### Task 2: Add OCI types and parseManifestSizes helper + +**Files:** +- Modify: `scraper.go` (add types and helper after existing code) +- Create: `testdata/manifest-index.json` +- Create: `testdata/manifest-single.json` +- Create: `testdata/manifest-amd64.json` +- Create: `testdata/manifest-arm64.json` +- Modify: `scraper_test.go` (add parsing tests) + +- [ ] **Step 1: Create test fixtures** + +`testdata/manifest-index.json` -- multi-arch OCI index with 2 real platforms + 2 attestation entries: + +```json +{ + "mediaType": "application/vnd.oci.image.index.v1+json", + "schemaVersion": 2, + "manifests": [ + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:amd64digest", + "size": 2197, + "platform": { "os": "linux", "architecture": "amd64" } + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:arm64digest", + "size": 2197, + "platform": { "os": "linux", "architecture": "arm64" } + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:attestation1", + "size": 565, + "platform": { "os": "unknown", "architecture": "unknown" } + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:attestation2", + "size": 565, + "platform": { "os": "unknown", "architecture": "unknown" } + } + ] +} +``` + +`testdata/manifest-amd64.json` -- platform manifest for amd64: + +```json +{ + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "schemaVersion": 2, + "layers": [ + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 3000000 }, + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 7000000 } + ] +} +``` + +`testdata/manifest-arm64.json` -- platform manifest for arm64: + +```json +{ + "mediaType": 
"application/vnd.oci.image.manifest.v1+json", + "schemaVersion": 2, + "layers": [ + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 2500000 }, + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 6500000 } + ] +} +``` + +`testdata/manifest-single.json` -- single image manifest (no index wrapper): + +```json +{ + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "schemaVersion": 2, + "layers": [ + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 4000000 }, + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 8000000 } + ] +} +``` + +- [ ] **Step 2: Write failing tests for parseManifestSizes** + +Add to `scraper_test.go`: + +```go +func TestParseManifestSizes_Index(t *testing.T) { + indexBody := loadFixture(t, "manifest-index.json") + amd64Body := loadFixture(t, "manifest-amd64.json") + arm64Body := loadFixture(t, "manifest-arm64.json") + + fetcher := func(digest string) ([]byte, error) { + switch digest { + case "sha256:amd64digest": + return []byte(amd64Body), nil + case "sha256:arm64digest": + return []byte(arm64Body), nil + default: + return nil, fmt.Errorf("unexpected digest: %s", digest) + } + } + + sizes, err := parseManifestSizes([]byte(indexBody), fetcher) + if err != nil { + t.Fatal(err) + } + if len(sizes) != 2 { + t.Fatalf("got %d platforms, want 2", len(sizes)) + } + if sizes["linux/amd64"] != 10000000 { + t.Errorf("amd64 = %d, want 10000000", sizes["linux/amd64"]) + } + if sizes["linux/arm64"] != 9000000 { + t.Errorf("arm64 = %d, want 9000000", sizes["linux/arm64"]) + } +} + +func TestParseManifestSizes_Single(t *testing.T) { + body := loadFixture(t, "manifest-single.json") + + sizes, err := parseManifestSizes([]byte(body), nil) + if err != nil { + t.Fatal(err) + } + if len(sizes) != 1 { + t.Fatalf("got %d entries, want 1", len(sizes)) + } + if sizes[""] != 12000000 { + t.Errorf("size = %d, want 12000000", sizes[""]) + } +} +``` + +Add `"fmt"` to the imports if 
not already present. + +Run: `cd /home/lns/pkgbadge && go test -run TestParseManifestSizes -v` +Expected: FAIL -- `parseManifestSizes` undefined. + +- [ ] **Step 3: Add OCI types and parseManifestSizes to scraper.go** + +Add after the existing `scrapeAll` function at the bottom of `scraper.go`. Add `"encoding/json"` to the imports. + +```go +// OCI registry types (private, for JSON parsing only). + +type ociTokenResponse struct { + Token string `json:"token"` +} + +type ociManifest struct { + MediaType string `json:"mediaType"` + Manifests []ociDescriptor `json:"manifests,omitempty"` + Layers []ociDescriptor `json:"layers,omitempty"` +} + +type ociDescriptor struct { + MediaType string `json:"mediaType"` + Digest string `json:"digest"` + Size int64 `json:"size"` + Platform *ociPlatform `json:"platform,omitempty"` +} + +type ociPlatform struct { + OS string `json:"os"` + Architecture string `json:"architecture"` +} + +// parseManifestSizes parses an OCI manifest (index or single) and returns +// per-platform compressed layer sizes. For an index, fetchManifest is called +// for each real platform digest. For a single manifest, fetchManifest is unused. 
+func parseManifestSizes(body []byte, fetchManifest func(digest string) ([]byte, error)) (map[string]int64, error) { + var m ociManifest + if err := json.Unmarshal(body, &m); err != nil { + return nil, fmt.Errorf("unmarshal manifest: %w", err) + } + + if len(m.Manifests) > 0 { + return parseIndexSizes(m.Manifests, fetchManifest) + } + if len(m.Layers) > 0 { + return parseSingleSizes(m.Layers), nil + } + return nil, fmt.Errorf("manifest has no manifests[] or layers[]") +} + +func parseIndexSizes(descriptors []ociDescriptor, fetchManifest func(string) ([]byte, error)) (map[string]int64, error) { + sizes := make(map[string]int64) + for _, d := range descriptors { + if d.Platform == nil || d.Platform.OS == "" || d.Platform.OS == "unknown" { + continue + } + raw, err := fetchManifest(d.Digest) + if err != nil { + return nil, fmt.Errorf("fetch %s: %w", d.Digest, err) + } + var pm ociManifest + if err := json.Unmarshal(raw, &pm); err != nil { + return nil, fmt.Errorf("unmarshal platform manifest %s: %w", d.Digest, err) + } + key := d.Platform.OS + "/" + d.Platform.Architecture + for _, l := range pm.Layers { + sizes[key] += l.Size + } + } + return sizes, nil +} + +func parseSingleSizes(layers []ociDescriptor) map[string]int64 { + var total int64 + for _, l := range layers { + total += l.Size + } + return map[string]int64{"": total} +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `cd /home/lns/pkgbadge && go test -run TestParseManifestSizes -v` +Expected: PASS -- both `TestParseManifestSizes_Index` and `TestParseManifestSizes_Single` green. 
+ +- [ ] **Step 5: Commit** + +```bash +cd /home/lns/pkgbadge && git add scraper.go scraper_test.go testdata/manifest-*.json && git commit -m "feat: add OCI manifest parsing with per-platform sizes" +``` + +--- + +### Task 3: Add fetchImageSizes to the scraper + +**Files:** +- Modify: `scraper.go` (add `fetchImageSizes`, wire into `scrapeAll`) + +- [ ] **Step 1: Add fetchImageSizes** + +Add below `parseSingleSizes` in `scraper.go`: + +```go +const ( + ghcrTokenURL = "https://ghcr.io/token?scope=repository:%s/%s:pull" + ghcrManifestURL = "https://ghcr.io/v2/%s/%s/manifests/%s" + ociAccept = "application/vnd.oci.image.index.v1+json, " + + "application/vnd.docker.distribution.manifest.list.v2+json, " + + "application/vnd.oci.image.manifest.v1+json, " + + "application/vnd.docker.distribution.manifest.v2+json" +) + +func fetchImageSizes(ctx context.Context, owner, pkg, version string) (map[string]int64, error) { + token, err := fetchGHCRToken(ctx, owner, pkg) + if err != nil { + return nil, fmt.Errorf("token: %w", err) + } + + tag := version + if tag == "" { + tag = "latest" + } + + body, err := fetchManifestByTag(ctx, owner, pkg, tag, token) + if err != nil && tag != "latest" { + body, err = fetchManifestByTag(ctx, owner, pkg, "latest", token) + } + if err != nil { + return nil, err + } + + return parseManifestSizes(body, func(digest string) ([]byte, error) { + return fetchManifestByTag(ctx, owner, pkg, digest, token) + }) +} + +func fetchGHCRToken(ctx context.Context, owner, pkg string) (string, error) { + url := fmt.Sprintf(ghcrTokenURL, strings.ToLower(owner), strings.ToLower(pkg)) + req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) + if err != nil { + return "", err + } + + resp, err := httpClient.Do(req) + if err != nil { + return "", err + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusOK { + return "", fmt.Errorf("token endpoint HTTP %d", resp.StatusCode) + } + + var tr ociTokenResponse + if err := 
json.NewDecoder(resp.Body).Decode(&tr); err != nil { + return "", fmt.Errorf("decode token: %w", err) + } + return tr.Token, nil +} + +func fetchManifestByTag(ctx context.Context, owner, pkg, ref, token string) ([]byte, error) { + url := fmt.Sprintf(ghcrManifestURL, strings.ToLower(owner), strings.ToLower(pkg), ref) + req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) + if err != nil { + return nil, err + } + req.Header.Set("Authorization", "Bearer "+token) + req.Header.Set("Accept", ociAccept) + + resp, err := httpClient.Do(req) + if err != nil { + return nil, err + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusOK { + return nil, fmt.Errorf("manifest HTTP %d for %s", resp.StatusCode, ref) + } + + return io.ReadAll(io.LimitReader(resp.Body, 1<<20)) // 1 MiB cap +} +``` + +- [ ] **Step 2: Wire fetchImageSizes into scrapeAll** + +Replace the current `scrapeAll` function body: + +```go +func scrapeAll(ctx context.Context, packages []PackageRef, cache *Cache, log *slog.Logger) { + for _, ref := range packages { + html, err := fetchPackagePage(ctx, ref) + if err != nil { + log.Warn("scrape failed, keeping stale data", "package", ref.Key(), "error", err) + continue + } + stats, err := parsePackagePage(html, ref.Owner, ref.Package) + if err != nil { + log.Warn("parse failed", "package", ref.Key(), "error", err) + continue + } + + sizes, err := fetchImageSizes(ctx, ref.Owner, ref.Package, stats.LatestVersion) + if err != nil { + log.Warn("size fetch failed", "package", ref.Key(), "error", err) + } else { + stats.PlatformSizes = sizes + } + + stats.ScrapedAt = time.Now().Unix() + cache.Set(ref.Key(), stats) + log.Info("scraped", "package", ref.Key(), "pulls", stats.TotalPulls, "version", stats.LatestVersion) + } +} +``` + +- [ ] **Step 3: Verify it compiles** + +Run: `cd /home/lns/pkgbadge && go build ./...` +Expected: Build succeeds with no errors. 
+ +- [ ] **Step 4: Commit** + +```bash +cd /home/lns/pkgbadge && git add scraper.go && git commit -m "feat: fetch image sizes from GHCR OCI registry API" +``` + +--- + +### Task 4: Update badge formatter in server.go + +**Files:** +- Modify: `server.go:58-90` (buildBadge size case) +- Modify: `server_test.go` (add size badge tests, update seedCache) + +- [ ] **Step 1: Write failing tests for size badge formatting** + +Add to `server_test.go`. First add `"sort"` to imports. + +Update `seedCache` to include `PlatformSizes`: + +```go +func seedCache() *Cache { + c := NewCache() + c.Set("will-luck/docker-sentinel", &PackageStats{ + Owner: "Will-Luck", + Package: "docker-sentinel", + TotalPulls: 433, + LatestVersion: "2.11.1", + Architectures: []string{"linux/amd64", "linux/arm64"}, + PlatformSizes: map[string]int64{ + "linux/amd64": 86476951, + "linux/arm64": 83000000, + }, + ScrapedAt: 1710000000, + }) + return c +} +``` + +Add the test functions: + +```go +func TestBuildBadge_Size_MultiPlatform(t *testing.T) { + stats := &PackageStats{ + PlatformSizes: map[string]int64{ + "linux/amd64": 86476951, + "linux/arm64": 83000000, + }, + } + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge returned false") + } + want := "82.5 MB (amd64) | 79.2 MB (arm64)" + if badge.Message != want { + t.Errorf("message = %q, want %q", badge.Message, want) + } +} + +func TestBuildBadge_Size_SinglePlatformLabelled(t *testing.T) { + stats := &PackageStats{ + PlatformSizes: map[string]int64{ + "linux/amd64": 86476951, + }, + } + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge returned false") + } + want := "82.5 MB (amd64)" + if badge.Message != want { + t.Errorf("message = %q, want %q", badge.Message, want) + } +} + +func TestBuildBadge_Size_SingleUnknownPlatform(t *testing.T) { + stats := &PackageStats{ + PlatformSizes: map[string]int64{ + "": 12000000, + }, + } + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge returned 
false") + } + want := "11.4 MB" + if badge.Message != want { + t.Errorf("message = %q, want %q", badge.Message, want) + } +} + +func TestBuildBadge_Size_NilMap(t *testing.T) { + stats := &PackageStats{} + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge returned false") + } + if badge.Message != "unknown" { + t.Errorf("message = %q, want %q", badge.Message, "unknown") + } +} +``` + +Run: `cd /home/lns/pkgbadge && go test -run TestBuildBadge_Size -v` +Expected: FAIL -- the current `buildBadge` size case still calls `formatBytes(stats.SizeBytes)` which no longer exists. + +- [ ] **Step 2: Update buildBadge size case** + +Replace the `case "size":` block in `buildBadge` (server.go). Add `"sort"` and `"strings"` to imports if not already there. + +```go + case "size": + return BadgeResponse{ + SchemaVersion: 1, + Label: "image size", + Message: formatSizeMessage(stats.PlatformSizes), + Color: "blue", + }, true +``` + +Add the `formatSizeMessage` function after `formatBytes`: + +```go +func formatSizeMessage(sizes map[string]int64) string { + if len(sizes) == 0 { + return "unknown" + } + + // Single entry with empty key: unknown platform, plain size. + if len(sizes) == 1 { + for k, v := range sizes { + if k == "" { + return formatBytes(v) + } + return formatBytes(v) + " (" + strings.TrimPrefix(k, "linux/") + ")" + } + } + + // Multiple platforms: sorted alphabetically by arch. + keys := make([]string, 0, len(sizes)) + for k := range sizes { + keys = append(keys, k) + } + sort.Strings(keys) + + parts := make([]string, len(keys)) + for i, k := range keys { + parts[i] = formatBytes(sizes[k]) + " (" + strings.TrimPrefix(k, "linux/") + ")" + } + return strings.Join(parts, " | ") +} +``` + +- [ ] **Step 3: Run tests to verify they pass** + +Run: `cd /home/lns/pkgbadge && go test ./... -v` +Expected: ALL PASS -- all existing tests plus the 4 new size tests. 
+ +- [ ] **Step 4: Commit** + +```bash +cd /home/lns/pkgbadge && git add server.go server_test.go && git commit -m "feat: per-platform size badge formatting" +``` + +--- + +### Task 5: Update documentation + +**Files:** +- Modify: `README.md` +- Modify: `wiki/Home.md` +- Modify: `wiki/Badge-Usage.md` +- Modify: `wiki/Troubleshooting.md` + +- [ ] **Step 1: Update README.md** + +Line 8 -- change description from: +``` +Self-hosted badge server for GitHub Container Registry. Scrapes GHCR package pages and serves ... +``` +to: +``` +Self-hosted badge server for GitHub Container Registry. Scrapes GHCR package pages and the OCI registry API to serve ... +``` + +Line 60 -- change the size row example from `12.4 MB` to `82.5 MB (amd64) | 79.2 MB (arm64)`: + +```markdown +| Image size | `/owner/package/size.json` | `82.5 MB (amd64) \| 79.2 MB (arm64)` | +``` + +- [ ] **Step 2: Update wiki/Home.md** + +Line 3 -- change from: +``` +Self-hosted badge server for GitHub Container Registry. Scrapes GHCR package pages and serves ... +``` +to: +``` +Self-hosted badge server for GitHub Container Registry. Scrapes GHCR package pages and the OCI registry API to serve ... +``` + +- [ ] **Step 3: Update wiki/Badge-Usage.md** + +Line 11 -- change the size row from: +``` +| Image size | `size.json` | image size | blue | `12.4 MB`, `1.2 GB` | +``` +to: +``` +| Image size | `size.json` | image size | blue | `82.5 MB (amd64)`, `82.5 MB (amd64) \| 79.2 MB (arm64)` | +``` + +Lines 59-67 -- update the "Image sizes" table to note multi-platform output: + +```markdown +Image sizes use binary units. Multi-arch images show per-platform sizes (e.g. 
`82.5 MB (amd64) | 79.2 MB (arm64)`): + +| Raw Bytes | Displayed | +|-----------|-----------| +| 0 or nil | `unknown` | +| < 1 KiB | `512 B` | +| < 1 MiB | `45.2 KB` | +| < 1 GiB | `12.4 MB` | +| >= 1 GiB | `1.2 GB` | +``` + +- [ ] **Step 4: Update wiki/Troubleshooting.md** + +Replace the "Badge Shows unknown" section (lines 3-9) with: + +```markdown +## Badge Shows "unknown" + +The version and architecture badges return `unknown` when the scraper could not extract that field from the GitHub packages page. The size badge returns `unknown` when the OCI registry API could not be reached or the manifest could not be parsed. Possible causes: + +- **Package is new with no published versions yet.** The packages page won't have version or architecture data until at least one tagged version is pushed. +- **GitHub changed their HTML structure.** The scraper uses regex patterns to extract version and architecture data. If GitHub redesigns the packages page, the patterns may stop matching. Check the logs for `parse failed` warnings. +- **OCI registry unreachable or rate-limited.** The size badge fetches manifests from `ghcr.io/v2/`. If the registry is down or rate-limiting, size will show `unknown`. Check the logs for `size fetch failed` warnings. +- **Private package.** The OCI token fetch uses anonymous auth. Private packages will always show `unknown` for size. +- **Scrape hasn't completed yet.** On startup, pkgbadge scrapes all packages before starting the HTTP server. If you see `unknown` immediately after a restart, wait for the initial scrape to finish. 
+``` + +- [ ] **Step 5: Commit** + +```bash +cd /home/lns/pkgbadge && git add README.md wiki/ && git commit -m "docs: update size badge examples and troubleshooting for OCI manifest fetching" +``` + +--- + +### Task 6: Smoke test against live GHCR + +**Files:** None (verification only) + +- [ ] **Step 1: Smoke test with a version-tagged package** + +Run the full test suite first: + +```bash +cd /home/lns/pkgbadge && go test ./... -v +``` +Expected: ALL PASS. + +Then do a live smoke test. Build and run locally against docker-sentinel (has version tag `2.12.2`): + +```bash +cd /home/lns/pkgbadge && go build -o pkgbadge . && \ + PKGBADGE_PACKAGES="Will-Luck/Docker-Sentinel/docker-sentinel" \ + PKGBADGE_PORT=19876 \ + ./pkgbadge & +sleep 5 +curl -s http://localhost:19876/will-luck/docker-sentinel/size.json | python3 -m json.tool +kill %1 +``` + +Expected: JSON response with `"message"` containing size(s) like `"XX.X MB (amd64) | XX.X MB (arm64)"`, not `"unknown"`. + +- [ ] **Step 2: Smoke test with iplayer-arr (the original issue reporter)** + +```bash +cd /home/lns/pkgbadge && \ + PKGBADGE_PACKAGES="Will-Luck/iplayer-arr/iplayer-arr" \ + PKGBADGE_PORT=19876 \ + ./pkgbadge & +sleep 5 +curl -s http://localhost:19876/will-luck/iplayer-arr/size.json | python3 -m json.tool +kill %1 +``` + +Expected: JSON with per-platform sizes (iplayer-arr is multi-arch amd64+arm64). + +- [ ] **Step 3: Verify badge message length is reasonable for shields.io** + +Check that the longest expected badge message fits. Shields.io has no hard limit but messages over ~40 characters start to look cramped. A two-platform message like `82.5 MB (amd64) | 79.2 MB (arm64)` is 33 characters -- fine. + +Visually verify by opening: +``` +https://img.shields.io/badge/image%20size-82.5%20MB%20(amd64)%20%7C%2079.2%20MB%20(arm64)-blue +``` + +Expected: Badge renders cleanly without truncation. 
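The length check in Step 3 can also be done mechanically rather than by eye. A minimal sketch, assuming the message is plain text counted in characters rather than bytes; `badgeMessageLen` is a hypothetical helper for illustration and is not part of pkgbadge:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// badgeMessageLen reports how many characters a badge message contains.
// Hypothetical helper: it counts runes, so a non-ASCII separator would
// still count as one character.
func badgeMessageLen(msg string) int {
	return utf8.RuneCountInString(msg)
}

func main() {
	// The two-platform example message from Step 3.
	msg := "82.5 MB (amd64) | 79.2 MB (arm64)"
	fmt.Println(badgeMessageLen(msg))
}
```

Each additional platform adds the length of one more `size (arch)` part plus the three-character ` | ` separator, so a three-platform image would likely exceed the ~40 character guideline.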
+ +- [ ] **Step 4: Clean up test binary** + +```bash +cd /home/lns/pkgbadge && rm -f pkgbadge +``` + +--- + +### Task 7: Final commit and push + +**Files:** None + +- [ ] **Step 1: Run full test suite one final time** + +```bash +cd /home/lns/pkgbadge && go test ./... -v +``` +Expected: ALL PASS. + +- [ ] **Step 2: Review git log** + +```bash +cd /home/lns/pkgbadge && git log --oneline -10 +``` + +Expected: Clean commit history with one commit per task. + +- [ ] **Step 3: Push to Gitea** + +```bash +cd /home/lns/pkgbadge && git push +``` diff --git a/docs/superpowers/specs/2026-04-06-oci-manifest-size-design.md b/docs/superpowers/specs/2026-04-06-oci-manifest-size-design.md new file mode 100644 index 0000000..d8a68f7 --- /dev/null +++ b/docs/superpowers/specs/2026-04-06-oci-manifest-size-design.md @@ -0,0 +1,161 @@ +# OCI Manifest Size Badge + +**Issue:** GiteaLN/pkgbadge#1 +**Date:** 2026-04-06 +**Status:** Approved + +## Problem + +The `size.json` badge returns `"unknown"` for all GHCR packages. `SizeBytes` is declared in `PackageStats` but never populated because the GitHub packages HTML page does not contain image size information. Size data is only available via the OCI registry API at `ghcr.io/v2/`. + +## Solution + +Add OCI registry manifest fetching to the scraper. After scraping the HTML page for pulls/version/arch, make a second pass against `ghcr.io` to fetch compressed layer sizes per platform. + +## Data Model + +Replace `SizeBytes int64` with: + +```go +PlatformSizes map[string]int64 // key: "linux/amd64" or "" for unknown platform; value: total compressed layer bytes +``` + +This supports both single-arch and multi-arch images. A nil or empty map means size is unknown. + +## New Function: `fetchImageSizes` + +Location: `scraper.go` + +```go +func fetchImageSizes(ctx context.Context, owner, pkg, version string) (map[string]int64, error) +``` + +`version` is the tag extracted by the HTML scraper (e.g. `"2.11.1"`). If empty, falls back to `"latest"`. 
+ +### Flow + +1. **Token:** `GET https://ghcr.io/token?scope=repository:{owner}/{pkg}:pull` -- anonymous auth for public packages. Parse JSON response for `token` field. + +2. **Manifest:** Try `stats.LatestVersion` first, fall back to `latest`: + `GET https://ghcr.io/v2/{owner}/{pkg}/manifests/{tag}` with headers: + - `Authorization: Bearer {token}` + - `Accept: application/vnd.oci.image.index.v1+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v2+json` + + If the versioned tag returns a non-200 response, retry with `latest` before giving up. + +3. **Parse response by `mediaType`:** + + **OCI image index** (`application/vnd.oci.image.index.v1+json` or `application/vnd.docker.distribution.manifest.list.v2+json`): + - Filter `manifests[]` to entries where `platform.os` is non-empty and not `"unknown"` (skips attestation manifests) + - For each platform entry, fetch its manifest by digest: + `GET https://ghcr.io/v2/{owner}/{pkg}/manifests/{digest}` + - Sum `layers[].size` for each platform manifest + - Store as `"linux/amd64" -> 86476951` + + **Single image manifest** (`application/vnd.oci.image.manifest.v1+json` or `application/vnd.docker.distribution.manifest.v2+json`): + - Sum `layers[].size` directly + - Store under a neutral key `""` (empty string) -- the badge formatter treats a single entry with an empty key as "no platform label needed" and renders just the size + +4. Return the map. 
+ +### JSON Structures (minimal, for parsing) + +```go +type ociTokenResponse struct { + Token string `json:"token"` +} + +type ociManifest struct { + MediaType string `json:"mediaType"` + Manifests []ociDescriptor `json:"manifests,omitempty"` // present in index + Layers []ociDescriptor `json:"layers,omitempty"` // present in image manifest +} + +type ociDescriptor struct { + MediaType string `json:"mediaType"` + Digest string `json:"digest"` + Size int64 `json:"size"` + Platform *ociPlatform `json:"platform,omitempty"` +} + +type ociPlatform struct { + OS string `json:"os"` + Architecture string `json:"architecture"` +} +``` + +These are private types in `scraper.go`, not exported. + +## Error Handling + +Size fetching is best-effort. If any step fails (token fetch, manifest fetch, JSON parse, network timeout), log a warning and leave `PlatformSizes` nil. The badge falls back to `"unknown"`. A size fetch failure never causes the entire scrape cycle to fail or skip other fields. + +The existing `httpClient` (30s timeout) is reused for registry requests. 
+ +## Badge Formatting + +The `"size"` case in `buildBadge` changes to: + +- `PlatformSizes` nil or empty: message = `"unknown"` +- Single entry with empty key (single-arch, platform unknown): message = `"82.5 MB"` (no platform label) +- Single entry with non-empty key: message = `"82.5 MB (amd64)"` (include label for clarity) +- Multiple entries: message = `"82.5 MB (amd64) | 79.1 MB (arm64)"` -- sorted alphabetically by architecture name, `linux/` prefix stripped + +## Integration + +In `scrapeAll`, after `parsePackagePage` succeeds: + +```go +sizes, err := fetchImageSizes(ctx, ref.Owner, ref.Package, stats.LatestVersion) +if err != nil { + log.Warn("size fetch failed", "package", ref.Key(), "error", err) +} else { + stats.PlatformSizes = sizes +} +``` + +## Testing + +### Unit tests for manifest parsing + +Extract the JSON-to-map logic into a testable helper: + +```go +func parseManifestSizes(body []byte, fetchManifest func(digest string) ([]byte, error)) (map[string]int64, error) +``` + +Test with JSON fixtures: +- `testdata/manifest-index.json` -- multi-arch OCI index with 2 real platforms + 2 attestation entries +- `testdata/manifest-single.json` -- single image manifest with layers +- `testdata/manifest-amd64.json` -- platform manifest fetched by digest (used by index test) +- `testdata/manifest-arm64.json` -- second platform manifest + +### Unit tests for badge formatting + +In `server_test.go`, test the size badge case: +- Nil `PlatformSizes` returns `"unknown"` +- Single entry with empty key returns plain size (e.g. `"82.5 MB"`) +- Single entry with non-empty key returns labelled size (e.g. `"82.5 MB (amd64)"`) +- Two platforms returns breakdown (e.g. `"82.5 MB (amd64) | 79.1 MB (arm64)"`) + +### Existing tests + +Unchanged. HTML parsing tests are unaffected since size comes from a separate code path. 
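The badge formatting cases listed above can be checked with a small standalone sketch. It assumes binary units with one decimal place, as the wiki table describes; `humanBytes` is a hypothetical stand-in for the existing `formatBytes` (whose body is not shown in this change), and `sizeMessage` is an illustrative name for the formatter:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// humanBytes renders a byte count in binary units. Hypothetical stand-in
// for the existing formatBytes helper.
func humanBytes(n int64) string {
	const unit = 1024
	switch {
	case n <= 0:
		return "unknown"
	case n < unit:
		return fmt.Sprintf("%d B", n)
	case n < unit*unit:
		return fmt.Sprintf("%.1f KB", float64(n)/unit)
	case n < unit*unit*unit:
		return fmt.Sprintf("%.1f MB", float64(n)/(unit*unit))
	default:
		return fmt.Sprintf("%.1f GB", float64(n)/(unit*unit*unit))
	}
}

// sizeMessage builds the badge message: "unknown" for an empty map, a plain
// size for an empty platform key, and a labelled part per platform otherwise.
func sizeMessage(sizes map[string]int64) string {
	if len(sizes) == 0 {
		return "unknown"
	}
	keys := make([]string, 0, len(sizes))
	for k := range sizes {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order regardless of map iteration
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		part := humanBytes(sizes[k])
		if k != "" {
			part += " (" + strings.TrimPrefix(k, "linux/") + ")"
		}
		parts = append(parts, part)
	}
	return strings.Join(parts, " | ")
}

func main() {
	fmt.Println(sizeMessage(map[string]int64{
		"linux/amd64": 86476951,
		"linux/arm64": 83000000,
	})) // prints "82.5 MB (amd64) | 79.2 MB (arm64)"
}
```

Sorting the keys matters because Go map iteration order is randomized; without it the badge text could change between scrapes.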
+ +## Files Changed + +| File | Change | +|------|--------| +| `types.go` | Replace `SizeBytes int64` with `PlatformSizes map[string]int64` | +| `scraper.go` | Add OCI types, `fetchImageSizes`, `parseManifestSizes`, call from `scrapeAll` | +| `server.go` | Update `buildBadge` size case to format per-platform sizes | +| `server_test.go` | Add size badge formatting tests | +| `scraper_test.go` | Add manifest parsing tests | +| `testdata/manifest-index.json` | New fixture: multi-arch OCI index | +| `testdata/manifest-single.json` | New fixture: single image manifest | +| `testdata/manifest-amd64.json` | New fixture: amd64 platform manifest | +| `testdata/manifest-arm64.json` | New fixture: arm64 platform manifest | +| `README.md` | Update description ("scrapes GHCR pages" wording) and Badge Types table size example | +| `wiki/Home.md` | Update description to mention OCI registry API for size data | +| `wiki/Badge-Usage.md` | Update size row example output to reflect per-platform format | +| `wiki/Troubleshooting.md` | Rewrite "Badge Shows unknown" section: size now comes from OCI registry API, not HTML scraping | diff --git a/scraper.go b/scraper.go index 1613a13..31c8024 100644 --- a/scraper.go +++ b/scraper.go @@ -2,6 +2,7 @@ package main import ( "context" + "encoding/json" "fmt" "io" "log/slog" @@ -57,7 +58,6 @@ func parsePackagePage(html, owner, pkg string) (*PackageStats, error) { return stats, nil } -// PackageRef identifies a configured package to scrape. type PackageRef struct { Owner string Repo string // may differ from Package (e.g. Docker-Sentinel vs docker-sentinel) @@ -69,7 +69,6 @@ func (r PackageRef) Key() string { return strings.ToLower(r.Owner + "/" + r.Package) } -// fetchPackagePage downloads the GitHub packages HTML page. 
func fetchPackagePage(ctx context.Context, ref PackageRef) (string, error) { url := fmt.Sprintf("https://github.com/%s/%s/pkgs/container/%s", ref.Owner, ref.Repo, ref.Package) @@ -97,7 +96,6 @@ func fetchPackagePage(ctx context.Context, ref PackageRef) (string, error) { return string(body), nil } -// scrapeAll fetches and parses stats for every configured package. func scrapeAll(ctx context.Context, packages []PackageRef, cache *Cache, log *slog.Logger) { for _, ref := range packages { html, err := fetchPackagePage(ctx, ref) @@ -110,8 +108,167 @@ func scrapeAll(ctx context.Context, packages []PackageRef, cache *Cache, log *sl log.Warn("parse failed", "package", ref.Key(), "error", err) continue } + + sizes, err := fetchImageSizes(ctx, ref.Owner, ref.Package, stats.LatestVersion) + if err != nil { + log.Warn("size fetch failed", "package", ref.Key(), "error", err) + } else { + stats.PlatformSizes = sizes + } + stats.ScrapedAt = time.Now().Unix() cache.Set(ref.Key(), stats) log.Info("scraped", "package", ref.Key(), "pulls", stats.TotalPulls, "version", stats.LatestVersion) } } + +// OCI registry types (private, for JSON parsing only). 
+ +type ociTokenResponse struct { + Token string `json:"token"` +} + +type ociManifest struct { + MediaType string `json:"mediaType"` + Manifests []ociDescriptor `json:"manifests,omitempty"` + Layers []ociDescriptor `json:"layers,omitempty"` +} + +type ociDescriptor struct { + MediaType string `json:"mediaType"` + Digest string `json:"digest"` + Size int64 `json:"size"` + Platform *ociPlatform `json:"platform,omitempty"` +} + +type ociPlatform struct { + OS string `json:"os"` + Architecture string `json:"architecture"` +} + +func parseManifestSizes(body []byte, fetchManifest func(digest string) ([]byte, error)) (map[string]int64, error) { + var m ociManifest + if err := json.Unmarshal(body, &m); err != nil { + return nil, fmt.Errorf("unmarshal manifest: %w", err) + } + + if len(m.Manifests) > 0 { + return parseIndexSizes(m.Manifests, fetchManifest) + } + if len(m.Layers) > 0 { + return parseSingleSizes(m.Layers), nil + } + return nil, fmt.Errorf("manifest has no manifests[] or layers[]") +} + +func parseIndexSizes(descriptors []ociDescriptor, fetchManifest func(string) ([]byte, error)) (map[string]int64, error) { + sizes := make(map[string]int64) + for _, d := range descriptors { + if d.Platform == nil || d.Platform.OS == "" || d.Platform.OS == "unknown" { + continue + } + raw, err := fetchManifest(d.Digest) + if err != nil { + continue + } + var pm ociManifest + if err := json.Unmarshal(raw, &pm); err != nil { + continue + } + key := d.Platform.OS + "/" + d.Platform.Architecture + for _, l := range pm.Layers { + sizes[key] += l.Size + } + } + if len(sizes) == 0 { + return nil, fmt.Errorf("no platform manifests resolved") + } + return sizes, nil +} + +func parseSingleSizes(layers []ociDescriptor) map[string]int64 { + var total int64 + for _, l := range layers { + total += l.Size + } + return map[string]int64{"": total} +} + +const ( + ghcrTokenURL = "https://ghcr.io/token?scope=repository:%s/%s:pull" + ghcrManifestURL = "https://ghcr.io/v2/%s/%s/manifests/%s" + 
ociAccept = "application/vnd.oci.image.index.v1+json, " + + "application/vnd.docker.distribution.manifest.list.v2+json, " + + "application/vnd.oci.image.manifest.v1+json, " + + "application/vnd.docker.distribution.manifest.v2+json" +) + +func fetchImageSizes(ctx context.Context, owner, pkg, version string) (map[string]int64, error) { + token, err := fetchGHCRToken(ctx, owner, pkg) + if err != nil { + return nil, fmt.Errorf("token: %w", err) + } + + tag := version + if tag == "" { + tag = "latest" + } + + body, err := fetchManifestByTag(ctx, owner, pkg, tag, token) + if err != nil && tag != "latest" { + body, err = fetchManifestByTag(ctx, owner, pkg, "latest", token) + } + if err != nil { + return nil, err + } + + return parseManifestSizes(body, func(digest string) ([]byte, error) { + return fetchManifestByTag(ctx, owner, pkg, digest, token) + }) +} + +func fetchGHCRToken(ctx context.Context, owner, pkg string) (string, error) { + url := fmt.Sprintf(ghcrTokenURL, strings.ToLower(owner), strings.ToLower(pkg)) + req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) + if err != nil { + return "", err + } + + resp, err := httpClient.Do(req) + if err != nil { + return "", err + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusOK { + return "", fmt.Errorf("token endpoint HTTP %d", resp.StatusCode) + } + + var tr ociTokenResponse + if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil { + return "", fmt.Errorf("decode token: %w", err) + } + return tr.Token, nil +} + +func fetchManifestByTag(ctx context.Context, owner, pkg, ref, token string) ([]byte, error) { + url := fmt.Sprintf(ghcrManifestURL, strings.ToLower(owner), strings.ToLower(pkg), ref) + req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) + if err != nil { + return nil, err + } + req.Header.Set("Authorization", "Bearer "+token) + req.Header.Set("Accept", ociAccept) + + resp, err := httpClient.Do(req) + if err != nil { + return nil, err + } + defer 
resp.Body.Close() + + if resp.StatusCode != http.StatusOK { + return nil, fmt.Errorf("manifest HTTP %d for %s", resp.StatusCode, ref) + } + + return io.ReadAll(io.LimitReader(resp.Body, 1<<20)) // 1 MiB cap +} diff --git a/scraper_test.go b/scraper_test.go index 48fb4b1..cab42f1 100644 --- a/scraper_test.go +++ b/scraper_test.go @@ -1,6 +1,7 @@ package main import ( + "fmt" "os" "testing" ) @@ -103,3 +104,49 @@ func TestParsePackages_ThreePart(t *testing.T) { t.Errorf("Package = %q, want %q", ref.Package, "docker-sentinel") } } + +func TestParseManifestSizes_Index(t *testing.T) { + indexBody := loadFixture(t, "manifest-index.json") + amd64Body := loadFixture(t, "manifest-amd64.json") + arm64Body := loadFixture(t, "manifest-arm64.json") + + fetcher := func(digest string) ([]byte, error) { + switch digest { + case "sha256:amd64digest": + return []byte(amd64Body), nil + case "sha256:arm64digest": + return []byte(arm64Body), nil + default: + return nil, fmt.Errorf("unexpected digest: %s", digest) + } + } + + sizes, err := parseManifestSizes([]byte(indexBody), fetcher) + if err != nil { + t.Fatal(err) + } + if len(sizes) != 2 { + t.Fatalf("got %d platforms, want 2", len(sizes)) + } + if sizes["linux/amd64"] != 10000000 { + t.Errorf("amd64 = %d, want 10000000", sizes["linux/amd64"]) + } + if sizes["linux/arm64"] != 9000000 { + t.Errorf("arm64 = %d, want 9000000", sizes["linux/arm64"]) + } +} + +func TestParseManifestSizes_Single(t *testing.T) { + body := loadFixture(t, "manifest-single.json") + + sizes, err := parseManifestSizes([]byte(body), nil) + if err != nil { + t.Fatal(err) + } + if len(sizes) != 1 { + t.Fatalf("got %d entries, want 1", len(sizes)) + } + if sizes[""] != 12000000 { + t.Errorf("size = %d, want 12000000", sizes[""]) + } +} diff --git a/server.go b/server.go index 2850129..1af7605 100644 --- a/server.go +++ b/server.go @@ -5,10 +5,10 @@ import ( "fmt" "net/http" "path" + "sort" "strings" ) -// newMux returns an http.Handler that serves badge endpoints. 
func newMux(cache *Cache) http.Handler { mux := http.NewServeMux() mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { @@ -81,7 +81,7 @@ func buildBadge(badgeType string, stats *PackageStats) (BadgeResponse, bool) { return BadgeResponse{ SchemaVersion: 1, Label: "image size", - Message: formatBytes(stats.SizeBytes), + Message: formatSizeMessage(stats.PlatformSizes), Color: "blue", }, true @@ -131,3 +131,32 @@ func formatBytes(b int64) string { return fmt.Sprintf("%d B", b) } } + +func formatSizeMessage(sizes map[string]int64) string { + if len(sizes) == 0 { + return "unknown" + } + + // Single entry with empty key: unknown platform, plain size. + if len(sizes) == 1 { + for k, v := range sizes { + if k == "" { + return formatBytes(v) + } + return formatBytes(v) + " (" + strings.TrimPrefix(k, "linux/") + ")" + } + } + + // Multiple platforms: sorted alphabetically by arch. + keys := make([]string, 0, len(sizes)) + for k := range sizes { + keys = append(keys, k) + } + sort.Strings(keys) + + parts := make([]string, len(keys)) + for i, k := range keys { + parts[i] = formatBytes(sizes[k]) + " (" + strings.TrimPrefix(k, "linux/") + ")" + } + return strings.Join(parts, " | ") +} diff --git a/server_test.go b/server_test.go index 08a6e60..e3b11b7 100644 --- a/server_test.go +++ b/server_test.go @@ -15,7 +15,11 @@ func seedCache() *Cache { TotalPulls: 433, LatestVersion: "2.11.1", Architectures: []string{"linux/amd64", "linux/arm64"}, - ScrapedAt: 1710000000, + PlatformSizes: map[string]int64{ + "linux/amd64": 86476951, + "linux/arm64": 83000000, + }, + ScrapedAt: 1710000000, }) return c } @@ -110,3 +114,63 @@ func TestBadgeHandler_UnknownBadge(t *testing.T) { t.Errorf("status = %d, want 404", w.Code) } } + +func TestBuildBadge_Size_MultiPlatform(t *testing.T) { + stats := &PackageStats{ + PlatformSizes: map[string]int64{ + "linux/amd64": 86476951, + "linux/arm64": 83000000, + }, + } + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge 
returned false") + } + want := "82.5 MB (amd64) | 79.2 MB (arm64)" + if badge.Message != want { + t.Errorf("message = %q, want %q", badge.Message, want) + } +} + +func TestBuildBadge_Size_SinglePlatformLabelled(t *testing.T) { + stats := &PackageStats{ + PlatformSizes: map[string]int64{ + "linux/amd64": 86476951, + }, + } + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge returned false") + } + want := "82.5 MB (amd64)" + if badge.Message != want { + t.Errorf("message = %q, want %q", badge.Message, want) + } +} + +func TestBuildBadge_Size_SingleUnknownPlatform(t *testing.T) { + stats := &PackageStats{ + PlatformSizes: map[string]int64{ + "": 12000000, + }, + } + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge returned false") + } + want := "11.4 MB" + if badge.Message != want { + t.Errorf("message = %q, want %q", badge.Message, want) + } +} + +func TestBuildBadge_Size_NilMap(t *testing.T) { + stats := &PackageStats{} + badge, ok := buildBadge("size", stats) + if !ok { + t.Fatal("buildBadge returned false") + } + if badge.Message != "unknown" { + t.Errorf("message = %q, want %q", badge.Message, "unknown") + } +} diff --git a/testdata/manifest-amd64.json b/testdata/manifest-amd64.json new file mode 100644 index 0000000..1ffd650 --- /dev/null +++ b/testdata/manifest-amd64.json @@ -0,0 +1,8 @@ +{ + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "schemaVersion": 2, + "layers": [ + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 3000000 }, + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 7000000 } + ] +} diff --git a/testdata/manifest-arm64.json b/testdata/manifest-arm64.json new file mode 100644 index 0000000..f5bfcf8 --- /dev/null +++ b/testdata/manifest-arm64.json @@ -0,0 +1,8 @@ +{ + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "schemaVersion": 2, + "layers": [ + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 2500000 }, + { 
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 6500000 } + ] +} diff --git a/testdata/manifest-index.json b/testdata/manifest-index.json new file mode 100644 index 0000000..3e03351 --- /dev/null +++ b/testdata/manifest-index.json @@ -0,0 +1,30 @@ +{ + "mediaType": "application/vnd.oci.image.index.v1+json", + "schemaVersion": 2, + "manifests": [ + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:amd64digest", + "size": 2197, + "platform": { "os": "linux", "architecture": "amd64" } + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:arm64digest", + "size": 2197, + "platform": { "os": "linux", "architecture": "arm64" } + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:attestation1", + "size": 565, + "platform": { "os": "unknown", "architecture": "unknown" } + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:attestation2", + "size": 565, + "platform": { "os": "unknown", "architecture": "unknown" } + } + ] +} diff --git a/testdata/manifest-single.json b/testdata/manifest-single.json new file mode 100644 index 0000000..9986db8 --- /dev/null +++ b/testdata/manifest-single.json @@ -0,0 +1,8 @@ +{ + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "schemaVersion": 2, + "layers": [ + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 4000000 }, + { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 8000000 } + ] +} diff --git a/types.go b/types.go index e694a7a..b190423 100644 --- a/types.go +++ b/types.go @@ -2,44 +2,38 @@ package main import "sync" -// PackageStats holds scraped stats for a single GHCR package. 
type PackageStats struct { Owner string Package string TotalPulls int LatestVersion string Architectures []string - SizeBytes int64 // from OCI manifest, 0 if unknown + PlatformSizes map[string]int64 // key: "linux/amd64" or "" for unknown platform; value: compressed bytes ScrapedAt int64 // unix timestamp } -// Cache is a concurrency-safe store for scraped package stats. type Cache struct { mu sync.RWMutex stats map[string]*PackageStats // key: "owner/package" } -// NewCache returns an initialised Cache. func NewCache() *Cache { return &Cache{stats: make(map[string]*PackageStats)} } -// Get returns the stats for a package, or nil if not found. func (c *Cache) Get(key string) *PackageStats { c.mu.RLock() defer c.mu.RUnlock() return c.stats[key] } -// Set stores stats for a package. func (c *Cache) Set(key string, s *PackageStats) { c.mu.Lock() defer c.mu.Unlock() c.stats[key] = s } -// BadgeResponse is the shields.io endpoint badge schema. -// See: https://shields.io/badges/endpoint-badge +// BadgeResponse is the shields.io endpoint-badge schema: https://shields.io/badges/endpoint-badge type BadgeResponse struct { SchemaVersion int `json:"schemaVersion"` Label string `json:"label"` diff --git a/wiki/API-Reference.md b/wiki/API-Reference.md new file mode 100644 index 0000000..fd3078b --- /dev/null +++ b/wiki/API-Reference.md @@ -0,0 +1,70 @@ +# API Reference + +pkgbadge exposes a single HTTP endpoint that serves badge data in the [shields.io endpoint schema](https://shields.io/badges/endpoint-badge). + +## Endpoint + +``` +GET /{owner}/{package}/{badge}.json +``` + +### Path Parameters + +| Parameter | Description | +|-----------|-------------| +| `owner` | GitHub user or organisation (case-insensitive) | +| `package` | GHCR package name (case-insensitive) | +| `badge` | Badge type: `pulls`, `version`, `size`, or `arch` | + +The `.json` extension is required. 
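The path rules above could be checked along these lines. This is a sketch, not the handler in `server.go`: `parseBadgePath` is a hypothetical helper, and the order of the two checks is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseBadgePath is a hypothetical helper that splits a request path of the
// form /{owner}/{package}/{badge}.json into its three parts.
// Owner and package are lowercased because the lookup is case-insensitive.
func parseBadgePath(p string) (owner, pkg, badge string, err error) {
	parts := strings.Split(strings.Trim(p, "/"), "/")
	if len(parts) != 3 {
		return "", "", "", errors.New("expected /owner/package/badge.json")
	}
	name, ok := strings.CutSuffix(parts[2], ".json")
	if !ok {
		return "", "", "", errors.New("expected .json extension")
	}
	return strings.ToLower(parts[0]), strings.ToLower(parts[1]), name, nil
}

func main() {
	for _, p := range []string{
		"/Will-Luck/docker-sentinel/size.json",
		"/owner/pulls.json",
		"/owner/package/pulls",
	} {
		owner, pkg, badge, err := parseBadgePath(p)
		if err != nil {
			fmt.Printf("%s: %v\n", p, err)
			continue
		}
		fmt.Printf("%s: owner=%s package=%s badge=%s\n", p, owner, pkg, badge)
	}
}
```

Validating the badge type (`pulls`, `version`, `size`, `arch`) is left to the caller in this sketch.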
+ +### Success Response (200) + +```json +{ + "schemaVersion": 1, + "label": "ghcr pulls", + "message": "1.5k", + "color": "blue" +} +``` + +Headers: +- `Content-Type: application/json` +- `Cache-Control: max-age=3600` + +### Error Responses (404) + +**Invalid path format** (not exactly 3 path segments): + +```json +{"error": "expected /owner/package/badge.json"} +``` + +**Wrong file extension:** + +```json +{"error": "expected .json extension"} +``` + +**Package not in configured list:** + +```json +{"error": "package not configured"} +``` + +**Unknown badge type:** + +```json +{"error": "unknown badge type: foo"} +``` + +## Caching Behaviour + +Badge responses include a `Cache-Control: max-age=3600` header, telling clients (and shields.io) to cache the response for 1 hour. The scraper interval controls how often pkgbadge fetches fresh data from GitHub; the cache header controls how often downstream consumers re-fetch from pkgbadge. + +For shields.io specifically: shields.io has its own caching layer, so updates may take a few minutes to appear even after pkgbadge has fresh data. You can append `&cacheSeconds=300` to the shields.io URL to reduce this. + +## Health Check + +There is no dedicated health endpoint. To check if the service is running, request any valid badge URL and check for a 200 response, or request an invalid path and check for a 404 (which still indicates the server is up). diff --git a/wiki/Badge-Usage.md b/wiki/Badge-Usage.md new file mode 100644 index 0000000..1b34774 --- /dev/null +++ b/wiki/Badge-Usage.md @@ -0,0 +1,77 @@ +# Badge Usage + +pkgbadge serves [shields.io endpoint badges](https://shields.io/badges/endpoint-badge). Each badge is a JSON endpoint that shields.io fetches and renders as an SVG. 
+ +## Available Badges + +| Badge | Endpoint | Label | Colour | Example Output | +|-------|----------|-------|--------|----------------| +| Pull count | `pulls.json` | ghcr pulls | blue | `433`, `1.5k`, `2.3M` | +| Version | `version.json` | version | green | `2.11.1`, `latest` | +| Image size | `size.json` | image size | blue | `82.5 MB (amd64)`, `82.5 MB (amd64) \| 79.2 MB (arm64)` | +| Platforms | `arch.json` | platforms | blue | `amd64 \| arm64` | + +## Using with shields.io + +The shields.io [endpoint badge](https://shields.io/badges/endpoint-badge) fetches JSON from a URL you provide and renders it as an SVG image. Point it at your pkgbadge instance: + +``` +https://img.shields.io/endpoint?url=///.json +``` + +### Markdown Examples + +```markdown +![GHCR Pulls](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/pulls.json) +![Version](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/version.json) +![Image Size](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/size.json) +![Platforms](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/arch.json) +``` + +### With Link to GHCR Package Page + +```markdown +[![GHCR Pulls](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/pulls.json)](https://github.com/owner/repo/pkgs/container/package) +``` + +### Shields.io Style Overrides + +You can append shields.io query parameters to customise the badge appearance: + +```markdown +![Pulls](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/pulls.json&style=flat-square) +![Version](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/version.json&style=for-the-badge) +![Pulls](https://img.shields.io/endpoint?url=https://badges.example.com/owner/package/pulls.json&color=orange&label=downloads) +``` + +See the [shields.io endpoint docs](https://shields.io/badges/endpoint-badge) for all available 
parameters (style, color, label, logo, etc). + +## Number Formatting + +Pull counts are formatted for readability: + +| Raw Value | Displayed | +|-----------|-----------| +| 0–999 | As-is (`433`) | +| 1,000–999,999 | Thousands (`1.5k`) | +| 1,000,000+ | Millions (`2.3M`) | + +Image sizes use binary units. Multi-arch images show per-platform sizes (e.g. `82.5 MB (amd64) | 79.2 MB (arm64)`): + +| Raw Bytes | Displayed | +|-----------|-----------| +| 0 or nil | `unknown` | +| < 1 KiB | `512 B` | +| < 1 MiB | `45.2 KB` | +| < 1 GiB | `12.4 MB` | +| >= 1 GiB | `1.2 GB` | + +## Package Name in Badge URL + +The badge URL uses the **package name** (always lowercase), not the repository name. If you configured `Will-Luck/Docker-Sentinel/docker-sentinel`, the badge URL uses `docker-sentinel`: + +``` +/will-luck/docker-sentinel/pulls.json +``` + +Both owner and package are case-insensitive in the badge URL (lowercased internally). diff --git a/wiki/Configuration-Reference.md b/wiki/Configuration-Reference.md new file mode 100644 index 0000000..878a078 --- /dev/null +++ b/wiki/Configuration-Reference.md @@ -0,0 +1,55 @@ +# Configuration Reference + +All configuration is via environment variables. No config file is needed. + +## Environment Variables + +| Variable | Required | Default | Description | +|----------|----------|---------|-------------| +| `PKGBADGE_PACKAGES` | Yes | *(required)* | Comma-separated list of GHCR packages to scrape | +| `PKGBADGE_INTERVAL` | No | `6h` | How often to re-scrape. Accepts any [Go duration](https://pkg.go.dev/time#ParseDuration): `30m`, `1h`, `12h`, etc. | +| `PKGBADGE_PORT` | No | `8080` | HTTP listen port | + +## Package Format + +Each entry in `PKGBADGE_PACKAGES` is either: + +- **`owner/package`**: the GitHub repo name matches the GHCR package name +- **`owner/repo/package`**: they differ + +### Why Two Formats? + +GitHub Container Registry package names are always lowercase, but repository names can use mixed case. 
For example, the repo `Will-Luck/Docker-Sentinel` publishes a package called `docker-sentinel`. To scrape it, pkgbadge needs both names: + +``` +Will-Luck/Docker-Sentinel/docker-sentinel +``` + +The 2-part format `Will-Luck/docker-sentinel` assumes the repo name equals the package name, which would look for a repo called `docker-sentinel` (wrong). + +### Examples + +```bash +# Single package (repo name matches package name) +PKGBADGE_PACKAGES="willfarrell/autoheal" + +# Single package (repo name differs from package name) +PKGBADGE_PACKAGES="Will-Luck/Docker-Sentinel/docker-sentinel" + +# Multiple packages +PKGBADGE_PACKAGES="Will-Luck/Docker-Sentinel/docker-sentinel,Will-Luck/Docker-Guardian/docker-guardian" +``` + +Whitespace around commas and entries is trimmed automatically. + +## Scrape Interval + +The scraper runs once at startup (blocking, before the HTTP server starts), then repeats on the configured interval. GitHub package pages are public HTML, so no API token is needed, but aggressive scraping may trigger rate limits. + +Recommended intervals: + +| Use case | Interval | +|----------|----------| +| Development/testing | `5m` | +| General use | `1h` – `6h` | +| Low-traffic packages | `12h` – `24h` | diff --git a/wiki/Home.md b/wiki/Home.md new file mode 100644 index 0000000..24c79c9 --- /dev/null +++ b/wiki/Home.md @@ -0,0 +1,11 @@ +# pkgbadge + +Self-hosted badge server for GitHub Container Registry. Scrapes GHCR package pages and the OCI registry API to serve [shields.io endpoint badges](https://shields.io/badges/endpoint-badge) with pull counts, versions, image sizes, and platform info. 
+ +## Pages + +- [[Installation]]: Docker CLI, Docker Compose, building from source +- [[Configuration Reference]]: environment variables and package format +- [[Badge Usage]]: available badges and shields.io integration +- [[API Reference]]: HTTP endpoint, response format, status codes +- [[Troubleshooting]]: common issues and debugging diff --git a/wiki/Installation.md b/wiki/Installation.md new file mode 100644 index 0000000..ba3013c --- /dev/null +++ b/wiki/Installation.md @@ -0,0 +1,93 @@ +# Installation + +## Docker CLI + +```bash +docker run -d \ + --name pkgbadge \ + --restart unless-stopped \ + -p 8080:8080 \ + -e PKGBADGE_PACKAGES="owner/repo,owner/repo/package" \ + ghcr.io/will-luck/pkgbadge:latest +``` + +## Docker Compose + +```yaml +services: + pkgbadge: + image: ghcr.io/will-luck/pkgbadge:latest + ports: + - "8080:8080" + environment: + PKGBADGE_PACKAGES: "owner/repo,owner/repo/package" + # PKGBADGE_INTERVAL: "6h" + # PKGBADGE_PORT: "8080" + restart: unless-stopped +``` + +```bash +docker compose up -d +``` + +## Docker Swarm + +```yaml +services: + pkgbadge: + image: ghcr.io/will-luck/pkgbadge:latest + environment: + PKGBADGE_PACKAGES: "owner/repo" + PKGBADGE_INTERVAL: "1h" + ports: + - "8080:8080" + deploy: + replicas: 1 + restart_policy: + condition: on-failure +``` + +Each replica maintains its own in-memory cache and scrapes independently. There is no shared state, so running multiple replicas behind a load balancer works without any extra configuration. + +## Building from Source + +Requires Go 1.24 or later. + +```bash +git clone https://github.com/Will-Luck/pkgbadge.git +cd pkgbadge +go build -o pkgbadge . +``` + +Run it: + +```bash +export PKGBADGE_PACKAGES="owner/repo" +./pkgbadge +``` + +## Building the Docker Image + +```bash +git clone https://github.com/Will-Luck/pkgbadge.git +cd pkgbadge +docker build -t pkgbadge . 
+``` + +The Dockerfile uses a multi-stage build: compiles with `golang:1.24-alpine`, then copies the binary into `gcr.io/distroless/static-debian12` for a minimal runtime image with no shell or package manager. + +## Reverse Proxy + +pkgbadge serves plain HTTP. Put it behind a reverse proxy (Nginx, Caddy, Traefik) for TLS. + +Example Nginx location block: + +```nginx +location / { + proxy_pass http://127.0.0.1:8080; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; +} +``` + +The shields.io endpoint URL in your README badges should point to the public HTTPS address, not the internal HTTP port. diff --git a/wiki/Troubleshooting.md b/wiki/Troubleshooting.md new file mode 100644 index 0000000..9ba87aa --- /dev/null +++ b/wiki/Troubleshooting.md @@ -0,0 +1,71 @@ +# Troubleshooting + +## Badge Shows "unknown" + +The version and architecture badges return `unknown` when the scraper could not extract that field from the GitHub packages page. The size badge returns `unknown` when the OCI registry API could not be reached or the manifest could not be parsed. Possible causes: + +- **Package is new with no published versions yet.** The packages page won't have version or architecture data until at least one tagged version is pushed. +- **GitHub changed their HTML structure.** The scraper uses regex patterns to extract version and architecture data. If GitHub redesigns the packages page, the patterns may stop matching. Check the logs for `parse failed` warnings. +- **OCI registry unreachable or rate-limited.** The size badge fetches manifests from `ghcr.io/v2/`. If the registry is down or rate-limiting, size will show `unknown`. Check the logs for `size fetch failed` warnings. +- **Private package.** The OCI token fetch uses anonymous auth. Private packages will always show `unknown` for size. +- **Scrape hasn't completed yet.** On startup, pkgbadge scrapes all packages before starting the HTTP server. 
If you see `unknown` immediately after a restart, wait for the initial scrape to finish. + +## Badge Returns 404 "package not configured" + +The requested `owner/package` combination is not in your `PKGBADGE_PACKAGES` list. Check: + +1. The badge URL uses the **package name**, not the repository name. For `Will-Luck/Docker-Sentinel/docker-sentinel`, the badge URL is `/will-luck/docker-sentinel/pulls.json`. +2. The package is in your `PKGBADGE_PACKAGES` environment variable. +3. Spelling and case. The lookup is case-insensitive, but the package must match exactly what you configured (minus case). + +## Scrape Failures in Logs + +``` +level=WARN msg="scrape failed, keeping stale data" package=owner/package error="HTTP 429 from ..." +``` + +GitHub may rate-limit requests if you scrape too frequently or have many packages. Increase `PKGBADGE_INTERVAL` or reduce the number of configured packages. + +When a scrape fails, pkgbadge keeps serving the last successfully scraped data. Badges will not go blank. If no data has ever been scraped for a package (first boot + immediate failure), that package will return 404 until a successful scrape. + +## Parse Failures in Logs + +``` +level=WARN msg="parse failed" package=owner/package error="no data extracted from page, HTML structure may have changed" +``` + +This means the page was fetched, but none of the regex patterns matched. Most likely GitHub changed their HTML. Open an issue with the current HTML structure and the patterns can be updated. + +## Startup Fails with "invalid PKGBADGE_PACKAGES" + +The package format is `owner/package` or `owner/repo/package`. 
Common mistakes: + +- Missing owner: `docker-sentinel` (needs `owner/docker-sentinel`) +- Too many segments: `github.com/owner/repo/package` (only 2 or 3 segments allowed) +- Trailing comma with empty entry (trimmed automatically, but double commas with spaces may parse oddly) + +## Startup Fails with "invalid PKGBADGE_INTERVAL" + +The interval must be a valid [Go duration string](https://pkg.go.dev/time#ParseDuration): `30m`, `1h`, `6h`, `24h`. Common mistakes: + +- Using `6` instead of `6h` (number alone is not valid) +- Using `1d` (Go durations don't support days, use `24h`) +- Using `6 hours` (no spaces or words, just the short format) + +## Shields.io Shows Stale Data + +Shields.io caches badge responses on their end. Even after pkgbadge has fresh data, shields.io may serve its cached version for several minutes. Options: + +- Append `&cacheSeconds=300` to the shields.io URL to lower their cache time +- Wait a few minutes for the shields.io cache to expire +- The `Cache-Control: max-age=3600` header from pkgbadge tells shields.io how long to cache. If you need faster updates, this value is set in the source code (`server.go`). + +## Container Won't Start + +Check the logs: + +```bash +docker logs pkgbadge +``` + +The most common startup failure is a missing or invalid `PKGBADGE_PACKAGES`. The service exits immediately with an error message if this variable is missing or malformed. 
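The `PKGBADGE_INTERVAL` mistakes listed earlier on this page can be reproduced with the standard library. A small sketch, assuming pkgbadge passes the value straight to `time.ParseDuration`; `validInterval` is a hypothetical helper, not a function from the project:

```go
package main

import (
	"fmt"
	"time"
)

// validInterval is a hypothetical helper: it reports whether s is a
// duration string that time.ParseDuration accepts.
func validInterval(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

func main() {
	// The first three are valid; the last three are the mistakes listed above.
	for _, s := range []string{"30m", "6h", "24h", "6", "1d", "6 hours"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", s, err)
			continue
		}
		fmt.Printf("%q ok: %s\n", s, d)
	}
}
```

Running a check like this before deploying is a quick way to confirm an interval value without restarting the container.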
diff --git a/wiki/_Sidebar.md b/wiki/_Sidebar.md new file mode 100644 index 0000000..e689a6c --- /dev/null +++ b/wiki/_Sidebar.md @@ -0,0 +1,8 @@ +### pkgbadge + +- [[Home]] +- [[Installation]] +- [[Configuration Reference]] +- [[Badge Usage]] +- [[API Reference]] +- [[Troubleshooting]] diff --git a/wiki/push-wiki.sh b/wiki/push-wiki.sh new file mode 100755 index 0000000..96b4788 --- /dev/null +++ b/wiki/push-wiki.sh @@ -0,0 +1,17 @@ +#!/bin/bash +# Run this after initialising the wiki via GitHub web UI +# (create any page, then this script replaces it with full content) +set -e +script_dir="$(cd "$(dirname "$0")" && pwd)" # resolve before the cd below; a relative $0 would no longer point at wiki/ + +rm -rf /tmp/pkgbadge-wiki-push +git clone "https://$(gh auth token)@github.com/Will-Luck/pkgbadge.wiki.git" /tmp/pkgbadge-wiki-push +cd /tmp/pkgbadge-wiki-push + +cp "$script_dir"/*.md . +rm -f push-wiki.sh + +git add -A +git commit -m "docs: full wiki documentation" +git push origin master +echo "Wiki published."