feat: Add Helm chart for Kubernetes deployment #50
yashGoyal40 wants to merge 26 commits into repowise-dev:main
Conversation
Adds a production-ready Helm chart under charts/repowise/ that enables deploying Repowise to any Kubernetes cluster. Includes templates for Deployment, Service, PVC, Ingress, Secret, and ServiceAccount with full configurability via values.yaml. Closes repowise-dev#49 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The HTTPProxy was sending all traffic to the frontend (port 3000). Now /api/*, /health, and /metrics are routed directly to the backend (port 7337), while everything else goes to the frontend. Also replaced the Ingress template with a Contour HTTPProxy with wildcard TLS support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
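For illustration, the split routing described above might render roughly like this as a Contour HTTPProxy; the hostname, TLS Secret, and Service names are placeholders, not necessarily what the chart actually templates:

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: repowise
spec:
  virtualhost:
    fqdn: repowise.example.com        # placeholder hostname
    tls:
      secretName: wildcard-tls        # assumed wildcard cert Secret
  routes:
    # API, health, and metrics go straight to the backend (port 7337)
    - conditions: [{prefix: /api}]
      services: [{name: repowise-backend, port: 7337}]
    - conditions: [{prefix: /health}]
      services: [{name: repowise-backend, port: 7337}]
    - conditions: [{prefix: /metrics}]
      services: [{name: repowise-backend, port: 7337}]
    # Everything else is served by the frontend (port 3000)
    - conditions: [{prefix: /}]
      services: [{name: repowise-frontend, port: 3000}]
```

A `prefix: /api` condition also matches `/api/*` subpaths, so no wildcard syntax is needed.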
The backend exposes /health, not /api/health. The provider-section component was calling the wrong endpoint, causing "Server returned non-healthy status" on every self-hosted deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…api/health"

This reverts commit af52058.
Restores the standard networking.k8s.io/v1 Ingress template so the chart works out of the box on any Kubernetes cluster, not just those running Contour. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
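A sketch of what the restored template could render to, assuming illustrative Service names, ingress class, and host (the real chart derives these from values.yaml):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: repowise
spec:
  ingressClassName: nginx              # placeholder; any controller works
  rules:
    - host: repowise.example.com       # placeholder hostname
      http:
        paths:
          - path: /api                 # Prefix match also covers /api/*
            pathType: Prefix
            backend:
              service:
                name: repowise-backend # assumed Service name
                port:
                  number: 7337
          - path: /
            pathType: Prefix
            backend:
              service:
                name: repowise-frontend # assumed Service name
                port:
                  number: 3000
```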
Adds a post-install/upgrade Kubernetes Job that clones repos declared in values.yaml into /data/repos/, registers them with the Repowise API, and triggers an initial sync. Supports private repos via GitHub PAT or an existing git-credentials Secret. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- initContainer (bitnami/git) clones repos into /data/repos/ before the main app starts
- Sidecar container (curlimages/curl) waits for API health, registers each repo via POST /api/repos, and triggers sync
- Supports private repos via GitHub PAT or existing git-credentials Secret
- Removed the post-install Job approach (PVC ReadWriteOnce conflict)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase liveness probe timeout to 15s and failureThreshold to 10 to prevent pod kills during CPU-intensive indexing
- Sidecar registers repos one-by-one, waits for each sync to complete before starting the next (prevents SQLite database lock)
- Skip sync for repos that already have a head_commit (already indexed)
- Remove old repo-init-scripts ConfigMap (script is now inline)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Indexing large repos is so CPU-intensive that the /health endpoint becomes unresponsive, causing the liveness probe to kill the container repeatedly. Disabled liveness probe by default — readiness probe is kept (it only removes from service, doesn't restart). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
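Expressed as chart values, the probe split might look like this; the key names here are illustrative and not necessarily the chart's real values.yaml schema:

```yaml
# values.yaml (illustrative keys)
livenessProbe:
  enabled: false          # off by default: heavy indexing can starve /health
                          # and the kubelet would restart-loop the container
readinessProbe:
  enabled: true           # only removes the pod from the Service endpoints;
  httpGet:                # it never restarts the container
    path: /health
    port: 7337
  periodSeconds: 10
  failureThreshold: 3
```

In the Deployment template, the liveness block would then sit behind a guard such as `{{- if .Values.livenessProbe.enabled }} … {{- end }}` so users can opt back in.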
Adds optional PostgreSQL deployment (pgvector/pgvector:pg16) that replaces SQLite, eliminating "database is locked" errors during heavy indexing. Repowise app code already supports PostgreSQL natively.

- StatefulSet with PVC for PostgreSQL data
- Conditional REPOWISE_DB_URL (asyncpg when PG enabled, aiosqlite otherwise)
- wait-for-postgres initContainer ensures DB is ready before app starts
- pgvector image includes vector extension for semantic search
- Fully backward compatible: postgresql.enabled=false keeps SQLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
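The conditional REPOWISE_DB_URL could be templated roughly like this; the values keys, release-name-based host, and SQLite path are assumptions for illustration, not the chart's exact code:

```yaml
# deployment.yaml env section (sketch; names are assumptions)
env:
  - name: REPOWISE_DB_URL
    {{- if .Values.postgresql.enabled }}
    # asyncpg driver when the bundled PostgreSQL is enabled
    value: "postgresql+asyncpg://{{ .Values.postgresql.user }}:{{ .Values.postgresql.password }}@{{ .Release.Name }}-postgresql:5432/{{ .Values.postgresql.database }}"
    {{- else }}
    # aiosqlite fallback on the /data PVC (DB filename assumed)
    value: "sqlite+aiosqlite:////data/repowise.db"
    {{- end }}
```

Both `postgresql+asyncpg` and `sqlite+aiosqlite` are standard SQLAlchemy async URL schemes, which matches the commit's asyncpg/aiosqlite split.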
PostgreSQL eliminates SQLite's "database is locked" errors during heavy indexing and enables concurrent API access. Uses pgvector image for vector search support. SQLite still available via postgresql.enabled=false. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With PostgreSQL as default, there's no SQLite lock issue. Repos now trigger sync in parallel without waiting for each to complete. Still skips already-indexed repos (head_commit check). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
initContainer clones repos as root but app runs as uid 1000. Git refuses to read repos with different ownership. Fix: write a .gitconfig with safe.directory=* into /data and set HOME for the app container. This enables hotspots, ownership, and architecture graph features. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
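A sketch of the fix, assuming the container, image tag, and volume names used here (the actual chart may differ):

```yaml
initContainers:
  - name: clone-repos
    image: bitnami/git:2.47.1
    command: ["/bin/sh", "-c"]
    args:
      - |
        # Clone runs as root; the app runs as uid 1000, so git would
        # refuse these repos ("dubious ownership") without safe.directory.
        printf '[safe]\n\tdirectory = *\n' > /data/.gitconfig
        # ... clone repos into /data/repos/ here (omitted)
    volumeMounts:
      - name: data
        mountPath: /data
containers:
  - name: repowise
    env:
      - name: HOME
        value: /data        # so git in the app container picks up /data/.gitconfig
    volumeMounts:
      - name: data
        mountPath: /data
```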
The register-repos sidecar now sleeps forever after completing its work. This prevents k8s from restarting it in a loop (containers that exit get restarted by default in a pod). Also bumps PostgreSQL to max_connections=4000, shared_buffers=2GB, 8Gi memory limit for heavy indexing workloads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thanks for the Helm chart, the structure is well organized! There are some issues to sort out though:
The chart scaffolding itself is solid. Happy to re-review once these are addressed.
… images

- database.py: pool_size/max_overflow now behind REPOWISE_HIGH_CONCURRENCY env var (off by default, auto-enabled in Helm). Keeps default behavior safe while enabling 3000 concurrent connections for production indexing workloads.
- Converted register-repos sidecar to an init container (writes /data/repos.json manifest instead of sleep infinity). PVC is ReadWriteOnce so a separate Job can't mount it.
- Pinned bitnami/git to 2.47.1 (was :latest).
- Bumped PostgreSQL max_connections to 10000 for high-concurrency indexing.
- Added credential warnings in values.yaml and README.
- Added recommended production resource limits.
- Clarified Dockerfile is user-built in README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
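The init-container conversion plus the env gate might look roughly like this; the curl image tag and the `repos` values key are assumptions for illustration:

```yaml
initContainers:
  - name: register-repos            # init container, not a sidecar: runs once
    image: curlimages/curl          # tag should be pinned; left generic here
    command: ["/bin/sh", "-c"]
    args:
      - |
        # Write a repo manifest for the app to consume, instead of a
        # never-exiting sidecar (sleep infinity) calling the API.
        printf '%s\n' '{{ .Values.repos | toJson }}' > /data/repos.json
    volumeMounts:
      - name: data
        mountPath: /data
containers:
  - name: repowise
    env:
      - name: REPOWISE_HIGH_CONCURRENCY  # opt-in pool tuning; unset keeps
        value: "1"                       # SQLAlchemy's default pool behavior
```

`toJson` is a standard Helm (Sprig) template function, so the values list can be serialized directly into the manifest file.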
Thanks for the thorough review! All points addressed, Helm chart tested end-to-end on a live GKE cluster. Here's the full breakdown:

**1. PostgreSQL enabled by default**

Keeping …

**2. database.py pool settings**

Moved the aggressive pool settings behind a `REPOWISE_HIGH_CONCURRENCY` env var.

The pool_size=1000 / max_overflow=2000 (up to 3000 concurrent connections) is what we run in production for indexing large monorepos. Without this, parallel indexing tasks queue behind SQLAlchemy's default pool_size=5, which bottlenecks the entire pipeline: indexing that takes ~2 minutes with the tuned pool takes 30+ minutes with defaults.

The Helm chart's PostgreSQL is configured with `max_connections=10000`. Since it's env-gated, users who don't set `REPOWISE_HIGH_CONCURRENCY` keep the default pool behavior.

**3. register-repos sidecar → init container**

Converted to an init container. The sidecar couldn't be a Job because the PVC is `ReadWriteOnce`.

**4. Dockerfile**

Updated the Dockerfile with production fixes: …

Pre-built image published on Docker Hub: `yashgoyal04/repowise`

**5. Pin image tags**

Pinned `bitnami/git` to 2.47.1.

**6. Default credentials**

Added credential warnings in values.yaml and README.

**Additional changes**

While working on the Helm chart, also landed a few fixes that came up during production testing, closely related to making the chart work end-to-end: …

**Tested on live GKE cluster**

Fresh …

All changes pushed, conflicts with main resolved.
…e updates

- webhooks.py: fix bug where webhook created sync jobs but never launched them. Jobs now actually execute via asyncio.create_task(). Added concurrent-job protection (skip if a pending/running job exists for the same repo). Added missing session.commit() before launching the background task.
- job_executor.py: git fetch + reset --hard before indexing so webhook-triggered syncs always index the latest code, not stale local state.
- app.py: mount MCP streamable HTTP server at /api with session_manager lifecycle in lifespan. DNS rebinding protection disabled for reverse-proxy support behind ingress.
- Dockerfile: empty NEXT_PUBLIC_REPOWISE_API_URL for relative API calls (works behind any reverse proxy). Install postgres extra by default.
- README: added MCP server docs and webhook configuration guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…feat/helm-chart

# Conflicts:
#	packages/server/src/repowise/server/routers/webhooks.py
…xtra) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Dockerfile now installs ".[all]", which includes postgres + graph-extra via the new all extra in pyproject.toml
- Helm chart defaults to yashgoyal04/repowise on Docker Hub (pre-built)
- README updated with pre-built image quickstart + build-your-own option

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… at build time) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bian

Alpine uses musl libc; python:3.12-slim is Debian/glibc, so the copied node binary fails with "required file not found". Install from the nodesource apt repo instead, matching the approach that worked before the security hardening PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alpine doesn't have /bin/bash. Replace heredoc with printf for POSIX sh compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@RaghavChamadiya Ready for re-review when you get a chance. All feedback addressed + tested on a live GKE cluster.
Summary
- Helm chart (`charts/repowise/`) for deploying Repowise on Kubernetes
- `values.yaml` with support for LLM API keys, persistence, resource limits, ingress, and existing secrets

Closes #49
What's included
- `deployment.yaml`
- `service.yaml`
- `pvc.yaml`: `/data` (SQLite DB + indexed repos)
- `secret.yaml` (supports `existingSecret`)
- `ingress.yaml`
- `serviceaccount.yaml`

Usage
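A minimal install might override values like these; the key names are illustrative, so check the chart's values.yaml for the real schema:

```yaml
# my-values.yaml (illustrative keys, not the chart's exact schema)
image:
  repository: yashgoyal04/repowise   # pre-built image from this PR
llm:
  apiKeySecret: repowise-llm-keys    # or reuse one via existingSecret
persistence:
  size: 20Gi                         # /data holds the DB and cloned repos
postgresql:
  enabled: true                      # set false to fall back to SQLite
ingress:
  enabled: true
  host: repowise.example.com         # placeholder hostname

# install with: helm install repowise charts/repowise -f my-values.yaml
```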
Test plan
- `helm lint charts/repowise` passes clean
- `helm template test charts/repowise` renders all manifests correctly

🤖 Generated with Claude Code