Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,26 @@ DATABASE_URL=postgres://stackcraft:stackcraft@localhost:5433/stackcraft
# Interval (ms) between MCP stale-session sweeps
# MCP_SESSION_SWEEP_MS=60000

# --- Operation concurrency limits ---
# Cap concurrent CLI operations across services. Each operation type has its own
# pool, so e.g. a long-running build never blocks a quick `git fetch`.
# Anything below 1 or non-numeric falls back to the default.

# Max concurrent git CLI operations (clone, fetch, pull, ls-remote, branch reads).
# Network-bound; the cap mainly stops the periodic GitWatcher from saturating the
# system when many services share a stack.
# STACK_CRAFT_MAX_PARALLEL_GIT=10

# Max concurrent install operations across services. Installs are network + disk
# heavy and frequently write to shared package caches (yarn/pnpm/npm/nuget) where
# concurrent writers can race.
# STACK_CRAFT_MAX_PARALLEL_INSTALLS=3

# Max concurrent build operations across services. Builds are CPU-bound and each
# one already saturates multiple cores (tsc -b, webpack, dotnet build, …), so
# running more than a handful in parallel typically thrashes the machine.
# STACK_CRAFT_MAX_PARALLEL_BUILDS=1

# --- TLS / Remote access ---
# StackCraft does not terminate TLS itself. For remote (non-localhost) deployments,
# place a reverse proxy (e.g. Caddy, nginx, Traefik) in front of both the main
Expand Down
71 changes: 71 additions & 0 deletions .yarn/changelogs/service.2f2de4cf.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
<!-- version-type: minor -->
# service

## ✨ Features

### Per-operation concurrency pools (`GitOperationLimit`, `InstallOperationLimit`, `BuildOperationLimit`)

Three new injector-resolved `Semaphore` singletons (from `@furystack/utils`) cap the number of CLI operations the service runs in parallel, broken down by cost profile so a long build never blocks a quick `git fetch`:

- **`GitOperationLimit`** (default `10`) β€” caps every `GitService` call (`clone`, `fetch`, `pull`, `lsRemote`, branch reads, status). Network-bound; the cap mainly stops the periodic `GitWatcher` from saturating the system on stacks with many services.
- **`InstallOperationLimit`** (default `3`) β€” caps `OneShotCommandRunner.installService`. Installs are network + disk heavy and frequently write to shared package caches (yarn / pnpm / npm / nuget) where concurrent writers can race.
- **`BuildOperationLimit`** (default `1`) β€” caps `OneShotCommandRunner.buildService`. Each build already saturates multiple cores (tsc -b, webpack, dotnet build, …); running more than a handful in parallel typically thrashes the machine.

All three are tunable via env vars (`STACK_CRAFT_MAX_PARALLEL_GIT`, `STACK_CRAFT_MAX_PARALLEL_INSTALLS`, `STACK_CRAFT_MAX_PARALLEL_BUILDS`); non-numeric or non-positive values fall back to the default. The semaphore signal is threaded through `runCli`, so disposing the injector aborts in-flight ops with a process-group kill instead of orphaning child processes. Per-service guards (`pendingOperations` / `processes` map) stack on top β€” the conflict check still rejects a duplicate trigger immediately rather than queueing it.

The pre-existing `setupServices` batch flow (which previously fanned out 22 concurrent setups when no service dependencies were declared) now serializes naturally through these pools. Per-service status flips to `installing` / `building` only after the slot is acquired, so the audit log timestamps reflect actual work, not queue time.

### `GitService.lsRemote(url)` β€” consolidated repository accessibility probe

New method on `GitService` that runs `git ls-remote --exit-code <url>` with the standard `GitService` env hardening (`GIT_TERMINAL_PROMPT=0`, `BatchMode=yes` ssh) and shares the `GitOperationLimit` pool. Replaces the two ad-hoc `execFile('git', ['ls-remote', '--exit-code', url], { timeout: 15000 })` callers β€” `validate-repo-action` and the MCP `validate_repository` tool β€” which previously had no env hardening and no process-group kill. Default timeout is 30 s.

## πŸ› Bug Fixes

### Git update / pull no longer feels frozen

`GitService` previously used `promisify(execFile)` with a `timeout` option for every git call. When the timeout fired, Node sent SIGTERM only to the parent git process, leaving grandchildren (credential helpers, ssh, `git-remote-https`, GUI askpass dialogs) holding the inherited stdio pipes. Because `execFile`'s callback fires on stream `close` (not on parent exit), the promise could stay pending well past the nominal timeout β€” visible in audit logs as a 60-second silence followed by a bare `Command failed: git fetch --all --prune\n` with no stderr, which is the documented Node gotcha (nodejs/node#2098).

Every git invocation now goes through a new spawn-based `runCli` helper that:

- Uses `detached: true` on POSIX so the child becomes a process-group leader, then kills the whole group with `process.kill(-pid, signal)` on timeout. On Windows it walks the child tree with `taskkill /T` (escalating to `/F` only on `SIGKILL`).
- Escalates SIGTERM β†’ SIGKILL after a 2 s grace.
- Captures stderr and surfaces it in the rejection message, replacing the previous opaque `Command failed: <cmd>\n`.
- Forces non-interactive mode for git: `GIT_TERMINAL_PROMPT=0`, removes inherited `GIT_ASKPASS` / `SSH_ASKPASS`, sets `SSH_ASKPASS_REQUIRE=never`, and pins `GIT_SSH_COMMAND='ssh -o BatchMode=yes -o ConnectTimeout=15 -o StrictHostKeyChecking=accept-new'`. Missing or expired credentials now fail fast instead of waiting for a prompt nobody can answer from a backend service.

Network-side timeouts also relaxed where appropriate: `fetch` and `pull` go from 60 s to 90 s; `clone` stays at 5 min; cheap local reads stay at 5–10 s.

### `gh auth status` prerequisite check is now non-interactive

The `github-cli` prerequisite check ran `gh auth status` through `execFile` without any env hardening, so a stale GitHub CLI token could trigger a browser-based re-auth flow that blocked indefinitely. `GH_PROMPT_DISABLED=1` and `GH_NO_UPDATE_NOTIFIER=1` are now passed on the call, and the same process-group kill machinery applies on timeout.

## ♻️ Refactoring

### `runCli` helper replaces `promisify(execFile)` across the service layer

New helper in `service/src/utils/run-cli.ts` is the canonical way to invoke any CLI from the service. Public API:

```typescript
import { runCli } from '../utils/run-cli.js'

const { stdout, stderr } = await runCli('node', ['--version'], {
cwd: '/path',
timeoutMs: 30_000,
env: { GH_PROMPT_DISABLED: '1' }, // overrides; set a key to `undefined` to strip an inherited var
signal: abortController.signal, // optional; aborts via process-group kill, same machinery as timeout
})
```

Migrated callers β€” every external CLI invocation in the service now goes through `runCli` (or through `GitService`, which itself uses `runCli`):

- `validate-repo-action` and MCP `validate_repository` tool β€” now use `GitService.lsRemote` (which routes through `runCli` with git env hardening + `GitOperationLimit`).
- `check-prerequisite-action` β€” every check (`node --version`, `yarn --version` / `cmd.exe /c yarn --version` shim, `dotnet --list-sdks` / `--list-runtimes` / `nuget list source`, `git --version`, `gh auth status`, custom-script via `cmd.exe` / `/bin/sh`) now uses `runCli`.

The two file-level results: dropped `import { execFile } from 'child_process'` + `import { promisify } from 'util'` from three actions and the MCP repository-tools registrar; consolidated process-group kill, env hardening, and stderr-rich error reporting into one tested helper.

## πŸ§ͺ Tests

- Added `service/src/utils/run-cli.spec.ts` β€” env passthrough, env strip-on-`undefined`, `stdio` / `windowsHide` / `cwd`, POSIX `detached: true`, success path, non-zero exit reports stderr + cwd, child `error` event, two-stage timeout kill (POSIX + Windows branches), no-kill-before-deadline.
- Added `service/src/services/git-service-runner.spec.ts` β€” git env hardening (`GIT_TERMINAL_PROMPT=0`, askpass strip, `BatchMode=yes` ssh), spawn options, timeout-and-kill, non-zero exit error context, and `GitOperationLimit` queueing (3 concurrent fetches under `Semaphore(2)` β€” first two spawn, third waits, third spawns after first drains).
- Added `service/src/services/operation-limits.spec.ts` β€” defaults (10 / 3 / 1), valid integer overrides, and fallback on garbage / non-positive env values.
- Extended `service/src/services/one-shot-command-runner.spec.ts` β€” install and build calls serialize when their pool is capped to 1, and the install / build pools are independent (one install + one build run together). `withContext` now accepts a `setup(injector)` hook so tests can rebind operation limits before resolving the service.
- Updated `service/src/app-models/github-repositories/actions/validate-repo-action.spec.ts` and `service/src/app-models/prerequisites/actions/check-prerequisite-action.spec.ts` β€” replaced the `vi.mock('child_process')` + `vi.mock('util')` execFile mocks with mocks of `runCli` / a stubbed `GitService.lsRemote` so the specs follow the new abstraction layer.
25 changes: 25 additions & 0 deletions .yarn/changelogs/stack-craft.2f2de4cf.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
<!-- version-type: minor -->
# stack-craft

## ✨ Features

### Operation concurrency limits exposed in `.env.example`

`.env.example` now documents three new tunables that cap concurrent CLI work in the service, so a long-running build never blocks a quick `git fetch` and the periodic `GitWatcher` cannot saturate the system on stacks with many services:

```env
# STACK_CRAFT_MAX_PARALLEL_GIT=10
# STACK_CRAFT_MAX_PARALLEL_INSTALLS=3
# STACK_CRAFT_MAX_PARALLEL_BUILDS=1
```

See the `service` changelog for the underlying `Semaphore`-backed pools and per-operation rationale (network-bound vs. shared-cache writes vs. CPU-bound).

## ♻️ Refactoring

- Every external CLI invocation in the service now goes through a single hardened spawn-based runner with cross-platform process-group kill β€” no more `promisify(execFile)` callers leaking grandchild processes past the nominal timeout. See the `service` changelog for the `runCli` helper, the `GitService` env hardening, and the migrated callers (validate-repo, MCP `validate_repository`, prerequisite checks).

## πŸ› Bug Fixes

- Fixed git update / pull operations sometimes appearing frozen for ~60 s before a bare `Command failed: …` error. Root cause and fix in the `service` changelog (nodejs/node#2098 β€” `execFile` timeout did not reach grandchild processes that held the inherited stdio pipes).
- Fixed `gh auth status` prerequisite check hanging indefinitely when GitHub CLI tried to launch an interactive browser auth flow. Details in the `service` changelog.
3 changes: 3 additions & 0 deletions .yarn/versions/2f2de4cf.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
releases:
service: minor
stack-craft: minor
Original file line number Diff line number Diff line change
Expand Up @@ -2,27 +2,25 @@ import { GitHubRepositoryDataSet } from '../../data-store/tokens.js'
import { getDataSetFor } from '@furystack/repository'

import { beforeEach, describe, expect, it, vi } from 'vitest'

import { GitService } from '../../../services/git-service.js'
import { createMockActionContext, withTestInjector } from '../../../test-helpers.js'
import { ValidateRepoAction } from './validate-repo-action.js'
const execFileMock = vi.hoisted(() =>
vi.fn<(cmd: string, args: string[], options: { timeout: number }) => Promise<{ stdout: string; stderr: string }>>(),
)

vi.mock('child_process', () => ({
execFile: (...args: unknown[]) => execFileMock(...(args as [string, string[], { timeout: number }])),
}))
const lsRemoteMock = vi.fn<(url: string) => Promise<void>>()

vi.mock('util', () => ({
promisify: () => execFileMock,
}))
const bindGitServiceStub = (injector: { bind: (token: typeof GitService, factory: () => GitService) => void }) => {
injector.bind(GitService, () => ({ lsRemote: lsRemoteMock }) as unknown as GitService)
}

describe('ValidateRepoAction', () => {
beforeEach(() => {
vi.clearAllMocks()
lsRemoteMock.mockReset()
})

it('should return accessible: true when git ls-remote succeeds', async () => {
await withTestInjector(async ({ elevated }) => {
bindGitServiceStub(elevated)
const ts = new Date().toISOString()
await getDataSetFor(elevated, GitHubRepositoryDataSet).add(elevated, {
id: 'repo-1',
Expand All @@ -34,26 +32,21 @@ describe('ValidateRepoAction', () => {
updatedAt: ts,
})

execFileMock.mockResolvedValue({ stdout: 'abc123\tHEAD\n', stderr: '' })
lsRemoteMock.mockResolvedValue(undefined)

const result = await ValidateRepoAction(
createMockActionContext({ injector: elevated, urlParams: { id: 'repo-1' } }),
)

const body = result.chunk as { accessible: boolean }
expect(body.accessible).toBe(true)
expect(execFileMock).toHaveBeenCalledWith(
'git',
['ls-remote', '--exit-code', 'https://github.com/user/repo.git'],
{
timeout: 15000,
},
)
expect(lsRemoteMock).toHaveBeenCalledWith('https://github.com/user/repo.git')
})
})

it('should return accessible: false when git ls-remote fails', async () => {
await withTestInjector(async ({ elevated }) => {
bindGitServiceStub(elevated)
const ts = new Date().toISOString()
await getDataSetFor(elevated, GitHubRepositoryDataSet).add(elevated, {
id: 'repo-2',
Expand All @@ -65,7 +58,7 @@ describe('ValidateRepoAction', () => {
updatedAt: ts,
})

execFileMock.mockRejectedValue(new Error('Repository not found'))
lsRemoteMock.mockRejectedValue(new Error('Repository not found'))

const result = await ValidateRepoAction(
createMockActionContext({ injector: elevated, urlParams: { id: 'repo-2' } }),
Expand All @@ -79,6 +72,7 @@ describe('ValidateRepoAction', () => {

it('should throw 404 when repository does not exist', async () => {
await withTestInjector(async ({ elevated }) => {
bindGitServiceStub(elevated)
await expect(
ValidateRepoAction(createMockActionContext({ injector: elevated, urlParams: { id: 'nonexistent' } })),
).rejects.toThrow('Repository not found')
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,7 @@ import { RequestError } from '@furystack/rest'
import { JsonResult, type RequestAction } from '@furystack/rest-service'
import type { ValidateRepoEndpoint } from 'common'

import { execFile } from 'child_process'
import { promisify } from 'util'
const execFileAsync = promisify(execFile)
import { GitService } from '../../../services/git-service.js'

export const ValidateRepoAction: RequestAction<ValidateRepoEndpoint> = async ({ injector, getUrlParams }) => {
const logger = getLogger(injector).withScope('ValidateRepo')
Expand All @@ -22,7 +20,7 @@ export const ValidateRepoAction: RequestAction<ValidateRepoEndpoint> = async ({
}

try {
await execFileAsync('git', ['ls-remote', '--exit-code', repo.url], { timeout: 15000 })
await injector.get(GitService).lsRemote(repo.url)
await logger.information({ message: `Repository validated: ${repo.url}` })
return JsonResult({ accessible: true })
} catch (error) {
Expand Down
Loading
Loading