strongdm · datashaman · Mar 29, 2026
diff --git a/attractor-spec.md b/attractor-spec.md
@@ -140,6 +140,7 @@ Graph attributes are declared in a `graph [ ... ]` block or as top-level `key =
 | `retry_target`            | String   | `""`      | Node ID to jump to if exit is reached with unsatisfied goal gates. |
 | `fallback_retry_target`   | String   | `""`      | Secondary jump target if `retry_target` is missing or invalid. |
 | `default_fidelity`        | String   | `""`      | Default context fidelity mode when explicitly set. Empty string means "unset"; the runtime fallback is `compact` (see Section 5.4). |
+| `working_directory`       | String   | `"."`     | Working directory for all tool execution in this pipeline. Passed to the `ExecutionEnvironment` and used as the base for relative paths in file operations, shell commands, and search tools. Can be overridden per-run at invocation time. |
 
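A minimal sketch of how a runtime might pick the effective working directory and resolve relative paths against it. The function names and the `run_override` parameter are hypothetical; the spec only states that the attribute defaults to `"."` and can be overridden per run.

```python
from pathlib import Path
from typing import Optional


def resolve_working_directory(graph_attrs: dict, run_override: Optional[str] = None) -> Path:
    """Per-run override wins, then the graph attribute, then the "." default."""
    raw = run_override or graph_attrs.get("working_directory") or "."
    return Path(raw).resolve()


def resolve_path(working_directory: Path, path: str) -> Path:
    """Relative tool paths are based on the working directory; absolute paths pass through."""
    candidate = Path(path)
    return candidate if candidate.is_absolute() else working_directory / candidate
```

Resolving once at the start of a run keeps later file operations, shell commands, and search tools anchored to the same base even if the process changes directory.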
 ### 2.6 Node Attributes
 
@@ -192,6 +193,8 @@ The `shape` attribute on a node determines which handler executes it, unless ove
 | `parallelogram`   | `tool`                | External tool execution (shell command, API call). |
 | `house`           | `stack.manager_loop`  | Supervisor loop. Orchestrates observe/steer/wait cycles over a child pipeline. |
 
+**Rendering note:** `Mdiamond` and `Msquare` are Graphviz shapes that render as a diamond and a square with short extra line segments drawn near their corners. Some rendering environments (particularly browser-based Graphviz WASM implementations) produce visual artifacts with these shapes. Implementations that render DOT for visualization may substitute visually cleaner shapes (e.g., `diamond` for `Mdiamond`, `octagon` for `Msquare`) in the rendering layer, while preserving the original shapes in the handler registry for node type resolution.
+
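The substitution described in the rendering note could be sketched as a display-only transform over a copy of the DOT text. This is an assumption-laden illustration: `dot_for_rendering` and the mapping table are hypothetical names, and a regex is a naive stand-in for a real DOT parser (it would also rewrite a `shape=` that appears inside a label).

```python
import re

# Hypothetical display-only substitutions, taken from the examples in the note.
RENDER_SHAPE_SUBSTITUTIONS = {"Mdiamond": "diamond", "Msquare": "octagon"}

# Matches shape=Name and shape="Name"; the quotes are dropped in the output.
_SHAPE_ATTR = re.compile(r'(\bshape\s*=\s*)"?(\w+)"?')


def dot_for_rendering(dot_source: str) -> str:
    """Return a copy of the DOT text with cleaner display shapes swapped in.

    The original DOT string is left untouched, so handler resolution can
    keep using the real shapes.
    """
    def swap(match):
        shape = match.group(2)
        return match.group(1) + RENDER_SHAPE_SUBSTITUTIONS.get(shape, shape)

    return _SHAPE_ATTR.sub(swap, dot_source)
```

Only the string handed to the renderer changes; the parsed graph that drives node type resolution should still be built from the original source.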
 ### 2.9 Chained Edges
 
 Chained edge declarations are syntactic sugar. The statement:
@@ -576,6 +579,15 @@ The graph traversal is single-threaded. Only one node executes at a time in the
 
 Parallelism exists within specific node handlers (`parallel`, `parallel.fan_in`) that manage concurrent execution internally. Each parallel branch receives an isolated clone of the context. Branch results are collected but individual branch context changes are not merged back into the parent -- only the handler's outcome and its `context_updates` are applied.
 
+### 3.9 Async Execution
+
+Pipeline runs may take minutes or hours to complete. In web-based implementations, synchronous execution will exceed HTTP request timeouts (typically 30-60 seconds). Implementations should support asynchronous execution:
+
+- **Job queue dispatch:** The `run()` function should be callable from a background job/worker. The pipeline's run record (status, node outcomes, checkpoint) is persisted to a database, allowing the web layer to poll or subscribe for updates.
+- **Progress reporting:** Use the event system (Section 9.6) with SSE, WebSocket, or polling to stream pipeline progress to the UI.
+- **Timeouts:** Background jobs should enforce their own timeout, independent of HTTP timeouts. At least 10 minutes is recommended for multi-node pipelines with LLM calls.
+- **Failure recording:** If the background job fails, the error must be persisted to the run record so the UI can display it. An unhandled exception in a fire-and-forget job is invisible to the user.
+
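The job dispatch and failure recording points above can be sketched as follows. Everything here is illustrative: `RUN_RECORDS` stands in for a database table, the record fields are assumed, and a thread pool stands in for a real job queue. Timeout enforcement is left to the queue and is not shown.

```python
import traceback
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# Stand-in for persisted run records (hypothetical schema).
RUN_RECORDS: Dict[str, dict] = {}


def run_pipeline_job(run_id: str, run: Callable[[], dict]) -> None:
    """Body of a background job: execute the pipeline and persist the result.

    Any exception is written to the run record so the UI can display it
    instead of the failure vanishing inside the worker.
    """
    RUN_RECORDS[run_id] = {"status": "running", "error": None, "outcome": None}
    try:
        outcome = run()
        RUN_RECORDS[run_id].update(status="completed", outcome=outcome)
    except Exception as exc:
        RUN_RECORDS[run_id].update(
            status="failed",
            error=f"{type(exc).__name__}: {exc}",
            traceback=traceback.format_exc(),
        )


def dispatch(executor: ThreadPoolExecutor, run_id: str, run: Callable[[], dict]) -> Future:
    """Queue the job and return immediately so the web request can respond."""
    RUN_RECORDS[run_id] = {"status": "queued", "error": None, "outcome": None}
    return executor.submit(run_pipeline_job, run_id, run)
```

The web layer would then poll `RUN_RECORDS[run_id]` (or its database equivalent) rather than wait on the job itself.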
 ---
 
 ## 4. Node Handlers
@@ -1647,6 +1659,10 @@ FOR EACH event IN pipeline.events():
     process(event)
 ```
 
+**Token aggregation across providers:** When a pipeline uses different LLM providers per node (via the model stylesheet or per-node `llm_provider`/`llm_model` attributes), token usage must be tracked per-provider in addition to any aggregate total. Token counts from different providers are not directly comparable (tokenizers differ), and cost estimation requires provider-specific rate tables. Implementations should emit per-node `Usage` data in stage events and maintain per-provider subtotals in the run record.
+
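Per-provider subtotals could be kept with a small accumulator like the one below. The `Usage` fields shown are an assumption for illustration (the spec's `Usage` record may carry more), and `aggregate_usage` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Usage:
    # Assumed fields for this sketch.
    input_tokens: int = 0
    output_tokens: int = 0


def aggregate_usage(stage_usages: Iterable[Tuple[str, Usage]]) -> Dict[str, Usage]:
    """Sum per-node usage into one subtotal per provider.

    Counts from different providers are never added together, because
    their tokenizers and price tables differ.
    """
    totals: Dict[str, Usage] = defaultdict(Usage)
    for provider, usage in stage_usages:
        subtotal = totals[provider]
        subtotal.input_tokens += usage.input_tokens
        subtotal.output_tokens += usage.output_tokens
    return dict(totals)
```

Cost estimation can then apply a provider-specific rate to each subtotal separately.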
+**Implementation note:** Event emission should be decoupled from handler logic. Node handlers should return data (outcomes, context updates), and the orchestration layer should emit events based on that data. This keeps handlers testable in isolation (unit tests typically lack a framework container for logging, event dispatch, etc.) and ensures observability is consistently applied regardless of which handler runs.
+
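One way to picture the separation in the implementation note: the handler returns plain data, and a thin orchestration function emits events around it. The event names, the `Outcome` fields, and `execute_node` are hypothetical and only illustrate the shape of the split.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Outcome:
    status: str
    context_updates: Dict[str, str] = field(default_factory=dict)


def echo_handler(node_id: str, context: dict) -> Outcome:
    """A handler that only returns data; it never touches the event system."""
    return Outcome(status="success", context_updates={"last_node": node_id})


def execute_node(
    node_id: str,
    handler: Callable[[str, dict], Outcome],
    context: dict,
    emit: Callable[[dict], None],
) -> Outcome:
    """Orchestration layer: runs the handler and emits events from its result."""
    emit({"type": "stage_started", "node_id": node_id})
    outcome = handler(node_id, context)
    context.update(outcome.context_updates)
    emit({"type": "stage_completed", "node_id": node_id, "status": outcome.status})
    return outcome
```

A unit test can call `echo_handler` directly with a plain dict, with no event dispatcher or logging container in scope.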
 ### 9.7 Tool Call Hooks
 
 Graph-level or node-level attributes `tool_hooks.pre` and `tool_hooks.post` specify shell commands executed around each LLM tool call:

diff --git a/coding-agent-loop-spec.md b/coding-agent-loop-spec.md
@@ -776,6 +776,7 @@ The default. Runs everything on the local machine.
 **Command execution:**
 - Spawn in a new process group for clean killability
 - Use the platform's default shell (`/bin/bash -c` on Linux/macOS, `cmd.exe /c` on Windows)
+- Pass the command string directly to the shell interpreter -- do **not** apply shell escaping (e.g., `escapeshellcmd()` in PHP, `shlex.quote()` in Python) to the entire command. The LLM generates complete shell commands that rely on operators (`|`, `>`, `>>`, `&&`, `||`, `2>&1`, `$()`, etc.), and escaping them breaks intended functionality. Security should be enforced at the `ExecutionEnvironment` boundary (sandboxing, allowed command lists, resource limits) rather than by escaping shell metacharacters.
 - Enforce timeout: on timeout, send SIGTERM to the process group, wait 2 seconds, then SIGKILL
 - Capture stdout and stderr separately, then combine for the result
 - Record wall-clock duration
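The command execution rules above might look like this on Linux/macOS. This is a POSIX-only sketch under stated assumptions: `run_shell` and its result keys are hypothetical, and sandboxing at the `ExecutionEnvironment` boundary is out of scope here.

```python
import os
import signal
import subprocess
import time


def _signal_group(pgid: int, sig: int) -> None:
    """Signal a whole process group, ignoring the case where it already exited."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_shell(command: str, timeout_s: float, cwd: str = ".", shell: str = "/bin/bash") -> dict:
    """Run a complete command string through the shell without escaping it."""
    started = time.monotonic()
    proc = subprocess.Popen(
        [shell, "-c", command],   # operators like | and && reach the shell intact
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,   # new session, so the child leads its own process group
    )
    timed_out = False
    try:
        stdout, stderr = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        _signal_group(proc.pid, signal.SIGTERM)
        try:
            stdout, stderr = proc.communicate(timeout=2)  # grace period before SIGKILL
        except subprocess.TimeoutExpired:
            _signal_group(proc.pid, signal.SIGKILL)
            stdout, stderr = proc.communicate()
    return {
        "stdout": stdout,
        "stderr": stderr,
        "output": stdout + stderr,  # captured separately, then combined
        "exit_code": proc.returncode,
        "timed_out": timed_out,
        "duration_s": time.monotonic() - started,
    }
```

Because the child is the leader of its own process group, signalling the group also reaches any subprocesses the command spawned, such as the stages of a pipe.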

diff --git a/unified-llm-spec.md b/unified-llm-spec.md
@@ -646,10 +646,19 @@ Provider finish reason mapping:
 | Gemini    | MAX_TOKENS        | length           |
 | Gemini    | SAFETY            | content_filter   |
 | Gemini    | RECITATION        | content_filter   |
+| Gemini    | BLOCKLIST         | content_filter   |
+| Gemini    | PROHIBITED_CONTENT| content_filter   |
+| Gemini    | SPII              | content_filter   |
+| Gemini    | MALFORMED_FUNCTION_CALL | content_filter |
+| Gemini    | LANGUAGE          | other            |
+| Gemini    | OTHER             | other            |
+| Gemini    | FINISH_REASON_UNSPECIFIED | other      |
 | Gemini    | (has tool calls)  | tool_calls       |
 
 Note: Gemini does not have a dedicated "tool_calls" finish reason. The adapter infers it from the presence of `functionCall` parts in the response.
 
+Gemini also defines several image-generation and advanced finish reasons (`IMAGE_SAFETY`, `IMAGE_PROHIBITED_CONTENT`, `IMAGE_RECITATION`, `IMAGE_OTHER`, `NO_IMAGE`, `UNEXPECTED_TOOL_CALL`, `TOO_MANY_TOOL_CALLS`, `MISSING_THOUGHT_SIGNATURE`, `MALFORMED_RESPONSE`, `MODEL_ARMOR`) that are not yet common in coding agent use cases. Implementations should map these to `content_filter` or `other` as appropriate, and include the raw provider reason in error messages to aid debugging.
+
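The mapping table above could be expressed as a lookup with a fallback, keeping the raw provider value alongside the unified one. This is a sketch, not the adapter's actual code: `map_gemini_finish_reason` and the returned dict shape are hypothetical, the `STOP` entry is assumed from the part of the table outside this hunk, and giving tool calls precedence over the raw reason is a simplification.

```python
from typing import Optional

GEMINI_FINISH_REASONS = {
    "STOP": "stop",  # assumed; this row is not visible in the hunk
    "MAX_TOKENS": "length",
    "SAFETY": "content_filter",
    "RECITATION": "content_filter",
    "BLOCKLIST": "content_filter",
    "PROHIBITED_CONTENT": "content_filter",
    "SPII": "content_filter",
    "MALFORMED_FUNCTION_CALL": "content_filter",
    "LANGUAGE": "other",
    "OTHER": "other",
    "FINISH_REASON_UNSPECIFIED": "other",
}


def map_gemini_finish_reason(raw_reason: Optional[str], has_tool_calls: bool) -> dict:
    """Translate a Gemini finish reason into the unified vocabulary.

    Gemini has no dedicated tool-call reason, so it is inferred from the
    presence of functionCall parts. Unlisted reasons fall back to "other".
    """
    if has_tool_calls:
        unified = "tool_calls"
    else:
        unified = GEMINI_FINISH_REASONS.get(raw_reason or "", "other")
    return {"reason": unified, "raw": raw_reason}
```

Returning the raw value with every result means newer reasons such as `IMAGE_SAFETY` remain visible in logs and error messages even before the table is extended.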
 ### 3.9 Usage
 
 ```
@@ -1346,6 +1355,8 @@ Every error carries a `retryable` property.
 | QuotaExceededError     | (varies)    | false     |
 | ContentFilterError     | (varies)    | false     |
 | RequestTimeoutError    | 408         | false     |
 | ConfigurationError     | (N/A)       | false     |
+
+Note: `ContentFilterError` is raised when a response's `FinishReason` is `content_filter`. This is distinct from an HTTP error -- the request succeeds (200), but the response content is blocked or empty. Gemini returns this via multiple raw values (`SAFETY`, `BLOCKLIST`, `SPII`, `PROHIBITED_CONTENT`, `RECITATION`). Implementations must preserve the raw provider reason in the error message, as the specific filter category affects whether the caller should retry with a modified prompt or abort entirely.
 
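A small sketch of raising `ContentFilterError` from a successful (200) response while keeping the raw provider reason in the message. The constructor arguments, the `raise_if_filtered` helper, and the `reason`/`raw` dict keys are assumptions made for illustration.

```python
from typing import Optional


class ContentFilterError(Exception):
    """Raised when the unified finish reason is content_filter."""

    retryable = False  # matches the table: not retried automatically

    def __init__(self, provider: str, raw_reason: Optional[str]):
        self.provider = provider
        self.raw_reason = raw_reason
        super().__init__(
            f"{provider} blocked the response (raw finish reason: {raw_reason})"
        )


def raise_if_filtered(provider: str, finish_reason: dict) -> None:
    """Turn a content_filter finish reason into an error that names the raw cause."""
    if finish_reason.get("reason") == "content_filter":
        raise ContentFilterError(provider, finish_reason.get("raw"))
```

A caller can inspect `raw_reason` to decide whether rewording the prompt is worth trying or whether to abort.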
 **Retryable errors** (transient -- may succeed on retry):