Releases: sauravvenkat/forkline
v0.5.0 — CI Integration
Forkline v0.5.0 Release Notes — CI Integration
Release: v0.5.0
Date: 2026-02-25
Milestone: v0.5 — CI + Test Harness (Roadmap item 5 of 5)
Summary
This release delivers CI integration: a deterministic, offline, build-failing diff system that lets teams gate merges on behavioral identity. If an agent's output changes, the build fails — with a clear diff, a machine-readable exit code, and a suggested fix.
This completes the v0 roadmap. Forkline now covers the full loop: record → replay → diff → CI gate.
What's New
1. forkline ci Command Suite (forkline/ci/commands.py)
Five new subcommands purpose-built for CI pipelines:
| Command | Purpose |
|---|---|
| `forkline ci record` | Run a script, produce a normalized JSON artifact |
| `forkline ci replay` | Validate an artifact's schema and structure offline |
| `forkline ci diff` | Compare two artifacts, exit 1 on divergence |
| `forkline ci check` | All-in-one: record actual, diff against expected |
| `forkline ci normalize` | Strip timestamps and metadata for stable diffs |
ci check is the primary CI command — a single call that records behavior, normalizes the output, and diffs against a committed baseline:
forkline ci check \
--entrypoint examples/my_flow.py \
--expected tests/testdata/my_flow.run.json \
  --offline

Exit 0 means identical behavior. Exit 1 means the build should fail.
2. Offline Enforcement (forkline/ci/offline.py)
A hard no-network guarantee for CI runs. When --offline is set (or FORKLINE_OFFLINE=1), Forkline monkeypatches socket.connect, socket.create_connection, and socket.getaddrinfo to raise ForklineOfflineError immediately.
Properties:
- Fail-closed. Network calls error instantly — no hangs, no timeouts.
- Deterministic. Same call always produces the same error message.
- Scoped. `offline_context()` restores normal access on exit.
- Cross-library. Blocks `requests`, `httpx`, `urllib3`, and any library built on `socket`.
import requests

from forkline.ci.offline import offline_context

with offline_context():
    # Any network call raises ForklineOfflineError
    requests.get("https://api.example.com")  # raises immediately

# Normal access restored here

3. Artifact Normalization (forkline/ci/normalize.py)
Strips unstable fields from artifacts so that recordings made at different times, on different machines, produce identical diffs when behavior is the same.
Normalized by default:
- Timestamp fields (`ts`, `started_at`, `ended_at`, `created_at`) → `2000-01-01T00:00:00+00:00`
- Platform metadata (`python_version`, `platform`, `cwd`) → removed
- Events sorted by `event_id`
Preserved:
- Event types and payloads (the behavioral data)
- Schema version and entrypoint path
Normalization is applied automatically by ci record and ci diff. It can also be run explicitly:
forkline ci normalize artifact.run.json --out normalized.run.json

4. Exit Code Contract (forkline/ci/exitcodes.py)
A strict, stable exit code contract for CI automation. These values will not change across releases.
| Code | Constant | Meaning |
|---|---|---|
| 0 | `EXIT_SUCCESS` | Success, no diff |
| 1 | `EXIT_DIFF_DETECTED` | Diff detected — fail the build |
| 2 | `EXIT_USAGE_ERROR` | Bad args, missing file |
| 3 | `EXIT_REPLAY_FAILED` | Script failed, runtime exception |
| 4 | `EXIT_OFFLINE_VIOLATION` | Network attempted in offline mode |
| 5 | `EXIT_ARTIFACT_ERROR` | Cannot parse artifact, schema error |
| 6 | `EXIT_INTERNAL_ERROR` | Unexpected bug |
Every exit code is exercised by a dedicated test.
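A CI wrapper script can branch on these values directly. The sketch below is illustrative, assuming the constants are importable from forkline.ci.exitcodes (the module named above); only the constant names and the forkline ci check invocation come from this release.

import subprocess
import sys

from forkline.ci.exitcodes import EXIT_DIFF_DETECTED, EXIT_SUCCESS

# Hypothetical CI wrapper: run the gate, then branch on the documented exit codes.
result = subprocess.run(
    [
        "forkline", "ci", "check",
        "--entrypoint", "examples/my_flow.py",
        "--expected", "tests/testdata/my_flow.run.json",
        "--offline",
    ]
)
if result.returncode == EXIT_SUCCESS:
    print("Behavior matches the committed baseline")
elif result.returncode == EXIT_DIFF_DETECTED:
    print("Behavioral diff detected; failing the build")
    sys.exit(1)
else:
    sys.exit(result.returncode)  # usage, replay, offline, artifact, or internal error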
5. Python Test Helper (forkline/testing.py)
A one-line API for snapshot-style testing of agentic workflows:
from forkline.testing import assert_no_diff

def test_my_flow():
    assert_no_diff(
        entrypoint="examples/my_flow.py",
        expected_artifact="tests/testdata/my_flow.run.json",
        offline=True,
    )

On failure, raises ArtifactDiffError with:
- First divergent event index
- Expected vs actual payloads
- Structured diff for programmatic inspection
- Suggested re-record command
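A test can also catch the error to inspect the failure. A minimal sketch, assuming ArtifactDiffError is importable from forkline.testing alongside assert_no_diff; the assertion on the message text is illustrative only.

import pytest

from forkline.testing import ArtifactDiffError, assert_no_diff  # assumed import location

def test_my_flow_reports_divergence():
    with pytest.raises(ArtifactDiffError) as excinfo:
        assert_no_diff(
            entrypoint="examples/my_flow.py",
            expected_artifact="tests/testdata/my_flow.run.json",
            offline=True,
        )
    # The message carries the first divergent index and the suggested re-record command.
    assert "Re-record" in str(excinfo.value)  # illustrative check only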
6. Diff Output (forkline/ci/commands.py)
Diff output is concise in text mode and machine-readable in JSON mode.
Text mode:
DIFF: First divergence at event[1] (type: output)
$.answer: "4" -> "5"
$.note: <missing> -> "wrong!"
Suggested fix: Re-record baseline: forkline ci record --entrypoint <script> --out <path>
JSON mode:
{
"identical": false,
"first_divergent_index": 1,
"event_type": "output",
"expected": {"type": "output", "payload": {"answer": "4"}},
"actual": {"type": "output", "payload": {"answer": "5", "note": "wrong!"}},
"payload_diff": [
{"op": "add", "path": "$.note", "value": "wrong!"},
{"op": "replace", "path": "$.answer", "old": "4", "new": "5"}
],
"suggestion": "Re-record baseline: forkline ci record --entrypoint <script> --out <path>"
}

7. CLI Integration (forkline/cli/__init__.py)
The ci subcommand is wired into the existing forkline CLI with full argparse integration:
$ forkline ci --help
usage: forkline ci [-h] {record,replay,diff,check,normalize} ...
Commands for CI/CD pipelines: record baselines, replay artifacts,
diff for behavioral changes, and gate merges on behavioral identity.
All subcommands support --help and produce meaningful error messages on invalid usage.
8. Documentation (docs/ci.md)
Full CI guide covering:
- Quick start (3-command workflow)
- All commands with flags and descriptions
- Offline mode details
- Artifact normalization behavior
- Recommended repo layout (`tests/testdata/*.run.json`)
- Re-recording baselines
- Python test helper usage
- GitHub Actions example (copy-paste ready)
- Programmatic API usage
9. Examples
Four new runnable examples:
| File | Demonstrates |
|---|---|
| `examples/ci_record_and_diff.py` | Record baseline, diff identical and changed behavior, JSON output |
| `examples/ci_check_gate.py` | All-in-one build gate with ci_check |
| `examples/ci_offline_enforcement.py` | Offline mode blocking all network calls |
| `examples/ci_test_helper.py` | `assert_no_diff` for pytest/unittest |
Tests
60 new tests in tests/unit/test_ci.py. All hermetic — no network, no external dependencies.
TestExitCodes (2 tests)
| Test | Validates |
|---|---|
| `test_exit_code_values` | Each code has its documented integer value |
| `test_all_distinct` | All 7 codes are unique |

TestOfflineMode (7 tests)
| Test | Validates |
|---|---|
| `test_offline_context_blocks_socket` | socket.connect raises ForklineOfflineError |
| `test_offline_blocks_create_connection` | socket.create_connection blocked |
| `test_offline_blocks_getaddrinfo` | DNS resolution blocked |
| `test_offline_error_is_deterministic` | Same call → same error message |
| `test_offline_restores_after_context` | Normal access restored on exit |
| `test_enable_disable_idempotent` | Double-enable/disable is safe |
| `test_offline_error_attributes` | Error has operation attribute, includes FORKLINE_OFFLINE |

TestNormalization (9 tests)
| Test | Validates |
|---|---|
| `test_timestamps_normalized` | All timestamp fields → sentinel |
| `test_metadata_stripped` | Platform metadata removed |
| `test_metadata_preserved_when_disabled` | Opt-out works |
| `test_timestamps_preserved_when_disabled` | Opt-out works |
| `test_events_ordered_by_event_id` | Stable sort |
| `test_normalize_ids` | IDs replaced with sequential values |
| `test_normalize_deterministic` | Same input → same output |
| `test_normalize_json_roundtrip` | JSON string → normalize → parse |
| `test_original_not_mutated` | Input dict unchanged |
TestCIRecord (6 tests)
| Test | Validates |
|---|---|
| `test_record_success` | Produces valid artifact with schema_version and events |
| `test_record_missing_entrypoint` | Exit 2 |
| `test_record_failed_script` | Exit 3 |
| `test_record_creates_directories` | Nested output path created |
| `test_record_artifact_is_normalized` | Timestamps are sentinel values |
| `test_record_deterministic` | Two recordings → identical normalized output |

TestCIReplay (6 tests)
| Test | Validates |
|---|---|
| `test_replay_valid_artifact` | Exit 0, JSON output with status/event_count |
| `test_replay_missing_file` | Exit 2 |
| `test_replay_invalid_json` | Exit 5 |
| `test_replay_missing_schema_version` | Exit 5 |
| `test_replay_strict_empty_payload` | Exit 5 in strict mode |
| `test_replay_not_strict_allows_empty_payload` | Exit 0 in default mode |
TestCIDiff (9 tests)
| Test | Validates |
|---|---|
| `test_identical_artifacts` | Exit 0, "No differences" text |
| `test_different_artifacts` | Exit 1, "DIFF" in output |
| `test_diff_json_format` | JSON output with first_divergent_index and suggestion |
| `test_diff_json_identical` | JSON output with identical: true |
| `test_diff_missing_expected` / `test_diff_missing_actual` | Exit 2 |
| `test_diff_event_count_mismatch` | Exit 1, "mismatch" in output |
| `test_diff_bad_json` | Exit 5 |
| `test_diff_normalizes_timestamps` | Artifacts at different times still match |

TestCINormalize (4 tests)
| Test | Validates |
|---|---|
| `test_normalize_in_place` | Overwrites file, timestamps normalized |
| `test_normalize_to_new_path` | Writes to separate output |
| `test_normalize_missing_file` | Exit 2 |
| `test_normalize_bad_json` | Exit 5 |
TestCICheck (4 ...
v0.4.2 - Tool Invocation Recording + Deterministic Redaction
Forkline v0.4.2 Release Notes
Tool Invocation Recording + Deterministic Redaction
Agents without tool visibility are blind. Agents with unsafe logs are
unusable in real systems. This release solves both: tool calls are now
first-class events, and sensitive data is redacted deterministically
before anything touches disk.
New: Tool Call Events
Every tool invocation — DB queries, API calls, file operations — is
now recorded as a tool_call event in the run artifact with a
canonical schema:
{
"tool_name": "http.request",
"invocation_id": "a1b2c3...",
"request": { "url": "https://api.example.com" },
"response": { "status": 200 },
"error": null,
"timing": {
"started_at": "2026-02-23T10:00:00Z",
"ended_at": "2026-02-23T10:00:00.250Z",
"duration_ms": 250.0
},
"metadata": { "bytes_read": 1024, "cache_hit": false }
}

Three ways to record:

import requests
from forkline.core.tool_call import ToolCallRecorder, record_tool_call

# Context manager — full control over request/response/metadata
with ToolCallRecorder(recorder, run_id, "http.get") as tc:
    tc.set_request({"url": "https://api.example.com"})
    resp = requests.get("https://api.example.com")
    tc.set_response({"status": resp.status_code})
    tc.set_metadata({"bytes_read": len(resp.content)})

# Decorator — wrap existing functions
@record_tool_call(recorder, run_id, "db.query")
def query_db(sql):
    return db.execute(sql).fetchall()

# Convenience method — manual construction
recorder.log_tool_call(
    run_id=run_id,
    tool_name="file.read",
    request={"path": "/tmp/data.txt"},
    response={"content": "hello"},
)

Replay integration: ToolCallRecorder enforces determinism
guardrails — live tool calls are blocked during replay mode. Set
allow_in_replay=True for re-execution scenarios.
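A minimal sketch of the opt-out, assuming allow_in_replay is accepted as a keyword argument on ToolCallRecorder (this release note names the flag but not where it is passed); recorder and run_id are placeholders.

from forkline.core.tool_call import ToolCallRecorder

# Assumption: allow_in_replay is a ToolCallRecorder keyword argument.
with ToolCallRecorder(recorder, run_id, "db.query", allow_in_replay=True) as tc:
    tc.set_request({"sql": "SELECT 1"})
    tc.set_response({"rows": [[1]]})  # safe to re-execute during replay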
New: Regex-Based Redaction
The redaction engine now supports three matching strategies:
| Strategy | Matches on | Example |
|---|---|---|
| Key-based | Dict key names (substring, case-insensitive) | password, api_key |
| Path-based | Dot-separated paths | headers.authorization |
| Regex-based (new) | String values anywhere in the payload | JWTs, Bearer tokens, AWS keys |
Default policy now includes regex rules that catch secrets embedded in
string values, not just key names:
eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOi... → [REDACTED:jwt]
Bearer sk-12345abcdef → Bearer [REDACTED]
AKIAIOSFODNN7EXAMPLE → [REDACTED:aws_key]
New: Redaction Config Files
Configure redaction via YAML or JSON instead of code:
# forkline.redact.yaml
fields:
  redact_keys:
    - password
    - token
    - api_key
  redact_paths:
    - "headers.Authorization"
  redact_regex:
    - name: jwt
      pattern: "eyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+"
      replacement: "[REDACTED:jwt]"
    - name: connection_string
      pattern: "://[^:]+:[^@]+@"
      replacement: "://[REDACTED:credentials]@"

forkline run my_agent.py --redact-config forkline.redact.yaml

JSON configs work with zero extra dependencies. YAML requires pyyaml.
Improved: Determinism Guarantees
- Dict keys are now traversed in sorted order during redaction, eliminating dependence on construction order
- Hash action uses `json.dumps(sort_keys=True)` for stable dict hashing
- Custom replacement strings (e.g. `[REDACTED:jwt]`) for traceability
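The stable-hash behavior can be illustrated outside Forkline. This is a generic sketch of the technique, not Forkline's internal code:

import hashlib
import json

def stable_hash(payload: dict) -> str:
    # Sorted-key serialization makes the digest independent of construction order.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"user": "alice", "token": "secret"}
b = {"token": "secret", "user": "alice"}  # same content, different key order
assert stable_hash(a) == stable_hash(b)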
New API Surface
| Symbol | Module | Description |
|---|---|---|
| `ToolCallRecorder` | `forkline.core.tool_call` | Context manager for recording tool calls |
| `record_tool_call` | `forkline.core.tool_call` | Decorator for wrapping functions |
| `ToolCallPayload` | `forkline.core.tool_call` | Canonical tool_call event schema |
| `ToolCallTiming` | `forkline.core.tool_call` | Timing fields dataclass |
| `RegexRedactionRule` | `forkline.core.redaction` | Regex-based value redaction rule |
| `RedactionConfig` | `forkline.core.redaction` | Config object for loading from files |
| `load_redaction_config()` | `forkline.core.redaction` | Load config from YAML/JSON |
| `RunRecorder.log_tool_call()` | `forkline.storage.recorder` | Convenience method |
| `RunRecorder.with_config()` | `forkline.storage.recorder` | Factory with config file |
| `--redact-config` | CLI | Flag on `forkline run` |
Examples
- `examples/tool_call_basic.py` — All three recording APIs
- `examples/tool_call_production.py` — Full agentic workflow: LLM planning, DB query, HTTP webhook with JWT auth, file write, failing upstream call. Demonstrates redaction of connection strings, Bearer tokens, cookies, API keys, and JWTs with zero secrets in stored artifacts.
Test Coverage
58 new tests (316 total), covering:
- Tool call payload serialization, roundtrip, JSON export/import
- Context manager timing, error capture, metadata, unique invocation IDs
- Decorator captures calls, dict returns, exceptions
- End-to-end "no raw secrets persisted" verification
- Replay mode guardrails (blocks live calls, allows re-exec)
- Regex redaction: JWT, Bearer, AWS keys, nested structures
- Sorted key traversal determinism across construction orders
- Config loading from JSON, rule order stability
- Hash determinism for dicts with different key order
Documentation
- `docs/tool_visibility.md` — Event schema, recording APIs, replay integration, event ordering contract
- `docs/redaction.md` — Matching strategies, rule application order, config format, default policy reference, determinism guarantees, redaction pipeline diagram
Migration Notes
- No breaking changes. All existing APIs and stored artifacts continue to work unchanged.
- Default redaction policy expanded. The default policy now includes `passwd` as a key pattern and 3 regex rules (jwt, bearer, aws_key). Payloads that previously passed through unredacted may now be redacted if they contain these patterns.
- Sorted key traversal. Redacted output dicts now have sorted keys. This is semantically identical but may affect tests that assert on key ordering of redacted output.
- Zero new dependencies. YAML config support is optional (`pip install pyyaml`).
v0.4.1 — Versioned Artifact Schema
Forkline v0.4.1 Release Notes — Versioned Artifact Schema
Release: v0.4.1
Date: 2026-02-23
Milestone: Versioned Artifact Schema (v1.0)
Summary
This release delivers a documented, forward-compatible artifact schema for Forkline run artifacts. Every artifact now carries a mandatory schema_version field, older artifacts are migrated transparently via a deterministic migration pipeline, and unknown fields from newer versions are tolerated without crashing.
This is foundational infrastructure. Schema versioning protects determinism, stability, auditability, diff integrity, and long-term replay trust. Forkline can now evolve its artifact format without breaking history.
What's New
1. Canonical Artifact Schema (forkline/artifact/schema.py)
Typed, versioned models for run artifacts, implemented as frozen dataclasses with zero new dependencies.
Models:
- `RunArtifact` — Top-level artifact with mandatory `schema_version`, `run_id`, `entrypoint`, `started_at`, optional `ended_at`, `status`, `forkline_version`, `events`, and an extensible `metadata` dict.
- `ArtifactEvent` — Single event with `event_id`, `run_id`, `ts`, `type`, and `payload`.
- `SchemaVersionError` — Exception for missing or unsupported schema versions.
Guarantees:
- `schema_version` is mandatory. `RunArtifact.from_dict()` raises `SchemaVersionError` if missing.
- Unknown fields are silently ignored in `from_dict()` — forward compatibility by design.
- Artifacts are immutable (frozen dataclasses).
- `validate()` returns a list of structural errors without raising.
- `to_json()` / `from_json()` provide a deterministic JSON roundtrip.
Schema version: "1.0" (SemVer-style, replacing the legacy "recording_v0" format).
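A minimal sketch of the mandatory-version guarantee, using only the names documented above; the artifact dict itself is illustrative.

from forkline.artifact.schema import RunArtifact, SchemaVersionError

raw = {
    "schema_version": "1.0",
    "run_id": "abc123",
    "entrypoint": "examples/my_flow.py",
    "started_at": "2026-02-23T10:00:00+00:00",
    "events": [],
    "unknown_future_field": "ignored",  # tolerated: unknown fields are dropped
}

artifact = RunArtifact.from_dict(raw)
print(artifact.validate())  # [] for a structurally valid artifact

try:
    RunArtifact.from_dict({"run_id": "abc123"})  # no schema_version
except SchemaVersionError:
    print("schema_version is mandatory")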
2. Deterministic Migration Registry (forkline/artifact/migrate.py)
A versioned migration pipeline that transforms older artifact schemas to the current canonical format.
Primary entry point:
from forkline.artifact import migrate_artifact
migrated = migrate_artifact(raw_json_dict)

Behavior:
- `schema_version == "1.0"` → returns a deep copy unchanged.
- `schema_version` is older (e.g. `"recording_v0"`) → routes through chained migration functions.
- `schema_version` is newer (e.g. `"2.0"`) → warns but returns unchanged (best-effort forward compat).
- `schema_version` is missing → raises `SchemaVersionError`.
Migration registration pattern:
from forkline.artifact import register_migration

def migrate_1_0_to_1_1(raw: dict) -> dict:
    result = dict(raw)
    result.setdefault("new_field", "default_value")
    return result

register_migration("1.0", "1.1", migrate_1_0_to_1_1)

Built-in migration — recording_v0 → 1.0:
- Environment fields (`python_version`, `platform`, `cwd`) moved to the `metadata` dict.
- Event timestamps normalized from `created_at` to `ts`.
- `schema_version` updated to `"1.0"`.
Migration invariants:
- Deterministic: same input always produces same output.
- Side-effect free: no I/O, no network, no state mutation.
- Input is never mutated: deep copy is always made.
- Chains compose: `recording_v0` → `1.0` → `1.1` applied sequentially.
3. Storage Integration
Both storage backends now support schema-aware artifact loading and canonical JSON export.
RunRecorder (flat event model):
- `load_artifact(run_id) -> Optional[RunArtifact]` — loads a run as a canonical artifact, applies migration if needed.
- `export_artifact_json(run_id) -> Optional[str]` — exports a run as canonical JSON with `schema_version`.
SQLiteStore (step-based model):
- `load_artifact(run_id) -> Optional[RunArtifact]` — flattens the step hierarchy into a flat event list, applies migration.
- `export_artifact_json(run_id) -> Optional[str]` — exports a run as canonical JSON.
Both methods handle legacy databases transparently. Artifacts with schema_version: "recording_v0" are migrated to "1.0" on load.
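A short usage sketch of the new methods on RunRecorder; the run ID and database path are placeholders.

import json

from forkline.storage.recorder import RunRecorder

recorder = RunRecorder(db_path="runs.db")
run_id = "b015f49f45c04002a3c489fe84b45c5c"  # placeholder run ID

artifact = recorder.load_artifact(run_id)    # None if the run does not exist
if artifact is not None:
    print(artifact.schema_version)           # "1.0", even for legacy recordings

exported = recorder.export_artifact_json(run_id)
if exported is not None:
    print(json.loads(exported)["schema_version"])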
4. Replay Engine — Version Validation
The replay engine now validates schema_version at the load boundary.
Behavior on ReplayEngine.load_run():
- Current version (`"1.0"`): loaded normally.
- Older version: loaded via migration layer (transparent to caller).
- Newer version: warning issued, best-effort replay.
- Missing version: warning issued, default assumptions applied.
Critical invariant: Replay never crashes due to version mismatches. Degradation is always graceful.
5. CLI JSON Output — schema_version Included
All CLI JSON output now includes schema_version and forkline_version:
- `forkline list --json` — each run includes `schema_version`.
- `forkline replay <run_id> --json` — artifact includes `schema_version` and `forkline_version`.
6. Version Constants Updated (forkline/version.py)
| Constant | Old Value | New Value | Purpose |
|---|---|---|---|
| `SCHEMA_VERSION` | `"recording_v0"` | `"1.0"` | Stamped on all new artifacts |
| `LEGACY_SCHEMA_VERSION` | (new) | `"recording_v0"` | Identifies pre-v1.0 artifacts |
| `DEFAULT_SCHEMA_VERSION` | `"recording_v0"` | `"recording_v0"` | Backward compat for NULL columns |
7. Documentation
docs/artifact_schema.md — Full artifact schema specification including:
- Versioning policy (SemVer-style major/minor rules)
- Schema v1.0 field tables (`RunArtifact`, `Event`)
- Example artifact JSON
- Backward compatibility matrix
- Migration guarantees and registration pattern
- SQLite and JSON storage format details
- Replay integration behavior
- Stability guarantees
- Design principles
README.md — Added "Artifact Stability Guarantee" section:
Forkline guarantees replay compatibility across minor versions. Breaking changes require a major version increment and migration support.
8. Module Exports
New public symbols exported from forkline:
| Symbol | Module | Description |
|---|---|---|
| `RunArtifact` | `forkline.artifact.schema` | Canonical run artifact model |
| `ArtifactEvent` | `forkline.artifact.schema` | Canonical event model |
| `SchemaVersionError` | `forkline.artifact.schema` | Missing/unsupported version exception |
| `migrate_artifact` | `forkline.artifact.migrate` | Primary migration entry point |
| `register_migration` | `forkline.artifact.migrate` | Migration registration for future versions |
Tests
34 new tests in tests/unit/test_artifact_schema.py. All hermetic — no network, deterministic.
TestRunArtifactSchema (9 tests)
| Test | Validates |
|---|---|
| `test_schema_version_required` | schema_version field is present |
| `test_to_dict_includes_schema_version` | Serialized dict includes schema_version |
| `test_to_json_roundtrip` | JSON serialize/deserialize preserves all data |
| `test_from_dict_rejects_missing_schema_version` | SchemaVersionError raised when missing |
| `test_from_dict_ignores_unknown_fields` | Unknown fields silently dropped |
| `test_validate_catches_missing_required_fields` | Validation reports empty required fields |
| `test_validate_passes_for_valid_artifact` | Valid artifact returns no errors |
| `test_metadata_extensibility` | Arbitrary keys in metadata dict |
| `test_immutability` | Frozen dataclass rejects mutation |

TestArtifactEvent (2 tests)
| Test | Validates |
|---|---|
| `test_from_dict_ignores_unknown_fields` | Unknown event fields silently dropped |
| `test_to_dict_roundtrip` | Event roundtrip preserves data |

TestMigrationRegistry (7 tests)
| Test | Validates |
|---|---|
| `test_migrate_current_version_is_noop` | "1.0" → "1.0" returns copy unchanged |
| `test_migrate_current_version_returns_deep_copy` | Deep copy, not same object |
| `test_migrate_missing_schema_version_raises` | SchemaVersionError on missing version |
| `test_migrate_recording_v0_to_1_0` | Full migration: env fields → metadata, ts normalization |
| `test_migrate_is_deterministic` | Same input → same output across invocations |
| `test_migrate_does_not_mutate_input` | Input dict unchanged after migration |
| `test_newer_version_returns_with_warning` | Warning issued, data preserved |
| `test_migrate_non_dict_raises` | Non-dict input raises SchemaVersionError |
TestVersionComparison (6 tests)
| Test | Validates |
|---|---|
| `test_compare_equal` | "1.0" == "1.0" |
| `test_compare_less` | "1.0" < "2.0" |
| `test_compare_greater` | "2.0" > "1.0" |
| `test_legacy_less_than_semver` | "recording_v0" < "1.0" |
| `test_migration_path_exists` | Path from recording_v0 → 1.0 found |
| `test_migration_path_same_version` | Same version returns empty path |

TestStorageArtifactIntegration (6 tests)
| Test | Validates |
|---|---|
| `test_recorder_load_artifact` | RunRecorder.load_artifact() returns valid RunArtifact |
| `test_recorder_export_json` | JSON export includes schema_version |
| `test_recorder_load_artifact_nonexistent` | Returns None for missing run |
| `test_sqlitestore_load_artifact` | SQLiteStore.load_artifact() with step flattening |
| `test_sqlitestore_export_json` | JSON export from step-based store |
| `test_legacy_db_migrates_on_load_artifact` | Legacy recording_v0 DB migrates to "1.0" |

TestSchemaVersionConsistency (2 tests)
| Test | Validates |
|---|---|
| `test_versions_match` | SCHEMA_VERSION == CURRENT_SCHEMA_VERSION |
| `test_schema_version_is_1_0` | Current version is "1.0" |

Updated Tests (2 tests in test_version_schema.py)
| Test | Change |
|---|---|
| `test_schema_version_format` | Updated to validate "major.minor" numeric format instead of "recording_" prefix |
| `test_default_versions_are_reasonable` | Added assertion that current schema differs from default |
Total test count after this release: 258 (22...
v0.4 - CLI
Forkline v0.4 Release Notes — CLI
Release: v0.4.0
Date: 2026-02-23
Milestone: v0.4 — CLI (Roadmap item 4 of 5)
Summary
This release delivers the full forkline CLI: four subcommands that let you run, list, replay, and diff agent runs from the terminal. This is the adoption wedge — Forkline is now usable without writing any Python.
The CLI is thin by design: parse args → call library APIs → render output. No business logic lives in the CLI layer.
What's New
1. forkline run — Execute under tracing
Run any Python script under Forkline tracing. Records execution metadata (script path, timestamps, exit code) and prints the assigned run ID.
$ forkline run examples/ollama_qwen3.py
Calling qwen3 ...
Response: A fork bomb is a denial-of-service attack that recursively spawns
an infinite number of processes to exhaust system resources, causing a crash
or severe performance degradation.
run_id: b015f49f45c04002a3c489fe84b45c5c

Behavior:
- Validates the script file exists (exits 2 if not)
- Executes the script in a subprocess via `sys.executable`
- Sets environment variables for script integration: `FORKLINE_TRACING=1`, `FORKLINE_RUN_ID=<id>`, `FORKLINE_DB=<path>`
- Records run start/end timestamps and exit code via `RunRecorder`
- On non-zero exit: stores `status=failed`, prints `run_id`, and propagates the exit code
- Script arguments are passed after `--`: `forkline run script.py -- --arg1 value`
2. forkline list — List stored runs
Show all recorded runs, newest first.
$ forkline list
ID Created Script Status
------------------------------------------------------------------------------------------------------
7b08ac5e533d456daa7a24921c0d1687 2026-02-23 01:04:34 examples/ollama_qwen3.py ok
b015f49f45c04002a3c489fe84b45c5c 2026-02-23 01:04:20 examples/ollama_qwen3.py ok

Options:
| Flag | Default | Description |
|---|---|---|
| `--limit N` | all | Maximum number of runs to show |
| `--json` | off | Output as JSON array |
| `--db PATH` | `runs.db` | SQLite database path |
JSON output:
[
{
"created_at": "2026-02-23T01:04:34.989096+00:00",
"ended_at": "2026-02-23T01:04:45.039067+00:00",
"entrypoint": "examples/ollama_qwen3.py",
"run_id": "7b08ac5e533d456daa7a24921c0d1687",
"status": "ok"
}
]

Output is deterministic: runs are ordered by started_at DESC, JSON keys are sorted.
3. forkline replay — Replay a recorded run
Load a recorded run by ID and print a summary of its events, duration, and status.
$ forkline replay b015f49f45c04002a3c489fe84b45c5c
Run: b015f49f45c04002a3c489fe84b45c5c
Script: examples/ollama_qwen3.py
Status: ok
Duration: 10.74s
Total events: 2
Events by type:
input: 1
output: 1

Options:
| Flag | Default | Description |
|---|---|---|
| `--json` | off | Output full run and events as JSON |
| `--db PATH` | `runs.db` | SQLite database path |
Exits 2 with a stderr message if the run ID is not found.
4. forkline diff — Diff two runs
Compare two recorded runs event-by-event and report the first point of divergence.
$ forkline diff b015f49f... 7b08ac5e...
Step 1 diverged:
old.type: output
old.payload: {"model": "qwen3", "response": "A fork bomb is a denial-of-service attack tha...
new.type: output
new.payload: {"model": "qwen3", "response": "A fork bomb is a type of denial-of-service at...Options:
| Flag | Default | Description |
|---|---|---|
--format pretty|json |
pretty |
Output format |
--first-divergence |
on | Stop at first divergence |
--db PATH |
runs.db |
SQLite database path |
Identical runs: prints No differences and exits 0.
JSON output:
{
"divergence_index": 1,
"identical": false,
"new": {
"payload": {"model": "qwen3", "response": "A fork bomb is a type of..."},
"type": "output"
},
"old": {
"payload": {"model": "qwen3", "response": "A fork bomb is a denial-of..."},
"type": "output"
},
"run_a": "b015f49f45c04002a3c489fe84b45c5c",
"run_b": "7b08ac5e533d456daa7a24921c0d1687",
"total_events_a": 2,
"total_events_b": 2
}

Diff engine: compares events by index. At each position, checks type and payload for equality. If event counts differ, reports different_event_count at the index where the shorter run ends. Always reports the first divergence.
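The comparison rule is simple enough to sketch: walk both event lists by index, stop at the first position where type or payload differs, and report a count mismatch when one run ends early. The function below illustrates that rule; it is not the CLI's actual implementation.

def first_divergence(events_a, events_b):
    """Return (index, reason) for the first divergence, or None if identical."""
    for i, (a, b) in enumerate(zip(events_a, events_b)):
        if a["type"] != b["type"] or a["payload"] != b["payload"]:
            return i, "event_mismatch"
    if len(events_a) != len(events_b):
        # The shorter run ends here; report the mismatch at that index.
        return min(len(events_a), len(events_b)), "different_event_count"
    return None

events_a = [{"type": "output", "payload": {"answer": "4"}}]
events_b = [{"type": "output", "payload": {"answer": "5"}}]
print(first_divergence(events_a, events_b))  # (0, 'event_mismatch')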
5. Exit Code Contract
All commands follow a consistent exit code convention for CI scriptability:
| Code | Meaning |
|---|---|
| 0 | Success (command completed, runs match for diff) |
| 1 | Divergence found (diff only) |
| 2 | Input error (missing run ID, file not found, invalid args) |
6. Environment Variable Bridge
forkline run sets three environment variables before executing the script subprocess:
| Variable | Value | Purpose |
|---|---|---|
| `FORKLINE_TRACING` | `1` | Signal that tracing is active |
| `FORKLINE_RUN_ID` | `<hex>` | Run ID assigned by the CLI |
| `FORKLINE_DB` | `<path>` | Database path for event logging |
Scripts can read these to log events to the same run:
import os
from forkline.storage.recorder import RunRecorder
db = os.environ.get("FORKLINE_DB", "runs.db")
run_id = os.environ.get("FORKLINE_RUN_ID")
recorder = RunRecorder(db_path=db)
recorder.log_event(run_id, "input", {"prompt": "hello"})

7. RunRecorder.list_runs() — New API
Added list_runs(limit=None) to RunRecorder. Returns runs ordered by started_at DESC with backward-compatible version defaults.
recorder = RunRecorder()
runs = recorder.list_runs(limit=10)  # newest 10 runs

8. Ollama Qwen3 Example (examples/ollama_qwen3.py)
A live example that calls Ollama's Qwen3 model and records the input/output as Forkline events. Demonstrates nondeterminism detection: same prompt, same model, different response.
$ forkline run examples/ollama_qwen3.py # run 1
$ forkline run examples/ollama_qwen3.py # run 2
$ forkline diff <run_1> <run_2>              # nondeterminism caught

Uses only urllib.request from the standard library — no new dependencies.
Tests
38 new tests in tests/unit/test_cli.py. All hermetic except TestCLIRun which spawns real subprocesses against temporary scripts and databases.
TestRenderRunResult (1 test)
| Test | Validates |
|---|---|
| `test_format` | Output is run_id: <id> |

TestRenderListTable (2 tests)
| Test | Validates |
|---|---|
| `test_empty` | "No runs found." for empty list |
| `test_header_and_row` | Header columns, timestamp formatting, values present |

TestRenderListJSON (1 test)
| Test | Validates |
|---|---|
| `test_json_array` | Valid JSON array with correct fields |

TestRenderReplaySummary (1 test)
| Test | Validates |
|---|---|
| `test_contains_fields` | Run ID, status, duration, event counts, events by type |

TestRenderReplayJSON (2 tests)
| Test | Validates |
|---|---|
| `test_valid_json_with_all_fields` | All fields present and correct in parsed JSON |
| `test_empty_events` | Zero events, null timestamps handled |

TestRenderDiffPretty (2 tests)
| Test | Validates |
|---|---|
| `test_identical` | "No differences" |
| `test_diverged` | "Step N diverged:" with old/new type and payload |

TestRenderDiffJSON (2 tests)
| Test | Validates |
|---|---|
| `test_identical` | {"identical": true} |
| `test_diverged` | divergence_index and old/new present |

TestDiffEvents (6 tests)
| Test | Validates |
|---|---|
| `test_identical` | Same events → identical: true |
| `test_type_mismatch` | Different event types → divergence at index 0 |
| `test_payload_mismatch` | Same type, different payload → divergence at index 0 |
| `test_different_lengths` | Shorter list → reason: different_event_count |
| `test_both_empty` | Two empty lists → identical |
| `test_finds_first_divergence` | Three events, divergence at index 1 (not 2) |
TestListRuns (3 tests)
| Test | Validates |
|---|---|
| `test_empty` | Empty database → empty list |
| `test_ordered_newest_first` | Most recent run is first |
| `test_limit` | limit=2 returns exactly 2 from 3 runs |

TestCLIList (3 tests)
| Test | Validates |
|---|---|
| `test_list_shows_runs` | Run ID and script name in table output |
| `test_list_json` | Valid JSON array with correct run ID |
| `test_list_empty` | "No runs found" for empty database |

TestCLIReplay (3 tests)
| Test | Validates |
|---|---|
| `test_replay_success` | Run ID, status, event count in output; exit 0 |
| `test_replay_missing_run` | Exit 2 for nonexistent run |
| `test_replay_json` | Valid JSON with run_id and total_events |

TestCLIDiff (6 tests)
| Test | Validates |
|---|---|
| `test_identical_runs` | "No differences"; exit 0 |
| `test_different_runs` | "Step 0 diverged"; exit 1 |
| `test_diff_json_format` | JSON with "identical": true |
| `test_diff_missing_run` | Exit 2 for nonexistent run |
| `test_diff_different_event_counts` | "Event count differs" message |
| `test_diff_json_diverged` | JSON with divergence_index and old/new |

TestCLIRun (4 tests)
| Test | Validates |
|---|---|
| `test_run_missing_file` | Exit 2 for nonexistent script |
| `test_run_success` | run_id: in output; run stored with `... |
v0.3 — First-Divergence Diffing
Forkline v0.3 Release Notes — First-Divergence Diffing
Release: v0.3.0
Date: 2026-02-21
Milestone: v0.3 — First-Divergence Diffing (Roadmap item 3 of 5)
Summary
This release delivers first-divergence diffing: given two recorded runs, Forkline now compares them step-by-step and returns the first point of divergence with deterministic classification, structured JSON diff patches, and rule-based explanations.
This is the core feature that turns Forkline from a recording/replay tool into a forensic debugging tool — answering not just that two runs differ, but where, how, and what changed.
What's New
1. Deterministic Canonicalization (forkline/core/canon.py)
A canonicalization layer that produces stable, deterministic byte representations of any value before hashing or diffing.
Functions:
- `canon(value, profile="strict") -> bytes` — Canonicalize any value to bytes
- `sha256_hex(data: bytes) -> str` — SHA-256 hex digest
- `bytes_preview(data: bytes) -> str` — Human-readable `sha256:<hash>:<hex_prefix>` format
Canonicalization guarantees:
- Dict key order is irrelevant. Keys are sorted lexicographically before serialization.
- Unicode is NFC-normalized. `"café"` (precomposed) and `"café"` (decomposed e + combining accent) produce identical output.
- Newlines are normalized to LF. `\r\n` and `\r` are collapsed to `\n`.
- Floats use 17-significant-digit precision. `-0.0` collapses to `0.0`. `NaN` and `Inf` are serialized as stable strings.
- Booleans and integers are distinct. `True` and `1` produce different canonical bytes.
- Bytes pass through unchanged. Binary data is not re-encoded; hashing uses SHA-256 with a hex prefix preview for display.
- Compact JSON encoding. No whitespace in separators (`","` and `":"`), `ensure_ascii=False`.
Zero dependencies. Uses only hashlib, json, math, unicodedata from the standard library.
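A minimal usage sketch with the functions listed above; the input values are arbitrary and only demonstrate the key-order and hashing guarantees.

from forkline.core.canon import canon, sha256_hex

a = canon({"z": 1, "a": 2})
b = canon({"a": 2, "z": 1})  # same mapping, different construction order

assert a == b          # canonical bytes are identical
print(sha256_hex(a))   # stable digest for either construction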
2. Deterministic JSON Diff Patches (forkline/core/json_diff.py)
A recursive JSON diff algorithm that produces a stable, ordered list of patch operations for any two JSON-like values.
Function:
json_diff(old, new, path="$") -> List[Dict]
Patch operation format:
[
{"op": "remove", "path": "$.a.b", "old": "<removed_value>"},
{"op": "add", "path": "$.x", "value": "<added_value>"},
{"op": "replace","path": "$.k", "old": "<old_value>", "new": "<new_value>"}
]

Ordering guarantees (deterministic across invocations):
- Dicts: removed keys (sorted) → added keys (sorted) → common keys (sorted, recursed).
- Lists: compared by index; removes at tail, then adds at tail.
- Type mismatch: replace whole node.
- Numeric compatibility: `int` vs `float` compared as numeric, not as type mismatch.
Paths use JSONPath-style notation: $.outer.inner, $.list[0], $.nested.array[2].field.
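A short sketch of the patch output for a small change, using the signature above; the expected operations follow the dict ordering rules just described (removed keys, then added keys, then common keys).

from forkline.core.json_diff import json_diff

old = {"answer": "4", "mode": "fast"}
new = {"answer": "5", "note": "wrong!"}

for op in json_diff(old, new):
    print(op)
# Expected shape per the ordering rules above:
#   {"op": "remove",  "path": "$.mode",   "old": "fast"}
#   {"op": "add",     "path": "$.note",   "value": "wrong!"}
#   {"op": "replace", "path": "$.answer", "old": "4", "new": "5"}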
3. First-Divergence Engine (forkline/core/first_divergence.py)
The core diffing algorithm: compare two Run objects step-by-step, classify the first mismatch, and return a structured result.
Algorithm
- Lockstep comparison. Walk both runs at the same index. At each step, classify by comparing (in priority order): step name → input hash → error state → output hash → all events hash.
- Resync window. On mismatch, search within a configurable window (default W=10) for matching "soft signatures" — `(step_name, input_hash)` tuples. The search iterates by increasing combined distance from the mismatch point, finding the nearest resync.
- Gap classification.
  - Resync with `gap_a > 0, gap_b == 0` → `missing_steps` (steps in run_a absent from run_b)
  - Resync with `gap_b > 0, gap_a == 0` → `extra_steps` (steps in run_b not in run_a)
  - Both gaps > 0 → falls through to classify the mismatch at current position
  - No resync → classify by what differs at current position
- Length mismatch. If one run is longer after lockstep exhausts the shorter, classify as `missing_steps` or `extra_steps`.
Divergence Types
| Type | Trigger | Explanation Pattern |
|---|---|---|
| `exact_match` | Runs identical | "Runs are identical (N steps compared)" |
| `op_divergence` | Step names differ | "Step 3: operation mismatch ('tool_call' vs 'llm_call')" |
| `input_divergence` | Same name, different input | "Step 3 'tool_call': input differs" |
| `output_divergence` | Same name + input, different output | "Step 3 'tool_call': output differs (same input)" |
| `error_divergence` | Error presence or content differs | "Step 3 'tool_call': error state differs" |
| `missing_steps` | Steps in run_a not in run_b | "Step 5 from run_a missing in run_b" |
| `extra_steps` | Steps in run_b not in run_a | "Steps 3..4 in run_b not present in run_a" |
All explanations are deterministic and rule-based — no LLM narration, no randomness.
Classification Priority
When two steps share a name but differ, classification follows strict priority:
- Input divergence — checked first because differing inputs explain differing outputs
- Error divergence — error presence/absence or content differs
- Output divergence — same input but different output (nondeterminism signal)
- All-events fallback — catches differences in `tool_call`, `artifact_ref`, or other event types
Data Models
StepSummary — Compact step representation included in results:
StepSummary(
    idx=2,
    name="generate_response",
    input_hash="a1b2c3d4...",
    output_hash="e5f6a7b8...",
    event_count=3,
    has_error=False,
)

FirstDivergenceResult — Complete result object:

FirstDivergenceResult(
    status="output_divergence",        # DivergenceType
    idx_a=2,                           # Index in run_a at divergence
    idx_b=2,                           # Index in run_b at divergence
    explanation="Step 2 'generate_response': output differs (same input)",
    old_step=StepSummary(...),         # Step from run_a
    new_step=StepSummary(...),         # Step from run_b
    input_diff=None,                   # JSON patch (when applicable)
    output_diff=[{"op": "replace", "path": "$[0].text", ...}],
    last_equal_idx=1,                  # Last step where both matched
    context_a=[StepSummary(...), ...], # 2 steps before/after in run_a
    context_b=[StepSummary(...), ...], # 2 steps before/after in run_b
)

Both models are frozen dataclasses with .to_dict() for JSON serialization.
API
from forkline.core.first_divergence import find_first_divergence, DivergenceType
result = find_first_divergence(
    run_a,
    run_b,
    window=10,       # Resync window size
    context_size=2,  # Steps before/after divergence in context
    show="both",     # "input", "output", or "both"
)

# JSON-serializable output
import json
print(json.dumps(result.to_dict(), indent=2))

4. CLI — forkline diff (forkline/cli.py)
The first CLI subcommand, establishing the forkline command-line interface.
Usage:
forkline diff --first <run_a> <run_b> [OPTIONS]

Options:
| Flag | Default | Description |
|---|---|---|
| `--first` | `true` | Show first divergence only |
| `--window N` | `10` | Resync window size |
| `--format json\|text` | `text` | Output format |
| `--show input\|output\|both` | `both` | Which diffs to include |
| `--canon strict` | `strict` | Canonicalization profile |
| `--db PATH` | `forkline.db` | SQLite database path |
Exit codes:
- 0 — Runs are identical (exact_match)
- 1 — Divergence detected (any other status)
This makes forkline diff directly usable in CI pipelines and shell scripts.
Text output sample:
First divergence: output_divergence
Step 2 'generate_response': output differs (same input)
Run A step 2 'generate_response':
input_hash: a1b2c3d4e5f6a7b8...
output_hash: 1234567890abcdef...
events: 3
has_error: False
Run B step 2 'generate_response':
input_hash: a1b2c3d4e5f6a7b8...
output_hash: fedcba0987654321...
events: 3
has_error: False
Output diff:
replace $[0].text: "Expected response" -> "Different response"
Last equal: step 1
Context A: [step 0 'init', step 1 'prepare', step 2 'generate_response']
Context B: [step 0 'init', step 1 'prepare', step 2 'generate_response']
Entry point: Registered as forkline = "forkline.cli:main" in pyproject.toml ([project.scripts]).
5. Module Exports
New public symbols exported from forkline and forkline.core:
| Symbol | Module | Description |
|---|---|---|
| `find_first_divergence` | `forkline.core.first_divergence` | Main engine function |
| `FirstDivergenceResult` | `forkline.core.first_divergence` | Result dataclass |
| `StepSummary` | `forkline.core.first_divergence` | Compact step summary |
| `DivergenceType` | `forkline.core.first_divergence` | Type classification constants |
| `canon` | `forkline.core.canon` | Value → canonical bytes |
| `sha256_hex` | `forkline.core.canon` | Bytes → SHA-256 hex |
| `bytes_preview` | `forkline.core.canon` | Bytes → human-readable hash preview |
| `json_diff` | `forkline.core.json_diff` | Deterministic JSON diff patches |
Tests
45 new tests across 3 test classes in tests/unit/test_first_divergence.py. All hermetic — no database, no disk I/O, no network.
TestCanonStability (14 tests)
| Test | Validates |
|---|---|
| `test_dict_key_order_irrelevant` | {"z":1,"a":2} == {"a":2,"z":1} |
| `test_nested_dict_stability` | Deep nesting with mixed key order |
| `test_unicode_normalization` | NFC: \u00e9 == e\u0301 |
| `test_newline_normalization` | `\r... |
v0.1.1 - Recording & Artifact Foundations
Forkline v0.1.1 — Recording & Artifact Foundations
Release focus: establish Forkline’s core recording primitives and artifact model for deterministic agent workflows.
v0.1.1 intentionally does not include replay. This release lays the groundwork by making runs recordable, inspectable, and diffable in a local-first, immutable format.
✨ What’s New
Deterministic Run Recording (Foundational)
- Introduced a structured run recording model for agentic workflows.
- Captures ordered execution steps including:
- LLM inputs and outputs
- Tool invocations
- Execution metadata
- Artifacts are written locally and treated as immutable once persisted.
This establishes Forkline’s core abstraction:
a run is a durable, replayable artifact — not a log stream.
Artifact Schema v0
- Added a first-pass, explicit artifact schema (`recording_v0`).
- Clearly separates:
- Run-level metadata
- Step-level inputs and outputs
- Execution ordering
- Schema is designed for future replay and diffing, not observability.
This schema defines the baseline contract for Forkline artifacts.
Redaction Support
- Introduced a redaction layer for recorded artifacts.
- Enables sensitive fields (API keys, tokens, PII) to be:
- Redacted at record time, or
- Scrubbed before persistence
- Redaction is explicit and policy-driven — never implicit.
This makes Forkline artifacts safe for local inspection and sharing.
Run & Step Diffing (Structural)
- Added initial diff utilities for comparing recorded runs or steps.
- Focuses on structural and semantic differences, not textual logs.
- Intended for offline analysis and future replay divergence detection.
This is a foundational capability, not a visualization layer.
Core Types & Invariants
- Formalized core domain types:
- Runs
- Steps
- Artifacts
- Diff results
- Introduced the concept of core invariants:
- Ordering matters
- Artifacts are the source of truth
- No mutation after persistence
These invariants guide all future Forkline features.
❌ Explicit Non-Goals (v0.1.1)
To avoid confusion, v0.1.1 does not include:
- Replay or re-execution
- First-divergence detection
- OpenTelemetry integration
- Observability, tracing, or metrics
- Production or distributed runtime support
These exclusions are deliberate.
Why This Release Matters
v0.1.1 is about credibility, not completeness.
It answers one question clearly:
Can Forkline reliably capture an agent run as a durable, inspectable artifact?
The answer is now yes.
Replay, divergence detection, and developer-facing workflows are built on top of this foundation.
What’s Next
- Deterministic replay from recorded artifacts
- First-divergence detection
- Golden replay tests
- Minimal replay demos
(Tracked for the next release.)
What's Changed
- Replay Engine by @sauravvenkat in #16
Full Changelog: v0.1.0...v0.1.1
v0.1.0 - Deterministic Run Recording
Forkline v0.1.0
First release of Forkline: local-first, replay-first tracing for agentic AI workflows.
What's in v0.1
✅ Deterministic recording of agent runs
✅ Self-contained artifacts stored in SQLite
✅ Security-first with automatic redaction (SAFE mode)
✅ Human-inspectable with sqlite3 or helper scripts
✅ Append-only logging with versioned schema
Quick Start
git clone https://github.com/sauravvenkat/forkline.git
cd forkline
source dev.env
python examples/minimal.py
python scripts/inspect_runs.py
What's Changed
- Boilerplate by @sauravvenkat in #1
- Update README.md and adding ROADMAP in docs folder with design/roadmap for v0.* until v1.0 by @sauravvenkat in #10
- Deterministic run recording v0 + repo module restructure (core/, storage/, tracer/) by @sauravvenkat in #11
- Adding REDACTION_POLICY.md doc by @sauravvenkat in #13
- Implement RedactionPolicy v0 by @sauravvenkat in #14
New Contributors
- @sauravvenkat made their first contribution in #1
Full Changelog: https://github.com/sauravvenkat/forkline/commits/v0.1.0