
Session Management and Harness Profiles#2

Open
kumanday wants to merge 3 commits into main from leonardogonzalez/coe-228-session-management-and-harness-profiles

@kumanday
Collaborator

Summary

Implements COE-228: Session lifecycle, session-scoped credentials, and harness env rendering.

Key Features

Session Manager Service

  • Full session lifecycle with validated state transitions
  • Session-scoped proxy credentials with unique aliases
  • Git metadata capture from active repository
  • Session notes and artifact registry

CLI Commands

  • bench session create - Creates session with benchmark metadata
  • bench session finalize - Records status and end time
  • bench session note - Adds notes to sessions
  • bench session artifact - Registers exported artifacts

Harness Profiles

  • Environment rendering for multiple harness types
  • Supports shell, dotenv, and JSON output formats
  • Variant overrides included deterministically
  • Secrets never written to tracked files

Configuration

  • Typed config schemas for providers, harnesses, variants, experiments
  • Example configs for Anthropic and OpenAI-surface harness profiles

Acceptance Criteria Verified

  • Session creation writes benchmark metadata before harness launch
  • Session finalization records status and end time
  • Git metadata is captured from the active repository
  • Every created session gets a unique proxy credential
  • Key alias and metadata can be joined back to the session
  • Secrets are not persisted in plaintext beyond intended storage
  • Rendered output uses correct variable names for each harness profile
  • Variant overrides are included deterministically
  • Rendered output never writes secrets into tracked files
  • Operators can finalize a session with valid outcome state
  • Exports can be attached to a session or experiment as artifacts
  • Invalid sessions remain visible for audit but excluded from comparisons
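The credential criteria above (a unique proxy credential per session, with an alias and metadata that join back to the session) could look roughly like the following. This is a minimal sketch, not the PR's implementation: `build_credential`, the alias format, and the returned dictionary shape are all assumptions made for illustration.

```python
import secrets
from uuid import UUID, uuid4


def build_credential(session_id: UUID) -> dict:
    """Hypothetical helper: derive a session-scoped credential.

    The alias embeds the session id, so a usage record that carries the
    alias can be joined back to the session without storing the secret.
    """
    return {
        "alias": f"session-{session_id}",
        "metadata": {"session_id": str(session_id)},
        # The secret is returned once to the caller; only the alias and
        # metadata are meant to be persisted alongside the session.
        "secret": secrets.token_urlsafe(32),
    }


if __name__ == "__main__":
    credential = build_credential(uuid4())
    print(credential["alias"])
```

Keeping the secret out of `metadata` is what lets the alias and metadata be stored and queried freely.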

Test Results

41 passed, 136 warnings in 1.30s

Test Plan Coverage

Unit Tests

  • Service tests for valid and invalid lifecycle transitions
  • Credential metadata builder tests
  • Rendering tests for multiple harness profiles
  • Outcome-state validation tests

Integration Tests

  • CLI create/finalize flow against local DB
  • Session create command emits usable shell and dotenv outputs
  • Session finalize with note and artifact registration

Implements COE-228: Session lifecycle, credentials, and harness rendering

Key features:
- Session manager service with lifecycle transitions
- Session-scoped proxy credentials with unique aliases
- Harness profile env rendering (shell, dotenv, json)
- Git metadata capture from active repository
- Outcome states and artifact registry

All 41 tests pass.
@kumanday added the symphony label (Symphony orchestrated task) on Mar 21, 2026
@coderabbitai

coderabbitai bot commented Mar 21, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80bcacb8c0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +66 to +67
repository = InMemorySessionRepository()
manager = SessionManager(settings=settings, session_repository=repository)


P1 Badge Persist sessions across CLI invocations

Each subcommand instantiates its own InMemorySessionRepository (create here, and the same pattern repeats in finalize, note, show, and list). After bench session create exits, that in-memory store is discarded, so a later bench session finalize <id> or bench session show <id> cannot retrieve the session and the documented multi-step workflow is unusable from the CLI.
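One way to address this is a repository that writes to disk, so that a second process sees what the first one saved. The sketch below uses SQLite from the standard library; `SqliteSessionRepository`, its synchronous `save`/`get` methods, and the JSON-in-a-column layout are assumptions for illustration and do not reflect the PR's actual repository interface (which appears to be async).

```python
import json
import sqlite3
from typing import Optional


class SqliteSessionRepository:
    """Hypothetical file-backed stand-in for InMemorySessionRepository."""

    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, session_id: str, record: dict) -> None:
        # Upsert so that later status changes overwrite the stored row.
        self._conn.execute(
            "INSERT OR REPLACE INTO sessions (id, body) VALUES (?, ?)",
            (session_id, json.dumps(record)),
        )
        self._conn.commit()

    def get(self, session_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self._conn.close()
```

Two separate instances pointed at the same file behave like two CLI invocations: the second can load what the first saved.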


Comment on lines +215 to +218
session_obj = await manager.finalize_session(
    UUID(session_id),
    outcome=outcome,
)


P1 Badge Pass a SessionFinalize model to finalize_session

SessionManager.finalize_session takes a single SessionFinalize argument, but this call passes a UUID plus an outcome keyword. Running bench session finalize will therefore raise a TypeError before it even attempts to load the session, so the finalize command cannot succeed.
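The mismatch can be reproduced with a small stand-in. In this sketch the fields of `SessionFinalize`, the synchronous `finalize_session` function, and `finalize_command` are all hypothetical; only the shape of the call (one model argument instead of a UUID plus a keyword) is taken from the comment above.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class SessionFinalize:
    # Hypothetical fields; the real model may carry more.
    session_id: UUID
    outcome: str = "completed"


def finalize_session(request: SessionFinalize) -> dict:
    """Stand-in for SessionManager.finalize_session: one model argument."""
    return {"id": str(request.session_id), "status": request.outcome}


def finalize_command(session_id: str, outcome: str) -> dict:
    # Parse the CLI string once, then bundle everything into the model.
    request = SessionFinalize(session_id=UUID(session_id), outcome=outcome)
    return finalize_session(request)
```

Calling `finalize_session(UUID(...), outcome=...)` against this stand-in raises `TypeError` because `outcome` is not a parameter of the function, which mirrors the failure described above.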


Comment on lines +92 to +95
session = Session(
    operator_label=create_input.operator_label,
    git_metadata=git_metadata,
)


P1 Badge Preserve chosen variant/task metadata on new sessions

SessionCreate carries experiment_name, variant_name, task_card_name, and harness_profile_name, but create_session only copies operator_label and git_metadata into the saved Session. As a result every created session loses the benchmark configuration it was launched with, leaving the core correlation fields unset and making later comparisons/reporting unable to distinguish one variant/task selection from another.
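A minimal sketch of carrying every input field over to the stored record follows. The dataclasses are simplified stand-ins that use the field names mentioned in the comment; `build_session` is a hypothetical helper, and the real models may be defined differently (for example as Pydantic models).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SessionCreate:
    operator_label: str
    experiment_name: Optional[str] = None
    variant_name: Optional[str] = None
    task_card_name: Optional[str] = None
    harness_profile_name: Optional[str] = None


@dataclass
class Session:
    operator_label: str
    experiment_name: Optional[str] = None
    variant_name: Optional[str] = None
    task_card_name: Optional[str] = None
    harness_profile_name: Optional[str] = None
    git_metadata: Optional[dict] = None


def build_session(create_input: SessionCreate, git_metadata: dict) -> Session:
    # Copy every field of the input model, so a field added to
    # SessionCreate later is not silently dropped here.
    copied = {f.name: getattr(create_input, f.name) for f in fields(create_input)}
    return Session(git_metadata=git_metadata, **copied)
```

Iterating over `fields()` rather than naming each attribute keeps the two models from drifting apart unnoticed, at the cost of requiring `Session` to accept every `SessionCreate` field.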


Comment on lines +160 to +164
lines.append("# Anthropic-surface harness")
lines.append("export ANTHROPIC_BASE_URL=\"${STACKPERF_PROXY_BASE_URL}/v1\"")
lines.append("export ANTHROPIC_API_KEY=\"${STACKPERF_SESSION_API_KEY}\"")
lines.append("")
lines.append("# OpenAI-surface harness")


P1 Badge Render env from the selected harness profile

bench session create ignores --harness and --variant here and emits a hard-coded Anthropic/OpenAI snippet instead of loading the chosen configs through HarnessRenderer. That means harness-specific variable names, model aliases, and variant overrides never reach the operator's env file, so launching a non-default harness or a variant with overrides will use the wrong settings.
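The difference between a hard-coded snippet and profile-driven rendering can be sketched as below. `PROFILES`, `render_shell`, and the profile keys are illustrative assumptions and are not the PR's `HarnessRenderer` API; the `OPENAI_*` variable names are likewise assumed for the example.

```python
import shlex
from typing import Dict, Optional

# Hypothetical profiles: each maps logical settings to the variable
# names that a given harness expects.
PROFILES: Dict[str, Dict[str, str]] = {
    "anthropic": {"base_url": "ANTHROPIC_BASE_URL", "api_key": "ANTHROPIC_API_KEY"},
    "openai": {"base_url": "OPENAI_BASE_URL", "api_key": "OPENAI_API_KEY"},
}


def render_shell(
    profile_name: str,
    base_url: str,
    api_key: str,
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    names = PROFILES[profile_name]
    env = {names["base_url"]: base_url, names["api_key"]: api_key}
    # Variant overrides are applied last so they win over profile defaults.
    env.update(overrides or {})
    # Sorting the keys makes the output deterministic across runs.
    return "\n".join(
        f"export {key}={shlex.quote(value)}" for key, value in sorted(env.items())
    )


if __name__ == "__main__":
    print(render_shell("openai", "http://localhost:4000/v1", "sk-test", {"MODEL": "m1"}))
```

Only the selected profile's variables are emitted, and `shlex.quote` keeps values with spaces or shell metacharacters from breaking the `export` lines.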


Comment on lines +103 to +104
# Transition to pending
session.status = SessionStatus.PENDING


P1 Badge Provide an activation path before completion

create_session leaves every new record in PENDING, and no command in src/cli/session.py calls activate_session. Because SessionManager.finalize_session only allows the default completed transition from ACTIVE, the normal create → run harness → finalize workflow can never record a successful session.
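The dead end is easier to see as a transition table. The sketch below assumes only the three states named in this comment and a hypothetical `transition` helper; the PR's actual state set and validation code may differ.

```python
from enum import Enum


class SessionStatus(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    COMPLETED = "completed"


# Assumed transition table: completion is reachable only through ACTIVE.
ALLOWED_TRANSITIONS = {
    SessionStatus.PENDING: {SessionStatus.ACTIVE},
    SessionStatus.ACTIVE: {SessionStatus.COMPLETED},
    SessionStatus.COMPLETED: set(),
}


def transition(current: SessionStatus, target: SessionStatus) -> SessionStatus:
    """Return the target status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

With a table like this, a CLI that exposes create and finalize but no activation step can never get a session from PENDING to COMPLETED.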


- Consolidated config files to use plural naming (providers, harnesses, etc.)
- Added description and metadata fields to plural config models
- Updated test_renderer.py to include all required benchmark tags
- Removed duplicate singular config files (provider.py, harness.py, etc.)
- All 63 tests passing

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e0768cd84




# Import and register commands after groups are defined
from . import session as session_commands # noqa: E402, F401


P1 Badge Register the config command group on the CLI entry point

src/cli/main.py only defines/imports session; I also checked src/cli/ for any add_command(config) or from . import config wiring and found none. That makes every new bench config ... command introduced in src/cli/config.py unreachable from the installed bench script, so operators cannot run the documented config validation/list/show workflow at all.


Comment on lines +117 to +122
gitignore_path = Path(".gitignore")
if gitignore_path.exists():
    gitignore_content = gitignore_path.read_text()
    if output_dir not in gitignore_content:
        with open(gitignore_path, "a") as f:
            f.write(f"\n# StackPerf session outputs\n{output_dir}/\n.env.local\n")


P1 Badge Ensure rendered session secrets are written under an ignored path

This only appends ignore rules when .gitignore already exists, but the command always writes session-env.* with the raw API key immediately afterward. In repositories without a pre-existing .gitignore, a normal git add . will stage the generated credential file, which violates the repo’s “do not write secrets into tracked files” requirement.
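A possible shape for the fix is to create `.gitignore` when it is missing and to compare whole lines rather than substrings. `ensure_ignored` below is a hypothetical helper written for illustration, not code from the PR.

```python
from pathlib import Path
from typing import List


def ensure_ignored(repo_root: str, patterns: List[str]) -> List[str]:
    """Append any missing ignore patterns, creating .gitignore if needed.

    Returns the patterns that were added.
    """
    gitignore = Path(repo_root) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    # Compare whole lines: a substring test would treat "out" as
    # already covered by an unrelated entry such as "layout/".
    present = {line.strip() for line in text.splitlines()}
    missing = [p for p in patterns if p not in present]
    if not missing:
        return []
    # Avoid gluing the new block onto an unterminated last line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    block = prefix + "# StackPerf session outputs\n" + "\n".join(missing) + "\n"
    with gitignore.open("a") as handle:
        handle.write(block)
    return missing
```

Calling this before the env file is written means the ignore rule exists by the time the credential file lands on disk. It does not help if the path is already tracked by git, which would need a separate check.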


Comment on lines +134 to +135
session.status = SessionStatus.ACTIVE
session.updated_at = datetime.utcnow()


P2 Badge Start timing when the session becomes active

The architecture here creates sessions before the harness is launched, but activate_session() only flips the status and leaves started_at at the timestamp assigned during create_session(). Any delay between bench session create and the actual harness start will therefore inflate the session duration, and any later rollups or comparisons will include pre-launch idle time rather than benchmark runtime alone.
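A sketch of stamping the start time at activation is shown below. The `Session` dataclass and the `now` parameter are simplified assumptions rather than the PR's models. The example uses `datetime.now(timezone.utc)` instead of `datetime.utcnow()`, which returns a naive datetime and is deprecated as of Python 3.12.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Session:
    # Simplified stand-in: created_at is set at creation, started_at is
    # left empty until the session is activated.
    status: str = "pending"
    created_at: datetime = field(default_factory=_utc_now)
    started_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None


def activate_session(session: Session, now: Optional[datetime] = None) -> Session:
    moment = now or _utc_now()
    session.status = "active"
    # Timing starts here, so pre-launch idle time is excluded.
    session.started_at = moment
    session.updated_at = moment
    return session
```

Keeping `created_at` and `started_at` as separate fields preserves the pre-launch delay for auditing while leaving it out of the measured duration.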

