This repository contains a full Solace Agent Mesh (SAM OSE) multi-agent purchase-order experiment with:
- 5 specialized agents (Orchestrator, Inventory, Supplier, Compliance, Finance)
- 5 custom Python tools under tools/
- 16 evaluation test cases under test_suites/test_cases/
- smoke + full evaluation suites
- analysis automation and chart generation
- blog-ready write-up at BLOG_POST.md

Repository layout:
- agents/ - 5 SAM agent configuration files
- tools/ - deterministic mock SAP/supplier/compliance/finance tool implementations
- test_suites/ - smoke/full suite definitions and 16 test case JSON files
- configs/ - shared SAM config and eval backend config
- scripts/run_eval.sh - run smoke/trace/full suites and post-analysis
- scripts/analyze_results.py - parse evaluation outputs and generate charts/analysis
- evaluation_results/ - copied result artifacts and generated charts
- ANALYSIS.md - generated analysis summary
- BLOG_POST.md - long-form write-up for publication

Prerequisites:
- Python 3.10+
- Internet access for model API calls
- OpenAI-compatible endpoint and API key
- Local Solace broker reachable at SOLACE_BROKER_URL for sam eval

cd /Users/raphaelcaillon/Documents/github/sam-evals-experiments
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install solace-agent-mesh
pip install "sam-rest-gateway @ git+https://github.com/SolaceLabs/solace-agent-mesh-core-plugins#subdirectory=sam-rest-gateway"

SAM source is vendored for reference at:
vendor/solace-agent-mesh

The project uses .env (ignored by git). Key defaults included:
- SOLACE_DEV_MODE=true
- REST_API_HOST=127.0.0.1
- REST_API_PORT=8080
- EXPERIMENT_ROOT=/Users/raphaelcaillon/Documents/github/sam-evals-experiments
- LLM_SERVICE_ENDPOINT
- LLM_SERVICE_API_KEY
- LLM_SERVICE_PLANNING_MODEL_NAME
- LLM_EVALUATOR_MODEL_NAME

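Before starting a run, it can help to confirm that the LLM settings from .env are actually exported. The following is a minimal sketch, not part of the repository: it assumes the four LLM_* variables listed above are the ones that must be non-empty (the README gives no default values for them), and the helper name missing_vars is hypothetical.

```python
import os

# Names taken from the .env list above. Treating them as mandatory is an
# assumption of this sketch, based on the README listing no defaults for them.
REQUIRED_VARS = [
    "LLM_SERVICE_ENDPOINT",
    "LLM_SERVICE_API_KEY",
    "LLM_SERVICE_PLANNING_MODEL_NAME",
    "LLM_EVALUATOR_MODEL_NAME",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing in environment:", ", ".join(missing))
    else:
        print("All LLM settings are present.")
```

Run it after loading .env into the shell (for example with `set -a && source .env && set +a`), since the script only sees exported variables.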
sam eval requires a broker subscriber connection and does not fully work with dev-mode-only in-memory transport. Start a local broker before running evaluations:
docker rm -f sam-local-broker >/dev/null 2>&1 || true
docker run -d --name sam-local-broker \
--shm-size=1g \
-p 8008:8008 \
-p 18080:8080 \
-e username_admin_globalaccesslevel=admin \
-e username_admin_password=admin \
-e system_scaling_maxconnectioncount=100 \
-e system_scaling_maxqueues=200 \
-e system_scaling_maxtopicendpoints=200 \
solace/solace-pubsub-standard:latest

Default .env broker settings should match:
- SOLACE_BROKER_URL=ws://localhost:8008
- SOLACE_BROKER_USERNAME=default
- SOLACE_BROKER_PASSWORD=default
- SOLACE_BROKER_VPN=default

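Since sam eval depends on the broker, a quick TCP check against SOLACE_BROKER_URL can save a failed run. This is a rough sketch using only the Python standard library. It verifies that the port accepts connections, not that the credentials or VPN are valid, and the function names are hypothetical.

```python
import os
import socket
from urllib.parse import urlparse


def broker_address(url):
    """Split a broker URL such as ws://localhost:8008 into (host, port)."""
    parsed = urlparse(url)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected scheme://host:port, got {url!r}")
    return parsed.hostname, parsed.port


def broker_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the broker port succeeds."""
    host, port = broker_address(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Falls back to the default from this README when the variable is unset.
    url = os.environ.get("SOLACE_BROKER_URL", "ws://localhost:8008")
    print(url, "reachable" if broker_reachable(url) else "not reachable")
```

A freshly started broker container may need some time before the port accepts connections, so a negative result right after docker run is not necessarily an error.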
Smoke only:
./scripts/run_eval.sh smoke

Trace-focused suite (new tc11-tc16 only):
./scripts/run_eval.sh trace

Full only:
./scripts/run_eval.sh full

Smoke then full (default):
./scripts/run_eval.sh all

Note: scripts/run_eval.sh automatically forces SOLACE_DEV_MODE=false for evaluation compatibility.

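The relationship between the four modes and the suite files named in this README can be sketched as a small mapping. This is an illustration only, not the contents of scripts/run_eval.sh, which also forces SOLACE_DEV_MODE=false and runs post-analysis. The names SUITES, eval_commands, and run are hypothetical.

```python
import subprocess

# Suite files as named in this README. "all" follows the documented default:
# smoke first, then full (the trace suite is not part of it).
SUITES = {
    "smoke": ["test_suites/po_eval_smoke.json"],
    "trace": ["test_suites/po_eval_trace_focus.json"],
    "full": ["test_suites/po_eval_full.json"],
}
SUITES["all"] = SUITES["smoke"] + SUITES["full"]


def eval_commands(mode="all"):
    """Return the sam eval command lines for a mode, in execution order."""
    if mode not in SUITES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SUITES)}")
    return [["sam", "eval", suite, "--verbose"] for suite in SUITES[mode]]


def run(mode="all"):
    """Run each suite in turn, stopping at the first failing command."""
    for cmd in eval_commands(mode):
        subprocess.run(cmd, check=True)
```

Calling `eval_commands("trace")` only builds the command list, which makes the mapping easy to inspect without a broker; `run("trace")` would actually invoke sam eval.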
The evaluation backend (configs/eval_backend.yaml, port 8080) is API-only and does not serve a frontend at /.
To launch the SAM Web UI gateway:
./scripts/run_webui.sh

Then open:
http://127.0.0.1:8000/
If needed, override host/port:
FASTAPI_HOST=127.0.0.1 FASTAPI_PORT=8000 ./scripts/run_webui.sh

To run the evaluation and analysis steps manually:
source .venv/bin/activate
set -a && source .env && set +a
export SOLACE_DEV_MODE=false
sam eval test_suites/po_eval_smoke.json --verbose
sam eval test_suites/po_eval_trace_focus.json --verbose
sam eval test_suites/po_eval_full.json --verbose
python scripts/analyze_results.py --smoke-dir results/po-eval-smoke --full-dir results/po-eval-full --output-dir evaluation_results --analysis-md ANALYSIS.md

SAM writes raw outputs to:
- results/po-eval-smoke/
- results/po-eval-trace-focus/
- results/po-eval-full/

Automation copies them into:
- evaluation_results/po-eval-smoke/
- evaluation_results/po-eval-trace-focus/
- evaluation_results/po-eval-full/

Analysis assets:
- evaluation_results/analysis.json
- evaluation_results/charts/pass_rates.(png|html)
- evaluation_results/charts/agreement_heatmap.(png|html)
- evaluation_results/charts/latency_by_test_case.(png|html)
- evaluation_results/charts/trace_signals.(png|html)
- ANALYSIS.md
- BLOG_POST.md
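
After a full run, a short script can report which of the analysis assets listed above are present. This is a sketch under one assumption: that `(png|html)` means each chart is written in both formats. The helper names are hypothetical and the script is not part of the repository.

```python
from pathlib import Path

# Chart base names from the asset list above.
CHART_NAMES = [
    "pass_rates",
    "agreement_heatmap",
    "latency_by_test_case",
    "trace_signals",
]


def expected_assets(root="."):
    """Build the list of asset paths this README says a run produces."""
    root = Path(root)
    out = root / "evaluation_results"
    paths = [out / "analysis.json"]
    for name in CHART_NAMES:
        # Assumption: both the PNG and the HTML variant are generated.
        for ext in ("png", "html"):
            paths.append(out / "charts" / f"{name}.{ext}")
    paths += [root / "ANALYSIS.md", root / "BLOG_POST.md"]
    return paths


def missing_assets(root="."):
    """Return the expected asset paths that do not exist under root."""
    return [path for path in expected_assets(root) if not path.exists()]


if __name__ == "__main__":
    for path in missing_assets():
        print("missing:", path)
```

Run it from the repository root; no output means every listed asset was found.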