| title | Enterprise Workflow OpenEnv |
|---|---|
| emoji | 🟢 |
| colorFrom | blue |
| colorTo | indigo |
| sdk | docker |
| pinned | false |
A real-world OpenEnv-compatible environment for training AI agents on enterprise procurement workflows, built for the Meta × PyTorch × Hugging Face OpenEnv Hackathon hosted by Scaler School of Technology (SST).
Enterprise Workflow OpenEnv simulates a realistic enterprise procurement pipeline in which AI agents must complete multi-step workflows, from parsing purchase requisitions to drafting purchase orders and flagging approvals.
It is designed to benchmark and train agents on structured decision-making in real-world business contexts.
| Difficulty | Description | Steps | Score |
|---|---|---|---|
| 🟢 Easy | Parse purchase requisition → match correct inventory item | 1 | 0.92 |
| 🟡 Medium | Parse requisition → check inventory → draft purchase order | 3 | 0.92 |
| 🔴 Hard | Full pipeline: parse → inventory → supplier → PO → approval | 5 | 0.92 |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/reset` | Reset the environment (works with or without a body) |
| POST | `/step` | Execute an agent action and get an observation |
| GET | `/state/{task_id}` | Get the current state for a task |
| GET | `/tasks` | List all tasks with action schemas |
| GET | `/grader` | Run all graders; returns sigmoid scores in (0, 1) |
| GET | `/baseline` | Baseline scores for comparison |
- Financial Guardrails: validates `total_cost == quantity × unit_price` on every PO draft
- Prerequisite Enforcement: agents cannot skip steps (no PO before an inventory check)
- Graceful Error Handling: unknown actions return `done=False`, allowing the agent to recover
- Sigmoid Grading: smooth scoring strictly within (0, 1) via sigmoid normalization
- Partial Reward Signals: rewards at every step, not just at episode end
- Skill System: composable, ordered workflow action definitions per task
- Episode Memory: remembers past workflow decisions across resets
- Jittered Backoff Retry: decorrelated exponential retry on LLM failures
- Trajectory Compressor: compresses long episode histories to fit token budgets
- Observation-Driven Loop: the LLM decides the next action based on the environment observation
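The jittered backoff behavior can be sketched as a small decorator-free helper. This is a generic decorrelated-jitter sketch (the AWS-style formula: each sleep is drawn from `[base, 3 × previous sleep]`, capped), not a transcription of `agent/retry_utils.py`:

```python
import random
import time

def retry_with_decorrelated_jitter(fn, max_attempts=5, base=0.5, cap=30.0):
    """Call fn(), retrying on any exception with decorrelated-jitter sleeps."""
    sleep = base
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Decorrelated jitter: next sleep in [base, 3 * previous sleep], capped
            sleep = min(cap, random.uniform(base, sleep * 3))
            time.sleep(sleep)
```

Decorrelating the jitter spreads retries out over time, which avoids thundering-herd bursts against a rate-limited LLM gateway.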
- `parse_requisition` (payload: `{req_id, item_id}`)
- `check_inventory` (payload: `{item_id}`)
- `message_supplier` (payload: `{item_id}`)
- `draft_po` (payload: `{item_id, quantity, total_cost, department}`)
- `flag_approval` (payload: `{approver}`)
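A quick client-side payload check against these schemas can catch malformed actions before they reach `/step`. The `REQUIRED_KEYS` mapping below is transcribed from the list above; the helper itself is illustrative:

```python
# Required payload keys per action, transcribed from the action schemas.
REQUIRED_KEYS = {
    "parse_requisition": {"req_id", "item_id"},
    "check_inventory": {"item_id"},
    "message_supplier": {"item_id"},
    "draft_po": {"item_id", "quantity", "total_cost", "department"},
    "flag_approval": {"approver"},
}

def missing_keys(action_type: str, payload: dict) -> set:
    """Return the schema keys absent from a payload (empty set = valid)."""
    return REQUIRED_KEYS[action_type] - payload.keys()
```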
```json
{
  "task_id": "easy|medium|hard",
  "step": 0,
  "result": {},
  "reward": 0.0,
  "done": false,
  "info": ""
}
```

```
├── inference.py            # Main inference script (observation-driven LLM agent)
├── openenv.yaml            # OpenEnv configuration
├── Dockerfile              # Container definition (python:3.11-slim, port 7860)
├── requirements.txt        # Dependencies
├── server/
│   └── main.py             # Launches uvicorn (app, host="0.0.0.0", port=8000)
├── app/
│   ├── main.py             # FastAPI routes and endpoints
│   ├── environment.py      # WorkflowEnvironment: state machine + reward logic
│   ├── models.py           # Typed Pydantic models
│   ├── grader.py           # Sigmoid grading logic
│   └── mock_backend.py     # Mock enterprise backend (15 items, 13 requisitions)
├── agent/
│   ├── skills.py           # Composable skill system with episode memory
│   ├── retry_utils.py      # Jittered backoff retry
│   ├── trajectory.py       # Trajectory compressor
│   └── baseline.py         # Multi-episode baseline runner
├── tasks/
│   ├── easy.py             # Easy task definition
│   ├── medium.py           # Medium task definition
│   └── hard.py             # Hard task definition
└── tests/
    ├── debug_inference.py
    ├── final_validation.py # Full environment logic validation (5 tests)
    └── test_with_mock.py   # Mocked LLM tests
```
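The observation schema shown above is what drives the agent loop in `inference.py`. A minimal sketch of that loop, where `choose_action` (the LLM policy) and `send_step` (the HTTP `/step` call) are hypothetical stand-ins:

```python
def run_episode(task_id, choose_action, send_step, max_steps=10):
    """Loop until the environment reports done, feeding each observation back."""
    # Initial observation, matching the schema returned by /reset.
    obs = {"task_id": task_id, "step": 0, "result": {},
           "reward": 0.0, "done": False, "info": ""}
    rewards = []
    while not obs["done"] and obs["step"] < max_steps:
        action_type, payload = choose_action(obs)   # LLM decides from the observation
        obs = send_step(task_id, action_type, payload)
        rewards.append(obs["reward"])                # partial reward at every step
    return rewards
```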
```bash
curl -X POST https://ma4ku2-enterprise-workflow-env.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "hard"}'
```

```bash
curl -X POST https://ma4ku2-enterprise-workflow-env.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy", "action_type": "parse_requisition", "payload": {"req_id": "REQ-001", "item_id": "ITM-001"}}'
```

```bash
curl https://ma4ku2-enterprise-workflow-env.hf.space/grader
```

```bash
export OPENAI_API_KEY="<your_openrouter_key>"
export API_BASE_URL="https://openrouter.ai/api/v1"
export MODEL_NAME="openai/gpt-oss-120b:free"
python3 inference.py
```

```bash
# Clone the repo
git clone https://huggingface.co/spaces/MA4KU2/enterprise-workflow-env
cd enterprise-workflow-env

# Install dependencies
pip install -r requirements.txt

# Set your API key
export OPENROUTER_API_KEY="your-key-here"
export API_BASE_URL="https://openrouter.ai/api/v1"
export MODEL_NAME="openai/gpt-oss-120b:free"

# Launch the server
PYTHONPATH=$(pwd) python server/main.py
# → Server running at http://0.0.0.0:8000
```

Or use the convenience launcher:

```bash
./run.sh
```

```
[START] task=easy env=enterprise-workflow-env model=openai/gpt-oss-20b:free
[STEP] step=1 action=parse_requisition reward=0.99 done=true error=null
[END] success=true steps=1 score=0.990 rewards=0.99
```

```
[START] task=medium env=enterprise-workflow-env model=openai/gpt-oss-20b:free
[STEP] step=1 action=parse_requisition reward=0.33 done=false error=null
[STEP] step=2 action=check_inventory reward=0.33 done=false error=null
[STEP] step=3 action=draft_po reward=0.33 done=true error=null
[END] success=true steps=3 score=0.990 rewards=0.33,0.33,0.33
```

```
[START] task=hard env=enterprise-workflow-env model=openai/gpt-oss-20b:free
[STEP] step=1 action=parse_requisition reward=0.20 done=false error=null
[STEP] step=2 action=check_inventory reward=0.20 done=false error=null
[STEP] step=3 action=message_supplier reward=0.20 done=false error=null
[STEP] step=4 action=draft_po reward=0.20 done=false error=null
[STEP] step=5 action=flag_approval reward=0.19 done=true error=null
[END] success=true steps=5 score=0.990 rewards=0.20,0.20,0.20,0.20,0.19
```
Built entirely on a Raspberry Pi 5 (8GB RAM) running Kali Linux (aarch64/ARM64), with no cloud compute and no GPU.
All Docker builds, local testing, and Hugging Face deployments ran directly on ARM64 hardware, demonstrating that the environment is lightweight and portable.
Uses OpenRouter as the API gateway with free-tier models, so the entire development pipeline was completely free.
Compatible with any OpenAI-compatible endpoint via environment variables:
- `OPENAI_API_KEY`: your API key
- `API_BASE_URL`: swap to any provider
- `MODEL_NAME`: swap to any model
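For example, a stdlib-only chat call that honors all three variables (the function names here are illustrative, not part of the repo):

```python
import json
import os
import urllib.request

def build_chat_request(messages: list) -> dict:
    """Assemble a chat-completions body using the MODEL_NAME env var."""
    return {
        "model": os.environ.get("MODEL_NAME", "openai/gpt-oss-120b:free"),
        "messages": messages,
    }

def chat(messages: list) -> str:
    """POST to {API_BASE_URL}/chat/completions with the configured key."""
    base = os.environ.get("API_BASE_URL", "https://openrouter.ai/api/v1")
    req = urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(build_chat_request(messages)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swapping providers is then just a matter of changing `API_BASE_URL` and `MODEL_NAME`; no code changes are needed.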
- Temporal.io for durable long-running workflow execution
- Merge.dev for ERP integration (SAP, Oracle, Workday)
- Pinecone/Supabase for corporate procurement memory via RAG
- LangSmith for LLM observability and audit trails
- NeMo Guardrails for prompt injection protection
- CLI interface to interact with the agent directly
- n8n for no-code workflow automation layer
- LangFlow for visual agent pipeline builder
- MCP servers for dynamic tool discovery and capability extension
- OpenClaw multi-agent orchestration for parallel procurement workflows
- Redis for short-term agent memory and API call caching
- Slack/Teams integration for Human-in-the-Loop approvals
- FastAPI: high-performance API framework
- OpenEnv: environment standard for AI agent training
- Docker: containerized deployment
- Hugging Face Spaces: deployment platform
- OpenRouter: zero-cost LLM inference gateway
Submitted to the Meta PyTorch OpenEnv Hackathon, hosted by Scaler School of Technology (SST), Hugging Face, PyTorch, and Meta.
MIT License
Made with ❤️ for the AI agent research community.