
---
title: Enterprise Workflow OpenEnv
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

🏢 Enterprise Workflow OpenEnv

A real-world OpenEnv-compatible environment for training AI agents on enterprise procurement workflows — built for the Meta × PyTorch × Hugging Face OpenEnv Hackathon hosted by Scaler School of Technology (SST).


🚀 Overview

Enterprise Workflow OpenEnv simulates a realistic enterprise procurement pipeline where AI agents must complete multi-step workflows — from parsing purchase requisitions to drafting purchase orders and flagging approvals.

It is designed to benchmark and train agents on structured decision-making in real-world business contexts.


🧩 Tasks

| Difficulty | Description | Steps | Score |
|---|---|---|---|
| 🟢 Easy | Parse purchase requisition → match correct inventory item | 1 | 0.92 |
| 🟡 Medium | Parse requisition → check inventory → draft purchase order | 3 | 0.92 |
| 🔴 Hard | Full pipeline: parse → inventory → supplier → PO → approval | 5 | 0.92 |

🔌 API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /reset | Reset environment — works with or without a body |
| POST | /step | Execute an agent action and get an observation |
| GET | /state/{task_id} | Get current state for a task |
| GET | /tasks | List all tasks with action schemas |
| GET | /grader | Run all graders — returns sigmoid scores in (0, 1) |
| GET | /baseline | Baseline scores for comparison |
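The same endpoints can be called from Python. A minimal, stdlib-only client sketch (the function names here are illustrative, not from the repo; the base URL is the hosted Space from Quick Start):

```python
import json
from urllib import request

BASE_URL = "https://ma4ku2-enterprise-workflow-env.hf.space"

def step_body(task_id: str, action_type: str, payload: dict) -> dict:
    """Build the /step request body shown in Quick Start."""
    return {"task_id": task_id, "action_type": action_type, "payload": payload}

def _post(path: str, body: dict) -> dict:
    """POST a JSON body to the environment and decode the JSON response."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

def reset(task_id: str) -> dict:
    # POST /reset — starts (or restarts) the episode for a task.
    return _post("/reset", {"task_id": task_id})

def step(task_id: str, action_type: str, payload: dict) -> dict:
    # POST /step — executes one agent action and returns the observation.
    return _post("/step", step_body(task_id, action_type, payload))
```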

πŸ—οΈ Architecture

Environment Design

  • Financial Guardrails — validates total_cost == quantity × unit_price on every PO draft
  • Prerequisite Enforcement — agents cannot skip steps (no PO before inventory check)
  • Graceful Error Handling — unknown actions return done=False, allowing agent recovery
  • Sigmoid Grading — smooth scoring strictly within (0, 1) using sigmoid normalization
  • Partial Reward Signals — rewards at every step, not just at episode end
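The two numeric rules above can be sketched as follows (the real logic lives in app/environment.py and app/grader.py; these helpers are illustrative, not the repo's implementation):

```python
import math

def validate_po(quantity: int, unit_price: float, total_cost: float) -> bool:
    # Financial guardrail: the drafted PO total must equal quantity x unit price.
    # isclose() avoids false rejections from floating-point rounding.
    return math.isclose(total_cost, quantity * unit_price)

def sigmoid_score(raw: float) -> float:
    # Sigmoid normalization maps any raw grade onto a score strictly inside (0, 1).
    return 1.0 / (1.0 + math.exp(-raw))
```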

Agent Design

  • Skill System — composable, ordered workflow action definitions per task
  • Episode Memory — remembers past workflow decisions across resets
  • Jittered Backoff Retry — decorrelated exponential retry on LLM failures
  • Trajectory Compressor — compresses long episode histories to fit token budgets
  • Observation-Driven Loop — LLM decides the next action based on the environment observation
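The jittered backoff retry (implemented in agent/retry_utils.py) can be sketched as a decorator using decorrelated jitter; the parameter names below are illustrative:

```python
import random
import time
from functools import wraps

def jittered_retry(max_attempts: int = 4, base: float = 0.5, cap: float = 8.0):
    """Retry with decorrelated jitter: each sleep is sampled between the base
    delay and three times the previous sleep (capped), so many concurrent
    agents do not hammer the LLM endpoint in lockstep."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            sleep = base
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the failure
                    sleep = min(cap, random.uniform(base, sleep * 3))
                    time.sleep(sleep)
        return wrapper
    return decorator
```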

Action Space

  • parse_requisition → payload: {req_id, item_id}
  • check_inventory → payload: {item_id}
  • message_supplier → payload: {item_id}
  • draft_po → payload: {item_id, quantity, total_cost, department}
  • flag_approval → payload: {approver}

Observation Space

{
  "task_id": "easy|medium|hard",
  "step": 0,
  "result": {},
  "reward": 0.0,
  "done": false,
  "info": ""
}
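Given observations of this shape, the observation-driven loop from the agent design can be approximated as below; `choose_action` stands in for the LLM policy in inference.py, and all names are illustrative:

```python
def run_episode(env_step, choose_action, task_id: str, max_steps: int = 8):
    """Drive one episode: feed each observation to the policy until the
    environment reports done=True (or a step budget is exhausted)."""
    obs = {"task_id": task_id, "step": 0, "result": {},
           "reward": 0.0, "done": False, "info": ""}
    rewards = []
    while not obs["done"] and obs["step"] < max_steps:
        action_type, payload = choose_action(obs)   # policy decides next action
        obs = env_step(task_id, action_type, payload)
        rewards.append(obs["reward"])               # partial reward at every step
    return rewards
```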

πŸ—‚οΈ Project Structure

├── inference.py            # Main inference script (observation-driven LLM agent)
├── openenv.yaml            # OpenEnv configuration
├── Dockerfile              # Container definition (python:3.11-slim, port 7860)
├── requirements.txt        # Dependencies
├── server/
│   └── main.py             # Uvicorn server launcher (app, host="0.0.0.0", port=8000)
├── app/
│   ├── main.py             # FastAPI routes and endpoints
│   ├── environment.py      # WorkflowEnvironment — state machine + reward logic
│   ├── models.py           # Pydantic typed models
│   ├── grader.py           # Sigmoid grading logic
│   └── mock_backend.py     # Mock enterprise backend (15 items, 13 requisitions)
├── agent/
│   ├── skills.py           # Composable skill system with episode memory
│   ├── retry_utils.py      # Jittered backoff retry
│   ├── trajectory.py       # Trajectory compressor
│   └── baseline.py         # Multi-episode baseline runner
├── tasks/
│   ├── easy.py             # Easy task definition
│   ├── medium.py           # Medium task definition
│   └── hard.py             # Hard task definition
└── tests/
    ├── debug_inference.py
    ├── final_validation.py # Full environment logic validation (5 tests)
    └── test_with_mock.py   # Mocked LLM tests

⚡ Quick Start

Reset Environment

curl -X POST https://ma4ku2-enterprise-workflow-env.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "hard"}'

Execute a Step

curl -X POST https://ma4ku2-enterprise-workflow-env.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy", "action_type": "parse_requisition", "payload": {"req_id": "REQ-001", "item_id": "ITM-001"}}'

Check Grader Scores

curl https://ma4ku2-enterprise-workflow-env.hf.space/grader

Run Inference Agent

export OPENAI_API_KEY="<your_openrouter_key>"
export API_BASE_URL="https://openrouter.ai/api/v1"
export MODEL_NAME="openai/gpt-oss-120b:free"
python3 inference.py

🖥️ Self-Host Locally

# Clone the repo
git clone https://huggingface.co/spaces/MA4KU2/enterprise-workflow-env
cd enterprise-workflow-env

# Install dependencies
pip install -r requirements.txt

# Set your API key
export OPENROUTER_API_KEY="your-key-here"
export API_BASE_URL="https://openrouter.ai/api/v1"
export MODEL_NAME="openai/gpt-oss-120b:free"

# Launch the server
PYTHONPATH=$(pwd) python server/main.py
# → Server running at http://0.0.0.0:8000

Or use the convenience launcher:

./run.sh

📋 Baseline Scores

[START] task=easy env=enterprise-workflow-env model=openai/gpt-oss-20b:free
[STEP] step=1 action=parse_requisition reward=0.99 done=true error=null
[END] success=true steps=1 score=0.990 rewards=0.99
[START] task=medium env=enterprise-workflow-env model=openai/gpt-oss-20b:free
[STEP] step=1 action=parse_requisition reward=0.33 done=false error=null
[STEP] step=2 action=check_inventory reward=0.33 done=false error=null
[STEP] step=3 action=draft_po reward=0.33 done=true error=null
[END] success=true steps=3 score=0.990 rewards=0.33,0.33,0.33
[START] task=hard env=enterprise-workflow-env model=openai/gpt-oss-20b:free
[STEP] step=1 action=parse_requisition reward=0.20 done=false error=null
[STEP] step=2 action=check_inventory reward=0.20 done=false error=null
[STEP] step=3 action=message_supplier reward=0.20 done=false error=null
[STEP] step=4 action=draft_po reward=0.20 done=false error=null
[STEP] step=5 action=flag_approval reward=0.19 done=true error=null
[END] success=true steps=5 score=0.990 rewards=0.20,0.20,0.20,0.20,0.19

💻 Built On Constrained Hardware

Built entirely on a Raspberry Pi 5 (8 GB RAM) running Kali Linux (aarch64/ARM64) — no cloud compute, no GPU.

All Docker builds, local testing, and Hugging Face deployments were executed directly on ARM64 hardware, proving the environment is lightweight and portable.


🔑 Zero-Cost Inference Stack

Uses OpenRouter as the API gateway with free-tier models — the entire development pipeline is completely free.

Compatible with any OpenAI-compatible endpoint via environment variables:

  • OPENAI_API_KEY — your API key
  • API_BASE_URL — swap to any provider
  • MODEL_NAME — swap to any model
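Reading those three swap-points could look like this (a sketch; the defaults mirror the values used in Quick Start and are assumptions, not mandated by the repo):

```python
import os

def client_config() -> dict:
    """Collect the OpenAI-compatible endpoint settings from the environment,
    falling back to the OpenRouter free-tier setup from Quick Start."""
    return {
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "base_url": os.environ.get("API_BASE_URL", "https://openrouter.ai/api/v1"),
        "model": os.environ.get("MODEL_NAME", "openai/gpt-oss-120b:free"),
    }
```

Because only these variables change, any OpenAI-compatible provider or model can be swapped in without touching the code.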

πŸ—ΊοΈ Production Roadmap

Enterprise Integrations

  • Temporal.io for durable long-running workflow execution
  • Merge.dev for ERP integration (SAP, Oracle, Workday)
  • Pinecone/Supabase for corporate procurement memory via RAG
  • LangSmith for LLM observability and audit trails
  • NeMo Guardrails for prompt injection protection
  • CLI interface to interact with the agent directly

Automation & Orchestration

  • n8n for no-code workflow automation layer
  • LangFlow for visual agent pipeline builder
  • MCP servers for dynamic tool discovery and capability extension

Agent Infrastructure

  • OpenClaw multi-agent orchestration for parallel procurement workflows
  • Redis for short-term agent memory and API call caching
  • Slack/Teams integration for Human-in-the-Loop approvals

πŸ› οΈ Built With

  • FastAPI — high-performance API framework
  • OpenEnv — environment standard for AI agent training
  • Docker — containerized deployment
  • Hugging Face Spaces — deployment platform
  • OpenRouter — zero-cost LLM inference gateway

πŸ† Hackathon

Submitted to the Meta PyTorch OpenEnv Hackathon, hosted by Scaler School of Technology (SST), Hugging Face, PyTorch, and Meta.


📄 License

MIT License

Made with ❤️ for the AI agent research community.
