30 commits
97ac736
spec file for regression test
Mar 18, 2025
2fefc4a
nlp initial commit beforre pull request
Apr 17, 2025
22d4162
adding master and data files to nlp branch
Apr 17, 2025
3c0f62f
Added LLM training from scratch, finetuning, RAG, and some agentic ca…
Aug 16, 2025
cf01c9b
Update smlp_subgroups.py
zurabksmlp Oct 24, 2025
3a11e21
Update smlp_utils.py
zurabksmlp Oct 24, 2025
81172c9
Added support for LLM generated result quality validation using LLM-a…
Jan 7, 2026
2c8d5a5
code changes to support LLM-as-Judge capability
Jan 7, 2026
44e50bc
added test for LLM-as-Judge flow
Jan 7, 2026
6b950e5
Merge branch 'nlp_text.rebased' of github.com:fbrausse/smlp into nlp_…
Jan 7, 2026
af3b3d8
small fixes to SMLP MCP
Jan 8, 2026
158d8ab
updated gitignore
Jan 8, 2026
da815d9
RL enhancemnt to SMLP Agent
Feb 11, 2026
d341ee5
Merge branch 'master' into nlp_text.rebased
Feb 11, 2026
4dd15fe
Add SMLP RL Agent core files
Feb 23, 2026
29d3d52
Complete SMLP RL Agent implementation
Feb 23, 2026
044878c
small fixes in SMLP RL Agent
Feb 25, 2026
d25e637
Remove obsolete RL Agent documentation and integration files
Feb 25, 2026
3c3676a
Add comprehensive test suite for SMLP RL Agent
Feb 26, 2026
dd5af58
Merge master into nlp_text.rebased after resolving conflicts
Mar 28, 2026
7f9292f
Restore stashed work and resolve merge conflicts
Mar 29, 2026
3bfc4fb
Merge master into nlp_text.rebased and resolve conflicts
Mar 29, 2026
f4b0f5f
re-add "extended" manual
fbrausse Apr 1, 2026
d39cbe6
merge from master
May 9, 2026
091af41
removed redundant assertion from smlp_subgroups.py
May 9, 2026
b750cd4
Record regression model JSONs
May 9, 2026
fdce7fa
Ignore local model artifacts and update Test228 regression masters
May 9, 2026
1093859
update master files
May 10, 2026
5fb8b71
update master json files
May 10, 2026
9e36b05
initial support for mlflow based model saving and loading in RAG mode
May 11, 2026
20 changes: 20 additions & 0 deletions .gitignore
@@ -0,0 +1,20 @@
*.d
*.ipynb*

/regr_smlp/code/Test*.*
/regr_smlp/code/test*.*
/regr_smlp/code/__pycache__/
/regr_smlp/code/all_log.txt
/regr_smlp/code/logs.log

# RL Agent runtime files
src/smlp_last_run.log
src/smlp_rl_feedback.jsonl
src/demo_feedback.jsonl
src/smlp_rl_checkpoints/
src/demo_checkpoints/

# Local-only model artifacts (do not commit)
regr_smlp/finetune_models/
regr_smlp/llm_models/
regr_smlp/rag_models/
Binary file added doc/smlp_manual_extended.pdf
Binary file not shown.
102 changes: 102 additions & 0 deletions pytest/README_pytest.md
@@ -0,0 +1,102 @@
# SMLP RL Agent Tests

This directory contains tests for the SMLP RL Agent implementation.

## Test Structure

### Unit Tests (`test_rl_agent.py`)
Tests core RL components without LLM dependencies. **Fully deterministic.**

**Components tested:**
- RewardModel: reward computation logic
- ExampleStore: example storage and retrieval
- PromptOptimizer: UCB selection algorithm
- Utility functions: noise filtering, error detection

**Run:**
```bash
pytest test_rl_agent.py -v
```
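
The unit tests above are described as fully deterministic, and UCB selection lends itself to that because it is a pure function of the counts and rewards seen so far. As an illustration only, here is a minimal sketch of textbook UCB1 with two pytest-style checks. The function name `ucb_select` and its arguments are hypothetical and are not taken from `PromptOptimizer`, whose actual interface may differ.

```python
import math


def ucb_select(counts, total_rewards, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score.

    Hypothetical helper for illustration; not the PromptOptimizer API.
    counts[i]        -- number of times arm i has been tried
    total_rewards[i] -- sum of rewards observed for arm i
    """
    # Try every arm once before applying the formula (avoids division by zero).
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    # Score = empirical mean + exploration bonus that shrinks as an arm is tried more.
    scores = [
        total_rewards[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def test_untried_arm_is_selected_first():
    assert ucb_select([3, 0, 5], [2.0, 0.0, 4.0]) == 1


def test_higher_mean_wins_when_counts_are_equal():
    # Equal counts give equal exploration bonuses, so the mean decides.
    assert ucb_select([10, 10], [2.0, 8.0]) == 1
```

Because no randomness or LLM call is involved, checks of this shape give the same result on every run.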

### Integration Tests (`test_rl_server.py`)
Tests API endpoints with mocked LLM responses. **Deterministic.**

**Endpoints tested:**
- `/generate`: Command generation
- `/feedback`: Feedback submission
- `/execute`: Command execution (dry-run)
- `/stats`: Training statistics
- `/store`: Example store management
- `/config`: Server configuration

**Run:**
```bash
pytest test_rl_server.py -v
```
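
Mocking the LLM call is what keeps these integration tests deterministic. The sketch below shows the general pattern with `unittest.mock.patch.object` from the standard library. The class `CommandGenerator`, its methods, and the returned command string are assumptions made up for this example; they do not describe the actual server code in `smlp_agent_rl_server.py`.

```python
from unittest.mock import patch


class CommandGenerator:
    """Hypothetical stand-in for the component behind /generate."""

    def call_llm(self, prompt):
        # A real implementation would contact a model server here.
        # Tests should never reach this line.
        raise RuntimeError("LLM backend is not available in tests")

    def generate(self, request_text):
        raw = self.call_llm(f"Translate to a command: {request_text}")
        return {"command": raw.strip()}


def test_generate_uses_mocked_llm():
    gen = CommandGenerator()
    # Replace the LLM call with a fixed response for the duration of the test.
    with patch.object(gen, "call_llm", return_value="  echo ok \n") as fake:
        result = gen.generate("say ok")
    assert result == {"command": "echo ok"}
    fake.assert_called_once()
```

With the model response fixed, the test exercises only the surrounding logic (here, whitespace stripping and response shaping), which is why it can run in CI.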

### End-to-End Tests (`test_e2e_manual.py`)
Full workflow tests with real LLM. **Non-deterministic - manual testing only.**

**Prerequisites:**
1. Start server: `python smlp_agent_rl_server.py --port 8000`
2. Pre-load model: `ollama run deepseek-r1:1.5b "test"`

**Run:**
```bash
python test_e2e_manual.py
```
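
Since the E2E script depends on a server that is already running, a quick connectivity check can save a confusing failure. The following is a minimal sketch using only the standard library; the helper name `server_is_reachable` is hypothetical, and the default port 8000 is taken from the prerequisite command above. It confirms only that something accepts TCP connections on the port, not that the model has been loaded.

```python
import socket


def server_is_reachable(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of the SMLP test suite.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

One possible use is to call `server_is_reachable()` at the top of a manual run and stop with a clear message if it returns `False`.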

**Note:** E2E tests may fail intermittently because real LLM output is non-deterministic. An occasional failure is expected and does not by itself indicate a regression.

## Running All Tests

```bash
# Run deterministic tests (suitable for CI/CD)
pytest test_rl_agent.py test_rl_server.py -v

# Run all tests including manual E2E
pytest test_rl_agent.py test_rl_server.py -v && python test_e2e_manual.py
```

## Test Coverage

| Component | Unit | Integration | E2E |
|-----------|------|-------------|-----|
| RewardModel | ✅ | ✅ | ✅ |
| ExampleStore | ✅ | ✅ | ✅ |
| PromptOptimizer | ✅ | ⚠️ | ✅ |
| Server API | ➖ | ✅ | ✅ |
| LLM Integration | ➖ | ⚠️ (mocked) | ✅ |
| Full Workflow | ➖ | ➖ | ✅ |

- ✅ = Fully covered
- ⚠️ = Partially covered
- ➖ = Not applicable

## CI/CD Integration

**Include in CI/CD:**
- `test_rl_agent.py` ✅
- `test_rl_server.py` ✅

**Exclude from CI/CD:**
- `test_e2e_manual.py` ❌ (non-deterministic, requires running server)

## Dependencies

```bash
pip install pytest pytest-cov
```

## Future Extensions

As SMLP grows, add test directories for other modules:
```
pytest/
├── test_rl_agent.py # RL Agent tests
├── test_rl_server.py # RL Server tests
├── test_e2e_manual.py # E2E tests
├── test_smlp_flows.py # (future) SMLP flows tests
├── test_smlp_agent.py # (future) Standard agent tests
└── README_pytest.md # This file
```