Commit c38d717

fdidonato authored and frdidonato committed
Introduce mypy check and modify the GitHub CI workflow to run it too
1 parent 499633b commit c38d717

61 files changed

Lines changed: 695 additions & 393 deletions

.github/workflows/ci.yml

Lines changed: 20 additions & 3 deletions
@@ -18,12 +18,29 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
 
-      # One shell session so `source venv/bin/activate` applies to pip, black, pytest.
-      - name: Install deps and run checks
+      # One shell session so `source venv/bin/activate` applies to all tools.
+      - name: Install dependencies
         run: |
           python -m venv venv
           source venv/bin/activate
           python -m pip install --upgrade pip
           pip install -e ".[dev,ui]"
+
+      - name: Lint & Format
+        run: |
+          source venv/bin/activate
+          ruff check .
           black --check .
-          pytest --maxfail=1 --disable-warnings -q
+
+      - name: Type Check
+        run: |
+          source venv/bin/activate
+          mypy moralstack --ignore-missing-imports
+
+      - name: Tests with Coverage
+        run: |
+          source venv/bin/activate
+          pytest --cov=moralstack --cov-report=xml --cov-report=term --maxfail=3
+
+      - name: Upload Coverage
+        uses: codecov/codecov-action@v4
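
The lint, type-check, and test steps above can be reproduced locally with a small driver script. This is a minimal sketch and not part of the commit: the names `CHECKS`, `_run`, and `run_checks` are hypothetical, and it assumes the tools are already installed in the active environment (for example via `pip install -e ".[dev,ui]"`). Only the command lines themselves are taken from the workflow.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Commands copied from the workflow steps; the list name is hypothetical.
CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["black", "--check", "."],
    ["mypy", "moralstack", "--ignore-missing-imports"],
    ["pytest", "--cov=moralstack", "--cov-report=xml", "--cov-report=term", "--maxfail=3"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command in the current environment and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run checks in order and stop at the first failure, as sequential CI steps do."""
    for cmd in checks:
        code = runner(cmd)
        if code != 0:
            print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
            return code
    return 0
```

Calling `sys.exit(run_checks())` from a script gives the same pass/fail signal as the workflow. The `runner` parameter exists only so the loop can be exercised without invoking the real tools.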

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ env/
 
 # Pytest
 .pytest_cache/
+.coverage
+coverage.xml
+htmlcov/
 
 # mypy
 .mypy_cache/

.pre-commit-config.yaml

Lines changed: 8 additions & 0 deletions
@@ -16,3 +16,11 @@ repos:
     hooks:
       - id: black
         language_version: python3.11
+
+  - repo: local
+    hooks:
+      - id: mypy
+        name: mypy
+        entry: mypy moralstack --ignore-missing-imports
+        language: system
+        pass_filenames: false

INSTALL.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ points.
 pip install -e ".[dev,ui]"
 ```
 
-**Development only (pytest, ruff):**
+**Development only (pytest, pytest-cov, ruff, black, mypy):**
 
 ```bash
 pip install -e .[dev]

README.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@
 ![Status](https://img.shields.io/badge/status-research--stage-yellow)
 ![Compliance](https://img.shields.io/badge/benchmark-98.8%25%20compliance-brightgreen)
 ![Model](https://img.shields.io/badge/model-GPT--4o-412991)
+[![CI](https://github.com/fdidonato/moralstack/actions/workflows/ci.yml/badge.svg)](https://github.com/fdidonato/moralstack/actions/workflows/ci.yml)
+[![codecov](https://codecov.io/gh/fdidonato/moralstack/graph/badge.svg)](https://codecov.io/gh/fdidonato/moralstack)
 
 MoralStack is a governance layer that decides **whether**, **how**, and **under what constraints** a response should be
 generated before text generation starts.

docs/DEVELOPMENT.md

Lines changed: 6 additions & 3 deletions
@@ -11,13 +11,14 @@ pip install -e .[dev]
 ## Tools
 
 - **pytest** — Run tests: `pytest`
+- **pytest-cov** — Coverage reports: `pytest --cov=moralstack --cov-report=xml --cov-report=term`
 - **ruff** — Linting and formatting: `ruff check .` / `ruff format .`
 - **black** — Format check: `black --check .` (or `black .` to reformat)
 - **mypy** — Type checking: `mypy moralstack`
 
 ## Pre-commit Hooks
 
-Pre-commit hooks run cheap checks (format, lint, whitespace) automatically before every commit.
+Pre-commit hooks run automated checks (format, lint, whitespace, type checks) before every commit.
 
 **Setup (one-time):**
 
@@ -38,11 +39,13 @@ pre-commit run --all-files
 git commit --no-verify
 ```
 
-Active hooks: `trailing-whitespace`, `end-of-file-fixer`, `ruff check --fix`, `black`.
+Active hooks: `trailing-whitespace`, `end-of-file-fixer`, `ruff check --fix`, `black`, `mypy moralstack`.
 
 ## CI
 
-The workflow in `.github/workflows/ci.yml` runs tests on Python 3.11 and 3.12 with `pip install -e .[dev]` and `pytest`.
+The workflow in `.github/workflows/ci.yml` runs on Python 3.11 and 3.12 with `pip install -e .[dev,ui]`, then executes:
+`ruff check .`, `black --check .`, `mypy moralstack --ignore-missing-imports`, and `pytest --cov=moralstack
+--cov-report=xml --cov-report=term --maxfail=3`.
 
 ## Generated Artifacts
 

docs/architecture_spec.md

Lines changed: 1 addition & 0 deletions
@@ -783,6 +783,7 @@ class ResponseMetadata:
     intent_clarity: str = ""
     misuse_plausibility: str = ""
     actionability_risk: str = ""
+    decision_correctness: dict[str, Any] | None = None  # optional DCF payload (diagnostics.attach_decision_correctness)
 ```
 
 **Construction**: ResponseMetadata must be built via factory methods so all paths produce consistent metadata. Do not construct `ResponseMetadata` manually for request flows. Use:

docs/modules/orchestrator.md

Lines changed: 1 addition & 0 deletions
@@ -402,6 +402,7 @@ class ResponseMetadata:
     intent_clarity: str  # LOW | MEDIUM | HIGH (semantic signals)
     misuse_plausibility: str  # LOW | MEDIUM | HIGH
     actionability_risk: str  # LOW | MEDIUM | HIGH
+    decision_correctness: dict[str, Any] | None  # optional DCF payload from diagnostics
 ```
 
 **Construction**: Always build metadata via factory methods for consistency across paths (fast, deliberative, safe_complete, domain_excluded, system error). Use `ResponseMetadata.from_decision(...)` for flows that have a `Decision` (and optional `DecisionExplanation`); use `ResponseMetadata.for_system_error(...)`, `for_domain_excluded(...)`, or `for_fail_safe(...)` for timeout, domain-excluded, and FAIL_SAFE fallback respectively. See `docs/architecture_spec.md` (ResponseMetadata Construction) for the full list.
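
Because the new `decision_correctness` field defaults to `None`, every consumer has to handle the missing-payload case before indexing into it. The following is a rough sketch of that pattern under stated assumptions: `ResponseMetadataSketch`, `attach_payload`, and `describe` are hypothetical names, the class carries only the fields visible in this hunk, and the real `diagnostics.attach_decision_correctness` may behave differently. `Optional[dict[str, Any]]` is used here as the equivalent of `dict[str, Any] | None`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResponseMetadataSketch:
    """Hypothetical stand-in holding only the fields shown in the diff."""

    intent_clarity: str = ""
    misuse_plausibility: str = ""
    actionability_risk: str = ""
    # Optional DCF payload; None means diagnostics attached nothing.
    decision_correctness: Optional[dict[str, Any]] = None


def attach_payload(meta: ResponseMetadataSketch, payload: dict[str, Any]) -> None:
    """Hypothetical helper: store a shallow copy so later changes to the caller's dict do not leak in."""
    meta.decision_correctness = dict(payload)


def describe(meta: ResponseMetadataSketch) -> str:
    """Check for None first; mypy rejects unguarded access to an Optional value."""
    if meta.decision_correctness is None:
        return "no DCF payload"
    return f"DCF keys: {sorted(meta.decision_correctness)}"
```

The explicit `is None` guard is what lets mypy narrow the type to `dict[str, Any]` in the final `return`.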

moralstack/cli/mocks.py

Lines changed: 34 additions & 11 deletions
@@ -2,7 +2,7 @@
 Mock modules for MoralStack CLI testing without API.
 """
 
-from typing import Any
+from typing import Any, Literal
 
 
 class MockPolicy:
@@ -66,32 +66,55 @@ def estimate(self, prompt: str) -> Any:
 class MockCritic:
     """Constitutional critic mock."""
 
-    def critique(self, request: Any, response: str, constitution: Any = None, **kwargs) -> Any:
+    def critique(self, request: Any, response: str, constitution: Any = None, **kwargs: Any) -> Any:
         """Mock critique."""
         from dataclasses import dataclass, field
 
         @dataclass
         class MockCritique:
-            violations: list = field(default_factory=list)
+            violations: list[Any] = field(default_factory=list)
             severity_score: float = 0.0
             has_critical_violations: bool = False
             revision_guidance: str = ""
 
         return MockCritique()
 
+    def critique_with_relevant_principles(
+        self,
+        request: str,
+        response: str,
+        domain: str | None = None,
+        request_id: str = "",
+        delib_context: Any = None,
+        context_mode: Literal["full", "thin"] = "full",
+        previous_violations: str = "",
+        previous_guidance: str = "",
+    ) -> Any:
+        """Mock path aligned with LLMConstitutionalCritic (delegates to critique)."""
+        return self.critique(
+            request,
+            response,
+            None,
+            request_id=request_id,
+            delib_context=delib_context,
+            context_mode=context_mode,
+            previous_violations=previous_violations,
+            previous_guidance=previous_guidance,
+        )
+
 
 class MockSimulator:
     """Consequence simulator mock."""
 
-    def simulate(self, request: Any, response: str, num_scenarios: int = 3, **kwargs) -> list:
+    def simulate(self, request: Any, response: str, num_scenarios: int = 3, **kwargs: Any) -> list[Any]:
         """Simulates mock consequences."""
         return []
 
 
 class MockHindsight:
     """Hindsight evaluator mock."""
 
-    def evaluate(self, request: str, response: str, consequences: list, **kwargs) -> Any:
+    def evaluate(self, request: str, response: str, consequences: list[Any], **kwargs: Any) -> Any:
         """Mock hindsight evaluation."""
         from dataclasses import dataclass, field
 
@@ -111,7 +134,7 @@ class MockAggregatedHindsight:
 
         @dataclass
         class MockHindsightResult:
-            evaluations: list = field(default_factory=list)
+            evaluations: list[Any] = field(default_factory=list)
             aggregated: MockAggregatedHindsight = field(default_factory=MockAggregatedHindsight)
 
         return MockHindsightResult()
@@ -120,19 +143,19 @@ class MockHindsightResult:
 class MockPerspectives:
     """Perspective ensemble mock."""
 
-    def evaluate(self, request: Any, response: str, **kwargs) -> Any:
+    def evaluate(self, request: Any, response: str, **kwargs: Any) -> Any:
         """Mock perspectives evaluation."""
         from dataclasses import dataclass, field
 
         @dataclass
         class MockPerspectiveAggregation:
             overall_score: float = 0.8
-            concerns: list = field(default_factory=list)
+            concerns: list[Any] = field(default_factory=list)
             consensus_level: float = 0.9
 
         @dataclass
         class MockPerspectiveResult:
-            results: list = field(default_factory=list)
+            results: list[Any] = field(default_factory=list)
             aggregation: MockPerspectiveAggregation = field(default_factory=MockPerspectiveAggregation)
 
         return MockPerspectiveResult()
@@ -147,10 +170,10 @@ def get_constitution(self, domain: str | None = None) -> Any:
 
         @dataclass
         class MockConstitution:
-            principles: list = field(default_factory=list)
+            principles: list[Any] = field(default_factory=list)
 
         return MockConstitution()
 
-    def get_relevant_principles(self, query: str, top_k: int = 10, domain: str | None = None) -> list:
+    def get_relevant_principles(self, query: str, top_k: int = 10, domain: str | None = None) -> list[Any]:
         """Returns empty list (no principles needed for mock)."""
         return []
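
The delegation added to `MockCritic` can be illustrated in isolation. The sketch below is not the project's code: `CriticSketch` and `CritiqueSketch` are hypothetical stand-ins with a reduced parameter list, and the `last_kwargs` attribute exists only to make the forwarding observable. It shows the two typing points this hunk relies on: `**kwargs: Any` annotates the value type of each extra keyword argument, and a `Literal` parameter restricts the accepted strings at type-check time.

```python
from dataclasses import dataclass, field
from typing import Any, Literal


@dataclass
class CritiqueSketch:
    """Hypothetical result object with empty defaults, like the mock in the diff."""

    violations: list[Any] = field(default_factory=list)
    severity_score: float = 0.0


class CriticSketch:
    """Hypothetical stand-in for MockCritic that records the forwarded keywords."""

    def __init__(self) -> None:
        self.last_kwargs: dict[str, Any] = {}

    def critique(
        self, request: Any, response: str, constitution: Any = None, **kwargs: Any
    ) -> CritiqueSketch:
        # Inside the function, kwargs is a dict[str, Any].
        self.last_kwargs = kwargs
        return CritiqueSketch()

    def critique_with_relevant_principles(
        self,
        request: str,
        response: str,
        request_id: str = "",
        context_mode: Literal["full", "thin"] = "full",
    ) -> CritiqueSketch:
        # Delegate with no constitution; extras travel as keyword arguments.
        return self.critique(
            request, response, None, request_id=request_id, context_mode=context_mode
        )
```

With this shape, a call such as `CriticSketch().critique_with_relevant_principles("q", "a", context_mode="thin")` type-checks, while `context_mode="partial"` would be flagged by mypy even though it would still run.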

moralstack/cli/models.py

Lines changed: 29 additions & 29 deletions
@@ -77,9 +77,9 @@ class PhaseResult:
     output_summary: str
     decision: Optional[str] = None
     decision_reason: Optional[str] = None
-    details: dict = field(default_factory=dict)
-    errors: list = field(default_factory=list)
-    warnings: list = field(default_factory=list)
+    details: dict[str, Any] = field(default_factory=dict)
+    errors: list[str] = field(default_factory=list)
+    warnings: list[str] = field(default_factory=list)
 
 
 @dataclass
@@ -99,18 +99,18 @@ class TraceParseResult:
     phase_type: PhaseType
     decision: Optional[str] = None
     decision_reason: Optional[str] = None
-    details: dict = field(default_factory=dict)
-    errors: list = field(default_factory=list)
-    warnings: list = field(default_factory=list)
+    details: dict[str, Any] = field(default_factory=dict)
+    errors: list[str] = field(default_factory=list)
+    warnings: list[str] = field(default_factory=list)
     risk_score: Optional[float] = None
     risk_category: Optional[str] = None
-    draft_revisions: list = field(default_factory=list)
+    draft_revisions: list[DraftRevision] = field(default_factory=list)
 
 
-def _parse_risk_trace(call: dict) -> TraceParseResult:
+def _parse_risk_trace(call: dict[str, Any]) -> TraceParseResult:
     """Parse risk_estimator call into trace phase data."""
     response = call.get("full_response", call.get("response", ""))
-    details: dict = {}
+    details: dict[str, Any] = {}
     risk_score: Optional[float] = None
     risk_category: Optional[str] = None
 
@@ -155,7 +155,7 @@ def _parse_risk_trace(call: dict) -> TraceParseResult:
     )
 
 
-def _parse_policy_trace(call: dict, current_cycle: int) -> Optional[TraceParseResult]:
+def _parse_policy_trace(call: dict[str, Any], current_cycle: int) -> Optional[TraceParseResult]:
     """Parse policy call into trace phase data."""
     action = call.get("action", "")
     prompt = call.get("full_prompt", call.get("prompt", ""))
@@ -213,12 +213,12 @@ def _parse_policy_trace(call: dict, current_cycle: int) -> Optional[TraceParseRe
     return None
 
 
-def _parse_critic_trace(call: dict) -> TraceParseResult:
+def _parse_critic_trace(call: dict[str, Any]) -> TraceParseResult:
     """Parse critic call into trace phase data."""
     action = call.get("action", "")
     response = call.get("full_response", call.get("response", ""))
-    details: dict = {}
-    errors: list = []
+    details: dict[str, Any] = {}
+    errors: list[str] = []
     decision = None
     decision_reason = None
 
@@ -275,10 +275,10 @@ def _parse_critic_trace(call: dict) -> TraceParseResult:
     )
 
 
-def _parse_simulator_trace(call: dict) -> TraceParseResult:
+def _parse_simulator_trace(call: dict[str, Any]) -> TraceParseResult:
     """Parse simulator call into trace phase data."""
     response = call.get("full_response", call.get("response", ""))
-    details: dict = {}
+    details: dict[str, Any] = {}
     decision = None
     decision_reason = None
 
@@ -342,10 +342,10 @@ def _parse_simulator_trace(call: dict) -> TraceParseResult:
     )
 
 
-def _parse_hindsight_trace(call: dict) -> TraceParseResult:
+def _parse_hindsight_trace(call: dict[str, Any]) -> TraceParseResult:
     """Parse hindsight call into trace phase data."""
     response = call.get("full_response", call.get("response", ""))
-    details: dict = {}
+    details: dict[str, Any] = {}
     decision = None
     decision_reason = None
 
@@ -381,10 +381,10 @@ def _parse_hindsight_trace(call: dict) -> TraceParseResult:
     )
 
 
-def _parse_perspectives_trace(call: dict) -> TraceParseResult:
+def _parse_perspectives_trace(call: dict[str, Any]) -> TraceParseResult:
     """Parse perspectives call into trace phase data."""
     response = call.get("full_response", call.get("response", ""))
-    details: dict = {}
+    details: dict[str, Any] = {}
     decision = None
     decision_reason = None
 
@@ -436,13 +436,13 @@ class DeliberationTrace:
     # Risk estimation
     risk_score: float = 0.0
     risk_category: str = ""
-    risk_signals: list = field(default_factory=list)
+    risk_signals: list[str] = field(default_factory=list)
 
     # Phases
-    phases: list = field(default_factory=list)
+    phases: list[PhaseResult] = field(default_factory=list)
 
     # Draft revision history
-    draft_history: list = field(default_factory=list)  # List[DraftRevision]
+    draft_history: list[DraftRevision] = field(default_factory=list)
 
     # Final outcome
     response_type: str = ""
@@ -451,12 +451,12 @@ class DeliberationTrace:
     converged: bool = False
 
     # Errors and warnings
-    errors: list = field(default_factory=list)
-    warnings: list = field(default_factory=list)
+    errors: list[str] = field(default_factory=list)
+    warnings: list[str] = field(default_factory=list)
 
     # Constitution
-    relevant_principles: list = field(default_factory=list)
-    triggered_principles: list = field(default_factory=list)
+    relevant_principles: list[str] = field(default_factory=list)
+    triggered_principles: list[str] = field(default_factory=list)
 
     def add_phase(self, phase: PhaseResult) -> None:
         """Adds a phase to the trace."""
@@ -466,11 +466,11 @@ def total_duration_ms(self) -> float:
         """Total duration in milliseconds."""
         if self.end_time > 0 and self.start_time > 0:
             return (self.end_time - self.start_time) * 1000
-        return sum(p.duration_ms for p in self.phases)
+        return float(sum(p.duration_ms for p in self.phases))
 
-    def get_phases_by_cycle(self) -> dict:
+    def get_phases_by_cycle(self) -> dict[int, list[PhaseResult]]:
         """Groups phases by cycle."""
-        by_cycle: dict[int, list[Any]] = {}
+        by_cycle: dict[int, list[PhaseResult]] = {}
         for phase in self.phases:
             if phase.cycle not in by_cycle:
                 by_cycle[phase.cycle] = []
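
The last hunk makes two typing fixes worth spelling out: `sum()` over an empty iterable returns the integer `0`, so wrapping it in `float()` keeps the declared `-> float` accurate, and the grouping method now returns a precise `dict[int, list[PhaseResult]]` instead of a bare `dict`. The sketch below reproduces both with hypothetical stand-ins (`PhaseSketch`, `TraceSketch`); it omits the `start_time`/`end_time` branch and uses `setdefault` for brevity, which is a substitution for the `if phase.cycle not in by_cycle` check in the original.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseSketch:
    """Hypothetical stand-in for PhaseResult with only the fields used here."""

    name: str
    cycle: int
    duration_ms: float


@dataclass
class TraceSketch:
    """Hypothetical stand-in for DeliberationTrace."""

    phases: list[PhaseSketch] = field(default_factory=list)

    def total_duration_ms(self) -> float:
        # With no phases, sum() yields int 0; float() normalizes the result.
        return float(sum(p.duration_ms for p in self.phases))

    def get_phases_by_cycle(self) -> dict[int, list[PhaseSketch]]:
        """Group phases by cycle number, preserving insertion order within a cycle."""
        by_cycle: dict[int, list[PhaseSketch]] = {}
        for phase in self.phases:
            by_cycle.setdefault(phase.cycle, []).append(phase)
        return by_cycle
```

With the precise return type, callers that iterate over `get_phases_by_cycle().items()` get `PhaseSketch` attribute checking from mypy without casts.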
