From 390e1158dc7e08f42d35d5ee95837edba6f52270 Mon Sep 17 00:00:00 2001
From: Alex Moskowitz
Date: Thu, 14 May 2026 12:12:24 -0400
Subject: [PATCH 1/3] fix(ontology): move :AnnexIII_Condition_1a and
 :AnnexIII_Condition_5b to governance extension

Deduplicates universal regulatory content out of two per-fixture
instance files into ARCO_governance_extension.ttl. Closes
regulatory_alignment FAIL and traceability FAIL on the Adversarial Decoy
and Blanknode Ghost fixtures by making the regulatory condition
declarations visible to every fixture that imports the governance
extension.

Why

Both audit queries (check_regulatory_alignment.sparql,
check_assessment_traceability.sparql) require the condition to be typed
:RegulatoryContent in the merged graph. The conditions were declared
only in ARCO_instances_sentinel.ttl (1(a)) and
ARCO_instances_creditscoring.ttl (5(b)). Fixtures that don't import
either file (Decoy, Ghost) referenced the conditions via iao:0000136 but
the type assertion wasn't present, so both audit queries returned FAIL
for fixture-distribution reasons, not fixture-semantics reasons.

What changes

- ARCO_governance_extension.ttl: new section "3a) REGULATORY CONTENT"
  adds :AnnexIII_List, :AnnexIII_Condition_1a, :AnnexIII_Condition_5b
  with the same triples previously in the two instance files. Triples
  preserved verbatim (rdfs:label, rdfs:comment, cco:prescribes,
  iao:0000136 targets).
- ARCO_instances_sentinel.ttl: removes 10 lines (the declaration block);
  section header replaced with a brief migration comment. The three
  references via iao:0000136 :AnnexIII_Condition_1a are preserved.
- ARCO_instances_creditscoring.ttl: removes 13 lines (the declaration
  block + the self-containedness comment); section header replaced with
  a brief migration comment. The three references via iao:0000136
  :AnnexIII_Condition_5b are preserved.

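The failure mode described under "Why" can be sketched with a toy triple set. This is only an illustration, not the pipeline's code: the real checks are SPARQL files whose text is not shown in this patch, so the `regulatory_alignment` function below is an assumed simplification, and `:Decoy_AssessmentDoc` is a hypothetical IRI.

```python
# Toy model: a graph is a set of (subject, predicate, object) triples.
# The check below is an assumed stand-in for the real SPARQL audit query.

RDF_TYPE = "rdf:type"
IS_ABOUT = "iao:0000136"

# Universal regulatory content, as declared in the governance extension
# after this change.
GOVERNANCE_DECLARATIONS = {
    (":AnnexIII_Condition_1a", RDF_TYPE, ":RegulatoryContent"),
    (":AnnexIII_Condition_5b", RDF_TYPE, ":RegulatoryContent"),
}

# A fixture that only references the condition. The document IRI is
# hypothetical; the real fixture may name it differently.
DECOY_FIXTURE = {
    (":Decoy_AssessmentDoc", RDF_TYPE, ":AssessmentDocumentation"),
    (":Decoy_AssessmentDoc", IS_ABOUT, ":AnnexIII_Condition_1a"),
}


def regulatory_alignment(graph):
    """PASS if an assessment doc is_about a condition typed :RegulatoryContent."""
    for doc, pred, cond in graph:
        if pred != IS_ABOUT:
            continue
        if (doc, RDF_TYPE, ":AssessmentDocumentation") not in graph:
            continue
        if (cond, RDF_TYPE, ":RegulatoryContent") in graph:
            return "PASS"
    return "FAIL"


# Before: the fixture is loaded without the files that typed the condition.
before = regulatory_alignment(DECOY_FIXTURE)
# After: every fixture is merged with the governance extension.
after = regulatory_alignment(DECOY_FIXTURE | GOVERNANCE_DECLARATIONS)
print(before, "->", after)
```

The reference via iao:0000136 is identical in both runs; only the visibility of the type assertion changes, which is why the FAIL was a distribution effect rather than a semantic one.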
Tests (all 7 fixtures, pre vs post pipeline diff)

- Sentinel, CreditScorer, VerificationKiosk: identical except
  entailed-triples count (+34 to +89 from the additional universal
  regulatory content now visible to every fixture).
- DecoySystem_001: regulatory_alignment FAIL -> PASS; traceability
  FAIL -> PASS (closes the documented goal).
- GhostSystem_001: regulatory_alignment FAIL -> PASS; traceability
  FAIL -> PASS; all_checks_passed false -> true.
- FlagTest_BiometricSystem_WithDerogationClaim and
  FlagTest_CreditSystem_WithFraudProcess: no audit-row flip. Their
  :AssessmentDocumentation instances do not link to any regulatory
  condition via iao:0000136 in the source TTL, so the audit query's
  AssessmentDoc -> condition path is empty independent of where the
  condition is declared. The plan predicted these fixtures would flip;
  the actual cause of their FAILs is a separate fixture-authoring gap in
  ARCO_instances_flag_tests.ttl lines 90-92, 155-157. Closing that is a
  separate fixture edit outside this PR's scope.
- Regression: test_gate_removal.py PASS; test_scenarios.py PASS (all 7
  scenarios); test_kiosk_html_no_false_concretization.py PASS;
  test_output_provenance.py 1 failure (unchanged baseline).
- HermiT vs OWL-RL cross-check: agree on every (fixture, system, query)
  tuple in the certificate-grade set.
- SHACL conforms PASS on every fixture (unchanged).
- No classification flip on any fixture; no SHACL change; no other
  audit-row change.

Downstream consumer audit

Grep across 03_TECHNICAL_CORE/, docs/, mcp/, .github/ for every reader
of :AnnexIII_Condition_1a / _5b / _List / :RegulatoryContent. All
reference sites either load ARCO_governance_extension.ttl (via every
pipeline / test / cross-check loader) or are documentation mentions of
the IRI itself. No consumer depends on the conditions being declared in
a specific instance file.

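The pre vs post comparison in the Tests section amounts to a row-by-row diff of audit outcomes. A minimal sketch follows, assuming the outcomes are held in nested dicts (that layout and the `audit_flips` helper are illustrative, not the pipeline's actual format). Only rows reported in this message are included.

```python
# Audit rows per fixture before and after the change. The dict layout is
# an assumption; the values are the ones reported in the Tests section.
PRE = {
    "DecoySystem_001": {"regulatory_alignment": "FAIL", "traceability": "FAIL"},
    "GhostSystem_001": {"regulatory_alignment": "FAIL", "traceability": "FAIL"},
    "FlagTest_BiometricSystem_WithDerogationClaim": {
        "regulatory_alignment": "FAIL",
        "traceability": "FAIL",
    },
}
POST = {
    "DecoySystem_001": {"regulatory_alignment": "PASS", "traceability": "PASS"},
    "GhostSystem_001": {"regulatory_alignment": "PASS", "traceability": "PASS"},
    "FlagTest_BiometricSystem_WithDerogationClaim": {
        "regulatory_alignment": "FAIL",
        "traceability": "FAIL",
    },
}


def audit_flips(pre, post):
    """Return (fixture, check, old, new) for every row whose outcome changed."""
    flips = []
    for fixture in sorted(pre):
        for check in sorted(pre[fixture]):
            old, new = pre[fixture][check], post[fixture][check]
            if old != new:
                flips.append((fixture, check, old, new))
    return flips


flips = audit_flips(PRE, POST)
for fixture, check, old, new in flips:
    print(f"{fixture}: {check} {old} -> {new}")
```

A diff of this shape makes the "no other audit-row change" claim checkable: any row outside the Decoy and Ghost fixtures appearing in the output would contradict it.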
Deferred - :AnnexIII_Condition_1a_Exclusion in ARCO_instances_verification.ttl is a different class (verification-kiosk exclusion documentation per Recital 22 / Art 3(41)). Whether to also generalize the exclusion pattern is a separate future decision. - FlagTest fixtures' AssessmentDocs do not link to any regulatory condition; closing their regulatory_alignment FAIL is a separate fixture-authoring change. Revert git revert HEAD --- .../ontology/ARCO_governance_extension.ttl | 38 +++++++++++++++++++ .../ontology/ARCO_instances_creditscoring.ttl | 21 ++-------- .../ontology/ARCO_instances_sentinel.ttl | 16 ++------ 3 files changed, 46 insertions(+), 29 deletions(-) diff --git a/03_TECHNICAL_CORE/ontology/ARCO_governance_extension.ttl b/03_TECHNICAL_CORE/ontology/ARCO_governance_extension.ttl index 43e7fc1..57b17da 100644 --- a/03_TECHNICAL_CORE/ontology/ARCO_governance_extension.ttl +++ b/03_TECHNICAL_CORE/ontology/ARCO_governance_extension.ttl @@ -149,6 +149,44 @@ cco:Organization rdf:type owl:Class ; rdfs:label "Organization"@en ; rdfs:subClassOf bfo:0000027 . # Object Aggregate +################################################################# +# 3a) REGULATORY CONTENT — Annex III conditions (universal) +# +# These ICE instances describe what EU AI Act Regulation (EU) 2024/1689 +# Annex III prescribes for the modeled categories. They are universal +# regulatory content (one ICE per Annex III condition modeled), not +# per-fixture data. Every fixture references them via iao:0000136 +# from its :AssessmentDocumentation. +# +# Pattern: a regulatory condition ICE has type :RegulatoryContent, +# prescribes the regulated process type via cco:prescribes, and is_about +# the capability / process / role universals via iao:0000136. +# +# Moved from per-fixture files (Sentinel, CreditScoring) on 2026-05-14 +# to close regulatory_alignment FAIL on Adversarial and FlagTest fixtures. 
+################################################################# + +:AnnexIII_List rdf:type :RegulatoryContent ; + rdfs:label "Annex III List" ; + bfo:0000051 :AnnexIII_Condition_1a ; + bfo:0000051 :AnnexIII_Condition_5b . + +:AnnexIII_Condition_1a rdf:type :RegulatoryContent ; + rdfs:label "Annex III 1(a) (Biometric Rule)" ; + rdfs:comment "Annex III item 1(a): biometric identification of natural persons. cco:prescribes targets the regulated process TYPE (class IRI as concept-individual via OWL 2 punning) — the regulation prescribes process types, not deployment-specific tokens." ; + cco:prescribes :RemoteBiometricIdentificationProcess ; + iao:0000136 :BiometricIdentificationCapability ; + iao:0000136 :RemoteBiometricIdentificationProcess ; + iao:0000136 :NaturalPersonRole . + +:AnnexIII_Condition_5b rdf:type :RegulatoryContent ; + rdfs:label "Annex III 5(b) (Creditworthiness Rule)" ; + rdfs:comment "Annex III item 5(b): AI systems intended to evaluate the creditworthiness of natural persons or establish their credit score, with the exception of AI systems used for the purpose of detecting financial fraud. cco:prescribes targets the regulated process TYPE (class IRI as concept-individual via OWL 2 punning)." ; + cco:prescribes :CreditworthinessEvaluationProcess ; + iao:0000136 :CreditworthinessEvaluationCapability ; + iao:0000136 :CreditworthinessEvaluationProcess ; + iao:0000136 :NaturalPersonRole . + ################################################################# # 3b) REGULATORY BRIDGE AXIOMS # diff --git a/03_TECHNICAL_CORE/ontology/ARCO_instances_creditscoring.ttl b/03_TECHNICAL_CORE/ontology/ARCO_instances_creditscoring.ttl index ee9f700..6727edb 100644 --- a/03_TECHNICAL_CORE/ontology/ARCO_instances_creditscoring.ttl +++ b/03_TECHNICAL_CORE/ontology/ARCO_instances_creditscoring.ttl @@ -21,25 +21,12 @@ owl:imports . 
################################################################# -# 1) REGULATORY LAYER — Annex III 5(b) +# 1) ANNEX III REGULATORY LAYER +# +# Annex III conditions moved to ARCO_governance_extension.ttl on 2026-05-14; +# references via iao:0000136 stay below. ################################################################# -# Mereological backbone (CLAUDE.md invariant 8): every modeled Annex III condition -# is `bfo:0000051` of `:AnnexIII_List`. The list is also re-asserted here with its -# rdf:type so this fixture is self-contained when loaded standalone (test_scenarios.py -# loads each fixture independently). Duplicate triples across fixtures are deduped -# at union time. See runs/loop/2026-05-09_beverley-research/audit_C_regulatory.md T2. -:AnnexIII_List rdf:type :RegulatoryContent ; - bfo:0000051 :AnnexIII_Condition_5b . - -:AnnexIII_Condition_5b rdf:type :RegulatoryContent ; - rdfs:label "Annex III 5(b) (Creditworthiness Rule)" ; - rdfs:comment "Annex III item 5(b): AI systems intended to be used to evaluate the creditworthiness of natural persons or establish their credit score, with the exception of AI systems used for the purpose of detecting financial fraud. cco:prescribes targets the regulated process TYPE (class IRI as concept-individual via OWL 2 punning) — the regulation prescribes process types, not deployment-specific tokens. This matches the Sentinel pattern and generalizes across multiple 5(b) assessments sharing this single regulatory ICE." ; - cco:prescribes :CreditworthinessEvaluationProcess ; - iao:0000136 :CreditworthinessEvaluationCapability ; - iao:0000136 :CreditworthinessEvaluationProcess ; - iao:0000136 :NaturalPersonRole . 
- ################################################################# # 2) SYSTEM LAYER (reality-side particulars) ################################################################# diff --git a/03_TECHNICAL_CORE/ontology/ARCO_instances_sentinel.ttl b/03_TECHNICAL_CORE/ontology/ARCO_instances_sentinel.ttl index 2e72cc2..650740f 100644 --- a/03_TECHNICAL_CORE/ontology/ARCO_instances_sentinel.ttl +++ b/03_TECHNICAL_CORE/ontology/ARCO_instances_sentinel.ttl @@ -13,20 +13,12 @@ owl:imports . ################################################################# -# 1) REGULATORY LAYER (ICE grounded to reality) +# 1) ANNEX III REGULATORY LAYER (mereological backbone) +# +# Annex III conditions moved to ARCO_governance_extension.ttl on 2026-05-14 +# as universal regulatory content; references via iao:0000136 stay below. ################################################################# -:AnnexIII_List rdf:type :RegulatoryContent ; - rdfs:label "Annex III List" ; - bfo:0000051 :AnnexIII_Condition_1a . # has part - -:AnnexIII_Condition_1a rdf:type :RegulatoryContent ; - rdfs:label "Annex III 1(a) (Biometric Rule)" ; - cco:prescribes :RemoteBiometricIdentificationProcess ; # directive: prescribes the regulated process type (Three D's — DirectiveICE → Process) - iao:0000136 :BiometricIdentificationCapability ; # is_about the capability universal - iao:0000136 :RemoteBiometricIdentificationProcess ; # is_about the regulated process type - iao:0000136 :NaturalPersonRole . 
# is_about the affected role - ################################################################# # 2) SYSTEM LAYER (reality-side particulars) - UPDATED #################################################################
From cffcf2e1c3993f271d3aef122d6e967580b4df03 Mon Sep 17 00:00:00 2001
From: Alex Moskowitz
Date: Thu, 14 May 2026 13:29:21 -0400
Subject: [PATCH 2/3] docs(post-pr69): refresh fixture header and references
 after regulatory-content migration

Three stale-doc fixes tied to the 2026-05-14 governance-extension move:

- ARCO_instances_flag_tests.ttl header: replaces the pre-migration text
  ("classification PASS but audit FAIL ... minimal instances for flag
  testing only, without full regulatory content linkage") with the
  actual post-migration state. Classification and exception flag remain
  the test target; traceability and regulatory_alignment still FAIL but
  for a different reason now (local :AssessmentDocumentation ->
  :AnnexIII_Condition_* iao:0000136 link absent from this fixture, not
  fixture-distribution).
- LIMITATIONS.md sec 9: file reference for the :AnnexIII_Condition_1a
  cco:prescribes :RemoteBiometricIdentificationProcess class-as-individual
  triple updated from ARCO_instances_sentinel.ttl to
  ARCO_governance_extension.ttl per the migration. Adds the 5(b)
  companion triple. Also notes that gate-removal coverage is now
  symmetric and adversarial-mechanism tests exist (next commit).
- README.md "Gate independence is empirically verified" sentence: drops
  the "(Symmetric coverage for 5(b) is queued.)" parenthetical;
  corresponding row in the active-changes table moves from "Active work"
  to "Landed 2026-05-14".

No pipeline behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) --- .../ontology/ARCO_instances_flag_tests.ttl | 30 +++++++++++++++---- LIMITATIONS.md | 4 +-- README.md | 4 +-- 3 files changed, 29 insertions(+), 9 deletions(-) diff --git a/03_TECHNICAL_CORE/ontology/ARCO_instances_flag_tests.ttl b/03_TECHNICAL_CORE/ontology/ARCO_instances_flag_tests.ttl index 223c4a1..13c5451 100644 --- a/03_TECHNICAL_CORE/ontology/ARCO_instances_flag_tests.ttl +++ b/03_TECHNICAL_CORE/ontology/ARCO_instances_flag_tests.ttl @@ -11,11 +11,31 @@ rdfs:label "ARCO Flag Test Instances" ; rdfs:comment """Two test cases for the audit-layer exception flags. - NOTE: Running the full pipeline on these instances will show classification PASS - but audit FAIL (traceability and regulatory alignment). This is expected and correct — - these are minimal instances for flag testing only, without full regulatory content - linkage. The same pattern applies to ARCO_instances_adversarial_*.ttl. - The classification layer and flag behavior are the only results under test here. + TEST TARGET: simultaneous OWL classification + audit-layer flag detection + on the same system, demonstrating that classification and audit do not + bleed into each other. + + Expected audit-row outcomes (post 2026-05-14 migration of regulatory content + to ARCO_governance_extension.ttl): + - classification: PASS (all three Annex III gates satisfied) + - exception flag: FLAGGED (derogation or fraud, per fixture) + - traceability: FAIL + - regulatory_alignment: FAIL + The traceability and regulatory_alignment FAILs are NOT the test target. + They persist because the :AssessmentDocumentation instances below do not + carry an iao:0000136 link to :AnnexIII_Condition_1a or :AnnexIII_Condition_5b, + so the audit queries (which require ?doc iao:0000136 ?condition) return false. + This is a fixture-authoring gap, not a defect in the classification or flag + behavior under test. 
Adding those links would close the audit FAILs without + affecting the classification or flag entailments — separate change. + + Prior to the 2026-05-14 migration, the audit FAILs also covered fixture- + distribution effects (regulatory condition declarations were per-fixture + inside ARCO_instances_sentinel.ttl and ARCO_instances_creditscoring.ttl, so + the universal regulatory content was invisible to this fixture). The + migration moved those declarations to the governance extension, removing + the distribution issue; what remains is the local AssessmentDoc->condition + linkage gap described above. Test A — FlagTest_BiometricSystem_WithDerogationClaim: A system that IS classified as AnnexIII1aApplicableSystem (all three gates satisfied) diff --git a/LIMITATIONS.md b/LIMITATIONS.md index 4ad78cb..af3e666 100644 --- a/LIMITATIONS.md +++ b/LIMITATIONS.md @@ -303,9 +303,9 @@ The pipeline's output layer (`run_pipeline.py` from line 1699 onward, plus `writ ## 9. Engineering gaps -- **Negative test infrastructure is incomplete.** Gate-removal regression tests ([test_gate_removal.py](03_TECHNICAL_CORE/scripts/test_gate_removal.py)) verify each gate is independently necessary by mutating axioms and confirming entailment breaks. What does **not** exist is a parameterized test harness that loads an isolated deliberately-miscategorized instance file and runs the full pipeline against it, because the pipeline currently loads all TTL files into a single graph before reasoning — a negative-case file in the graph would contaminate positive cases. Building this harness is Step 2 of the ADR-001 work plan. Noted in [docs/agent/bfo_cco_alignment_audit.md](docs/agent/bfo_cco_alignment_audit.md) §"Unresolved Engineering Problem." 
+- **Negative test infrastructure is incomplete.** Gate-removal regression tests ([test_gate_removal.py](03_TECHNICAL_CORE/scripts/test_gate_removal.py)) verify each gate is independently necessary for both modeled Annex III categories (1(a) Sentinel and 5(b) CreditScorer) by mutating axioms and confirming entailment breaks. Symmetric coverage across the two categories was added 2026-05-14. Adversarial-mechanism tests ([test_adversarial_mechanism.py](03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py)) verify that DecoySystem_001 classifies via `owl:equivalentClass` (not direct IRI assertion) and that GhostSystem_001 classifies via blank-node `owl:someValuesFrom` (not a named individual). What does **not** exist is a parameterized test harness that loads an isolated deliberately-miscategorized instance file and runs the full pipeline against it, because the pipeline currently loads all TTL files into a single graph before reasoning — a negative-case file in the graph would contaminate positive cases. Building this harness is Step 2 of the ADR-001 work plan. Noted in [docs/agent/bfo_cco_alignment_audit.md](docs/agent/bfo_cco_alignment_audit.md) §"Unresolved Engineering Problem." - **Dependency pinning.** The pipeline is verified only on: `rdflib==7.6.0`, `pyshacl==0.31.0`, `owlrl==7.1.4`. Upgrading any of these requires re-running the full regression suite. The `owlrl` pin is especially load-bearing: Gate 2 and Gate 3 use anonymous inverse property restrictions whose entailment behavior could change across reasoner versions. -- **`AnnexIII_Condition_1a cco:prescribes :RemoteBiometricIdentificationProcess`** remains in [ARCO_instances_sentinel.ttl](03_TECHNICAL_CORE/ontology/ARCO_instances_sentinel.ttl) line 25 as a class-as-individual triple retained for regulatory traceability. It does not affect current classification. It is a known blocker for a future CCO import (see §4). 
+- **`AnnexIII_Condition_1a cco:prescribes :RemoteBiometricIdentificationProcess`** remains in [ARCO_governance_extension.ttl](03_TECHNICAL_CORE/ontology/ARCO_governance_extension.ttl) (moved 2026-05-14 from the Sentinel instance file to the governance extension as universal regulatory content) as a class-as-individual triple retained for regulatory traceability. The companion 5(b) triple `:AnnexIII_Condition_5b cco:prescribes :CreditworthinessEvaluationProcess` is in the same file. Neither affects current classification. Both are known blockers for a future CCO import (see §4). - **Single-reasoner portability.** The anonymous inverse property expressions in Gate 2 and Gate 3 equivalentClass axioms have been empirically verified on `owlrl==7.1.4` only. Other OWL-RL reasoners may or may not materialize these the same way. - **No automated tracking of regulatory amendments.** Annex III categories and Article 6(3) derogation criteria may change via EU delegated acts. Updates to the ontology are manual. diff --git a/README.md b/README.md index 642f368..0ae4ebc 100644 --- a/README.md +++ b/README.md @@ -128,7 +128,7 @@ The architecture grounds in BFO 2020 (ISO/IEC 21838-2:2021) and uses the seven-b **Layer separation is verified by fixtures.** Two flag-test fixtures present cases where all three Annex III gates are satisfied AND an audit-layer flag (a provider-asserted `:DerogationClaim`, or a `:FraudDetectionProcess` token) is also present. The OWL classification fires regardless of the audit flag; the flag fires alongside the classification. Classification and audit do not bleed into each other. -**Gate independence is empirically verified.** A regression test removes the supporting triples for each Annex III 1(a) gate in turn and confirms the classification fails. Each gate is independently necessary; removing any one breaks the entailment. (Symmetric coverage for 5(b) is queued.) 
+**Gate independence is empirically verified.** A regression test removes the supporting triples for each Annex III 1(a) and 5(b) gate in turn and confirms the classification fails. Each gate is independently necessary in both categories; removing any one breaks the entailment. Content-mutation variants (wrong process type, wrong designation target) verify that the gates check content, not just existence. **The certificate's classification binds to graph queries.** Classification field and evidence path are bound to SPARQL queries against the reasoned graph; the contract lives in `03_TECHNICAL_CORE/scripts/output_manifest_v2.yaml`, enforced by `test_output_provenance.py` (failing-by-design). Tightening provenance labels across surrounding fields is active work ([`OPEN_PROBLEMS.md L4.4-L4.6`](OPEN_PROBLEMS.md), [`LIMITATIONS.md §7.5`](LIMITATIONS.md)). @@ -151,7 +151,7 @@ The three-gate pattern (capability + intended use + affected role) generalizes b | Replacing the kiosk demo's hypothetical vendor packet with a real vendor document, so the demo runs on actual source evidence rather than hypothetical content | Active work | | Publishing the full reasoning output (around 20,000 derived facts per run) plus the second reasoner's separate result per fixture, so anyone can independently check both the conclusions and that two different reasoners agree | Active work | | Labeling every certificate field with where its value came from (a graph query result, the run's metadata, or a scope-disclosure note) and adding a CI check that verifies every field traces back to its declared source | Active work | -| Extending the test that confirms each Gate is necessary from Annex III 1(a) to also cover 5(b), so both classifications have the same proof that none of the three gates is decorative | Active work | +| Extending the test that confirms each Gate is necessary from Annex III 1(a) to also cover 5(b), so both classifications have the same proof that none of the three gates is 
decorative | Landed 2026-05-14 | Day-to-day rows live in `OPEN_PROBLEMS.md` (internal); the public roadmap with verified core, resolved modeling decisions, and execution sequence is at [`docs/MODELING_ROADMAP.md`](docs/MODELING_ROADMAP.md). From d74e76a10d1301a2f13d8a21d2c2fce1543bb00a Mon Sep 17 00:00:00 2001 From: Alex Moskowitz Date: Thu, 14 May 2026 13:29:40 -0400 Subject: [PATCH 3/3] test(coverage): add 5(b) gate-removal symmetry and adversarial-mechanism assertions Two coverage gaps closed against README claims: - test_gate_removal.py: parameterized over both modeled Annex III categories. CATEGORY_1A (Sentinel) preserves the original 7 tests (5 gate removals + 2 content mutations) verbatim by triple; CATEGORY_5B (CreditScorer) adds the symmetric 7 against AnnexIII5bApplicableSystem. README "Gate independence is empirically verified" previously disclosed the 5(b) gap as queued; closes that. - test_adversarial_mechanism.py (new): asserts that DecoySystem_001's Gate 1 entailment routes through owl:equivalentClass propagation (the disposition is typed only as :WeirdScanner pre-reasoning; :BiometricIdentificationCapability is absent from the asserted triples and entailed post-reasoning), and that GhostSystem_001's disposition is a blank node (no named individual) that still satisfies owl:someValuesFrom. test_scenarios.py asserts the entailment fires; this test asserts HOW. - .github/workflows/arco-smoke-test.yml and arco-demo.yml: both workflows run the new test alongside the existing three regression tests. Pipeline behavior unchanged. test_output_provenance.py failure count unchanged at 1 (baseline). 
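The equivalence routing that the Decoy assertion checks can be illustrated without a reasoner. The sketch below is a toy forward-chaining pass over a triple set, modeled on the OWL 2 RL equivalent-class typing rules (cax-eqc1 and cax-eqc2). It is not how test_adversarial_mechanism.py works (that file runs the owlrl closure over the full ontology), and `close_equivalent_classes` is a hypothetical helper.

```python
# Toy illustration of equivalent-class typing. Not the owlrl closure used
# by the real test; the helper name and tuple encoding are assumptions.

RDF_TYPE = "rdf:type"
EQUIV = "owl:equivalentClass"

# Asserted triples, mirroring the decoy fixture as described above: the
# disposition is typed only as :WeirdScanner.
asserted = {
    (":WeirdScanner", EQUIV, ":BiometricIdentificationCapability"),
    (":Decoy_Disposition", RDF_TYPE, ":WeirdScanner"),
}


def close_equivalent_classes(triples):
    """Propagate rdf:type across owl:equivalentClass in both directions."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        equivalences = [(s, o) for s, p, o in closed if p == EQUIV]
        for c1, c2 in equivalences:
            # Iterate over a snapshot so the set can grow inside the loop.
            for x, p, c in list(closed):
                if p != RDF_TYPE:
                    continue
                for src, dst in ((c1, c2), (c2, c1)):
                    if c == src and (x, RDF_TYPE, dst) not in closed:
                        closed.add((x, RDF_TYPE, dst))
                        changed = True
    return closed


target = (":Decoy_Disposition", RDF_TYPE, ":BiometricIdentificationCapability")
entailed = close_equivalent_classes(asserted)
print("asserted:", target in asserted, "entailed:", target in entailed)
```

The absent-before, present-after pair is the same shape as the pre-reasoning and post-reasoning checks in the new test: a pipeline that matched on the asserted IRI alone would never see the target triple.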
Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/arco-demo.yml | 11 +- .github/workflows/arco-smoke-test.yml | 3 + .../scripts/test_adversarial_mechanism.py | 237 ++++++++++++ .../scripts/test_gate_removal.py | 361 ++++++++++++------ 4 files changed, 484 insertions(+), 128 deletions(-) create mode 100644 03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py diff --git a/.github/workflows/arco-demo.yml b/.github/workflows/arco-demo.yml index ff74464..676909c 100644 --- a/.github/workflows/arco-demo.yml +++ b/.github/workflows/arco-demo.yml @@ -7,10 +7,11 @@ name: ARCO Demo Run # # What this gates (any failure fails the build): # - "ALL CHECKS PASSED" signal in the pipeline output. -# - Three regression test scripts return 0: -# * test_gate_removal.py (each Annex III 1(a) gate is independently necessary) +# - Four regression test scripts return 0: +# * test_gate_removal.py (each Annex III 1(a) and 5(b) gate is independently necessary) # * test_scenarios.py (multi-scenario classification correctness) # * test_kiosk_html_no_false_concretization.py (L4.7 regression) +# * test_adversarial_mechanism.py (decoy and ghost classification mechanism) # - Five expected artifact files exist in runs/demo/: # certificate.txt, summary.json, evidence.json, shacl_report.txt, # determination_view.html. 
@@ -95,6 +96,12 @@ jobs: set -euo pipefail python -u 03_TECHNICAL_CORE/scripts/test_kiosk_html_no_false_concretization.py + - name: Run adversarial-mechanism regression tests + shell: bash + run: | + set -euo pipefail + python -u 03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py + - name: Verify artifact files exist shell: bash run: | diff --git a/.github/workflows/arco-smoke-test.yml b/.github/workflows/arco-smoke-test.yml index 213e254..a98b296 100644 --- a/.github/workflows/arco-smoke-test.yml +++ b/.github/workflows/arco-smoke-test.yml @@ -46,3 +46,6 @@ jobs: - name: Run kiosk HTML no-false-concretization regression test (L4.7) run: python 03_TECHNICAL_CORE/scripts/test_kiosk_html_no_false_concretization.py + + - name: Run adversarial-mechanism regression tests + run: python 03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py diff --git a/03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py b/03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py new file mode 100644 index 0000000..4ce05c6 --- /dev/null +++ b/03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py @@ -0,0 +1,237 @@ +""" +Adversarial-mechanism regression tests. + +`test_scenarios.py` already asserts that DecoySystem_001 and GhostSystem_001 +classify correctly under OWL-RL. This file verifies HOW the classification +fires, not just that it fires. Each assertion catches a failure mode that +would let a pattern-matching pipeline pass test_scenarios.py with the wrong +mechanism in place. + + Decoy fixture (ARCO_instances_adversarial_decoy.ttl): + The disposition is typed only as :WeirdScanner. :WeirdScanner is declared + owl:equivalentClass :BiometricIdentificationCapability in the same fixture. + Pre-reasoning, no asserted triple types the disposition as + :BiometricIdentificationCapability. Post-reasoning under OWL-RL, + the equivalentClass propagation entails the typing and Gate 1 fires. 
+ + If a pipeline did IRI-name pattern matching, it would see no + :BiometricIdentificationCapability triple in the input and miss the + classification. The OWL reasoner does not. + + Ghost fixture (ARCO_instances_adversarial_blanknode.ttl): + The disposition is an anonymous individual (blank node) typed as + :BiometricIdentificationCapability. owl:someValuesFrom requires + existence of an instance satisfying the restriction; it does not + require a named individual. + + If a pipeline required named individuals for evidence walks, it would + miss this case. The OWL reasoner does not require names; existential + quantification is satisfied by any witness, including blank nodes. + +Run from repo root: + python 03_TECHNICAL_CORE/scripts/test_adversarial_mechanism.py +""" + +from __future__ import annotations + +import sys +from pathlib import Path +from rdflib import Graph, Namespace, URIRef +from rdflib.term import BNode + +try: + import owlrl +except ImportError: + print("ERROR: owlrl is required. 
Install: pip install owlrl") + sys.exit(1) + +REPO_ROOT = Path(__file__).resolve().parents[2] +ONTOLOGY_DIR = REPO_ROOT / "03_TECHNICAL_CORE" / "ontology" + +BFO_2020 = ONTOLOGY_DIR / "imports" / "bfo-2020.owl" +IAO_BOT = ONTOLOGY_DIR / "imports" / "iao_bot.owl" +RO_BOT = ONTOLOGY_DIR / "imports" / "ro_bot.owl" +CCO_BOT = ONTOLOGY_DIR / "imports" / "cco_bot.owl" +CORE = ONTOLOGY_DIR / "ARCO_core.ttl" +GOV = ONTOLOGY_DIR / "ARCO_governance_extension.ttl" + +ARCO = Namespace("https://arco.ai/ontology/core#") +RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#") +RO = Namespace("http://purl.obolibrary.org/obo/RO_") + + +def parse_fixture(instances_path: Path) -> Graph: + """Parse the full ontology + fixture WITHOUT running the reasoner.""" + g = Graph() + for p in (BFO_2020, IAO_BOT, RO_BOT, CCO_BOT, CORE, GOV, instances_path): + if not p.exists(): + raise FileNotFoundError(f"Missing: {p}") + fmt = "xml" if p.suffix == ".owl" else "turtle" + g.parse(p.as_posix(), format=fmt) + return g + + +def reason(g: Graph) -> Graph: + owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g) + return g + + +def test_decoy_classifies_via_equivalent_class() -> bool: + """ + Verify that DecoySystem_001's Gate 1 entailment fires via + owl:equivalentClass propagation, not via direct IRI assertion. 
+
+    Pre-reasoning expectation:
+        :Decoy_Disposition rdf:type :WeirdScanner present
+        :Decoy_Disposition rdf:type :BiometricIdentificationCapability ABSENT
+
+    Post-reasoning expectation:
+        :Decoy_Disposition rdf:type :BiometricIdentificationCapability present
+        :DecoySystem_001 rdf:type :AnnexIII1aApplicableSystem present
+    """
+    print("\n--- DECOY: classification via owl:equivalentClass ---")
+    fixture = ONTOLOGY_DIR / "ARCO_instances_adversarial_decoy.ttl"
+    g_pre = parse_fixture(fixture)
+
+    decoy_disp = ARCO["Decoy_Disposition"]
+    weird_scanner = ARCO["WeirdScanner"]
+    bio_cap = ARCO["BiometricIdentificationCapability"]
+    decoy_system = ARCO["DecoySystem_001"]
+    annex_1a = ARCO["AnnexIII1aApplicableSystem"]
+
+    # Precondition 1: disposition is typed as :WeirdScanner (the decoy class)
+    has_weird_pre = (decoy_disp, RDF["type"], weird_scanner) in g_pre
+    # Precondition 2: disposition is NOT directly typed as
+    # :BiometricIdentificationCapability before reasoning
+    has_bio_pre = (decoy_disp, RDF["type"], bio_cap) in g_pre
+    # Precondition 3: System is NOT yet classified as 1(a) applicable
+    is_annex_pre = (decoy_system, RDF["type"], annex_1a) in g_pre
+
+    ok = True
+    print(f" pre-reasoning :Decoy_Disposition rdf:type :WeirdScanner: {has_weird_pre} (expected True)")
+    if not has_weird_pre:
+        ok = False
+    print(f" pre-reasoning :Decoy_Disposition rdf:type :BiometricIdentificationCapability: {has_bio_pre} (expected False)")
+    if has_bio_pre:
+        print(" FAIL: decoy disposition asserted directly as BiometricIdentificationCapability; "
+              "test premise (equivalence-only routing) is invalid.")
+        ok = False
+    print(f" pre-reasoning :DecoySystem_001 rdf:type :AnnexIII1aApplicableSystem: {is_annex_pre} (expected False)")
+    if is_annex_pre:
+        ok = False
+
+    # Reason and re-check
+    g_post = reason(g_pre)
+    has_bio_post = (decoy_disp, RDF["type"], bio_cap) in g_post
+    is_annex_post = (decoy_system, RDF["type"], annex_1a) in g_post
+
+    print(f" post-reasoning :Decoy_Disposition rdf:type :BiometricIdentificationCapability: {has_bio_post} (expected True)")
+    if not has_bio_post:
+        ok = False
+    print(f" post-reasoning :DecoySystem_001 rdf:type :AnnexIII1aApplicableSystem: {is_annex_post} (expected True)")
+    if not is_annex_post:
+        ok = False
+
+    if ok:
+        print(" RESULT: classification routed via owl:equivalentClass, not via direct IRI.")
+    return ok
+
+
+def test_ghost_classifies_via_blank_node_disposition() -> bool:
+    """
+    Verify that GhostSystem_001's Gate 1 entailment fires via a blank-node
+    disposition satisfying owl:someValuesFrom, not via a named individual.
+
+    Pre-reasoning expectation:
+        :Ghost_Module ro:0000091 _:b where _:b is a blank node typed
+        as :BiometricIdentificationCapability
+
+    Post-reasoning expectation:
+        :GhostSystem_001 rdf:type :AnnexIII1aApplicableSystem present
+    """
+    print("\n--- GHOST: classification via blank-node owl:someValuesFrom ---")
+    fixture = ONTOLOGY_DIR / "ARCO_instances_adversarial_blanknode.ttl"
+    g_pre = parse_fixture(fixture)
+
+    ghost_module = ARCO["Ghost_Module"]
+    has_disposition = RO["0000091"]
+    bio_cap = ARCO["BiometricIdentificationCapability"]
+    ghost_system = ARCO["GhostSystem_001"]
+    annex_1a = ARCO["AnnexIII1aApplicableSystem"]
+
+    # Find disposition objects of :Ghost_Module via ro:0000091
+    disposition_objs = list(g_pre.objects(ghost_module, has_disposition))
+
+    ok = True
+    print(f" :Ghost_Module ro:0000091 ? -> {len(disposition_objs)} disposition object(s)")
+    if not disposition_objs:
+        print(" FAIL: no disposition object found")
+        return False
+
+    # The disposition must be a blank node, not a named IRI
+    blank_node_dispositions = [d for d in disposition_objs if isinstance(d, BNode)]
+    named_dispositions = [d for d in disposition_objs if not isinstance(d, BNode)]
+
+    print(f" blank-node dispositions: {len(blank_node_dispositions)} (expected >= 1)")
+    print(f" named dispositions: {len(named_dispositions)} (expected 0 for this fixture's test target)")
+    if not blank_node_dispositions:
+        print(" FAIL: no blank-node disposition; test premise (anonymous individual) is invalid.")
+        ok = False
+    if named_dispositions:
+        print(" FAIL: a named disposition is present; test premise (anonymous individual) is invalid.")
+        ok = False
+
+    # The blank-node disposition must be typed as :BiometricIdentificationCapability
+    if blank_node_dispositions:
+        bn = blank_node_dispositions[0]
+        bn_typed = (bn, RDF["type"], bio_cap) in g_pre
+        print(f" blank-node typed as :BiometricIdentificationCapability: {bn_typed} (expected True)")
+        if not bn_typed:
+            ok = False
+
+    # Pre-reasoning: system is not yet classified
+    is_annex_pre = (ghost_system, RDF["type"], annex_1a) in g_pre
+    print(f" pre-reasoning :GhostSystem_001 rdf:type :AnnexIII1aApplicableSystem: {is_annex_pre} (expected False)")
+    if is_annex_pre:
+        ok = False
+
+    # Reason and re-check
+    g_post = reason(g_pre)
+    is_annex_post = (ghost_system, RDF["type"], annex_1a) in g_post
+    print(f" post-reasoning :GhostSystem_001 rdf:type :AnnexIII1aApplicableSystem: {is_annex_post} (expected True)")
+    if not is_annex_post:
+        ok = False
+
+    if ok:
+        print(" RESULT: classification satisfied by blank-node owl:someValuesFrom witness.")
+    return ok
+
+
+def main() -> int:
+    print("=" * 72)
+    print("ARCO ADVERSARIAL-MECHANISM TEST")
+    print("=" * 72)
+
+    all_pass = True
+    if not test_decoy_classifies_via_equivalent_class():
+        all_pass = False
+    if not test_ghost_classifies_via_blank_node_disposition():
+        all_pass = False
+
+    print()
+    print("=" * 72)
+    if all_pass:
+        print("ALL ADVERSARIAL-MECHANISM TESTS PASSED")
+        print("Decoy classification routes via owl:equivalentClass.")
+        print("Ghost classification routes via blank-node owl:someValuesFrom witness.")
+        print("Neither relies on a direct IRI assertion or a named individual.")
+    else:
+        print("SOME ADVERSARIAL-MECHANISM TESTS FAILED")
+        print("=" * 72)
+        return 1
+    print("=" * 72)
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/03_TECHNICAL_CORE/scripts/test_gate_removal.py b/03_TECHNICAL_CORE/scripts/test_gate_removal.py
index 7f409db..ddc70f5 100644
--- a/03_TECHNICAL_CORE/scripts/test_gate_removal.py
+++ b/03_TECHNICAL_CORE/scripts/test_gate_removal.py
@@ -1,9 +1,13 @@
 """
-Gate-removal regression test for the three-gate Annex III 1(a) classification.
+Gate-removal regression test for the three-gate Annex III classification.
 
-Removes each gate's key triple independently, re-reasons, and confirms
-AnnexIII1aApplicableSystem disappears. Also verifies that HighRiskSystem
-(capability-only axiom) is unaffected by gates 2 and 3.
+For each modeled Annex III category (1(a) and 5(b)), removes each gate's key
+triple independently, re-reasons, and confirms the category's applicable-system
+class disappears. Also verifies that HighRiskSystem (capability-only axiom) is
+unaffected by gates 2 and 3.
+
+Symmetric coverage across categories verifies that the three-gate pattern
+generalizes; no gate in either category is decorative.
 """
 
 from __future__ import annotations
@@ -23,7 +27,6 @@
 
 CORE = ONTOLOGY_DIR / "ARCO_core.ttl"
 GOV = ONTOLOGY_DIR / "ARCO_governance_extension.ttl"
-INSTANCES = ONTOLOGY_DIR / "ARCO_instances_sentinel.ttl"
 
 ARCO = Namespace("https://arco.ai/ontology/core#")
 RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
@@ -31,106 +34,190 @@
 RO = Namespace("http://purl.obolibrary.org/obo/RO_")
 CCO = Namespace("http://www.ontologyrepository.com/CommonCoreOntologies/")
 
-# The triple removals that knock out each gate
-GATE_REMOVALS = {
-    "gate1_capability": (
-        ARCO["Sentinel_FaceID_Module"],
-        RO["0000091"],  # has_disposition
-        ARCO["Sentinel_FaceID_Disposition"],
-    ),
-    "gate2_intended_use": (
-        ARCO["Sentinel_IntendedUse_001"],
-        IAO["0000136"],  # is_about
-        ARCO["Sentinel_ID_System"],
-    ),
-    "gate3_use_scenario": (
-        ARCO["Sentinel_UseScenario_001"],
-        IAO["0000136"],  # is_about
-        ARCO["Sentinel_ID_System"],
-    ),
-    # Content-based gate failures:
-    # Gate 2 must require cco:prescribes an instance of the regulated process class, not just
-    # existence of IUS. Triple references the token individual, not the class IRI.
-    "gate2_prescribes_removed": (
-        ARCO["Sentinel_IntendedUse_001"],
-        CCO["prescribes"],  # cco:prescribes
-        ARCO["Sentinel_RBIP_Process"],  # the typed process token (not the class IRI)
-    ),
-    # Gate 3 must require cco:designates :NaturalPersonRole. The use scenario
-    # spec designates the affected role universal (class-level designation via
-    # the typed CCO designation property; same shape as Gate 2 with cco:prescribes).
-    "gate3_missing_role": (
-        ARCO["Sentinel_UseScenario_001"],
-        CCO["designates"],
-        ARCO["NaturalPersonRole"],  # the role universal as designation target
-    ),
-}
-
-# Expected entailment results after each gate removal
-EXPECTED = {
-    "gate1_capability": {
-        "AnnexIII1aApplicableSystem": False,
-        "HighRiskSystem": False,  # capability-only axiom also breaks
-    },
-    "gate2_intended_use": {
-        "AnnexIII1aApplicableSystem": False,
-        "HighRiskSystem": True,  # HighRiskSystem only needs capability
-    },
-    "gate3_use_scenario": {
-        "AnnexIII1aApplicableSystem": False,
-        "HighRiskSystem": True,  # HighRiskSystem only needs capability
-    },
-    "gate2_prescribes_removed": {
-        "AnnexIII1aApplicableSystem": False,  # prescribes removed -> Gate 2 fails
-        "HighRiskSystem": True,  # capability unchanged
-    },
-    "gate3_missing_role": {
-        "AnnexIII1aApplicableSystem": False,  # NaturalPersonRole removed -> Gate 3 fails
-        "HighRiskSystem": True,  # capability unchanged
-    },
-}
-
-# Mutation tests: remove one triple and add a replacement (wrong value).
-# These verify that wrong content, not just absence, also breaks the gate.
-GATE_MUTATIONS = {
-    "gate2_wrong_process_type": {
-        "remove": (
+# ---------------------------------------------------------------------------
+# Annex III 1(a) — Sentinel fixture
+# ---------------------------------------------------------------------------
+CATEGORY_1A = {
+    "label": "Annex III 1(a) — Sentinel (Biometric Identification)",
+    "instances": ONTOLOGY_DIR / "ARCO_instances_sentinel.ttl",
+    "system": ARCO["Sentinel_ID_System"],
+    "applicable_class": ARCO["AnnexIII1aApplicableSystem"],
+    "gate_removals": {
+        "gate1_capability": (
+            ARCO["Sentinel_FaceID_Module"],
+            RO["0000091"],  # has_disposition
+            ARCO["Sentinel_FaceID_Disposition"],
+        ),
+        "gate2_intended_use": (
             ARCO["Sentinel_IntendedUse_001"],
-            CCO["prescribes"],
-            ARCO["Sentinel_RBIP_Process"],  # the typed process token
+            IAO["0000136"],  # is_about
+            ARCO["Sentinel_ID_System"],
         ),
-        "add": (
+        "gate3_use_scenario": (
+            ARCO["Sentinel_UseScenario_001"],
+            IAO["0000136"],  # is_about
+            ARCO["Sentinel_ID_System"],
+        ),
+        # Content-based gate failures
+        "gate2_prescribes_removed": (
             ARCO["Sentinel_IntendedUse_001"],
             CCO["prescribes"],
-            ARCO["SomeOtherProcess"],  # wrong process type
+            ARCO["Sentinel_RBIP_Process"],  # the typed process token
         ),
-        "expected": {
-            "AnnexIII1aApplicableSystem": False,  # Gate 2 fails: wrong process
-            "HighRiskSystem": True,  # capability unchanged
-        },
-    },
-    "gate3_wrong_designation_target": {
-        "remove": (
+        "gate3_missing_role": (
             ARCO["Sentinel_UseScenario_001"],
             CCO["designates"],
-            ARCO["NaturalPersonRole"],  # the regulated role universal
+            ARCO["NaturalPersonRole"],
         ),
-        "add": (
-            ARCO["Sentinel_UseScenario_001"],
+    },
+    "expected": {
+        "gate1_capability": {
+            "applicable": False,
+            "HighRiskSystem": False,  # capability-only axiom also breaks
+        },
+        "gate2_intended_use": {
+            "applicable": False,
+            "HighRiskSystem": True,  # HighRiskSystem only needs capability
+        },
+        "gate3_use_scenario": {
+            "applicable": False,
+            "HighRiskSystem": True,
+        },
+        "gate2_prescribes_removed": {
+            "applicable": False,
+            "HighRiskSystem": True,
+        },
+        "gate3_missing_role": {
+            "applicable": False,
+            "HighRiskSystem": True,
+        },
+    },
+    "gate_mutations": {
+        "gate2_wrong_process_type": {
+            "remove": (
+                ARCO["Sentinel_IntendedUse_001"],
+                CCO["prescribes"],
+                ARCO["Sentinel_RBIP_Process"],
+            ),
+            "add": (
+                ARCO["Sentinel_IntendedUse_001"],
+                CCO["prescribes"],
+                ARCO["SomeOtherProcess"],
+            ),
+            "expected": {"applicable": False, "HighRiskSystem": True},
+        },
+        "gate3_wrong_designation_target": {
+            "remove": (
+                ARCO["Sentinel_UseScenario_001"],
+                CCO["designates"],
+                ARCO["NaturalPersonRole"],
+            ),
+            "add": (
+                ARCO["Sentinel_UseScenario_001"],
+                CCO["designates"],
+                ARCO["SomeOtherRole"],
+            ),
+            "expected": {"applicable": False, "HighRiskSystem": True},
+        },
+    },
+}
+
+# ---------------------------------------------------------------------------
+# Annex III 5(b) — CreditScorer fixture
+# Symmetric coverage with the 1(a) block above. Each gate-removal verifies a
+# gate that is independently necessary for AnnexIII5bApplicableSystem.
+# ---------------------------------------------------------------------------
+CATEGORY_5B = {
+    "label": "Annex III 5(b) — CreditScorer (Creditworthiness Evaluation)",
+    "instances": ONTOLOGY_DIR / "ARCO_instances_creditscoring.ttl",
+    "system": ARCO["CreditScorer_001"],
+    "applicable_class": ARCO["AnnexIII5bApplicableSystem"],
+    "gate_removals": {
+        "gate1_capability": (
+            ARCO["CreditScorer_Processing_Module"],
+            RO["0000091"],  # has_disposition
+            ARCO["CreditScorer_Eval_Disposition"],
+        ),
+        "gate2_intended_use": (
+            ARCO["CreditScorer_IntendedUse_001"],
+            IAO["0000136"],  # is_about
+            ARCO["CreditScorer_001"],
+        ),
+        "gate3_use_scenario": (
+            ARCO["CreditScorer_UseScenario_001"],
+            IAO["0000136"],  # is_about
+            ARCO["CreditScorer_001"],
+        ),
+        # Content-based gate failures
+        "gate2_prescribes_removed": (
+            ARCO["CreditScorer_IntendedUse_001"],
+            CCO["prescribes"],
+            ARCO["CreditScorer_EvalProcess_Token"],
+        ),
+        "gate3_missing_role": (
+            ARCO["CreditScorer_UseScenario_001"],
             CCO["designates"],
-            ARCO["SomeOtherRole"],  # wrong target (not the regulated role)
+            ARCO["NaturalPersonRole"],
         ),
-        "expected": {
-            "AnnexIII1aApplicableSystem": False,  # Gate 3 fails: wrong designation target
-            "HighRiskSystem": True,  # capability unchanged
+    },
+    "expected": {
+        "gate1_capability": {
+            "applicable": False,
+            "HighRiskSystem": False,
+        },
+        "gate2_intended_use": {
+            "applicable": False,
+            "HighRiskSystem": True,
+        },
+        "gate3_use_scenario": {
+            "applicable": False,
+            "HighRiskSystem": True,
+        },
+        "gate2_prescribes_removed": {
+            "applicable": False,
+            "HighRiskSystem": True,
+        },
+        "gate3_missing_role": {
+            "applicable": False,
+            "HighRiskSystem": True,
+        },
+    },
+    "gate_mutations": {
+        "gate2_wrong_process_type": {
+            "remove": (
+                ARCO["CreditScorer_IntendedUse_001"],
+                CCO["prescribes"],
+                ARCO["CreditScorer_EvalProcess_Token"],
+            ),
+            "add": (
+                ARCO["CreditScorer_IntendedUse_001"],
+                CCO["prescribes"],
+                ARCO["SomeOtherProcess"],
+            ),
+            "expected": {"applicable": False, "HighRiskSystem": True},
+        },
+        "gate3_wrong_designation_target": {
+            "remove": (
+                ARCO["CreditScorer_UseScenario_001"],
+                CCO["designates"],
+                ARCO["NaturalPersonRole"],
+            ),
+            "add": (
+                ARCO["CreditScorer_UseScenario_001"],
+                CCO["designates"],
+                ARCO["SomeOtherRole"],
+            ),
+            "expected": {"applicable": False, "HighRiskSystem": True},
         },
     },
 }
 
+CATEGORIES = [CATEGORY_1A, CATEGORY_5B]
 
-def load_graph() -> Graph:
+
+def load_graph(instances_path: Path) -> Graph:
     g = Graph()
-    for p in (CORE, GOV, INSTANCES):
+    for p in (CORE, GOV, instances_path):
         if not p.exists():
             raise FileNotFoundError(f"Missing: {p}")
         g.parse(p.as_posix(), format="turtle")
@@ -146,9 +233,9 @@ def check_type(g: Graph, individual: URIRef, cls: URIRef) -> bool:
     return (individual, RDF["type"], cls) in g
 
 
-def run_mutation_test(gate_name: str, mutation: dict) -> dict:
+def run_mutation_test(category: dict, gate_name: str, mutation: dict) -> dict:
     """Remove one triple, add a replacement with wrong content, reason, check entailments."""
-    g = load_graph()
+    g = load_graph(category["instances"])
     remove_triple = mutation["remove"]
     add_triple = mutation["add"]
 
@@ -160,86 +247,93 @@ def run_mutation_test(gate_name: str, mutation: dict) -> dict:
     g.add(add_triple)
     reason(g)
 
-    system = ARCO["Sentinel_ID_System"]
+    system = category["system"]
     return {
         "gate": gate_name,
         "mutated": f"replaced <{o}> with <{add_triple[2]}>",
-        "AnnexIII1aApplicableSystem": check_type(g, system, ARCO["AnnexIII1aApplicableSystem"]),
+        "applicable": check_type(g, system, category["applicable_class"]),
         "HighRiskSystem": check_type(g, system, ARCO["HighRiskSystem"]),
     }
 
 
-def run_test(gate_name: str, triple_to_remove: tuple) -> dict:
+def run_test(category: dict, gate_name: str, triple_to_remove: tuple) -> dict:
     """Remove one triple, reason, check entailments."""
-    g = load_graph()
+    g = load_graph(category["instances"])
     s, p, o = triple_to_remove
 
-    # Verify the triple exists before removing
     if (s, p, o) not in g:
         return {"gate": gate_name, "error": f"Triple not found: ({s}, {p}, {o})"}
 
     g.remove((s, p, o))
     reason(g)
 
-    system = ARCO["Sentinel_ID_System"]
-    results = {
+    system = category["system"]
+    return {
         "gate": gate_name,
         "removed": f"<{s}> <{p}> <{o}>",
-        "AnnexIII1aApplicableSystem": check_type(g, system, ARCO["AnnexIII1aApplicableSystem"]),
+        "applicable": check_type(g, system, category["applicable_class"]),
         "HighRiskSystem": check_type(g, system, ARCO["HighRiskSystem"]),
     }
-    return results
 
 
-def main() -> None:
-    print("=" * 72)
-    print("ARCO GATE-REMOVAL REGRESSION TEST")
+def run_category(category: dict) -> bool:
+    """Run baseline + gate-removal + mutation tests for one Annex III category.
+
+    Returns True iff every assertion in this category passes.
+    """
+    label = category["label"]
+    applicable_name = category["applicable_class"].split("#")[-1]
+
+    print("\n" + "=" * 72)
+    print(label)
     print("=" * 72)
 
-    # Baseline: full graph should have both entailments
+    # Baseline
     print("\n--- BASELINE (all gates present) ---")
-    g_full = load_graph()
+    g_full = load_graph(category["instances"])
     initial = len(g_full)
     reason(g_full)
-    system = ARCO["Sentinel_ID_System"]
+    system = category["system"]
 
-    annex_ok = check_type(g_full, system, ARCO["AnnexIII1aApplicableSystem"])
+    applicable_ok = check_type(g_full, system, category["applicable_class"])
     hr_ok = check_type(g_full, system, ARCO["HighRiskSystem"])
 
     print(f" Triples: {initial} -> {len(g_full)}")
-    print(f" AnnexIII1aApplicableSystem: {annex_ok}")
+    print(f" {applicable_name}: {applicable_ok}")
     print(f" HighRiskSystem: {hr_ok}")
 
-    if not annex_ok or not hr_ok:
+    if not applicable_ok or not hr_ok:
         print("\nFAIL: Baseline entailments missing. Cannot test gate removal.")
-        sys.exit(1)
+        return False
     print(" Baseline: OK")
 
-    # Gate removal tests
     all_pass = True
-    for gate_name, triple in GATE_REMOVALS.items():
+
+    # Gate removal tests
+    for gate_name, triple in category["gate_removals"].items():
         print(f"\n--- {gate_name.upper()} ---")
-        result = run_test(gate_name, triple)
+        result = run_test(category, gate_name, triple)
 
         if "error" in result:
             print(f" ERROR: {result['error']}")
             all_pass = False
             continue
 
-        expected = EXPECTED[gate_name]
-        for cls_name, expected_val in expected.items():
-            actual = result[cls_name]
+        expected = category["expected"][gate_name]
+        for key, expected_val in expected.items():
+            cls_label = applicable_name if key == "applicable" else key
+            actual = result[key]
             status = "OK" if actual == expected_val else "FAIL"
             if status == "FAIL":
                 all_pass = False
-            print(f" {cls_name}: {actual} (expected {expected_val}) [{status}]")
+            print(f" {cls_label}: {actual} (expected {expected_val}) [{status}]")
 
-    # Mutation tests: wrong content (not just absence) also breaks the gate
+    # Mutation tests
     print("\n--- CONTENT-MUTATION TESTS ---")
-    for gate_name, mutation in GATE_MUTATIONS.items():
+    for gate_name, mutation in category["gate_mutations"].items():
         print(f"\n--- {gate_name.upper()} ---")
-        result = run_mutation_test(gate_name, mutation)
+        result = run_mutation_test(category, gate_name, mutation)
 
         if "error" in result:
             print(f" ERROR: {result['error']}")
@@ -248,19 +342,34 @@ def main() -> None:
 
         print(f" Mutation: {result['mutated']}")
         expected = mutation["expected"]
-        for cls_name, expected_val in expected.items():
-            actual = result[cls_name]
+        for key, expected_val in expected.items():
+            cls_label = applicable_name if key == "applicable" else key
+            actual = result[key]
             status = "OK" if actual == expected_val else "FAIL"
             if status == "FAIL":
                 all_pass = False
-            print(f" {cls_name}: {actual} (expected {expected_val}) [{status}]")
+            print(f" {cls_label}: {actual} (expected {expected_val}) [{status}]")
+
+    return all_pass
+
+
+def main() -> None:
+    print("=" * 72)
+    print("ARCO GATE-REMOVAL REGRESSION TEST")
+    print("=" * 72)
+
+    all_pass = True
+    for category in CATEGORIES:
+        if not run_category(category):
+            all_pass = False
 
-    # Summary
     print("\n" + "=" * 72)
     if all_pass:
         print("ALL GATE-REMOVAL TESTS PASSED")
-        print("Each gate is independently necessary for AnnexIII1aApplicableSystem.")
-        print("Wrong content (not just absence) also breaks the gate.")
+        print("Each gate is independently necessary for AnnexIII1aApplicableSystem")
+        print("and AnnexIII5bApplicableSystem. Wrong content (not just absence) also")
+        print("breaks the gate. Coverage is symmetric across the two modeled Annex III")
+        print("categories.")
     else:
         print("SOME GATE-REMOVAL TESTS FAILED")
         sys.exit(1)