Trust-Aware Multi-Signal Document Retrieval
Graph-Based Compliance Scoring | Gap Attribution | EU AI Act Ready
Standard RAG fails for regulatory AI. Vector similarity treats "shall ensure compliance" and "may consider compliance" as identical. Scores are opaque. Knowledge is ephemeral. There's no audit trail.
TAMR+ fixes this: a three-stage pipeline in which 65% of retrieval scoring comes from structural signals, not vector similarity. Every score is explained. Every gap is attributed. Every response is auditable.
| Component | Description | Status |
|---|---|---|
| TRACE Scoring | 5-dimension compliance scoring mapped to EU AI Act articles | Spec + Formulas |
| Gap Attribution | 5-category taxonomy decomposing score gaps into actionable causes | Spec + Examples |
| EU-RegQA-100 | 100 regulatory questions across 5 difficulty tiers | Benchmark |
| MedRegQA-50 | 50 medical device regulation questions | Benchmark |
| FinRegQA-50 | 50 financial services regulation questions | Benchmark |
| CrimNet-50 | 50 law enforcement regulation questions | Benchmark |
| HashGNN | Training-free graph embeddings via MinHash (pure NumPy) | Reference Impl |
| Paper | arXiv preprint (v2.3) | PDF + LaTeX |
Pipeline: 207ms avg latency | $0.03/workspace | Zero LLM calls during retrieval
| System | EU-RegQA | MedRegQA | FinRegQA | CrimNet | Avg |
|---|---|---|---|---|---|
| TAMR+ v2.3 (3-hop) | 0.74 | 0.69 | 0.66 | 0.63 | 0.680 |
| TAMR+ v2.3 (1-hop) | 0.67 | 0.63 | 0.61 | 0.59 | 0.625 |
| GraphCompliance | 0.554 | --- | --- | --- | 0.554 |
| Vector-only RAG | 0.41 | 0.38 | 0.39 | 0.36 | 0.385 |
Ablation: Removing any single component degrades performance by 6-27%. Vector-only RAG (0.385) scores 43% below the full pipeline (0.680), p<0.001.
```
Query
  |
  v
[Stage 1: Document Manifest Selector]  ~10ms, zero LLM
  |  5 deterministic signals
  v
[Stage 2: Multi-Phase Retrieval]  ~275ms
  |  P1: Vector ANN (35%)
  |  P2: KG Alignment (30%)
  |  P3: Causal Density (10%)
  |  P4: Marginal Selection (15%)
  |  P5: SHA-256 Lineage (-10% redundancy)
  v
[Stage 3: TRACE + Gap Attribution]
  |  T: Transparency (Art. 13, 50)
  |  R: Reasoning (Art. 15)
  |  A: Auditability (Art. 51)
  |  C: Compliance (Art. 9, 14, 26)
  |  E: Explainability (Art. 13)
  v
Score + Gap Attribution + Confidence Tier
```
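As a rough sketch of how the Stage 2 phase weights might combine (the function name and signal values are ours; the actual formulas are in the proprietary pipeline, and we assume here that each signal is normalized to [0, 1]):

```python
def stage2_score(vec_sim, kg_align, causal_density, marginal_gain, redundancy):
    """Combine the five Stage 2 phase signals with the published weights.

    The SHA-256 lineage check contributes a redundancy *penalty*,
    which is why its weight is negative.
    """
    return (0.35 * vec_sim          # P1: vector ANN similarity
          + 0.30 * kg_align         # P2: knowledge-graph alignment
          + 0.10 * causal_density   # P3: causal density
          + 0.15 * marginal_gain    # P4: marginal selection
          - 0.10 * redundancy)      # P5: lineage-detected redundancy

score = stage2_score(0.8, 0.7, 0.5, 0.6, 0.2)  # -> 0.61
```

Note that only 35% of the combined score comes from vector similarity; the remaining 65% is structural.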
Every response gets a deterministic score mapped to EU AI Act articles:
```python
TRACE = (T + R + A + C + E) / 5  # each dimension in [0, 1]
```

| Tier | Score | Meaning | Action |
|---|---|---|---|
| GREEN | >= 0.76 | Very high | Autonomous |
| BLUE | 0.66-0.75 | High | Optional review |
| YELLOW | 0.50-0.65 | Moderate | Mandatory review |
| RED | 0.20-0.49 | Low | Expert review |
| GRAY | < 0.20 | Insufficient | Blocked |
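A minimal sketch of the tier routing implied by the table (thresholds taken directly from it; the function name is ours, and we read the 0.66-0.75 band as "at least 0.66 but below 0.76"):

```python
def trace_tier(score: float) -> str:
    """Map a TRACE score in [0, 1] to a confidence tier."""
    if score >= 0.76:
        return "GREEN"   # very high -> autonomous
    if score >= 0.66:
        return "BLUE"    # high -> optional review
    if score >= 0.50:
        return "YELLOW"  # moderate -> mandatory review
    if score >= 0.20:
        return "RED"     # low -> expert review
    return "GRAY"        # insufficient -> blocked
```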
The gap is the feature: A 67% score with full gap attribution (SCG 42%, PKC 28%, DLT 8%, ADG 12%, FSC 10%) tells you exactly what to fix.
- Link Prediction for Gap Detection — Graph-based regulatory gap prediction (AUC-ROC 0.847)
- Multi-Signal Scoring — 65% structural signals, ablation-validated
- HashGNN Embeddings — 128-dim via MinHash, no GPU, no training, pure NumPy
- Cross-Domain Benchmarks — 250 questions across 4 regulatory domains
- Multi-Hop Traversal — Decay-weighted scoring, entity coverage 63.6% to 84.1%
- Cypher-Native GraphRAG — Single-query vector + graph retrieval
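One way to read decay-weighted multi-hop scoring, sketched under our own assumptions (exponential decay per hop; the decay rate, function name, and match counts are illustrative, not the production formula):

```python
def multihop_score(hop_matches, decay=0.5):
    """Score entity coverage over successive graph hops.

    hop_matches[h] is the number of query entities matched at hop h
    (0-indexed); each hop's contribution is discounted by decay**h,
    so distant hops add evidence but never dominate hop 0.
    """
    return sum(m * decay**h for h, m in enumerate(hop_matches))

# 1-hop-only vs. 3-hop traversal over the same (made-up) match counts
one_hop = multihop_score([4])        # -> 4.0
three_hop = multihop_score([4, 3, 2])  # -> 6.0
```

Deeper traversal strictly increases the score here, which is consistent with the 3-hop configuration outperforming 1-hop in the benchmark table.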
```python
import json

# Load EU-RegQA-100
with open("benchmarks/eu-regqa-100/eu_regqa_100.json") as f:
    questions = json.load(f)

# Evaluate your RAG system
for q in questions:
    response = your_system.query(q["question"])
    # Score against ground truth using the TRACE methodology
```

See trace-scoring/spec.md for the complete specification. All formulas are deterministic and can be implemented in any language.
```python
from hashgnn import HashGNN

model = HashGNN(dim=128, metapaths=4, rounds=3)
embeddings = model.fit_transform(knowledge_graph)
# embeddings: {node_id: np.array([0, 1, 0, 1, ...], dtype=bool)}
```

We report production scores (60-74%) alongside the 97% theoretical ceiling. The 20+ percentage point gap is not hidden but analyzed, explained, and attributed:
- Source Coverage Gap (42%): Small workspace (4 docs). Fix: add more documents.
- Parametric Knowledge Cost (28%): LLM fills gaps. Fix: domain-specific sources.
- Domain Language Tax (8%): Regulatory vocabulary. Fix: glossary expansion.
- Attribution Density Gap (12%): Formatting over evidence. Fix: citation improvements.
- Structural Ceiling (10%): Irreducible (3% system-wide). Disclosed per Art. 13.
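A sketch of how the attribution percentages decompose an absolute score gap (the shares and the 0.97 ceiling are from the text above; the dict keys abbreviate the five categories, and the function name is ours):

```python
GAP_SHARES = {"SCG": 0.42, "PKC": 0.28, "DLT": 0.08, "ADG": 0.12, "FSC": 0.10}

def attribute_gap(score, ceiling=0.97):
    """Split (ceiling - score) across the five gap categories."""
    gap = ceiling - score
    return {cat: round(share * gap, 4) for cat, share in GAP_SHARES.items()}

breakdown = attribute_gap(0.67)  # the 67% example from the text
# e.g. breakdown["SCG"] -> 0.126: 12.6 points of the 30-point gap
```

Because the shares sum to 1, the per-category contributions always reconstruct the full gap, which is what makes each category independently actionable.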
This repo contains research artifacts and a reference implementation. The production TAMR+ system (tracegov.ai) includes proprietary components not released here:
- Production pipeline source code and deployment infrastructure
- Neo4j Cypher query templates and graph schema
- Regex-based document classification rules
- Causal density computation internals
- SHA-256 lineage chain implementation
- Tier routing and escalation logic
See NOTICE for full details. Methods are protected by European Patent Applications EP26162901.8 and EP26166054.2.
| Feature | TAMR+ | RAGAS | DeepEval | COMPL-AI | GraphCompliance |
|---|---|---|---|---|---|
| Gap attribution | 5 categories | No | No | No | No |
| Predictive gaps | Yes | No | No | No | No |
| Formula-based (no ML) | Yes | No | No | Partial | Partial |
| EU AI Act mapping | 8/8 articles | 0/8 | 0/8 | 3/8 | 0/8 |
| Cross-domain | 4 domains | N/A | N/A | 1 | 1 |
| Audit trail | Yes (Art. 51) | No | No | No | No |
| Production deployed | Yes | N/A | N/A | No | No |
```bibtex
@article{kumar2026tamrplus,
  title={TAMR+: Trust-Aware Multi-Signal Document Retrieval with
         Graph-Based Compliance Scoring and Gap Attribution
         for Regulatory AI Systems},
  author={Kumar, Harish},
  journal={SSRN Electronic Journal},
  year={2026},
  note={European Patent Applications EP26162901.8 and EP26166054.2}
}
```

- Paper: SSRN 6359818
- Production: tracegov.ai
- Patents: EP26162901.8 (filed 2026-03-06) and EP26166054.2 (filed 2026-03-19)
- Company: Quantamix Solutions B.V.
Apache 2.0 — See LICENSE.
The methods are covered by European Patent Applications EP26162901.8 and EP26166054.2. The Apache 2.0 license includes a patent grant for use of the open-source components.
Built by Quantamix Solutions B.V. | Uithoorn, The Netherlands