quantamixsol/tamr-plus

TAMR+

Trust-Aware Multi-Signal Document Retrieval
Graph-Based Compliance Scoring | Gap Attribution | EU AI Act Ready


Standard RAG fails for regulatory AI. Vector similarity treats "shall ensure compliance" and "may consider compliance" as identical. Scores are opaque. Knowledge is ephemeral. There's no audit trail.

TAMR+ fixes this. A three-stage pipeline where 65% of retrieval scoring comes from structural signals, not vector similarity. Every score is explained. Every gap is attributed. Every response is auditable.

What's Inside

| Component | Description | Status |
|---|---|---|
| TRACE Scoring | 5-dimension compliance scoring mapped to EU AI Act articles | Spec + Formulas |
| Gap Attribution | 5-category taxonomy decomposing score gaps into actionable causes | Spec + Examples |
| EU-RegQA-100 | 100 regulatory questions across 5 difficulty tiers | Benchmark |
| MedRegQA-50 | 50 medical device regulation questions | Benchmark |
| FinRegQA-50 | 50 financial services regulation questions | Benchmark |
| CrimNet-50 | 50 law enforcement regulation questions | Benchmark |
| HashGNN | Training-free graph embeddings via MinHash (pure NumPy) | Reference Impl |
| Paper | arXiv preprint (v2.3) | PDF + LaTeX |

Key Results

Pipeline: 207ms avg latency | $0.03/workspace | Zero LLM calls during retrieval

| System | EU-RegQA | MedRegQA | FinRegQA | CrimNet | Avg |
|---|---|---|---|---|---|
| TAMR+ v2.3 (3-hop) | 0.74 | 0.69 | 0.66 | 0.63 | 0.680 |
| TAMR+ v2.3 (1-hop) | 0.67 | 0.63 | 0.61 | 0.59 | 0.625 |
| GraphCompliance | 0.554 | --- | --- | --- | 0.554 |
| Vector-only RAG | 0.41 | 0.38 | 0.39 | 0.36 | 0.385 |

Ablation: Removing any single component degrades performance by 6-27%. Vector-only scores 38.8% below the full pipeline (p<0.001).

Architecture

Query
  |
  v
[Stage 1: Document Manifest Selector]     ~10ms, zero LLM
  | 5 deterministic signals
  v
[Stage 2: Multi-Phase Retrieval]           ~275ms
  | P1: Vector ANN (35%)
  | P2: KG Alignment (30%)
  | P3: Causal Density (10%)
  | P4: Marginal Selection (15%)
  | P5: SHA-256 Lineage (-10% redundancy)
  v
[Stage 3: TRACE + Gap Attribution]
  | T: Transparency (Art. 13, 50)
  | R: Reasoning (Art. 15)
  | A: Auditability (Art. 51)
  | C: Compliance (Art. 9, 14, 26)
  | E: Explainability (Art. 13)
  v
Score + Gap Attribution + Confidence Tier
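
One plausible reading of the Stage 2 percentages is a weighted linear combination of per-chunk signals, with the lineage phase acting as a penalty. The sketch below illustrates that reading only; it is not the production scorer. The function name `stage2_score`, the signal keys, and the convention that every signal is normalized to [0, 1] are all assumptions made for illustration.

```python
# Weights are taken from the Stage 2 diagram; everything else is an assumption.
POSITIVE_WEIGHTS = {
    "vector": 0.35,          # P1: Vector ANN
    "kg_alignment": 0.30,    # P2: KG Alignment
    "causal_density": 0.10,  # P3: Causal Density
    "marginal": 0.15,        # P4: Marginal Selection
}
REDUNDANCY_WEIGHT = 0.10     # P5: subtracted for redundant content


def stage2_score(signals, redundancy=0.0):
    """Combine normalized signals into one score (hypothetical linear form)."""
    for name, value in {**signals, "redundancy": redundancy}.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Missing signals count as 0.0; unknown keys are ignored.
    positive = sum(w * signals.get(name, 0.0) for name, w in POSITIVE_WEIGHTS.items())
    return positive - REDUNDANCY_WEIGHT * redundancy


chunk = {"vector": 0.80, "kg_alignment": 0.90, "causal_density": 0.50, "marginal": 0.60}
print(round(stage2_score(chunk, redundancy=0.2), 3))
```

Under these weights, the non-vector phases (0.30 + 0.10 + 0.15 + 0.10) carry 0.65 of the total weight, which is consistent with the 65% structural-signal figure quoted in the introduction.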

TRACE Scoring

Every response gets a deterministic score mapped to EU AI Act articles:

TRACE = (T + R + A + C + E) / 5  # Each dimension in [0, 1]

| Tier | Score | Meaning | Action |
|---|---|---|---|
| GREEN | >= 0.76 | Very high | Autonomous |
| BLUE | 0.66-0.75 | High | Optional review |
| YELLOW | 0.50-0.65 | Moderate | Mandatory review |
| RED | 0.20-0.49 | Low | Expert review |
| GRAY | < 0.20 | Insufficient | Blocked |
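
The formula and tier table above can be sketched in a few lines. This is a minimal illustration, not the specification in trace-scoring/spec.md. The function names are hypothetical, and because the table leaves small gaps between bands (for example between 0.75 and 0.76), the sketch assumes each tier's lower bound acts as a threshold.

```python
# (lower bound, tier, action), checked from highest to lowest.
# Treating lower bounds as thresholds is an assumption about the tier table.
TIERS = [
    (0.76, "GREEN", "Autonomous"),
    (0.66, "BLUE", "Optional review"),
    (0.50, "YELLOW", "Mandatory review"),
    (0.20, "RED", "Expert review"),
]


def trace_score(t, r, a, c, e):
    """Unweighted mean of the five TRACE dimensions, each in [0, 1]."""
    dims = (t, r, a, c, e)
    if any(not 0.0 <= d <= 1.0 for d in dims):
        raise ValueError("each TRACE dimension must be in [0, 1]")
    return sum(dims) / 5


def confidence_tier(score):
    """Return (tier, action) for a TRACE score."""
    for lower, tier, action in TIERS:
        if score >= lower:
            return tier, action
    return "GRAY", "Blocked"


score = trace_score(0.8, 0.7, 0.6, 0.7, 0.7)
print(round(score, 2), confidence_tier(score))
```

A response with these dimension values averages to about 0.70 and lands in the BLUE tier.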

The gap is the feature: a 67% score with full gap attribution (SCG 42%, PKC 28%, DLT 8%, ADG 12%, FSC 10%; the categories are spelled out under Honest Disclosure) tells you exactly what to fix.

Technical Innovations (6 Groups, 18 Patent Claims)

  1. Link Prediction for Gap Detection — Graph-based regulatory gap prediction (AUC-ROC 0.847)
  2. Multi-Signal Scoring — 65% structural signals, ablation-validated
  3. HashGNN Embeddings — 128-dim via MinHash, no GPU, no training, pure NumPy
  4. Cross-Domain Benchmarks — 250 questions across 4 regulatory domains
  5. Multi-Hop Traversal — Decay-weighted scoring, entity coverage 63.6% to 84.1%
  6. Cypher-Native GraphRAG — Single-query vector + graph retrieval
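
To make item 5 concrete, here is a minimal sketch of decay-weighted multi-hop traversal. It is an illustration of the general technique, not the TAMR+ implementation: the adjacency-dict graph format, the function name `decay_weighted_hops`, and the decay factor of 0.5 are all assumptions.

```python
from collections import deque


def decay_weighted_hops(graph, seeds, max_hops=3, decay=0.5):
    """Breadth-first traversal; a node first reached at hop h scores decay**h."""
    scores = {seed: 1.0 for seed in seeds}  # seeds are hop 0
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, hop = frontier.popleft()
        if hop == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in scores:  # BFS reaches each node by its shortest path first
                scores[neighbor] = decay ** (hop + 1)
                frontier.append((neighbor, hop + 1))
    return scores


# Hypothetical entity chain: query entity -> article -> obligation -> annex -> guideline
graph = {"query": ["article"], "article": ["obligation"],
         "obligation": ["annex"], "annex": ["guideline"]}
print(decay_weighted_hops(graph, ["query"], max_hops=1))
print(decay_weighted_hops(graph, ["query"], max_hops=3))
```

Raising `max_hops` from 1 to 3 reaches more entities, each with a smaller weight, which is the mechanism by which multi-hop traversal can increase entity coverage.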

Quick Start

Use the Benchmarks

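import json

# Load EU-RegQA-100
with open("benchmarks/eu-regqa-100/eu_regqa_100.json", encoding="utf-8") as f:
    questions = json.load(f)

# Evaluate your RAG system.
# `your_system` is a placeholder: substitute any object exposing query(str).
responses = []
for q in questions:
    responses.append(your_system.query(q["question"]))
# Score each response against ground truth using the TRACE methodology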

Implement TRACE Scoring

See trace-scoring/spec.md for the complete specification. All formulas are deterministic and can be implemented in any language.

Run HashGNN

from hashgnn import HashGNN

model = HashGNN(dim=128, metapaths=4, rounds=3)
embeddings = model.fit_transform(knowledge_graph)
# embeddings: {node_id: np.array([0,1,0,1,...], dtype=bool)}
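
For intuition about how training-free binary embeddings can arise from MinHash, the following is a simplified neighbor-aggregation sketch. It is not the `hashgnn` module from this repository and uses only the standard library rather than NumPy so that it runs anywhere. The function name `minhash_embed`, the `samples` parameter, and the input format (neighbor lists plus an initial set of active bits per node) are assumptions, and metapaths are ignored.

```python
import random


def minhash_embed(neighbors, init_bits, dim=128, rounds=3, samples=8, seed=0):
    """Each round, every node keeps the min-hashed bits of itself and its neighbors."""
    rng = random.Random(seed)
    emb = {node: set(bits) for node, bits in init_bits.items()}
    for _ in range(rounds):
        new = {node: set() for node in emb}
        for _ in range(samples):
            ranks = list(range(dim))
            rng.shuffle(ranks)  # ranks[i] plays the role of a random hash of bit i
            for node in emb:
                pool = set(emb[node])
                for other in neighbors.get(node, ()):
                    pool |= emb[other]
                if pool:
                    # MinHash step: keep the active bit with the smallest hash
                    new[node].add(min(pool, key=ranks.__getitem__))
        emb = new
    # Expand each set of active bits into a fixed-length boolean vector
    return {node: [i in bits for i in range(dim)] for node, bits in emb.items()}


neighbors = {"a": ["b"], "b": ["a"], "c": []}
init_bits = {"a": {1, 7}, "b": {7, 42}, "c": {5}}
vectors = minhash_embed(neighbors, init_bits)
print({node: [i for i, bit in enumerate(vec) if bit] for node, vec in vectors.items()})
```

No parameters are learned: the only randomness is the seeded hash ordering, so the same graph and seed always yield the same vectors, and nodes with overlapping neighborhoods tend to share active bits.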

Honest Disclosure

We report production scores (60-74%) alongside the 97% theoretical ceiling. The 20+ percentage point gap is not hidden but analyzed, explained, and attributed:

  • Source Coverage Gap (42%): Small workspace (4 docs). Fix: add more documents.
  • Parametric Knowledge Cost (28%): LLM fills gaps. Fix: domain-specific sources.
  • Domain Language Tax (8%): Regulatory vocabulary. Fix: glossary expansion.
  • Attribution Density Gap (12%): Formatting over evidence. Fix: citation improvements.
  • Structural Ceiling (10%): Irreducible (3% system-wide). Disclosed per Art. 13.
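
The five shares above can be turned into percentage points for a given score. The sketch below assumes the shares are fractions of the distance between the score and 1.0; that interpretation, the function name `gap_in_points`, and the `target` parameter are assumptions rather than part of the published specification.

```python
# Shares as listed above; they sum to 100% of the gap.
GAP_SHARES = {
    "Source Coverage Gap": 0.42,
    "Parametric Knowledge Cost": 0.28,
    "Domain Language Tax": 0.08,
    "Attribution Density Gap": 0.12,
    "Structural Ceiling": 0.10,
}


def gap_in_points(score, shares, target=1.0):
    """Split the distance between score and target according to the shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    gap = target - score
    return {category: gap * share for category, share in shares.items()}


for category, points in gap_in_points(0.67, GAP_SHARES).items():
    print(f"{category}: {points * 100:.1f} points")
```

Under this reading, a 0.67 score leaves a 33-point gap, of which the Source Coverage Gap accounts for roughly 13.9 points and the Structural Ceiling for roughly 3.3.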

What's NOT Included (Proprietary)

This repo contains research artifacts and a reference implementation. The production TAMR+ system (tracegov.ai) includes proprietary components not released here:

  • Production pipeline source code and deployment infrastructure
  • Neo4j Cypher query templates and graph schema
  • Regex-based document classification rules
  • Causal density computation internals
  • SHA-256 lineage chain implementation
  • Tier routing and escalation logic

See NOTICE for full details. Methods are protected by European Patent Applications EP26162901.8 and EP26166054.2.

Comparison with Existing Frameworks

| Feature | TAMR+ | RAGAS | DeepEval | COMPL-AI | GraphCompliance |
|---|---|---|---|---|---|
| Gap attribution | 5 categories | No | No | No | No |
| Predictive gaps | Yes | No | No | No | No |
| Formula-based (no ML) | Yes | No | No | Partial | Partial |
| EU AI Act mapping | 8/8 articles | 0/8 | 0/8 | 3/8 | 0/8 |
| Cross-domain | 4 domains | N/A | N/A | 1 | 1 |
| Audit trail | Yes (Art. 51) | No | No | No | No |
| Production deployed | Yes | N/A | N/A | No | No |

Citation

@article{kumar2026tamrplus,
  title={TAMR+: Trust-Aware Multi-Signal Document Retrieval with
         Graph-Based Compliance Scoring and Gap Attribution
         for Regulatory AI Systems},
  author={Kumar, Harish},
  journal={SSRN Electronic Journal},
  year={2026},
  note={European Patent Applications EP26162901.8 and EP26166054.2}
}

License

Apache 2.0 — See LICENSE.

The methods are covered by European Patent Applications EP26162901.8 and EP26166054.2. The Apache 2.0 license includes a patent grant for use of the open-source components.


Built by Quantamix Solutions B.V. | Uithoorn, The Netherlands