feat: hybrid evidence engine — word-number amounts, time/id matching,… by AuvroIslam · Pull Request #1 · AuvroIslam/SustHackathonPreli

AuvroIslam · 2026-06-26T16:58:24Z

… guarded LLM case_type

Evidence reasoning (the 35-pt core):

- normalize: parse word-number amounts ("five thousand", "panch hajar", "দুই হাজার"), scale words (5k / ২ লাখ), time-of-day + clock hour, and referenced merchant/agent/biller ids
- matcher: disambiguate same-amount transfers by time-of-day / id; still returns null (insufficient_data) when nothing cleanly separates candidates
- classifier: confidence-aware classify(); tighter phishing precision (strong vs weak signals); broader Banglish/English keyword coverage
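As an illustration of the amount normalization described above, here is a minimal sketch, not the code in `app/engine/normalize.py`. The function name `parse_amount` and the small lookup tables are assumptions made for this example; it only handles a single "number + scale word" phrase, not plain numerals or longer phrases.

```python
import re
from typing import Optional

# Map Bengali digits to ASCII digits so "২" is read as "2".
_BN_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")

# Deliberately tiny, illustrative vocabularies (English, Banglish, Bengali).
_UNITS = {"two": 2, "dui": 2, "দুই": 2, "five": 5, "panch": 5}
_SCALES = {
    "k": 1_000, "thousand": 1_000, "hajar": 1_000, "হাজার": 1_000,
    "lakh": 100_000, "লাখ": 100_000,
}


def parse_amount(phrase: str) -> Optional[int]:
    """Parse phrases like '5k', '২ লাখ' or 'five thousand'; None if unknown."""
    text = phrase.strip().lower().translate(_BN_DIGITS)

    # Case 1: digits followed by a scale word, e.g. "5k" or "2 লাখ".
    m = re.fullmatch(r"(\d+)\s*(\D+)", text)
    if m:
        head, scale = int(m.group(1)), m.group(2).strip()
    else:
        # Case 2: a number word followed by a scale word, e.g. "panch hajar".
        parts = text.split()
        if len(parts) != 2 or parts[0] not in _UNITS:
            return None
        head, scale = _UNITS[parts[0]], parts[1]

    factor = _SCALES.get(scale)
    return head * factor if factor else None
```

Returning `None` for anything outside the vocabulary keeps the sketch consistent with the PR's stance of preferring "no signal" over a guessed amount.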

Guarded LLM case_type reclassification (hybrid tail):

- app/llm/classify.py: LLM picks case_type from the fixed enum ONLY for low-confidence cases with a concrete signal; validated against the enum, confidence-capped, falls back to the rule answer on any error
- never touches relevant_transaction_id / verdict / severity / safety; does not fire on vague no-signal complaints (stay 'other')
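The "validated against the enum, falls back to the rule answer on any error" guardrail can be sketched roughly as follows. This is an assumption-laden illustration rather than the contents of `app/llm/classify.py`: the helper names `parse_case_type` and `reclassify` are hypothetical, the `CaseType` enum below lists only the two members visible in this PR, and the confidence cap is not modeled.

```python
from enum import Enum
from typing import Callable, Optional


class CaseType(str, Enum):
    # Only the members visible in this PR; the real enum has more.
    phishing_or_social_engineering = "phishing_or_social_engineering"
    other = "other"


def parse_case_type(raw: object) -> Optional[CaseType]:
    """Accept the model output only if it is exactly an enum value."""
    if not isinstance(raw, str):
        return None
    try:
        return CaseType(raw.strip().lower())
    except ValueError:
        return None


def reclassify(
    call_llm: Callable[[str], object],  # hypothetical client callable
    complaint: str,
    rule_case: CaseType,
) -> CaseType:
    """Return the LLM suggestion if valid, otherwise the rule-based answer."""
    try:
        raw = call_llm(complaint)
    except Exception:
        # Any client error (timeout, network, bad response) keeps the rule answer.
        return rule_case
    suggested = parse_case_type(raw)
    return rule_case if suggested is None else suggested
```

Because the function can only ever return a `CaseType`, free-form model text cannot leak into the response, whatever the model produces.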

Safety: expand the unauthorized-promise denylist (conversational phrasings)

Observability: per-decision audit-trail log + case_type/verdict counters

Docs: rewrite README (requirement→solution table, decision/path notes, $0-cost story, colorful Mermaid diagrams); fix stale test counts; PRD §4.1 now matches the implemented matcher

Tests: 102 passing (was 82) incl. word-amount, time/id disambiguation, and monkeypatched LLM-reclassification guardrail tests; 10/10 samples exact with LLM on and off


Copilot

Pull request overview

This PR extends the deterministic evidence engine to better ground complaints to real transactions by extracting richer complaint signals (scaled/word amounts, time-of-day, referenced ids) and using those signals to disambiguate same-amount candidates. It also adds confidence-aware classification plus an optional, rules-gated LLM case_type reclassification path, along with expanded safety promise detection, observability counters/audit logging, updated docs, and new tests.

Changes:

- Add normalization for scaled + word-number amounts, time/day hints, and merchant/agent/biller id references; use id/time signals to disambiguate same-amount matches.
- Introduce confidence-aware deterministic classification and an optional LLM case_type reclassification helper (enum-validated + timeout-capped), with corresponding tests.
- Expand safety unauthorized-promise patterns; add metrics + audit-trail logging; refresh README/PRD and update test counts.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| `app/engine/normalize.py` | Adds extraction of scaled/word amounts, time signals, and referenced ids into `Normalized`. |
| `app/engine/matcher.py` | Adds id/time-based disambiguation logic and allows id as a fallback candidate signal. |
| `app/engine/classifier.py` | Adds `classify()` returning `(CaseType, confidence)` and refines keyword groups. |
| `app/llm/classify.py` | New: enum-validated LLM `case_type` classifier used for low-confidence cases. |
| `app/pipeline.py` | Integrates confidence-aware classify + optional LLM reclassification; adds counters and audit-trail logging. |
| `app/engine/safety.py` | Expands unauthorized promise denylist patterns. |
| `app/config.py` | Adds settings knobs for LLM `case_type` reclassification thresholds/timeouts. |
| `.env.example` | Documents new env vars for LLM reclassification configuration. |
| `tests/test_matcher.py` | Adds tests for scaled/word amounts, time extraction, and id/time disambiguation behavior. |
| `tests/test_classifier.py` | Adds tests validating confidence levels for strong/weak signals. |
| `tests/test_llm_reclassify.py` | New: offline monkeypatched tests for the LLM reclassification guardrails. |
| `README.md` | Updates architecture/docs, test counts, and adds diagrams and requirement→solution mapping. |
| `prd.md` | Updates matcher specification to reflect the implemented id/time disambiguation behavior. |


+        scored = [(t, _hour_distance(h, norm.time_hour)) for t in candidates if (h := _txn_hour(t)) is not None]
+        if scored:
+            scored.sort(key=lambda x: x[1])
+            best_t, best_d = scored[0]
+            second_d = scored[1][1] if len(scored) > 1 else 99
+            if best_d <= 2 and second_d - best_d >= 3:
+                flags.append("time_match")
+                return best_t
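The hunk above accepts a time match only when the best candidate is within 2 hours of the hour mentioned in the complaint and beats the runner-up by at least 3 hours. The helpers `_hour_distance` and `_txn_hour` are not shown in this excerpt, so the following standalone sketch assumes a circular 24-hour distance and represents candidates as plain `(id, hour)` pairs; `pick_by_hour` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple


def hour_distance(a: int, b: int) -> int:
    """Circular distance between two clock hours (assumed behaviour)."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def pick_by_hour(
    candidates: Sequence[Tuple[str, Optional[int]]],
    complaint_hour: int,
    max_gap: int = 2,      # best candidate must be this close
    min_margin: int = 3,   # and this much closer than the runner-up
) -> Optional[str]:
    """Return the id of the clearly closest candidate, else None."""
    scored = sorted(
        (
            (txn_id, hour_distance(hour, complaint_hour))
            for txn_id, hour in candidates
            if hour is not None  # skip transactions without a usable hour
        ),
        key=lambda pair: pair[1],
    )
    if not scored:
        return None
    best_id, best_d = scored[0]
    # With a single candidate the sentinel 99 makes the margin test pass.
    second_d = scored[1][1] if len(scored) > 1 else 99
    if best_d <= max_gap and second_d - best_d >= min_margin:
        return best_id
    return None
```

For example, with a complaint hour of 10, candidates at 09:00 and 21:00 resolve to the 09:00 transaction, while candidates at 09:00 and 11:00 are equally close and yield `None`, which corresponds to the "nothing cleanly separates candidates" case in the description.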


+# Merchant / agent / biller identifiers referenced in the complaint text, e.g.
+# "MERCHANT-7821", "AGENT-318", "BILLER-DESCO". Used as a counterparty signal.
+_ID_TOKEN = re.compile(r"\b((?:MERCHANT|AGENT|BILLER|TXN)[-_][A-Z0-9]+)\b", re.IGNORECASE)
+
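To show what the pattern above captures, here is a small sketch that applies it to complaint text. The regular expression is copied from the hunk; the wrapper `extract_ids`, the upper-casing, the underscore-to-hyphen normalization, and the de-duplication are assumptions for illustration and may differ from what `normalize.py` does.

```python
import re
from typing import List

# Pattern copied from the diff hunk above.
_ID_TOKEN = re.compile(
    r"\b((?:MERCHANT|AGENT|BILLER|TXN)[-_][A-Z0-9]+)\b", re.IGNORECASE
)


def extract_ids(text: str) -> List[str]:
    """Return referenced ids in order of first appearance, normalized."""
    seen: List[str] = []
    for match in _ID_TOKEN.finditer(text):
        # Normalize case and separator so "merchant_7821" == "MERCHANT-7821".
        token = match.group(1).upper().replace("_", "-")
        if token not in seen:
            seen.append(token)
    return seen
```

Note that the pattern also accepts a `TXN` prefix, which the comment in the hunk does not mention.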


+    cp = t.counterparty.upper().replace("_", "-")
+    return any(cp == i or i in cp or cp in i for i in ids)
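The comparison above treats a referenced id as matching when it equals, contains, or is contained in the normalized counterparty. The sketch below reproduces that expression on plain strings to make its edge cases visible; `loose_match` and `exact_match` are hypothetical names, and `exact_match` is only one possible stricter variant, not something this PR implements.

```python
from typing import Iterable


def loose_match(counterparty: str, ids: Iterable[str]) -> bool:
    """Same expression as the hunk above, applied to plain strings."""
    cp = counterparty.upper().replace("_", "-")
    return any(cp == i or i in cp or cp in i for i in ids)


def exact_match(counterparty: str, ids: Iterable[str]) -> bool:
    """Hypothetical stricter variant: normalized equality only."""
    cp = counterparty.upper().replace("_", "-")
    return bool(cp) and cp in set(ids)
```

Two properties of the loose form follow directly from Python's substring semantics: a shorter id such as `AGENT-31` matches the counterparty `AGENT-318`, and an empty counterparty matches every id because `"" in s` is always true. Whether either case occurs in real data is not stated in this PR.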


+                "signals": {
+                    "amounts": norm.amounts,
+                    "phones": norm.phones,
+                    "ids": norm.ids,


+    case_type, rule_conf = classifier.classify(norm, req.user_type)
+
+    # Optional LLM reclassification — case_type ONLY, low-confidence cases only,
+    # and only when there is a concrete signal (so genuinely vague complaints stay
+    # 'other'). Never touches the transaction match, verdict, severity or safety.
+    llm_reclassified = False
+    if (
+        settings.llm_active
+        and settings.llm_classify_enabled
+        and rule_conf <= settings.classify_confidence_threshold
+        and (norm.amounts or norm.phones or norm.ids
+             or case_type == CaseType.phishing_or_social_engineering)
+    ):
+        suggested = await llm_classify_case_type(safe_complaint, settings)
+        if suggested is not None and suggested != case_type:
+            metrics.incr("classify.llm_override")
+            case_type = suggested
+            llm_reclassified = True
+
    match = match_transaction(norm, txns, case_type)
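The gate in the hunk above can be isolated as a pure predicate, which makes the "low confidence AND a concrete signal" rule easy to unit-test without an LLM. This is a sketch under assumptions: `Signals` and `should_ask_llm` are hypothetical names standing in for the project's `Normalized` object and settings, and the threshold value used in the usage note is arbitrary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Signals:
    """Stand-in for the fields of `norm` that the gate inspects."""
    amounts: List[float] = field(default_factory=list)
    phones: List[str] = field(default_factory=list)
    ids: List[str] = field(default_factory=list)


def should_ask_llm(
    llm_active: bool,
    classify_enabled: bool,
    rule_conf: float,
    threshold: float,
    signals: Signals,
    is_phishing: bool,
) -> bool:
    """True only for enabled, low-confidence cases with a concrete signal."""
    has_signal = bool(
        signals.amounts or signals.phones or signals.ids or is_phishing
    )
    return bool(
        llm_active
        and classify_enabled
        and rule_conf <= threshold
        and has_signal
    )
```

With a threshold of 0.5, a complaint carrying an amount and a rule confidence of 0.3 passes the gate, while a vague complaint with no amounts, phones, or ids does not, so it keeps its rule-based `case_type`.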


Copilot AI review requested due to automatic review settings June 26, 2026 16:58

Copilot started reviewing on behalf of AuvroIslam June 26, 2026 16:58

AuvroIslam merged commit bba3c14 into main Jun 26, 2026
3 checks passed

Copilot AI reviewed Jun 26, 2026

AuvroIslam merged 1 commit into main from feat/hybrid-evidence-engine
