feat: hybrid evidence engine — word-number amounts, time/id matching,… #1

Merged: AuvroIslam merged 1 commit into main from feat/hybrid-evidence-engine on Jun 26, 2026

@AuvroIslam

Owner

… guarded LLM case_type

Evidence reasoning (the 35-pt core):

  • normalize: parse word-number amounts ("five thousand", "panch hajar", "দুই হাজার"), scale words (5k / ২ লাখ), time-of-day + clock hour, and referenced merchant/agent/biller ids
  • matcher: disambiguate same-amount transfers by time-of-day / id; still returns null (insufficient_data) when nothing cleanly separates candidates
  • classifier: confidence-aware classify(); tighter phishing precision (strong vs weak signals); broader Banglish/English keyword coverage
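The word-number bullet above can be made concrete with a small parser. This is a minimal sketch, not the code in app/engine/normalize.py: the function name `parse_amounts`, the tiny lexicons, and the rule that bare number words (e.g. "one problem") are ignored unless a scale word follows are all assumptions made for illustration.

```python
from __future__ import annotations

import re

# Hypothetical mini-lexicons for illustration only; the real tables are not shown.
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "ten": 10,   # English
    "ek": 1, "dui": 2, "tin": 3, "char": 4, "panch": 5, "dosh": 10,    # Banglish
    "এক": 1, "দুই": 2, "তিন": 3, "চার": 4, "পাঁচ": 5, "দশ": 10,         # Bangla script
}
SCALES = {  # scale words that close the pending group into the running total
    "thousand": 1_000, "hajar": 1_000, "hazar": 1_000, "হাজার": 1_000,
    "lakh": 100_000, "lac": 100_000, "লাখ": 100_000,
}
_K_SUFFIX = re.compile(r"(\d+)k")  # "5k" -> 5000; \d also matches Bangla digits


def parse_amounts(text: str) -> list[int]:
    """Return amounts written as digits, "5k", or number words plus a scale word."""
    # Whitespace tokens, trailing punctuation trimmed, thousands commas removed ("5,000").
    tokens = [t.strip(",.;:!?").replace(",", "") for t in text.lower().split()]
    amounts: list[int] = []
    total = current = 0
    active = False   # currently inside a number phrase
    numeric = False  # phrase had digits or a scale word, so it is worth emitting

    def flush() -> None:
        nonlocal total, current, active, numeric
        if active and numeric:
            amounts.append(total + current)
        total = current = 0
        active = numeric = False

    for tok in tokens:
        k = _K_SUFFIX.fullmatch(tok)
        if k:
            flush()
            amounts.append(int(k.group(1)) * 1_000)
        elif tok.isdecimal():  # int() accepts Bangla digits such as "২"
            if not (active and current == 0 and total):  # "2 lakh 5 hajar" continues
                flush()
            current, active, numeric = int(tok), True, True
        elif tok in UNITS:
            current += UNITS[tok]
            active = True
        elif tok == "hundred" and active:
            current = max(current, 1) * 100  # multiplies the pending group only
            numeric = True
        elif tok in SCALES and active:
            total += max(current, 1) * SCALES[tok]
            current, numeric = 0, True
        else:
            flush()  # any other word ends the phrase
    flush()
    return amounts
```

Treating "hundred" as a multiplier on the pending group, while thousand/hajar/lakh flush that group into the running total, is what lets "two thousand five hundred" come out as 2500 and "ek lakh panch hajar" as 105000.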

Guarded LLM case_type reclassification (hybrid tail):

  • app/llm/classify.py: LLM picks case_type from the fixed enum ONLY for low-confidence cases with a concrete signal; validated against the enum, confidence-capped, falls back to the rule answer on any error
  • never touches relevant_transaction_id / verdict / severity / safety; does not fire on vague complaints with no concrete signal (those stay 'other')
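The guardrails listed above (enum validation, a confidence cap, fallback to the rule answer on any error) compose naturally into one wrapper. The sketch below is hypothetical: `guarded_reclassify`, the `ask_llm` callable, the threshold/timeout/cap defaults, and every `CaseType` member other than `phishing_or_social_engineering` and `other` are assumptions, not the contents of app/llm/classify.py. Because it returns only a case type and a confidence, it has no way to alter the matched transaction, verdict, severity, or safety flags.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from enum import Enum


class CaseType(str, Enum):
    # Only the first and last members appear in the PR; the middle ones are hypothetical.
    phishing_or_social_engineering = "phishing_or_social_engineering"
    wrong_transfer = "wrong_transfer"
    failed_payment = "failed_payment"
    other = "other"


async def guarded_reclassify(
    complaint: str,
    rule_case: CaseType,
    rule_conf: float,
    has_signal: bool,
    ask_llm: Callable[[str], Awaitable[tuple[str, float]]],
    threshold: float = 0.5,  # hypothetical confidence gate
    timeout_s: float = 3.0,  # hypothetical timeout
    conf_cap: float = 0.7,   # hypothetical ceiling on LLM confidence
) -> tuple[CaseType, float]:
    """Return (case_type, confidence); the rule answer wins on any doubt or error."""
    if rule_conf > threshold or not has_signal:
        return rule_case, rule_conf  # confident or vague: the LLM is not consulted
    try:
        label, conf = await asyncio.wait_for(ask_llm(complaint), timeout=timeout_s)
        suggested = CaseType(label.strip())  # ValueError for labels outside the enum
        capped = min(float(conf), conf_cap)  # never trust LLM confidence above the cap
    except Exception:  # timeout, transport, or parsing failure
        return rule_case, rule_conf
    return suggested, capped
```

Passing the LLM call in as `ask_llm` makes it easy to swap for a stub, which is the same idea as the monkeypatched guardrail tests this PR mentions.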

Safety: expand the unauthorized-promise denylist to cover conversational phrasings
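A denylist of this kind is typically a set of regular expressions run over the drafted reply. The patterns and the `unauthorized_promises` helper below are hypothetical illustrations of "conversational phrasings"; the actual list in app/engine/safety.py is not shown on this page.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration; the real denylist is not shown.
_PROMISE_PATTERNS = [
    re.compile(r"\bwe(?:'ll| will)\s+(?:definitely\s+)?refund\b", re.I),
    re.compile(r"\byou(?:'ll| will)\s+(?:surely\s+|definitely\s+)?get\s+(?:your\s+)?money\s+back\b", re.I),
    re.compile(r"\b(?:refund|reversal)\s+is\s+guaranteed\b", re.I),
    re.compile(r"\bdon'?t\s+worry\b.*\b(?:refund|money back)\b", re.I),
]


def unauthorized_promises(reply: str) -> list[str]:
    """Return the first denylisted phrase each pattern finds in a drafted reply."""
    return [m.group(0) for p in _PROMISE_PATTERNS if (m := p.search(reply))]
```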

Observability: per-decision audit-trail log + case_type/verdict counters
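Per-decision audit logging plus counters can be sketched with the standard library alone. The `Metrics.incr(name)` shape follows the `metrics.incr("classify.llm_override")` call visible in the app/pipeline.py thread below; the `record_decision` helper, the logger name, the `case_type.*` / `verdict.*` counter names, and the verdict value used in the test are assumptions.

```python
from __future__ import annotations

import json
import logging
from collections import Counter

audit_log = logging.getLogger("audit")  # hypothetical logger name


class Metrics:
    """In-process counters with an incr(name) method, as called in app/pipeline.py."""

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    def incr(self, name: str, n: int = 1) -> None:
        self._counts[name] += n

    def snapshot(self) -> dict[str, int]:
        return dict(self._counts)


metrics = Metrics()


def record_decision(case_id: str, case_type: str, verdict: str, signals: dict) -> str:
    """Bump per-label counters and emit one JSON audit line per decision."""
    metrics.incr(f"case_type.{case_type}")
    metrics.incr(f"verdict.{verdict}")
    line = json.dumps(
        {"case_id": case_id, "case_type": case_type, "verdict": verdict, "signals": signals},
        ensure_ascii=False,  # keep Bangla text readable in logs
        sort_keys=True,
    )
    audit_log.info("decision %s", line)
    return line
```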

Docs: rewrite README (requirement→solution table, decision/path notes, $0-cost story, colorful Mermaid diagrams); fix stale test counts; PRD §4.1 now matches the implemented matcher

Tests: 102 passing (was 82) incl. word-amount, time/id disambiguation, and monkeypatched LLM-reclassification guardrail tests; 10/10 samples exact with LLM on and off

Copilot AI review requested due to automatic review settings June 26, 2026 16:58
@AuvroIslam merged commit bba3c14 into main on Jun 26, 2026
3 checks passed

Copilot AI left a comment


Pull request overview

This PR extends the deterministic evidence engine to better ground complaints to real transactions by extracting richer complaint signals (scaled/word amounts, time-of-day, referenced ids) and using those signals to disambiguate same-amount candidates. It also adds confidence-aware classification plus an optional, rules-gated LLM case_type reclassification path, along with expanded safety promise detection, observability counters/audit logging, updated docs, and new tests.

Changes:

  • Add normalization for scaled + word-number amounts, time/day hints, and merchant/agent/biller id references; use id/time signals to disambiguate same-amount matches.
  • Introduce confidence-aware deterministic classification and an optional LLM case_type reclassification helper (enum-validated + timeout-capped), with corresponding tests.
  • Expand safety unauthorized-promise patterns; add metrics + audit-trail logging; refresh README/PRD and update test counts.
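The strong-vs-weak split behind the confidence-aware classify() can be illustrated with a simplified, text-only version. The real function takes `(norm, user_type)`; this sketch takes raw text, and the keyword lists, the `wrong_transfer` member, and the 0.9 / 0.5 / 0.2 confidence levels are all assumptions chosen for illustration.

```python
from __future__ import annotations

from enum import Enum


class CaseType(str, Enum):
    phishing_or_social_engineering = "phishing_or_social_engineering"
    wrong_transfer = "wrong_transfer"  # hypothetical member
    other = "other"


# Hypothetical keyword groups: strong cues justify a label, weak cues only hint at one.
STRONG = {
    CaseType.phishing_or_social_engineering: ("shared my otp", "asked for my pin", "otp dilam"),
    CaseType.wrong_transfer: ("wrong number", "bhul number", "sent by mistake"),
}
WEAK = {
    CaseType.phishing_or_social_engineering: ("otp", "call from", "link"),
    CaseType.wrong_transfer: ("mistake", "bhul"),
}


def classify(text: str) -> tuple[CaseType, float]:
    """Return (case_type, confidence): strong cue 0.9, weak cue 0.5, nothing 0.2."""
    t = text.lower()
    for case, cues in STRONG.items():
        if any(c in t for c in cues):
            return case, 0.9
    for case, cues in WEAK.items():
        if any(c in t for c in cues):
            return case, 0.5
    return CaseType.other, 0.2
```

Weak-only hits land at 0.5, so under a hypothetical `classify_confidence_threshold` of 0.5 they are the cases that would be eligible for the gated LLM pass, while strong hits are never sent.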

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Summary per file

| File | Description |
| --- | --- |
| app/engine/normalize.py | Adds extraction of scaled/word amounts, time signals, and referenced ids into Normalized. |
| app/engine/matcher.py | Adds id/time-based disambiguation logic and allows id as a fallback candidate signal. |
| app/engine/classifier.py | Adds classify() returning (CaseType, confidence) and refines keyword groups. |
| app/llm/classify.py | New: enum-validated LLM case_type classifier used for low-confidence cases. |
| app/pipeline.py | Integrates confidence-aware classify + optional LLM reclassification; adds counters and audit-trail logging. |
| app/engine/safety.py | Expands unauthorized-promise denylist patterns. |
| app/config.py | Adds settings knobs for LLM case_type reclassification thresholds/timeouts. |
| .env.example | Documents new env vars for LLM reclassification configuration. |
| tests/test_matcher.py | Adds tests for scaled/word amounts, time extraction, and id/time disambiguation behavior. |
| tests/test_classifier.py | Adds tests validating confidence levels for strong/weak signals. |
| tests/test_llm_reclassify.py | New: offline monkeypatched tests for the LLM reclassification guardrails. |
| README.md | Updates architecture/docs, test counts, and adds diagrams and requirement→solution mapping. |
| prd.md | Updates matcher specification to reflect the implemented id/time disambiguation behavior. |


Comment thread app/engine/matcher.py
Comment on lines +88 to +95
```python
scored = [(t, _hour_distance(h, norm.time_hour)) for t in candidates if (h := _txn_hour(t)) is not None]
if scored:
    scored.sort(key=lambda x: x[1])
    best_t, best_d = scored[0]
    second_d = scored[1][1] if len(scored) > 1 else 99
    if best_d <= 2 and second_d - best_d >= 3:
        flags.append("time_match")
        return best_t
```
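The time rule in this hunk accepts a candidate only if it is within 2 hours of the complaint's hour and beats the runner-up by at least 3 hours. The standalone sketch below mirrors those thresholds; the `Txn` dataclass and `pick_by_time` are hypothetical, and `hour_distance` assumes a circular 24-hour distance because the body of `_hour_distance` is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Txn:
    id: str
    hour: int | None  # hour of day 0-23, None if the timestamp is missing


def hour_distance(a: int, b: int) -> int:
    """Circular distance on a 24-hour clock (assumed behavior of _hour_distance)."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def pick_by_time(candidates: list[Txn], complaint_hour: int | None) -> Txn | None:
    if complaint_hour is None:
        return None
    scored = sorted(
        ((t, hour_distance(t.hour, complaint_hour)) for t in candidates if t.hour is not None),
        key=lambda x: x[1],
    )
    if not scored:
        return None
    best_t, best_d = scored[0]
    second_d = scored[1][1] if len(scored) > 1 else 99  # same sentinel as the hunk
    # Accept only a close match (<= 2 h) that beats the runner-up by >= 3 h.
    if best_d <= 2 and second_d - best_d >= 3:
        return best_t
    return None
```

Returning `None` when the margin is too small follows the commit message's rule that the matcher returns null (insufficient_data) when nothing cleanly separates candidates: transfers at 09:00 and 11:00 are equally close to a 10:00 complaint, so neither is picked.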
Comment thread app/engine/normalize.py
Comment on lines +312 to +315
```python
# Merchant / agent / biller identifiers referenced in the complaint text, e.g.
# "MERCHANT-7821", "AGENT-318", "BILLER-DESCO". Used as a counterparty signal.
_ID_TOKEN = re.compile(r"\b((?:MERCHANT|AGENT|BILLER|TXN)[-_][A-Z0-9]+)\b", re.IGNORECASE)
```
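Running the `_ID_TOKEN` pattern on sample text makes its behavior concrete. The alternation also accepts `TXN`, so transaction ids are collected together with merchant/agent/biller ids even though the comment describes only counterparties. The `extract_ids` and `split_ids` helpers, and the upper-case, hyphen-separated canonical form, are assumptions (that form matches the `.upper().replace("_", "-")` normalization in the matcher hunk below).

```python
from __future__ import annotations

import re

# Same pattern as in the hunk above.
_ID_TOKEN = re.compile(r"\b((?:MERCHANT|AGENT|BILLER|TXN)[-_][A-Z0-9]+)\b", re.IGNORECASE)


def extract_ids(text: str) -> list[str]:
    """Return referenced ids, upper-cased with '_' folded to '-' (assumed canonical form)."""
    return [m.upper().replace("_", "-") for m in _ID_TOKEN.findall(text)]


def split_ids(ids: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: separate counterparty ids from transaction ids."""
    counterparty = [i for i in ids if not i.startswith("TXN-")]
    txn = [i for i in ids if i.startswith("TXN-")]
    return counterparty, txn
```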

Comment thread app/engine/matcher.py
Comment on lines +67 to +68
```python
cp = t.counterparty.upper().replace("_", "-")
return any(cp == i or i in cp or cp in i for i in ids)
```
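The counterparty check in this hunk accepts equality or a substring in either direction, which is permissive: "AGENT-3" matches "AGENT-318", and an empty counterparty string matches every id because `"" in i` is always true. The sketch below reproduces that rule as `loose_match` and contrasts it with a hypothetical `strict_match` that compares whole canonical ids; it is one possible tightening, not necessarily what the review comment proposed.

```python
from __future__ import annotations


def _canon(raw: str) -> str:
    """Upper-case and fold '_' to '-', as the hunk does for counterparties."""
    return raw.upper().replace("_", "-")


def loose_match(counterparty: str, ids: list[str]) -> bool:
    """The hunk's rule: equality or substring in either direction."""
    cp = _canon(counterparty)
    return any(cp == i or i in cp or cp in i for i in ids)


def strict_match(counterparty: str, ids: list[str]) -> bool:
    """Hypothetical stricter rule: only whole, non-empty canonical ids may match."""
    cp = _canon(counterparty)
    return bool(cp) and any(cp == _canon(i) for i in ids)
```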
Comment thread app/pipeline.py
Comment on lines +156 to +159
```python
"signals": {
    "amounts": norm.amounts,
    "phones": norm.phones,
    "ids": norm.ids,
```
Comment thread app/pipeline.py
Comment on lines +63 to 82
```python
case_type, rule_conf = classifier.classify(norm, req.user_type)

# Optional LLM reclassification — case_type ONLY, low-confidence cases only,
# and only when there is a concrete signal (so genuinely vague complaints stay
# 'other'). Never touches the transaction match, verdict, severity or safety.
llm_reclassified = False
if (
    settings.llm_active
    and settings.llm_classify_enabled
    and rule_conf <= settings.classify_confidence_threshold
    and (norm.amounts or norm.phones or norm.ids
         or case_type == CaseType.phishing_or_social_engineering)
):
    suggested = await llm_classify_case_type(safe_complaint, settings)
    if suggested is not None and suggested != case_type:
        metrics.incr("classify.llm_override")
        case_type = suggested
        llm_reclassified = True

match = match_transaction(norm, txns, case_type)
```

Labels

None yet

Projects

None yet


2 participants