Add transaction image scanner #24

Merged: andrewzolotukhin merged 9 commits into main from feat/transaction-image-scanner on Jun 5, 2026

@andrewzolotukhin (Contributor) commented Jun 4, 2026

Original request

Implement parsing transactions from invoice photos, screenshots, bank-app screenshots, and bank statements using a multimodal LLM. The scanner should receive existing categories and vendors as context, extract one or more draft transactions, parse amount/date/currency/category/vendor, suggest new categories/subcategories when needed, avoid creating transactions automatically, guide the user through one confirmation/discard step per transaction, and use prior scan corrections to improve future scans.

Follow-up requests addressed in this PR:

  • Diagnose and fix the scan failure seen in preview telemetry.
  • Increase the scan file size limit so images over 1 MB can scan.
  • Suggest category/subcategory creation when no existing category is a strong fit.
  • Internally mark transactions created from scans and retain the original uploaded image so users can review it later.
  • Move long-running scan work off the web HTTP request and show server progress over Cleverbrush Framework WebSocket subscriptions.
  • Fix preview nginx/API-port routing so websocket progress reaches the API service.

What changed

  • Added transactionScans contract routes for synchronous scan creation, asynchronous scan job start, progress subscription, and per-item decision recording.
  • Added scan schemas for draft transactions, field confidence, category/vendor matches, suggested category metadata, duplicate references, correction decisions, scan job tokens, and progress events.
  • Added API persistence for scan sessions/items and stored scan images. Confirmed scanned transactions are linked back to their scan item, and the original image can be reviewed from transaction history.
  • Added a multimodal OpenAI scanner that sends active categories, vendors, recent transactions, and prior scan correction examples with the uploaded image.
  • Added category/subcategory suggestion behavior when no existing category is a strong fit.
  • Increased scan image support to 10 MB and changed the web upload path to chunk images into sub-1 MB JSON requests so nginx request-size limits no longer reject 1 MB+ images before app code runs.
  • Added an API-side in-memory scan job store. The web route now uploads/assembles chunks, stores a local attachment token, starts an authenticated backend scan job, and returns { jobId, token } instead of waiting for OpenAI.
  • Added a public-but-token-scoped Cleverbrush subscription at /api/transaction-scans/jobs/progress. It streams queued/preparing/analyzing/saving/complete/failed events and carries the final scan result.
  • Updated the capture UI to show progress messages while scanning, then open the existing one-step-per-draft wizard when the subscription returns the final scan.
  • Added local/PR compose support for publishing the API port. PR API port is PR_ENV_PORT_BASE + 1000 + PR_NUMBER; PR web port remains PR_ENV_PORT_BASE + PR_NUMBER.
  • Changed scanned-image review to stream image bytes from a web route (/api/transactions/[transactionId]/scan-image) instead of returning multi-megabyte base64 through a Next server action.
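
The chunked upload path described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ScanChunk`, `splitIntoChunks`, `assembleChunks`, and the 700,000-character chunk budget are hypothetical names and values, chosen only so that each JSON request stays well below a 1 MB proxy limit.

```typescript
// Hypothetical chunk shape; the real request schema in this PR may differ.
interface ScanChunk {
    uploadId: string;
    index: number;
    total: number;
    data: string; // one slice of the base64-encoded image
}

// Assumed budget: keeps each JSON body comfortably below a 1 MB proxy limit.
const MAX_CHUNK_CHARS = 700_000;

// Client side: cut the encoded image into fixed-size slices.
function splitIntoChunks(
    uploadId: string,
    base64: string,
    maxChars: number = MAX_CHUNK_CHARS
): ScanChunk[] {
    if (maxChars <= 0) throw new Error("maxChars must be positive");
    const total = Math.max(1, Math.ceil(base64.length / maxChars));
    const chunks: ScanChunk[] = [];
    for (let index = 0; index < total; index++) {
        chunks.push({
            uploadId,
            index,
            total,
            data: base64.slice(index * maxChars, (index + 1) * maxChars)
        });
    }
    return chunks;
}

// Server side: reassemble once every chunk has arrived, in any order.
function assembleChunks(chunks: ScanChunk[]): string {
    const first = chunks[0];
    if (first === undefined) throw new Error("no chunks received");
    const parts = new Map<number, string>();
    for (const chunk of chunks) {
        if (chunk.total !== first.total || chunk.index < 0 || chunk.index >= first.total) {
            throw new Error("inconsistent chunk metadata");
        }
        parts.set(chunk.index, chunk.data);
    }
    let assembled = "";
    for (let index = 0; index < first.total; index++) {
        const part = parts.get(index);
        if (part === undefined) throw new Error(`missing chunk ${index}`);
        assembled += part;
    }
    return assembled;
}
```

Carrying `index` and `total` on every chunk lets the server detect missing or inconsistent parts before it hands the image to the scanner.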

Reasoning

The scanner keeps using the existing schema-first API and typed client patterns. The LLM result is still sanitized against database-owned categories/vendors before the UI sees it, and user decisions/corrections remain persisted for future prompt context.
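
As a rough sketch of what sanitizing an LLM result against database-owned categories and vendors can look like: the draft shape and the `sanitizeDraft` helper below are hypothetical and only illustrate the idea that any ID the model returns is checked against IDs the database actually owns.

```typescript
// Hypothetical draft shape; field names are illustrative, not the PR's schema.
interface DraftTransaction {
    amount: number;
    currency: string;
    categoryId: number | null;
    vendorId: number | null;
    suggestedCategoryName: string | null;
}

interface KnownIds {
    categoryIds: ReadonlySet<number>;
    vendorIds: ReadonlySet<number>;
}

// Keep an ID only if the database owns it; otherwise treat it as unmatched.
function keepKnown(id: number | null, known: ReadonlySet<number>): number | null {
    return id !== null && known.has(id) ? id : null;
}

function sanitizeDraft(draft: DraftTransaction, known: KnownIds): DraftTransaction {
    const categoryId = keepKnown(draft.categoryId, known.categoryIds);
    const vendorId = keepKnown(draft.vendorId, known.vendorIds);
    return {
        ...draft,
        categoryId,
        vendorId,
        // A suggested new category is only kept when no existing one matched.
        suggestedCategoryName: categoryId === null ? draft.suggestedCategoryName : null
    };
}
```

With this shape, a category ID invented by the model never reaches the UI; the draft falls back to "no match" plus whatever new-category suggestion the model offered.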

The repeated preview failures had two separate causes in telemetry: first a proxy/body-size rejection around 1 MB, then a web-route client timeout at 60 seconds while the API/OpenAI scan completed around 75 seconds. Chunked upload fixes the size rejection. Async scan jobs plus WebSocket progress remove the long web HTTP wait entirely and give the user useful server progress instead of a silent spinner.
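
A minimal sketch of the async job pattern, assuming an in-memory job object that records progress events and replays them to subscribers. The stage names come from this PR; `InMemoryScanJob`, `runScanJob`, and all messages except "Asking AI..." are hypothetical, and the real implementation delivers events over a Cleverbrush WebSocket subscription rather than plain callbacks.

```typescript
type ScanStage = "queued" | "preparing" | "analyzing" | "saving" | "complete" | "failed";

interface ProgressEvent<R> {
    stage: ScanStage;
    message: string;
    result?: R; // present only on "complete"
}

type Listener<R> = (event: ProgressEvent<R>) => void;

class InMemoryScanJob<R> {
    private readonly history: ProgressEvent<R>[] = [];
    private readonly listeners = new Set<Listener<R>>();

    // Replays past events so a late subscriber still sees the current state.
    subscribe(listener: Listener<R>): () => void {
        for (const event of this.history) listener(event);
        this.listeners.add(listener);
        return () => {
            this.listeners.delete(listener);
        };
    }

    publish(event: ProgressEvent<R>): void {
        this.history.push(event);
        for (const listener of this.listeners) listener(event);
    }
}

// Runs the slow scan outside the HTTP request that started the job.
async function runScanJob<R>(job: InMemoryScanJob<R>, analyze: () => Promise<R>): Promise<void> {
    job.publish({ stage: "queued", message: "Waiting to start" });
    try {
        job.publish({ stage: "preparing", message: "Preparing image" });
        job.publish({ stage: "analyzing", message: "Asking AI..." });
        const result = await analyze();
        job.publish({ stage: "saving", message: "Saving drafts" });
        job.publish({ stage: "complete", message: "Done", result });
    } catch (error) {
        const message = error instanceof Error ? error.message : "Scan failed";
        job.publish({ stage: "failed", message });
    }
}
```

The HTTP route that starts the job can return immediately without awaiting `runScanJob`, so a 75-second model call no longer has to fit inside a 60-second request timeout.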

The progress subscription uses a random job token rather than exposing the user API JWT to browser JavaScript. The token authorizes access to the result of one short-lived scan job and nothing else.
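
One way to implement a token-scoped job like this is sketched below with Node's built-in `node:crypto` module. The `createJob` and `authorizeSubscription` helpers, the `jobs` map, and the TTL parameter are assumptions for illustration; only the `{ jobId, token }` return shape is taken from this PR.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

interface JobRecord {
    token: string;
    expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store keyed by job ID.
const jobs = new Map<string, JobRecord>();

function createJob(ttlMs: number, now: number = Date.now()): { jobId: string; token: string } {
    const jobId = randomBytes(16).toString("hex");
    const token = randomBytes(32).toString("base64url");
    jobs.set(jobId, { token, expiresAt: now + ttlMs });
    return { jobId, token };
}

function authorizeSubscription(jobId: string, token: string, now: number = Date.now()): boolean {
    const job = jobs.get(jobId);
    if (job === undefined || now > job.expiresAt) return false;
    const expected = Buffer.from(job.token);
    const provided = Buffer.from(token);
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return expected.length === provided.length && timingSafeEqual(expected, provided);
}
```

Because the token is bound to a single job ID and expires, leaking it exposes at most one scan result, whereas a leaked JWT would expose the whole user API.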

The scan-image review route keeps browser auth server-side and avoids pushing a 5 MB+ base64 payload through React Server Actions, which was leaving the preview dialog stuck on "Loading image..." for large stored images.
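
To make the size argument concrete, here is the standard base64 length arithmetic applied to the 5,696,190-byte image reported in the QA notes of this PR. The helper name `base64Length` is illustrative; it counts encoded characters only and ignores any JSON or Server Action framing on top.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters, padding the last group.
function base64Length(byteLength: number): number {
    return 4 * Math.ceil(byteLength / 3);
}

const imageBytes = 5_696_190; // size reported for the QA image in this PR
const encodedChars = base64Length(imageBytes); // 7,594,920 characters
const overhead = encodedChars / imageBytes; // about 1.33x the raw size
```

Streaming the raw bytes from a route avoids that one-third inflation as well as the cost of serializing the whole string into a single response.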

Deployment / nginx

Preview nginx was updated on the PR environment host so websocket upgrades for scan progress route to the API container instead of the Next/web container:

  • /etc/nginx/pr-port.js now exports api_port, computed as PR_ENV_PORT_BASE + 1000 + PR_NUMBER.
  • /etc/nginx/sites-available/xpenser-pr-envs.cleverbrush.com now has exact HTTP/HTTPS routes for both /external-api/api/transaction-scans/jobs/progress and /external-api/api/transaction-scans/jobs/progress/.
  • Those routes proxy to http://10.200.1.2:$target_api_port/api/transaction-scans/jobs/progress$is_args$args with proxy_http_version 1.1, Upgrade, Connection, and proxy_read_timeout 120s.
  • nginx -t passes and nginx was reloaded.

The active PR deploy script on the app host was also updated so PR API containers expose API_PORT. For PR 24, the current container ports are:

  • pr24-web-1: 0.0.0.0:3024->3000/tcp
  • pr24-api-1: 0.0.0.0:4024->4000/tcp
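
The port formulas can be checked against these published ports. In the sketch below, `prPorts` is a hypothetical helper, and the base value 3000 is inferred from the PR 24 ports listed above rather than stated anywhere in this PR.

```typescript
interface PrPorts {
    web: number;
    api: number;
}

// Web port: PR_ENV_PORT_BASE + PR_NUMBER; API port: PR_ENV_PORT_BASE + 1000 + PR_NUMBER.
function prPorts(portBase: number, prNumber: number): PrPorts {
    return {
        web: portBase + prNumber,
        api: portBase + 1000 + prNumber
    };
}

// Assumption: PR_ENV_PORT_BASE = 3000, inferred from 3024 and 4024 for PR 24.
const ports = prPorts(3000, 24); // { web: 3024, api: 4024 }
```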

WebSocket probes against both the trailing-slash and non-trailing-slash progress URLs returned HTTP/1.1 101 Switching Protocols, followed by the expected token-scoped "job not found" event from the API for a fake job.

Screenshots / preview evidence

Preview URL: https://xpenser-pr-024.cleverbrush.com

Earlier preview QA used a generated PNG receipt for Fresh Market with date 2026-06-03, total $13.15 USD, and grocery line items. The scanner returned one high-confidence receipt draft, the wizard offered to create the suggested vendor, confirming the draft created the transaction, and the saved transaction appeared in /transactions.

Final preview QA for commit f75d90f used /tmp/xpenser-large-hardware-receipt.jpg, a 5.43 MB JPEG:

  • Chunk upload completed successfully.
  • WebSocket progress reached the UI (Asking AI...) and no longer failed at the web route timeout.
  • The wizard returned one high-confidence receipt draft for Lakeside Hardware, USD 30.36, dated 2026-06-04 10:42.
  • Confirm/save completed and /transactions showed the saved row marked Scanned with amount -$30.36.
  • The scanned image review route returned 200 image/jpeg with 5,696,190 bytes.
  • The review dialog rendered the original image at natural dimensions 3400x4400.

Screenshots captured during QA:

  • /tmp/xpenser-pr-024-scan-wizard.png
  • /tmp/xpenser-pr-024-scan-reviewed.png
  • /tmp/xpenser-pr-024-transactions.png
  • /tmp/xpenser-pr-024-timeout-fix-scan-wizard.png
  • /tmp/xpenser-pr-024-scan-image-review.png

Validation

Local validation for commit f75d90f:

  • npm run lint passed.
  • npm run typecheck passed: 11 Turbo tasks successful.
  • npm test passed: 52 files, 289 tests.

GitHub workflow 26939221086 for commit f75d90f:

  • Lint and test passed.
  • Deploy PR environment passed.
  • Playwright e2e passed.

Telemetry findings that drove this change:

  • Large-image preview attempt showed POST /api/transaction-scans rejected with 413 before API tracing, consistent with a proxy/body-size limit.
  • Later SigNoz trace 2854a12f44815d40e056bd1953e6e3ee showed the web POST /api/transaction-scans returning 500 after a 60000ms timeout, while the API createTransactionScan call completed successfully after about 74.6s.
  • A later 2-hour SigNoz window still included the old timeout trace and one transient OpenAI 500 span from before the final redeploy.

Post-deploy SigNoz verification for PR 24:

  • Last 15 minutes after final deploy: no trace errors for xpenser-web-pr-24 or xpenser-api-pr-24.
  • Last 15 minutes after final deploy: no ERROR/FATAL logs for either service.
  • Successful traces recorded for GET /api/transactions/[transactionId]/scan-image, the backend API fetch to /api/transactions/8134/scan-image, and scanner-related routes.
  • Inbound HTTP server metrics show recent xpenser-api-pr-24 activity in the same window.

@andrewzolotukhin temporarily deployed to pr-24 with GitHub Actions (all deployments now inactive) at:

  • June 4, 2026 05:01
  • June 4, 2026 05:28
  • June 4, 2026 06:21
  • June 4, 2026 06:42
  • June 4, 2026 07:00
  • June 4, 2026 07:08
  • June 4, 2026 07:29
  • June 4, 2026 08:08
  • June 5, 2026 03:07

@andrewzolotukhin merged commit 1b875a2 into main on Jun 5, 2026 (4 checks passed).

@andrewzolotukhin deleted the feat/transaction-image-scanner branch on June 5, 2026 07:27.