Add transaction image scanner #24
Merged
Original request
Implement parsing transactions from invoice photos, screenshots, bank-app screenshots, and bank statements using a multimodal LLM. The scanner should receive existing categories and vendors as context, extract one or more draft transactions, parse amount/date/currency/category/vendor, suggest new categories/subcategories when needed, avoid creating transactions automatically, guide the user through one confirmation/discard step per transaction, and use prior scan corrections to improve future scans.
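
The review loop described above can be sketched in TypeScript. This sketch assumes a hypothetical `DraftTransaction` shape and `ScanReview` class, neither taken from the PR's code. Drafts are only held in memory, and each one needs an explicit confirm or discard before the next is shown.

```typescript
// Hypothetical draft shape; the PR's actual contract types may differ.
type DraftTransaction = {
  amount: number;
  currency: string;
  date: string; // ISO date, e.g. "2026-06-03"
  vendor?: string;
  category?: string;
};

type Decision = "confirmed" | "discarded";

// Steps through drafts one at a time. Nothing is persisted here: confirmed
// drafts are handed back to the caller, which decides whether to create them.
class ScanReview {
  private readonly drafts: readonly DraftTransaction[];
  private readonly decisions: Decision[] = [];

  constructor(drafts: readonly DraftTransaction[]) {
    this.drafts = drafts;
  }

  // The draft currently awaiting a decision, or undefined when all are reviewed.
  current(): DraftTransaction | undefined {
    return this.drafts[this.decisions.length];
  }

  isDone(): boolean {
    return this.decisions.length >= this.drafts.length;
  }

  decide(decision: Decision): void {
    if (this.isDone()) throw new Error("No drafts left to review");
    this.decisions.push(decision);
  }

  confirmed(): DraftTransaction[] {
    return this.drafts.filter((_, i) => this.decisions[i] === "confirmed");
  }

  // Per-item decisions, in draft order.
  history(): readonly Decision[] {
    return this.decisions;
  }
}
```

Keeping decisions as a plain per-index list makes it straightforward, in this sketch, to send them back as a per-item decision record for use as context in later scans.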
Follow-up requests addressed in this PR:
What changed
- `transactionScans` contract routes for synchronous scan creation, asynchronous scan job start, progress subscription, and per-item decision recording.
- Starting an async scan job returns `{ jobId, token }` instead of waiting for OpenAI.
- Scan progress is available over a WebSocket at `/api/transaction-scans/jobs/progress`, which streams queued/preparing/analyzing/saving/complete/failed events and carries the final scan result.
- The PR API port is `PR_ENV_PORT_BASE + 1000 + PR_NUMBER`; the PR web port remains `PR_ENV_PORT_BASE + PR_NUMBER`.
- The scan-image review dialog loads stored images through a web route (`/api/transactions/[transactionId]/scan-image`) instead of returning multi-megabyte base64 through a Next server action.

Reasoning
The scanner keeps using the existing schema-first API and typed client patterns. The LLM result is still sanitized against database-owned categories/vendors before the UI sees it, and user decisions/corrections remain persisted for future prompt context.
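
One way to sanitize model output against database-owned rows is sketched below, under stated assumptions. Names are matched case-insensitively after trimming, and anything unmatched is kept only as a suggestion for the user to accept. `sanitizeDraft`, `LlmDraft`, and all field names are hypothetical.

```typescript
type Named = { id: number; name: string };

// Hypothetical raw model output for one draft.
type LlmDraft = { categoryName?: string; vendorName?: string };

type SanitizedDraft = {
  categoryId: number | null;
  vendorId: number | null;
  suggestedCategory: string | null; // shown to the user, never auto-created
  suggestedVendor: string | null;
};

const normalize = (s: string): string => s.trim().toLowerCase();

// Maps a model-provided name to an existing row id, or keeps it as a suggestion.
function resolve(
  name: string | undefined,
  known: readonly Named[],
): { id: number | null; suggestion: string | null } {
  if (!name || !name.trim()) return { id: null, suggestion: null };
  const match = known.find((k) => normalize(k.name) === normalize(name));
  return match ? { id: match.id, suggestion: null } : { id: null, suggestion: name.trim() };
}

function sanitizeDraft(
  draft: LlmDraft,
  categories: readonly Named[],
  vendors: readonly Named[],
): SanitizedDraft {
  const category = resolve(draft.categoryName, categories);
  const vendor = resolve(draft.vendorName, vendors);
  return {
    categoryId: category.id,
    vendorId: vendor.id,
    suggestedCategory: category.suggestion,
    suggestedVendor: vendor.suggestion,
  };
}
```

The model can then never attach an id that does not exist in the database; at most it can propose a name.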
Telemetry showed two separate causes for the repeated preview failures: first a proxy/body-size rejection at around 1 MB, then a web-route client timeout at 60 seconds while the API/OpenAI scan completed after about 75 seconds. Chunked upload fixes the size rejection. Async scan jobs plus WebSocket progress remove the long web HTTP wait entirely and show the user real server progress instead of a silent spinner.
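
A client might interpret the progress stream as sketched below. The sketch assumes a hypothetical JSON message shape (`type`, `stage`, `result`, `error`). The PR names the stages, but this wire format is illustrative only. Unknown or malformed messages are ignored rather than thrown.

```typescript
// Hypothetical wire format; only the stage names come from the PR description.
const STAGES = ["queued", "preparing", "analyzing", "saving"] as const;
type ScanStage = (typeof STAGES)[number];

type ProgressEvent =
  | { type: "progress"; stage: ScanStage }
  | { type: "complete"; result: unknown }
  | { type: "failed"; error: string };

// Returns null for anything that is not a recognized event.
function parseProgressEvent(raw: string): ProgressEvent | null {
  let msg: any;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (msg?.type === "progress" && STAGES.includes(msg.stage)) {
    return { type: "progress", stage: msg.stage };
  }
  if (msg?.type === "complete" && "result" in msg) return { type: "complete", result: msg.result };
  if (msg?.type === "failed" && typeof msg.error === "string") return { type: "failed", error: msg.error };
  return null;
}

type ScanUiState =
  | { status: "running"; stage: ScanStage; step: number; steps: number }
  | { status: "done"; result: unknown }
  | { status: "error"; message: string };

// Maps an event to what the wizard shows, e.g. "step 3 of 4: analyzing".
function toUiState(event: ProgressEvent): ScanUiState {
  switch (event.type) {
    case "progress":
      return { status: "running", stage: event.stage, step: STAGES.indexOf(event.stage) + 1, steps: STAGES.length };
    case "complete":
      return { status: "done", result: event.result };
    case "failed":
      return { status: "error", message: event.error };
  }
}
```

In the browser, the `data` of each `message` event from a `WebSocket` opened on the progress URL could be passed to `parseProgressEvent`, with the socket closed once a `complete` or `failed` event arrives.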
The progress subscription uses a random job token rather than exposing the user API JWT to browser JavaScript. The token only authorizes one short-lived scan job result.
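
A job-scoped token could be built with Node's `crypto` module, as in the minimal sketch below. The in-memory `jobs` map, the 10-minute TTL, and the function names are assumptions for illustration. The PR only states that the token is random, scoped to one job, and short-lived.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

type ScanJob = { jobId: string; token: string; expiresAt: number };

const TOKEN_TTL_MS = 10 * 60 * 1000; // assumed lifetime, not from the PR
const jobs = new Map<string, ScanJob>(); // hypothetical in-memory store

// Creates a job with an unguessable token that is unrelated to the user's API JWT.
function createJob(now: number = Date.now()): { jobId: string; token: string } {
  const jobId = randomBytes(12).toString("hex");
  const token = randomBytes(32).toString("base64url");
  jobs.set(jobId, { jobId, token, expiresAt: now + TOKEN_TTL_MS });
  return { jobId, token };
}

// True only if the token matches this job and the job has not expired.
function authorizeProgress(jobId: string, token: string, now: number = Date.now()): boolean {
  const job = jobs.get(jobId);
  if (!job || now > job.expiresAt) return false;
  const expected = Buffer.from(job.token);
  const given = Buffer.from(token);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

`timingSafeEqual` avoids leaking how many leading characters matched. The explicit length check is needed because it throws on buffers of different lengths.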
The scan-image review route keeps browser auth server-side and avoids pushing a 5 MB+ base64 payload through React Server Actions, which had left the preview dialog stuck at `Loading image...` for large stored images.

Deployment / nginx
Preview nginx was updated on the PR environment host so that WebSocket upgrades for scan progress are routed to the API container instead of the Next/web container:
- `/etc/nginx/pr-port.js` now exports `api_port`, computed as `PR_ENV_PORT_BASE + 1000 + PR_NUMBER`.
- `/etc/nginx/sites-available/xpenser-pr-envs.cleverbrush.com` now has exact HTTP/HTTPS routes for both `/external-api/api/transaction-scans/jobs/progress` and `/external-api/api/transaction-scans/jobs/progress/`.
- Both routes proxy to `http://10.200.1.2:$target_api_port/api/transaction-scans/jobs/progress$is_args$args` with `proxy_http_version 1.1`, `Upgrade`, `Connection`, and `proxy_read_timeout 120s`.
- `nginx -t` passes and nginx was reloaded.

The active PR deploy script on the app host was also updated so PR API containers expose `API_PORT`. For PR 24, the current container ports are:
- `pr24-web-1`: `0.0.0.0:3024->3000/tcp`
- `pr24-api-1`: `0.0.0.0:4024->4000/tcp`

WebSocket probes against both the trailing-slash and non-trailing-slash progress URLs returned `HTTP/1.1 101 Switching Protocols` and the expected token-scoped API `job not found` event for a fake job.

Screenshots / preview evidence
Preview URL: https://xpenser-pr-024.cleverbrush.com
Earlier preview QA used a generated PNG receipt for Fresh Market dated `2026-06-03`, with a total of $13.15 USD and grocery line items. The scanner returned one high-confidence receipt draft, the wizard allowed creating the suggested vendor, confirming created the transaction, and the saved transaction appeared in `/transactions`.
Final preview QA for commit `f75d90f` used `/tmp/xpenser-large-hardware-receipt.jpg`, a 5.43 MB JPEG:
- … (`Asking AI...`) and no longer failed at the web route timeout.
- … `USD 30.36`, dated `2026-06-04 10:42`.
- `/transactions` showed the saved row marked `Scanned` with amount `-$30.36`.
- … `200 image/jpeg` with `5,696,190` bytes.
- … `3400x4400`.

Screenshots captured during QA:
- `/tmp/xpenser-pr-024-scan-wizard.png`
- `/tmp/xpenser-pr-024-scan-reviewed.png`
- `/tmp/xpenser-pr-024-transactions.png`
- `/tmp/xpenser-pr-024-timeout-fix-scan-wizard.png`
- `/tmp/xpenser-pr-024-scan-image-review.png`

Validation
Local validation for commit `f75d90f`:
- `npm run lint` passed.
- `npm run typecheck` passed: 11 Turbo tasks successful.
- `npm test` passed: 52 files, 289 tests.

GitHub workflow `26939221086` for commit `f75d90f`:
- `Lint and test` passed.
- `Deploy PR environment` passed.
- `Playwright e2e` passed.

Telemetry findings that drove this change:
- … `POST /api/transaction-scans` rejected with `413` before API tracing, consistent with a proxy/body-size limit.
- … `2854a12f44815d40e056bd1953e6e3ee` showed web `POST /api/transaction-scans` returned `500` after a `60000` ms timeout while API `createTransactionScan` completed successfully after about `74.6s`.
- … `500` span from before the final redeploy.

Post-deploy SigNoz verification for PR 24:
- … `xpenser-web-pr-24` or `xpenser-api-pr-24`.
- … `ERROR`/`FATAL` logs for either service.
- … `GET /api/transactions/[transactionId]/scan-image`, the backend API fetch to `/api/transactions/8134/scan-image`, and scanner-related routes.
- … `xpenser-api-pr-24` activity in the same window.
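
The verification above covers the web route `GET /api/transactions/[transactionId]/scan-image` and its backend API fetch. The sketch below shows such a pass-through handler using the Fetch API. It assumes a hypothetical `apiBaseUrl`, a server-side `getApiToken()` helper, and numeric transaction ids; none of these names come from the PR. The browser receives raw image bytes and never sees the API JWT.

```typescript
// Hypothetical dependencies, injected so the handler stays framework-agnostic.
type Deps = {
  apiBaseUrl: string;
  getApiToken: () => Promise<string>; // read from the server-side session
  fetchImpl?: typeof fetch;
};

async function proxyScanImage(transactionId: string, deps: Deps): Promise<Response> {
  // Assumes numeric ids (e.g. 8134); rejects anything else before calling the API.
  if (!/^\d+$/.test(transactionId)) {
    return new Response("Invalid transaction id", { status: 400 });
  }
  const doFetch = deps.fetchImpl ?? fetch;
  const upstream = await doFetch(
    `${deps.apiBaseUrl}/api/transactions/${transactionId}/scan-image`,
    { headers: { Authorization: `Bearer ${await deps.getApiToken()}` } },
  );
  if (!upstream.ok || !upstream.body) {
    return new Response("Scan image not available", { status: upstream.status === 404 ? 404 : 502 });
  }
  // Stream the bytes through instead of base64-encoding them into a server action payload.
  return new Response(upstream.body, {
    status: 200,
    headers: {
      "Content-Type": upstream.headers.get("Content-Type") ?? "application/octet-stream",
      "Cache-Control": "private, no-store",
    },
  });
}
```

Because the body is streamed, the web process does not need to hold a second, roughly one-third larger base64 copy of a multi-megabyte image.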