Advanced invoice OCR pipeline built in n8n that combines OCR technology, artificial intelligence, data validation, and automatic synchronization with external accounting system.
Data Flow: Lovable.dev (Invoice Processor) → n8n workflows → fakturaxl.pl
The system enables:
- Invoice upload through web interface in Lovable.dev
- Automatic OCR processing with AI
- Verification and error correction by AI
- Duplicate detection
- Automatic addition to fakturaxl.pl accounting system
- Overview
- Features
- Architecture
- Workflows
- Technology Stack
- Usage
- API Endpoints
- Error Handling
- Google Sheets Structure
The project automates the complete invoice processing pipeline from document upload in the web interface to integration with the accounting system. It uses AI for intelligent OCR correction, validates data mathematically, detects duplicates, and synchronizes with Fakturaxl accounting software.
- Invoice upload through web interface (PDF, images)
- Structured data extraction using Mistral AI OCR
- AI-powered OCR error correction (typos, formatting, value merging)
- Tax ID (NIP) verification in Ministry of Finance database (White List API)
- Mathematical validation of invoice calculations
- AI-powered duplicate detection (similarity scoring)
- Automatic synchronization with Fakturaxl system
- Storage in Google Sheets with line item breakdown
- Email notifications for errors and exceptions
- Multi-format support (PDF, PNG, JPG, TIFF)
- Automatic monthly archiving in Google Drive
- High-accuracy OCR by Mistral AI
- Markdown conversion for structured data extraction
Two-stage AI processing (extraction + correction)
Automatic fixing:
- Typos in company names
- Broken Tax ID numbers
- Invoice number formatting
- Currency decimal separator errors
- Merged VAT values
- Space removal in amounts
Context-aware field mapping for Polish and English invoices
Mathematical Validation:
- Verification: Net + VAT = Gross
- Line item calculation checks
- Sum of items = net amount
- Required field validation
- VAT rate compliance (0%, 5%, 8%, 23%)
- Product array length consistency
Tax ID Verification via White List API
- AI-powered similarity scoring
- Fuzzy matching for invoice numbers
- Comparison: company name + amount
- Configurable threshold (default 85%)
- Detailed duplicate reports
- Primary storage: Google Sheets (invoice data)
- Secondary storage: Google Sheets (line item breakdown)
- Failed OCR tracking sheet
- Monthly Google Drive archival
- External sync: Fakturaxl API
- Real-time processing feedback
- Visual data confirmation interface
- Edit capability before final save
- Comprehensive error messages
- Email notifications for admin intervention
The system consists of two main workflows that communicate via webhooks:
┌─────────────────────────────────────────────────────────────┐
│ Frontend (Lovable.dev) │
│ - File upload interface │
│ - Data review and editing │
│ - Confirmation workflow │
└────────────┬────────────────────────────┬───────────────────┘
│ │
POST /OCR POST /save-invoice
│ │
v v
┌────────────────────────┐ ┌──────────────────────────────┐
│ Workflow 1: OCR │ │ Workflow 2: Validation │
│ & Data Extraction │───▶│ & Storage │
└────────────────────────┘ └──────────────────────────────┘
│ │
v v
┌────────────────────────┐ ┌──────────────────────────────┐
│ Mistral AI OCR │ │ Google Sheets │
│ Groq LLaMA 3.3 70B │ │ Fakturaxl API │
│ White List API │ │ Gmail Notifications │
└────────────────────────┘ └──────────────────────────────┘
This workflow handles the initial document processing and data extraction.
- Receives invoice via webhook (
POST /OCR) - Validates file format
- Prepares binary data for processing
- Searches for monthly folder (
Invoices_YYYY-MM) - Creates folder if doesn't exist
- Archives original document
- Uploads file to Mistral AI
- Generates signed URL
- Performs OCR extraction
- Returns structured markdown
- Processes markdown with Groq LLaMA 3.3 70B
- Extracts all invoice fields
- Maps Polish/English terminology
- Returns structured JSON
- Second AI pass for OCR error fixing
- Corrects typos, formatting, merged values
- Validates Tax ID formats
- Returns cleaned JSON
- Validates Issuer Tax ID against White List API
- Validates Recipient Tax ID
- Checks both against Ministry of Finance database
- Returns validation flags
- Success (200): Returns complete invoice data with validation flags
- Invalid Issuer (400): Error response with details
- Invalid Recipient (400): Error response with possible causes
Entry Point:
Webhook: Receive Invoice Upload- Entry point for file uploads from Lovable.dev interface
FILE INTAKE & ARCHIVAL
Google Drive Archival:
Search files and folders- Searches for monthly folder (format: Invoices_YYYY-MM)Check Folder Exists?- IF condition: checks if monthly folder existsCreate folder- Creates new monthly folder if missingPass ID- Passes folder ID to next stepsMerge File Data- Merges folder ID with file dataUpload file- Archives original invoice to Google Drive
MISTRAL OCR PIPELINE
Mistral OCR Processing:
Upload to Mistral- Uploads file to Mistral AI APIGet Signed URL- Generates temporary signed URL for document accessGet OCR Result- Performs OCR extractionConvert OCR to Markdown- Converts OCR pages array to structured Markdown format
INTELLIGENT DATA EXTRACTION
AI Data Processing:
Extract Invoice- Extracts all invoice fields from MarkdownClean & Normalize Data- Code node that cleans Markdown artifacts from first AI response and parses to JSONAI: Fix OCR Errors- Corrects OCR errors (typos, formatting, merged values)Parse AI Response to JSON- Code node that cleans Markdown artifacts from second AI response and parses to JSONGroq Chat Model- Model configuration (llama-3.3-70b-versatile)
TAX ID VERIFICATION (WHITE LIST)
Tax ID Validation:
Validate Issuer NIP- HTTP Request to White List API for issuer tax ID verificationIs NIP Valid?- IF condition: checks if issuer Tax ID is valid in Ministry of Finance databaseVerify Recipient- HTTP Request to White List API for recipient tax ID verificationIs Recipient Valid?- IF condition: checks if recipient Tax ID is validSet Recipient NIP Valid- Sets validation flag for valid recipient
API RESPONSE
Success Path:
Response: OCR Complete- Returns complete invoice data with validation flags (HTTP 200)
Error Paths:
Email: Fake Issuer Company- Gmail notification for invalid issuer tax IDResponse: Invalid Issuer NIP- Error response for invalid issuer (HTTP 400)Email: Fake Recipient Company- Gmail notification for invalid recipient tax IDResponse: Invalid Recipient NIP- Error response for invalid recipient (HTTP 400)
This workflow validates data, checks for duplicates, and stores the invoice.
- Receives validated invoice via webhook (
POST /save-invoice) - Extracts invoice data from request body
- Verifies:
net_amount + VAT = gross_amount - Validates line item calculations
- Checks sum of items = net amount
- Validates required fields
- Ensures positive amounts
- Confirms VAT rate compliance
- Fetches existing invoices from Google Sheets
- AI compares current invoice with database
- Calculates similarity score
- Flags duplicates if score ≥ 85%
If validation fails:
- Saves to "Failed OCR" sheet
- Sends email notification
- Returns 409 error response
If duplicate detected:
- Sends duplicate alert email
- Returns 409 duplicate response
If all checks pass:
- Proceeds to storage
- Saves main invoice data to Google Sheets
- Splits line items to separate sheet
- Sends XML to Fakturaxl API
- Constructs XML payload
- Posts to Fakturaxl API
- Handles sync errors gracefully
- Returns final response (201)
Entry Point:
Webhook: Save Validated Invoice- Receives validated/edited invoice data from Lovable.dev interface
INVOICE INTAKE WEBHOOK
Data Reception:
- Receives complete invoice JSON with user edits
- Input: Full invoice object with all fields
- Compatible invoice structure with both workflows
MATHEMATICAL VALIDATION
Validation Logic:
Invoice Validator- Code node that performs mathematical validation:- Verifies: net_amount + VAT = gross_amount
- Validates line item calculations
- Checks sum of items = net amount
- Validates required field presence
- Ensures positive amounts
- Confirms VAT rate compliance (0%, 5%, 8%, 23%)
- Checks product array length consistency
AI-POWERED DUPLICATE CHECK
Duplicate Detection:
Get Existing Invoices- Fetches all existing invoices from Google SheetsMerge Invoice Data- Merges current invoice with database for comparisonDuplicate Detector:- AI compares current invoice with existing database
- Calculates similarity score (fuzzy matching)
- Checks: invoice number, company names, amounts, dates
- Flags duplicates if similarity ≥ 85%
Groq Chat Model- Model configuration (llama-3.3-70b-versatile)
DECISION LOGIC
Validation Check:
Check Validation- IF condition: determines if validation passed- If FAIL → routes to error path
- If PASS → proceeds to duplicate check
Duplicate Check:
Check Duplicate- IF condition: determines if duplicate detected- If DUPLICATE → routes to duplicate alert path
- If UNIQUE → proceeds to storage
SPLITS INVOICE ITEMS
Data Processing:
Split Invoice Items- Code node that:- Separates line items from main invoice data
- Creates separate rows for each product
- Preserves invoice metadata for each item
- Prepares data structure for items sheet
DECISION LOGIC - ERROR PATHS
Validation Failed Path:
Save to Failed Sheet- Saves failed invoice to "Failed OCR" sheet for manual reviewNotify Validation Error- Sends detailed Gmail notification with error listResponse: Validation Error- Returns HTTP 409 with validation errors JSON
Duplicate Detected Path:
Notify Duplicate Found- Sends Gmail notification with similarity score and matched invoiceResponse: Duplicate Detected- Returns HTTP 409 with duplicate details JSON
EXTERNAL ACCOUNTING SYNC
Success Path - Storage:
Save to Main Sheet- Saves main invoice data to Google Sheets (primary storage)Save Invoice Items- Saves line items to separate Google Sheets (secondary storage)Add Invoice to Fakturaxl- HTTP POST request to Fakturaxl API with XML payloadResponse: Added to Fakturaxl- Returns HTTP 201 success with Fakturaxl sync confirmation
- n8n: Workflow automation platform
- Node.js: Runtime environment
- JavaScript: Scripting language
- Mistral AI: OCR processing
- Model:
mistral-ocr-latest - Features: Image base64 extraction, multi-page support
- Model:
- Groq Cloud: LLM inference
- Model:
llama-3.3-70b-versatile - Use cases: Data extraction, error correction, duplicate detection
- Model:
- Google Drive API: Document archival
- Google Sheets API: Data storage
- Gmail API: Email notifications
- White List API: Tax ID verification (Polish Ministry of Finance)
- Fakturaxl API: Accounting system sync
- Input: PDF, PNG, JPG, TIFF
- Processing: Markdown, JSON
- Output: JSON, XML (Fakturaxl), CSV (Sheets)
Step 1: Upload invoice via web interface
Step 2: Review extracted data and make corrections if needed
Step 3: Invoice synchronized to Fakturaxl accounting system
- Navigate to web interface
- Drag and drop invoice file
- Click "Process Invoice"
- Wait for OCR processing (10-30 seconds)
System displays extracted fields. Verify accuracy of:
- Invoice number
- Company names
- Tax IDs
- Dates
- Amounts
- Line items
- Click "Edit" button
- Modify any incorrect fields
- Ensure Tax IDs are valid
- Verify calculations
- Click "Save Invoice"
- System performs validation
- Checks for duplicates
- Saves to database
- Syncs to Fakturaxl
- Check Google Sheets for new entry
- Verify Fakturaxl dashboard
- Confirm email notifications (if any errors)
Upload → OCR → AI Extract → AI Correct → Tax ID Verify →
User Review → Validate → Duplicate Check → Save → Sync → Complete
Endpoint: POST /OCR
Request:
Method: POST
Content-Type: multipart/form-data
Body: File upload (binary)
Response (Success - 200):
{
"invoice_number": "FAK/001/2025",
"company_name_invoice_issuer": "Example Corp Ltd.",
"company_name_invoice_recipient": "Client Company Sp. z o.o.",
"invoice_issuer_company_address": "ul. Main Street 10, 00-001 Warsaw",
"invoice_recipient_company_address": "ul. Client Street 5, 00-002 Warsaw",
"tax_identification_issuer": "1234567890",
"tax_identification_recipient": "0987654321",
"invoice_issue_date": "2025-01-15",
"net_amount": "1000.00",
"gross_amount": "1230.00",
"vat_rate": "23",
"product_name": ["Product A", "Product B"],
"unit_price": ["500.00", "500.00"],
"quantity": ["1", "1"],
"value": ["500.00", "500.00"],
"nip_issuer_valid": true,
"nip_recipient_valid": true,
"white_list_data": {
"result": {
"subject": {
"name": "Client Company Sp. z o.o.",
"nip": "0987654321",
"statusVat": "Czynny"
}
}
}
}Response (Invalid Tax ID - 400):
{
"success": false,
"error_code": "INVALID_ISSUER_TAX_ID",
"message": "The Issuer's Tax ID could not be verified in the Ministry of Finance database",
"details": {
"invoice_number": "FAK/001/2025",
"invalid_issuer_nip": "1234567890",
"company_name": "Example Corp Ltd."
},
"action_required": "Please verify the Issuer's Tax ID and resubmit",
"timestamp": "2025-12-12T10:30:00Z"
}Endpoint: POST /save-invoice
Request:
Method: POST
Content-Type: application/json
Body: Invoice JSON (from Workflow 1 response)
Response (Success - 201):
{
"success": true,
"message": "Invoice successfully added to Fakturaxl accounting system",
"invoice_data": {
"invoice_number": "FAK/001/2025",
"issue_date": "2025-01-15",
"company_issuer": "Example Corp Ltd.",
"company_recipient": "Client Company Sp. z o.o.",
"net_amount": "1000.00",
"gross_amount": "1230.00",
"vat_rate": "23%",
"currency": "PLN"
},
"fakturaxl_response": {
"status": "success",
"raw_response": {
"kod": "1",
"info": "Document added successfully"
}
},
"storage": {
"google_sheets": "Saved to main sheet and items sheet",
"fakturaxl": "Synchronized with external accounting system"
},
"next_steps": "Invoice is now available in your Fakturaxl dashboard",
"timestamp": "2025-12-12T10:31:00Z"
}Response (Validation Failed - 409):
{
"success": false,
"error_code": "VALIDATION_FAILED",
"message": "Invoice validation failed - mathematical errors detected",
"validation": {
"is_valid": false,
"error_count": 2,
"warning_count": 0,
"errors": [
{
"field": "gross_amount",
"message": "Gross amount mismatch: Expected 1230.00, got 1250.00",
"expected": "1230.00",
"actual": "1250.00",
"difference": "20.00"
}
],
"warnings": []
},
"action_required": "Please review and correct the validation errors listed above, then resubmit the invoice.",
"timestamp": "2025-12-12T10:31:00Z"
}Response (Duplicate Detected - 409):
{
"success": false,
"error_code": "DUPLICATE_INVOICE",
"message": "This invoice already exists in the database",
"duplicate_detection": {
"is_duplicate": true,
"similarity_score": 95,
"matched_invoice_number": "FAK/001/2025",
"reason": "Exact match found in database"
},
"action_required": "This invoice has already been processed. If this is a legitimate resubmission, please contact support.",
"timestamp": "2025-12-12T10:31:00Z"
}OCR Processing Errors:
- File format not supported
- Mistral API timeout
- OCR extraction failed
- Invalid response format
Validation Errors:
- Mathematical calculation mismatches
- Missing required fields
- Invalid Tax IDs
- Negative amounts
- Array length inconsistencies
Duplicate Detection Errors:
- Similar invoice found (score ≥ 85%)
- Exact invoice number match
- Same parties and amounts
Sync Errors:
- Fakturaxl API unreachable
- Invalid XML format
- Authentication failure
- API rate limit exceeded
System sends detailed email notifications for:
1. Invalid Tax IDs
- Subject: "VALIDATION ERROR: Invalid Issuer Tax ID"
- Contains: Full invoice details, rejection reason, next steps
2. Duplicate Invoices
- Subject: "Duplicate Invoice Detected"
- Contains: Similarity score, matched invoice, comparison data
3. Validation Failures
- Subject: "Validation Failed - Invoice [NUMBER]"
- Contains: List of errors, invoice details, action required
4. Fake Company Detection
- Subject: "DATA ERROR: Invalid Recipient Tax ID"
- Contains: Possible causes (OCR error, private individual, typo)
All external API calls include automatic retries:
- Retry count: 3 attempts
- Wait between tries: 2000ms (2 seconds)
- Applies to:
- Mistral AI API
- White List API
- Fakturaxl API
Failed OCR Sheet:
- All failed invoices saved to dedicated sheet
- Manual review and reprocessing possible
- Preserves original data for debugging
Never Error Mode:
- Enabled on critical nodes
- Returns errors as data instead of stopping workflow
- Allows graceful degradation
Main invoice data storage
Invoice line items data storage
The system uses two main sheets for data storage:
Main OCR Sheet:
- Stores complete invoice data
- One row per invoice
- Contains all metadata fields
Invoice Line Items Sheet:
- Stores broken-down invoice items
- One row per item/product
- Maintains reference to parent invoice
Failed OCR Sheet:
- Stores invoices that failed validation
- For manual review and correction
- Contains error details






