In most medium-sized businesses, the process of handling a supplier invoice looks like this: someone receives the PDF by email, downloads it, opens the accounting system, manually extracts the vendor name, invoice number, subtotal, tax, and total, types it into the system, files it in a folder, and marks it as processed.
Multiplied by 30, 50, or 100 invoices a month, that's between 3 and 8 hours of purely manual administrative work — prone to errors and adding no real value to the business.
OCR with artificial intelligence automates that entire process. An invoice arrives by email, and the system reads it, extracts all relevant data, validates it, and records it in your accounting system, typically in under a minute of processing and without anyone having to touch the document.
In this article I'll explain how to build that system.
What Is OCR and Why AI Changed Everything
OCR (Optical Character Recognition) is the technology that converts images of text — like scanned documents or photos of invoices — into editable digital text.
Traditional OCR has been around for decades. Its problem: it worked well with perfectly formatted documents, but failed constantly with real-world invoices: different layouts, varying date formats, columns aligned in unpredictable ways, stamps and signatures layered on top.
AI-powered OCR services, such as Google Document AI, Amazon Textract, or Azure AI Document Intelligence (formerly Azure Form Recognizer), use machine learning models trained on large volumes of business documents, including prebuilt models specifically for invoices. They don't just read the text; they understand the document's structure. They recognize that "issue date" and "document date" refer to the same field, and they locate the net amount, the tax, and the vendor's tax ID even when each vendor lays them out differently.
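Modern systems commonly reach 95-99% field-level accuracy on clean, standard invoices, although results vary with scan quality and layout. For most businesses, that turns manual data entry into occasional review of the few invoices the system flags.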
The Complete System Flow
Before getting into technical configuration, the full picture:
```
Invoice arrives by email (PDF or image)
↓
n8n detects the new email with attachment
↓
Sends the document to Google Document AI
↓
OCR extracts: invoice number, date, vendor tax ID,
              description, subtotal, tax, total
↓
Automatic validation: Do the amounts add up? Does the vendor ID exist?
↓
If valid → Records in accounting system / Google Sheets
If error → Notifies team for manual review
↓
Archives the PDF in Google Drive with standardized name
↓
Marks the email as processed
```
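Once the workflow picks up the email, processing takes roughly 15 to 45 seconds per invoice, without anyone touching the document. (With the 5-minute polling interval used below, an invoice may wait up to 5 minutes before it is picked up.)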
The Tech Stack
| Tool | Function | Cost |
|---|---|---|
| n8n | Automation engine and orchestration | $20 USD/month |
| Gmail | Invoice receipt | Free |
| Google Document AI | OCR and document intelligence | Pay-per-use, billed per page (rates vary by processor type; check Google's current pricing) |
| Google Drive | Document archiving | Free up to 15GB |
| Google Sheets or your accounting system | Data recording | Free (Sheets) |
Estimated cost for 200 invoices/month: at most a few dollars in Document AI plus $20 USD for n8n, or roughly $20-25 USD/month, to replace about 17-27 hours of manual work (at the 5-8 minutes per invoice used in the ROI section).
Step 1: Set Up Google Document AI
Google Document AI has a processor specifically for invoices called "Invoice Parser" that works well with English-language documents and standard invoice formats.
1.1 Enable the API in Google Cloud
- Go to console.cloud.google.com
- Create a new project or use an existing one
- Enable the Document AI API
- In the Document AI menu, create a new processor of type "Invoice Parser"
- Note the Project ID, the processor's region (us or eu), and the Processor ID; you'll need them in n8n
1.2 Configure Service Credentials
Create a service account, grant it the Document AI API User role, and download its JSON key file. In n8n, create a Google Service Account credential and paste in the client_email and private_key values from that file.
Step 2: Set Up the Email Trigger in n8n
2.1 Gmail Trigger Node
In n8n, create the "Gmail Trigger" node configured to check the inbox every 5 minutes for new emails that:
- Come from known vendor addresses (optional but recommended)
- Have attachments in PDF or image format
If you use a dedicated address for receiving invoices (for example, invoices@yourcompany.com), configure the trigger to monitor only that inbox.
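If you prefer to express these filters as a search query, a small helper can build one from standard Gmail search operators (has:attachment, filename:, from:, -label:). This is a sketch under assumptions: buildInvoiceQuery and the invoice-processed label are hypothetical names, and it assumes your Gmail Trigger exposes a search filter that accepts Gmail query syntax.

```javascript
// Hypothetical helper: builds a Gmail search query for invoice emails.
// Adjust the file types to whatever your vendors actually send.
const INVOICE_FILE_TYPES = ['pdf', 'png', 'jpg', 'jpeg', 'tiff'];

function buildInvoiceQuery(vendorAddresses = [], processedLabel = 'invoice-processed') {
  const fileFilter = INVOICE_FILE_TYPES.map((ext) => `filename:${ext}`).join(' OR ');
  const parts = ['has:attachment', `(${fileFilter})`];

  // Optional allowlist of known vendor senders
  if (vendorAddresses.length > 0) {
    parts.push(`(${vendorAddresses.map((addr) => `from:${addr}`).join(' OR ')})`);
  }

  // Skip emails the workflow has already labeled as processed (hypothetical label)
  if (processedLabel) {
    parts.push(`-label:${processedLabel}`);
  }
  return parts.join(' ');
}

console.log(buildInvoiceQuery(['billing@vendorx.com', 'ap@vendory.com']));
```

Excluding the processed label pairs naturally with the final "marks the email as processed" step, so a re-run of the trigger never picks up the same invoice twice.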
2.2 Extract the Attachment
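Turn on "Download Attachments" in the Gmail Trigger's options, or use a "Gmail" node to get the message with its attachments. n8n holds each attachment as binary data, which then has to be base64-encoded into the JSON body of the Document AI request.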
Step 3: Process the Document With Document AI
3.1 HTTP Request Node for Document AI
Document AI is called via a REST API. Configure the "HTTP Request" node in n8n:
- Method: POST
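- URL: https://documentai.googleapis.com/v1/projects/{{PROJECT_ID}}/locations/us/processors/{{PROCESSOR_ID}}:process (for a processor in the EU, use eu-documentai.googleapis.com as the host and locations/eu in the path)
- Authentication: the Google service account credential you configured, with the scope https://www.googleapis.com/auth/cloud-platform
- Body (JSON): the document base64-encoded in rawDocument.content, with its MIME type (for example, application/pdf) in rawDocument.mimeType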
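As a sketch of what the node sends, the function below builds the :process URL and JSON body from a file buffer. It assumes a Node.js Buffer as input; inside an n8n Code node you could obtain one with this.helpers.getBinaryDataBuffer(0, 'attachment_0'), where the property name attachment_0 is an example and may differ in your workflow. buildProcessRequest and its parameter names are hypothetical. It uses the regional host form LOCATION-documentai.googleapis.com so the same code covers us and eu processors.

```javascript
// Sketch: build the Document AI :process request from a file buffer.
// projectId, location and processorId are placeholders for your own values.
function buildProcessRequest({ projectId, location = 'us', processorId, fileBuffer, mimeType }) {
  // Regional endpoint, e.g. us-documentai.googleapis.com or eu-documentai.googleapis.com
  const url =
    `https://${location}-documentai.googleapis.com/v1/projects/${projectId}` +
    `/locations/${location}/processors/${processorId}:process`;

  const body = {
    rawDocument: {
      content: fileBuffer.toString('base64'), // the file bytes, base64-encoded
      mimeType, // e.g. 'application/pdf', 'image/png', 'image/jpeg'
    },
  };
  return { url, body };
}

const { url, body } = buildProcessRequest({
  projectId: 'my-project',
  processorId: 'abc123',
  fileBuffer: Buffer.from('%PDF-1.7 ...'),
  mimeType: 'application/pdf',
});
console.log(url);
// https://us-documentai.googleapis.com/v1/projects/my-project/locations/us/processors/abc123:process
```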
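The Document AI response includes a document.entities array with one entry per extracted field: its type, the text found on the page (mentionText), a normalized value where applicable (normalizedValue), and a confidence score between 0 and 1.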
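Before mapping, it helps to collapse those entities into one value per field. This sketch assumes the HTTP Request node returns the API response as JSON, so the entities sit under document.entities. It prefers normalizedValue.text when present, keeps the highest-confidence entity when a type appears more than once, skips repeated line_item rows, and reports the weakest field's confidence as the overall confidence value used later for validation. flattenEntities is a hypothetical name, and using the minimum as the overall score is a design choice, not something Document AI returns.

```javascript
// Sketch: collapse Document AI entities into one value per entity type.
function flattenEntities(document) {
  const best = {}; // entity type -> { value, confidence }
  for (const entity of document?.entities ?? []) {
    if (entity.type === 'line_item') continue; // repeated per invoice line; handle separately
    const value = entity.normalizedValue?.text ?? entity.mentionText ?? '';
    const confidence = entity.confidence ?? 0;
    // Keep the most confident candidate when a field is detected more than once
    if (!best[entity.type] || confidence > best[entity.type].confidence) {
      best[entity.type] = { value, confidence };
    }
  }
  const fields = {};
  for (const [type, { value }] of Object.entries(best)) fields[type] = value;
  const confidences = Object.values(best).map((f) => f.confidence);
  // Overall confidence = weakest extracted field (0 if nothing was extracted)
  fields.confidence = confidences.length > 0 ? Math.min(...confidences) : 0;
  return fields;
}

// Illustrative response shape; the values are made up
const response = {
  document: {
    entities: [
      { type: 'invoice_id', mentionText: '003241', confidence: 0.98 },
      { type: 'total_amount', mentionText: '$110,000.00', normalizedValue: { text: '110000' }, confidence: 0.91 },
      { type: 'invoice_id', mentionText: '003214', confidence: 0.52 },
    ],
  },
};
console.log(flattenEntities(response.document));
// { invoice_id: '003241', total_amount: '110000', confidence: 0.91 }
```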
3.2 Field Mapping
Document AI's Invoice Parser returns each field as an entity with a standard English type name. Use n8n's "Edit Fields (Set)" node to map them to your system's fields:
```
invoice_id       → invoice_number
invoice_date     → issue_date
supplier_name    → vendor_name
supplier_tax_id  → vendor_tax_id
net_amount       → subtotal
total_tax_amount → tax
total_amount     → invoice_total
line_item        → line_detail (one entity per invoice line)
```
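If you'd rather do the mapping in a Code node, the same table fits in a plain object. This sketch assumes the fields were already flattened to one value per entity type and that the Invoice Parser's tax field is total_tax_amount; verify the type names against your processor's actual output. FIELD_MAP and mapFields are hypothetical names.

```javascript
// Document AI entity type -> your system's field name (line_item handled separately)
const FIELD_MAP = {
  invoice_id: 'invoice_number',
  invoice_date: 'issue_date',
  supplier_name: 'vendor_name',
  supplier_tax_id: 'vendor_tax_id',
  net_amount: 'subtotal',
  total_tax_amount: 'tax',
  total_amount: 'invoice_total',
};

function mapFields(extracted) {
  const mapped = {};
  for (const [source, target] of Object.entries(FIELD_MAP)) {
    // Missing fields become null rather than undefined, so gaps are explicit
    mapped[target] = extracted[source] ?? null;
  }
  mapped.confidence = extracted.confidence ?? null; // carry the OCR confidence forward
  return mapped;
}
```

Keeping the mapping in one object means adding a field (for example, due_date) is a one-line change.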
Step 4: Automatic Validation
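Before recording any data in the accounting system, the system needs to validate that the numbers make sense. An error here can create accounting problems that are difficult to correct.

Validations in an n8n "Code" node (mode "Run Once for All Items"):

```javascript
// Validates every incoming invoice, not just the first one.
const TAX_RATE = 0.10;       // Adjust tax rate for your jurisdiction
const TOLERANCE = 1;         // Allowed rounding difference, in currency units
const MIN_CONFIDENCE = 0.85; // Below this, the invoice goes to human review

// Strip currency symbols and thousands separators: "$1,100.00" -> 1100
// (assumes "." as the decimal separator)
const toNumber = (value) => parseFloat(String(value ?? '').replace(/[^0-9.-]/g, ''));

return $input.all().map((item) => {
  const subtotal = toNumber(item.json.subtotal);
  const tax = toNumber(item.json.tax);
  const total = toNumber(item.json.invoice_total);
  const confidence = Number(item.json.confidence);
  const errors = [];

  // Validation 0: every amount was extracted and is numeric
  // (NaN would otherwise slip silently through the comparisons below)
  if ([subtotal, tax, total].some(Number.isNaN)) {
    errors.push('Missing or unreadable amount (subtotal, tax, or total)');
  } else {
    // Validation 1: Tax matches the subtotal
    const calculatedTax = subtotal * TAX_RATE;
    if (Math.abs(calculatedTax - tax) > TOLERANCE) {
      errors.push(`Tax mismatch. Calculated: ${calculatedTax.toFixed(2)}, Extracted: ${tax.toFixed(2)}`);
    }
    // Validation 2: Total adds up
    if (Math.abs(subtotal + tax - total) > TOLERANCE) {
      errors.push(`Total mismatch. Sum: ${(subtotal + tax).toFixed(2)}, Extracted: ${total.toFixed(2)}`);
    }
  }

  // Validation 3: OCR confidence is present and sufficient (missing counts as low)
  if (!(confidence >= MIN_CONFIDENCE)) {
    const shown = Number.isNaN(confidence) ? 'missing' : `${(confidence * 100).toFixed(0)}%`;
    errors.push(`Low OCR confidence: ${shown}`);
  }

  return { json: { ...item.json, is_valid: errors.length === 0, errors } };
});
```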
Route the output through an "IF" node on is_valid: invoices that pass continue to automatic recording, and invoices with errors take the team notification path instead.
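The flow diagram also checks whether the vendor ID exists, which the validation code doesn't cover. A minimal sketch, assuming a hypothetical in-memory set of known tax IDs (in practice you might load it from a Google Sheet or your accounting system) and the vendor_tax_id, errors, and is_valid fields used above; checkVendor and normalizeTaxId are hypothetical names.

```javascript
// Hypothetical vendor master list, stored in normalized form (digits/letters only).
// In practice, load it from a Google Sheet or your accounting system.
const KNOWN_VENDOR_TAX_IDS = new Set(['123456789', '987654321']);

// Normalize so "12-3456789" and "123456789" compare equal.
const normalizeTaxId = (taxId) => String(taxId ?? '').replace(/[^0-9A-Za-z]/g, '').toUpperCase();

function checkVendor(invoice, knownIds = KNOWN_VENDOR_TAX_IDS) {
  const taxId = normalizeTaxId(invoice.vendor_tax_id);
  const errors = [...(invoice.errors ?? [])]; // keep errors from earlier validations
  if (!taxId) {
    errors.push('Vendor tax ID not found on invoice');
  } else if (!knownIds.has(taxId)) {
    errors.push(`Unknown vendor tax ID: ${invoice.vendor_tax_id}`);
  }
  return { ...invoice, errors, is_valid: errors.length === 0 };
}

// In an n8n Code node: return $input.all().map((item) => ({ json: checkVendor(item.json) }));
```

An unknown vendor isn't necessarily an error, but routing it to review keeps new or spoofed vendors from being recorded automatically.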
Step 5: Recording and Archiving
5.1 If the invoice is valid → Record in Google Sheets
Use the "Google Sheets" node to add a row with all extracted data:
| Date | Invoice # | Vendor | Tax ID | Subtotal | Tax | Total | Status | File |
|---|---|---|---|---|---|---|---|---|
| 2026-10-15 | 003241 | Vendor X | 12-3456789 | $100,000 | $10,000 | $110,000 | Processed | [link] |
If your accounting system has an API (like QuickBooks, Xero, FreshBooks, or others), you can connect it directly and create the accounting record without going through Google Sheets.
5.2 Archive the PDF With a Standardized Name
Use the "Google Drive" node to upload the PDF to the corresponding folder with a name that includes: date, invoice number, and vendor.
2026-10-15_invoice-003241_VendorX.pdf
This naming convention makes any invoice easy to find by date, number, or vendor, and the ISO date prefix keeps files sorted chronologically.
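A Code node ahead of the Google Drive upload can produce that name. This sketch assumes issue_date is already an ISO date string (YYYY-MM-DD) and uses the field names from the mapping step; buildArchiveName is a hypothetical helper.

```javascript
// Sketch of the archive naming convention: YYYY-MM-DD_invoice-<number>_<Vendor>.pdf
function buildArchiveName({ issue_date, invoice_number, vendor_name }, extension = 'pdf') {
  const date = String(issue_date).slice(0, 10); // assumes an ISO date such as 2026-10-15
  const number = String(invoice_number).replace(/[^0-9A-Za-z-]/g, ''); // drop "/", "#", spaces
  const vendor = String(vendor_name).replace(/[^\p{L}\p{N}]/gu, ''); // "Vendor X" -> "VendorX"
  return `${date}_invoice-${number}_${vendor}.${extension}`;
}

console.log(buildArchiveName({ issue_date: '2026-10-15', invoice_number: '003241', vendor_name: 'Vendor X' }));
// 2026-10-15_invoice-003241_VendorX.pdf
```

The vendor pattern keeps any Unicode letters and digits, so accented names survive, while punctuation that can cause trouble in file names is dropped.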
5.3 If there are errors → Notify the team
Use the "Gmail" or "WhatsApp Business" node to send a notification to the accounting team with:
- The file name
- The errors detected
- A direct link to the PDF for manual review
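The message itself can be assembled in a Code node before the Gmail or WhatsApp node. A sketch with a hypothetical buildReviewMessage helper; pdfUrl is assumed to be whatever link your Drive upload step gives you.

```javascript
// Hypothetical helper: text for the "needs review" email or chat message.
function buildReviewMessage({ fileName, errors = [], pdfUrl }) {
  const subject = `Invoice needs review: ${fileName}`;
  const body = [
    `File: ${fileName}`,
    'Errors detected:',
    ...errors.map((error) => `- ${error}`), // one bullet per validation error
    `Review the PDF: ${pdfUrl}`,
  ].join('\n');
  return { subject, body };
}
```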
Handling Non-Standard Formats
Some invoices — especially from foreign vendors or very old billing systems — may have formats that Document AI doesn't recognize well.
Strategy for these cases:
- Configure a confidence threshold (for example, 85%). Invoices below that threshold always go to human review.
- Keep a log of vendors whose invoices frequently fail
- For those vendors, consider uptraining the Invoice Parser, or creating a Custom Extractor processor in Document AI, with labeled examples of their documents
The system doesn't improve on its own, but because Document AI lets you train processors on labeled examples of your own documents, accuracy on difficult formats can improve over time.
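The vendor log can be as simple as one row per processed invoice with the vendor and its is_valid result. A sketch, assuming that log shape; vendorsNeedingAttention and its thresholds (at least 5 invoices, more than 20% failing) are hypothetical defaults to tune.

```javascript
// Sketch: find vendors whose invoices fail validation often enough to need attention.
function vendorsNeedingAttention(log, { minInvoices = 5, maxFailureRate = 0.2 } = {}) {
  const stats = new Map(); // vendor -> { total, failed }
  for (const { vendor, is_valid } of log) {
    const s = stats.get(vendor) ?? { total: 0, failed: 0 };
    s.total += 1;
    if (!is_valid) s.failed += 1;
    stats.set(vendor, s);
  }
  return [...stats]
    .map(([vendor, { total, failed }]) => ({ vendor, total, failureRate: failed / total }))
    // Ignore vendors with too few invoices to judge
    .filter((v) => v.total >= minInvoices && v.failureRate > maxFailureRate)
    .sort((a, b) => b.failureRate - a.failureRate); // worst first
}
```

The minimum-volume filter keeps a single bad scan from a rarely used vendor from triggering retraining work.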
Tax Compliance and Legal Considerations
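United States: The IRS accepts electronically stored records as long as they remain accurate, legible, and retrievable; Rev. Proc. 97-22 sets out the standards for electronic storage systems. Records generally must be kept for at least 3 years, and longer in some situations, so confirm the retention period that applies to your business.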
Canada: The CRA generally requires records, including invoices, to be kept for six years from the end of the last tax year they relate to. Make sure your Google Drive archiving structure supports easy retrieval for that whole period.
International: For businesses operating across multiple jurisdictions, consult with your accountant to confirm the digital storage this system generates meets the document retention requirements of each country.
ROI of the System
For a business that processes 150 invoices per month:
| Item | Before (manual) | After (automatic) |
|---|---|---|
| Time per invoice | 5-8 minutes | 30 seconds |
| Total monthly time | 12-20 hours | 1 hour (review) |
| Data entry errors | 2-5% of invoices | < 0.5% |
| System cost | $0 | $21 USD/month |
| Hours recovered | — | 11-19 hours/month |
If the data entry work is done by someone at $15 USD/hour, the 11-19 recovered hours are worth roughly $165-285 USD per month, so the $21 USD monthly cost is covered within the first week of savings.
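To check the table's arithmetic, or rerun it with your own volume and rates, a small calculator helps. It uses the article's example figures (150 invoices, 5-8 minutes each, 1 hour of review, $15 USD/hour, $21 USD/month); estimateRoi is a hypothetical name, and the payback estimate assumes a 30-day month.

```javascript
// Reproduces the ROI arithmetic; inputs are the article's example figures.
function estimateRoi({ invoicesPerMonth, minutesPerInvoice, reviewHoursPerMonth, hourlyRate, monthlyCost }) {
  const manualHours = (invoicesPerMonth * minutesPerInvoice) / 60;
  const hoursRecovered = manualHours - reviewHoursPerMonth;
  const monthlySavings = hoursRecovered * hourlyRate;
  // Days of savings needed to cover one month of system cost (30-day month)
  const paybackDays = monthlyCost / (monthlySavings / 30);
  return { manualHours, hoursRecovered, monthlySavings, paybackDays };
}

const base = { invoicesPerMonth: 150, reviewHoursPerMonth: 1, hourlyRate: 15, monthlyCost: 21 };
const low = estimateRoi({ ...base, minutesPerInvoice: 5 });
const high = estimateRoi({ ...base, minutesPerInvoice: 8 });
console.log(low.hoursRecovered, high.hoursRecovered); // 11.5 19
console.log(low.paybackDays.toFixed(1)); // 3.7
```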
Need to Implement This at Your Business?
The system I described is exactly what I implement for businesses that are processing invoices manually. The implementation process takes between 1 and 2 weeks depending on how many accounting systems need to be connected.
If your business handles more than 50 invoices a month, the investment is recovered in the first month.
Schedule a diagnostic call and I'll tell you whether this system applies to your specific situation and what modifications it would require for your accounting system.