The Definitive Node.js SDK for Invoice Data Extraction
Achieve 99.7%+ extraction accuracy in under 1500ms and eliminate manual data entry errors.

Why Invoice OCR is Difficult
Open-source OCR engines often fail on invoices because invoices have no fixed structure. Unlike fixed-format documents, invoice layouts vary from vendor to vendor, which makes template-based or regex-based approaches brittle and hard to scale. Key challenges include accurately parsing multi-page tables with a variable number of line items, handling low-quality scans that suffer from skew and pixelation, and correctly interpreting diverse date and currency formats. A custom parsing solution needs constant engineering effort to keep up with new vendor templates, which leads to high maintenance costs and persistent accuracy issues.
Enterprise-Grade Extraction with StructOCR
StructOCR uses deep learning models pre-trained specifically on invoices, receipts, and purchase orders. The API first runs automatic image pre-processing, including deskewing and denoising, to optimize inputs for the OCR engine. Unlike Tesseract, which returns a raw text dump, StructOCR returns standardized JSON with parsed fields such as line items, merchant details, and validated totals. This removes the need for complex post-processing logic, so you can integrate a complete data extraction solution in hours, not months.
Production Use Cases
- Accounts Payable Automation: Ingest vendor invoices automatically. Eliminate manual data entry, reduce processing time by 95%, and enable straight-through processing.
- Expense Management: Instantly capture merchant, date, and total amount from employee expense receipts and invoices for faster reimbursement cycles.
- Three-Way Matching: Extract PO numbers, line items, and quantities to automatically match invoices against purchase orders and goods receipt notes.
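As a rough illustration of the three-way matching use case, the sketch below compares extracted invoice line items against a purchase order and a goods receipt note. The function name `threeWayMatch`, the shape of the PO and receipt records, and matching on a normalized description are all assumptions made for this example; they are not part of the StructOCR SDK, and a production system would more likely match on SKU or PO line number.

```javascript
// Hypothetical helper, not part of the StructOCR SDK.
// invoiceItems: line items as returned by the API (description, quantity, unit_price).
// poItems / receiptItems: assumed shapes { description, quantity, unit_price? }.
function threeWayMatch(invoiceItems, poItems, receiptItems) {
  // Index a list of items by a normalized description (assumed matching key).
  const normalize = (text) => text.trim().toLowerCase();
  const index = (items) => new Map(items.map((item) => [normalize(item.description), item]));

  const poIndex = index(poItems);
  const receiptIndex = index(receiptItems);

  return invoiceItems.map((item) => {
    const key = normalize(item.description);
    const poLine = poIndex.get(key);
    const receiptLine = receiptIndex.get(key);
    const issues = [];

    if (!poLine) {
      issues.push('not on purchase order');
    } else {
      if (item.quantity > poLine.quantity) issues.push('quantity exceeds PO');
      if (Math.abs(item.unit_price - poLine.unit_price) > 0.005) {
        issues.push('unit price differs from PO');
      }
    }

    if (!receiptLine) {
      issues.push('no goods receipt');
    } else if (item.quantity > receiptLine.quantity) {
      issues.push('quantity exceeds goods received');
    }

    return { description: item.description, matched: issues.length === 0, issues };
  });
}
```

Each returned entry lists the reasons a line failed to match, so an accounts payable workflow could route only the unmatched lines to manual review.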
Implementation: Node.js SDK
The official Node.js SDK simplifies the extraction process. It handles file streams, authentication, and parsing of complex nested JSON responses (like line items) automatically.
Prerequisite: npm install structocr
const StructOCR = require('structocr');

// 💰 Save 30%+ vs competitors. Get 200 free requests instantly:
// 👉 https://structocr.com/register

// Initialize the client with your API key
const client = new StructOCR('YOUR_API_KEY_HERE');

async function scanInvoice() {
  // Note: Currently supports image inputs (JPG, PNG)
  const filePath = './invoice.jpg';

  try {
    console.log(`Processing ${filePath}...`);

    // SDK handles the file upload and API request
    const result = await client.scanInvoice(filePath);

    if (result.success && result.data) {
      const data = result.data;
      console.log('✅ Extraction Successful!');

      // Access parsed fields directly
      console.log(`Invoice #: ${data.invoice_number}`);
      console.log(`Date: ${data.date} (Due: ${data.due_date})`);
      console.log(`Vendor: ${data.merchant.name} (Tax ID: ${data.merchant.tax_id})`);

      // Financials
      console.log(`Total: ${data.financials.total_amount} ${data.currency}`);
      console.log(`Tax: ${data.financials.tax_amount}`);

      // Line Items (Table Data)
      console.log('\n--- Line Items ---');
      if (data.line_items && data.line_items.length > 0) {
        data.line_items.forEach(item => {
          console.log(`- ${item.description}: ${item.quantity} x ${item.unit_price} = ${item.amount}`);
        });
      }
    } else {
      console.error('❌ Extraction Failed:', result.error || 'Unknown Error');
    }
  } catch (error) {
    console.error('An unexpected error occurred:', error.message);
  }
}

scanInvoice();

Technical Specs
- Latency: < 5s (Average)
- Uptime: 98.5% SLA
- Security: AES-256 Encryption & SOC2 Compliant
- Input: JPG, PNG, WebP (File Path)
- Max File Size: 4.5MB
- Output: JSON (Nested Structure)
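Given the input format and file size limits above, it can be worth rejecting unsuitable files locally before spending an API request. The following is a minimal sketch using only Node.js built-ins. The helper name `checkInvoiceFile`, the reading of 4.5MB as 4.5 × 1024 × 1024 bytes, and the acceptance of the `.jpeg` extension alongside `.jpg` are assumptions, not documented SDK behavior.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: "4.5MB" is interpreted as 4.5 * 1024 * 1024 bytes.
const MAX_BYTES = 4.5 * 1024 * 1024;
// Assumption: .jpeg is treated the same as .jpg.
const ALLOWED_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.webp']);

// Hypothetical pre-flight check, not part of the StructOCR SDK.
function checkInvoiceFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported file type: ${ext || '(none)'}` };
  }

  let size;
  try {
    size = fs.statSync(filePath).size;
  } catch (err) {
    return { ok: false, reason: `Cannot read file: ${err.message}` };
  }

  if (size > MAX_BYTES) {
    return { ok: false, reason: `File is ${size} bytes, limit is ${MAX_BYTES}` };
  }
  return { ok: true };
}
```

A caller would run `checkInvoiceFile(filePath)` first and only call `client.scanInvoice(filePath)` when `ok` is true.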
Key Features
- Line Item Parsing: Accurately extracts multi-page table data including description, quantity, unit price, and total amount.
- Financial Validation: Cross-validates subtotal, tax, and total amounts to flag discrepancies.
- Vendor Identification: Automatically identifies the merchant from a global database, normalizing names and addresses.
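The financial validation idea can also be repeated on the client as a sanity check before posting an invoice to a ledger. The sketch below is one possible approach: it assumes the `financials` and `line_items` field names used in the SDK example, and the helper name `checkTotals` and the one-cent tolerance are choices made for this example only. It does not describe how the API performs its own validation.

```javascript
// Hypothetical client-side check, not part of the StructOCR SDK.
// Returns a list of human-readable discrepancies (empty when totals agree).
function checkTotals(data, tolerance = 0.01) {
  const { subtotal, tax_amount, total_amount } = data.financials;
  const problems = [];

  // Sum of line item amounts should equal the subtotal.
  const lineSum = (data.line_items || []).reduce((sum, item) => sum + item.amount, 0);
  if (Math.abs(lineSum - subtotal) > tolerance) {
    problems.push(`line items sum to ${lineSum}, subtotal is ${subtotal}`);
  }

  // Subtotal plus tax should equal the invoice total.
  if (Math.abs(subtotal + tax_amount - total_amount) > tolerance) {
    problems.push(`subtotal + tax is ${subtotal + tax_amount}, total is ${total_amount}`);
  }

  return problems;
}
```

The tolerance absorbs small rounding differences; invoices with discounts or shipping charges would need extra terms that this sketch does not model.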
Sample JSON Output
StructOCR returns a normalized JSON object, regardless of the input image angle or quality.
{
  "success": true,
  "data": {
    "type": "invoice",
    "invoice_number": "INV-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "USD",
    "merchant": {
      "name": "Amazon Web Services",
      "address": "410 Terry Ave N, Seattle, WA",
      "tax_id": "EIN-12-3456789",
      "iban": null
    },
    "customer": {
      "name": "Acme Corp Inc.",
      "tax_id": "987654321"
    },
    "financials": {
      "subtotal": 100,
      "tax_amount": 10,
      "total_amount": 110
    },
    "line_items": [
      {
        "description": "EC2 Instance Usage",
        "quantity": 1,
        "unit_price": 80,
        "amount": 80
      },
      {
        "description": "S3 Storage",
        "quantity": 1,
        "unit_price": 20,
        "amount": 20
      }
    ]
  }
}

Frequently Asked Questions
How does StructOCR compare to AWS Textract or Google Vision?
Generic OCR services like Textract or Vision return raw lines of text or key-value pairs that require extensive post-processing and business logic. StructOCR is a specialized model pre-trained on millions of invoices. It returns a structured, predictable JSON schema with normalized fields like `line_items`, `merchant`, and `financials`, eliminating the need for you to build and maintain parsing logic.
Do you store the uploaded images?
No. Documents are processed in-memory and are permanently deleted immediately after the extraction is complete. We do not persist your data.
How do you handle blurry or low-quality images?
Our API includes an automatic pre-processing engine that performs deskewing, denoising, and contrast enhancement to maximize accuracy even on low-quality scans or mobile phone captures.