The Definitive Node.js SDK for Invoice Data Extraction

Achieve 99.7%+ extraction accuracy in under 1500ms and eliminate manual data entry errors.

Steve Harrington · Updated 2026-01-19
Figure 1: StructOCR converts raw Invoice images into validated JSON data.

Why Invoice OCR is Difficult

General-purpose open-source OCR engines struggle with invoices because invoices have no fixed layout. Unlike fixed-format documents, invoice layouts vary from vendor to vendor, which makes template-based or regex-based approaches brittle and hard to scale. Key challenges include parsing multi-page tables with a variable number of line items, handling low-quality scans that suffer from skew and pixelation, and interpreting diverse date and currency formats (for example, 01/02/2026 can mean January 2 or February 1 depending on locale). Maintaining a custom parsing solution requires constant engineering effort to adapt to new vendor templates, which leads to high maintenance costs and persistent accuracy issues.

Enterprise-Grade Extraction with StructOCR

StructOCR uses deep learning models pre-trained specifically on invoices, receipts, and purchase orders. Our API first runs automatic image pre-processing, including deskewing and denoising, to optimize inputs for the OCR engine. Unlike Tesseract, which returns a raw text dump, StructOCR returns standardized JSON with parsed fields such as line items, merchant details, and validated totals. This removes the need for complex post-processing logic, so you can integrate a complete data extraction solution in hours, not months.

Production Use Cases

  • Accounts Payable Automation: Ingest vendor invoices automatically. Eliminate manual data entry, reduce processing time by 95%, and enable straight-through processing.
  • Expense Management: Instantly capture merchant, date, and total amount from employee expense receipts and invoices for faster reimbursement cycles.
  • Three-Way Matching: Extract PO numbers, line items, and quantities to automatically match invoices against purchase orders and goods receipt notes.
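
As a rough illustration of the three-way matching use case, the sketch below compares extracted invoice line items against a purchase order and a goods receipt note. The `threeWayMatch` helper, the shapes of `poItems` and `receiptItems`, and the choice to match lines by `description` are assumptions made for this example and are not part of the StructOCR SDK. Only the invoice line item fields mirror the SDK response.

```javascript
// Hypothetical helper: not part of the StructOCR SDK.
// Matches invoice lines to PO and goods-receipt lines by normalized description.
function threeWayMatch(invoiceItems, poItems, receiptItems) {
  const normalize = (text) => text.trim().toLowerCase();
  const byDescription = (items) =>
    new Map(items.map((item) => [normalize(item.description), item]));

  const ordered = byDescription(poItems);
  const received = byDescription(receiptItems);

  return invoiceItems.map((item) => {
    const key = normalize(item.description);
    const poLine = ordered.get(key);
    const receiptLine = received.get(key);
    const issues = [];

    if (!poLine) {
      issues.push('not on purchase order');
    } else {
      if (item.quantity > poLine.quantity) issues.push('quantity exceeds PO');
      if (item.unit_price !== poLine.unit_price) issues.push('unit price differs from PO');
    }

    if (!receiptLine) {
      issues.push('no goods receipt');
    } else if (item.quantity > receiptLine.quantity) {
      issues.push('quantity exceeds goods received');
    }

    return { description: item.description, matched: issues.length === 0, issues };
  });
}

// Made-up data: the second invoice line bills for more than was ordered and received.
const invoiceItems = [
  { description: 'EC2 Instance Usage', quantity: 1, unit_price: 80, amount: 80 },
  { description: 'S3 Storage', quantity: 2, unit_price: 20, amount: 40 },
];
const poItems = [
  { description: 'EC2 Instance Usage', quantity: 1, unit_price: 80 },
  { description: 'S3 Storage', quantity: 1, unit_price: 20 },
];
const receiptItems = [
  { description: 'EC2 Instance Usage', quantity: 1 },
  { description: 'S3 Storage', quantity: 1 },
];

console.log(JSON.stringify(threeWayMatch(invoiceItems, poItems, receiptItems), null, 2));
```

Matching on free-text descriptions is fragile in practice. A real integration would more likely key on PO line numbers or SKUs if your documents carry them.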

Implementation: Node.js SDK

The official Node.js SDK simplifies the extraction process. It handles file streams, authentication, and parsing of complex nested JSON responses (like line items) automatically.

Prerequisite: npm install structocr

CODE EXAMPLE
const StructOCR = require('structocr');

// Get an API key (200 free requests on signup): https://structocr.com/register
// Initialize the client with your API key
const client = new StructOCR('YOUR_API_KEY_HERE');

async function scanInvoice() {
  // Note: supports image inputs only (JPG, PNG, WebP)
  const filePath = './invoice.jpg';

  try {
    console.log(`Processing ${filePath}...`);

    // SDK handles the file upload and API request
    const result = await client.scanInvoice(filePath);

    if (result.success && result.data) {
      const data = result.data;
      console.log('✅ Extraction Successful!');
      
      // Access parsed fields directly
      console.log(`Invoice #: ${data.invoice_number}`);
      console.log(`Date:      ${data.date} (Due: ${data.due_date})`);
      console.log(`Vendor:    ${data.merchant.name} (Tax ID: ${data.merchant.tax_id})`);
      
      // Financials
      console.log(`Total:     ${data.financials.total_amount} ${data.currency}`);
      console.log(`Tax:       ${data.financials.tax_amount}`);

      // Line Items (Table Data)
      console.log('\n--- Line Items ---');
      if (data.line_items && data.line_items.length > 0) {
        data.line_items.forEach(item => {
          console.log(`- ${item.description}: ${item.quantity} x ${item.unit_price} = ${item.amount}`);
        });
      }

    } else {
      console.error('❌ Extraction Failed:', result.error || 'Unknown Error');
    }
  } catch (error) {
    console.error('An unexpected error occurred:', error.message);
  }
}

scanInvoice();

Technical Specs

  • Latency: < 5s (Average)
  • Uptime: 98.5% SLA
  • Security: AES-256 Encryption & SOC2 Compliant
  • Input: JPG, PNG, WebP (File Path)
  • Max File Size: 4.5MB
  • Output: JSON (Nested Structure)
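
Given the input limits listed above, it can help to reject unsupported files before spending an API request. The following is a minimal sketch. The `checkInput` helper is hypothetical, accepting `.jpeg` alongside `.jpg` is an assumption, and so is reading "4.5MB" as 4.5 × 1024 × 1024 bytes.

```javascript
// Limits taken from the Technical Specs list; the helper itself is hypothetical.
const ALLOWED_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.webp']; // '.jpeg' is an assumption
const MAX_BYTES = 4.5 * 1024 * 1024; // assumes "MB" means 1024 * 1024 bytes

function checkInput(fileName, sizeBytes) {
  // Take everything from the last dot onward as the extension.
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot).toLowerCase();

  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `unsupported format "${ext || 'none'}"` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: `file is ${sizeBytes} bytes, limit is ${MAX_BYTES}` };
  }
  return { ok: true, reason: null };
}

console.log(checkInput('invoice.jpg', 1200000)); // within limits
console.log(checkInput('invoice.pdf', 1200000)); // rejected: format
console.log(checkInput('scan.png', 6000000));    // rejected: size
```

In a real script the size would typically come from `fs.statSync(filePath).size` before calling `client.scanInvoice(filePath)`.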

Key Features

  • Line Item Parsing: Accurately extracts multi-page table data including description, quantity, unit price, and total amount.
  • Financial Validation: Cross-validates subtotal, tax, and total amounts to flag discrepancies.
  • Vendor Identification: Automatically identifies the merchant from a global database, normalizing names and addresses.
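
The financial validation described above runs on the StructOCR side, and its exact rules are not documented here. If you want an independent sanity check in your own pipeline, a sketch along these lines is one option. The `checkTotals` helper and the 0.01 tolerance are assumptions, and the field names follow the sample JSON output.

```javascript
// Hypothetical client-side check; it does not reproduce StructOCR's internal validation.
function checkTotals(data, tolerance = 0.01) {
  const { subtotal, tax_amount, total_amount } = data.financials;
  const lineSum = (data.line_items || []).reduce((sum, item) => sum + item.amount, 0);
  const issues = [];

  // Line item amounts should add up to the subtotal.
  if (Math.abs(lineSum - subtotal) > tolerance) {
    issues.push(`line items sum to ${lineSum}, subtotal is ${subtotal}`);
  }
  // Subtotal plus tax should equal the total.
  if (Math.abs(subtotal + tax_amount - total_amount) > tolerance) {
    issues.push(`subtotal + tax is ${subtotal + tax_amount}, total is ${total_amount}`);
  }
  return issues;
}

// Values copied from the sample JSON output.
const sample = {
  financials: { subtotal: 100, tax_amount: 10, total_amount: 110 },
  line_items: [
    { description: 'EC2 Instance Usage', quantity: 1, unit_price: 80, amount: 80 },
    { description: 'S3 Storage', quantity: 1, unit_price: 20, amount: 20 },
  ],
};

console.log(checkTotals(sample)); // empty array: no discrepancies
```

The tolerance absorbs floating-point rounding. For strict accounting work, converting amounts to integer minor units (cents) before comparing is usually safer.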

Sample JSON Output

StructOCR returns a normalized JSON object, regardless of the input image angle or quality.

{
  "success": true,
  "data": {
    "type": "invoice",
    "invoice_number": "INV-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "USD",
    "merchant": {
      "name": "Amazon Web Services",
      "address": "410 Terry Ave N, Seattle, WA",
      "tax_id": "EIN-12-3456789",
      "iban": null
    },
    "customer": {
      "name": "Acme Corp Inc.",
      "tax_id": "987654321"
    },
    "financials": {
      "subtotal": 100,
      "tax_amount": 10,
      "total_amount": 110
    },
    "line_items": [
      {
        "description": "EC2 Instance Usage",
        "quantity": 1,
        "unit_price": 80,
        "amount": 80
      },
      {
        "description": "S3 Storage",
        "quantity": 1,
        "unit_price": 20,
        "amount": 20
      }
    ]
  }
}
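
The sample shows `iban` as `null`, which suggests that fields the engine cannot find may come back empty. Assuming that is the case, and assuming nested objects could also be missing, a small defensive reader like the sketch below avoids runtime errors on partial results. The `summarizeInvoice` helper and its output shape are hypothetical.

```javascript
// Hypothetical helper: flattens a response into a summary, tolerating missing fields.
function summarizeInvoice(response) {
  if (!response || !response.success || !response.data) {
    return null; // failed or empty extraction
  }
  const { data } = response;
  return {
    invoiceNumber: data.invoice_number ?? null,
    vendor: data.merchant?.name ?? null,
    iban: data.merchant?.iban ?? null,
    total: data.financials?.total_amount ?? null,
    currency: data.currency ?? null,
    lineCount: Array.isArray(data.line_items) ? data.line_items.length : 0,
  };
}

// Trimmed copy of the sample output above.
const sampleResponse = {
  success: true,
  data: {
    invoice_number: 'INV-2026-001',
    currency: 'USD',
    merchant: { name: 'Amazon Web Services', iban: null },
    financials: { subtotal: 100, tax_amount: 10, total_amount: 110 },
    line_items: [
      { description: 'EC2 Instance Usage', quantity: 1, unit_price: 80, amount: 80 },
      { description: 'S3 Storage', quantity: 1, unit_price: 20, amount: 20 },
    ],
  },
};

console.log(summarizeInvoice(sampleResponse));
```

Optional chaining (`?.`) and nullish coalescing (`??`) require Node.js 14 or later.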

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

Generic OCR services like Textract or Vision return raw lines of text or key-value pairs that require extensive post-processing and business logic. StructOCR is a specialized model pre-trained on millions of invoices. It returns a structured, predictable JSON schema with normalized fields like `line_items`, `merchant`, and `financials`, eliminating the need for you to build and maintain parsing logic.

Do you store the uploaded images?

No. Documents are processed in-memory and are permanently deleted immediately after the extraction is complete. We do not persist your data.

How do you handle blurry or low-quality images?

Our API includes an automatic pre-processing engine that performs deskewing, denoising, and contrast enhancement to maximize accuracy even on low-quality scans or mobile phone captures.
