Node.js Invoice Parsing API & OCR Wrapper

Achieve 99.7%+ extraction accuracy in under 1 second. Automate your accounts payable pipeline with our REST API and lightweight Node.js SDK.

Figure 1: StructOCR converts complex Invoice PDFs and images into validated JSON data.

Why Invoice Parsing is Difficult

Open-source OCR engines often fail on invoices because invoices have no fixed structure. Unlike fixed-format documents, invoice layouts vary from vendor to vendor, which makes template-based or regex-based approaches brittle and hard to scale. Key challenges include parsing multi-page tables with a variable number of line items, handling low-quality scans affected by skew and pixelation, and interpreting diverse date and currency formats. Maintaining a custom parser requires ongoing engineering effort to adapt to new vendor templates, which leads to high maintenance costs and persistent accuracy issues.
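
To make the brittleness concrete, the sketch below applies one regex, written for a single vendor's date layout, to text from three layouts. The `extractIsoDate` helper and the sample strings are hypothetical illustrations and are not part of StructOCR.

```javascript
// Hypothetical regex-based extractor tuned to one vendor's layout ("Date: YYYY-MM-DD").
function extractIsoDate(text) {
  const match = text.match(/Date:\s*(\d{4}-\d{2}-\d{2})/);
  return match ? match[1] : null;
}

// Made-up snippets of OCR text from three different vendor layouts.
const samples = [
  'Invoice INV-1 Date: 2026-01-15 Total: 110.00',
  'Invoice INV-2 Invoice date 15/01/2026 Total 110,00',
  'Invoice INV-3 Dated Jan 15, 2026 Amount due $110.00',
];

const results = samples.map(extractIsoDate);
console.log(results); // [ '2026-01-15', null, null ]
```

Only the first layout matches. Every new vendor format needs another pattern, and the same problem repeats for totals, tax IDs, and line items.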

Enterprise-Grade Extraction with StructOCR

StructOCR uses pre-trained deep learning models designed specifically for financial documents. The engine pre-processes each image to correct skew and lighting before extraction. Unlike basic engines that return a raw text dump, StructOCR returns standardized JSON with parsed fields such as line items, merchant details, and validated totals. This removes the need for complex post-processing and simplifies accounts payable automation. For full endpoint details and Node.js implementation guides, see our developer documentation.

Production Use Cases

  • Accounts Payable Automation: Ingest vendor invoices automatically. Eliminate manual data entry, reduce processing time by 95%, and enable straight-through processing.
  • Expense Management: Instantly capture merchant, date, and total amount from employee expense receipts and invoices for faster reimbursement cycles.
  • Three-Way Matching: Extract PO numbers, line items, and quantities to automatically match invoices against purchase orders and goods receipt notes.
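
As a rough illustration of the three-way matching use case, the sketch below compares extracted invoice line items against a purchase order and a goods receipt. The invoice fields follow the sample output on this page. The purchase order and receipt shapes, the `threeWayMatch` name, and matching on the description text are all assumptions made for this example.

```javascript
// Sketch of three-way matching. PO and goods-receipt shapes are assumed,
// and lines are paired by normalized description for simplicity.
function threeWayMatch(invoiceItems, poItems, receiptItems) {
  const normalize = (text) => text.trim().toLowerCase();
  const index = (items) => new Map(items.map((item) => [normalize(item.description), item]));
  const po = index(poItems);
  const receipts = index(receiptItems);

  return invoiceItems.map((item) => {
    const ordered = po.get(normalize(item.description));
    const received = receipts.get(normalize(item.description));
    const issues = [];

    if (!ordered) issues.push('not on purchase order');
    else if (ordered.unit_price !== item.unit_price) issues.push('unit price differs from PO');

    if (!received) issues.push('no goods receipt');
    else if (received.quantity < item.quantity) issues.push('billed quantity exceeds received');

    return { description: item.description, matched: issues.length === 0, issues };
  });
}

const report = threeWayMatch(
  [
    { description: 'EC2 Instance Usage', quantity: 1, unit_price: 80, amount: 80 },
    { description: 'S3 Storage', quantity: 2, unit_price: 20, amount: 40 },
  ],
  [
    { description: 'EC2 Instance Usage', quantity: 1, unit_price: 80 },
    { description: 'S3 Storage', quantity: 2, unit_price: 20 },
  ],
  [
    { description: 'EC2 Instance Usage', quantity: 1 },
    { description: 'S3 Storage', quantity: 1 },
  ]
);
console.log(report);
```

Here the second line is flagged because two units were billed but only one was received. A production matcher would more likely key on SKUs or PO line numbers than on description text.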

Implementation: Node.js API Wrapper

While our platform is a pure REST API, our official Node.js SDK wrapper simplifies integration. It handles file streams, Base64 encoding, and nested JSON parsing automatically.

Prerequisite: npm install structocr

CODE EXAMPLE
const StructOCR = require('structocr');

// Create an API key at https://structocr.com/register (new accounts get 20 free credits),
// then initialize the client with it
const client = new StructOCR('YOUR_API_KEY_HERE');

async function parseInvoice() {
  // Supports PDFs, local image paths, Base64 strings, or image URLs
  const documentSource = './vendor_invoice.pdf';

  try {
    console.log(`Executing invoice parsing API for ${documentSource}...`);

    // The SDK handles file uploading or Base64 conversion under the hood
    const result = await client.scanInvoice(documentSource);

    if (result.success && result.data) {
      const data = result.data;
      console.log('✅ Parsing Successful!');
      
      // Access structured fields directly
      console.log(`Invoice #: ${data.invoice_number}`);
      console.log(`Date:      ${data.date} (Due: ${data.due_date})`);
      console.log(`Vendor:    ${data.merchant.name} (Tax ID: ${data.merchant.tax_id})`);
      
      // Financials
      console.log(`Total:     ${data.financials.total_amount} ${data.currency}`);
      console.log(`Tax:       ${data.financials.tax_amount}`);

      // Line Items (Table Data)
      console.log('\n--- Line Items ---');
      if (data.line_items && data.line_items.length > 0) {
        data.line_items.forEach(item => {
          console.log(`- ${item.description}: ${item.quantity} x ${item.unit_price} = ${item.amount}`);
        });
      }

    } else {
      console.error('❌ Parsing Failed:', result.error || 'Unknown Error');
    }
  } catch (error) {
    console.error('An unexpected error occurred:', error.message);
  }
}

parseInvoice();
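
Once parsing succeeds, the fields logged above usually need to be flattened into one record for a ledger or export. The sketch below is one way to do that under stated assumptions: the input field names follow the example above, while `toLedgerRow` and its output keys are hypothetical. It treats nested objects as optional, because fields can come back as null.

```javascript
// Sketch: flatten a parsed result into one record for a ledger or CSV export.
// Output keys are assumptions; missing fields become null instead of throwing.
function toLedgerRow(result) {
  if (!result || !result.success || !result.data) {
    throw new Error(`Parsing failed: ${(result && result.error) || 'Unknown Error'}`);
  }
  const { data } = result;
  return {
    invoiceNumber: data.invoice_number ?? null,
    invoiceDate: data.date ?? null,
    dueDate: data.due_date ?? null,
    vendor: data.merchant?.name ?? null,
    vendorTaxId: data.merchant?.tax_id ?? null,
    currency: data.currency ?? null,
    total: data.financials?.total_amount ?? null,
    tax: data.financials?.tax_amount ?? null,
    lineItemCount: Array.isArray(data.line_items) ? data.line_items.length : 0,
  };
}

// Example with a partial result: no merchant block and no due date.
const row = toLedgerRow({
  success: true,
  data: {
    invoice_number: 'INV-2026-001',
    date: '2026-01-15',
    currency: 'USD',
    financials: { total_amount: 110, tax_amount: 10 },
    line_items: [{ description: 'S3 Storage', quantity: 1, unit_price: 20, amount: 20 }],
  },
});
console.log(row);
```

In real use the argument would be the object returned by `client.scanInvoice(...)`.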

Technical Specs

  • Infrastructure: Distributed Global Edge Network
  • Latency: < 1s (Average response time)
  • Security: AES-256 Encryption, In-memory processing (Zero data retention)
  • Inputs Supported: PDF, File Upload (JPG/PNG/WebP), Base64 String, Image URL
  • Output: Structured JSON

Key Features

  • Line Item Parsing: Accurately extracts multi-page table data including description, quantity, unit price, and total amount.
  • Financial Validation: Cross-validates subtotal, tax, and total amounts to flag discrepancies.
  • Vendor Identification: Automatically identifies the merchant from a global database, normalizing names and addresses.
  • Flexible Payloads: Pass Base64 data directly from frontend clients to bypass intermediate backend storage.
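
For the Base64 path, browsers typically produce a data URL (`data:<mime>;base64,<payload>`) via `FileReader.readAsDataURL`. The sketch below strips that prefix and validates the payload in memory before forwarding it. The `normalizeBase64` helper is hypothetical, and whether the SDK also accepts full data URLs is not stated here, so the sketch passes on only the bare Base64 string.

```javascript
// Sketch: normalize a Base64 payload received from a browser, without touching disk.
function normalizeBase64(input) {
  // Split an optional "data:<mime>;base64," prefix from the payload.
  const match = /^data:([\w.+-]+\/[\w.+-]+);base64,(.*)$/s.exec(input.trim());
  const payload = (match ? match[2] : input).replace(/\s+/g, '');

  // Reject anything that is not well-formed, padded Base64.
  if (!/^[A-Za-z0-9+\/]+={0,2}$/.test(payload) || payload.length % 4 !== 0) {
    throw new Error('Payload is not valid Base64');
  }
  return {
    mimeType: match ? match[1] : null,
    base64: payload,
    bytes: Buffer.from(payload, 'base64').length, // decoded size, useful for limit checks
  };
}

// Simulate what a frontend would send for a tiny 7-byte "file".
const fromBrowser = `data:image/png;base64,${Buffer.from('invoice').toString('base64')}`;
const payload = normalizeBase64(fromBrowser);
console.log(payload.mimeType, payload.bytes);
// payload.base64 could then be handed to the SDK, e.g. client.scanInvoice(payload.base64)
```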

Sample JSON Output

The parsing API returns a normalized JSON object, regardless of the input document format or quality.

{
  "success": true,
  "data": {
    "type": "invoice",
    "invoice_number": "INV-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "USD",
    "merchant": {
      "name": "Amazon Web Services",
      "address": "410 Terry Ave N, Seattle, WA",
      "tax_id": "EIN-12-3456789",
      "iban": null
    },
    "customer": {
      "name": "Acme Corp Inc.",
      "tax_id": "987654321"
    },
    "financials": {
      "subtotal": 100,
      "tax_amount": 10,
      "total_amount": 110
    },
    "line_items": [
      {
        "description": "EC2 Instance Usage",
        "quantity": 1,
        "unit_price": 80,
        "amount": 80
      },
      {
        "description": "S3 Storage",
        "quantity": 1,
        "unit_price": 20,
        "amount": 20
      }
    ]
  }
}
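
The numbers in this sample are internally consistent: 80 + 20 = 100 (subtotal) and 100 + 10 = 110 (total). A client-side sanity check along the same lines might look like the sketch below. The `checkFinancials` name and the 0.01 tolerance are assumptions for illustration and do not describe StructOCR's own validation.

```javascript
// Sketch: cross-check the totals in a parsed invoice.
// Field names follow the sample output above; the tolerance absorbs rounding.
function checkFinancials(data, tolerance = 0.01) {
  const { subtotal, tax_amount: tax, total_amount: total } = data.financials;
  const lineSum = data.line_items.reduce((sum, item) => sum + item.amount, 0);
  const issues = [];

  data.line_items.forEach((item, i) => {
    if (Math.abs(item.quantity * item.unit_price - item.amount) > tolerance) {
      issues.push(`line ${i + 1}: quantity x unit_price != amount`);
    }
  });
  if (Math.abs(lineSum - subtotal) > tolerance) issues.push('line items do not sum to subtotal');
  if (Math.abs(subtotal + tax - total) > tolerance) issues.push('subtotal + tax != total');

  return { valid: issues.length === 0, issues };
}

const sample = {
  financials: { subtotal: 100, tax_amount: 10, total_amount: 110 },
  line_items: [
    { description: 'EC2 Instance Usage', quantity: 1, unit_price: 80, amount: 80 },
    { description: 'S3 Storage', quantity: 1, unit_price: 20, amount: 20 },
  ],
};
console.log(checkFinancials(sample)); // { valid: true, issues: [] }
```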

Frequently Asked Questions

How does StructOCR compare to Amazon Textract or Google Cloud Vision?

General-purpose OCR services such as Amazon Textract or Google Cloud Vision typically return raw lines of text or key-value pairs that require extensive post-processing and business logic. StructOCR is a specialized model pre-trained on millions of invoices. It returns a structured, predictable JSON schema with normalized fields like `line_items`, `merchant`, and `financials`, so you do not have to build and maintain parsing logic yourself.

Do you store the uploaded images or PDFs?

No. Documents are processed in-memory on our edge network and are permanently deleted immediately after the extraction is complete. We do not persist your data.

How much does the Node.js invoice parsing API cost?

Our service uses a pay-as-you-go model. See our pricing page to find the tier that matches your processing volume. New accounts receive 20 free credits to test the API in development, with no credit card required.
