Direct HTTP API for High-Accuracy Invoice Parsing

Extract invoice line items with 99.8%+ accuracy via a single POST request, with average latency under 5 seconds.

Steve Harrington | Solutions Consultant • Updated 2026-04-12

Figure 1: StructOCR converts raw invoice images into validated JSON data. (Diagram: a scanned invoice image is sent to the StructOCR API, which returns a structured JSON data object.)

Why Self-Managed Invoice OCR Fails

Building reliable invoice OCR takes more than a Tesseract wrapper. Open-source tools fail on real-world documents because of variable vendor templates, multi-page tables, and skewed or noisy scans. Maintaining a library of regex patterns for each vendor is brittle and does not scale. Developers end up building complex post-processing logic to correct errors, parse line items, and validate totals, which brings high maintenance overhead and persistent accuracy problems that defeat the purpose of automation.

Enterprise-Grade Extraction with StructOCR

StructOCR uses pre-trained deep learning models built to understand the semantic structure of an invoice rather than only recognize its text. The invoice OCR API pipeline applies automatic image pre-processing (deskewing, denoising, and contrast enhancement) before data extraction. Unlike tools such as Tesseract, which output unstructured text, StructOCR returns a standardized JSON format with explicitly typed fields, validated financial data, and parsed line items, which simplifies finance workflows and removes the need for bespoke parsing logic.

Production Use Cases

Accounts Payable Automation: Eliminate manual data entry. Ingest vendor invoices directly into your ERP or accounting software with 99.8%+ field accuracy.
Expense Management Systems: Enable real-time expense approval by allowing users to snap a photo of a receipt and have line items extracted instantly.
Three-Way Matching: Automate the matching of purchase orders, goods receipts, and invoices by extracting key fields such as PO number, line items, and totals.
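
Three-way matching conventionally compares three documents: the purchase order, the goods receipt, and the invoice. A minimal sketch of that comparison follows, assuming the invoice line items use the `description`, `quantity`, and `unit_price` keys from StructOCR's JSON output. The function name `threeWayMatch` and the purchase-order and receipt array shapes are hypothetical stand-ins for ERP data, and matching lines by description is a simplification (production systems more often match on SKU or PO line number).

```php
<?php

/**
 * Hypothetical three-way match keyed by line-item description.
 *
 * $poLines:      ['<description>' => ['quantity' => 1, 'unit_price' => 20], ...]  (assumed ERP shape)
 * $receivedQty:  ['<description>' => 1, ...]                                      (assumed ERP shape)
 * $invoiceItems: the `line_items` array from the extraction result
 *
 * Returns a list of discrepancy messages; an empty array means everything matched.
 */
function threeWayMatch(array $poLines, array $receivedQty, array $invoiceItems): array
{
    $discrepancies = [];

    foreach ($invoiceItems as $item) {
        $desc  = (string) ($item['description'] ?? '');
        $qty   = (float) ($item['quantity'] ?? 0);
        $price = (float) ($item['unit_price'] ?? 0);

        // 1. The invoiced line must exist on the purchase order.
        if (!isset($poLines[$desc])) {
            $discrepancies[] = "$desc: not on purchase order";
            continue;
        }

        // 2. The invoiced unit price must equal the ordered price (half-cent tolerance).
        $poPrice = (float) ($poLines[$desc]['unit_price'] ?? 0);
        if (abs($price - $poPrice) > 0.005) {
            $discrepancies[] = "$desc: unit price differs from purchase order";
        }

        // 3. The invoiced quantity must not exceed what was actually received.
        if ($qty > (float) ($receivedQty[$desc] ?? 0)) {
            $discrepancies[] = "$desc: invoiced quantity exceeds received quantity";
        }
    }

    return $discrepancies;
}
```

An invoice would typically be approved for payment automatically only when the returned list is empty; any message routes it to manual review.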

Implementation: Raw PHP (cURL)

The following PHP example demonstrates a complete extraction flow using cURL. It Base64-encodes the image, sets the `x-api-key` header, and reads the nested invoice JSON from the response.

Prerequisite: PHP 7.4+ with cURL extension

CODE EXAMPLE

<?php

// Create an API key at https://structocr.com/register

$apiKey = 'YOUR_API_KEY_HERE';
$apiUrl = 'https://api.structocr.com/v1/invoice';
$imagePath = 'invoice.jpg';

// 1. Validate File
if (!file_exists($imagePath)) {
    die("Error: File not found at $imagePath");
}

// 2. Encode Image to Base64
$imageData = file_get_contents($imagePath);
if ($imageData === false) {
    die("Error: Could not read $imagePath");
}
$base64Image = base64_encode($imageData);

// 3. Prepare Payload
$payload = json_encode(['img' => $base64Image]);

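// 4. Initialize cURL
$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_URL => $apiUrl,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $payload,
    CURLOPT_CONNECTTIMEOUT => 10, // seconds; illustrative value, tune as needed
    CURLOPT_TIMEOUT => 30,        // seconds; upper bound for the whole request
    CURLOPT_HTTPHEADER => [
        'Content-Type: application/json',
        'x-api-key: ' . $apiKey, // API key header
        // Content-Length is set by cURL automatically from CURLOPT_POSTFIELDS
    ]
]);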

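// 5. Execute Request
$response = curl_exec($ch);

// curl_exec() returns false on transport errors (DNS, TLS, timeout)
if ($response === false) {
    $error = curl_error($ch);
    curl_close($ch);
    die('cURL Error: ' . $error);
}
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);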

// 6. Handle Response
$result = json_decode($response, true);

if ($httpCode === 200 && isset($result['success']) && $result['success']) {
    $data = $result['data'];
    
    echo "✅ Invoice Processed Successfully!\n";
    echo "----------------------------------\n";
    echo "Invoice #: " . ($data['invoice_number'] ?? 'N/A') . "\n";
    echo "Date:      " . ($data['date'] ?? 'N/A') . "\n";
    echo "Vendor:    " . ($data['merchant']['name'] ?? 'Unknown') . "\n";
    
    // Financials
    $fin = $data['financials'] ?? [];
    echo "Total:     " . ($fin['total_amount'] ?? 0) . " " . ($data['currency'] ?? '') . "\n";
    
    // Line Items
    echo "\n--- Line Items ---\n";
    if (!empty($data['line_items'])) {
        foreach ($data['line_items'] as $item) {
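            // Fall back to safe defaults so a missing key does not raise a warning
            echo "- " . str_pad((string) ($item['description'] ?? ''), 30) . ": " . ($item['amount'] ?? 'N/A') . "\n";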
        }
    }
} else {
    echo "❌ Processing Failed (Status $httpCode)\n";
    if (isset($result['error'])) {
        echo "Error: " . $result['error'] . "\n";
        echo "Message: " . ($result['message'] ?? '') . "\n";
    } else {
        echo $response;
    }
}
?>

Technical Specs

• Latency: < 5s (Average)
• Uptime: 98.5% SLA
• Security: AES-256 Encryption & SOC 2 Compliant
• Input: JPG, PNG, WebP (Base64 Encoded)
• Max File Size: 4.5 MB
• Output: JSON (Nested Structure)
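
The input limits listed above can be checked locally before an API call is spent on a file that would be rejected. The following is a minimal sketch, not part of any StructOCR SDK: the function names `detectImageType` and `validateInvoiceImage` are hypothetical, format detection relies on the standard JPEG, PNG, and WebP file signatures, and it assumes the size limit means 4.5 × 1024 × 1024 bytes measured on the raw file before Base64 encoding, which the specs do not state.

```php
<?php

// Assumption: the documented limit = 4.5 * 1024 * 1024 bytes of raw (pre-Base64) image data.
const MAX_IMAGE_BYTES = 4718592;

/**
 * Identify JPG, PNG, or WebP data from its leading signature bytes.
 * Returns 'jpeg', 'png', 'webp', or null for anything else.
 */
function detectImageType(string $bytes): ?string
{
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpeg';
    }
    if (strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return 'png';
    }
    // WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if (substr($bytes, 0, 4) === 'RIFF' && substr($bytes, 8, 4) === 'WEBP') {
        return 'webp';
    }
    return null;
}

/**
 * Hypothetical pre-flight check. Returns a list of problems;
 * an empty array means the data fits the documented input limits.
 */
function validateInvoiceImage(string $bytes): array
{
    $problems = [];

    if (detectImageType($bytes) === null) {
        $problems[] = 'Unsupported format: expected JPG, PNG, or WebP';
    }
    if (strlen($bytes) > MAX_IMAGE_BYTES) {
        $problems[] = 'File is larger than 4.5 MB';
    }

    return $problems;
}
```

In the cURL example, a check like this would run on `$imageData` right after `file_get_contents()`, stopping before the request is sent if the returned list is not empty.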

Key Features

• Table Extraction Engine: Accurately parses complex line items and tables without manual templating.
• Financial Validation: Cross-validates subtotals, taxes, and grand totals to ensure mathematical accuracy.
• Vendor Normalization: Automatically identifies merchants and extracts standardized tax IDs (VAT/EIN).

Sample JSON Output

StructOCR returns a normalized JSON object, regardless of the input image angle or quality.

{
  "success": true,
  "data": {
    "type": "invoice",
    "invoice_number": "INV-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "USD",
    "merchant": {
      "name": "AWS Web Services",
      "address": "410 Terry Ave N, Seattle, WA",
      "tax_id": "EIN-12-3456789",
      "iban": null
    },
    "customer": {
      "name": "Acme Corp Inc.",
      "tax_id": "987654321"
    },
    "financials": {
      "subtotal": 100,
      "tax_amount": 10,
      "total_amount": 110
    },
    "line_items": [
      {
        "description": "EC2 Instance Usage",
        "quantity": 1,
        "unit_price": 80,
        "amount": 80
      },
      {
        "description": "S3 Storage",
        "quantity": 1,
        "unit_price": 20,
        "amount": 20
      }
    ]
  }
}
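
The arithmetic relationships visible in this sample (each line's quantity times unit price equals its amount, line amounts sum to the subtotal, and subtotal plus tax equals the total) can be re-checked on the client before posting to a ledger. The sketch below assumes the field names shown in the sample and a two-decimal currency; the function name `checkInvoiceTotals` is hypothetical, and this is not a description of how StructOCR performs its own validation.

```php
<?php

/**
 * Hypothetical client-side consistency check for the `data` object.
 * Returns a list of mismatches; an empty array means the figures agree.
 * Amounts are compared as integer cents to avoid floating-point drift
 * (assumes a currency with two decimal places).
 */
function checkInvoiceTotals(array $data): array
{
    $toCents = static function ($value): int {
        return (int) round(((float) $value) * 100);
    };

    $issues = [];
    $fin = $data['financials'] ?? [];
    $subtotal = $toCents($fin['subtotal'] ?? 0);
    $tax      = $toCents($fin['tax_amount'] ?? 0);
    $total    = $toCents($fin['total_amount'] ?? 0);

    $lineSum = 0;
    foreach ($data['line_items'] ?? [] as $i => $item) {
        $amount   = $toCents($item['amount'] ?? 0);
        $expected = $toCents((float) ($item['quantity'] ?? 0) * (float) ($item['unit_price'] ?? 0));
        if ($amount !== $expected) {
            $issues[] = "line_items[$i]: quantity x unit_price does not equal amount";
        }
        $lineSum += $amount;
    }

    if ($lineSum !== $subtotal) {
        $issues[] = 'line item amounts do not sum to subtotal';
    }
    if ($subtotal + $tax !== $total) {
        $issues[] = 'subtotal + tax_amount does not equal total_amount';
    }

    return $issues;
}
```

For the sample above, the two line amounts (80 and 20) sum to the subtotal of 100, and 100 plus 10 tax equals the total of 110, so the function returns an empty list. Invoices with discounts, shipping, or per-line tax would need additional terms.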

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

Unlike generic OCR services that return raw text coordinates, StructOCR's models are pre-trained specifically on millions of invoices. This allows us to return a structured JSON object with key-value pairs (e.g., `invoice_number`, `due_date`) and accurately parsed `line_items`, eliminating the need for complex post-processing logic.

Do you store the uploaded images?

No. We do not store customer data. Images are processed in-memory and are permanently deleted immediately after the API response is generated. Nothing is written to disk.

How do you handle blurry or skewed images?

Our API includes an automatic image pre-processing pipeline that applies de-skewing, noise reduction, and contrast enhancement algorithms before analysis, maximizing accuracy even on low-quality mobile phone captures.
