Direct HTTP API for High-Accuracy Invoice Parsing
Achieve 99.8%+ accuracy on line-item extraction, with average latency under 5 seconds, via a simple POST request.

Why Self-Managed Invoice OCR Fails
Building reliable invoice OCR takes more than a Tesseract wrapper. Open-source tools fail on real-world documents because of variable vendor templates, multi-page tables, and skewed or noisy scans. Maintaining a library of regex patterns for each vendor is brittle and does not scale. Developers end up building complex post-processing logic to correct errors, parse line items, and validate totals, which means high maintenance overhead and persistent accuracy issues that defeat the purpose of automation.
Enterprise-Grade Extraction with StructOCR
StructOCR uses pre-trained deep learning models designed to understand the semantic structure of an invoice, going beyond simple text recognition. Our invoice OCR API pipeline applies automatic image pre-processing (deskewing, denoising, and contrast enhancement) to improve image quality before data extraction. Unlike tools such as Tesseract that output unstructured text, StructOCR returns a standardized JSON format with explicitly typed fields, validated financial data, and parsed line items, which simplifies finance workflows and removes the need for bespoke parsing logic.
Production Use Cases
- Accounts Payable Automation: Eliminate manual data entry. Ingest vendor invoices directly into your ERP or accounting software with 99.8%+ field accuracy.
- Expense Management Systems: Enable real-time expense approval by allowing users to snap a photo of a receipt and have line items extracted instantly.
- Three-Way Matching: Automate the matching of purchase orders to invoices by extracting key fields like PO number, line items, and totals.
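As a rough illustration of the matching use case, the sketch below compares an extracted invoice against a purchase order record. It covers only the PO-to-invoice leg; a full three-way match would also check a goods receipt. The function name `matchInvoiceToPo`, the shape of the `$po` array, and the `po_number` field on the invoice are assumptions made for this example, not confirmed parts of the StructOCR response.

```php
<?php
// Hypothetical helper: returns the names of fields that disagree between an
// extracted invoice and a purchase order. Field names are assumptions.
function matchInvoiceToPo(array $invoice, array $po, float $tolerance = 0.01): array
{
    $mismatches = [];

    // PO number must match exactly (assumed field name: po_number).
    if (($invoice['po_number'] ?? null) !== ($po['po_number'] ?? null)) {
        $mismatches[] = 'po_number';
    }

    // Compare totals within a small tolerance to absorb rounding differences.
    $invoiceTotal = (float) ($invoice['financials']['total_amount'] ?? 0);
    $poTotal = (float) ($po['total_amount'] ?? 0);
    if (abs($invoiceTotal - $poTotal) > $tolerance) {
        $mismatches[] = 'total_amount';
    }

    // A cheap structural check: both documents list the same number of lines.
    if (count($invoice['line_items'] ?? []) !== count($po['line_items'] ?? [])) {
        $mismatches[] = 'line_item_count';
    }

    return $mismatches;
}
```

An empty return value means the invoice can be approved automatically under these checks; anything else can be routed to manual review.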
Implementation: Raw PHP (cURL)
The following PHP example demonstrates a complete extraction flow using cURL. It correctly handles image encoding, sets the 'x-api-key' header, and parses the nested invoice JSON data.
Prerequisite: PHP 7.4+ with cURL extension
<?php
// Get an API key (20 free credits on signup): https://structocr.com/register
$apiKey = 'YOUR_API_KEY_HERE';
$apiUrl = 'https://api.structocr.com/v1/invoice';
$imagePath = 'invoice.jpg';

// 1. Validate File
if (!file_exists($imagePath)) {
    die("Error: File not found at $imagePath");
}

// 2. Encode Image to Base64
$imageData = file_get_contents($imagePath);
if ($imageData === false) {
    die("Error: Could not read $imagePath");
}
$base64Image = base64_encode($imageData);

// 3. Prepare Payload
$payload = json_encode(['img' => $base64Image]);

// 4. Initialize cURL
$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_URL => $apiUrl,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $payload,
    CURLOPT_HTTPHEADER => [
        'Content-Type: application/json',
        'x-api-key: ' . $apiKey, // Key Header
        'Content-Length: ' . strlen($payload)
    ]
]);

// 5. Execute Request
$response = curl_exec($ch);
if ($response === false) {
    // Read the error before closing the handle, then stop.
    $error = curl_error($ch);
    curl_close($ch);
    die('cURL Error: ' . $error);
}
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// 6. Handle Response
$result = json_decode($response, true);
if ($httpCode === 200 && !empty($result['success'])) {
    $data = $result['data'];
    echo "✅ Invoice Processed Successfully!\n";
    echo "----------------------------------\n";
    echo "Invoice #: " . ($data['invoice_number'] ?? 'N/A') . "\n";
    echo "Date: " . ($data['date'] ?? 'N/A') . "\n";
    echo "Vendor: " . ($data['merchant']['name'] ?? 'Unknown') . "\n";

    // Financials
    $fin = $data['financials'] ?? [];
    echo "Total: " . ($fin['total_amount'] ?? 0) . " " . ($data['currency'] ?? '') . "\n";

    // Line Items
    echo "\n--- Line Items ---\n";
    if (!empty($data['line_items'])) {
        foreach ($data['line_items'] as $item) {
            // Fall back to placeholders so a missing field does not raise a warning.
            echo "- " . str_pad($item['description'] ?? '', 30) . ": " . ($item['amount'] ?? 'N/A') . "\n";
        }
    }
} else {
    echo "❌ Processing Failed (Status $httpCode)\n";
    if (isset($result['error'])) {
        echo "Error: " . $result['error'] . "\n";
        echo "Message: " . ($result['message'] ?? '') . "\n";
    } else {
        // Not a structured error body: print the raw response for debugging.
        echo $response;
    }
}
?>
Technical Specs
- Latency: < 5s (Average)
- Uptime: 98.5% SLA
- Security: AES-256 Encryption & SOC2 Compliant
- Input: JPG, PNG, WebP (Base64 Encoded)
- Max File Size: 4.5MB
- Output: JSON (Nested Structure)
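The input limits above can be checked on the client before a request is sent, which avoids spending a round trip on a file the API would likely reject. The following is a minimal sketch: the function and constant names are hypothetical, and it assumes the 4.5MB cap applies to the raw file measured in binary megabytes, which the spec list does not state.

```php
<?php
// Assumption: 4.5MB means 4.5 * 1024 * 1024 bytes of raw (pre-Base64) file data.
const STRUCTOCR_MAX_BYTES = 4718592;
const STRUCTOCR_ALLOWED_EXTENSIONS = ['jpg', 'jpeg', 'png', 'webp'];

// Hypothetical pre-flight check: returns null when the file looks acceptable,
// or a human-readable reason when it does not.
function validateUpload(string $filename, int $sizeInBytes): ?string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (!in_array($ext, STRUCTOCR_ALLOWED_EXTENSIONS, true)) {
        return "Unsupported format: .$ext";
    }
    if ($sizeInBytes > STRUCTOCR_MAX_BYTES) {
        return 'File exceeds the 4.5MB limit';
    }
    return null;
}
```

In practice this could be called as `validateUpload($imagePath, filesize($imagePath))` before encoding the image. Checking the extension is only a convenience; it does not prove the file contents are a valid image.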
Key Features
- Table Extraction Engine: Accurately parses complex line items and tables without manual templating.
- Financial Validation: Cross-validates subtotals, taxes, and grand totals to ensure mathematical accuracy.
- Vendor Normalization: Automatically identifies merchants and extracts standardized tax IDs (VAT/EIN).
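To make the financial validation idea concrete, here is a client-side sketch of the same kind of arithmetic check, which can serve as a second line of defense before posting to a ledger. It is not StructOCR's internal logic. The function name `checkInvoiceTotals` is hypothetical, the field names follow the sample JSON output on this page, and the check assumes there are no discounts, shipping charges, or other adjustments outside `subtotal` and `tax_amount`.

```php
<?php
// Hypothetical helper: re-checks invoice arithmetic on the decoded `data` object.
function checkInvoiceTotals(array $data, float $tolerance = 0.01): array
{
    $fin = $data['financials'] ?? [];
    $subtotal = (float) ($fin['subtotal'] ?? 0);
    $tax = (float) ($fin['tax_amount'] ?? 0);
    $total = (float) ($fin['total_amount'] ?? 0);

    // Sum the per-line amounts; missing amounts count as zero.
    $lineSum = 0.0;
    foreach ($data['line_items'] ?? [] as $item) {
        $lineSum += (float) ($item['amount'] ?? 0);
    }

    // Compare with a tolerance rather than ===, since these are floats.
    return [
        'line_items_match_subtotal' => abs($lineSum - $subtotal) <= $tolerance,
        'subtotal_plus_tax_matches_total' => abs($subtotal + $tax - $total) <= $tolerance,
    ];
}
```

If either flag is false, the invoice can be held for manual review instead of being posted automatically.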
Sample JSON Output
StructOCR returns a normalized JSON object, regardless of the input image angle or quality.
{
  "success": true,
  "data": {
    "type": "invoice",
    "invoice_number": "INV-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "USD",
    "merchant": {
      "name": "AWS Web Services",
      "address": "410 Terry Ave N, Seattle, WA",
      "tax_id": "EIN-12-3456789",
      "iban": null
    },
    "customer": {
      "name": "Acme Corp Inc.",
      "tax_id": "987654321"
    },
    "financials": {
      "subtotal": 100,
      "tax_amount": 10,
      "total_amount": 110
    },
    "line_items": [
      {
        "description": "EC2 Instance Usage",
        "quantity": 1,
        "unit_price": 80,
        "amount": 80
      },
      {
        "description": "S3 Storage",
        "quantity": 1,
        "unit_price": 20,
        "amount": 20
      }
    ]
  }
}
Frequently Asked Questions
How does StructOCR compare to AWS Textract or Google Vision?
Unlike generic OCR services that return raw text coordinates, StructOCR's models are pre-trained specifically on millions of invoices. This allows us to return a structured JSON object with key-value pairs (e.g., `invoice_number`, `due_date`) and accurately parsed `line_items`, eliminating the need for complex post-processing logic.
Do you store the uploaded images?
No. We do not store customer data. Images are processed in-memory and are permanently deleted immediately after the API response is generated. Nothing is written to disk.
How do you handle blurry or skewed images?
Our API includes an automatic image pre-processing pipeline that applies de-skewing, noise reduction, and contrast enhancement algorithms before analysis, maximizing accuracy even on low-quality mobile phone captures.