Direct HTTP API for High-Accuracy Invoice Parsing
Achieve 99.8%+ accuracy on line-item extraction via a simple POST request, with average latency under 5 seconds.

Why Self-Managed Invoice OCR Fails
Building reliable invoice OCR is more than a Tesseract wrapper. Open-source tools fail on real-world documents due to variable vendor templates, multi-page tables, and skewed or noisy scans. Maintaining a library of RegEx patterns for each vendor is brittle and unscalable. Developers are forced to build complex post-processing logic to correct errors, parse line items, and validate totals, resulting in high maintenance overhead and persistent accuracy issues that defeat the purpose of automation.
Enterprise-Grade Extraction with StructOCR
StructOCR uses pre-trained deep learning models that understand the semantic structure of an invoice, not just its text. The API pipeline applies automatic image pre-processing (deskewing, denoising, contrast enhancement) to maximize quality before extraction. Unlike Tesseract, which returns an unstructured text dump, StructOCR returns standardized JSON with explicitly typed fields, validated financials, and parsed line items, so you do not need custom parsing logic on your end.
Production Use Cases
- Accounts Payable Automation: Eliminate manual data entry. Ingest vendor invoices directly into your ERP or accounting software with 99.8%+ field accuracy.
- Expense Management Systems: Enable real-time expense approval by allowing users to snap a photo of a receipt and have line items extracted instantly.
- Three-Way Matching: Automate the matching of purchase orders to invoices by extracting key fields like PO number, line items, and totals.
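The matching step can be sketched in a few lines of PHP. This is an illustration only, not part of the StructOCR API: it assumes the decoded `data` object has the shape shown under Sample JSON Output, and the purchase-order array layout, the `findInvoiceMismatches` name, and the one-cent tolerance are all hypothetical.

```php
<?php
// Hypothetical helper: compares an extracted invoice against a purchase order.
// $invoice is the decoded `data` object from the API response.
// $po is an assumed internal structure:
//   ['total' => float, 'lines' => [description => ['quantity' => ..., 'unit_price' => ...]]]
function findInvoiceMismatches(array $invoice, array $po, float $tolerance = 0.01): array
{
    $mismatches = [];

    // Compare grand totals within a small tolerance to absorb rounding.
    $invoiceTotal = (float) ($invoice['financials']['total_amount'] ?? 0);
    if (abs($invoiceTotal - (float) $po['total']) > $tolerance) {
        $mismatches[] = "total: invoice $invoiceTotal vs PO {$po['total']}";
    }

    // Compare each invoice line against the PO line with the same description.
    foreach ($invoice['line_items'] ?? [] as $item) {
        $desc = $item['description'] ?? '';
        if (!isset($po['lines'][$desc])) {
            $mismatches[] = "line not on PO: $desc";
            continue;
        }
        $poLine = $po['lines'][$desc];
        if ((float) ($item['quantity'] ?? 0) != (float) $poLine['quantity']) {
            $mismatches[] = "quantity differs: $desc";
        }
        if (abs((float) ($item['unit_price'] ?? 0) - (float) $poLine['unit_price']) > $tolerance) {
            $mismatches[] = "unit price differs: $desc";
        }
    }

    return $mismatches;
}

// Usage: invoice values are taken from the sample response; the PO is made up.
$invoice = [
    'financials' => ['total_amount' => 110],
    'line_items' => [
        ['description' => 'EC2 Instance Usage', 'quantity' => 1, 'unit_price' => 80, 'amount' => 80],
        ['description' => 'S3 Storage', 'quantity' => 1, 'unit_price' => 20, 'amount' => 20],
    ],
];
$po = [
    'total' => 110,
    'lines' => [
        'EC2 Instance Usage' => ['quantity' => 1, 'unit_price' => 80],
        'S3 Storage' => ['quantity' => 1, 'unit_price' => 25],
    ],
];
print_r(findInvoiceMismatches($invoice, $po));
```

Matching lines by exact description is the simplest possible key and will be brittle in practice; a real system would more likely match on SKU or PO line number if those are available.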
Implementation: Raw PHP (cURL)
The following PHP example shows a complete extraction flow using cURL: it Base64-encodes the image, sets the `x-api-key` header, and parses the nested invoice JSON.
Prerequisite: PHP 7.4+ with cURL extension
<?php
// Get an API key (200 free requests): https://structocr.com/register
$apiKey = 'YOUR_API_KEY_HERE';
$apiUrl = 'https://api.structocr.com/v1/invoice';
$imagePath = 'invoice.jpg';

// 1. Validate File
if (!file_exists($imagePath)) {
    die("Error: File not found at $imagePath");
}

// 2. Encode Image to Base64
$imageData = file_get_contents($imagePath);
if ($imageData === false) {
    die("Error: Could not read $imagePath");
}
$base64Image = base64_encode($imageData);

// 3. Prepare Payload
$payload = json_encode(['img' => $base64Image]);

// 4. Initialize cURL
$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_URL => $apiUrl,
    CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $payload,
    CURLOPT_HTTPHEADER => [
        'Content-Type: application/json',
        'x-api-key: ' . $apiKey, // API key header
        'Content-Length: ' . strlen($payload),
    ],
]);

// 5. Execute Request
$response = curl_exec($ch);
if ($response === false) {
    // Transport-level failure (DNS, TLS, timeout): there is no HTTP status to inspect
    die('cURL Error: ' . curl_error($ch));
}
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// 6. Handle Response
$result = json_decode($response, true);
if ($httpCode === 200 && isset($result['success']) && $result['success']) {
    $data = $result['data'];
    echo "✅ Invoice Processed Successfully!\n";
    echo "----------------------------------\n";
    echo "Invoice #: " . ($data['invoice_number'] ?? 'N/A') . "\n";
    echo "Date: " . ($data['date'] ?? 'N/A') . "\n";
    echo "Vendor: " . ($data['merchant']['name'] ?? 'Unknown') . "\n";

    // Financials
    $fin = $data['financials'] ?? [];
    echo "Total: " . ($fin['total_amount'] ?? 0) . " " . ($data['currency'] ?? '') . "\n";

    // Line Items
    echo "\n--- Line Items ---\n";
    if (!empty($data['line_items'])) {
        foreach ($data['line_items'] as $item) {
            // Fall back to defaults so a missing field does not raise a warning
            echo "- " . str_pad($item['description'] ?? '', 30) . ": " . ($item['amount'] ?? 'N/A') . "\n";
        }
    }
} else {
    echo "❌ Processing Failed (Status $httpCode)\n";
    if (isset($result['error'])) {
        echo "Error: " . $result['error'] . "\n";
        echo "Message: " . ($result['message'] ?? '') . "\n";
    } else {
        // Not the expected JSON error shape: print the raw body for debugging
        echo $response;
    }
}
?>
Technical Specs
- Latency: < 5s (Average)
- Uptime: 98.5% SLA
- Security: AES-256 Encryption & SOC2 Compliant
- Input: JPG, PNG, WebP (Base64 Encoded)
- Max File Size: 4.5MB
- Output: JSON (Nested Structure)
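The input limits above can be checked locally before spending a request. The sketch below is one way to do that, under two assumptions that the specs do not spell out: that 4.5MB means 4.5 × 1024 × 1024 bytes, and that the limit applies to the raw file rather than the Base64 payload. The function and constant names are illustrative and not part of any StructOCR client.

```php
<?php
// Assumed reading of the documented limit: 4.5 * 1024 * 1024 bytes,
// applied to the raw file before Base64 encoding.
const MAX_IMAGE_BYTES = 4718592;
const ALLOWED_MIME_TYPES = ['image/jpeg', 'image/png', 'image/webp'];

// Returns null when the image looks acceptable, or a message describing the problem.
function checkImageConstraints(int $sizeBytes, string $mimeType): ?string
{
    if ($sizeBytes <= 0) {
        return 'File is empty';
    }
    if ($sizeBytes > MAX_IMAGE_BYTES) {
        return 'File exceeds the 4.5MB limit';
    }
    if (!in_array($mimeType, ALLOWED_MIME_TYPES, true)) {
        return "Unsupported image type: $mimeType";
    }
    return null;
}

// Reads the size and MIME type from disk, then applies the checks above.
function checkImageFile(string $path): ?string
{
    if (!is_file($path)) {
        return "File not found: $path";
    }
    $info = @getimagesize($path); // false when the file is not a recognizable image
    if ($info === false) {
        return 'File is not a readable image';
    }
    return checkImageConstraints((int) filesize($path), $info['mime']);
}
```

Calling `checkImageFile($imagePath)` before encoding lets the script fail fast with a clear message instead of waiting for the API to reject the upload.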
Key Features
- Table Extraction Engine: Accurately parses complex line items and tables without manual templating.
- Financial Validation: Cross-validates subtotals, taxes, and grand totals to ensure mathematical accuracy.
- Vendor Normalization: Automatically identifies merchants and extracts standardized tax IDs (VAT/EIN).
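The kind of arithmetic behind financial validation can be illustrated on the client side as well. The sketch below is not StructOCR's server-side logic, which is not documented here; it is a hypothetical sanity check that assumes the `financials` and `line_items` field names from the sample output and ignores discounts, shipping, and other adjustments a real invoice may carry.

```php
<?php
// Hypothetical client-side check; $data is the decoded `data` object.
// Returns a list of problems, empty when the numbers are consistent.
function checkInvoiceTotals(array $data, float $tolerance = 0.01): array
{
    $fin = $data['financials'] ?? [];
    $subtotal = (float) ($fin['subtotal'] ?? 0);
    $tax = (float) ($fin['tax_amount'] ?? 0);
    $total = (float) ($fin['total_amount'] ?? 0);

    // Sum the per-line amounts.
    $lineSum = 0.0;
    foreach ($data['line_items'] ?? [] as $item) {
        $lineSum += (float) ($item['amount'] ?? 0);
    }

    // Compare with a tolerance rather than ==, since amounts may be floats.
    $problems = [];
    if (abs($lineSum - $subtotal) > $tolerance) {
        $problems[] = 'line items do not sum to subtotal';
    }
    if (abs($subtotal + $tax - $total) > $tolerance) {
        $problems[] = 'subtotal plus tax does not equal total';
    }
    return $problems;
}

// Usage with the figures from the sample response: 80 + 20 = 100, 100 + 10 = 110.
$sample = [
    'financials' => ['subtotal' => 100, 'tax_amount' => 10, 'total_amount' => 110],
    'line_items' => [['amount' => 80], ['amount' => 20]],
];
var_dump(checkInvoiceTotals($sample) === []);
```

A non-empty result is a reasonable trigger for routing the invoice to manual review rather than posting it automatically.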
Sample JSON Output
StructOCR returns a normalized JSON object, regardless of the input image angle or quality.
{
"success": true,
"data": {
"type": "invoice",
"invoice_number": "INV-2026-001",
"date": "2026-01-15",
"due_date": "2026-02-15",
"currency": "USD",
"merchant": {
"name": "AWS Web Services",
"address": "410 Terry Ave N, Seattle, WA",
"tax_id": "EIN-12-3456789",
"iban": null
},
"customer": {
"name": "Acme Corp Inc.",
"tax_id": "987654321"
},
"financials": {
"subtotal": 100,
"tax_amount": 10,
"total_amount": 110
},
"line_items": [
{
"description": "EC2 Instance Usage",
"quantity": 1,
"unit_price": 80,
"amount": 80
},
{
"description": "S3 Storage",
"quantity": 1,
"unit_price": 20,
"amount": 20
}
]
}
}
Frequently Asked Questions
How does StructOCR compare to AWS Textract or Google Vision?
Unlike generic OCR services that return raw text coordinates, StructOCR's models are pre-trained specifically on millions of invoices. This allows us to return a structured JSON object with key-value pairs (e.g., `invoice_number`, `due_date`) and accurately parsed `line_items`, eliminating the need for complex post-processing logic.
Do you store the uploaded images?
No. We do not store customer data. Images are processed in-memory and are permanently deleted immediately after the API response is generated. Nothing is written to disk.
How do you handle blurry or skewed images?
Our API includes an automatic image pre-processing pipeline that applies de-skewing, noise reduction, and contrast enhancement algorithms before analysis, maximizing accuracy even on low-quality mobile phone captures.