Direct PHP National ID OCR API: Structured Data from Raw Images

Achieve 99.8% extraction accuracy and sub-1.5 second latency with a single HTTP POST request.

Steve HarringtonUpdated 2026-01-16
A process diagram showing a raw image of a National ID card being sent to the StructOCR API, which then returns a clean, structured JSON object with all the extracted fields.
Figure 1: StructOCR converts raw National ID images into validated JSON data.

Why National ID OCR is Difficult

Generic OCR tools like Tesseract fail on real-world ID documents due to complexities they are not designed for. Physical cards suffer from photographic glare, shadows, and inconsistent lighting that corrupt character data. Users submit images with skew and rotation, requiring geometric correction before processing. Furthermore, simple text extraction is insufficient; data like MRZ (Machine-Readable Zone) lines contain checksums that must be algorithmically validated, not just read. Maintaining a library of RegEx patterns to parse varying ID formats across jurisdictions is a significant and brittle engineering overhead. These factors combine to make in-house ID processing a low-accuracy, high-maintenance liability.

Enterprise-Grade Extraction with StructOCR

StructOCR bypasses the fragility of open-source solutions by using pre-trained Deep Learning models specialized for identity documents. Our API doesn't just read text; it understands the structure of an ID card. Upon receiving an image, our system performs automatic pre-processing, including deskewing, glare removal, and denoising to normalize the input. The cleaned image is then processed by our models to extract specific fields, which are cross-validated and returned in a standardized JSON output. This eliminates the need for manual parsing, RegEx maintenance, and complex image processing pipelines, providing a reliable, single-step solution that consistently outperforms generic OCR engines.

Production Use Cases

  • Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from National IDs in < 2 seconds.
  • Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
  • Global Compliance: Handle National IDs from 200+ jurisdictions without custom rules.

Implementation: Raw PHP (cURL)

The following PHP code demonstrates a complete flow using cURL. It handles image encoding, sets the required 'x-api-key' header, and parses region-specific fields (CNP, CPF, Address).

Prerequisite: PHP 7.4+ with cURL extension

CODE EXAMPLE
<?php

// 💰 Save 30%+ vs competitors. Get 200 free requests instantly:
// 👉 https://structocr.com/register

$apiKey = 'YOUR_API_KEY_HERE';
$apiUrl = 'https://api.structocr.com/v1/national-id';
$imagePath = 'id_card.jpg'; // Supports JPG, PNG, WebP

// 1. Validate File
if (!file_exists($imagePath)) {
    die("Error: File not found at $imagePath");
}

// 2. Encode Image to Base64
$imageData = file_get_contents($imagePath);
$base64Image = base64_encode($imageData);

// 3. Prepare Payload
$payload = json_encode(['img' => $base64Image]);

// 4. Initialize cURL
$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_URL => $apiUrl,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $payload,
    CURLOPT_HTTPHEADER => [
        'Content-Type: application/json',
        'x-api-key: ' . $apiKey, // Required Header
        'Content-Length: ' . strlen($payload)
    ]
]);

// 5. Execute Request
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

if (curl_errno($ch)) {
    die('cURL Error: ' . curl_error($ch));
}
curl_close($ch);

// 6. Handle Response
$result = json_decode($response, true);

if ($httpCode === 200 && isset($result['success']) && $result['success']) {
    $data = $result['data'];
    
    echo "✅ National ID Processed!\n";
    echo "--------------------------\n";
    echo "Region:     " . ($data['country_code'] ?? 'N/A') . "\n";
    echo "Name:       " . ($data['given_names'] ?? '') . " " . ($data['surname'] ?? '') . "\n";
    echo "Doc Number: " . ($data['document_number'] ?? 'N/A') . " (Series: " . ($data['card_series'] ?? 'N/A') . ")\n";
    
    // Critical for verification (CNP, CPF, NIN)
    echo "Personal #: " . ($data['personal_number'] ?? 'N/A') . "\n";
    
    echo "DOB:        " . ($data['date_of_birth'] ?? 'N/A') . "\n";
    echo "Address:    " . ($data['address'] ?? 'N/A') . "\n";
} else {
    echo "❌ Processing Failed (Status $httpCode)\n";
    if (isset($result['error'])) {
        echo "Error: " . $result['error'] . "\n";
    } else {
        echo $response;
    }
}
?>

Technical Specs

  • Latency: < 5s (Average)
  • Uptime: 98.5% SLA
  • Security: AES-256 Encryption & SOC2 Compliant
  • Input: JPG, PNG, WebP (Base64 Encoded)
  • Max File Size: 4.5MB
  • Output: JSON (Structured Data)

Key Features

  • Specialized Numbers: Extracts region-specific IDs like CNP (Romania), CPF (Brazil), and NIN (Nigeria).
  • Multi-line Addresses: Intelligently reconstructs full addresses from fragmented lines on ID cards.
  • Date Normalization: Returns all dates (Birth, Issue, Expiry) in a standardized YYYY-MM-DD format.

Sample JSON Output

StructOCR returns a normalized JSON object, regardless of the input image angle or quality. Field names are consistent across all supported document types.

{
  "success": true,
  "data": {
    "type": "national_id",
    "country_code": "ROU",
    "nationality": "ROMANA",
    "document_number": "123456",
    "card_series": "KS",
    "personal_number": "1920319123456",
    "surname": "POPESCU",
    "given_names": "ANDREI",
    "sex": "M",
    "date_of_birth": "1992-03-19",
    "place_of_birth": "Jud. CS Mun. Reșița",
    "address": "Jud. CS Orș. Bocșa Str. Nucilor Nr. 15",
    "date_of_issue": "2020-05-10",
    "date_of_expiry": "2030-05-10",
    "issuing_authority": "SPCLEP Bocșa"
  }
}

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

General-purpose OCR services like Textract or Vision return raw lines of text and their coordinates. You are still responsible for parsing, validating, and structuring that data. StructOCR is a specialized, vertical solution that performs these steps for you, returning a clean JSON object with labeled fields like `surname` and `date_of_birth` directly.

Do you store the uploaded images?

No. Images are processed entirely in-memory and are permanently deleted immediately after the extraction request is completed. We do not persist sensitive customer data on our servers.

How do you handle blurry or low-quality images?

Our API pipeline includes an automatic image enhancement engine that applies denoising, deblurring, and contrast correction before analysis. This significantly improves extraction accuracy on sub-optimal images captured by mobile devices.

More OCR Tutorials

Precise Data Extraction and Seamless Integration with AI-powered OCR API.

Empower your solutions with automated data extraction by integrating best-in class StructOCR via API seamlessly.

No credit card required • Full API access included