Eliminate Passport Data Entry Errors with a Single API Call

Achieve 99.8%+ accuracy and sub-second response times for ICAO 9303 compliant MRZ and VIZ data extraction.

Steve HarringtonUpdated 2026-01-16
Diagram showing a passport photo being uploaded to the StructOCR API, which then processes the image, extracts key fields like name and passport number, and returns a structured JSON object.
Figure 1: StructOCR converts raw Passport images into validated JSON data.

Why Passport OCR is Difficult

Open-source tools like Tesseract fail on real-world passport scans due to common image defects like holographic glare, shadows, and non-standard lighting. Skew and rotation require complex pre-processing pipelines that are brittle and expensive to maintain. Furthermore, simply extracting text is insufficient. Parsing the ICAO 9303 compliant Machine Readable Zone (MRZ) involves validating complex checksums and character encodings. Manually building and maintaining RegEx patterns for dozens of international passport variations is a significant engineering overhead that rarely achieves the accuracy required for production systems, leading to costly manual review cycles.

Enterprise-Grade Extraction with StructOCR

StructOCR replaces fragile, multi-step OCR pipelines with a single, reliable API endpoint. Our pre-trained deep learning models are specialized for identity documents, significantly outperforming generic engines like Tesseract. The API handles image pre-processing automatically, including deskewing, denoising, and glare removal, before extraction. Instead of returning a wall of unstructured text, StructOCR delivers a predictable, standardized JSON object with validated fields, including correctly calculated MRZ checksums. This eliminates the need for complex post-processing logic on your end.

Production Use Cases

  • Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from Passports in < 2 seconds.
  • Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
  • Global Compliance: Handle Passports from 200+ jurisdictions without custom rules.

Implementation: Java (Standard HttpClient)

The following Java code uses the native `java.net.http.HttpClient` (Java 11+). It demonstrates how to handle `x-api-key` authentication and extract both MRZ and Visual Zone (VIZ) data without external dependencies.

Prerequisite: JDK 11+

CODE EXAMPLE
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Base64;

public class PassportOcrExample {

    // 💰 Save 30%+ vs competitors. Get 200 free requests instantly:
    // 👉 https://structocr.com/register

    private static final String API_KEY = "YOUR_API_KEY_HERE";
    private static final String API_ENDPOINT = "https://api.structocr.com/v1/passport";

    public static void main(String[] args) {
        // Note: Supports JPG, PNG, WebP (Max 4.5MB)
        String imagePath = "passport.jpg";

        try {
            // 1. Validate File
            Path path = Path.of(imagePath);
            if (!Files.exists(path)) {
                System.err.println("Error: File not found at " + path.toAbsolutePath());
                return;
            }

            // 2. Read and Encode Image
            byte[] imageBytes = Files.readAllBytes(path);
            String base64Image = Base64.getEncoder().encodeToString(imageBytes);

            // 3. Construct JSON Payload (Dependency-free)
            // For production, use Jackson or Gson.
            String jsonPayload = "{\"img\": \"" + base64Image + "\"}";

            // 4. Create HttpClient
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();

            // 5. Build Request
            // Important: 'x-api-key' is required in the header
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(API_ENDPOINT))
                    .header("Content-Type", "application/json")
                    .header("x-api-key", API_KEY)
                    .timeout(Duration.ofSeconds(30))
                    .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                    .build();

            System.out.println("Scanning passport at " + API_ENDPOINT + "...");

            // 6. Send Request
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // 7. Output Result
            if (response.statusCode() == 200) {
                System.out.println("✅ Extraction Successful!");
                // The response contains both MRZ data and Visual Zone (VIZ) data.
                // Parse the JSON to access unique fields like 'place_of_birth' or 'place_of_issue'.
                System.out.println(response.body());
            } else {
                System.err.println("❌ API Error: " + response.statusCode());
                System.err.println(response.body());
            }

        } catch (IOException | InterruptedException e) {
            System.err.println("Request failed: " + e.getMessage());
            Thread.currentThread().interrupt();
        }
    }
}

Technical Specs

  • Latency: < 5s (Average)
  • Uptime: 98.5% SLA
  • Security: AES-256 Encryption & SOC2 Compliant
  • Input: JPG, PNG, WebP (Base64 Encoded)
  • Max File Size: 4.5MB
  • Output: JSON (Structured Data)

Key Features

  • Visual Extraction (VIZ): Parses non-MRZ data fields like Place of Birth and Issuing Authority.
  • Global Support: Optimized for 195+ countries, handling complex backgrounds and holograms.
  • Date Normalization: Returns all dates (Birth, Issue, Expiry) in a standardized YYYY-MM-DD format.

Sample JSON Output

StructOCR returns a normalized JSON object, regardless of the input image angle or quality.

{
  "success": true,
  "data": {
    "type": "passport",
    "country_code": "USA",
    "nationality": "UNITED STATES",
    "passport_number": "E12345678",
    "surname": "DOE",
    "given_names": "JOHN",
    "sex": "M",
    "date_of_birth": "1990-01-01",
    "place_of_birth": "NEW YORK, USA",
    "date_of_issue": "2020-01-01",
    "date_of_expiry": "2030-01-01",
    "place_of_issue": "PASSPORT AGENCY"
  }
}

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

Unlike general-purpose OCR services like AWS Textract or Google Vision that return raw text blocks, StructOCR is a specialized model. It is pre-trained on millions of identity documents to identify and return structured, labeled data fields (e.g., `date_of_birth`, `passport_number`), including validation of MRZ checksums. This eliminates the need for complex and brittle post-processing logic.

Do you store the uploaded images?

No. All image data is processed in-memory and is purged immediately following the API request. We do not persist PII on our systems, aligning with strict data privacy and compliance standards.

How to handle blurry images?

The API includes an integrated image enhancement engine that automatically attempts to deblur, denoise, and correct contrast on suboptimal images before the extraction process begins, maximizing the success rate on real-world captures.

More OCR Tutorials

Precise Data Extraction and Seamless Integration with AI-powered OCR API.

Empower your solutions with automated data extraction by integrating best-in class StructOCR via API seamlessly.

No credit card required • Full API access included