The Definitive Python SDK for Passport Data Extraction

Q: How does StructOCR compare to AWS Textract or Google Vision?

General-purpose OCR services like AWS Textract and Google Vision return raw, unstructured lines of text that require significant post-processing. StructOCR is a specialized API, pre-trained on identity documents. It directly returns a structured JSON object with validated fields like `surname` and `date_of_birth`, eliminating the need for any parsing on your end.

Q: Do you store the uploaded images?

No. We do not store any customer data. Images are processed in-memory and are permanently deleted immediately after the OCR process is complete. Your data never touches persistent storage on our servers.

Q: How to handle blurry images?

Our API includes a sophisticated image pre-processing engine that automatically enhances images before extraction. This includes deblurring, denoising, and contrast correction, allowing us to achieve high accuracy even on suboptimal images from mobile phone cameras.

Achieve 99.8%+ accuracy and sub-second latency, converting passport images directly into structured JSON.

Steve Harrington | Solutions Consultant•Updated 2026-04-12

↓ Try Free — Upload Your Image Get 20 Free API Calls — No Credit Card

A diagram showing a passport image on the left, an arrow pointing to the StructOCR API processing icon in the middle, and structured JSON data output on the right. — Figure 1: StructOCR converts raw Passport images into validated JSON data.

Why Passport OCR is Difficult

Standard open-source OCR engines like Tesseract fail on real-world passport scans due to inherent complexities. Image defects such as camera glare, shadows, and non-uniform lighting degrade character recognition. Geometric distortions, including skew and rotation, misalign text regions, breaking parsing logic. The Machine Readable Zone (MRZ) itself, while standardized by ICAO 9303, requires precise parsing and checksum validation—a non-trivial task. Manually validating MRZ check digits or maintaining brittle RegEx patterns for dozens of international passport variations creates a significant engineering overhead that is both costly and error-prone.

Enterprise-Grade Extraction with StructOCR

StructOCR simplifies fragile, multi-step OCR pipelines into a single API call, ideal for processing identity documents. Our solution leverages pre-trained deep learning models optimized for specific document types, unlike generic OCR engines. The passport mrz ocr api automatically handles image pre-processing, including deskewing, denoising, and glare correction. Unlike Tesseract, which outputs unstructured text, StructOCR provides a standardized JSON with validated fields, eliminating manual parsing and format normalization. This robust solution is designed to support operations across international borders.

Production Use Cases

Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from Passports in < 2 seconds.
Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
Global Compliance: Handle Passports from 200+ jurisdictions without custom rules.

Live Demo: Passport scanner

No registration required. Upload a file to test the extraction.

Upload

Results

↑

Drop files here or click to browse

JPG · PNG · WebP · up to 500 files · max 4.5 MB each

No files selected

Ready to use this in production? Get 20 free API calls — no credit card needed.

Get 20 Free API Calls →

Implementation: Python SDK

The official Python SDK supports both MRZ and Visual Inspection Zone (VIZ) extraction. It automatically handles file encoding and parses fields like 'Place of Birth' that aren't available in the MRZ.

Prerequisite: pip install structocr

CODE EXAMPLE

from structocr import StructOCR

# 💰 Save 30%+ vs competitors. Get 20 free credits instantly:
# 👉 https://structocr.com/register
# Initialize with your API Key
client = StructOCR("YOUR_API_KEY_HERE")

def scan_passport():
    # Note: Supports JPG, PNG, WebP (Max 4.5MB)
    image_path = "passport.jpg"

    try:
        print(f"Scanning {image_path}...")
        
        # The SDK handles file upload and API communication
        result = client.scan_passport(image_path)

        if result.get('success'):
            data = result['data']
            print("✅ Extraction Successful!")
            
            # Identity Data
            print(f"Passport #: {data.get('passport_number')}")
            print(f"Name:       {data.get('given_names')} {data.get('surname')}")
            print(f"Nation:     {data.get('nationality')} ({data.get('country_code')})")
            
            # Visual Zone Specifics (Not in MRZ)
            print(f"Birth Place:{data.get('place_of_birth')}")
            print(f"Issued At:  {data.get('place_of_issue')}")
            
            # Dates
            print(f"DOB:        {data.get('date_of_birth')} ({data.get('sex')})")
            print(f"Expiry:     {data.get('date_of_expiry')}")
        else:
            print(f"❌ Extraction Failed: {result.get('error')}")

    except Exception as e:
        # Handle SDK or Network errors
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    scan_passport()

Technical Specs

•Latency: < 5s (Average)
•Uptime: 98.5% SLA
•Security: AES-256 Encryption & SOC2 Compliant
•Input: JPG, PNG, WebP (File Path)
•Max File Size: 4.5MB
•Output: JSON (Structured Data)

Key Features

•Visual Extraction (VIZ): Parses non-MRZ data fields like Place of Birth and Issuing Authority.
•Global Support: Optimized for 195+ countries, handling complex backgrounds and holograms.
•Date Normalization: Returns all dates (Birth, Issue, Expiry) in a standardized YYYY-MM-DD format.

Sample JSON Output

StructOCR returns a normalized JSON object, regardless of the input image angle or quality.

{
  "success": true,
  "data": {
    "type": "passport",
    "country_code": "USA",
    "nationality": "UNITED STATES",
    "passport_number": "E12345678",
    "surname": "DOE",
    "given_names": "JOHN",
    "sex": "M",
    "date_of_birth": "1990-01-01",
    "place_of_birth": "NEW YORK, USA",
    "date_of_issue": "2020-01-01",
    "date_of_expiry": "2030-01-01",
    "place_of_issue": "PASSPORT AGENCY"
  }
}

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

General-purpose OCR services like AWS Textract and Google Vision return raw, unstructured lines of text that require significant post-processing. StructOCR is a specialized API, pre-trained on identity documents. It directly returns a structured JSON object with validated fields like `surname` and `date_of_birth`, eliminating the need for any parsing on your end.

Do you store the uploaded images?

No. We do not store any customer data. Images are processed in-memory and are permanently deleted immediately after the OCR process is complete. Your data never touches persistent storage on our servers.

How to handle blurry images?

Our API includes a sophisticated image pre-processing engine that automatically enhances images before extraction. This includes deblurring, denoising, and contrast correction, allowing us to achieve high accuracy even on suboptimal images from mobile phone cameras.

Precise Data Extraction and Seamless
Integration with AI-powered OCR API.

Empower your solutions with automated data extraction by
integrating best-in class StructOCR via API seamlessly.

Get Your 20 Free Credits Test it now in the Playground

No credit card required • Full API access included

The Definitive Python SDK for Passport Data Extraction

Why Passport OCR is Difficult

Enterprise-Grade Extraction with StructOCR

Production Use Cases

Live Demo: Passport scanner

Implementation: Python SDK

Technical Specs

Key Features

Sample JSON Output

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

Do you store the uploaded images?

How to handle blurry images?

More OCR Tutorials

Python Shipping Container OCR API

Python Driver License OCR SDK & API

Python HIN (Hull Identification Number) OCR API SDK

Python Invoice Line Item OCR API

Precise Data Extraction and Seamless
Integration with AI-powered OCR API.

Why Passport OCR is Difficult

Enterprise-Grade Extraction with StructOCR

Production Use Cases

Live Demo: Passport scanner

Implementation: Python SDK

Technical Specs

Key Features

Sample JSON Output

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

Do you store the uploaded images?

How to handle blurry images?

More OCR Tutorials

Python Shipping Container OCR API

Python Driver License OCR SDK & API

Python HIN (Hull Identification Number) OCR API SDK

Python Invoice Line Item OCR API

Precise Data Extraction and Seamless Integration with AI-powered OCR API.

Precise Data Extraction and Seamless
Integration with AI-powered OCR API.