The Definitive Python SDK for Passport Data Extraction
Achieve 99.8%+ accuracy and sub-second latency, converting passport images directly into structured JSON.

Why Passport OCR is Difficult
Standard open-source OCR engines like Tesseract fail on real-world passport scans due to inherent complexities. Image defects such as camera glare, shadows, and non-uniform lighting degrade character recognition. Geometric distortions, including skew and rotation, misalign text regions, breaking parsing logic. The Machine Readable Zone (MRZ) itself, while standardized by ICAO 9303, requires precise parsing and checksum validation—a non-trivial task. Manually validating MRZ check digits or maintaining brittle RegEx patterns for dozens of international passport variations creates a significant engineering overhead that is both costly and error-prone.
Enterprise-Grade Extraction with StructOCR
StructOCR simplifies fragile, multi-step OCR pipelines into a single API call, ideal for processing identity documents. Our solution leverages pre-trained deep learning models optimized for specific document types, unlike generic OCR engines. The passport mrz ocr api automatically handles image pre-processing, including deskewing, denoising, and glare correction. Unlike Tesseract, which outputs unstructured text, StructOCR provides a standardized JSON with validated fields, eliminating manual parsing and format normalization. This robust solution is designed to support operations across international borders.
Production Use Cases
- Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from Passports in < 2 seconds.
- Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
- Global Compliance: Handle Passports from 200+ jurisdictions without custom rules.
Live Demo: Passport scanner
No registration required. Upload a file to test the extraction.
Drop files here or click to browse
JPG · PNG · WebP · up to 500 files · max 4.5 MB each
Ready to use this in production? Get 20 free API calls — no credit card needed.
Get 20 Free API Calls →Implementation: Python SDK
The official Python SDK supports both MRZ and Visual Inspection Zone (VIZ) extraction. It automatically handles file encoding and parses fields like 'Place of Birth' that aren't available in the MRZ.
Prerequisite: pip install structocr
from structocr import StructOCR
# 💰 Save 30%+ vs competitors. Get 20 free credits instantly:
# 👉 https://structocr.com/register
# Initialize with your API Key
client = StructOCR("YOUR_API_KEY_HERE")
def scan_passport():
# Note: Supports JPG, PNG, WebP (Max 4.5MB)
image_path = "passport.jpg"
try:
print(f"Scanning {image_path}...")
# The SDK handles file upload and API communication
result = client.scan_passport(image_path)
if result.get('success'):
data = result['data']
print("✅ Extraction Successful!")
# Identity Data
print(f"Passport #: {data.get('passport_number')}")
print(f"Name: {data.get('given_names')} {data.get('surname')}")
print(f"Nation: {data.get('nationality')} ({data.get('country_code')})")
# Visual Zone Specifics (Not in MRZ)
print(f"Birth Place:{data.get('place_of_birth')}")
print(f"Issued At: {data.get('place_of_issue')}")
# Dates
print(f"DOB: {data.get('date_of_birth')} ({data.get('sex')})")
print(f"Expiry: {data.get('date_of_expiry')}")
else:
print(f"❌ Extraction Failed: {result.get('error')}")
except Exception as e:
# Handle SDK or Network errors
print(f"An error occurred: {e}")
if __name__ == "__main__":
scan_passport()Technical Specs
- •Latency: < 5s (Average)
- •Uptime: 98.5% SLA
- •Security: AES-256 Encryption & SOC2 Compliant
- •Input: JPG, PNG, WebP (File Path)
- •Max File Size: 4.5MB
- •Output: JSON (Structured Data)
Key Features
- •Visual Extraction (VIZ): Parses non-MRZ data fields like Place of Birth and Issuing Authority.
- •Global Support: Optimized for 195+ countries, handling complex backgrounds and holograms.
- •Date Normalization: Returns all dates (Birth, Issue, Expiry) in a standardized YYYY-MM-DD format.
Sample JSON Output
StructOCR returns a normalized JSON object, regardless of the input image angle or quality.
{
"success": true,
"data": {
"type": "passport",
"country_code": "USA",
"nationality": "UNITED STATES",
"passport_number": "E12345678",
"surname": "DOE",
"given_names": "JOHN",
"sex": "M",
"date_of_birth": "1990-01-01",
"place_of_birth": "NEW YORK, USA",
"date_of_issue": "2020-01-01",
"date_of_expiry": "2030-01-01",
"place_of_issue": "PASSPORT AGENCY"
}
}Frequently Asked Questions
How does StructOCR compare to AWS Textract or Google Vision?
General-purpose OCR services like AWS Textract and Google Vision return raw, unstructured lines of text that require significant post-processing. StructOCR is a specialized API, pre-trained on identity documents. It directly returns a structured JSON object with validated fields like `surname` and `date_of_birth`, eliminating the need for any parsing on your end.
Do you store the uploaded images?
No. We do not store any customer data. Images are processed in-memory and are permanently deleted immediately after the OCR process is complete. Your data never touches persistent storage on our servers.
How to handle blurry images?
Our API includes a sophisticated image pre-processing engine that automatically enhances images before extraction. This includes deblurring, denoising, and contrast correction, allowing us to achieve high accuracy even on suboptimal images from mobile phone cameras.
More OCR Tutorials
Python Shipping Container OCR API
Tutorial: Learn how to use the StructOCR Python SDK for shipping container OCR. Extract ISO 6346 container numbers with 99% accuracy. Includes code samples and JSON schemas.
Python Driver License OCR SDK & API
Stop struggling with manual data entry. Integrate our driver license OCR SDK (Python wrapper) to extract structured JSON in <1s. Secure, accurate, and developer-friendly.
Python HIN (Hull Identification Number) OCR API SDK
Tutorial: Extract Hull Identification Numbers (HIN) using the StructOCR Python SDK. Perfect for marine data pipelines, ETL workflows, and automated watercraft valuations.
Python Invoice Line Item OCR API
Struggling with invoice line item extraction? Our Python OCR API delivers structured JSON in under 5s, ensuring 98.5% uptime and SOC2 compliance. Secure your data.
Precise Data Extraction and Seamless
Integration with AI-powered OCR API.
Empower your solutions with automated data extraction by
integrating best-in class StructOCR via API seamlessly.
No credit card required • Full API access included