The Definitive Python SDK for Passport Data Extraction
Achieve 99.8%+ accuracy and sub-second latency, converting passport images directly into structured JSON.

Why Passport OCR is Difficult
Standard open-source OCR engines such as Tesseract often fail on real-world passport scans. Image defects such as camera glare, shadows, and non-uniform lighting degrade character recognition. Geometric distortions, including skew and rotation, misalign text regions and break parsing logic. The Machine Readable Zone (MRZ), although standardized by ICAO Doc 9303, still requires precise parsing and check-digit validation, which is a non-trivial task. Manually validating MRZ check digits or maintaining brittle regex patterns for dozens of international passport variations creates significant engineering overhead that is both costly and error-prone.
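To make the check-digit step concrete, here is a minimal sketch of the ICAO 9303 check-digit rule: each character is mapped to a number (digits keep their value, A to Z map to 10 to 35, the filler `<` counts as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The function names are illustrative and are not part of any SDK.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value under ICAO 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Compute the check digit for an MRZ field using weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def mrz_field_is_valid(field: str, check_char: str) -> bool:
    """Return True when the printed check character matches the computed one."""
    return "0" <= check_char <= "9" and mrz_check_digit(field) == int(check_char)


# Values from the ICAO 9303 specimen passport
print(mrz_check_digit("L898902C3"))  # document number
print(mrz_check_digit("740812"))     # date of birth (YYMMDD)
print(mrz_field_is_valid("120415", "9"))  # date of expiry with its check digit
```

A single misread character (for example `0` read as `O`) usually changes the computed digit, which is why a mismatch is a useful signal for OCR errors or tampering.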
Enterprise-Grade Extraction with StructOCR
StructOCR replaces fragile, multi-step OCR pipelines with a single API call. Our solution is built on pre-trained deep learning models specifically optimized for identity documents, not generic text. The API includes automatic image pre-processing, handling deskewing, denoising, and glare correction before analysis. Unlike Tesseract, which returns unstructured text blocks requiring post-processing, StructOCR directly outputs a standardized JSON object with validated fields. This eliminates the need for manual parsing, checksum calculations, and format normalization.
Production Use Cases
- Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from passports in under 2 seconds.
- Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
- Global Compliance: Handle passports from 200+ jurisdictions without custom rules.
Implementation: Python SDK
The official Python SDK supports both MRZ and Visual Inspection Zone (VIZ) extraction. It automatically handles file encoding and parses fields like 'Place of Birth' that aren't available in the MRZ.
Prerequisite: pip install structocr

from structocr import StructOCR

# 💰 Save 30%+ vs competitors. Get 200 free requests instantly:
# 👉 https://structocr.com/register

# Initialize with your API Key
client = StructOCR("YOUR_API_KEY_HERE")


def scan_passport():
    # Note: Supports JPG, PNG, WebP (Max 4.5MB)
    image_path = "passport.jpg"
    try:
        print(f"Scanning {image_path}...")
        # The SDK handles file upload and API communication
        result = client.scan_passport(image_path)
        if result.get('success'):
            data = result['data']
            print("✅ Extraction Successful!")
            # Identity Data
            print(f"Passport #: {data.get('passport_number')}")
            print(f"Name: {data.get('given_names')} {data.get('surname')}")
            print(f"Nation: {data.get('nationality')} ({data.get('country_code')})")
            # Visual Zone Specifics (Not in MRZ)
            print(f"Birth Place: {data.get('place_of_birth')}")
            print(f"Issued At: {data.get('place_of_issue')}")
            # Dates
            print(f"DOB: {data.get('date_of_birth')} ({data.get('sex')})")
            print(f"Expiry: {data.get('date_of_expiry')}")
        else:
            print(f"❌ Extraction Failed: {result.get('error')}")
    except Exception as e:
        # Handle SDK or Network errors
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    scan_passport()

Technical Specs
- Latency: < 5s (Average)
- Uptime: 98.5% SLA
- Security: AES-256 Encryption & SOC2 Compliant
- Input: JPG, PNG, WebP (File Path)
- Max File Size: 4.5MB
- Output: JSON (Structured Data)
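Since the input limits are fixed (JPG, PNG, or WebP, at most 4.5MB), it can be worth rejecting unsuitable files locally before spending a request. The sketch below is one possible pre-flight check using only the standard library. The helper name `check_image` is hypothetical, treating `.jpeg` as equivalent to JPG is an assumption, and the size limit is interpreted as 4,500,000 bytes (the stricter reading of "4.5MB") because the page does not say whether MB means 10^6 or 2^20 bytes.

```python
from pathlib import Path

# Assumption: ".jpeg" is accepted wherever "JPG" is listed.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: stricter decimal reading of "4.5MB".
MAX_BYTES = 4_500_000


def check_image(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file not found"]

    problems = []
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"{path}: unsupported extension {p.suffix!r}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"{path}: {size} bytes exceeds the {MAX_BYTES}-byte limit")
    return problems


if __name__ == "__main__":
    for issue in check_image("passport.jpg"):
        print(issue)
```

The check only looks at the file extension and size, not the actual image content, so the API may still reject a file that passes it.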
Key Features
- Visual Extraction (VIZ): Parses non-MRZ data fields like Place of Birth and Issuing Authority.
- Global Support: Optimized for 195+ countries, handling complex backgrounds and holograms.
- Date Normalization: Returns all dates (Birth, Issue, Expiry) in a standardized YYYY-MM-DD format.
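Because dates arrive as YYYY-MM-DD strings, they can be parsed directly with the standard library. The following is a minimal sketch of an expiry check on the `data` dictionary, assuming the `date_of_expiry` field name used in the SDK example. The helper names are illustrative, and treating a passport as still valid on its expiry date is an assumption of this sketch.

```python
from datetime import date
from typing import Optional


def parse_iso_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None if it is missing or malformed."""
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def passport_is_expired(data: dict, today: Optional[date] = None) -> Optional[bool]:
    """Return True/False when the expiry date is readable, None when it is not."""
    expiry = parse_iso_date(data.get("date_of_expiry"))
    if expiry is None:
        return None
    # Assumption: the document is still valid on the expiry date itself.
    return expiry < (today or date.today())


if __name__ == "__main__":
    sample = {"date_of_birth": "1990-01-01", "date_of_expiry": "2030-01-01"}
    print(passport_is_expired(sample, today=date(2025, 6, 1)))
```

Returning `None` for an unreadable date keeps "unknown" distinct from "valid", so a caller can route such records to manual review rather than silently accepting them.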
Sample JSON Output
StructOCR returns a normalized JSON object, regardless of the input image angle or quality.
{
  "success": true,
  "data": {
    "type": "passport",
    "country_code": "USA",
    "nationality": "UNITED STATES",
    "passport_number": "E12345678",
    "surname": "DOE",
    "given_names": "JOHN",
    "sex": "M",
    "date_of_birth": "1990-01-01",
    "place_of_birth": "NEW YORK, USA",
    "date_of_issue": "2020-01-01",
    "date_of_expiry": "2030-01-01",
    "place_of_issue": "PASSPORT AGENCY"
  }
}

Frequently Asked Questions
How does StructOCR compare to AWS Textract or Google Vision?
General-purpose OCR services like AWS Textract and Google Vision return raw, unstructured lines of text that require significant post-processing. StructOCR is a specialized API, pre-trained on identity documents. It directly returns a structured JSON object with validated fields like `surname` and `date_of_birth`, eliminating the need for any parsing on your end.
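As an illustration of what a structured response allows, the sketch below maps a response dictionary onto a small typed record. It assumes the `success`, `error`, and `data` keys and the field names shown elsewhere on this page. The `PassportRecord` class and `to_record` helper are hypothetical and are not part of the SDK.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PassportRecord:
    """A subset of the response fields; extend as needed."""
    passport_number: Optional[str]
    surname: Optional[str]
    given_names: Optional[str]
    nationality: Optional[str]
    date_of_birth: Optional[str]
    date_of_expiry: Optional[str]


def to_record(response: dict) -> PassportRecord:
    """Build a PassportRecord from a response dict, raising on a failed extraction."""
    if not response.get("success"):
        raise ValueError(f"extraction failed: {response.get('error')}")
    data = response.get("data") or {}
    # Missing keys become None instead of raising KeyError.
    return PassportRecord(**{f.name: data.get(f.name) for f in fields(PassportRecord)})


if __name__ == "__main__":
    raw = '{"success": true, "data": {"surname": "DOE", "given_names": "JOHN", "date_of_birth": "1990-01-01"}}'
    record = to_record(json.loads(raw))
    print(record.surname, record.given_names, record.date_of_birth)
```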
Do you store the uploaded images?
No. We do not store any customer data. Images are processed in-memory and are permanently deleted immediately after the OCR process is complete. Your data never touches persistent storage on our servers.
How does StructOCR handle blurry images?
Our API includes a sophisticated image pre-processing engine that automatically enhances images before extraction. This includes deblurring, denoising, and contrast correction, allowing us to achieve high accuracy even on suboptimal images from mobile phone cameras.