Python SDK for Enterprise-Grade National ID OCR

Achieve 99.8% field-level accuracy and sub-second latency for identity verification and KYC automation.

Steve Harrington · Updated 2026-01-16
Diagram showing the StructOCR process: A user uploads a photo of a National ID card, which is then processed by the StructOCR API. The API performs image preprocessing, data extraction, and validation, outputting a structured JSON object with key-value pairs like name, DOB, and document number.
Figure 1: StructOCR converts raw National ID images into validated JSON data.

Why National ID OCR is Difficult

Generic OCR engines like Tesseract often fail on National IDs for three reasons. First, laminate glare, shadows, and non-uniform lighting create artifacts that corrupt character recognition. Second, user-submitted images are frequently skewed or rotated, so they need robust preprocessing before recognition. Third, parsing the extracted text is brittle: it means maintaining complex regex patterns for dozens of ID layouts that change over time, and hand-written logic to parse the Machine-Readable Zone (MRZ) and validate its check digits is error-prone. Together, these problems lead to high error rates and unsustainable maintenance costs for in-house solutions.
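
To make the MRZ check-digit step concrete, here is a minimal sketch of the calculation defined in ICAO Doc 9303: weights 7, 3, 1 repeat across the field, digits keep their value, A-Z map to 10-35, the filler `<` counts as 0, and the check digit is the weighted sum modulo 10. The function names are illustrative and are not part of the StructOCR SDK. The sample values are taken from the ICAO specimen document.

```python
def mrz_char_value(ch: str) -> int:
    """Map a single MRZ character to its numeric value (ICAO Doc 9303)."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":  # filler character
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Compute the check digit for one MRZ field using weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def mrz_field_is_valid(field: str, check_digit: str) -> bool:
    """Return True when the printed check digit matches the computed one."""
    if len(check_digit) != 1 or not ("0" <= check_digit <= "9"):
        return False
    return mrz_check_digit(field) == int(check_digit)


if __name__ == "__main__":
    print(mrz_check_digit("L898902C3"))      # document number -> 6
    print(mrz_check_digit("740812"))         # date of birth   -> 2
    print(mrz_field_is_valid("120415", "9"))  # date of expiry  -> True
```

A complete MRZ validator would also need the per-layout field positions (TD1, TD2, TD3) and the composite check digit, which is where most of the maintenance burden comes from.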

Enterprise-Grade Extraction with StructOCR

StructOCR addresses these challenges with specialized, pre-trained deep learning models. The API handles the entire pipeline, from automatic image preprocessing (perspective correction, glare removal, and denoising) through data extraction. Unlike Tesseract, which returns unstructured text lines, StructOCR's models are trained specifically on identity documents to locate and identify semantic fields. The result is a standardized, validated JSON object that your application can consume directly, without separate post-processing or manual data validation.

Production Use Cases

  • Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from National IDs in < 2 seconds.
  • Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
  • Global Compliance: Handle National IDs from 200+ jurisdictions without custom rules.

Implementation: Python SDK

The official Python SDK abstracts the API complexity. It automatically parses region-specific fields like CNP (Romania), CPF (Brazil), or NIN (Nigeria) into a standardized structure.

Prerequisite: pip install structocr

CODE EXAMPLE
from structocr import StructOCR

# Initialize with your API Key
client = StructOCR("YOUR_API_KEY_HERE")

def scan_national_id():
    # Note: Supports JPG, PNG, WebP (Max 4.5MB)
    image_path = "id_card.jpg"

    try:
        print(f"Scanning {image_path}...")
        
        # The SDK handles file upload and API communication
        result = client.scan_national_id(image_path)

        # Check success flag (SDK returns a dict matching the JSON response)
        if result.get('success'):
            data = result['data']
            print("✅ Extraction Successful!")
            
            # Basic Identity
            print(f"Region:     {data.get('country_code')} (Series: {data.get('card_series')})")
            print(f"Name:       {data.get('given_names')} {data.get('surname')}")
            print(f"ID Number:  {data.get('document_number')}")
            
            # Critical Field: Personal Identity Number (CNP/CPF/NIN)
            print(f"Personal #: {data.get('personal_number')}")
            
            # Demographics
            print(f"DOB:        {data.get('date_of_birth')} ({data.get('sex')})")
            print(f"Address:    {data.get('address')}")
        else:
            print(f"❌ Extraction Failed: {result.get('error')}")

    except Exception as e:
        # Handle SDK or Network errors
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    scan_national_id()

Technical Specs

  • Latency: < 5s (Average)
  • Uptime: 98.5% SLA
  • Security: AES-256 Encryption & SOC2 Compliant
  • Input: JPG, PNG, WebP (File Path)
  • Max File Size: 4.5MB
  • Output: JSON (Structured Data)
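
Because the input limits above are enforced server-side, it can be cheaper to reject obviously unsuitable files before uploading. The following is a rough sketch of such a pre-flight check, not an SDK feature: `check_image` is a hypothetical helper, "4.5MB" is assumed to mean 4.5 MiB, and the check looks only at the file extension, not at the actual image content.

```python
from pathlib import Path
from typing import List

# Assumptions: formats and size limit copied from the specs above;
# "4.5MB" is interpreted here as 4.5 * 1024 * 1024 bytes.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = int(4.5 * 1024 * 1024)


def check_image(path: str) -> List[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file not found"]

    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"{path}: unsupported extension {p.suffix!r}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"{path}: {size} bytes exceeds the {MAX_BYTES}-byte limit")
    return problems


if __name__ == "__main__":
    for problem in check_image("id_card.jpg"):
        print(problem)
```

Calling `check_image(image_path)` before `client.scan_national_id(image_path)` avoids spending a request on a file that would likely be rejected.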

Key Features

  • Specialized Numbers: Extracts region-specific IDs like CNP (Romania), CPF (Brazil), and NIN (Nigeria).
  • Multi-line Addresses: Intelligently reconstructs full addresses from fragmented lines on ID cards.
  • Date Normalization: Returns all dates (Birth, Issue, Expiry) in a standardized YYYY-MM-DD format.
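
To illustrate what date normalization means in practice, here is a small sketch that converts a few common day-first printed formats to YYYY-MM-DD. It is not StructOCR's implementation, which is not described here. `normalize_date` and the format list are assumptions, and day-first ordering is assumed for every non-ISO input.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats: ISO plus day-first variants with ".", "/" or "-".
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%d-%m-%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # wrong format or impossible date; try the next one
    return None


if __name__ == "__main__":
    print(normalize_date("19.03.1992"))  # 1992-03-19
    print(normalize_date("10/05/2030"))  # 2030-05-10
    print(normalize_date("31.02.2020"))  # None (no such day)
```

Returning None instead of guessing keeps ambiguous or impossible dates visible to the caller.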

Sample JSON Output

StructOCR returns a normalized JSON object, regardless of the input image angle or quality.

{
  "success": true,
  "data": {
    "type": "national_id",
    "country_code": "ROU",
    "nationality": "ROMANA",
    "document_number": "123456",
    "card_series": "KS",
    "personal_number": "1920319123456",
    "surname": "POPESCU",
    "given_names": "ANDREI",
    "sex": "M",
    "date_of_birth": "1992-03-19",
    "place_of_birth": "Jud. CS Mun. Reșița",
    "address": "Jud. CS Orș. Bocșa Str. Nucilor Nr. 15",
    "date_of_issue": "2020-05-10",
    "date_of_expiry": "2030-05-10",
    "issuing_authority": "SPCLEP Bocșa"
  }
}
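
Because dates arrive as YYYY-MM-DD strings, downstream code can load them with the standard library and run its own sanity checks. The sketch below parses a trimmed copy of the sample response and cross-checks two things: that birth, issue, and expiry dates are in order, and that the birth date embedded in the Romanian CNP (first digit encodes sex and century, the next six digits are YYMMDD) agrees with `date_of_birth`. `cnp_birth_date` and `consistency_issues` are hypothetical helpers, not SDK functions; only CNP prefixes 1-6 are handled, and all three date fields are assumed to be present.

```python
import json
from datetime import date
from typing import List, Optional

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "success": true,
  "data": {
    "country_code": "ROU",
    "personal_number": "1920319123456",
    "date_of_birth": "1992-03-19",
    "date_of_issue": "2020-05-10",
    "date_of_expiry": "2030-05-10"
  }
}
"""

# First CNP digit -> century base (odd = male, even = female).
CNP_CENTURY = {"1": 1900, "2": 1900, "3": 1800, "4": 1800, "5": 2000, "6": 2000}


def cnp_birth_date(cnp: str) -> Optional[date]:
    """Decode the birth date embedded in a 13-digit CNP, or None if unsupported."""
    base = CNP_CENTURY.get(cnp[:1])
    if base is None or len(cnp) != 13 or not (cnp.isascii() and cnp.isdigit()):
        return None
    try:
        return date(base + int(cnp[1:3]), int(cnp[3:5]), int(cnp[5:7]))
    except ValueError:  # impossible month or day
        return None


def consistency_issues(data: dict) -> List[str]:
    """Return human-readable inconsistencies found in one extracted record."""
    issues = []
    dob = date.fromisoformat(data["date_of_birth"])
    issued = date.fromisoformat(data["date_of_issue"])
    expires = date.fromisoformat(data["date_of_expiry"])
    if not dob < issued < expires:
        issues.append("dates are not in birth < issue < expiry order")
    if data.get("country_code") == "ROU":
        embedded = cnp_birth_date(data.get("personal_number") or "")
        if embedded is not None and embedded != dob:
            issues.append("personal_number birth date does not match date_of_birth")
    return issues


if __name__ == "__main__":
    response = json.loads(SAMPLE_RESPONSE)
    if response.get("success"):
        print(consistency_issues(response["data"]))  # [] for the sample above
```

Checks like these complement the API's own validation; they do not replace it, and the CNP checksum digit is deliberately left out of this sketch.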

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

General-purpose OCR services like AWS Textract and Google Vision typically return raw text or generic key-value pairs, so you remain responsible for parsing, validating, and structuring the data. StructOCR is trained exclusively on identity documents and returns a fully parsed, validated JSON object with predefined fields like `date_of_birth` and `document_number`, so no separate parsing layer is needed.

Do you store the uploaded images?

We do not store customer images. Uploaded files are processed in memory and permanently deleted as soon as OCR extraction completes.

How does StructOCR handle blurry images?

Our API includes a powerful, automatic image enhancement engine. Before extraction, it performs denoising, deblurring, and contrast correction to maximize the accuracy of results from low-quality or blurry source images.
