Direct HTTP API for National ID Data Extraction in Java

Achieve 99.8%+ data accuracy and sub-1500ms latency via a Hybrid Vision AI & MRZ Validation engine.

Figure 1: StructOCR converts raw National ID images into validated JSON data.

Why National ID OCR is Difficult

Generic OCR engines like Tesseract often fail on real-world ID documents because capture conditions and layouts vary so widely. The core challenge is not just character recognition but contextual understanding. Inconsistent lighting causes glare and shadows, mobile captures introduce skew and rotation, and laminated surfaces distort text. Extracting structured data also requires parsing layouts that differ by country and document version, which leads to brittle, high-maintenance regex patterns. Parsing and validating Machine-Readable Zone (MRZ) check digits by hand is an additional, error-prone step that open-source OCR tools do not handle out of the box, increasing engineering overhead and reducing data reliability.
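To make the check-digit step concrete, here is a minimal sketch of the ICAO 9303 check-digit calculation (weights 7, 3, 1 repeating, sum modulo 10). The class and method names are hypothetical and chosen for illustration. This shows what a client would otherwise have to maintain itself and is not StructOCR's internal implementation.

```java
// Sketch of the ICAO 9303 check-digit calculation used for MRZ fields.
// MrzCheckDigit is a hypothetical helper name, not part of any StructOCR SDK.
final class MrzCheckDigit {

    private static final int[] WEIGHTS = {7, 3, 1};

    /** Maps one MRZ character to its value: 0-9 as-is, A-Z to 10-35, '<' (filler) to 0. */
    static int charValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new IllegalArgumentException("Invalid MRZ character: " + c);
    }

    /** Weighted sum of the field's characters (weights 7, 3, 1 repeating), modulo 10. */
    static int compute(String field) {
        int sum = 0;
        for (int i = 0; i < field.length(); i++) {
            sum += charValue(field.charAt(i)) * WEIGHTS[i % WEIGHTS.length];
        }
        return sum % 10;
    }

    /** Returns true when the printed check digit matches the computed one. */
    static boolean isValid(String field, char checkDigit) {
        return checkDigit >= '0' && checkDigit <= '9'
                && compute(field) == checkDigit - '0';
    }

    public static void main(String[] args) {
        // Values from the ICAO 9303 specimen document
        System.out.println(compute("740812"));          // prints 2
        System.out.println(isValid("L898902C3", '6'));  // prints true
    }
}
```

Even this small routine has to be applied per field at the correct offsets for each MRZ layout, which is where hand-written parsers tend to go wrong.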

Enterprise-Grade Extraction with StructOCR

StructOCR avoids the limitations of generic OCR through a Hybrid Vision AI & MRZ Validation architecture. Our API first runs an automatic image pre-processing pipeline, which includes perspective correction (deskewing), glare removal, and denoising to normalize the input. The cleaned image is then passed to models trained specifically on millions of identity documents, enabling them to locate and extract specific fields like 'Date of Birth' or 'Document Number' with high precision. This allows even structured values, such as a 14-digit number printed on an ID, to be captured accurately. Unlike Tesseract, which returns an unstructured block of text, StructOCR delivers a standardized, predictable JSON output with built-in validation for every request, eliminating the need for client-side parsing and maintenance.

Production Use Cases

  • Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from National IDs in < 2 seconds.
  • Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
  • Global Compliance: Handle National IDs from 200+ jurisdictions without custom rules.

Implementation: Java (Standard HttpClient)

The following code uses the native `java.net.http.HttpClient` (Java 11+). It handles the `x-api-key` authentication and sends the Base64-encoded image without requiring third-party libraries.

Prerequisite: JDK 11+

CODE EXAMPLE
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Base64;

public class NationalIdOcrExample {

    // Replace the placeholder below with your own API key.
    // Register at https://structocr.com/register to obtain one.

    private static final String API_KEY = "YOUR_API_KEY_HERE";
    private static final String API_ENDPOINT = "https://api.structocr.com/v1/national-id";

    public static void main(String[] args) {
        // Note: Supports JPG, PNG, WebP (Max 4.5MB)
        String imagePath = "id_card.jpg";

        try {
            // 1. Validate File
            Path path = Path.of(imagePath);
            if (!Files.exists(path)) {
                System.err.println("Error: File not found at " + path.toAbsolutePath());
                return;
            }

            // 2. Read and Encode Image
            byte[] imageBytes = Files.readAllBytes(path);
            String base64Image = Base64.getEncoder().encodeToString(imageBytes);

            // 3. Construct JSON Payload (Dependency-free)
            // For production, use Jackson or Gson.
            String jsonPayload = "{\"img\": \"" + base64Image + "\"}";

            // 4. Create HttpClient
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();

            // 5. Build Request
            // Important: 'x-api-key' is required in the header
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(API_ENDPOINT))
                    .header("Content-Type", "application/json")
                    .header("x-api-key", API_KEY)
                    .timeout(Duration.ofSeconds(30))
                    .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                    .build();

            System.out.println("Scanning ID card at " + API_ENDPOINT + "...");

            // 6. Send Request
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // 7. Output Result
            if (response.statusCode() == 200) {
                System.out.println("✅ Extraction Successful!");
                // Parse the JSON response here to access specific fields (e.g., 'personal_number')
                // Note: Raw MRZ data (if present) is located inside the 'additional_fields' object.
                System.out.println(response.body());
            } else {
                System.err.println("❌ API Error: " + response.statusCode());
                System.err.println(response.body());
            }

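        } catch (IOException e) {
            // File read or network failure
            System.err.println("Request failed: " + e.getMessage());
        } catch (InterruptedException e) {
            // Restore the interrupt flag only when the thread was actually interrupted
            Thread.currentThread().interrupt();
            System.err.println("Request interrupted: " + e.getMessage());
        }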
    }
}

Technical Specs

  • Latency: < 5s (Average)
  • Uptime: 98.5% SLA
  • Security: AES-256 Encryption & SOC2 Compliant
  • Input: JPG, PNG, WebP (Base64 Encoded)
  • Max File Size: 4.5MB
  • Output: JSON (Structured Data)
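The input limits above can be checked on the client before any bytes are uploaded. The following is a minimal sketch. The class name is hypothetical, the `.jpeg` extension is assumed to be accepted alongside `.jpg`, and "4.5MB" is assumed to mean 4.5 × 1024 × 1024 bytes; confirm the exact limit against the API documentation.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical client-side pre-check for the documented input limits.
final class ImageInputCheck {

    // Assumption: "4.5MB" is interpreted as 4.5 * 1024 * 1024 bytes.
    static final long MAX_BYTES = (long) (4.5 * 1024 * 1024);

    // Assumption: ".jpeg" is treated the same as ".jpg".
    static final Set<String> ALLOWED = Set.of("jpg", "jpeg", "png", "webp");

    /** Returns an error message, or an empty Optional when the file looks acceptable. */
    static Optional<String> validate(String fileName, long sizeBytes) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED.contains(ext)) {
            return Optional.of("Unsupported format: '" + ext + "'");
        }
        if (sizeBytes > MAX_BYTES) {
            return Optional.of("File too large: " + sizeBytes + " bytes");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate("id_card.jpg", 120_000).isEmpty());   // prints true
        System.out.println(validate("scan.pdf", 120_000).isPresent());    // prints true
        System.out.println(validate("big.png", 5_000_000).isPresent());   // prints true
    }
}
```

In the example program, the size could be obtained with `Files.size(path)` before the image is read and encoded.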

Key Features

  • Hybrid VIZ + MRZ AI: Cross-validates data read from the visual zone against MRZ check digits (TD1/TD2) to catch misread characters.
  • Specialized Numbers: Extracts region-specific IDs like CNP (Romania), CPF (Brazil), and NIN (Nigeria).
  • Multi-line Addresses: Intelligently reconstructs full addresses from fragmented lines on ID cards.

Sample JSON Output

StructOCR returns a normalized JSON object containing both Visual Zone (VIZ) extraction and raw Machine-Readable Zone (MRZ) lines.

{
  "success": true,
  "data": {
    "type": "national_id",
    "country_code": "ROU",
    "nationality": "ROMANA",
    "document_number": "123456",
    "card_series": "KS",
    "personal_number": "1920319123456",
    "surname": "POPESCU",
    "given_names": "ANDREI",
    "sex": "M",
    "date_of_birth": "1992-03-19",
    "place_of_birth": "Jud. CS Mun. Reșița",
    "address": "Jud. CS Orș. Bocșa Str. Nucilor Nr. 15",
    "date_of_issue": "2020-05-10",
    "date_of_expiry": "2030-05-10",
    "issuing_authority": "SPCLEP Bocșa",
    "additional_fields": {
      "phone_number": null,
      "tramite_number": null,
      "ejemplar": null,
      "mrz_line_1": "IDROU123456<0<<<<<<<<<<<<<<<<",
      "mrz_line_2": "9203195M3005108ROU19203191234562",
      "mrz_line_3": null
    }
  }
}
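For quick experiments that stay dependency-free, like the example program, a single string field can be pulled out of a response of this shape with a small regular expression. This is only a sketch with a hypothetical helper name: it returns the first match at any nesting depth, does not unescape JSON escape sequences, and treats `null` values as absent. For production code, use a real JSON parser such as Jackson or Gson.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads one string-valued field without a JSON library.
final class JsonFieldSketch {

    /** Returns the raw (still escaped) value of the first "field": "value" pair found. */
    static Optional<String> stringField(String json, String field) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(field) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened version of the sample response
        String body = "{\"success\": true, \"data\": {\"surname\": \"POPESCU\", "
                + "\"personal_number\": \"1920319123456\", "
                + "\"additional_fields\": {\"phone_number\": null}}}";

        System.out.println(stringField(body, "surname").orElse("(missing)"));
        // prints POPESCU
        System.out.println(stringField(body, "phone_number").orElse("(missing)"));
        // prints (missing), because the value is null rather than a string
    }
}
```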

Frequently Asked Questions

Do you support Machine Readable Zones (MRZ) on ID cards?

Yes. Our engine natively supports the ICAO 9303 MRZ formats (TD1/TD2) found on many ID cards worldwide. Our hybrid architecture extracts the raw MRZ lines and cross-validates them against the Visual Zone (VIZ) for maximum accuracy.
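As an illustration of what VIZ-to-MRZ cross-validation means, the sketch below compares the ISO dates from the visual zone with the six-digit YYMMDD dates embedded in the MRZ. It is a client-side sanity check with hypothetical names, not a description of StructOCR's internal logic. The character offsets are taken from the sample `mrz_line_2` shown in the sample output above and will differ for other MRZ layouts.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical client-side check: do the VIZ dates agree with the MRZ dates?
final class MrzVizCrossCheck {

    // MRZ dates are encoded as YYMMDD
    private static final DateTimeFormatter MRZ_DATE = DateTimeFormatter.ofPattern("yyMMdd");

    /** True when an ISO date (e.g. 1992-03-19) encodes to the given MRZ date (e.g. 920319). */
    static boolean dateMatches(String vizIsoDate, String mrzDate) {
        return LocalDate.parse(vizIsoDate).format(MRZ_DATE).equals(mrzDate);
    }

    public static void main(String[] args) {
        String mrzLine2 = "9203195M3005108ROU19203191234562";

        // Assumed offsets for this sample line: birth date at 0-5, expiry date at 8-13
        String mrzBirth = mrzLine2.substring(0, 6);
        String mrzExpiry = mrzLine2.substring(8, 14);

        System.out.println(dateMatches("1992-03-19", mrzBirth));   // prints true
        System.out.println(dateMatches("2030-05-10", mrzExpiry));  // prints true
    }
}
```

A mismatch between the two zones is a useful signal to route a document to manual review.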

How does StructOCR compare to AWS Textract or Google Vision?

General-purpose APIs like Textract or Vision return raw, unstructured lines of text and coordinates, leaving the complex task of parsing and validation to your engineers. StructOCR is a specialized API: it returns a structured JSON object with clearly defined fields such as 'surname' and 'date_of_birth', which have already been validated (e.g., against MRZ check digits). This eliminates post-processing and reduces development time.

Do you store the uploaded images?

No. All image data is processed in-memory and is purged immediately after the API request is completed. We do not persist any customer-provided images on our servers.

How does the API handle blurry images?

Our API includes an internal image enhancement engine that automatically attempts to deblur and sharpen images before processing. For best results, we recommend a minimum resolution of 300 DPI, but the system is robust against common mobile camera capture issues.
