Direct HTTP API for National ID Data Extraction in Java

Achieve 99.8%+ data accuracy with sub-1500ms latency via a single API call.

Steve Harrington, updated 2026-01-16
Figure 1: StructOCR converts raw National ID images into validated JSON data.

Why National ID OCR is Difficult

Generic OCR engines like Tesseract often fail on real-world ID documents because the inputs vary so widely. The core challenge is not just character recognition but contextual understanding. Inconsistent lighting causes glare and shadows, mobile captures introduce skew and rotation, and laminated surfaces distort text. Extracting structured data also means parsing complex layouts that differ by country and document version, which tends to produce brittle, high-maintenance regular expressions. Validating Machine-Readable Zone (MRZ) check digits is a further error-prone step that open-source OCR tools do not handle out of the box, which increases engineering overhead and reduces data reliability.
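
To make the MRZ step concrete, here is a minimal sketch of the check-digit rule defined in ICAO Doc 9303: each character is mapped to a number, multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The class and method names are illustrative only and are not part of StructOCR or any library.

```java
/** Sketch of the ICAO Doc 9303 check-digit rule for MRZ fields. Names are hypothetical. */
final class MrzCheckDigit {

    // Weights repeat in the order 7, 3, 1 across the field
    private static final int[] WEIGHTS = {7, 3, 1};

    /** Returns the check digit (0-9) for an MRZ field such as "740812". */
    static int compute(String field) {
        int sum = 0;
        for (int i = 0; i < field.length(); i++) {
            sum += charValue(field.charAt(i)) * WEIGHTS[i % 3];
        }
        return sum % 10;
    }

    /** Digits map to 0-9, letters A-Z to 10-35, and the filler '<' to 0. */
    private static int charValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new IllegalArgumentException("Invalid MRZ character: " + c);
    }

    /** True when the computed check digit equals the digit printed in the MRZ. */
    static boolean isValid(String field, char printedDigit) {
        return printedDigit >= '0' && printedDigit <= '9'
                && compute(field) == printedDigit - '0';
    }

    public static void main(String[] args) {
        // Field values taken from the ICAO 9303 specimen document
        System.out.println(compute("L898902C3"));   // 6
        System.out.println(isValid("740812", '2')); // true
    }
}
```

Even this small routine has to be applied per field and again for the composite check digit, which is where hand-rolled parsers tend to go wrong.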

Enterprise-Grade Extraction with StructOCR

StructOCR bypasses the limitations of generic OCR with specialized, pre-trained deep learning models. Our API first runs an automatic image pre-processing pipeline, which includes perspective correction (deskewing), glare removal, and denoising to normalize the input. The cleaned image is then passed to models trained specifically on millions of identity documents, enabling them to locate and extract specific fields like 'Date of Birth' or 'Document Number' with high precision. Unlike Tesseract, which returns an unstructured block of text, StructOCR returns a standardized, predictable JSON output for every request, with built-in validation, eliminating the need for client-side parsing and maintenance.

Production Use Cases

  • Digital Onboarding (KYC): Reduce drop-off rates by pre-filling user data from National IDs in < 2 seconds.
  • Fraud Prevention: Detect tampered fonts or mismatched MRZ checksums automatically.
  • Global Compliance: Handle National IDs from 200+ jurisdictions without custom rules.

Implementation: Java (Standard HttpClient)

The following code uses the native `java.net.http.HttpClient` (Java 11+). It handles the `x-api-key` authentication and sends the Base64-encoded image without requiring third-party libraries.

Prerequisite: JDK 11+

CODE EXAMPLE
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Base64;

public class NationalIdOcrExample {

    // Get an API key (200 free requests) at https://structocr.com/register

    private static final String API_KEY = "YOUR_API_KEY_HERE";
    private static final String API_ENDPOINT = "https://api.structocr.com/v1/national-id";

    public static void main(String[] args) {
        // Note: Supports JPG, PNG, WebP (Max 4.5MB)
        String imagePath = "id_card.jpg";

        try {
            // 1. Validate File
            Path path = Path.of(imagePath);
            if (!Files.exists(path)) {
                System.err.println("Error: File not found at " + path.toAbsolutePath());
                return;
            }

            // 2. Read and Encode Image
            byte[] imageBytes = Files.readAllBytes(path);
            String base64Image = Base64.getEncoder().encodeToString(imageBytes);

            // 3. Construct JSON Payload (Dependency-free)
            // For production, use Jackson or Gson.
            String jsonPayload = "{\"img\": \"" + base64Image + "\"}";

            // 4. Create HttpClient
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();

            // 5. Build Request
            // Important: 'x-api-key' is required in the header
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(API_ENDPOINT))
                    .header("Content-Type", "application/json")
                    .header("x-api-key", API_KEY)
                    .timeout(Duration.ofSeconds(30))
                    .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                    .build();

            System.out.println("Scanning ID card at " + API_ENDPOINT + "...");

            // 6. Send Request
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // 7. Output Result
            if (response.statusCode() == 200) {
                System.out.println("✅ Extraction Successful!");
                // Parse the JSON response here to access specific fields like 'personal_number' or 'address'
                System.out.println(response.body());
            } else {
                System.err.println("❌ API Error: " + response.statusCode());
                System.err.println(response.body());
            }

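        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        } catch (InterruptedException e) {
            // Restore the interrupt flag only when the thread was actually interrupted
            Thread.currentThread().interrupt();
            System.err.println("Request interrupted: " + e.getMessage());
        }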
    }
}

Technical Specs

  • Latency: < 5s (Average)
  • Uptime: 98.5% SLA
  • Security: AES-256 Encryption & SOC2 Compliant
  • Input: JPG, PNG, WebP (Base64 Encoded)
  • Max File Size: 4.5MB
  • Output: JSON (Structured Data)
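
The input limits above can be checked on the client before any bytes are uploaded. The following is a minimal sketch with hypothetical class and method names. It assumes that "4.5MB" means 4.5 × 1024 × 1024 bytes and that the limit applies to the raw file rather than the Base64 payload; the specs do not state either, so adjust if the API behaves differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Client-side pre-flight checks against the documented input limits. Names are hypothetical. */
final class ImagePreflight {

    // Assumption: 4.5MB is interpreted as 4.5 * 1024 * 1024 = 4,718,592 bytes
    static final long MAX_BYTES = (long) (4.5 * 1024 * 1024);

    // Extensions for the documented formats (JPG, PNG, WebP)
    static final Set<String> ALLOWED_EXTENSIONS = Set.of("jpg", "jpeg", "png", "webp");

    static boolean hasAllowedExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ALLOWED_EXTENSIONS.contains(ext);
    }

    static boolean isWithinSizeLimit(long sizeInBytes) {
        return sizeInBytes > 0 && sizeInBytes <= MAX_BYTES;
    }

    /** Returns an empty Optional if the file passes, otherwise a short rejection reason. */
    static Optional<String> check(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return Optional.of("not a regular file");
        }
        if (!hasAllowedExtension(path.getFileName().toString())) {
            return Optional.of("unsupported extension");
        }
        if (!isWithinSizeLimit(Files.size(path))) {
            return Optional.of("empty or larger than 4.5MB");
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Optional<String> problem = check(Path.of("id_card.jpg"));
        System.out.println(problem.map(p -> "Rejected: " + p).orElse("OK to upload"));
    }
}
```

Checking the extension is only a cheap first filter; it does not prove the file content matches the format.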

Key Features

  • Specialized Numbers: Extracts region-specific IDs like CNP (Romania), CPF (Brazil), and NIN (Nigeria).
  • Multi-line Addresses: Intelligently reconstructs full addresses from fragmented lines on ID cards.
  • Date Normalization: Returns all dates (Birth, Issue, Expiry) in a standardized YYYY-MM-DD format.
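
Because dates arrive as YYYY-MM-DD, they can be parsed directly with `java.time.LocalDate` and no custom format string. The sketch below uses hypothetical helper names and shows an expiry check and a simple ordering sanity check. Treating a card as still valid on its expiry date is an assumption; your compliance rules may differ.

```java
import java.time.LocalDate;

/** Helpers for the normalized YYYY-MM-DD dates in the response. Names are hypothetical. */
final class IdDates {

    /** Parses an ISO-8601 date; throws DateTimeParseException on malformed input. */
    static LocalDate parse(String value) {
        return LocalDate.parse(value);
    }

    /** Assumption: the card is still valid on the expiry date itself. */
    static boolean isExpired(String dateOfExpiry, LocalDate today) {
        return parse(dateOfExpiry).isBefore(today);
    }

    /** Sanity check: birth precedes issue, and issue precedes expiry. */
    static boolean isPlausible(String dateOfBirth, String dateOfIssue, String dateOfExpiry) {
        LocalDate birth = parse(dateOfBirth);
        LocalDate issue = parse(dateOfIssue);
        LocalDate expiry = parse(dateOfExpiry);
        return birth.isBefore(issue) && issue.isBefore(expiry);
    }

    public static void main(String[] args) {
        System.out.println(isExpired("2030-05-10", LocalDate.now()));
        System.out.println(isPlausible("1992-03-19", "2020-05-10", "2030-05-10"));
    }
}
```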

Sample JSON Output

StructOCR returns a normalized JSON object, regardless of the input image angle or quality.

{
  "success": true,
  "data": {
    "type": "national_id",
    "country_code": "ROU",
    "nationality": "ROMANA",
    "document_number": "123456",
    "card_series": "KS",
    "personal_number": "1920319123456",
    "surname": "POPESCU",
    "given_names": "ANDREI",
    "sex": "M",
    "date_of_birth": "1992-03-19",
    "place_of_birth": "Jud. CS Mun. Reșița",
    "address": "Jud. CS Orș. Bocșa Str. Nucilor Nr. 15",
    "date_of_issue": "2020-05-10",
    "date_of_expiry": "2030-05-10",
    "issuing_authority": "SPCLEP Bocșa"
  }
}
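
To read individual fields from a response shaped like the sample above while staying dependency-free, a regular expression can pull out simple string values. This is a rough stopgap with hypothetical helper names, not a JSON parser: it assumes each key appears once, it does not decode escape sequences, and it ignores nesting. For production, a library such as Jackson or Gson is the safer choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Stopgap field extraction for flat string values. Names are hypothetical. */
final class FlatJsonFields {

    /** Returns the raw (still escaped) string value of the first occurrence of key. */
    static Optional<String> stringField(String json, String key) {
        // Matches "key" : "value", allowing backslash-escaped characters inside the value
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** True when the body contains "success": true. */
    static boolean isSuccess(String json) {
        return Pattern.compile("\"success\"\\s*:\\s*true").matcher(json).find();
    }

    public static void main(String[] args) {
        String body = "{\"success\": true, \"data\": {\"surname\": \"POPESCU\", "
                + "\"date_of_birth\": \"1992-03-19\"}}";
        if (isSuccess(body)) {
            System.out.println(stringField(body, "surname").orElse("(missing)"));
            System.out.println(stringField(body, "date_of_birth").orElse("(missing)"));
        }
    }
}
```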

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

General-purpose APIs like Textract or Vision return raw, unstructured lines of text and coordinates, leaving the complex task of parsing and validation to your engineers. StructOCR is a specialized API: it returns a structured JSON object with clearly defined fields such as 'surname' and 'date_of_birth', which are already validated for correctness (e.g., MRZ checksums). This eliminates post-processing and reduces development time.

Do you store the uploaded images?

No. All image data is processed in-memory and is purged immediately after the API request is completed. We do not persist any customer-provided images on our servers.

How does StructOCR handle blurry images?

Our API includes an internal image enhancement engine that automatically attempts to deblur and sharpen images before processing. For best results, we recommend a minimum resolution of 300 DPI, but the system is robust against common mobile camera capture issues.
