The C# Invoice OCR API for High-Accuracy Data Extraction

Achieve 99.8%+ accuracy on unstructured invoices and get structured JSON data in under 1500ms.

Steve Harrington | Updated 2026-01-19
A diagram showing a scanned invoice image being sent to the StructOCR API and returning a structured JSON object with extracted fields like invoice number, line items, and total amount.
Figure 1: StructOCR converts raw invoice images and PDFs into validated JSON data.

Why Invoice OCR Fails with Generic Tools

Generic OCR tools like Tesseract struggle with invoices because layouts vary widely from vendor to vendor. Parsing breaks down on multi-page documents, tables with hidden or missing ruling lines, skewed scans, and low-resolution photos. Maintaining template-specific regex patterns for each vendor quickly becomes unsustainable. Developers end up building and maintaining pre-processing pipelines for denoising and deskewing, only to reach mediocre accuracy that still requires manual review, which defeats the purpose of automation.

Structured Data Extraction with StructOCR

StructOCR uses a suite of pre-trained deep learning models purpose-built for financial documents. The API handles image pre-processing automatically, including deskewing, glare removal, and noise reduction. Whereas Tesseract returns unstructured text with coordinates, StructOCR provides standardized JSON output with logically grouped entities such as line items, vendor details, and tax summaries. This removes the need to write layout-specific parsing logic, so you can integrate directly with your accounts payable (AP) system.

Production Use Cases

  • Accounts Payable Automation: Eliminate manual data entry. Ingest vendor invoices from any format (PDF, JPG) and automatically populate your ERP system.
  • Expense Management Automation: Streamline expense reporting by allowing employees to simply photograph receipts and invoices, with all data extracted instantly.
  • Three-Way Matching: Automatically verify invoice data against purchase orders and goods receipt notes to detect discrepancies and prevent fraud.
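
A minimal sketch of the three-way matching idea, assuming the invoice lines have already been mapped to the same item keys (for example, SKUs) that the purchase order and goods receipt note use. The `ThreeWayMatch` class, the dictionary shapes, and the exact-match price rule are illustrative assumptions and not part of the StructOCR API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: compares invoice lines against a purchase order (PO)
// and a goods receipt note (GRN). All shapes here are assumptions.
public static class ThreeWayMatch
{
    public static List<string> FindMismatches(
        IReadOnlyDictionary<string, (decimal Quantity, decimal UnitPrice)> purchaseOrder,
        IReadOnlyDictionary<string, decimal> receivedQuantities,
        IReadOnlyDictionary<string, (decimal Quantity, decimal UnitPrice)> invoice)
    {
        var mismatches = new List<string>();

        foreach (var entry in invoice)
        {
            string item = entry.Key;
            var billed = entry.Value;

            // 1. Every billed item must appear on the purchase order.
            if (!purchaseOrder.TryGetValue(item, out var ordered))
            {
                mismatches.Add($"{item}: billed but not on the purchase order");
                continue;
            }

            // 2. The billed unit price must equal the agreed price.
            if (billed.UnitPrice != ordered.UnitPrice)
                mismatches.Add($"{item}: billed at {billed.UnitPrice}, ordered at {ordered.UnitPrice}");

            // 3. The billed quantity must not exceed what was ordered.
            if (billed.Quantity > ordered.Quantity)
                mismatches.Add($"{item}: billed {billed.Quantity}, ordered {ordered.Quantity}");

            // 4. The billed quantity must not exceed what was received.
            //    A missing GRN entry leaves 'received' at 0.
            receivedQuantities.TryGetValue(item, out decimal received);
            if (billed.Quantity > received)
                mismatches.Add($"{item}: billed {billed.Quantity}, received {received}");
        }

        return mismatches;
    }
}
```

An empty result means the invoice agrees with both documents under these rules. Real systems usually allow a price or quantity tolerance, which this sketch leaves out.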

Implementation: Raw C# (HttpClient)

The following C# code demonstrates a complete flow using `System.Net.Http`. It sets the `x-api-key` header and deserializes the nested JSON response (including line items and financials) into strongly typed objects.

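Prerequisite: .NET 5 or later. On .NET Core 3.1, also add the `System.Net.Http.Json` NuGet package, which the `using System.Net.Http.Json;` directive below depends on.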

CODE EXAMPLE
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;
using System.Collections.Generic;

public class InvoiceOcrExample
{
    // 🔑 Need a key? Get 200 free requests instantly (No Credit Card required):
    // 👉 https://structocr.com/register
    private const string ApiKey = "YOUR_API_KEY_HERE";
    private const string ApiEndpoint = "https://api.structocr.com/v1/invoice";

    private static readonly HttpClient client = new HttpClient();

    public static async Task Main(string[] args)
    {
        // Note: The Technical Specs list JPG, PNG, and WebP as supported image inputs
        string imagePath = "invoice.jpg";

        if (!File.Exists(imagePath))
        {
            Console.WriteLine($"Error: File not found at {imagePath}");
            return;
        }

        try
        {
            // 1. Prepare Payload
            byte[] imageBytes = await File.ReadAllBytesAsync(imagePath);
            string base64Image = Convert.ToBase64String(imageBytes);
            var payload = new { img = base64Image };

            // 2. Build the request
            // Set the API key on this request instead of mutating the shared
            // client's default headers, which is not safe under concurrent use
            using var request = new HttpRequestMessage(HttpMethod.Post, ApiEndpoint)
            {
                Content = JsonContent.Create(payload)
            };
            request.Headers.Add("x-api-key", ApiKey);

            // 3. Send POST Request
            Console.WriteLine($"Sending invoice to {ApiEndpoint}...");
            using HttpResponseMessage response = await client.SendAsync(request);

            // 4. Handle Response
            string responseBody = await response.Content.ReadAsStringAsync();

            if (!response.IsSuccessStatusCode)
            {
                Console.WriteLine($"API Error ({response.StatusCode}): {responseBody}");
                return;
            }

            // 5. Deserialize JSON to Objects
            var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
            var result = JsonSerializer.Deserialize<ApiResponse>(responseBody, options);

            if (result?.Success == true && result.Data != null)
            {
                var data = result.Data;
                Console.WriteLine("✅ Extraction Successful!");
                Console.WriteLine($"Invoice #: {data.InvoiceNumber}");
                Console.WriteLine($"Date:      {data.Date}");
                Console.WriteLine($"Vendor:    {data.Merchant?.Name} (Tax ID: {data.Merchant?.TaxId})");
                Console.WriteLine($"Total:     {data.Financials?.TotalAmount} {data.Currency}");

                Console.WriteLine("\n--- Line Items ---");
                if (data.LineItems != null)
                {
                    foreach (var item in data.LineItems)
                    {
                        Console.WriteLine($"- {item.Description}: {item.Quantity} x {item.UnitPrice} = {item.Amount}");
                    }
                }
            }
            else
            {
                Console.WriteLine($"Extraction Failed: {result?.Error}");
            }
        }
        catch (Exception e)
        {
            Console.WriteLine($"Unexpected Error: {e.Message}");
        }
    }
}

// --- Data Models ---
public class ApiResponse
{
    public bool Success { get; set; }
    public InvoiceData Data { get; set; }
    public string Error { get; set; }
}

public class InvoiceData
{
    [JsonPropertyName("invoice_number")]
    public string InvoiceNumber { get; set; }
    public string Date { get; set; }
    public string Currency { get; set; }
    public MerchantData Merchant { get; set; }
    public FinancialData Financials { get; set; }
    
    [JsonPropertyName("line_items")]
    public List<LineItem> LineItems { get; set; }
}

public class MerchantData
{
    public string Name { get; set; }
    [JsonPropertyName("tax_id")]
    public string TaxId { get; set; }
}

public class FinancialData
{
    [JsonPropertyName("total_amount")]
    public decimal? TotalAmount { get; set; }
    [JsonPropertyName("tax_amount")]
    public decimal? TaxAmount { get; set; }
}

public class LineItem
{
    public string Description { get; set; }
    public decimal? Quantity { get; set; }
    [JsonPropertyName("unit_price")]
    public decimal? UnitPrice { get; set; }
    public decimal? Amount { get; set; }
}

Technical Specs

  • Latency: < 5s (Average)
  • Uptime: 98.5% SLA
  • Security: AES-256 Encryption & SOC2 Compliant
  • Input: JPG, PNG, WebP (Base64 Encoded)
  • Max File Size: 4.5MB
  • Output: JSON (Nested Structure)
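
The input limits above can be checked on the client before any bytes are uploaded. The sketch below is one way to do that. It assumes "4.5MB" means 4.5 × 1024 × 1024 bytes and that the limit applies to the raw file, not the Base64 payload; neither detail is confirmed here. `InvoiceInputValidator` is a hypothetical helper, not part of any StructOCR SDK.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical client-side pre-check based on the listed input specs.
public static class InvoiceInputValidator
{
    // Assumption: 4.5 MB = 4.5 * 1024 * 1024 bytes, measured on the raw file.
    public const long MaxFileSizeBytes = 4718592;

    private static readonly string[] AllowedExtensions = { ".jpg", ".jpeg", ".png", ".webp" };

    // Returns true when the file looks acceptable; otherwise 'error' explains why.
    public static bool TryValidate(string fileName, long sizeInBytes, out string error)
    {
        string extension = Path.GetExtension(fileName).ToLowerInvariant();

        if (!AllowedExtensions.Contains(extension))
        {
            error = $"Unsupported file type '{extension}'. Expected JPG, PNG, or WebP.";
            return false;
        }
        if (sizeInBytes <= 0)
        {
            error = "File is empty.";
            return false;
        }
        if (sizeInBytes > MaxFileSizeBytes)
        {
            error = $"File is {sizeInBytes} bytes; the limit is {MaxFileSizeBytes} bytes.";
            return false;
        }

        error = "";
        return true;
    }

    // Base64 encodes every 3 input bytes as 4 output characters (with padding),
    // so the request body is roughly one third larger than the file itself.
    public static long Base64Length(long sizeInBytes) => 4 * ((sizeInBytes + 2) / 3);
}
```

A caller could pass `new FileInfo(imagePath).Length` as `sizeInBytes` and skip the request when `TryValidate` returns false.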

Key Features

  • Table Extraction Engine: Accurately parses complex line items and tables without manual templating.
  • Financial Validation: Cross-validates subtotals, taxes, and grand totals to ensure mathematical accuracy.
  • Vendor Normalization: Automatically identifies merchants and extracts standardized tax IDs (VAT/EIN).
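
If you also want to re-check the arithmetic on your side before posting to an ERP, a small client-side check is enough. The sketch below is an illustration only and does not describe how StructOCR validates internally. `InvoiceMathCheck` and the 0.01 tolerance are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical client-side re-check of extracted invoice totals.
public static class InvoiceMathCheck
{
    // Assumed tolerance for rounding differences; tune for your currency.
    private const decimal Tolerance = 0.01m;

    public static List<string> FindDiscrepancies(
        decimal subtotal,
        decimal taxAmount,
        decimal totalAmount,
        IEnumerable<(decimal Quantity, decimal UnitPrice, decimal Amount)> lineItems)
    {
        var issues = new List<string>();
        var items = lineItems.ToList();

        // Each line: quantity x unit price should equal the line amount.
        for (int i = 0; i < items.Count; i++)
        {
            var (quantity, unitPrice, amount) = items[i];
            if (Math.Abs(quantity * unitPrice - amount) > Tolerance)
                issues.Add($"Line {i + 1}: {quantity} x {unitPrice} does not equal {amount}");
        }

        // Line amounts should add up to the subtotal.
        decimal lineSum = items.Sum(item => item.Amount);
        if (Math.Abs(lineSum - subtotal) > Tolerance)
            issues.Add($"Line items sum to {lineSum}, but subtotal is {subtotal}");

        // Subtotal plus tax should equal the grand total.
        if (Math.Abs(subtotal + taxAmount - totalAmount) > Tolerance)
            issues.Add($"Subtotal + tax is {subtotal + taxAmount}, but total is {totalAmount}");

        return issues;
    }
}
```

An empty list means all three checks passed. Invoices with discounts, shipping, or multiple tax lines would need extra terms that this sketch does not model.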

Sample JSON Output

StructOCR returns a clean, normalized JSON object, regardless of the input invoice's layout, language, or quality.

{
  "success": true,
  "data": {
    "type": "invoice",
    "invoice_number": "INV-2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-15",
    "currency": "USD",
    "merchant": {
      "name": "Amazon Web Services",
      "address": "410 Terry Ave N, Seattle, WA",
      "tax_id": "EIN-12-3456789",
      "iban": null
    },
    "customer": {
      "name": "Acme Corp Inc.",
      "tax_id": "987654321"
    },
    "financials": {
      "subtotal": 100,
      "tax_amount": 10,
      "total_amount": 110
    },
    "line_items": [
      {
        "description": "EC2 Instance Usage",
        "quantity": 1,
        "unit_price": 80,
        "amount": 80
      },
      {
        "description": "S3 Storage",
        "quantity": 1,
        "unit_price": 20,
        "amount": 20
      }
    ]
  }
}
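
The sample output contains fields that the data models in the C# example do not map, such as `due_date`, `customer`, and `financials.subtotal`. The sketch below shows one way to capture them. The class names (`FullInvoiceData`, `CustomerData`, `FullFinancialData`, `FullInvoiceParser`) are hypothetical, and the field types are inferred from this single sample, so treat them as assumptions.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical models covering fields seen in the sample JSON only.
public class CustomerData
{
    [JsonPropertyName("name")]
    public string Name { get; set; }

    [JsonPropertyName("tax_id")]
    public string TaxId { get; set; }
}

public class FullFinancialData
{
    [JsonPropertyName("subtotal")]
    public decimal? Subtotal { get; set; }

    [JsonPropertyName("tax_amount")]
    public decimal? TaxAmount { get; set; }

    [JsonPropertyName("total_amount")]
    public decimal? TotalAmount { get; set; }
}

public class FullInvoiceData
{
    [JsonPropertyName("type")]
    public string Type { get; set; }

    [JsonPropertyName("invoice_number")]
    public string InvoiceNumber { get; set; }

    // Kept as strings; the sample uses ISO dates, but other formats are not ruled out.
    [JsonPropertyName("date")]
    public string Date { get; set; }

    [JsonPropertyName("due_date")]
    public string DueDate { get; set; }

    [JsonPropertyName("customer")]
    public CustomerData Customer { get; set; }

    [JsonPropertyName("financials")]
    public FullFinancialData Financials { get; set; }
}

public static class FullInvoiceParser
{
    // Reads the "data" object out of a response shaped like the sample above.
    public static FullInvoiceData ParseData(string responseJson)
    {
        using JsonDocument document = JsonDocument.Parse(responseJson);
        JsonElement data = document.RootElement.GetProperty("data");
        return JsonSerializer.Deserialize<FullInvoiceData>(data.GetRawText());
    }
}
```

Properties that are absent or `null` in a response simply stay `null`, so callers should check them before use.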

Frequently Asked Questions

How does StructOCR compare to AWS Textract or Google Vision?

General-purpose OCR services return raw text blocks and coordinates, leaving you to write and maintain complex parsing logic. StructOCR is specialized for invoices. It returns structured JSON with pre-identified fields like `invoice_number`, `line_items`, and `total_amount`, so you do not need to write that parsing layer yourself.

Do you store the uploaded invoice images?

No. Images and documents are processed in memory and permanently deleted as soon as extraction completes. We do not persist customer data.

How do you handle blurry or low-quality scans?

Our API includes an automatic, server-side image enhancement engine. It performs deskewing, denoising, and contrast correction before the data extraction process begins, maximizing accuracy on suboptimal inputs.
