How to Automate Invoice Processing with an API: A Complete Guide

Your accounts payable team is manually keying vendor names, invoice numbers, and line item totals into your ERP. Eight hours a week. Every week. For documents that all look basically the same.

This is a solved problem. Automated invoice processing APIs can eliminate 95%+ of manual data entry from invoice workflows, and getting one running typically takes less than a day. This guide walks through exactly how to do it.

What "automated invoice processing" actually means

The term covers a few different things depending on who's using it:

OCR + raw text extraction: Turning a scanned invoice image into readable text. Useful, but not sufficient — you still have to parse the text yourself.
Template-based extraction: Building custom rules per vendor. Works until you add a new vendor. Breaks constantly.
AI-powered structured extraction: Sending the document to an AI that understands invoice structure and returns clean, validated JSON with the exact fields you need. No templates. No rules.

The third approach is what we're covering here. It's the one that actually scales.

What you need before you start

A document parsing API (we'll use Dokyumi — 100 free extractions/month, no card required)
Your invoices in PDF or image format (JPG, PNG — even poor scans work)
Your destination system (a database, ERP, spreadsheet, or webhook endpoint)

That's it. No AWS account. No GCP project. No ML pipeline to maintain.

Step 1: Define your invoice extraction schema

The first step is telling the API what fields you want out of an invoice. You do this in plain English, not code.

"An accounts payable invoice. I need: vendor_name, vendor_address, invoice_number, invoice_date, due_date, subtotal, tax_amount, total_amount, currency, and line_items (an array with description, quantity, unit_price, and line_total for each)."

The AI generates the full extraction schema for you. You can review it, add fields, or remove ones you don't need. Give it a slug like invoice-parser.

You now have a schema ID to send with every request to the extraction endpoint: POST https://dokyumi.com/api/v1/extract.

Step 2: Send your first invoice

Get your API key from API Keys → New Key. Then:

curl -X POST https://dokyumi.com/api/v1/extract \
  -H "Authorization: Bearer dk_live_your_api_key" \
  -F "file=@invoice.pdf" \
  -F "schema_id=your-schema-id"

Response in under 10 seconds:

{
  "status": "success",
  "data": {
    "vendor_name": "Acme Office Supply Co.",
    "vendor_address": "123 Main St, Chicago, IL 60601",
    "invoice_number": "INV-2026-00847",
    "invoice_date": "2026-03-10",
    "due_date": "2026-04-09",
    "subtotal": 2340.00,
    "tax_amount": 187.20,
    "total_amount": 2527.20,
    "currency": "USD",
    "line_items": [
      {
        "description": "Office Chair (Ergonomic, Model X3)",
        "quantity": 4,
        "unit_price": 285.00,
        "line_total": 1140.00
      },
      {
        "description": "Standing Desk (Adjustable, 60in)",
        "quantity": 2,
        "unit_price": 600.00,
        "line_total": 1200.00
      }
    ]
  },
  "confidence": {
    "overall": 0.97,
    "fields": {
      "vendor_name": 0.99,
      "invoice_number": 0.98,
      "total_amount": 0.99,
      "line_items": 0.95
    }
  },
  "extraction_id": "ext_01j8abc123",
  "cached": false,
  "processing_ms": 4210
}

Clean, validated JSON. No post-processing needed. Every field you asked for, plus confidence scores so you know when something might need a human review.
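
Confidence scores tell you how sure the model is, not whether the numbers add up. As a cheap second check, here is a minimal sketch that verifies the arithmetic in the response above: each line total against quantity × unit price, the line items against the subtotal, and subtotal plus tax against the total. The function name check_invoice_totals is ours, not part of the Dokyumi API, and the rules assume invoices with no discounts, shipping, or other adjustments.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def dec(value) -> Decimal:
    # Go through str() so 187.2 becomes Decimal("187.2") rather than a binary-float approximation
    return Decimal(str(value))


def check_invoice_totals(data: dict) -> list:
    """Return a list of problems found; an empty list means the totals agree."""
    problems = []
    line_sum = Decimal("0")
    for i, item in enumerate(data["line_items"]):
        expected = (dec(item["quantity"]) * dec(item["unit_price"])).quantize(CENT)
        actual = dec(item["line_total"]).quantize(CENT)
        if expected != actual:
            problems.append(f"line {i}: quantity x unit_price = {expected}, line_total = {actual}")
        line_sum += actual

    subtotal = dec(data["subtotal"]).quantize(CENT)
    if line_sum != subtotal:
        problems.append(f"line items sum to {line_sum}, subtotal = {subtotal}")

    expected_total = subtotal + dec(data["tax_amount"]).quantize(CENT)
    total = dec(data["total_amount"]).quantize(CENT)
    if expected_total != total:
        problems.append(f"subtotal + tax = {expected_total}, total_amount = {total}")
    return problems
```

Call check_invoice_totals(result["data"]) and send anything with a non-empty list to human review, the same way you would a low-confidence extraction.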

Step 3: Build the automation layer

Now you wire this into your workflow. Here's a minimal Python script that processes a directory of invoices and collects the results, ready to write to your database:

import os
import json
import requests
from pathlib import Path

API_KEY = os.environ["DOKYUMI_API_KEY"]
SCHEMA_ID = os.environ["DOKYUMI_SCHEMA_ID"]
INVOICE_DIR = Path("./invoices/pending")

def extract_invoice(file_path: Path) -> dict:
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://dokyumi.com/api/v1/extract",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": (file_path.name, f, "application/pdf")},
            data={"schema_id": SCHEMA_ID},
            timeout=30
        )
    response.raise_for_status()
    return response.json()

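def process_pending_invoices():
    results = []
    # Create the destination folder up front; rename() fails if it is missing
    (INVOICE_DIR.parent / "processed").mkdir(parents=True, exist_ok=True)
    # sorted() takes a snapshot of the listing before any file is moved out of it
    for invoice_file in sorted(INVOICE_DIR.glob("*.pdf")):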
        print(f"Processing {invoice_file.name}...")
        try:
            result = extract_invoice(invoice_file)
            if result["status"] == "success":
                data = result["data"]
                confidence = result["confidence"]["overall"]

                # Flag for human review if confidence below threshold
                data["needs_review"] = confidence < 0.90
                data["source_file"] = invoice_file.name
                data["extraction_id"] = result["extraction_id"]

                results.append(data)
                invoice_file.rename(invoice_file.parent.parent / "processed" / invoice_file.name)
                print(f"  ✓ {data['vendor_name']} — ${data['total_amount']} (confidence: {confidence:.0%})")
            else:
                print(f"  ✗ Extraction failed: {result.get('error')}")
        except Exception as e:
            print(f"  ✗ Error: {e}")

    return results

if __name__ == "__main__":
    invoices = process_pending_invoices()
    print(f"\nProcessed {len(invoices)} invoices")
    print(f"Flagged for review: {sum(1 for i in invoices if i['needs_review'])}")

This pattern — extract, validate confidence, route to review queue if uncertain — is the core of any production invoice automation system.
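
The script returns a list of dicts; where they go next is up to you. As one possibility, here is a sketch that persists them to SQLite using only the standard library. The table layout, the file name invoices.db, and the function names are all assumptions for illustration; swap in your own database and columns.

```python
import json
import sqlite3


def open_invoice_db(path: str = "invoices.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the invoices table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS invoices (
            extraction_id  TEXT PRIMARY KEY,
            vendor_name    TEXT,
            invoice_number TEXT,
            total_amount   REAL,
            currency       TEXT,
            needs_review   INTEGER,
            source_file    TEXT,
            payload        TEXT
        )
        """
    )
    return conn


def save_invoices(conn: sqlite3.Connection, invoices: list) -> int:
    """Insert each invoice; re-saving the same extraction_id overwrites the old row."""
    rows = [
        (
            inv["extraction_id"],
            inv.get("vendor_name"),
            inv.get("invoice_number"),
            inv.get("total_amount"),
            inv.get("currency"),
            int(bool(inv.get("needs_review", False))),
            inv.get("source_file"),
            json.dumps(inv),  # keep the full extraction, line items included
        )
        for inv in invoices
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO invoices VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

With this in place, the main block becomes save_invoices(open_invoice_db(), process_pending_invoices()), and the review queue is a query for rows where needs_review = 1.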

Step 4: Handle the edge cases that kill real implementations

Most tutorials stop at the happy path. Here's what you actually need to handle in production:

Multi-page invoices

Dokyumi handles multi-page PDFs natively. Line items that span pages get merged automatically. You don't need to split documents before sending.

Low-quality scans

Mistral OCR is unusually good at reconstructing text from poor-quality images. Still, send any document that scores below 0.75 confidence on a critical field to a review queue, even if its overall score looks fine. Don't try to auto-correct: flag it and let a human take 30 seconds to verify.
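
A per-field check like this can sit alongside the overall threshold used in Step 3. The sketch below reads the confidence.fields object from the sample response in Step 2. Which fields count as critical is an assumption here (we picked three that appear in that sample), and the function name is ours.

```python
# Assumed set of critical fields; adjust to whatever your AP process cannot get wrong
CRITICAL_FIELDS = ("vendor_name", "invoice_number", "total_amount")
FIELD_THRESHOLD = 0.75


def fields_needing_review(confidence: dict,
                          critical=CRITICAL_FIELDS,
                          threshold: float = FIELD_THRESHOLD) -> list:
    """Return the critical fields whose confidence is below the threshold."""
    scores = confidence.get("fields", {})
    # A field with no score at all is treated as low confidence
    return [name for name in critical if scores.get(name, 0.0) < threshold]
```

An empty list means none of the critical fields fell below 0.75. Anything else names exactly which fields the reviewer should look at, which keeps the 30-second check to 30 seconds.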

International invoices

Add currency and locale to your schema description: "invoices may be in USD, EUR, or GBP — normalize all amounts to a float". The model handles comma-as-decimal-separator, VAT vs. tax labeling, and date format variations.

Duplicate detection

Dokyumi caches OCR results by file hash, so if someone accidentally submits the same file twice, the second call is near-instant and costs nothing. That only catches byte-identical files, though: a rescan or re-export of the same invoice has a different hash. Build deduplication on your side using invoice_number + vendor_name as a composite key.
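
One way to build that composite key is sketched below. The normalization rules (lowercase, strip punctuation) are an assumption about what tends to differ between two extractions of the same invoice; tune them to your own vendors.

```python
import re


def dedup_key(data: dict) -> tuple:
    """Build a (vendor, invoice number) key that ignores case and punctuation."""
    vendor = re.sub(r"[^a-z0-9]+", " ", data["vendor_name"].lower()).strip()
    number = re.sub(r"[^a-z0-9]", "", data["invoice_number"].lower())
    return (vendor, number)


def split_duplicates(invoices: list, seen: set = None) -> tuple:
    """Separate invoices into (new, duplicates), updating the set of seen keys."""
    seen = set() if seen is None else seen
    fresh, dupes = [], []
    for inv in invoices:
        key = dedup_key(inv)
        if key in seen:
            dupes.append(inv)
        else:
            seen.add(key)
            fresh.append(inv)
    return fresh, dupes
```

In production the seen set would come from your database (a unique index on the two normalized columns does the same job), not from memory.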

Webhook delivery for batch jobs

If you're processing hundreds of invoices in a batch, use webhook delivery instead of waiting on each synchronous request. Set your webhook URL in the schema settings, or pass it per request as a webhook_url field, and Dokyumi will POST each result as it completes.

curl -X POST https://dokyumi.com/api/v1/extract \
  -H "Authorization: Bearer dk_live_your_api_key" \
  -F "file=@invoice.pdf" \
  -F "schema_id=your-schema-id" \
  -F "webhook_url=https://your-app.com/hooks/invoices"
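
On the receiving end you need something listening at that URL. Here is a bare-bones receiver using Python's standard library. It assumes the webhook body has the same JSON shape as the synchronous response in Step 2, which this guide does not confirm, so check a real delivery before relying on it. It also does no authentication; add whatever verification your deployment requires.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(body: bytes) -> dict:
    """Reduce a webhook POST body to the fields the rest of the pipeline needs.

    Assumes the body mirrors the synchronous /extract response.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        return {
            "ok": False,
            "extraction_id": payload.get("extraction_id"),
            "error": payload.get("error"),
        }
    return {
        "ok": True,
        "extraction_id": payload["extraction_id"],
        "data": payload["data"],
        "overall_confidence": payload["confidence"]["overall"],
    }


class InvoiceWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = parse_webhook_body(self.rfile.read(length))
        except (ValueError, KeyError, AttributeError, TypeError):
            # Malformed or unexpected body: reject it
            self.send_response(400)
            self.end_headers()
            return
        # Hand the result to your own storage or review queue here
        print(result["extraction_id"], "ok" if result["ok"] else result["error"])
        # Acknowledge quickly and do any slow work elsewhere
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), InvoiceWebhookHandler).serve_forever()
```

Call serve() to start listening. For real traffic you would put this behind your usual web framework and HTTPS; the parsing function is the part worth keeping.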

Step 5: Connect to your ERP or accounting system

The JSON you get back is already structured to map directly to most AP systems. Here are common integration patterns:

QuickBooks Online

Use the QBO Bills API. Map vendor_name → VendorRef (which takes the vendor's QuickBooks ID, so look the vendor up by name first), total_amount → TotalAmt, line_items → Line[].
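
To make the mapping concrete, here is a sketch that builds a Bill request body from an extraction. The field names follow the QuickBooks Online Bill entity as we understand it, so verify them against Intuit's current reference. vendor_id and expense_account_id are placeholders you must resolve from your own QuickBooks company. Tax handling is left out because it depends on your company's tax setup, and TotalAmt is omitted on the assumption that QuickBooks derives it from the lines.

```python
def to_qbo_bill(data: dict, vendor_id: str, expense_account_id: str) -> dict:
    """Map one extracted invoice to a QuickBooks Online Bill body (sketch)."""
    return {
        "VendorRef": {"value": vendor_id},  # QuickBooks vendor ID, not the name
        "DocNumber": data["invoice_number"],
        "TxnDate": data["invoice_date"],
        "DueDate": data["due_date"],
        "Line": [
            {
                "DetailType": "AccountBasedExpenseLineDetail",
                "Amount": item["line_total"],
                "Description": item["description"],
                "AccountBasedExpenseLineDetail": {
                    # Placeholder: every line is booked to one expense account
                    "AccountRef": {"value": expense_account_id},
                },
            }
            for item in data["line_items"]
        ],
    }
```

Booking every line to a single account is the simplest thing that works. A real integration would usually pick the account per line or per vendor.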

NetSuite

POST to the Vendor Bills REST endpoint. The line item array maps cleanly to item.items[].

Xero

Use the Xero Invoices API. The extracted data maps to Contact.Name and LineItems[] with almost no transformation. Xero calculates AmountDue itself, so treat total_amount as a cross-check rather than a field to send.
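
A sketch of the Xero version follows. In Xero a supplier bill is an invoice of type ACCPAY. The field names are taken from the Xero Invoices API as we understand it, and account_code is a placeholder for a code from your own chart of accounts; confirm both against Xero's reference before use.

```python
def to_xero_bill(data: dict, account_code: str) -> dict:
    """Map one extracted invoice to a Xero ACCPAY invoice body (sketch)."""
    return {
        "Type": "ACCPAY",  # accounts payable, i.e. a bill from a supplier
        "Contact": {"Name": data["vendor_name"]},
        "InvoiceNumber": data["invoice_number"],
        "Date": data["invoice_date"],
        "DueDate": data["due_date"],
        "CurrencyCode": data["currency"],
        "LineItems": [
            {
                "Description": item["description"],
                "Quantity": item["quantity"],
                "UnitAmount": item["unit_price"],
                "AccountCode": account_code,  # placeholder: one account for all lines
            }
            for item in data["line_items"]
        ],
    }
```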

Google Sheets / Airtable

For smaller operations: flatten the line items with json.dumps(data["line_items"]) and write each invoice as a row. Quick and works well for teams not yet on a full ERP.
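
Here is what that flattening can look like, written to a CSV file that both Google Sheets and Airtable can import. The column list mirrors the schema from Step 1 and the function names are ours. Writing straight to a sheet through an API client is a separate step not covered here.

```python
import csv
import json

COLUMNS = [
    "vendor_name", "vendor_address", "invoice_number", "invoice_date",
    "due_date", "subtotal", "tax_amount", "total_amount", "currency",
    "line_items",
]


def invoice_to_row(data: dict) -> dict:
    """Flatten one extraction into a single spreadsheet row."""
    row = {col: data.get(col, "") for col in COLUMNS if col != "line_items"}
    # The nested array goes into one cell as a JSON string
    row["line_items"] = json.dumps(data.get("line_items", []))
    return row


def write_invoices_csv(invoices: list, path: str) -> None:
    """Write one header row plus one row per invoice."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for data in invoices:
            writer.writerow(invoice_to_row(data))
```

If you need to filter or total by line item later, write one row per line item instead; a JSON string in a cell is easy to store but awkward to query.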

What this actually costs

This is the question that matters. Here's a realistic cost model for a mid-sized AP team processing 500 invoices/month:

Approach	Setup time	Monthly cost	Manual hours/month
Manual data entry	—	$2,000–4,000 (labor)	40–80 hrs
AWS Textract + custom processing	2–6 weeks	$150–400 (infra + dev)	10–20 hrs (maintenance)
Dokyumi Starter plan	< 1 day	$79/month flat	2–5 hrs (review queue only)

The flat-rate pricing matters more than it sounds. AWS Textract and Google Document AI both charge per page, which is fine until you start processing attachments, multi-page statements, or remittance PDFs. At 500 invoices averaging 3 pages each, that is 1,500 billable pages a month, and the bill moves with every extra page. Add the infrastructure and development time around it and the total can easily hit $150–400/month with little predictability. Dokyumi's Starter plan covers up to 500 extractions flat.
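
To compare the two models on the same footing, it helps to reduce both to a unit cost. The sketch below uses only the figures quoted in this section (500 invoices, 3 pages each, $150–400/month, $79/month); it does not use any vendor's published per-page rate, and the function names are ours.

```python
def cost_per_page(monthly_cost: float, invoices: int, pages_per_invoice: int) -> float:
    """Effective cost per page for a given monthly spend and volume."""
    return monthly_cost / (invoices * pages_per_invoice)


def cost_per_invoice(monthly_cost: float, invoices: int) -> float:
    """Effective cost per invoice for a given monthly spend and volume."""
    return monthly_cost / invoices


# Figures from the table above
INVOICES = 500
PAGES_PER_INVOICE = 3

per_page_low = cost_per_page(150, INVOICES, PAGES_PER_INVOICE)
per_page_high = cost_per_page(400, INVOICES, PAGES_PER_INVOICE)
flat_per_invoice = cost_per_invoice(79, INVOICES)
```

At 1,500 pages, the $150–400 range works out to roughly $0.10 to $0.27 per page, or $0.30 to $0.80 per invoice, against about $0.16 per invoice on the flat plan. Rerun it with your own volume and page counts, since the comparison shifts as either one changes.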

Common mistakes to avoid

Trying to build this with raw OCR first. If you extract raw text and then write regex to find "Total:" or "Invoice #", you're building a fragile system that breaks on every new vendor template. AI structured extraction is typically cheaper to run, more accurate, and far easier to maintain, because there are no per-vendor rules to update.

Skipping confidence scores. AI extraction is not 100% accurate on every document. That's fine — the accuracy rate for a clean invoice is typically 95–99%. The confidence score is the mechanism for catching the other 1–5%. Build a review queue from day one.

Not testing with your worst documents first. Use your oldest, lowest-quality scans during testing. If the API handles those well, everything else is easy. If it struggles, you want to know before you've built the integration.

Processing synchronously in large batches. For anything over 50 invoices, use the async webhook pattern. Waiting on synchronous requests at that scale runs into timeouts, and a half-finished batch is harder to resume after a failure.

The result

A team that was spending 40 hours a month on invoice data entry comes out the other side spending 3–5 hours reviewing the handful of extractions that got flagged. Everything else is handled automatically, with a full audit trail, confidence scores, and source documents stored for reference.

The setup from schema definition to first production extraction is typically under a day. The ROI shows up in week one.

The full invoice extraction workflow — schema definition, API integration, ERP mapping, confidence-based review routing — is something you can have running this week.