Question 1

What is Dokyumi?

Accepted Answer

Dokyumi is a no-code document parsing API platform. You describe the fields you want to extract from your documents, and Dokyumi generates a dedicated API endpoint that handles OCR and structured data extraction automatically. It uses a two-stage AI pipeline: Mistral OCR for text extraction and Claude for intelligent field mapping.

Question 2

How does Dokyumi compare to AWS Textract or Google Document AI?

Accepted Answer

AWS Textract and Google Document AI are raw OCR engines: they return raw text or key-value pairs and leave significant post-processing code to you. Dokyumi is schema-first. You define exactly which fields you want (such as vendor_name, invoice_total, and due_date) and get clean, validated JSON back. No AWS or GCP account is required, and setup takes under 2 minutes instead of weeks.
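To make the schema-first idea concrete, here is a minimal sketch of how a client might load such a result into a typed record. The field names come from the answer above; the flat JSON shape, the `Invoice` class, the `parse_invoice` helper, and the sample values are illustrative assumptions, not Dokyumi's documented response format.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Typed view of the three example fields named in the answer above."""
    vendor_name: str
    invoice_total: Decimal
    due_date: date


def parse_invoice(raw_json: str) -> Invoice:
    """Convert a flat JSON object into an Invoice.

    Assumes (hypothetically) that the fields arrive as top-level keys and
    that due_date is an ISO 8601 date string.
    """
    data = json.loads(raw_json)
    return Invoice(
        vendor_name=str(data["vendor_name"]),
        # Go through str() so both JSON numbers and strings are accepted.
        invoice_total=Decimal(str(data["invoice_total"])),
        due_date=date.fromisoformat(data["due_date"]),
    )


# Made-up sample payload, for illustration only.
sample = '{"vendor_name": "Acme Supply", "invoice_total": "1249.50", "due_date": "2025-03-31"}'
invoice = parse_invoice(sample)
```

Because the fields are known in advance, a missing or malformed value fails at the parsing step with a `KeyError` or `ValueError` instead of surfacing later in business logic.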

Question 3

What document types does Dokyumi support?

Accepted Answer

Dokyumi supports any document with a repeatable structure: invoices, bank statements, W-2s and 1099s, pay stubs, insurance claims (EOBs, declaration pages), medical records, legal contracts, leases, bills of lading, customs documents, and more. If the document has consistent fields, Dokyumi can extract them.

Question 4

What file formats are supported?

Accepted Answer

Dokyumi accepts PDF, JPEG, PNG, TIFF, and WEBP files up to 20MB. For best OCR results, documents should be at least 150 DPI. Both scanned documents and digital PDFs are supported.
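The limits above can be checked on the client before uploading. The sketch below is one way to do that; the extension list is derived from the formats named in the answer, and treating "20MB" as 20 × 1024 × 1024 bytes is an assumption. The `check_upload` helper is hypothetical and not part of any Dokyumi SDK.

```python
from pathlib import Path

# Extensions for the formats listed above (PDF, JPEG, PNG, TIFF, WEBP).
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp"}

# Assumption: "20MB" is read as 20 MiB; the exact byte limit is not stated.
MAX_BYTES = 20 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```

For example, `check_upload("invoice.pdf", 1024)` returns an empty list, while a `.docx` file or a 21 MiB scan each produce one problem. The check looks only at the file name and size, so it does not verify the 150 DPI recommendation.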

Question 5

How do I get started with Dokyumi?

Accepted Answer

Sign up for free at dokyumi.com. You get 100 free extractions per month with no credit card required. Create your first extraction schema by describing your document in plain English (or use AI inference to auto-detect the schema), then use the generated API endpoint to start extracting data.

Question 6

What is the pricing?

Accepted Answer

Dokyumi offers a free tier with 100 extractions per month. There are three paid plans: Starter at $79/month (1,000 extractions, 10 schemas), Growth at $499/month (10,000 extractions, 50 schemas, white-label portals), and Enterprise at $1,299/month (50,000 extractions, unlimited schemas, white-label sites, and SLAs). All paid plans include full API access, webhooks, and priority support.
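As a rough worked example, the plan prices above can be turned into an effective cost per extraction. The sketch assumes the full monthly quota is used and ignores overage charges or annual discounts, none of which are described here; the function names are illustrative.

```python
from decimal import Decimal

FREE_QUOTA = 100  # free-tier extractions per month

# (monthly price in USD, included extractions), taken from the answer above.
PLANS = {
    "Starter": (Decimal("79"), 1_000),
    "Growth": (Decimal("499"), 10_000),
    "Enterprise": (Decimal("1299"), 50_000),
}


def cost_per_extraction(plan: str) -> Decimal:
    """Effective price per extraction when the whole quota is used."""
    price, quota = PLANS[plan]
    return price / quota


def smallest_plan_for(monthly_extractions: int) -> str:
    """Name of the lowest tier whose quota covers the given volume."""
    if monthly_extractions <= FREE_QUOTA:
        return "Free"
    # PLANS is ordered from smallest to largest quota.
    for name, (_, quota) in PLANS.items():
        if monthly_extractions <= quota:
            return name
    raise ValueError("volume exceeds the largest listed quota")
```

At full usage this works out to $0.079 per extraction on Starter, $0.0499 on Growth, and $0.02598 on Enterprise.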

Question 7

What are white-label portals?

Accepted Answer

White-label portals are branded upload interfaces you can give to your customers. Instead of exposing your API, you create a custom-branded page where clients upload documents. Dokyumi handles extraction in the background and delivers results via webhook or email. White-label sites are available on every plan: 1 on Free, 5 on Starter, 25 on Growth, and unlimited on Enterprise.
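When results arrive by webhook, the receiving service typically parses the body and decides what needs a human look. The sketch below shows that step only. The payload shape (a `fields` object whose entries carry `value` and `confidence`) is a hypothetical assumption, as are the function name and the 0.9 threshold; the actual webhook format should be taken from the Dokyumi documentation.

```python
import json


def fields_needing_review(payload_json: str, min_confidence: float = 0.9) -> list[str]:
    """Return the names of extracted fields below the confidence threshold.

    Assumed (hypothetical) payload shape:
    {"fields": {"<name>": {"value": ..., "confidence": <0..1>}}}
    """
    payload = json.loads(payload_json)
    flagged = [
        name
        for name, field in payload["fields"].items()
        if field["confidence"] < min_confidence
    ]
    return sorted(flagged)


# Made-up payload, for illustration only.
sample_payload = json.dumps(
    {
        "fields": {
            "vendor_name": {"value": "Acme Supply", "confidence": 0.98},
            "due_date": {"value": "2025-03-31", "confidence": 0.62},
        }
    }
)
```

With this sample, only `due_date` falls below the default threshold, so it would be the one field routed to manual review.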

Question 8

Does Dokyumi have an API I can call from my code?

Accepted Answer

Yes. Dokyumi's core feature is its REST API. After creating a schema, you get a dedicated endpoint that you call with a POST request containing the file and the schema slug. The response is structured JSON with the extracted fields and per-field confidence scores. See the full API documentation at dokyumi.com/docs.
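A call of this kind might be assembled as follows, using only the Python standard library. This is a sketch under stated assumptions: the base URL is a placeholder, and the slug-in-path layout, the `file` form field name, and the bearer-token header are all hypothetical. The real values are in the documentation at dokyumi.com/docs.

```python
import urllib.request
import uuid

# Placeholder, NOT the real Dokyumi endpoint.
BASE_URL = "https://api.example.com/v1/extract"


def build_extraction_request(
    schema_slug: str, filename: str, file_bytes: bytes, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one document.

    Assumptions: the schema slug is the last path segment, the upload field
    is named "file", and authentication uses a bearer token.
    """
    boundary = uuid.uuid4().hex
    body = b"".join(
        [
            f"--{boundary}\r\n".encode(),
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
            b"Content-Type: application/octet-stream\r\n\r\n",
            file_bytes,
            b"\r\n",
            f"--{boundary}--\r\n".encode(),
        ]
    )
    return urllib.request.Request(
        f"{BASE_URL}/{schema_slug}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Builds the request object only; nothing is sent over the network here.
request = build_extraction_request("invoices", "invoice.pdf", b"%PDF-1.7 ...", "YOUR_API_KEY")
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON body of the response.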

Feature	Dokyumi	AWS Textract	Google Doc AI	LlamaParse
No AWS/GCP account required
Custom extraction schema				Partial
Dedicated API endpoint per schema
White-label upload portals
OCR result caching
Field confidence scores
Predictable flat-rate pricing
Free tier	100/mo	1K pages/mo	1K pages/mo	10K credits/mo

Any document in.
Structured JSON out.

How it works

Define Your Schema

Get Your API Endpoint

Get Structured JSON

One API call. Structured JSON.

What people are parsing

Invoice Processing

Bank Statements

Insurance Claims

Tax Documents

Medical Records

Logistics & Shipping

Built for developers

Two-Stage AI Pipeline

Schema Validation

OCR Caching

White-Label Portals

How Dokyumi compares

Built for the boring work that matters

Frequently asked questions

Stop parsing documents by hand.
