Order Extraction API

One API for PDF purchase orders, order emails, EDI, and spreadsheets. Extract every line item, match it to your catalog, and return ERP-ready JSON. Built for B2B orders, not generic OCR.

Or try the extraction free, no signup: PO Extractor

How the Extraction API Works

From a document in your inbox to a matched, ERP-ready order object in one API call.

Step 1

POST the document

Send a PDF, email, EDI file, or spreadsheet to a single endpoint. No per-format parser, no template setup. The API detects the document type for you.
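
For teams calling the API from code rather than curl, the upload can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint URL and the `file` and `customer_id` form fields are taken from the illustrative curl request on this page, while `build_extract_request` is a hypothetical helper name. It builds the request without sending it.

```python
import urllib.request
import uuid

# Endpoint and form-field names mirror the illustrative curl example on this
# page. The real contract is set during onboarding, so treat these as assumptions.
EXTRACT_URL = "https://api.ordersync.io/v1/extract"


def build_extract_request(filename: str, content: bytes, customer_id: str,
                          api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one document."""
    boundary = uuid.uuid4().hex
    delimiter = f"--{boundary}".encode()
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="customer_id"',
        b"",
        customer_id.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ])
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The same helper works for a PDF, an email file, an EDI file, or a spreadsheet, since the format is detected on the server side.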

Step 2

AI extracts the order

Vision and language models read the PO number, dates, addresses, and every line item from any layout, including tables, headers, and handwritten amendments.

Step 3

Lines match your catalog

Each line resolves to your SKU using customer-specific aliases and UPCs. Units of measure are normalized, prices are validated against your master data, and each line gets a confidence score.
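
To make the matching step concrete, here is a toy Python illustration of the idea. It is not OrderSync's implementation: the lookup tables, the `match_line` function, and the price tolerance are all hypothetical, and confidence scoring is left out. It only shows what "resolve an alias, normalize the unit, check the price" means for a single line.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical master data for one customer. Every value here is illustrative.
ALIASES = {"CHOC BAR DARK 70% 12CT": "MV-DK70-12"}  # customer text -> your SKU
UOM_SYNONYMS = {"CS": "CASE", "CASES": "CASE", "CASE": "CASE",
                "EA": "EACH", "EACH": "EACH"}
PRICE_LIST = {"MV-DK70-12": 28.50}                   # SKU -> agreed unit price


@dataclass
class MatchedLine:
    matched_sku: Optional[str]  # None when no alias matches
    uom: str                    # normalized unit of measure
    price_ok: bool              # True when the price agrees with master data


def match_line(raw_description: str, uom: str, unit_price: float,
               tolerance: float = 0.01) -> MatchedLine:
    """Resolve one extracted line against the toy master data above."""
    sku = ALIASES.get(raw_description.strip().upper())
    unit = uom.strip().upper()
    normalized_uom = UOM_SYNONYMS.get(unit, unit)  # unknown units pass through
    expected = PRICE_LIST.get(sku) if sku else None
    price_ok = expected is not None and abs(unit_price - expected) <= tolerance
    return MatchedLine(sku, normalized_uom, price_ok)
```

With this toy data, `match_line("choc bar dark 70% 12ct", "cs", 28.50)` resolves to SKU `MV-DK70-12` with unit `CASE` and a passing price check, while an unknown description comes back with no SKU and a failed price check.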

Step 4

Get ERP-ready JSON

The response is a clean order object you can post straight to your ERP, with field-level provenance back to the source document and a review flag for low-confidence lines.

One Request, a Matched Order Back

POST any order document. Get structured, catalog-matched JSON. The example below shows a PDF purchase order resolved to a SKU with a confidence score.

Request

curl https://api.ordersync.io/v1/extract \
  -H "Authorization: Bearer $ORDERSYNC_API_KEY" \
  -F "file=@purchase-order.pdf" \
  -F "customer_id=acme-foods"

Response

{
  "document_type": "purchase_order",
  "po_number": "PO-48821",
  "order_date": "2026-06-28",
  "requested_delivery": "2026-07-05",
  "ship_to": {
    "name": "Acme Foods - Tacoma DC",
    "address": "1200 Port Rd, Tacoma, WA 98421"
  },
  "line_items": [
    {
      "raw_description": "CHOC BAR DARK 70% 12CT",
      "matched_sku": "MV-DK70-12",
      "quantity": 40,
      "uom": "CASE",
      "unit_price": 28.50,
      "confidence": 0.98
    }
  ],
  "review_required": false,
  "source": { "page": 1, "format": "pdf" }
}

Illustrative contract. Endpoints, fields, and your catalog mapping are set during onboarding.
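
On the receiving side, the response can be loaded into typed objects before it goes anywhere near your ERP. The sketch below assumes the field names from the illustrative response above. The `Order` and `LineItem` classes, `parse_order`, and the `total` helper are hypothetical names introduced for this example, and your real field set may differ.

```python
import json
from dataclasses import dataclass
from typing import List

# Trimmed copy of the illustrative response on this page.
SAMPLE_RESPONSE = """
{
  "document_type": "purchase_order",
  "po_number": "PO-48821",
  "order_date": "2026-06-28",
  "requested_delivery": "2026-07-05",
  "line_items": [
    {
      "raw_description": "CHOC BAR DARK 70% 12CT",
      "matched_sku": "MV-DK70-12",
      "quantity": 40,
      "uom": "CASE",
      "unit_price": 28.50,
      "confidence": 0.98
    }
  ],
  "review_required": false
}
"""


@dataclass
class LineItem:
    matched_sku: str
    quantity: int
    uom: str
    unit_price: float
    confidence: float


@dataclass
class Order:
    po_number: str
    order_date: str
    requested_delivery: str
    review_required: bool
    line_items: List[LineItem]

    def total(self) -> float:
        """Order value implied by the extracted quantities and prices."""
        return sum(li.quantity * li.unit_price for li in self.line_items)


def parse_order(payload: str) -> Order:
    """Parse the JSON body; a missing required field raises KeyError."""
    data = json.loads(payload)
    lines = [
        LineItem(item["matched_sku"], item["quantity"], item["uom"],
                 item["unit_price"], item["confidence"])
        for item in data["line_items"]
    ]
    return Order(data["po_number"], data["order_date"],
                 data["requested_delivery"], data["review_required"], lines)
```

Parsing into explicit fields means a contract change surfaces as an error at the integration boundary instead of as a silently missing value downstream.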

Why an Order API, Not a Generic OCR API

OCR and document-parsing APIs read documents. An order API understands orders. The difference is everything you would otherwise build yourself.

Order-aware, not just OCR

Generic extraction APIs hand back the text they read. OrderSync resolves each line to your catalog, normalizes the unit of measure, and checks the price. The output is an order you can post, not a transcript you reconcile by hand.

One endpoint for every format

PDF, email, EDI X12, CSV, and Excel orders all hit the same endpoint and return the same JSON shape. You integrate once instead of stitching together an OCR vendor, an EDI translator, and an email parser.

Confidence scores and review routing

Every line carries a confidence score. Clean machine-generated PDFs post automatically. Scanned, faxed, or ambiguous documents are flagged for human review before they reach your ERP, so bad data never syncs silently.
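
If you want your own cutoff on top of the API's review flag, per-line routing can be sketched in a few lines of Python. The 0.90 threshold and the `route_lines` function are assumptions made for illustration. The `line_items`, `matched_sku`, and `confidence` field names come from the illustrative response on this page.

```python
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.90  # hypothetical cutoff; tune it to your own tolerance


def route_lines(order: Dict,
                threshold: float = REVIEW_THRESHOLD) -> Tuple[List[Dict], List[Dict]]:
    """Split an order's lines into (auto_post, needs_review)."""
    auto_post: List[Dict] = []
    needs_review: List[Dict] = []
    for line in order.get("line_items", []):
        confident = line.get("confidence", 0.0) >= threshold
        # A line with no matched SKU always goes to review, whatever its score.
        if line.get("matched_sku") and confident:
            auto_post.append(line)
        else:
            needs_review.append(line)
    return auto_post, needs_review
```

A stricter policy would hold the whole order whenever `needs_review` is non-empty, which avoids posting a partial order to the ERP.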

Provenance to the source pixel

Each extracted value traces back to where it appeared in the document. When a customer disputes a quantity, you show the highlighted region in the original PDF instead of arguing from memory.

What Teams Build With It

Same endpoint, different document on the way in. Pick the order source you are drowning in.

Purchase order PDF to JSON

A customer emails a PO as a PDF. POST it to the API and get back structured line items, addresses, and dates, with each SKU matched to your catalog and ready to drop into your order system.

Order emails to structured orders

Orders arrive as free-text email bodies or mixed attachments. The API reads the message and any attached PDFs or spreadsheets and returns one normalized order object.

EDI plus non-EDI in one pipeline

Some customers send EDI 850; most send PDFs and email. Route all of them through the same endpoint so your integration does not care how the order arrived.

Spreadsheet orders without column mapping

CSV and Excel order files vary by customer. The API reads them contextually and returns the same JSON as every other format, so you skip building a mapping per template.

Frequently Asked Questions

The API returns structured order JSON: PO number, order and delivery dates, ship-to and bill-to addresses, and line items with product codes, descriptions, quantities, units of measure, and prices. Unlike a generic OCR API, each line item is matched against your catalog and pricing, so you get a resolved SKU and a confidence score, not just raw text.

Generic OCR and document-parsing APIs (Mindee, Nanonets, Rossum, Google Document AI, Amazon Textract) return the fields they read off the page. They do not know your catalog, your customer-specific part numbers, or your pricing. The OrderSync API is purpose-built for B2B orders: it normalizes units of measure, resolves customer aliases to your SKUs, validates pricing against master data, and flags low-confidence lines for review. You get an order you can post, not a transcript you still have to reconcile.

Supported inputs are PDF purchase orders (digital-native and scanned), order emails with bodies or attachments, EDI X12 (850, 855, 860), and CSV or Excel order files. One endpoint accepts all of them and returns the same normalized JSON shape, so you integrate once instead of building a parser per format.

Accuracy depends on document quality. On clean, machine-generated PDFs, line-item extraction accuracy is typically 95% or better for quantities and pricing. Scanned, faxed, or handwritten documents score lower on individual fields, which is why every line carries a confidence score and low-confidence orders can route to a human-review queue before they reach your ERP.

API access is granted through an onboarding call. We map your catalog, customer part-number aliases, and ERP target during setup so the API returns matched, post-ready orders from day one rather than raw fields. Book an intro call to request a key and see your own documents run through it.

You can try the extraction for free before requesting access. The PO Extractor, Invoice Extractor, and Email Order Parser tools run the same extraction engine in the browser with no signup. Upload a document and see the structured output the API would return.

Request Access to the Extraction API

Bring a real purchase order. On the call we map your catalog and ERP, then run your own document through the API so you see matched, post-ready JSON come back. 15 minutes, no commitment.

Request API Access

No credit card required. Prefer to poke around first? The free tools run the same engine with no signup.