# Convert PDF Purchase Orders to JSON (or EDI) via API

> How to turn PDF purchase orders into structured JSON or compliant EDI through an API, what the response looks like, and how to handle scanned and low-confidence documents.

<QuickAnswer>
Send the PDF to an extraction API endpoint. It reads the purchase order, matches each line to your catalog, and returns structured JSON: PO number, dates, addresses, and line items with quantities and prices. From that JSON you post the order to your ERP, or generate a compliant EDI 850 for a trading partner.
</QuickAnswer>

**To convert a PDF purchase order programmatically, POST it to an extraction API that returns structured JSON, then route that JSON either into your ERP or into a compliant EDI document, depending on who needs it next.**

The PDF is the universal order format. Every buyer can produce one, which is exactly why more orders land in your inbox as PDFs than in any other format. The problem is that a PDF is a picture of an order, not the order itself. To do anything automated with it, you have to turn it into data.

There are two destinations worth converting to, and the right one depends on what sits downstream.

## Destination one: PDF to JSON

JSON is the format your own systems speak. Convert the PDF to JSON when the order is going into your ERP, your order management system, or any internal pipeline.

A single API call does it. You send the file, the API extracts and matches, and you get back a normalized object:

```json
{
  "po_number": "PO-48821",
  "order_date": "2026-06-28",
  "ship_to": { "name": "Acme Foods - Tacoma DC" },
  "line_items": [
    {
      "raw_description": "CHOC BAR DARK 70% 12CT",
      "matched_sku": "MV-DK70-12",
      "quantity": 40,
      "uom": "CASE",
      "unit_price": 28.50,
      "confidence": 0.98
    }
  ],
  "review_required": false
}
```

The value is in the `matched_sku` and `confidence` fields. A raw text dump of the PDF would still leave you mapping "CHOC BAR DARK 70% 12CT" to your catalog by hand. An order-aware [extraction API](/extraction-api) does that resolution and tells you how sure it is, so high-confidence orders post automatically and the rest stop for review.
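
As a minimal sketch of the call itself, the Python below builds and sends the POST using only the standard library. The endpoint URL, the bearer-token header, and the raw `application/pdf` upload are all assumptions made for illustration; the actual contract comes from your API provider's documentation.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL from your provider's docs.
EXTRACTION_URL = "https://api.example.com/v1/extract"


def build_extraction_request(pdf_bytes: bytes, api_key: str,
                             url: str = EXTRACTION_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw PDF."""
    # Cheap sanity check: every PDF file starts with the "%PDF-" marker.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("file does not look like a PDF")
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/pdf",     # assumed upload format
            "Accept": "application/json",
        },
    )


def extract_order(pdf_bytes: bytes, api_key: str) -> dict:
    """Send the request and parse the JSON body of the response."""
    request = build_extraction_request(pdf_bytes, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the headers and body easy to inspect in a unit test without touching the network.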

## Destination two: PDF to EDI

Sometimes the order has to leave your building again as EDI, because a retailer or trading partner requires it. In that case you convert the PDF to a compliant X12 document rather than to internal JSON.

The mechanics are the same up front: extract and match. The difference is the output format. Instead of JSON, the system emits an EDI 850 purchase order that carries the partner's exact qualifiers, separators, and version. The [X12](https://x12.org/) standard defines the transaction set, and each partner layers their own requirements on top. OrderSync handles this as [PDF to EDI](/pdf-to-edi), and the same engine covers [email to EDI](/email-to-edi) when the order arrives as a message instead of an attachment.
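
To make the output difference concrete, here is a rough sketch that renders the extracted order as the `ST` through `SE` segments of an 850. It is illustrative only: the `build_850` helper, the unit-of-measure mapping, the `VP` product qualifier, and the default separators are assumptions, and the `ISA`/`GS` interchange envelope that a real partner requires is left out.

```python
from datetime import date

# Assumed mapping from the JSON "uom" values to X12 unit-of-measure codes.
UOM_CODES = {"CASE": "CA", "EACH": "EA"}


def build_850(order: dict, control_number: str = "0001",
              element_sep: str = "*", segment_term: str = "~") -> str:
    """Render the ST..SE transaction set of an 850 from an extracted order."""
    po_date = date.fromisoformat(order["order_date"]).strftime("%Y%m%d")
    segments = [
        ["ST", "850", control_number],
        # BEG01 "00" = original order, BEG02 "SA" = stand-alone order.
        ["BEG", "00", "SA", order["po_number"], "", po_date],
        # N101 "ST" = ship-to party.
        ["N1", "ST", order["ship_to"]["name"]],
    ]
    for index, line in enumerate(order["line_items"], start=1):
        segments.append([
            "PO1", str(index), str(line["quantity"]), UOM_CODES[line["uom"]],
            f'{line["unit_price"]:.2f}', "",
            "VP", line["matched_sku"],  # "VP" qualifier is an assumption
        ])
    segments.append(["CTT", str(len(order["line_items"]))])
    # SE01 counts every segment in the set, including ST and SE themselves.
    segments.append(["SE", str(len(segments) + 1), control_number])
    return "".join(element_sep.join(seg) + segment_term for seg in segments)
```

Passing the separators in as parameters reflects the point above: they are partner settings, not constants. A production generator would also need to reject or escape field values that contain a separator character.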

If you want to see what a finished 850 should contain before you generate one, the [EDI 850 purchase order guide](/guides/edi/850-purchase-order) breaks it down segment by segment, and the free [EDI Inspector](/edi-inspector) parses a real one in the browser.

## Handling scanned and low-confidence PDFs

Not every PDF is clean. Plenty are scans of printed pages, or faxes saved to PDF, or photos a rep took on a phone. Conversion still works, with a few caveats:

- **OCR runs first**: The API reads the image, then extracts. Accuracy on clean machine-generated PDFs is typically 95% or better on quantities and pricing, and lower per field on scans and handwriting.
- **UPCs make matching reliable**: When a UPC is on the document, the identifier resolves to your SKU unambiguously ([GS1 US](https://www.gs1us.org/) governs that standard); when only a description exists, the alias logic does the work.
- **Confidence routing matters more here**: Because field accuracy drops on poor scans, the review flag protects your ERP, so a misread quantity is caught in review before it becomes a short shipment.
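
The UPC-first, description-second matching order can be sketched as below. The catalog dictionaries and function names are hypothetical, and the exact-match alias lookup is only a stand-in for whatever alias logic a real extraction service applies. The check-digit test is the standard UPC-A calculation, which helps catch a digit misread by OCR before it is used as a key.

```python
import re
from typing import Optional, Tuple

# Hypothetical catalog data; in practice this comes from your item master.
UPC_TO_SKU = {"012345678905": "MV-DK70-12"}
ALIAS_TO_SKU = {"CHOC BAR DARK 70% 12CT": "MV-DK70-12"}


def upc_a_is_valid(upc: str) -> bool:
    """Check the UPC-A check digit (12 digits, last one is the check)."""
    if not re.fullmatch(r"\d{12}", upc):
        return False
    digits = [int(ch) for ch in upc]
    # Digits in positions 1, 3, ..., 11 are weighted 3; positions 2..10 weigh 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return (10 - total % 10) % 10 == digits[11]


def normalize(description: str) -> str:
    """Uppercase and collapse whitespace so aliases compare consistently."""
    return re.sub(r"\s+", " ", description).strip().upper()


def resolve_sku(upc: Optional[str], description: str) -> Tuple[Optional[str], str]:
    """Return (sku, method) where method is 'upc', 'alias', or 'unmatched'."""
    if upc and upc_a_is_valid(upc) and upc in UPC_TO_SKU:
        return UPC_TO_SKU[upc], "upc"
    sku = ALIAS_TO_SKU.get(normalize(description))
    if sku:
        return sku, "alias"
    return None, "unmatched"
```

Returning the match method alongside the SKU lets downstream code treat an alias match with more caution than a UPC match.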

## JSON or EDI: which destination

| | Convert to JSON | Convert to EDI |
| --- | --- | --- |
| Use when | Order goes into your own ERP or systems | A trading partner requires an X12 document |
| Output | Normalized order object | Compliant EDI 850 with partner qualifiers |
| Consumer | Your create-order API | Retailer or partner mailbox (AS2, SFTP, VAN) |
| Extraction step | Identical | Identical |

The extraction and matching are the same for both. Only the output format changes, so you choose per destination rather than per document.

## A practical sequence

1. Receive the PDF, by email, upload, or webhook.
2. POST it to the extraction endpoint.
3. Read the JSON response and check `review_required`.
4. If clean, post to your ERP or generate the EDI 850. If flagged, route to review.
5. Keep the source document linked to the extracted data for audit and disputes.
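
The steps above can be sketched as one small orchestration function. Everything here is illustrative: the callables passed in (`extract`, `post_order`, `send_to_review`, `archive`) are hypothetical hooks you would supply, and the 0.90 per-line threshold is an assumed value rather than a recommendation from any particular API.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.90  # assumed cutoff; tune to your own tolerance


def needs_review(order: dict, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """True if the API flagged the order or any line looks uncertain."""
    # A missing flag is treated as "review" so nothing posts by accident.
    if order.get("review_required", True):
        return True
    return any(
        line.get("confidence", 0.0) < threshold or not line.get("matched_sku")
        for line in order.get("line_items", [])
    )


def process_purchase_order(
    pdf_bytes: bytes,
    extract: Callable[[bytes], dict],
    post_order: Callable[[dict], None],
    send_to_review: Callable[[dict], None],
    archive: Callable[[str, bytes, dict], None],
) -> str:
    """Run steps 2 to 5 for one received PDF and report where it went."""
    order = extract(pdf_bytes)                           # step 2
    archive(order.get("po_number", ""), pdf_bytes, order)  # step 5
    if needs_review(order):                              # step 3
        send_to_review(order)                            # step 4, flagged
        return "review"
    post_order(order)                                    # step 4, clean
    return "posted"
```

Archiving before routing means the source PDF stays linked to the extracted data even when the order stops for review. `post_order` can be either an ERP call or an EDI 850 generator, since both consume the same JSON.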

Most teams that receive orders in several formats run all of them through one [multi-format pipeline](/multi-format-orders), so step two is identical whether the order arrived as a PDF, an email, or an EDI file. The downstream code only ever sees clean JSON.

## Frequently asked questions

### Can I really convert any PDF layout without setup?

Yes, as far as layout goes. An AI-based extraction API reads documents by structure and meaning, not fixed coordinates, so a new customer layout converts on first contact without a template project. Catalog and ERP mapping is a separate, one-time step handled at onboarding.

### JSON or EDI, which should I convert to?

JSON if the order is going into your own systems. EDI if a trading partner requires an X12 document. The same extraction feeds both, so you decide per destination, not per document.

### What about the data on a bad scan?

OCR handles the image and confidence scoring handles the uncertainty. Low-confidence lines are flagged for review rather than posted silently.

### How do I get an API key?

[Request access to the extraction API](/extraction-api). Onboarding maps your catalog and ERP so the conversion returns matched, post-ready orders. You can try the conversion free first with the [PO Extractor](/tools/po-extractor).
