# Purchase Order API: Extract PO Data Programmatically

> How a purchase order API turns PDF and email POs into structured JSON your systems can post. What to expect from the endpoint, the response shape, and matching.

<QuickAnswer>
A purchase order API accepts a PO as a PDF, email, spreadsheet, or EDI file and returns structured JSON: PO number, dates, addresses, and line items with quantities and prices. A good one also matches each line to your catalog and attaches a confidence score, so you get a postable order instead of raw text.
</QuickAnswer>

**A purchase order API converts an inbound PO in any format into structured, validated order data your systems can post, replacing the manual keying that sits between a customer email and your ERP.**

Most B2B orders still arrive as documents. A buyer emails a PDF, attaches a spreadsheet, or drops a fax into your inbox. Someone on your team reads it and types it into the order system. A purchase order API removes that step: you send the document to an endpoint and get back the order as data.

The phrase "purchase order API" has two meanings. One is the API your order system exposes to create and read orders that already exist as data. The other, the one this article is about, is an extraction API that reads an inbound PO and produces that structured data in the first place. You usually need both, and they connect at the line-item level.

## What a purchase order API returns

The input is a document. The output is a normalized order object. A typical response looks like this:

```json
{
  "po_number": "PO-48821",
  "order_date": "2026-06-28",
  "requested_delivery": "2026-07-05",
  "ship_to": {
    "name": "Acme Foods - Tacoma DC",
    "address": "1200 Port Rd, Tacoma, WA 98421"
  },
  "line_items": [
    {
      "raw_description": "CHOC BAR DARK 70% 12CT",
      "matched_sku": "MV-DK70-12",
      "quantity": 40,
      "uom": "CASE",
      "unit_price": 28.50,
      "confidence": 0.98
    }
  ],
  "review_required": false
}
```

The header fields are the easy part. PO number, dates, and addresses sit in predictable places, and most extraction tools read them well. The line items are where the work is, and where a purpose-built order API earns its keep.
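As a rough illustration of consuming a response like the one above, the sketch below loads the JSON into typed Python objects. The class names, the `parse_order` helper, and the idea that `matched_sku` may be null when nothing matches are assumptions made for this example, not part of any documented client library.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    raw_description: str
    matched_sku: Optional[str]  # assumption: null when no catalog match exists
    quantity: int
    uom: str
    unit_price: Decimal
    confidence: float


@dataclass
class PurchaseOrder:
    po_number: str
    order_date: str
    requested_delivery: str
    ship_to: dict
    line_items: List[LineItem]
    review_required: bool


def parse_order(payload: str) -> PurchaseOrder:
    """Parse a response body shaped like the sample above (hypothetical helper)."""
    # parse_float=Decimal keeps prices exact instead of using binary floats.
    data = json.loads(payload, parse_float=Decimal)
    lines = [
        LineItem(
            raw_description=item["raw_description"],
            matched_sku=item.get("matched_sku"),
            quantity=int(item["quantity"]),
            uom=item["uom"],
            unit_price=Decimal(item["unit_price"]),
            confidence=float(item["confidence"]),
        )
        for item in data["line_items"]
    ]
    return PurchaseOrder(
        po_number=data["po_number"],
        order_date=data["order_date"],
        requested_delivery=data["requested_delivery"],
        ship_to=data["ship_to"],
        line_items=lines,
        review_required=bool(data["review_required"]),
    )


SAMPLE = """
{
  "po_number": "PO-48821",
  "order_date": "2026-06-28",
  "requested_delivery": "2026-07-05",
  "ship_to": {"name": "Acme Foods - Tacoma DC",
              "address": "1200 Port Rd, Tacoma, WA 98421"},
  "line_items": [
    {"raw_description": "CHOC BAR DARK 70% 12CT", "matched_sku": "MV-DK70-12",
     "quantity": 40, "uom": "CASE", "unit_price": 28.50, "confidence": 0.98}
  ],
  "review_required": false
}
"""

order = parse_order(SAMPLE)
first = order.line_items[0]
print(order.po_number, first.matched_sku, first.quantity * first.unit_price)
```

Parsing prices as `Decimal` is a design choice for this sketch: extended line totals stay exact, which matters once the numbers reach an invoice.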

## Extraction is not the hard part. Matching is.

Reading "CHOC BAR DARK 70% 12CT" off a page is solved technology. Knowing that string means SKU `MV-DK70-12` in your catalog, ordered by the case, at your contract price for that customer, is the part that breaks generic tools.

A buyer's PO rarely uses your part numbers. It uses theirs, or a free-text description, or a UPC. Turning that into your SKU requires customer-specific aliases, UPC lookups, and unit-of-measure logic. GS1 product identifiers, such as the GTIN encoded in a UPC barcode ([GS1 US](https://www.gs1us.org/)), help when a UPC is present, but plenty of POs carry only a description. That is why an order API attaches a `matched_sku` and a `confidence` score per line, rather than handing back the raw text and leaving the reconciliation to you.
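To make the matching step concrete, here is a minimal sketch of a lookup that tries a UPC first and a customer-specific alias second. The tables, the `match_line` function, the customer ID, the UPC value, and the confidence numbers are all hypothetical placeholders; a production matcher would use far richer signals than two dictionaries.

```python
from typing import Optional, Tuple

# Hypothetical master data, for illustration only.
UPC_TO_SKU = {"012345678905": "MV-DK70-12"}
CUSTOMER_ALIASES = {
    ("ACME", "CHOC BAR DARK 70% 12CT"): "MV-DK70-12",
}


def normalize(text: str) -> str:
    """Uppercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.upper().split())


def match_line(
    customer_id: str, raw_description: str, upc: Optional[str] = None
) -> Tuple[Optional[str], float]:
    """Return (matched_sku, confidence). Confidence values are made up."""
    # 1. A UPC is the strongest signal when the PO carries one.
    if upc and upc in UPC_TO_SKU:
        return UPC_TO_SKU[upc], 0.99
    # 2. Otherwise fall back to what this customer has called the item before.
    key = (customer_id, normalize(raw_description))
    if key in CUSTOMER_ALIASES:
        return CUSTOMER_ALIASES[key], 0.95
    # 3. No match: leave the SKU empty so the line goes to review.
    return None, 0.0


print(match_line("ACME", "choc bar  dark 70% 12ct"))
print(match_line("ACME", "mystery item"))
```

Keying aliases by customer is the point of the sketch: the same description can mean different SKUs for different buyers.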

When confidence is high, the order posts automatically. When it is low, the line is flagged for a person to check before anything reaches your ERP. That review gate is what keeps a bad extraction from becoming a short shipment.
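A review gate of this kind can be sketched in a few lines. The `REVIEW_THRESHOLD` value of 0.90 and the function names are assumptions chosen for illustration; the right cutoff depends on your own tolerance for errors.

```python
from typing import List

REVIEW_THRESHOLD = 0.90  # assumed cutoff, tune for your own data


def lines_needing_review(order: dict, threshold: float = REVIEW_THRESHOLD) -> List[int]:
    """Return the indexes of line items a person should check."""
    return [
        index
        for index, line in enumerate(order["line_items"])
        # Flag lines with no SKU match or a confidence below the cutoff.
        if line.get("matched_sku") is None or line["confidence"] < threshold
    ]


def route(order: dict) -> str:
    """Decide whether the whole order posts or waits for a reviewer."""
    return "review" if lines_needing_review(order) else "post"


clean_order = {
    "line_items": [{"matched_sku": "MV-DK70-12", "confidence": 0.98}],
}
doubtful_order = {
    "line_items": [
        {"matched_sku": "MV-DK70-12", "confidence": 0.98},
        {"matched_sku": None, "confidence": 0.41},
    ],
}

print(route(clean_order), route(doubtful_order))
```

Holding the whole order when any single line is doubtful is a conservative choice; some teams may prefer to post the confident lines and hold only the rest.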

## Purchase order API vs a generic extraction API

| Capability | Generic OCR / parsing API | Purchase order API |
| --- | --- | --- |
| Reads any PDF layout | Yes | Yes |
| Returns header fields | Yes | Yes |
| Knows your catalog and SKUs | No | Yes |
| Normalizes units of measure | No | Yes |
| Validates price against master data | No | Yes |
| Confidence score per line | Sometimes | Yes |
| Output | Raw fields to reconcile | Postable order |

The distinction matters because the reconciliation work a generic API leaves behind is most of the labor you were trying to remove. Reading the document was never the bottleneck. Matching it to your data was. For a deeper comparison, see [AI order processing versus OCR](/blog/ai-order-processing-vs-ocr).
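Two rows of the table, unit-of-measure normalization and price validation, can be illustrated with a short sketch. The alias table, the contract price table, the customer ID, and the one-cent tolerance are hypothetical; real master data would come from your ERP or pricing system.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical lookup tables, for illustration only.
UOM_ALIASES = {"CS": "CASE", "CA": "CASE", "CASE": "CASE", "EA": "EACH", "EACH": "EACH"}
CONTRACT_PRICES = {("ACME", "MV-DK70-12", "CASE"): Decimal("28.50")}


def normalize_uom(raw_uom: str) -> Optional[str]:
    """Map a buyer's unit label to a canonical unit, or None if unknown."""
    return UOM_ALIASES.get(raw_uom.strip().upper())


def price_matches(
    customer_id: str,
    sku: str,
    uom: str,
    unit_price,
    tolerance: Decimal = Decimal("0.01"),
) -> bool:
    """Check the PO price against the contract price within a tolerance."""
    expected = CONTRACT_PRICES.get((customer_id, sku, uom))
    if expected is None:
        # No contract price on file: treat as a mismatch so it gets reviewed.
        return False
    # str() first so a float such as 28.5 converts to an exact Decimal.
    return abs(Decimal(str(unit_price)) - expected) <= tolerance


uom = normalize_uom(" cs ")
print(uom, price_matches("ACME", "MV-DK70-12", uom, 28.50))
```

A price mismatch here returns `False` rather than raising, on the assumption that the caller routes the line to review instead of rejecting the order outright.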

## Formats one endpoint should accept

You do not want a parser per channel. A practical purchase order API takes all of these and returns the same JSON shape:

- **PDF**: Digital-native and scanned purchase orders both run through the same endpoint without a template per layout.
- **Email**: Order details in the message body or in attached files are read and merged into one order object.
- **EDI X12 850**: Structured documents from partners who send them are parsed alongside everything else ([X12](https://x12.org/) maintains the standard).
- **CSV and Excel**: Spreadsheet orders are read contextually, so you skip building a column map for every customer.

If you receive EDI today and want to read it before you wire anything up, the free [EDI Inspector](/edi-inspector) parses an X12 850 in the browser and shows the segments in plain language. For non-EDI orders, the [free PO Extractor](/tools/po-extractor) runs the same extraction engine on a PDF with no signup.

## Where the data goes next

Structured JSON is the handoff point. From there you either post it through your order system's own create-order API, or generate an [EDI 850 from the PDF](/pdf-to-edi) for a partner who needs it. The extraction API does not care which path you take; it produces clean data and your integration decides the destination. Teams that run mixed channels usually route everything through one [multi-format order pipeline](/multi-format-orders) so the downstream code never has to know how the order arrived.
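The handoff to an order system might look something like the sketch below, which reshapes an extracted order into a create-order request body. The target field names (`customerId`, `customerPoNumber`, `lines`, and so on) are invented for this example; your ERP's API will define its own.

```python
def to_erp_payload(order: dict, customer_id: str) -> dict:
    """Reshape an extracted order into a hypothetical ERP create-order body."""
    if order.get("review_required"):
        # Never post an order that is still waiting on a reviewer.
        raise ValueError(f"PO {order['po_number']} still requires review")
    return {
        "customerId": customer_id,
        "customerPoNumber": order["po_number"],
        "orderDate": order["order_date"],
        "requestedDate": order["requested_delivery"],
        "shipTo": order["ship_to"],
        "lines": [
            {
                "sku": line["matched_sku"],
                "qty": line["quantity"],
                "uom": line["uom"],
                "price": line["unit_price"],
            }
            for line in order["line_items"]
        ],
    }


extracted = {
    "po_number": "PO-48821",
    "order_date": "2026-06-28",
    "requested_delivery": "2026-07-05",
    "ship_to": {"name": "Acme Foods - Tacoma DC",
                "address": "1200 Port Rd, Tacoma, WA 98421"},
    "line_items": [
        {"raw_description": "CHOC BAR DARK 70% 12CT", "matched_sku": "MV-DK70-12",
         "quantity": 40, "uom": "CASE", "unit_price": 28.50, "confidence": 0.98}
    ],
    "review_required": False,
}

payload = to_erp_payload(extracted, customer_id="ACME")
print(payload["customerPoNumber"], payload["lines"][0]["sku"])
```

Keeping this mapping in one function means the rest of the integration only ever sees the normalized order shape, whichever channel the PO arrived through.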

The reason any of this is worth building is the cost of the alternative. Manual order entry is slow and error-prone, and the errors are expensive. See the [real cost of manual order entry](/blog/manual-order-entry-statistics) for the error-rate and labor data.

## Frequently asked questions

### Is a purchase order API the same as my ERP's order API?

No. Your ERP's order API creates and reads orders that already exist as data. An extraction API produces that data from an inbound document. They meet at the line-item level: extraction matches the SKU, your ERP API posts the order.

### Can it handle scanned or faxed POs?

Yes, with OCR on the front end. Accuracy on clean machine-generated PDFs is typically 95% or better on quantities and pricing. Scanned and handwritten documents score lower per field, which is why confidence scoring and a review step matter.

### Do I have to map every customer's format?

No. A modern order API reads documents contextually rather than by fixed template, so a new customer layout works on first contact without a setup project.

### How do I get access?

The [OrderSync extraction API](/extraction-api) is set up through an onboarding call that maps your catalog, customer aliases, and ERP target so the API returns matched, post-ready orders. You can try extraction for free first with the [PO Extractor](/tools/po-extractor).