Convert PDF Purchase Orders to JSON (or EDI) via API
How to turn PDF purchase orders into structured JSON or compliant EDI through an API, what the response looks like, and how to handle scanned and low-confidence documents.
To convert a PDF purchase order programmatically, POST it to an extraction API that returns structured JSON, then route that JSON either into your ERP or into a compliant EDI document, depending on who needs it next.
The PDF is the universal order format. Every buyer can produce one, which is exactly why more orders land in your inbox as PDFs than in any other format. The problem is that a PDF is a picture of an order, not the order itself. To do anything automated with it, you have to turn it into data.
There are two destinations worth converting to, and the right one depends on what sits downstream.
Destination one: PDF to JSON
JSON is the format your own systems speak. Convert the PDF to JSON when the order is going into your ERP, your order management system, or any internal pipeline.
A single API call does it. You send the file, the API extracts and matches, and you get back a normalized object:
```json
{
  "po_number": "PO-48821",
  "order_date": "2026-06-28",
  "ship_to": { "name": "Acme Foods - Tacoma DC" },
  "line_items": [
    {
      "raw_description": "CHOC BAR DARK 70% 12CT",
      "matched_sku": "MV-DK70-12",
      "quantity": 40,
      "uom": "CASE",
      "unit_price": 28.50,
      "confidence": 0.98
    }
  ],
  "review_required": false
}
```
The value is in the matched_sku and confidence fields. A raw text dump of the PDF would still leave you mapping "CHOC BAR DARK 70% 12CT" to your catalog by hand. An order-aware extraction API does that resolution and tells you how sure it is, so high-confidence orders post automatically and the rest stop for review.
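As a rough illustration of the single call, here is a minimal Python sketch that uploads a PDF as multipart form data using only the standard library. The endpoint URL, the `file` field name, and the bearer-token header are assumptions made for the example, not a documented API; take the real values from your provider's reference.

```python
import json
import urllib.request
import uuid

# Hypothetical endpoint; substitute the URL from your provider's documentation.
EXTRACT_URL = "https://api.example.com/v1/extract"


def build_extract_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one PDF in a field named 'file' (assumed name)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            # Bearer authentication is an assumption; some APIs use a custom key header instead.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_order(pdf_bytes: bytes, filename: str, api_key: str) -> dict:
    """Send the PDF and decode the JSON order object from the response body."""
    request = build_extract_request(pdf_bytes, filename, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Keeping the request builder separate from the network call makes it possible to unit-test the upload format without touching the live service.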
Destination two: PDF to EDI
Sometimes the order has to leave your building again as EDI, because a retailer or trading partner requires it. In that case you convert the PDF to a compliant X12 document rather than to internal JSON.
The mechanics are the same up front: extract and match. The difference is the output format. Instead of JSON, the system emits an EDI 850 purchase order that carries the partner's exact qualifiers, separators, and version. The X12 standard defines the transaction set, and each partner layers their own requirements on top. OrderSync handles this as PDF to EDI, and the same engine covers email to EDI when the order arrives as a message instead of an attachment.
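To make the output difference concrete, the sketch below renders the JSON order shown earlier as the core segments of an X12 850 transaction set. It is illustrative only: it omits the ISA/GS interchange envelopes, the `UOM_CODES` mapping and the `VN` (vendor item number) qualifier are assumptions, and a real trading partner will dictate its own qualifiers, separators, and required segments.

```python
# Assumed mapping from the JSON "uom" values to X12 unit-of-measure codes.
UOM_CODES = {"CASE": "CA", "EACH": "EA"}


def order_to_850(order: dict, control_number: str = "0001",
                 element_sep: str = "*", segment_term: str = "~") -> str:
    """Render the transaction set (ST through SE) for one extracted order."""
    segments = [
        ["ST", "850", control_number],
        # BEG: 00 = original, SA = stand-alone order, PO number, release (blank), date as CCYYMMDD.
        ["BEG", "00", "SA", order["po_number"], "", order["order_date"].replace("-", "")],
        # N1 with entity code ST identifies the ship-to party.
        ["N1", "ST", order["ship_to"]["name"]],
    ]
    for index, line in enumerate(order["line_items"], start=1):
        segments.append([
            "PO1", str(index), str(line["quantity"]), UOM_CODES[line["uom"]],
            f'{line["unit_price"]:.2f}', "", "VN", line["matched_sku"],
        ])
    # CTT carries the number of PO1 line items.
    segments.append(["CTT", str(len(order["line_items"]))])
    # SE01 counts every segment in the set, including ST and SE themselves.
    segments.append(["SE", str(len(segments) + 1), control_number])
    return "".join(element_sep.join(seg) + segment_term for seg in segments)


sample_order = {
    "po_number": "PO-48821",
    "order_date": "2026-06-28",
    "ship_to": {"name": "Acme Foods - Tacoma DC"},
    "line_items": [
        {"matched_sku": "MV-DK70-12", "quantity": 40, "uom": "CASE", "unit_price": 28.50},
    ],
}
edi_text = order_to_850(sample_order)
```

The separators are parameters because each partner can specify its own, which is the per-partner layering described above.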
If you want to see what a finished 850 should contain before you generate one, the EDI 850 purchase order guide breaks it down segment by segment, and the free EDI Inspector parses a real one in the browser.
Handling scanned and low-confidence PDFs
Not every PDF is clean. Plenty are scans of printed pages, or faxes saved to PDF, or photos a rep took on a phone. Conversion still works, with a few caveats:
- OCR runs first: The API reads the image, then extracts. Accuracy on clean machine-generated PDFs is typically 95% or better on quantities and pricing, and lower per field on scans and handwriting.
- UPCs make matching reliable: When a UPC is on the document, the identifier resolves to your SKU unambiguously (GS1 US governs that standard); when only a description exists, the alias logic does the work.
- Confidence routing matters more here: Because field accuracy drops on poor scans, the review flag protects your ERP, so a misread quantity never becomes a short shipment.
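Confidence routing can be sketched as a small gate in front of the ERP. The field names follow the JSON response shown earlier; the 0.90 threshold and the rule that an unmatched SKU forces review are assumptions for illustration, and the right cutoff depends on your own tolerance for errors.

```python
def flagged_lines(order: dict, min_confidence: float = 0.90) -> list:
    """Return line items that are unmatched or fall below the confidence threshold."""
    return [
        line for line in order.get("line_items", [])
        if line.get("matched_sku") is None or line.get("confidence", 0.0) < min_confidence
    ]


def needs_review(order: dict, min_confidence: float = 0.90) -> bool:
    """Stop the order if the API flagged it or any single line looks doubtful."""
    if order.get("review_required", False):
        return True
    return len(flagged_lines(order, min_confidence)) > 0


clean_order = {
    "review_required": False,
    "line_items": [{"matched_sku": "MV-DK70-12", "quantity": 40, "confidence": 0.98}],
}
poor_scan = {
    "review_required": False,
    "line_items": [{"matched_sku": "MV-DK70-12", "quantity": 46, "confidence": 0.61}],
}
```

Checking each line in addition to the order-level flag means one doubtful quantity holds the order even when the rest of the document read cleanly.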
JSON or EDI: which destination
| | Convert to JSON | Convert to EDI |
|---|---|---|
| Use when | Order goes into your own ERP or systems | A trading partner requires an X12 document |
| Output | Normalized order object | Compliant EDI 850 with partner qualifiers |
| Consumer | Your create-order API | Retailer or partner mailbox (AS2, SFTP, VAN) |
| Extraction step | Identical | Identical |
The extraction and matching are the same for both. Only the output format changes, so you choose per destination rather than per document.
A practical sequence
- Receive the PDF, by email, upload, or webhook.
- POST it to the extraction endpoint.
- Read the JSON response and check review_required.
- If clean, post to your ERP or generate the EDI 850. If flagged, route to review.
- Keep the source document linked to the extracted data for audit and disputes.
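The sequence above could be wired together roughly as follows. This is a sketch under stated assumptions: `extract`, `post_order`, and `queue_review` are hypothetical callables you would supply (an API client, an ERP client, a review queue), and `ProcessedOrder` is an illustrative record that keeps the source document reference next to the extracted data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProcessedOrder:
    source_id: str  # reference to the original PDF, kept for audit and disputes
    order: dict     # the JSON object returned by extraction
    outcome: str    # "posted" or "review"


def process_pdf(source_id: str, pdf_bytes: bytes,
                extract: Callable[[bytes], dict],
                post_order: Callable[[dict], None],
                queue_review: Callable[[dict], None]) -> ProcessedOrder:
    """Extract one PDF, then either post it downstream or hold it for review."""
    order = extract(pdf_bytes)
    if order.get("review_required", False):
        queue_review(order)
        outcome = "review"
    else:
        post_order(order)
        outcome = "posted"
    return ProcessedOrder(source_id=source_id, order=order, outcome=outcome)
```

Passing the collaborators in as functions keeps the flow the same whether the clean branch posts to an ERP or generates an EDI 850, and makes the routing easy to test with stubs.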
Most teams that get orders in several formats run all of them through one multi-format pipeline so step two is identical whether the order was a PDF, an email, or an EDI file. The downstream code only ever sees clean JSON.
Frequently asked questions
Can I really convert any PDF layout without setup?
Yes. An AI-based extraction API reads documents by structure and meaning, not fixed coordinates, so a new customer layout converts on first contact without a template project.
JSON or EDI, which should I convert to?
JSON if the order is going into your own systems. EDI if a trading partner requires an X12 document. The same extraction feeds both, so you decide per destination, not per document.
What about the data on a bad scan?
OCR handles the image and confidence scoring handles the uncertainty. Low-confidence lines flag for review rather than posting silently.
How do I get an API key?
Request access to the extraction API. Onboarding maps your catalog and ERP so the conversion returns matched, post-ready orders. You can try the conversion free first with the PO Extractor.