PP-DocLayoutV3: ONNX export
ONNX export of PaddlePaddle/PP-DocLayoutV3_safetensors, the layout-detection model used in the PaddleOCR-VL-1.5 pipeline.
This export preserves all four model heads: classification logits, bounding boxes, instance-segmentation masks, and reading-order logits. The original PaddlePaddle release outputs polygon points and reading order via a postprocessor that consumes these four tensors.
Files
| file | size | purpose |
|---|---|---|
| PP-DocLayoutV3.onnx | ~5 MB | model graph (references external weights) |
| PP-DocLayoutV3.onnx.data | ~137 MB | weight tensors (must sit alongside .onnx) |
| config.json | - | original model config (HuggingFace-style) |
| preprocessor_config.json | - | image preprocessing parameters (800×800 resize, normalize) |
| inference.yml | - | original PaddlePaddle inference config (reference only) |
Inputs / outputs
Input (single tensor):
| name | shape | dtype | notes |
|---|---|---|---|
| pixel_values | (B, 3, 800, 800) | float32 | Resize image to 800×800, rescale by 1/255, mean=[0,0,0], std=[1,1,1] (matches preprocessor_config.json). |
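Once the image has been resized, the recipe in the table reduces to a few array operations. The following is a minimal sketch, assuming the image is already an 800×800 RGB uint8 array in height-width-channel order. Resizing is left to PIL or cv2, and the interpolation mode should be taken from preprocessor_config.json. The function name to_pixel_values is illustrative and not part of this repository.

```python
import numpy as np

def to_pixel_values(rgb_800: np.ndarray) -> np.ndarray:
    """Turn an already-resized (800, 800, 3) uint8 RGB image into a model input."""
    if rgb_800.shape != (800, 800, 3):
        raise ValueError(f"expected (800, 800, 3), got {rgb_800.shape}")
    # Rescale by 1/255; mean=[0,0,0] and std=[1,1,1] make normalization a no-op.
    x = rgb_800.astype(np.float32) / 255.0
    # HWC -> CHW, then add the batch axis: (1, 3, 800, 800).
    x = np.transpose(x, (2, 0, 1))[np.newaxis, ...]
    return np.ascontiguousarray(x, dtype=np.float32)
```

Several pages can be batched by concatenating the results along axis 0, since the batch dimension B is dynamic.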
Outputs (four tensors):
| name | shape | notes |
|---|---|---|
| logits | (B, 300, 25) | per-query class logits over 25 layout classes |
| pred_boxes | (B, 300, 4) | normalized (cx, cy, w, h); convert via standard DETR decoding |
| out_masks | (B, 300, 200, 200) | per-query instance-segmentation masks; cv2 contour extraction yields polygon points |
| order_logits | (B, 300, 300) | per-query permutation logits for reading order; argmax / Sinkhorn for ordering |
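For pred_boxes, converting from center format to corner format is simple arithmetic. The following is a minimal numpy sketch, assuming the target size is given as (height, width) in pixels. The helper name cxcywh_to_xyxy is illustrative. Score computation and thresholding are deliberately left out, because they should follow the official postprocessor rather than a guess.

```python
import numpy as np

def cxcywh_to_xyxy(pred_boxes: np.ndarray, target_hw: tuple[int, int]) -> np.ndarray:
    """Convert normalized (cx, cy, w, h) boxes of shape (..., 4) to pixel (x1, y1, x2, y2)."""
    height, width = target_hw
    # Split the last axis into its four components.
    cx, cy, w, h = np.moveaxis(pred_boxes, -1, 0)
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)
    # Scale x coordinates by image width and y coordinates by image height.
    scale = np.array([width, height, width, height], dtype=xyxy.dtype)
    return xyxy * scale
```

For example, a box (0.5, 0.5, 0.2, 0.4) on a 200-pixel-wide, 100-pixel-tall page maps to corners (80, 30) and (120, 70).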
Postprocessing
The official postprocessor lives in transformers.models.pp_doclayout_v3.image_processing_pp_doclayout_v3.PPDocLayoutV3ImageProcessor.post_process_object_detection. It takes the four output tensors plus a target_sizes tensor and returns:
{
"scores": (N,) float32
"labels": (N,) int64
"boxes": (N, 4) float32, axis-aligned (x1, y1, x2, y2) in target coords
"polygon_points": list[N] each (P, 2) int polygon vertices in target coords
"order_seq": (N,) int64, reading-order index
}
You can use that postprocessor directly (transformers >= 5.4, requires torch and cv2) or port it to numpy + cv2 for a torch-free runtime.
Loading
import onnxruntime as ort
import numpy as np
sess = ort.InferenceSession("PP-DocLayoutV3.onnx", providers=["CPUExecutionProvider"])
# preprocess to 800x800 RGB float32, normalize per preprocessor_config.json
pixel_values = ... # shape (1, 3, 800, 800), float32
logits, pred_boxes, out_masks, order_logits = sess.run(
    ["logits", "pred_boxes", "out_masks", "order_logits"],
    {"pixel_values": pixel_values},
)
The .onnx.data sidecar is loaded automatically by onnxruntime via the relative location reference embedded in the graph. Both files must sit in the same directory.
How this was exported
- pip install transformers==5.6.2 torch==2.11 onnx==1.21 onnxscript
- model = AutoModelForObjectDetection.from_pretrained("PaddlePaddle/PP-DocLayoutV3_safetensors").eval()
- Wrap the model so forward(pixel_values) returns (logits, pred_boxes, out_masks, order_logits).
- torch.onnx.export(wrapped, (pixel_values,), "PP-DocLayoutV3.onnx", opset_version=18, dynamo=True, dynamic_axes={"pixel_values": {0: "batch"}})
- Re-save with onnx.save(..., save_as_external_data=True, location="PP-DocLayoutV3.onnx.data") to standardize the sidecar filename.
Numerical parity vs torch (random (1, 3, 800, 800) input):
| output | max absolute diff |
|---|---|
| logits | 1.32e-4 |
| pred_boxes | 1.57e-5 |
| out_masks | 1.62e-3 |
| order_logits | 3.96e-2 |
The order_logits deviation reflects accumulated floating-point drift in the decoder's attention; argmax-based reading order is unaffected on the test images we checked.
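The comparison script itself is not included here. A check of this kind can be sketched as follows, assuming the torch and ONNX outputs are both available as numpy arrays of the same shape. The helper names are illustrative, and taking the argmax over the last axis is an assumption about how reading order would be derived from order_logits.

```python
import numpy as np

def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest element-wise absolute difference between two same-shaped arrays."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Compare in float64 so the subtraction itself adds no float32 rounding.
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(diff)))

def argmax_agreement(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Fraction of rows whose last-axis argmax is identical in both arrays."""
    return float(np.mean(reference.argmax(axis=-1) == candidate.argmax(axis=-1)))
```

An agreement of 1.0 on order_logits means the drift never changes which entry wins the argmax, even where the absolute difference is comparatively large.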
Inference speed
CPU (Apple M-series, single page, 800×800 input): ~480 ms/page with CPUExecutionProvider.
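The measurement method behind this figure is not stated. One way to obtain a comparable number on other hardware is a small timing harness like the sketch below, which discards a few warmup calls and reports the median. The helper name and the default warmup and run counts are assumptions, not the settings used for the figure above.

```python
import time
from statistics import median
from typing import Callable

def median_latency_ms(run_once: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time of run_once() in milliseconds, after warmup calls."""
    # Warmup calls absorb one-time costs such as memory allocation.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)
```

With the session from the Loading section, this could be called as median_latency_ms(lambda: sess.run(None, {"pixel_values": pixel_values})).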
Source
- Original weights: PaddlePaddle/PP-DocLayoutV3_safetensors
- Original PaddlePaddle release: PaddlePaddle/PP-DocLayoutV3
- Paper: PaddleOCR-VL-1.5 (arXiv:2601.21957)
License
Apache-2.0 (inherited from PaddlePaddle/PP-DocLayoutV3_safetensors).