ABot-OCR
ABot-OCR is a document image OCR model that converts PDF/document page images into structured Markdown output, supporting recognition and reconstruction of text, mathematical formulas (LaTeX), tables (HTML), and other elements.
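Because the output mixes Markdown text, LaTeX formulas, and HTML tables, downstream code often needs to pull these apart. The following is a minimal sketch of one way to do that, not part of ABot-OCR itself. It assumes tables arrive as `<table>...</table>` blocks and display formulas are wrapped in `$$...$$`; neither delimiter is confirmed by this model card, and `split_ocr_markdown` is a hypothetical helper name.

```python
import re

# Assumed conventions (not confirmed by the model card): tables appear as
# <table>...</table> blocks and display formulas are wrapped in $$...$$.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
FORMULA_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def split_ocr_markdown(markdown: str) -> dict:
    """Separate HTML tables and display formulas from the remaining text."""
    tables = TABLE_RE.findall(markdown)
    formulas = [body.strip() for body in FORMULA_RE.findall(markdown)]
    # Remove tables first, then formulas, so only plain Markdown remains.
    text = FORMULA_RE.sub("", TABLE_RE.sub("", markdown))
    # Collapse the runs of blank lines left behind by the removed blocks.
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    return {"text": text, "tables": tables, "formulas": formulas}
```

If the model emits inline formulas with single `$` delimiters or a different table format, the two patterns would need adjusting.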
Code: https://github.com/amap-cvlab/ABot-OCR
Paper: https://arxiv.org/abs/2605.27978
Benchmarks
Requirements
Python 3.11 is recommended. Install the following dependencies:
pip install vllm==0.18.0 torch==2.10.0
Note: Inference uses vLLM to load the model, so sufficient GPU memory is required. The model weights take about 4 GB; actual usage depends on batch_size and image resolution.
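Since the dependencies are pinned to exact versions, it can help to confirm the environment before loading the model. This is a small sketch using only the standard library; `check_environment` and `PINNED` are hypothetical names, and the pins are copied from the install command above.

```python
import sys
from importlib import metadata

# Pins copied from the pip install command in this README.
PINNED = {"vllm": "0.18.0", "torch": "2.10.0"}


def check_environment(pins: dict[str, str] = PINNED) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 11):
        problems.append(f"Python 3.11 is recommended, found {major}.{minor}")
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        # Ignore local build suffixes such as "+cu128" when comparing.
        if found.split("+")[0] != wanted:
            problems.append(f"{name} is {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The check only reports mismatches; it does not measure GPU memory, which still has to be verified separately.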
Inference
Inference script: abot-ocr-infer.py
1. Configure Model Path
Update the default model path in the script:
MODEL_PATH = "./abot-ocr" # Path to the model directory in this repo
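A wrong path tends to fail late, after vLLM has started initializing. As a hedged sketch, the path could be validated up front. The helper name `validate_model_dir` is hypothetical, and the check for `config.json` assumes the model directory follows the usual Hugging Face checkpoint layout, which this README does not state explicitly.

```python
from pathlib import Path


def validate_model_dir(model_path: str) -> Path:
    """Resolve model_path and fail early if it does not look like a model directory."""
    path = Path(model_path).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {path}")
    # Assumption: a Hugging Face-style checkpoint with config.json at its root.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"No config.json in {path}; is this the model directory?")
    return path
```

Calling `validate_model_dir(MODEL_PATH)` before constructing the model would surface a typo in the path immediately.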
2. Run from Command Line
Edit the parameters in the __main__ block at the bottom of abot-ocr-infer.py, then run:
python abot-ocr-infer.py
Acknowledgements
Our work is inspired by many excellent open-source projects. We sincerely thank the developers of Qwen-VL, PaddleOCR-VL, MinerU, and the broader OCR community.
