mini-kh-OCR: Khmer & English Document OCR Pipeline
An end-to-end OCR pipeline that combines two models to detect, classify, and recognise Khmer and English text from document images.
          Input Image
               │
               ▼
┌─────────────────────────────┐
│  Text Detection             │  phonsobon/mini-text-detection (YOLO11n)
│  → subject / reference /    │
│    content bounding boxes   │
└──────────────┬──────────────┘
               │  crop each region
               ▼
┌─────────────────────────────┐
│  Text Recognition           │  phonsobon/mini-ocr (CRNN + CTC)
│  → Khmer & English text     │
└──────────────┬──────────────┘
               │
               ▼
       Structured output
       grouped by class
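The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration only, not the code in mini_kh_ocr.py: `detect`, `crop`, and `recognise` are hypothetical stand-ins for the detection model, the cropping step, and the recognition model, and sorting by the `y1` coordinate is an assumption about how "top to bottom" order is obtained.

```python
from typing import Any, Callable

CLASSES = ("subject", "reference", "content")


def run_pipeline(
    image: Any,
    detect: Callable[[Any], list],
    crop: Callable[[Any, dict], Any],
    recognise: Callable[[Any], str],
) -> dict:
    """Detect regions, recognise each crop, and group the text by class.

    All three callables are hypothetical stand-ins, not the real models.
    """
    result = {name: [] for name in CLASSES}
    regions = []
    # Sort detections by their top edge so the text reads in document order.
    for det in sorted(detect(image), key=lambda d: d["box"]["y1"]):
        text = recognise(crop(image, det["box"]))
        regions.append({**det, "text": text})
        result[det["class"]].append(text)
    result["regions"] = regions
    return result
```

Keeping the two stages as separate callables mirrors the diagram: either model could be swapped without touching the grouping logic.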
Detection Classes
| ID | Class | Khmer | Description |
|---|---|---|---|
| 0 | subject | កម្មវត្ថុ | Title or subject heading |
| 1 | reference | លេខ | Reference or citation |
| 2 | content | អត្ថបទ | Main body / paragraph text |
Models Used
| Role | Repository |
|---|---|
| Text Detection | phonsobon/mini-text-detection |
| Text Recognition | phonsobon/mini-ocr |
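The recognition model is listed as CRNN + CTC. For readers unfamiliar with CTC, the sketch below shows standard greedy decoding: take the best class per time step, collapse consecutive repeats, then drop the blank symbol. The blank index of 0 and the toy character set are assumptions made for illustration; how phonsobon/mini-ocr actually decodes its output is not documented here.

```python
def ctc_greedy_decode(step_ids, charset, blank=0):
    """Collapse repeated ids, then remove blanks (standard greedy CTC decoding).

    step_ids: best class index per time step (argmax of the model output).
    charset:  maps a class index to a character; index `blank` is unused.
    """
    chars = []
    previous = None
    for idx in step_ids:
        # Emit a character only when the id changes and is not the blank.
        if idx != previous and idx != blank:
            chars.append(charset[idx])
        previous = idx
    return "".join(chars)


# Toy example: index 0 is the blank, 1..3 map to "a", "b", "c".
toy_charset = {1: "a", 2: "b", 3: "c"}
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 0, 3], toy_charset))  # prints "aabc"
```

The blank between the two runs of `1` is what lets the decoder emit a doubled character.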
Files
| File | Description |
|---|---|
| mini_kh_ocr.py | Pipeline class (load and import this) |
Installation
pip install torch torchvision ultralytics huggingface_hub pillow numpy
Quick Start
import importlib.util

from huggingface_hub import hf_hub_download

# Download the pipeline script from the Hub
pipeline_path = hf_hub_download(
    repo_id="phonsobon/mini-kh-OCR",
    filename="mini_kh_ocr.py",
)

# Import the downloaded script as a module
spec = importlib.util.spec_from_file_location("mini_kh_ocr", pipeline_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
MiniKhOCR = mod.MiniKhOCR

# Load pipeline
ocr = MiniKhOCR()

# Run on an image
result = ocr("your_document.jpg")
Output Format
result is a dictionary with the following structure:
{
    "subject":   ["កម្មវត្ថុ: ..."],        # កម្មវត្ថុ = subject/heading texts
    "reference": ["លេខ: ..."],            # លេខ = reference texts
    "content":   ["អត្ថបទ...", "..."],     # អត្ថបទ = body paragraph texts
    "regions": [                          # all detections sorted top → bottom
        {
            "class": "subject",
            "conf":  0.91,
            "box":   {"x1": 10, "y1": 5, "x2": 320, "y2": 40},
            "text":  "កម្មវត្ថុ: ...",
        },
        {
            "class": "reference",
            "conf":  0.87,
            "box":   {"x1": 10, "y1": 50, "x2": 200, "y2": 75},
            "text":  "លេខ: ...",
        },
        ...
    ]
}
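Because every entry in `regions` carries its own `class`, `conf`, and `text`, the result can be filtered after the fact without re-running detection. The sketch below assumes the dictionary layout shown above; `filter_by_conf` is a hypothetical helper and is not part of mini_kh_ocr.py.

```python
def filter_by_conf(result, min_conf):
    """Return a copy of `result` keeping only regions with conf >= min_conf."""
    kept = [r for r in result["regions"] if r["conf"] >= min_conf]
    filtered = {"subject": [], "reference": [], "content": [], "regions": kept}
    for region in kept:
        # `regions` is already sorted top to bottom, so order is preserved.
        filtered[region["class"]].append(region["text"])
    return filtered
```

For example, `filter_by_conf(result, 0.9)` would keep only the detections the model scored at 0.9 or higher and rebuild the three per-class lists from them.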
Usage Examples
Access text by class
result = ocr("document.jpg")

print("=== SUBJECT ===")
for text in result["subject"]:
    print(text)

print("=== REFERENCE ===")
for text in result["reference"]:
    print(text)

print("=== CONTENT ===")
for text in result["content"]:
    print(text)
Format as a structured document
document = ocr.to_document(result)
print(document)
# Output:
# [SUBJECT]
# កម្មវត្ថុ: ...
#
# [REFERENCE]
# លេខ: ...
#
# [CONTENT]
# អត្ថបទ...
# អត្ថបទ...
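The implementation of `to_document` is not shown in this README. As a rough illustration of the layout above, a function along the following lines would produce output of the same shape. `format_document` is a hypothetical name, and skipping empty classes is an assumption; the real method may differ in spacing and in how it handles missing classes.

```python
def format_document(result):
    """Render the per-class texts as labelled blocks separated by blank lines."""
    blocks = []
    for name in ("subject", "reference", "content"):
        texts = result.get(name, [])
        if texts:  # skip classes with no detected text (an assumption)
            blocks.append("\n".join([f"[{name.upper()}]", *texts]))
    return "\n\n".join(blocks)
```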
Verbose mode: print each region as it is processed
result = ocr("document.jpg", verbose=True)
# [subject] (10,5)→(320,40) conf=0.91 → 'កម្មវត្ថុ: ...'
# [reference] (10,50)→(200,75) conf=0.87 → 'លេខ: ...'
# [content] (10,90)→(600,120) conf=0.93 → 'អត្ថបទ...'
Get cropped images alongside text
result = ocr("document.jpg", return_crops=True)

for region in result["regions"]:
    print(region["class"], "→", region["text"])
    region["crop"].show()  # PIL Image of the cropped region
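When many regions are detected, writing the crops to disk is often more practical than opening each one in a viewer. The sketch below assumes each `crop` is a PIL image, as the comment above states, and uses its standard `save` method. `save_crops` and the file-naming scheme are hypothetical.

```python
from pathlib import Path


def save_crops(result, out_dir):
    """Save every region crop as a PNG named by index and class."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, region in enumerate(result["regions"]):
        path = out / f"{i:03d}_{region['class']}.png"
        region["crop"].save(path)  # PIL.Image.Image.save
        paths.append(path)
    return paths
```

The zero-padded index keeps the files in the same top-to-bottom order as `regions` when listed alphabetically.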
Batch processing
import os

folder = "path/to/documents/"
all_results = {}

for fname in os.listdir(folder):
    if fname.lower().endswith((".jpg", ".jpeg", ".png")):
        path = os.path.join(folder, fname)
        result = ocr(path)
        all_results[fname] = {
            "subject": result["subject"],
            "reference": result["reference"],
            "content": result["content"],
        }
        print(f"✅ {fname} → {len(result['regions'])} regions detected")
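The same loop can be wrapped in a reusable function. The variant below is a sketch using `pathlib`: `ocr` is any callable with the interface shown in this README, and `run_batch` is a hypothetical helper. It processes files in sorted order so runs are reproducible, and it records a failure on one image instead of stopping the whole batch.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def run_batch(ocr, folder):
    """Run `ocr` on every image in `folder`; return (results, errors) dicts."""
    results, errors = {}, {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            result = ocr(str(path))
        except Exception as exc:  # keep going if one image fails
            errors[path.name] = str(exc)
            continue
        results[path.name] = {
            key: result[key] for key in ("subject", "reference", "content")
        }
    return results, errors
```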
Export to JSON
import json

result = ocr("document.jpg")

# Remove PIL crops before serialising (not JSON-serialisable)
exportable = {
    "subject": result["subject"],
    "reference": result["reference"],
    "content": result["content"],
    "regions": [
        {k: v for k, v in r.items() if k != "crop"}
        for r in result["regions"]
    ],
}

with open("output.json", "w", encoding="utf-8") as f:
    json.dump(exportable, f, ensure_ascii=False, indent=2)
Configuration
ocr = MiniKhOCR(
    det_conf=0.25,   # lower → more detections, higher → fewer but more confident
    det_iou=0.45,    # NMS IoU threshold
    det_imgsz=640,   # detection image size
    device="auto",   # "auto" | "cuda" | "cpu"
)
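`det_iou` is the overlap threshold used by non-maximum suppression: when two candidate boxes overlap by more than this value, the less confident one is dropped. To make the quantity concrete, the sketch below computes intersection over union on the `box` dictionaries from the output format. It is a standalone illustration, not the detector's internal code, and `box_iou` is a hypothetical helper.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as x1/y1/x2/y2 dicts."""
    inter_w = max(0, min(a["x2"], b["x2"]) - max(a["x1"], b["x1"]))
    inter_h = max(0, min(a["y2"], b["y2"]) - max(a["y1"], b["y1"]))
    inter = inter_w * inter_h
    area_a = (a["x2"] - a["x1"]) * (a["y2"] - a["y1"])
    area_b = (b["x2"] - b["x1"]) * (b["y2"] - b["y1"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two 10×10 boxes offset horizontally by 5 pixels share an area of 50 out of a union of 150, giving an IoU of about 0.33, which is below the default threshold of 0.45, so both would be kept.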
Limitations
- Designed for document-style images (printed text, clear layout).
- Text recognition works best on single-line crops; very tall content regions that span multiple lines may have their lines merged in the output.
- Handwritten text is not supported.
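One way to spot regions likely to be affected by the single-line limitation is to compare each box height with a typical line height. The sketch below uses the median height of all regions as that reference and a factor of 1.8. Both choices are assumptions to tune per document type, and `flag_tall_regions` is a hypothetical helper, not part of the pipeline.

```python
from statistics import median


def flag_tall_regions(regions, factor=1.8):
    """Return regions whose box is much taller than the median box height."""
    heights = [r["box"]["y2"] - r["box"]["y1"] for r in regions]
    if not heights:
        return []
    limit = factor * median(heights)
    return [r for r, h in zip(regions, heights) if h > limit]
```

Flagged regions could then be reviewed manually or split into lines before recognition.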
License
MIT