---
library_name: transformers
pipeline_tag: image-feature-extraction
---
## TextNet-T/S/B: Efficient Text Detection Models

### Overview
[TextNet](https://arxiv.org/abs/2111.02394) is a lightweight and efficient backbone architecture designed specifically for text detection, introduced together with the FAST text detector. It offers better performance than traditional backbones such as MobileNetV3. Its three variants, TextNet-T, TextNet-S, and TextNet-B, have 6.8M, 8.0M, and 8.9M parameters respectively and achieve a favorable balance between accuracy and inference speed. This repository hosts the base variant, TextNet-B.
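To check a loaded checkpoint against the parameter counts quoted above, a minimal sketch is to sum the element counts of all model parameters. The helpers `count_parameters` and `format_millions` are hypothetical (they are not part of Transformers); they rely only on `torch.nn.Module.parameters()` and `Tensor.numel()`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for the type hint; torch is not imported at runtime
    import torch


def count_parameters(model: torch.nn.Module, trainable_only: bool = False) -> int:
    """Return the number of scalar parameters in `model` (hypothetical helper).

    `model.parameters()` yields each parameter once, so weights shared
    between submodules are not double-counted.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


def format_millions(n: int) -> str:
    """Format a parameter count in millions with one decimal place."""
    return f"{n / 1e6:.1f}M"


# Hypothetical usage with the backbone loaded in the "How to use" snippet:
# print(format_millions(count_parameters(model)))
```

For the TextNet-B checkpoint the result should be roughly comparable to the 8.9M figure above; the exact number may differ slightly depending on which modules the Transformers implementation includes.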

### Performance
As a text detection backbone, TextNet delivers state-of-the-art accuracy/speed trade-offs, outperforming hand-crafted backbones such as ResNets and VGG16 in both accuracy and speed. Its small parameter count and efficient design make it well suited to GPU-based applications.

### How to use
#### Transformers
```bash
pip install transformers
```

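```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoBackbone

# Load a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the TextNet backbone
processor = AutoImageProcessor.from_pretrained("jadechoghari/textnet-base")
model = AutoBackbone.from_pretrained("jadechoghari/textnet-base")

# Preprocess the image and run a forward pass without tracking gradients
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.feature_maps is a tuple of feature maps, one per output stage
```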
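Text detection heads commonly combine features from several backbone stages, so it can be useful to check what `outputs` contains. A minimal sketch, assuming `outputs` is the `BackboneOutput` returned above, whose `feature_maps` field holds one tensor per returned stage. `summarize_feature_maps` is a hypothetical helper, and which stages are returned depends on the checkpoint's `out_features` setting.

```python
from typing import Dict, List, Sequence, Tuple


def summarize_feature_maps(
    input_hw: Tuple[int, int], feature_maps: Sequence
) -> List[Dict[str, int]]:
    """Describe each (batch, channels, height, width) feature map.

    Hypothetical helper: only the `.shape` attribute of each map is used,
    so it works on torch tensors such as `outputs.feature_maps`. The stride
    is the input height divided by the map height (integer division).
    """
    in_h, _ = input_hw
    summary = []
    for fmap in feature_maps:
        _, channels, height, width = tuple(fmap.shape)
        summary.append(
            {
                "channels": channels,
                "height": height,
                "width": width,
                "stride": in_h // height,
            }
        )
    return summary


# Hypothetical usage with `inputs` and `outputs` from the snippet above:
# for info in summarize_feature_maps(inputs["pixel_values"].shape[-2:], outputs.feature_maps):
#     print(info)
```

The stride values assume the preprocessed image height is evenly divisible by each stage's downsampling factor.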
### Training
We compare TextNet with representative hand-crafted backbones such as ResNets and VGG16. For a fair comparison, all models are first pre-trained on IC17-MLT [52] and then fine-tuned on Total-Text. The TextNet models achieve a better trade-off between accuracy and inference speed than these hand-crafted backbones by a significant margin. They are also more parameter-efficient: TextNet-T, -S, and -B have only 6.8M, 8.0M, and 8.9M parameters respectively, fewer than ResNets and VGG16. These results show that TextNet models are effective for text detection on GPUs.
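Speed comparisons like the one above depend on how latency is measured. A minimal sketch of a fair timing loop, assuming every backbone is timed on the same input and device: warm up, synchronize the device, and report the median. `measure_latency_ms` is a hypothetical helper; the exact benchmarking protocol behind the reported numbers is not described here and may differ.

```python
import statistics
import time
from typing import Callable, Optional


def measure_latency_ms(
    fn: Callable[[], object],
    n_warmup: int = 10,
    n_runs: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock latency of `fn()` in milliseconds.

    `sync` should block until queued device work has finished. On a GPU,
    pass `torch.cuda.synchronize`; otherwise asynchronous CUDA kernel
    launches make the timings look too small.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    for _ in range(n_warmup):  # warm-up: lazy initialization, kernel selection, caches
        fn()
    if sync is not None:
        sync()
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Hypothetical GPU usage with `model` and `inputs` from the "How to use" snippet:
# model = model.to("cuda").eval()
# gpu_inputs = {k: v.to("cuda") for k, v in inputs.items()}
# with torch.no_grad():
#     ms = measure_latency_ms(lambda: model(**gpu_inputs), sync=torch.cuda.synchronize)
```

The median is used instead of the mean so that occasional outliers (for example, a background process stealing the GPU) do not skew the result.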

### Applications
TextNet is well suited to real-world text detection tasks, including:
- Scene text detection in natural images
- Multilingual and multi-oriented text detection
- Document text region analysis

### Contribution
This model was contributed by [Raghavan](https://huggingface.co/Raghavan), [jadechoghari](https://huggingface.co/jadechoghari), and [nielsr](https://huggingface.co/nielsr).