library_name: transformers
pipeline_tag: image-feature-extraction

## TextNet-T/S/B: Efficient Text Detection Models

### **Overview**

[TextNet](https://arxiv.org/abs/2111.02394) is a lightweight, efficient backbone architecture designed specifically for text detection. It is reported to outperform conventional backbones such as MobileNetV3 on this task. It comes in three variants, **TextNet-T**, **TextNet-S**, and **TextNet-B**, with 6.8M, 8.0M, and 8.9M parameters respectively, each balancing accuracy against inference speed.

### **Performance**

TextNet outperforms hand-crafted backbones on text detection in both accuracy and inference speed. When pre-trained on IC17-MLT and fine-tuned on Total-Text, it achieves a better accuracy-speed trade-off than ResNets and VGG16 while being more parameter-efficient, which makes it well suited to GPU-based applications.
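
Speed figures depend on hardware, input size, and batch size, so it can help to measure throughput locally. The following is a minimal sketch of a timing helper. `measure_fps` and its parameters are hypothetical names introduced here, not part of Transformers, and the callable passed in stands for a single forward pass such as `lambda: model(**inputs)`.

```python
import time


def measure_fps(run_once, warmup=3, iters=10):
    """Return the average number of calls per second for a zero-argument callable.

    Hypothetical helper for illustration; not part of any library.
    """
    # Warm-up calls are excluded from timing (lazy initialization, caches).
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start

    # On a GPU, call torch.cuda.synchronize() inside run_once so that
    # asynchronous kernels have finished before the clock is read.
    # The max() guards against a zero reading for extremely fast callables.
    return iters / max(elapsed, 1e-9)
```

Averaging over several iterations after a warm-up tends to give more stable numbers than timing a single call.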

### How to use

### Transformers

```bash
pip install transformers torch pillow requests
```

```python
import requests
import torch
from PIL import Image
from transformers import AutoBackbone, AutoImageProcessor

# Download a sample image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the backbone from the Hub.
processor = AutoImageProcessor.from_pretrained("czczup/textnet-base")
model = AutoBackbone.from_pretrained("czczup/textnet-base")

# Preprocess the image and run inference without tracking gradients.
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
```
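
The backbone returns its multi-scale features as `outputs.feature_maps`, a tuple of tensors. As a small sketch for inspecting them, the helper below lists each map's shape and its downsampling factor relative to the input. `summarize_feature_maps` is a hypothetical name, and the function assumes each map is laid out as `(batch, channels, height, width)`.

```python
def summarize_feature_maps(feature_maps, input_height, input_width):
    """Describe each feature map by channel count, size, and downsampling factor.

    Hypothetical helper; assumes every map exposes a 4-element `.shape`
    laid out as (batch, channels, height, width).
    """
    summary = []
    for index, fmap in enumerate(feature_maps):
        _, channels, height, width = tuple(fmap.shape)
        summary.append({
            "index": index,
            "channels": channels,
            "size": (height, width),
            # Ratio of input resolution to feature-map resolution.
            "stride": (input_height / height, input_width / width),
        })
    return summary
```

With the variables from the usage snippet, this could be called as `summarize_feature_maps(outputs.feature_maps, *inputs["pixel_values"].shape[-2:])`.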

### **Training**

We compare TextNet with representative hand-crafted backbones such as ResNets and VGG16. For a fair comparison, all models are first pre-trained on IC17-MLT [52] and then fine-tuned on Total-Text. The proposed TextNet models achieve a significantly better trade-off between accuracy and inference speed than these hand-crafted models. Notably, TextNet-T, -S, and -B have only 6.8M, 8.0M, and 8.9M parameters respectively, making them more parameter-efficient than ResNets and VGG16. These results demonstrate that TextNet models are effective for text detection on GPU devices.
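
To put these parameter counts in perspective, the sketch below estimates the memory needed to hold the weights alone. It assumes 4 bytes per parameter for float32 and 2 bytes for float16, counts 1 MB as 10^6 bytes, and ignores activations and buffers. `weight_megabytes` is a hypothetical helper.

```python
# Parameter counts as stated in the text, in millions.
PARAMS_MILLIONS = {"TextNet-T": 6.8, "TextNet-S": 8.0, "TextNet-B": 8.9}


def weight_megabytes(params_millions, bytes_per_param=4):
    """Approximate weight storage in MB (1 MB = 10**6 bytes).

    Millions of parameters times bytes per parameter gives millions of
    bytes, which is the size in MB under this definition.
    """
    return params_millions * bytes_per_param


for name, millions in PARAMS_MILLIONS.items():
    fp32 = weight_megabytes(millions, 4)
    fp16 = weight_megabytes(millions, 2)
    print(f"{name}: {fp32:.1f} MB (float32), {fp16:.1f} MB (float16)")
```

This is a rough lower bound on memory use; actual inference memory also depends on input resolution and batch size.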

### **Applications**

TextNet is well suited to real-world text detection tasks, including:

- Natural scene text recognition
- Multilingual and multi-oriented text detection
- Document text region analysis

### **Contribution**

This model was contributed by [Raghavan](https://huggingface.co/Raghavan), [jadechoghari](https://huggingface.co/jadechoghari), and [nielsr](https://huggingface.co/nielsr).