This repository contains the official checkpoint for DualGPT, as presented in the paper Autoregressive Pre-Training on Pixels and Texts (EMNLP 2024). For detailed usage instructions, please visit our GitHub page.

Model Description

DualGPT is an autoregressive language model pre-trained on two modalities: pixels and text. By also processing documents as visual data (rendered pixels), the model learns to predict both the next token and the next image patch in a sequence, which enables it to handle visually complex tasks across both modalities.
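The dual objective described above can be illustrated with a small, framework-free sketch. It is not the released training code: the helper names (`patchify`, `next_token_ce`, `next_patch_mse`, `dual_loss`), the grayscale left-to-right patch layout, the mean-squared-error pixel loss, and the weighted sum controlled by the hypothetical `pixel_weight` parameter are all assumptions made for illustration. In both branches, the prediction at position t is scored against the item at position t + 1.

```python
import math


def patchify(image, patch_width):
    """Hypothetical helper: split a grayscale strip (list of rows) into
    non-overlapping full-height patches, read left to right. Each patch is
    flattened row by row into a single list of pixel values."""
    height, width = len(image), len(image[0])
    assert width % patch_width == 0, "width must be a multiple of patch_width"
    patches = []
    for start in range(0, width, patch_width):
        patch = [image[r][c] for r in range(height)
                 for c in range(start, start + patch_width)]
        patches.append(patch)
    return patches


def next_token_ce(logits, token_ids):
    """Mean cross-entropy where logits[t] scores the vocabulary for token t+1."""
    total = 0.0
    for row, target in zip(logits[:-1], token_ids[1:]):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total -= row[target] - log_z
    return total / (len(token_ids) - 1)


def next_patch_mse(predicted, patches):
    """Mean squared error where predicted[t] is compared with patches[t+1]
    (an assumed regression objective for next-patch prediction)."""
    total, count = 0.0, 0
    for pred, target in zip(predicted[:-1], patches[1:]):
        for a, b in zip(pred, target):
            total += (a - b) ** 2
            count += 1
    return total / count


def dual_loss(token_logits, token_ids, patch_preds, patches, pixel_weight=1.0):
    """Assumed combination: text loss plus a weighted pixel loss."""
    return (next_token_ce(token_logits, token_ids)
            + pixel_weight * next_patch_mse(patch_preds, patches))
```

In a real model these loops would be tensor operations in a deep-learning framework; the part that carries over is the shift-by-one alignment between predictions and targets in both modalities.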

Citation

@misc{chai2024autoregressivepretrainingpixelstexts,
  title = {Autoregressive Pre-Training on Pixels and Texts},
  author = {Chai, Yekun and Liu, Qingyi and Xiao, Jingwu and Wang, Shuohuan and Sun, Yu and Wu, Hua},
  year = {2024},
  eprint = {2404.10710},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2404.10710},
}


Model Details

Format: Safetensors
Model size: 0.4B params
Tensor type: F32
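As a rough sizing check (our own arithmetic, not a figure from the authors): F32 stores each parameter in 4 bytes, so the listed 0.4B parameters need roughly 1.6 GB for the weights alone. Because 0.4B is a rounded figure, treat this as approximate; activations, optimizer state, and framework overhead come on top.

```latex
0.4 \times 10^{9}\ \text{params} \times 4\ \tfrac{\text{bytes}}{\text{param}}
  = 1.6 \times 10^{9}\ \text{bytes}
  \approx 1.6\ \text{GB} \approx 1.49\ \text{GiB}
```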

Collection including ernie-research/DualGPT: Pixel-based Pre-training (PixelGPT). [EMNLP'24] [Autoregressive Pre-Training on Pixels and Texts](https://arxiv.org/pdf/2404.10710). 6 items, updated May 21, 2025.

Paper for ernie-research/DualGPT: Dual Modalities of Text: Visual and Textual Generative Pre-training (arXiv 2404.10710, published Apr 16, 2024).