Pixel-based Pre-training (PixelGPT)
Collection
[EMNLP'24] [Autoregressive Pre-Training on Pixels and Texts](https://arxiv.org/pdf/2404.10710). • 6 items • Updated • 1
How to use ernie-research/DualGPT with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="ernie-research/DualGPT") # Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ernie-research/DualGPT", dtype="auto")How to use ernie-research/DualGPT with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ernie-research/DualGPT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ernie-research/DualGPT",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'docker model run hf.co/ernie-research/DualGPT
How to use ernie-research/DualGPT with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "ernie-research/DualGPT" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ernie-research/DualGPT",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "ernie-research/DualGPT" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ernie-research/DualGPT",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'How to use ernie-research/DualGPT with Docker Model Runner:
docker model run hf.co/ernie-research/DualGPT
This repository contains the official checkpoint for PixelGPT, as presented in the paper Autoregressive Pre-Training on Pixels and Texts (EMNLP 2024). For detailed instructions on how to use the model, please visit our GitHub page.
DualGPT is an autoregressive language model pre-trained on the dual modality of both pixels and texts. By processing documents as visual data (pixels), the model learns to predict both the next token and the next image patch in a sequence, enabling it to handle visually complex tasks in different modalities.
@misc{chai2024autoregressivepretrainingpixelstexts,
title = {Autoregressive Pre-Training on Pixels and Texts},
author = {Chai, Yekun and Liu, Qingyi and Xiao, Jingwu and Wang, Shuohuan and Sun, Yu and Wu, Hua},
year = {2024},
eprint = {2404.10710},
archiveprefix = {arXiv},
primaryclass = {cs.CL},
url = {https://arxiv.org/abs/2404.10710},
}