Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning
Paper • 2312.10160 • Published • 2
How to use khhuang/chart-to-table with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="khhuang/chart-to-table") # Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText
tokenizer = AutoTokenizer.from_pretrained("khhuang/chart-to-table")
model = AutoModelForImageTextToText.from_pretrained("khhuang/chart-to-table")How to use khhuang/chart-to-table with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "khhuang/chart-to-table"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khhuang/chart-to-table",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'docker model run hf.co/khhuang/chart-to-table
How to use khhuang/chart-to-table with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "khhuang/chart-to-table" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khhuang/chart-to-table",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "khhuang/chart-to-table" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khhuang/chart-to-table",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'How to use khhuang/chart-to-table with Docker Model Runner:
docker model run hf.co/khhuang/chart-to-table
The Chart-To-Table model was introduced in the paper "Do LVLMs Understand Charts?
Analyzing and Correcting Factual Errors in Chart Captioning" for converting a chart into a structured table. The generated tables use &&& to delimit rows and | to delimit columns. The underlying architecture of this model is UniChart.
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
model_name = "khhuang/chart-to-table"
model = VisionEncoderDecoderModel.from_pretrained(model_name).cuda()
processor = DonutProcessor.from_pretrained(model_name)
image_path = "PATH_TO_IMAGE"
# Format text inputs
input_prompt = "<data_table_generation> <s_answer>"
# Encode chart figure and tokenize text
img = Image.open(IMAGE_PATH)
pixel_values = processor(img.convert("RGB"), random_padding=False, return_tensors="pt").pixel_values
pixel_values = pixel_values.cuda()
decoder_input_ids = processor.tokenizer(input_prompt, add_special_tokens=False, return_tensors="pt", max_length=510).input_ids.cuda()#.squeeze(0)
outputs = model.generate(
pixel_values.cuda(),
decoder_input_ids=decoder_input_ids.cuda(),
max_length=model.decoder.config.max_position_embeddings,
early_stopping=True,
pad_token_id=processor.tokenizer.pad_token_id,
eos_token_id=processor.tokenizer.eos_token_id,
use_cache=True,
num_beams=4,
bad_words_ids=[[processor.tokenizer.unk_token_id]],
return_dict_in_generate=True,
)
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
extracted_table = sequence.split("<s_answer>")[1].strip()
@misc{huang-etal-2023-do,
title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
author = "Huang, Kung-Hsiang and
Zhou, Mingyang and
Chan, Hou Pong and
Fung, Yi R. and
Wang, Zhenhailong and
Zhang, Lingyu and
Chang, Shih-Fu and
Ji, Heng",
year={2023},
eprint={2312.10160},
archivePrefix={arXiv},
primaryClass={cs.CL}
}