arxiv:2606.21949

CapRiCorn-1K: A Comprehensive Benchmark for Video Captioning and Subject Referential Consistency Across Temporal Scales

Published on Jun 20

Abstract

A new benchmark named CapRiCorn-1K is introduced to evaluate video captioning quality and subject referential consistency across different durations and video domains, revealing current model limitations and demonstrating strong correlation with downstream task performance.

Accurate and comprehensive video captions with consistent subject references are critical for downstream understanding and generation tasks. However, few existing benchmarks can objectively and comprehensively evaluate these properties across diverse durations and scenarios, thereby hindering the advancement of video captioning models. To bridge this gap, we propose CapRiCorn-1K, a comprehensive benchmark designed to evaluate both video captioning quality and subject referential consistency across long temporal horizons and diverse video domains. To accommodate varied evaluation needs, our benchmark supports both audiovisual and visual-only settings. Extensive experiments on CapRiCorn-1K reveal that current models generally struggle to generate accurate and comprehensive captions while maintaining consistent subject references. Moreover, as video duration increases, both the overall caption quality and subject referential consistency decline. Notably, our evaluation metrics exhibit strong correlations with the performance of downstream understanding and generation tasks conditioned on the generated captions, further validating their effectiveness. The project is available at https://github.com/xlchen0205/CapRiCorn-1K .
