---
license: apache-2.0
tags:
- multimodal
- vision-language
- video understanding
- spatial reasoning
- visuospatial cognition
- llava
- qwen
- llava-video
datasets:
- nkkbr/ViCA-322K
- nkkbr/ViCA-thinking-2.68k
language:
- en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA-ScanNet-7B
base_model: lmms-lab/LLaVA-Video-7B-Qwen2
---

## Usage and Full Documentation

For a detailed model description, training setup, datasets, evaluation results, and inference code, see the main ViCA-7B README:

[nkkbr/ViCA](https://huggingface.co/nkkbr/ViCA)