| --- |
| datasets: |
| - nkp37/OpenVid-1M |
| base_model: |
| - ali-vilab/i2vgen-xl |
| - THUDM/CogVideoX-5b |
| tags: |
| - video super-resolution |
| --- |
| # STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution |
|
|
| ### Code: https://github.com/NJU-PCALab/STAR |
| ### Paper: https://arxiv.org/abs/2501.02976 |
| ### Project Page: https://nju-pcalab.github.io/projects/STAR |
| ### Demo Video: https://youtu.be/hx0zrql-SrU |
|
|
|
|
| ## ⚙️ Dependencies and Installation |
| ``` |
| ## git clone this repository |
| git clone https://github.com/NJU-PCALab/STAR.git |
| cd STAR |
| |
| ## create an environment |
| conda create -n star python=3.10 |
| conda activate star |
| pip install -r requirements.txt |
| sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6 -y |
| ``` |
|
|
| ## 🚀 Inference |
|
|
| ### Model Weight |
| | Base Model | Type | URL | |
| |------------|--------|-----------------------------------------------------------------------------------------------| |
| | I2VGen-XL | Light Degradation | [:link:](https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based/light_deg.pt?download=true) | |
| | I2VGen-XL | Heavy Degradation | [:link:](https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based/heavy_deg.pt?download=true) | |
| | CogVideoX-5B | Heavy Degradation | [:link:](https://huggingface.co/SherryX/STAR/tree/main/CogVideoX-5B-based) | |
|
|
| ### 1. I2VGen-XL-based |
| #### Step 1: Download the pretrained STAR model from [HuggingFace](https://huggingface.co/SherryX/STAR). |
| We provide two versions of the I2VGen-XL-based model: `heavy_deg.pt` for heavily degraded videos and `light_deg.pt` for lightly degraded videos (e.g., low-resolution videos downloaded from video websites). |
|
|
| Place the downloaded weights in `pretrained_weight/`. |
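A minimal sketch of fetching a checkpoint from the command line is shown below. The URLs are the ones listed in the Model Weight table; the helper names `star_weight_url` and `download_star_weight` are hypothetical (they are not part of the STAR repository), and `wget` is assumed to be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the STAR repository): download one of the
# I2VGen-XL-based checkpoints into pretrained_weight/.
WEIGHT_DIR="${WEIGHT_DIR:-pretrained_weight}"
BASE_URL="https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based"

# Print the download URL for a variant: light_deg or heavy_deg.
star_weight_url() {
  case "$1" in
    light_deg|heavy_deg) echo "${BASE_URL}/$1.pt" ;;
    *) echo "unknown variant: $1 (expected light_deg or heavy_deg)" >&2; return 1 ;;
  esac
}

# Create the weight directory if needed and download the chosen checkpoint.
download_star_weight() {
  url=$(star_weight_url "$1") || return 1
  mkdir -p "$WEIGHT_DIR"
  wget -O "${WEIGHT_DIR}/$1.pt" "$url"
}

# Example: download_star_weight light_deg
```

Calling `download_star_weight heavy_deg` would fetch the heavy-degradation checkpoint instead. The CogVideoX-5B-based weights live in a separate folder of the same HuggingFace repository and are not covered by this sketch.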
|
|
| #### Step 2: Prepare testing data |
| Put the test videos in `input/video/`. |
|
|
| As for the prompt, there are three options: (1) no prompt; (2) a prompt generated automatically [using Pllava](https://github.com/hpcaitech/Open-Sora/tree/main/tools/caption#pllava-captioning); (3) a manually written prompt. Put the prompt txt file in `input/text/`. |
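Before running inference, a quick check of the input layout can catch an empty or misplaced folder. The sketch below is only illustrative: the function names are hypothetical, and it deliberately makes no assumption about the contents or format of the prompt file, which is not specified here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the STAR repository).

# Count regular files directly inside a directory (prints 0 if it is missing).
count_files() {
  find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Report how many videos and prompt files are present.
# Succeeds only if at least one video exists; prompts are optional.
check_star_inputs() {
  videos=$(count_files "${1:-input/video}")
  prompts=$(count_files "${2:-input/text}")
  echo "videos: $videos, prompt files: $prompts"
  [ "$videos" -gt 0 ]
}

# Example: check_star_inputs || echo "no videos found in input/video/"
```

Since running without a prompt is one of the three options, the check treats an empty `input/text/` as acceptable and only fails when no video is found.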
|
|
|
|
| #### Step 3: Change the path |
| Change the paths in `video_super_resolution/scripts/inference_sr.sh` to the corresponding paths on your machine, including `video_folder_path`, `txt_file_path`, `model_path`, and `save_dir`. |
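To find where these settings appear in the script, something like the following can help. It is a sketch only: `show_star_settings` is a hypothetical helper, and it simply greps for the four path names above plus `frame_length` (the setting used for memory tuning) without assuming how the script assigns them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the STAR repository): list the lines of the
# inference script that mention the settings you are expected to edit.
SCRIPT="${SCRIPT:-video_super_resolution/scripts/inference_sr.sh}"

show_star_settings() {
  grep -nE 'video_folder_path|txt_file_path|model_path|save_dir|frame_length' "${1:-$SCRIPT}"
}

# Example (run from the repository root): show_star_settings
```

The output is prefixed with line numbers, so each reported line can be opened and edited by hand in any editor.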
|
|
|
|
| #### Step 4: Run the inference command |
| ``` |
| bash video_super_resolution/scripts/inference_sr.sh |
| ``` |
| If you encounter an out-of-memory (OOM) error, set a smaller `frame_length` in `inference_sr.sh`. |
|
|
| ### 2. CogVideoX-based |
| Refer to these [instructions](https://github.com/NJU-PCALab/STAR/tree/main/cogvideox-based#cogvideox-based-model-inference) for inference with the CogVideoX-5B-based model. |
|
|
| Please note that the CogVideoX-5B-based model supports only 720x480 input. |
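Because of this resolution constraint, it may be worth checking the input videos first. The sketch below uses `ffprobe` (installed alongside `ffmpeg` in the setup step) to read the dimensions of the first video stream; the helper names are hypothetical, and reading the constraint as an exact 720x480 frame size for each input video is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the STAR repository).

# Print WIDTHxHEIGHT of the first video stream, e.g. 720x480.
video_resolution() {
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=width,height -of csv=s=x:p=0 "$1"
}

# Succeed only if the video is exactly 720x480.
is_720x480() {
  [ "$(video_resolution "$1")" = "720x480" ]
}

# Example: report files in input/video/ that do not match.
# for f in input/video/*; do is_720x480 "$f" || echo "not 720x480: $f"; done
```

Videos that fail the check would need to be resized or cropped before use; how best to do that depends on the content and is left to the linked instructions.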