VisoLearn / nielsr (HF Staff) committed
Commit a0586c2 · verified · 1 parent: 2af71d3

No changes (#1)

- No changes (20d0f11d6b111fc26c6458b51bb21366f1eb7a10)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +22 -9
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
+ library_name: open-sora
  license: apache-2.0
  pipeline_tag: text-to-video
- library_name: open-sora
  ---
  
  ## Open-Sora: Democratizing Efficient Video Production for All
@@ -83,6 +83,9 @@ Our model is optimized for image-to-video generation, but it can also be used fo
  # Generate one given prompt
  torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea"
  
+ # Save memory with offloading
+ torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea" --offload True
+ 
  # Generation with csv
  torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --dataset.data-path assets/texts/example.csv
  ```
@@ -164,7 +167,7 @@ Use `--num-sample k` to generate `k` samples for each prompt.
  
  ## Computational Efficiency
  
- We test the computational efficiency of text-to-video on H100/H800 GPU. For 256x256, we use colossalai's tensor parallelism. For 768x768, we use colossalai's sequence parallelism. All use number of steps 50. The results are presented in the format: $\color{blue}{\text{Total time (s)}}/\color{red}{\text{peak GPU memory (GB)}}$
+ We test the computational efficiency of text-to-video on H100/H800 GPU. For 256x256, we use colossalai's tensor parallelism, and `--offload True` is used. For 768x768, we use colossalai's sequence parallelism. All use number of steps 50. The results are presented in the format: $\color{blue}{\text{Total time (s)}}/\color{red}{\text{peak GPU memory (GB)}}$
  
  | Resolution | 1x GPU | 2x GPUs | 4x GPUs | 8x GPUs |
  | ---------- | -------------------------------------- | ------------------------------------- | ------------------------------------- | ------------------------------------- |
@@ -177,10 +180,14 @@ On [VBench](https://huggingface.co/spaces/Vchitect/VBench_Leaderboard), Open-Sor
  
  ![VBench](https://github.com/hpcaitech/Open-Sora-Demo/blob/main/readme/v2_vbench.png)
  
- Human preference results show our model is on par with HunyuanVideo 14B and Step-Video 30B.
+ Human preference results show our model is on par with HunyuanVideo 11B and Step-Video 30B.
  
  ![Win Rate](https://github.com/hpcaitech/Open-Sora-Demo/blob/main/readme/v2_winrate.png)
  
+ With strong performance, Open-Sora 2.0 is cost-effective.
+ 
+ ![Cost](https://github.com/hpcaitech/Open-Sora-Demo/blob/main/readme/v2_cost.png)
+ 
  ## Contribution
  
  Thanks goes to these wonderful contributors:
@@ -215,12 +222,18 @@ Here we only list a few of the projects. For other works and datasets, please re
  ## Citation
  
  ```bibtex
- @software{opensora,
- author = {Zangwei Zheng and Xiangyu Peng and Tianji Yang and Chenhui Shen and Shenggui Li and Hongxin Liu and Yukun Zhou and Tianyi Li and Yang You},
- title = {Open-Sora: Democratizing Efficient Video Production for All},
- month = {March},
- year = {2024},
- url = {https://github.com/hpcaitech/Open-Sora}
+ @article{opensora,
+ title={Open-sora: Democratizing efficient video production for all},
+ author={Zheng, Zangwei and Peng, Xiangyu and Yang, Tianji and Shen, Chenhui and Li, Shenggui and Liu, Hongxin and Zhou, Yukun and Li, Tianyi and You, Yang},
+ journal={arXiv preprint arXiv:2412.20404},
+ year={2024}
+ }
+ 
+ @article{opensora2,
+ title={Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k},
+ author={Xiangyu Peng and Zangwei Zheng and Chenhui Shen and Tom Young and Xinying Guo and Binluo Wang and Hang Xu and Hongxin Liu and Mingyan Jiang and Wenjun Li and Yuhui Wang and Anbang Ye and Gang Ren and Qianran Ma and Wanying Liang and Xiang Lian and Xiwen Wu and Yuting Zhong and Zhuangyan Li and Chaoyu Gong and Guojun Lei and Leijun Cheng and Limin Zhang and Minghao Li and Ruijie Zhang and Silan Hu and Shijie Huang and Xiaokang Wang and Yuanheng Zhao and Yuqi Wang and Ziang Wei and Yang You},
+ year={2025},
+ journal={arXiv preprint arXiv:2503.09642},
  }
  ```
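As a quick sanity check on the commands this commit touches, here is a minimal sketch of how they might be combined. The script, config, and flags (`--save-dir`, `--prompt`, `--offload True`) are taken verbatim from the README diff above; the idea that the multi-GPU timings in the Computational Efficiency section are obtained simply by raising torchrun's `--nproc_per_node` (with the ColossalAI tensor/sequence parallelism coming from the config) is an assumption, not something this commit confirms.

```bash
# Single-GPU text-to-video run with offloading to reduce peak GPU memory
# (this is the command the commit adds to the README).
torchrun --nproc_per_node 1 --standalone \
    scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py \
    --save-dir samples --prompt "raining, sea" --offload True

# Hypothetical 8-GPU launch corresponding to the "8x GPUs" column of the
# efficiency table: torchrun starts one process per GPU; whether the ColossalAI
# parallelism is picked up from the config file or needs extra flags is assumed.
torchrun --nproc_per_node 8 --standalone \
    scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py \
    --save-dir samples --prompt "raining, sea" --offload True
```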