No changes (#1)
Commit 20d0f11d6b111fc26c6458b51bb21366f1eb7a10
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
+library_name: open-sora
 license: apache-2.0
 pipeline_tag: text-to-video
-library_name: open-sora
 ---
 
 ## Open-Sora: Democratizing Efficient Video Production for All
@@ -83,6 +83,9 @@ Our model is optimized for image-to-video generation, but it can also be used fo
 # Generate one given prompt
 torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea"
 
+# Save memory with offloading
+torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea" --offload True
+
 # Generation with csv
 torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --dataset.data-path assets/texts/example.csv
 ```
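The CSV-based command in the hunk above drives batch generation from a prompt file. A minimal sketch of using it with your own file; the single `text` column is an assumption (the actual schema is whatever the bundled `assets/texts/example.csv` and the dataset config define), and `my_prompts.csv` is a hypothetical filename:

```bash
# Hypothetical prompt file; a lone "text" column is assumed here -- check
# assets/texts/example.csv in the repo for the real schema.
cat > my_prompts.csv <<'EOF'
text
"a drone shot over a snowy mountain ridge"
"raining, sea"
EOF

# Same entry point and flags as the README commands above, pointed at the new CSV.
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py \
    configs/diffusion/inference/t2i2v_256px.py \
    --save-dir samples --dataset.data-path my_prompts.csv
```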
@@ -164,7 +167,7 @@ Use `--num-sample k` to generate `k` samples for each prompt.
 
 ## Computational Efficiency
 
-We test the computational efficiency of text-to-video on H100/H800 GPUs. For 256x256, we use colossalai's tensor parallelism. For 768x768, we use colossalai's sequence parallelism. All runs use 50 sampling steps. The results are reported in the format $\color{blue}{\text{Total time (s)}}/\color{red}{\text{peak GPU memory (GB)}}$.
+We test the computational efficiency of text-to-video on H100/H800 GPUs. For 256x256, we use colossalai's tensor parallelism with `--offload True`. For 768x768, we use colossalai's sequence parallelism. All runs use 50 sampling steps. The results are reported in the format $\color{blue}{\text{Total time (s)}}/\color{red}{\text{peak GPU memory (GB)}}$.
 
 | Resolution | 1x GPU | 2x GPUs | 4x GPUs | 8x GPUs |
 | ---------- | ------ | ------- | ------- | ------- |
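The paragraph above reports timings for 1 to 8 GPUs using ColossalAI tensor/sequence parallelism. A hedged sketch of how such a multi-GPU run might be launched with the same entry point; `--nproc_per_node` is a standard torchrun flag that sets the number of GPU processes, while how the parallelism degree is actually configured is an assumption here (it likely lives in the config file rather than a CLI flag):

```bash
# Hypothetical 8-GPU launch of the 256px config. --offload True follows the
# paragraph above; the tensor/sequence parallel setup is assumed to come from
# the config rather than extra command-line options.
torchrun --nproc_per_node 8 --standalone scripts/diffusion/inference.py \
    configs/diffusion/inference/t2i2v_256px.py \
    --save-dir samples --prompt "raining, sea" --offload True
```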
@@ -177,10 +180,14 @@ On [VBench](https://huggingface.co/spaces/Vchitect/VBench_Leaderboard), Open-Sor
 
 
 
-Human preference results show our model is on par with HunyuanVideo
+Human preference results show our model is on par with HunyuanVideo 11B and Step-Video 30B.
 
 
 
+With strong performance, Open-Sora 2.0 is cost-effective.
+
+
+
 ## Contribution
 
 Thanks goes to these wonderful contributors:
@@ -215,12 +222,18 @@ Here we only list a few of the projects. For other works and datasets, please re
 ## Citation
 
 ```bibtex
-@
-
-
-
-year
-
+@article{opensora,
+  title={Open-sora: Democratizing efficient video production for all},
+  author={Zheng, Zangwei and Peng, Xiangyu and Yang, Tianji and Shen, Chenhui and Li, Shenggui and Liu, Hongxin and Zhou, Yukun and Li, Tianyi and You, Yang},
+  journal={arXiv preprint arXiv:2412.20404},
+  year={2024}
+}
+
+@article{opensora2,
+  title={Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k},
+  author={Xiangyu Peng and Zangwei Zheng and Chenhui Shen and Tom Young and Xinying Guo and Binluo Wang and Hang Xu and Hongxin Liu and Mingyan Jiang and Wenjun Li and Yuhui Wang and Anbang Ye and Gang Ren and Qianran Ma and Wanying Liang and Xiang Lian and Xiwen Wu and Yuting Zhong and Zhuangyan Li and Chaoyu Gong and Guojun Lei and Leijun Cheng and Limin Zhang and Minghao Li and Ruijie Zhang and Silan Hu and Shijie Huang and Xiaokang Wang and Yuanheng Zhao and Yuqi Wang and Ziang Wei and Yang You},
+  year={2025},
+  journal={arXiv preprint arXiv:2503.09642},
 }
 ```
 