|
---
license: mit
base_model:
- Shitao/OmniGen-v1
pipeline_tag: text-to-image
tags:
- image-to-image
---
|
|
|
|
|
This repo contains bitsandbytes 4-bit NF4, float16 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1). They are intended for Google Colab users and anyone whose GPU does not support bfloat16. If your GPU does support bfloat16, use the [bf16-bnb-4bit](https://huggingface.co/gryan/OmniGen-v1-bnb-4bit) weights instead, as they produce higher-quality images. For more about OmniGen, see the [original model card](https://huggingface.co/Shitao/OmniGen-v1).
|
|
|
|
|
Other quantized variants:

- 8-bit weights: [gryan/OmniGen-v1-bnb-8bit](https://huggingface.co/gryan/OmniGen-v1-bnb-8bit)
- 4-bit (bf16, nf4) weights: [gryan/OmniGen-v1-bnb-4bit](https://huggingface.co/gryan/OmniGen-v1-bnb-4bit)
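
If you're unsure which variant your GPU can use, you can check bfloat16 support directly in PyTorch (`torch.cuda.is_bf16_supported()` is a standard PyTorch API; Colab's free-tier T4, for example, reports `False`, which is why this fp16 repo exists):

```python
import torch

# True on Ampere (A100, RTX 30xx) and newer GPUs; False on older cards like Colab's T4.
# False -> use this fp16 repo; True -> prefer the bf16-bnb-4bit weights above.
print(torch.cuda.is_bf16_supported())
```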
|
|
|
|
|
## Usage |
|
Set up your environment by following the original [Quick Start Guide](https://huggingface.co/Shitao/OmniGen-v1#5-quick-start) before getting started. |
|
|
|
> [!IMPORTANT]
> This feature is not officially supported yet; you'll need to install OmniGen from [this pull request](https://github.com/VectorSpaceLab/OmniGen/pull/151).
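
One common way to install directly from an unmerged pull request is to point `pip` at GitHub's pull-request ref. This is a sketch, assuming PR #151 is still open; check the PR itself for the maintainers' preferred instructions:

```bash
# GitHub exposes every pull request as refs/pull/<number>/head
pip install git+https://github.com/VectorSpaceLab/OmniGen.git@refs/pull/151/head
```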
|
|
|
```python
import torch
from OmniGen import OmniGenPipeline, OmniGen

# load the quantized model and pass it into the pipeline
model = OmniGen.from_pretrained('gryan/OmniGen-v1-fp16-bnb-4bit', dtype=torch.float16)
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", model=model)

# proceed as normal!

## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

## Multi-modal to Image
# In the prompt, use a placeholder of the form <img><|image_*|></img> to represent each input image.
# You can pass multiple images via input_images; make sure each image has its own placeholder.
# For example, for input_images [img1_path, img2_path], the prompt needs two placeholders:
# <img><|image_1|></img> and <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png")  # save output PIL Image
```
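
To sanity-check that the 4-bit weights actually loaded (rather than silently falling back to full precision), PyTorch's memory counters give a quick signal. This is a minimal sketch using standard `torch.cuda` calls; exact numbers will vary by GPU and pipeline overhead:

```python
import torch

# A 4-bit quantized OmniGen should hold far less GPU memory than the full fp16 model.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
```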
|
|
|
## Image Samples |
|
<img src="./assets/text_only_1111_fp16_4bit.png" alt="Text Only FP16 4bit"> |
|
<img src="./assets/single_img_1111_fp16_4bit.png" alt="Single Image FP16 4bit"> |
|
<img src="./assets/double_img_1111_fp16_4bit.png" alt="Double Image FP16 4bit"> |