---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
---

# LaViDa-LLaDa v1.0 Instruct (Transformers-Compatible)

[[Github]](https://github.com/jacklishufan/LaViDa) [[Paper]](paper/paper.pdf) [[Arxiv]](https://arxiv.org/abs/2505.16839) [[Checkpoints]](https://huggingface.co/collections/jacklishufan/lavida-10-682ecf5a5fa8c5df85c61ded) [[Data]](https://huggingface.co/datasets/jacklishufan/lavida-train) [[Website]](https://homepage.jackli.org/projects/lavida/)

This is a transformers-compatible version of the LaViDa-LLaDa checkpoint. It can be loaded directly through the Hugging Face `transformers` API for easier inference and integration.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model from a local copy of the checkpoint
tokenizer = AutoTokenizer.from_pretrained('./lavida-llada-v1.0-instruct/')
model = AutoModelForCausalLM.from_pretrained(
    './lavida-llada-v1.0-instruct/',
    torch_dtype=torch.bfloat16,
)

# The image processor is exposed through the model's vision tower
image_processor = model.get_vision_tower().image_processor

# Align the embedding table with the tokenizer and re-tie the weights
model.resize_token_embeddings(len(tokenizer))
model.tie_weights()
```

## License

Apache 2.0
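
## Notes on preparing inputs

The sketch below continues from the Usage snippet above and shows one way an image and a text prompt could be preprocessed with the loaded `image_processor` and `tokenizer`. The file name, prompt text, and prompt format are placeholders, not the repository's official preprocessing; LaViDa's actual chat template, image handling, and diffusion-based decoding live in the GitHub repository linked above.

```python
from PIL import Image

# Hypothetical example image and prompt; the real prompt template used by
# LaViDa is defined in the official codebase and may differ.
image = Image.open('example.jpg').convert('RGB')
prompt = 'Describe this image.'

# Preprocess the image with the checkpoint's image processor
pixel_values = image_processor(images=image, return_tensors='pt')['pixel_values']
pixel_values = pixel_values.to(model.device, dtype=torch.bfloat16)

# Tokenize the text prompt
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)
```

Since LaViDa builds on a diffusion language model (LLaDa), generation is typically driven by the decoding utilities in the official repository rather than a plain `model.generate` call; refer to the GitHub link above for end-to-end inference scripts.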