---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
---

# LaViDa-LLaDa v1.0 Instruct (Transformers-Compatible)

[[Github]](https://github.com/jacklishufan/LaViDa) [[Paper]](paper/paper.pdf) [[Arxiv]](https://arxiv.org/abs/2505.16839) [[Checkpoints]](https://huggingface.co/collections/jacklishufan/lavida-10-682ecf5a5fa8c5df85c61ded) [[Data]](https://huggingface.co/datasets/jacklishufan/lavida-train) [[Website]](https://homepage.jackli.org/projects/lavida/)

This is a transformers-compatible version of the LaViDa-LLaDa checkpoint. It can be loaded directly through the Hugging Face `transformers` API for easier inference and integration.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model from a local copy of the checkpoint
tokenizer = AutoTokenizer.from_pretrained('./lavida-llada-v1.0-instruct/')
model = AutoModelForCausalLM.from_pretrained(
    './lavida-llada-v1.0-instruct/',
    torch_dtype=torch.bfloat16,
)

# The image processor is exposed through the model's vision tower
image_processor = model.get_vision_tower().image_processor

# Align the embedding table with the tokenizer and re-tie the weights
model.resize_token_embeddings(len(tokenizer))
model.tie_weights()
```

## License

Apache 2.0
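
## Notes on preparing inputs

The sketch below continues from the Usage snippet above and shows one way an image and a text prompt could be preprocessed with the loaded `image_processor` and `tokenizer`. The file name, prompt text, and prompt format are placeholders, not the repository's official preprocessing; LaViDa's actual chat template, image handling, and diffusion-based decoding live in the GitHub repository linked above.

```python
from PIL import Image

# Hypothetical example image and prompt; the real prompt template used by
# LaViDa is defined in the official codebase and may differ.
image = Image.open('example.jpg').convert('RGB')
prompt = 'Describe this image.'

# Preprocess the image with the checkpoint's image processor
pixel_values = image_processor(images=image, return_tensors='pt')['pixel_values']
pixel_values = pixel_values.to(model.device, dtype=torch.bfloat16)

# Tokenize the text prompt
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)
```

Since LaViDa builds on a diffusion language model (LLaDa), generation is typically driven by the decoding utilities in the official repository rather than a plain `model.generate` call; refer to the GitHub link above for end-to-end inference scripts.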