

[![CODE](https://img.shields.io/badge/GitHub-Repository-<COLOR>)](https://github.com/bfshi/scaling_on_scales)

# When Do We Not Need Larger Vision Models?

## Model

This is a LLaVA-v1.5-7b model trained with [S<sup>2</sup>-Wrapper](https://github.com/bfshi/scaling_on_scales), a simple method that lets any vision model perceive high-resolution images. This model uses input images of up to 1008×1008 pixels.
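As we understand the S<sup>2</sup> approach, the same pretrained vision model is run on several rescaled copies of the image. Each larger copy is split into tiles at the model's native resolution, the per-tile feature maps are stitched and average-pooled back to the base grid size, and the results from all scales are concatenated along the channel dimension. The NumPy sketch below illustrates that data flow under stated assumptions: `toy_backbone` is a hypothetical stand-in for the vision encoder (it only averages RGB values per 14×14 patch), nearest-neighbour resizing replaces proper interpolation, and the scales (336, 672, 1008) are an assumption chosen to end at this model's 1008×1008 maximum. It is not the code of the `scaling_on_scales` repository.

```python
import numpy as np


def toy_backbone(img: np.ndarray, patch: int = 14) -> np.ndarray:
    """Hypothetical encoder stand-in: (H, W, 3) image -> (H/patch, W/patch, 3) features.

    Each output "token" is just the mean RGB value of one patch.
    """
    h, w, c = img.shape
    return img.reshape(h // patch, patch, w // patch, patch, c).mean(axis=(1, 3))


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize to (size, size); real pipelines interpolate properly."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def avg_pool(feat: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) feature map by an integer factor in both spatial dims."""
    h, w, c = feat.shape
    return feat.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def s2_features(img, backbone, base=336, scales=(336, 672, 1008)):
    """Multi-scale features in the spirit of S^2 (illustrative sketch only)."""
    per_scale = []
    for size in scales:
        if size % base:
            raise ValueError("each scale must be a multiple of the base resolution")
        k = size // base  # tiles per side at this scale
        big = resize_nearest(img, size)
        # Split into k x k base-resolution tiles and encode each tile separately.
        rows = []
        for i in range(k):
            tiles = [
                backbone(big[i * base:(i + 1) * base, j * base:(j + 1) * base])
                for j in range(k)
            ]
            rows.append(np.concatenate(tiles, axis=1))
        stitched = np.concatenate(rows, axis=0)  # (k*g, k*g, C)
        per_scale.append(avg_pool(stitched, k))  # pooled back to (g, g, C)
    # Concatenate scales along channels: (g, g, len(scales) * C)
    return np.concatenate(per_scale, axis=-1)
```

For a 336-pixel base and 14-pixel patches the grid is 24×24, so `s2_features(img, toy_backbone)` returns an array of shape `(24, 24, 9)`: the spatial token count stays the same as at the base resolution, while the channel width grows with the number of scales. Whatever consumes these features must therefore accept the wider input.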

## Training

The training pipeline and datasets are the same as those of [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA/tree/main). The model is fine-tuned with LoRA.
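LoRA (low-rank adaptation) keeps each pretrained weight matrix `W` frozen and trains only a low-rank update, so an adapted layer computes `W x + (alpha / r) * B A x`, where `A` has shape `r × d_in` and `B` has shape `d_out × r`. The NumPy sketch below shows that computation and how the update can be merged back into `W` after training. It is a generic illustration of the technique, not the PEFT library's API or LLaVA's training code; the class name `LoRALinear` and the rank, alpha, and initialization values are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Minimal LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).

    W stays frozen; only A and B would receive gradients during training.
    """

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                            # frozen, (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))                   # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        """Fold the low-rank update into W so inference needs no extra matmuls."""
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly. The saving in trainable parameters is large: for a 4096×4096 weight and `r=8`, the adapter has 2 · 8 · 4096 = 65,536 trainable parameters versus 16,777,216 in `W`.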

## Benchmarking

| Version | Size | Schedule | Checkpoint | VQAv2 | VizWiz | TextVQA | MMMU-val | MathVista | MM-Bench | SEED | MM-Vet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA-1.5 | 7B | full_ft-1e | [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 78.5 | 50.0 | 58.2 | 36.2 | 25.2 | 64.3 | 65.7 | 31.1 |
| LLaVA-1.5 | 7B | lora-1e | [liuhaotian/llava-v1.5-7b-lora](https://huggingface.co/liuhaotian/llava-v1.5-7b-lora) | 79.1 | 47.8 | 58.2 | - | - | 66.1 | - | 30.2 |
| LLaVA-1.5-S2 | 7B | lora-1e | this model | 80.0 | 50.1 | 61.0 | 37.7 | 25.3 | 66.2 | 67.9 | 32.4 |

Schedules: `full_ft-1e` is full fine-tuning for one epoch and `lora-1e` is LoRA fine-tuning for one epoch. A `-` means the score is not reported.
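Since this model and the `liuhaotian/llava-v1.5-7b-lora` baseline share the same `lora-1e` schedule, their difference is the most like-for-like comparison in the table. A small script such as the following computes the per-benchmark gains; it only reuses the scores from the table and skips benchmarks the baseline does not report. The names `gains`, `LORA`, and `LORA_S2` are illustrative.

```python
# Scores copied from the benchmark table; None marks unreported results ("-").
BENCHMARKS = ["VQAv2", "VizWiz", "TextVQA", "MMMU-val", "MathVista", "MM-Bench", "SEED", "MM-Vet"]
LORA = [79.1, 47.8, 58.2, None, None, 66.1, None, 30.2]
LORA_S2 = [80.0, 50.1, 61.0, 37.7, 25.3, 66.2, 67.9, 32.4]


def gains(baseline, candidate, names):
    """Return {benchmark: candidate - baseline}, skipping unreported baseline scores."""
    return {
        name: round(c - b, 1)
        for name, b, c in zip(names, baseline, candidate)
        if b is not None
    }


print(gains(LORA, LORA_S2, BENCHMARKS))
# → {'VQAv2': 0.9, 'VizWiz': 2.3, 'TextVQA': 2.8, 'MM-Bench': 0.1, 'MM-Vet': 2.2}
```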



## License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.