VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP

	---
	library_name: transformers
	tags:
	- comics
	license: cc-by-sa-4.0
	datasets:
	- VLR-CVC/ComicsPAP
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-VL-7B-Instruct
	---

	# LoRA Fine-Tune of Qwen2.5-VL-7B-Instruct on the ComicsPAP dataset

	[Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned with LoRA on all five tasks of the [ComicsPAP](https://huggingface.co/datasets/VLR-CVC/ComicsPAP) dataset simultaneously.
	Training used the AdamW optimizer with a constant learning rate of 2e-4 and ran for 5k steps with an effective batch size of 128. The LoRA configuration used α = 16, a dropout rate of 0.05, and rank r = 8.
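
As a rough illustration of what these numbers imply, the sketch below collects the stated hyperparameters in a small container and derives two quantities from them: the LoRA scaling factor and the number of training samples seen. The class and field names are hypothetical and are not taken from the training code. The scaling formula assumes standard LoRA (α / r), not a rank-stabilized variant. The keys returned by `as_peft_kwargs` follow the argument names of `peft.LoraConfig`; the adapted target modules are not stated in this card, so they are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneSettings:
    """Hyperparameters stated in this model card (names are illustrative)."""

    learning_rate: float = 2e-4      # constant schedule, AdamW optimizer
    train_steps: int = 5_000
    effective_batch_size: int = 128
    lora_rank: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA, where the low-rank update is scaled by alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # One optimizer step consumes one effective batch.
        return self.train_steps * self.effective_batch_size

    def as_peft_kwargs(self) -> dict:
        # Keys match peft.LoraConfig argument names; target modules are
        # not given in the card and are deliberately omitted.
        return {
            "r": self.lora_rank,
            "lora_alpha": self.lora_alpha,
            "lora_dropout": self.lora_dropout,
        }


settings = FinetuneSettings()
print(settings.lora_scaling)   # alpha / r
print(settings.samples_seen)   # steps * effective batch size
```

With the values above, the adapter update is scaled by 2.0 and training covers 640,000 samples in total (counting repeats, if the data was cycled).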

	## Results
	| Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) |
	| :------------------------: | :---------------------------------------------------------------------------------: | :------------------: | :---------------------: | :----------------: | :--------------: | :-------------------: | :-------: |
	| Random | | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30 |
	| Qwen2.5-VL-3B (Zero-Shot) | [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61 |
	| Qwen2.5-VL-7B (Zero-Shot) | [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91 |
	| Qwen2.5-VL-72B (Zero-Shot) | [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27 |
	| Qwen2.5-VL-3B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP) | 62.21 | 93.01 | 42.33 | 63.71 | 35.49 | 55.55 |
	| Qwen2.5-VL-7B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP) | 69.08 | 93.01 | 42.00 | 74.90 | 49.62 | 62.31 |
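
To make the effect of fine-tuning easier to read off the table, the following sketch computes the per-task gain of each fine-tuned model over its zero-shot counterpart, in percentage points. The scores are copied from the table; the dictionary keys and the `gain` helper are illustrative names, not part of any released evaluation code.

```python
# Accuracy (%) copied from the results table; column order is fixed.
COLUMNS = [
    "Sequence Filling", "Character Coherence", "Visual Closure",
    "Text Closure", "Caption Relevance", "Total",
]

SCORES = {
    "3B zero-shot": [27.48, 48.95, 21.33, 27.41, 32.82, 29.61],
    "3B LoRA":      [62.21, 93.01, 42.33, 63.71, 35.49, 55.55],
    "7B zero-shot": [30.53, 54.55, 22.00, 37.45, 40.84, 34.91],
    "7B LoRA":      [69.08, 93.01, 42.00, 74.90, 49.62, 62.31],
}


def gain(before: str, after: str) -> dict:
    """Per-column gain in percentage points, rounded to two decimals."""
    return {
        col: round(a - b, 2)
        for col, b, a in zip(COLUMNS, SCORES[before], SCORES[after])
    }


for size in ("3B", "7B"):
    deltas = gain(f"{size} zero-shot", f"{size} LoRA")
    for col, delta in deltas.items():
        print(f"{size} {col:>20}: {delta:+.2f}")
```

For the 7B model this gives a gain of 27.40 points on Total. Caption Relevance shows the smallest gain at both sizes (2.67 points for 3B, 8.78 points for 7B).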

	## Citation

	BibTeX:
	```
	@misc{vivoli2025comicspap,
	title={ComicsPAP: understanding comic strips by picking the correct panel},
	author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Souibgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
	year={2025},
	eprint={2503.08561},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2503.08561},
	}

	@misc{qwen2.5-VL,
	title = {Qwen2.5-VL},
	url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
	author = {Qwen Team},
	month = {January},
	year = {2025}
	}
	```