---
library_name: transformers
tags:
- comics
license: cc-by-sa-4.0
datasets:
- VLR-CVC/ComicsPAP
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# LoRA Fine-Tune of Qwen2.5-VL-7B-Instruct on the ComicsPAP dataset

[Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned simultaneously on all five tasks of the [ComicsPAP](https://huggingface.co/datasets/VLR-CVC/ComicsPAP) dataset.

Training used the AdamW optimizer with a constant learning rate of 2e-4. The model was trained for 5k steps with an effective batch size of 128. The LoRA configuration used α = 16, a dropout rate of 0.05, and rank r = 8.

## Results

| Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Random | | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30 |
| Qwen2.5-VL-3B (Zero-Shot) | [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61 |
| Qwen2.5-VL-7B (Zero-Shot) | [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91 |
| Qwen2.5-VL-72B (Zero-Shot) | [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27 |
| Qwen2.5-VL-3B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP) | 62.21 | **93.01** | **42.33** | 63.71 | 35.49 | 55.55 |
| Qwen2.5-VL-7B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP) | **69.08** | **93.01** | 42.00 | **74.90** | **49.62** | **62.31** |

## Citation

**BibTeX:**

```
@misc{vivoli2025comicspap,
  title={ComicsPAP: understanding comic strips by picking the correct panel},
  author={Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Souibgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
  year={2025},
  eprint={2503.08561},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.08561},
}

@misc{qwen2.5-VL,
  title = {Qwen2.5-VL},
  url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
  author = {Qwen Team},
  month = {January},
  year = {2025}
}
```
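
## Training configuration sketch

As a rough illustration of the training setup described in this card, the snippet below collects the stated hyperparameters in a small Python dataclass and derives a few quantities from them. It is a minimal sketch, not the authors' training code: the class and function names are hypothetical, the `alpha / r` scaling assumes standard (non rank-stabilized) LoRA, and the 1024-dimensional layer in the usage example is an arbitrary size, not a dimension of Qwen2.5-VL. The dictionary keys mirror the `r`, `lora_alpha`, and `lora_dropout` arguments of PEFT's `LoraConfig`; the target modules are not stated in this card and are therefore left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTrainingSetup:
    """Hyperparameters quoted in this model card (class name is illustrative)."""

    learning_rate: float = 2e-4       # constant schedule with AdamW
    train_steps: int = 5_000          # "5k steps"
    effective_batch_size: int = 128
    lora_rank: int = 8                # r
    lora_alpha: int = 16              # alpha
    lora_dropout: float = 0.05

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA, where the low-rank update is scaled by alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # Training examples processed over the whole run.
        return self.train_steps * self.effective_batch_size

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A rank-r adapter on a (d_out x d_in) weight adds r * (d_in + d_out) parameters.
        return self.lora_rank * (d_in + d_out)


def to_peft_kwargs(setup: LoraTrainingSetup) -> dict:
    # Hypothetical helper: keyword arguments one could pass to peft.LoraConfig.
    return {
        "r": setup.lora_rank,
        "lora_alpha": setup.lora_alpha,
        "lora_dropout": setup.lora_dropout,
    }


setup = LoraTrainingSetup()
print(setup.lora_scaling)                # alpha / r
print(setup.samples_seen)                # steps * effective batch size
print(setup.adapter_params(1024, 1024))  # arbitrary example layer size
print(to_peft_kwargs(setup))
```

With the values from this card, the adapter update is scaled by 16 / 8 = 2 and the run processes 5,000 × 128 = 640,000 training examples.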