---
library_name: transformers
tags:
- comics
license: cc-by-sa-4.0
datasets:
- VLR-CVC/ComicsPAP
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# LoRA Fine-Tune of Qwen2.5-VL-7B-Instruct on the ComicsPAP Dataset

[Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned simultaneously on all five tasks of the [ComicsPAP](https://huggingface.co/datasets/VLR-CVC/ComicsPAP) dataset.

The model was trained with the AdamW optimizer at a constant learning rate of 2e-4 for 5k steps, using an effective batch size of 128. The LoRA configuration used a rank r = 8, a scaling parameter α = 16, and a dropout rate of 0.05.

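As a rough illustration of what these hyperparameters imply, the sketch below collects them in a plain Python dataclass and derives two quantities: the LoRA scaling factor α/r, which in the standard LoRA formulation multiplies the low-rank update BA before it is added to the frozen weight, and the total number of training examples processed (steps × effective batch size). The class and field names are hypothetical and do not mirror the actual training script; we also assume "5k steps" means exactly 5,000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTrainingConfig:
    """Hypothetical container for the hyperparameters reported in this card."""

    learning_rate: float = 2e-4      # constant schedule, AdamW optimizer
    max_steps: int = 5_000           # "5k steps", assumed to be exactly 5,000
    effective_batch_size: int = 128  # split across devices/accumulation is not stated
    lora_rank: int = 8               # r
    lora_alpha: int = 16             # alpha
    lora_dropout: float = 0.05

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def examples_seen(self) -> int:
        # Total training examples processed over the whole run.
        return self.max_steps * self.effective_batch_size


cfg = LoraTrainingConfig()
print(f"LoRA scaling alpha/r: {cfg.lora_scaling}")     # 2.0
print(f"Examples processed:   {cfg.examples_seen:,}")  # 640,000
```

With the PEFT library, the LoRA values would map onto `LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05)`, whose default scaling is `lora_alpha / r`; the target modules used for this adapter are not stated in this card.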
## Results

| Model | Repo | Sequence Filling (%) | Character Coherence (%) | Visual Closure (%) | Text Closure (%) | Caption Relevance (%) | Total (%) |
| :------------------------: | :---------------------------------------------------------------------------------: | :------------------: | :---------------------: | :----------------: | :--------------: | :-------------------: | :-------: |
| Random | | 20.22 | 50.00 | 14.41 | 25.00 | 25.00 | 24.30 |
| Qwen2.5-VL-3B (Zero-Shot) | [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 27.48 | 48.95 | 21.33 | 27.41 | 32.82 | 29.61 |
| Qwen2.5-VL-7B (Zero-Shot) | [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 30.53 | 54.55 | 22.00 | 37.45 | 40.84 | 34.91 |
| Qwen2.5-VL-72B (Zero-Shot) | [Qwen/Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) | 46.88 | 53.84 | 23.66 | 55.60 | 38.17 | 41.27 |
| Qwen2.5-VL-3B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-3B-Instruct-lora-ComicsPAP) | 62.21 | **93.01** | **42.33** | 63.71 | 35.49 | 55.55 |
| Qwen2.5-VL-7B (LoRA Fine-Tuned) | [VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP](https://huggingface.co/VLR-CVC/Qwen2.5-VL-7B-Instruct-lora-ComicsPAP) | **69.08** | **93.01** | 42.00 | **74.90** | **49.62** | **62.31** |

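To make the effect of fine-tuning explicit, the snippet below recomputes the absolute gains, in percentage points, of the LoRA fine-tuned 7B model over its zero-shot counterpart, using only the numbers in the results table. The Total column is taken as reported rather than recomputed, since it is not the unweighted mean of the five task columns (for the Random row that mean would be 26.93, not 24.30). The helper name `gains` is illustrative only.

```python
# Scores (%) copied from the results table.
TASKS = [
    "Sequence Filling",
    "Character Coherence",
    "Visual Closure",
    "Text Closure",
    "Caption Relevance",
    "Total",
]
zero_shot_7b = [30.53, 54.55, 22.00, 37.45, 40.84, 34.91]
lora_7b = [69.08, 93.01, 42.00, 74.90, 49.62, 62.31]


def gains(baseline, finetuned):
    """Absolute improvement in percentage points, rounded to 2 decimals."""
    return [round(f - b, 2) for b, f in zip(baseline, finetuned)]


for task, delta in zip(TASKS, gains(zero_shot_7b, lora_7b)):
    print(f"{task:<20} +{delta:.2f} pp")
```

The largest gains appear on Sequence Filling (+38.55), Character Coherence (+38.46) and Text Closure (+37.45), while Caption Relevance improves by only +8.78 points.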
## Citation

**BibTeX:**

```
@misc{vivoli2025comicspap,
  title = {ComicsPAP: understanding comic strips by picking the correct panel},
  author = {Emanuele Vivoli and Artemis Llabrés and Mohamed Ali Soubgui and Marco Bertini and Ernest Valveny Llobet and Dimosthenis Karatzas},
  year = {2025},
  eprint = {2503.08561},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2503.08561},
}

@misc{qwen2.5-VL,
  title = {Qwen2.5-VL},
  url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
  author = {Qwen Team},
  month = {January},
  year = {2025}
}
```