Gemma results weirdness
#18 by louisglobal - opened
Hi,
The Gemma 3 results are inconsistent between the Gemma 3 paper and this eval toolkit. The paper (https://arxiv.org/pdf/2503.19786) claims a score of 68.8 on ChartQA, versus 33.7 on the leaderboard. To be honest, I was not able to reproduce either number, since inference simply does not work on Gemma with any dataset when using the code from the VLMEval git repository.