Gemma results weirdness

#18
by louisglobal - opened

Hi,
The Gemma 3 results in the paper and in this eval toolkit are inconsistent. The paper (https://arxiv.org/pdf/2503.19786) reports a score of 68.8 on ChartQA, versus 33.7 on the leaderboard. To be honest, I was not able to reproduce either number, since inference simply does not work for Gemma on any dataset with the code from the VLMEval git repository.
