Here’s a cool paper I found: “Massive Image Embedding Benchmark (MIEB).” It is a new tool to test how good image embedding models are. It has 130 different tasks grouped into 8 categories, like image search, classification, clustering similar images, answering questions based on images, and understanding documents. It even covers 38 different languages.
The authors tested 50 models and found that no single model was best at everything. Some models were great at recognizing text inside images but struggled to handle complicated tasks like matching images and text that appear together.