parthpk commited on
Commit
9509736
·
verified ·
1 Parent(s): d2d57d8

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +25 -3
README.md CHANGED
@@ -1,3 +1,25 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+
5
+ <h1 align="left"> GAEA: A Geolocation Aware Conversational Model</h1>
6
+
7
+ <h3 align="left"> Summary</h3>
8
+
9
+ <p align="justify"> Image geolocalization, in which, traditionally, an AI model predicts the precise GPS coordinates of an image is a challenging task with many downstream applications. However, the user cannot utilize the model to further their knowledge other than the GPS coordinate; the model lacks an understanding of the location and the conversational ability to communicate with the user. In recent days, with tremendous progress of large multimodal models (LMMs)—proprietary and open-source—researchers attempted to geolocalize images via LMMs. However, the issues remain unaddressed; beyond general tasks, for more specialized downstream tasks, one of which is geolocalization, LMMs struggle. In this work, we propose to solve this problem by introducing a conversational model `GAEA` that can provide information regarding the location of an image, as required by a user. No large-scale dataset enabling the training of such a model exists. Thus we propose a comprehensive dataset `GAEA-Train` with 800K images and around 1.6M question-answer pairs constructed by leveraging OpenStreetMap (OSM) attributes and geographical context clues. For quantitative evaluation, we propose a diverse benchmark, `GAEA-Bench` comprising 4K image-text pairs to evaluate conversational capabilities equipped with diverse question types. We consider 11 state-of-the-art open-source and proprietary LMMs and demonstrate that `GAEA` significantly outperforms the best open-source model, LLaVA-OneVision by 25.69% and best proprietary model, GPT-4o by 8.28%. We will publicly release our dataset and codes. </p>
10
+
11
+ ## `GAEA` is the first open-source conversational model for conversational capabilities equipped with global-scale geolocalization.
12
+
13
+ **Main contributions:**
14
+ 1) **`GAEA-Train: A Diverse Training Dataset:`** We propose GAEA-Train, a new dataset designed for training conversational image geolocalization models, incorporating diverse visual and contextual data.
15
+ 2) **`GAEA-Bench: Evaluating Conversational Geolocalization:`** To assess conversational capabilities in geolocalization, we introduce GAEA-Bench, a benchmark featuring various question-answer formats.
16
+ 3) **`GAEA: An Interactive Geolocalization Chatbot:`** We present GAEA, a conversational chatbot that extends beyond geolocalization to provide rich contextual insights about locations from images.
17
+ 4) **`Benchmarking Against State-of-the-Art LMMs:`** We quantitatively compare our model’s performance against 8 open-source and 3 proprietary LMMs, including GPT-4o and Gemini-2.0-Flash.
18
+
19
+ <b> This page is dedicated to the GAEA model </b>
20
+
21
+ <p align="center">
22
+ <img src="Assets/GeoLLM-Bench.jpg" alt="Geo-LLM-Bench"></a>
23
+ </p>
24
+
25
+ <h2 align="left"> GAEA-Bench Curation Pipeline</h2>