---
title: "Vanishing Voices: Language Atlas"
emoji: "🌍"
colorFrom: "indigo"
colorTo: "blue"
sdk: "streamlit"
sdk_version: "1.29.0"
app_file: "rag_hf.py"
pinned: false
---
# Vanishing Voices: South America's Endangered Language Atlas 🌍

This app explores three retrieval-augmented generation (RAG) methods to support the documentation of South America's endangered indigenous languages:

- **Standard Search**: Based on Wikipedia/Wikidata embeddings only.
- **Hybrid Search**: Combines embeddings with RDF cultural knowledge.
- **GraphSAGE Search**: Includes structural information from a graph neural network.
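
As a rough illustration of the embedding lookup behind Standard Search, the sketch below ranks documents by cosine similarity against a query vector. It is a minimal sketch rather than the app's actual code: the function name `top_k_cosine`, the toy vectors, and the file name mentioned in the comment are all assumptions, and it presumes no embedding has zero norm.

```python
import numpy as np


def top_k_cosine(query_vec, doc_matrix, k=3):
    """Return (indices, scores) of the k rows of doc_matrix closest to query_vec."""
    # Normalise both sides so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    # Sort by descending similarity and keep the first k positions.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy stand-in for embeddings that would normally come from a local .npy file,
# e.g. np.load("embeddings.npy") (hypothetical file name).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])

idx, sims = top_k_cosine(query, docs, k=2)
print(idx.tolist(), sims.round(3).tolist())
```

Hybrid and GraphSAGE Search would presumably reuse the same ranking step with vectors that also carry RDF or graph information; how the app combines them is not specified here.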

## 🧠 Powered by

- 🤗 [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints)
- 🧱 SentenceTransformers for multilingual embeddings
- 🧮 NetworkX + RDFLib for cultural graphs
- 🔗 Glottolog, Wikidata, Wikipedia

## 📊 Features

- RAG with local NumPy embeddings
- RDF triple inspection
- Comparison of methods in terms of relevance and hallucination
- Custom prompt injected into a Hugging Face endpoint

> Note: This app requires your own HF API token in `.streamlit/secrets.toml`.

## 📄 Instructions

1. Upload your own `.ttl`, `.pkl`, and `.npy` files for the graph and embeddings.
2. Set up `HF_ENDPOINT` and `HF_API_TOKEN` in `.streamlit/secrets.toml`.
3. Deploy via Streamlit or Hugging Face Spaces.
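
To show how the two secrets from step 2 might be used, here is a minimal sketch that wraps retrieved passages in a prompt and prepares a request for the endpoint. The helper names `build_prompt` and `build_request`, the prompt wording, the placeholder URL and token, and the `inputs`/`parameters` payload shape (the usual form for text-generation endpoints) are assumptions, not the app's confirmed implementation.

```python
import json
import urllib.request


def build_prompt(question, passages):
    """Join retrieved passages into a context block followed by the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def build_request(endpoint, token, prompt, max_new_tokens=256):
    """Prepare (but do not send) a POST request for a text-generation endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; in the app these would come from .streamlit/secrets.toml.
prompt = build_prompt(
    "Which sources describe this language?",
    ["retrieved passage 1", "retrieved passage 2"],
)
req = build_request("https://example.invalid/endpoint", "hf_placeholder", prompt)

# Sending is left out so the sketch runs offline:
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
```

Inside a Streamlit app, the two values can be read with `st.secrets["HF_ENDPOINT"]` and `st.secrets["HF_API_TOKEN"]` instead of being hard-coded.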

---