---
title: 'Vanishing Voices: Language Atlas'
emoji: 🌎
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.44.1
app_file: rag_hf.py
pinned: false
---

# Vanishing Voices: South America's Endangered Language Atlas

This app compares three retrieval-augmented generation (RAG) methods for supporting the documentation of South America's endangered indigenous languages:

- **Standard Search**: Retrieves context using only embeddings of Wikipedia/Wikidata content.
- **Hybrid Search**: Combines those embeddings with cultural knowledge stored as RDF triples.
- **GraphSAGE Search**: Also incorporates structural information from a graph neural network (GraphSAGE).
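
The two simpler retrieval modes can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the code in `rag_hf.py`: the toy 3-dimensional vectors stand in for SentenceTransformers embeddings, the `spokenIn` predicate and plain-string subjects are hypothetical stand-ins for the RDF graph, and the GraphSAGE variant (which would also draw on node embeddings) is not shown.

```python
import math

Triple = tuple[str, str, str]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Standard Search (sketch): rank documents by cosine similarity to the query."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


def with_rdf_context(doc_ids: list[str], triples: list[Triple]) -> dict[str, list[Triple]]:
    """Hybrid Search (sketch): attach the RDF triples whose subject is a retrieved document."""
    return {doc_id: [t for t in triples if t[0] == doc_id] for doc_id in doc_ids}


# Toy "embeddings" (hypothetical); real ones would come from a SentenceTransformers model.
docs = {
    "Aymara": [0.9, 0.1, 0.0],
    "Kawésqar": [0.1, 0.9, 0.2],
    "Yagán": [0.2, 0.8, 0.3],
}
# Hypothetical triples; a real graph would use full IRIs loaded from the .ttl file.
triples = [
    ("Kawésqar", "spokenIn", "Chile"),
    ("Yagán", "spokenIn", "Chile"),
    ("Aymara", "spokenIn", "Bolivia"),
]

hits = top_k([0.0, 1.0, 0.2], docs, k=2)
print(hits)  # ['Kawésqar', 'Yagán']
print(with_rdf_context(hits, triples))
```

Sorting every document is fine for a toy corpus; with NumPy embeddings the same ranking would typically be a single matrix-vector product followed by `argsort`.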

## Powered by

- 🤗 [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints)
- 🧱 SentenceTransformers for multilingual embeddings
- 🧮 NetworkX and RDFLib for the cultural knowledge graphs
- Glottolog, Wikidata, and Wikipedia as data sources

## Features

- RAG over locally stored NumPy embeddings (`.npy`)
- Inspection of RDF triples
- Comparison of the three methods in terms of relevance and hallucination
- Custom prompts sent to a Hugging Face Inference Endpoint
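
A custom prompt can combine retrieved passages with RDF triples before it is sent to the endpoint. The sketch below is hypothetical: the prompt template, `build_prompt`, `build_request`, and the placeholder URL and token are illustrations, and the `{"inputs": ..., "parameters": ...}` payload assumes the endpoint serves a text-generation model. It only builds the request, so it runs offline.

```python
import json
import urllib.request

Triple = tuple[str, str, str]


def build_prompt(question: str, passages: list[str], triples: list[Triple]) -> str:
    """Assemble a grounded RAG prompt from retrieved passages and RDF triples."""
    context = "\n".join(f"- {passage}" for passage in passages)
    facts = "\n".join(f"- {subj} {pred} {obj}" for subj, pred, obj in triples)
    # Telling the model to stay within the supplied context is one common way to curb hallucination.
    return (
        "Answer using only the context and facts below. "
        "If they do not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_request(endpoint: str, token: str, prompt: str, max_new_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a text-generation endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


prompt = build_prompt(
    "Where is Kawésqar spoken?",
    ["Kawésqar is an endangered language of southern Chile."],
    [("Kawésqar", "spokenIn", "Chile")],
)
# Placeholder URL and token; in the app these would come from HF_ENDPOINT and HF_API_TOKEN.
req = build_request("https://example.endpoints.huggingface.cloud", "hf_placeholder", prompt)
```

Sending it is then a matter of `urllib.request.urlopen(req)`; the shape of the JSON response depends on the task the endpoint serves.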

> Note: This app requires your own HF API token in `.streamlit/secrets.toml`.

## Instructions

1. Upload your own `.ttl` (RDF/Turtle), `.pkl`, and `.npy` files for the graph and embeddings.
2. Set `HF_ENDPOINT` and `HF_API_TOKEN` in `.streamlit/secrets.toml`.
3. Deploy with Streamlit (for example, `streamlit run rag_hf.py` locally) or on Hugging Face Spaces.
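
Before launching, a small pre-flight check can catch missing files or secrets from steps 1 and 2. This is a hypothetical helper, not part of the app: the `preflight` name and the `data` directory are assumptions, and the secret values shown are placeholders. In a Streamlit app, `st.secrets` behaves like a read-only mapping, so it can be passed in directly.

```python
from collections.abc import Mapping
from pathlib import Path

# Expected .streamlit/secrets.toml layout (placeholder values):
#   HF_ENDPOINT = "https://<your-endpoint>.endpoints.huggingface.cloud"
#   HF_API_TOKEN = "hf_..."
REQUIRED_SUFFIXES = (".ttl", ".pkl", ".npy")
REQUIRED_SECRETS = ("HF_ENDPOINT", "HF_API_TOKEN")


def preflight(data_dir: Path, secrets: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the app is ready to run."""
    problems = []
    for suffix in REQUIRED_SUFFIXES:
        # At least one file with each extension must exist in data_dir.
        if not any(data_dir.glob(f"*{suffix}")):
            problems.append(f"no {suffix} file in {data_dir}")
    for key in REQUIRED_SECRETS:
        # Missing keys and empty strings are both treated as unset.
        if not secrets.get(key):
            problems.append(f"{key} is not set in .streamlit/secrets.toml")
    return problems


# Inside rag_hf.py this might be used as (hypothetical "data" directory):
#   problems = preflight(Path("data"), st.secrets)
#   if problems:
#       st.error("\n".join(problems))
#       st.stop()
```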

---