Léo Bourrel committed on
Commit
9f4af6f
·
1 Parent(s): 346c949

doc: add global description doc

Files changed (1)
  1. docs/sorbobot.md +55 -0
docs/sorbobot.md ADDED
@@ -0,0 +1,55 @@
# Sorbobot: Expert Finder Chatbot Documentation

## Overview

Sorbobot is a chatbot built for Sorbonne Université to help its administration locate academic experts within the university. This document outlines the structure, functionality, and implementation details of Sorbobot.

### Context

Sorbobot's main requirement is to identify experts precisely, without confusing individuals who share similar names. It relies on HAL unique identifiers to tell experts apart.

## System Architecture

Sorbobot is built as a Retrieval-Augmented Generation (RAG) system with two primary steps:

1. **Retrieval**: Identifies the publications most similar to the user's query.
2. **Generation**: Produces a response based on the context extracted from the relevant publications.
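
The two steps above can be sketched as a simple composition. This is a minimal illustration, not Sorbobot's actual code: the `Retriever` and `Generator` aliases, the `answer` function, and the stand-in implementations are all hypothetical names introduced here so the example runs without a database or an LLM.

```python
from typing import Callable, List

# Hypothetical type aliases: a retriever maps a query to publication abstracts,
# and a generator maps (query, abstracts) to an answer.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def answer(query: str, retrieve: Retriever, generate: Generator) -> str:
    """Run the two RAG steps in order: retrieval, then generation."""
    abstracts = retrieve(query)        # step 1: find similar publications
    return generate(query, abstracts)  # step 2: respond using that context


# Stand-in implementations (assumptions) used only to make the sketch runnable.
def fake_retrieve(query: str) -> List[str]:
    return [f"Abstract mentioning {query}"]


def fake_generate(query: str, abstracts: List[str]) -> str:
    return f"{len(abstracts)} publication(s) found for '{query}'"


print(answer("graph theory", fake_retrieve, fake_generate))
```

Keeping the two steps behind separate callables means the retriever and the generator can be swapped or tested independently.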

## Implementation Details

### Programming Language and Libraries

- **Language**: Python
- **Frontend**: Streamlit
- **Database**: PostgreSQL with pgvector for similarity search
- **NLP**: LangChain and GPT4All libraries

### Database

- **PostgreSQL with pgvector**: Stores the data and runs similarity searches based on cosine similarity.
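
As an illustration of what such a search could look like, the sketch below builds a parameterized SQL query that uses pgvector's `<=>` operator, which returns cosine distance. The table name `publication`, its columns, and both helper functions are assumptions made for this example and may not match Sorbobot's real schema.

```python
from typing import Sequence, Tuple


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_similarity_query(
    embedding: Sequence[float], limit: int = 5
) -> Tuple[str, tuple]:
    """Return (sql, params) for a nearest-neighbour search by cosine distance.

    The table and column names are hypothetical.
    """
    sql = (
        "SELECT id, title, abstract, "
        "embedding <=> %s::vector AS cosine_distance "  # <=> is cosine distance
        "FROM publication "
        "ORDER BY cosine_distance "
        "LIMIT %s"
    )
    return sql, (to_pgvector_literal(embedding), limit)


sql, params = build_similarity_query([1, 0.5], limit=3)
print(sql)
print(params)
```

The `%s` placeholders follow the parameter style of common PostgreSQL drivers for Python, so the pair can be passed to a cursor's `execute` method instead of interpolating values into the SQL string.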

### Natural Language Processing

- **Abstracts as Data Source**: The chatbot uses publication abstracts to identify experts.
- **GPT4All for Embeddings**: Converts the text of each author's publications into embedding vectors, so that queries and publications can be compared by vector similarity.
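
To make the comparison between embeddings concrete, here is a small worked example of cosine distance in plain Python. The three-dimensional vectors are toy values chosen for illustration; real embeddings produced by an embedding model have many more dimensions, and the function name is an assumption of this sketch.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for a query embedding and two abstract embeddings.
query_vec = [1.0, 0.0, 1.0]
close_abstract = [0.9, 0.1, 0.8]
far_abstract = [0.0, 1.0, 0.0]

print(cosine_distance(query_vec, close_abstract))
print(cosine_distance(query_vec, far_abstract))
```

The abstract whose vector points in nearly the same direction as the query gets a distance close to 0, while the orthogonal one gets a distance of 1.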

### Retrieval Process

1. **Query Processing**: User queries are processed to extract key terms.
2. **Similarity Search**: The system searches the database using pgvector to find publications with low cosine distance to the query.
3. **Expert Identification**: The system identifies authors of these publications, ensuring unique identification of experts.
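
The expert identification step can be sketched as grouping search hits by author identifier rather than by name. The `Author` and `Hit` classes, the `identify_experts` function, and the sample identifiers below are hypothetical and only illustrate the idea of keying on a HAL identifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Author:
    hal_id: str  # unique identifier (format here is made up)
    name: str


@dataclass(frozen=True)
class Hit:
    title: str
    distance: float  # cosine distance to the query; lower is closer
    authors: Tuple[Author, ...]


def identify_experts(hits: List[Hit]) -> List[Tuple[Author, float]]:
    """Keep each author's best (lowest) distance, keyed by HAL identifier."""
    best: Dict[str, Tuple[Author, float]] = {}
    for hit in hits:
        for author in hit.authors:
            current = best.get(author.hal_id)
            if current is None or hit.distance < current[1]:
                best[author.hal_id] = (author, hit.distance)
    return sorted(best.values(), key=lambda pair: pair[1])


# Two different people share a name but have distinct identifiers.
first = Author("marie-martin-1", "Marie Martin")
second = Author("marie-martin-2", "Marie Martin")
hits = [
    Hit("Paper A", 0.12, (first,)),
    Hit("Paper B", 0.30, (second,)),
    Hit("Paper C", 0.25, (first,)),
]
for author, distance in identify_experts(hits):
    print(author.hal_id, author.name, distance)
```

Because the dictionary is keyed on `hal_id`, the two homonymous authors stay separate, while two publications by the same person collapse into a single expert entry.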

### Generation Process

1. **Context Extraction**: Relevant information is extracted from the identified publications.
2. **Response Generation**: Uses an LLM to generate informative responses based on the extracted context.
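
One common way to pass extracted context to an LLM is to place the retrieved abstracts in the prompt ahead of the question. The sketch below shows that pattern under stated assumptions: the prompt wording, the `build_prompt` function, and the `max_abstracts` cutoff are illustrative and are not taken from Sorbobot's code.

```python
from typing import List


def build_prompt(query: str, abstracts: List[str], max_abstracts: int = 3) -> str:
    """Assemble a prompt from the top abstracts followed by the user's question."""
    context = "\n\n".join(
        f"[{i}] {abstract.strip()}"
        for i, abstract in enumerate(abstracts[:max_abstracts], start=1)
    )
    return (
        "Answer the question using only the publication abstracts below.\n\n"
        f"Abstracts:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Who works on graph theory?",
    ["Abstract one.", "Abstract two.", "Abstract three.", "Abstract four."],
    max_abstracts=2,
)
print(prompt)
```

Limiting the number of abstracts keeps the prompt within the model's context window; the right cutoff depends on the model and on abstract length.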

## User Interaction Flow

1. **Query Submission**: Users submit queries related to their expert search.
2. **Chatbot Processing**: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
3. **Response Presentation**: The system presents a list of experts, including unique identifiers and relevant publication abstracts.
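
The response presentation step could be rendered along the following lines. This is a hedged sketch: the `format_experts` function, the tuple layout, and the output format are assumptions for illustration, and the actual Streamlit interface may present results differently.

```python
from typing import List, Tuple


def format_experts(
    experts: List[Tuple[str, str, str]], max_chars: int = 80
) -> str:
    """Render (name, hal_id, abstract) tuples as a numbered plain-text list."""
    lines = []
    for rank, (name, hal_id, abstract) in enumerate(experts, start=1):
        # Shorten long abstracts so each entry stays readable.
        if len(abstract) <= max_chars:
            excerpt = abstract
        else:
            excerpt = abstract[:max_chars].rstrip() + "..."
        lines.append(f"{rank}. {name} (HAL ID: {hal_id})\n   {excerpt}")
    return "\n".join(lines)


print(format_experts([("Marie Martin", "marie-martin-1", "Short abstract.")]))
```

Showing the identifier next to each name lets the reader distinguish experts who share a name.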

## Conclusion

Sorbobot streamlines the search for academic experts at Sorbonne Université. It combines embedding-based retrieval over publication abstracts, a PostgreSQL database with pgvector, and LLM-based response generation to identify experts accurately and efficiently.