MindHeal Assistant
A multi-approach emotional support conversation system using the Emotional Support Conversations (esconv) dataset.[1]
Github Repo
https://github.com/DukeAIPI540Spring2025Meowth/nlp-demo
Overview
This project implements three approaches to emotional support conversations:
- Naive Approach: Using a foundation model without special prompting, RAG, or finetuning
- Traditional ML Approach: Hidden Markov Model (HMM)
- Deep Learning Approach: Finetuned Llama-3.2-3B-Instruct model
Live Demo
The application is deployed on Digital Ocean: https://mindheal-assistant-7kfky.ondigitalocean.app/
Novelty and Contribution
- Unlike existing chatbot-based emotional support systems, our approach integrates three distinct methodologies to compare and contrast effectiveness.
- We introduce an evaluation framework that uses an LLM as a judge to provide structured scoring for emotional support conversations.
- Our HMM-based emotion tracking model enhances structured dialogue generation in a way that hasn't been widely explored for emotional support systems.
Dataset
We used the esconv dataset, a crowd-sourced collection of emotional support conversations between therapists and patients.
Ethical Considerations on the Dataset
The esconv dataset consists of anonymized conversations between therapists and patients. While it provides a valuable resource for studying emotional support strategies, several ethical considerations must be addressed:
- Bias and Representation: Since the dataset is anonymized, we do not have demographic information on the participants. This means we cannot ensure that it represents diverse populations across gender, race, socioeconomic status, or cultural backgrounds.
- Therapeutic Quality: The dataset captures a range of therapist responses, but without knowing the professional qualifications of the individuals involved, we cannot verify whether all responses align with best practices in mental health support.
- Potential for Misuse: As the dataset is used to train AI models, there is a risk that models may generate responses that appear empathetic but lack true understanding, which could be harmful in real-world mental health applications.
- Limitations in Crisis Scenarios: The dataset does not include structured intervention for crisis situations such as imminent self-harm or suicide. Therefore, models trained on this dataset should not be relied upon for urgent mental health support.
We acknowledge these challenges and emphasize that MindHeal Assistant is an educational tool rather than a replacement for professional mental health services. We encourage future work on datasets that include more structured, clinically verified responses while ensuring inclusivity and representation.
Technical Details
Fine-tuning
- Used torchtune with Low Rank Adaptation (LoRA) recipe for Llama-3.2-3B-Instruct
- Used the LoRA configuration (3B_lora_single_device.yaml), copied and adapted with the `tune copy` command
- Training performed on Google Colab with an A100 GPU
- Fine-tuned for 5 epochs (~4-5 minutes per epoch)
- Converted to GGUF format using llama-cpp
- Applied quantization so the model can run on CPUs
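To illustrate the last two steps, here is a minimal sketch of loading the quantized GGUF model for CPU inference with llama-cpp-python; the file path, generation parameters, and system prompt are assumptions for illustration, not the deployed configuration.

```python
# Sketch: CPU inference with the quantized GGUF model via llama-cpp-python.
# The model path and parameters below are assumptions, not the deployed config.
from llama_cpp import Llama

llm = Llama(
    model_path="meowth-nlp-demo-0.1_llama-3.2-3b-instruct_q5_k_m.gguf",  # assumed local path
    n_ctx=2048,    # context window
    n_threads=4,   # CPU threads
)

messages = [
    {"role": "system", "content": "You are a supportive, empathetic listener."},  # assumed prompt
    {"role": "user", "content": "I've been feeling overwhelmed at work lately."},
]
out = llm.create_chat_completion(messages=messages, max_tokens=256, temperature=0.7)
print(out["choices"][0]["message"]["content"])
```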
Hidden Markov Model (HMM)
- Combines HMM for emotion state tracking with ML classifiers for emotion and problem detection
- Uses TF-IDF vectorization with MultinomialNB for emotion classification
- Employs RandomForest classifier for problem type categorization
- Implements transition matrices between emotional states based on therapeutic progression
- Maintains a library of response templates for different strategies (Question, Reflection, Suggestion, Information, Reassurance)
- Response selection determined by current emotional state and conversation context
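A minimal sketch of the core pieces described above, using scikit-learn and NumPy; the emotion labels, state names, transition probabilities, and training snippets are illustrative placeholders rather than the actual trained model.

```python
# Sketch: TF-IDF + MultinomialNB emotion classifier plus a simple transition
# matrix that advances the tracked emotional state each turn.
# Labels, states, and probabilities are placeholders, not the trained model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data; the real classifier is trained on esconv utterances.
texts = ["I feel so anxious about my job", "I'm devastated after the breakup"]
emotions = ["anxiety", "sadness"]

emotion_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
emotion_clf.fit(texts, emotions)

states = ["distressed", "exploring", "stabilizing"]
# Rows: current state, columns: next state (assumed therapeutic progression).
transitions = np.array([
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

def next_state(current: str) -> str:
    """Sample the next emotional state from the transition matrix."""
    idx = states.index(current)
    return np.random.choice(states, p=transitions[idx])

print(emotion_clf.predict(["I can't stop worrying"]))  # e.g. ['anxiety']
print(next_state("distressed"))
```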
Evaluation (LLM-as-a-judge)
- Implements a criteria-based evaluation framework using an LLM as a judge
- Evaluates responses based on five key metrics:
  - Technical Accuracy (1-5): Application of proper therapeutic techniques
  - Structural Adherence (1-5): Following the ABCDE model in responses
  - Empathetic Tone (1-5): Level of emotional validation vs. robotic phrasing
  - Intervention Depth (1-5): Quality of follow-up questioning
  - Clinical Safety (1-5): Detection of risk factors and implementation of proper protocols
- Compares performance across all three approaches (naive, traditional, and deep learning)
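As a rough sketch of the judging step, the snippet below scores a single reply on the five criteria; the judge model, prompt wording, and JSON parsing are assumptions for illustration (the OpenAI Python client is used here, but any capable chat model could serve as the judge).

```python
# Sketch: LLM-as-a-judge scoring of one assistant reply on the five criteria.
# Judge model, prompt wording, and output parsing are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CRITERIA = ["technical_accuracy", "structural_adherence", "empathetic_tone",
            "intervention_depth", "clinical_safety"]

def judge(user_message: str, assistant_reply: str) -> dict:
    prompt = (
        "Score the assistant's emotional-support reply on each criterion "
        f"from 1 to 5: {', '.join(CRITERIA)}. "
        "Return only a JSON object mapping criterion to score.\n\n"
        f"User: {user_message}\nAssistant: {assistant_reply}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

# Example: scores = judge("I feel hopeless lately.", "That sounds really heavy. ...")
```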
Results and Conclusion
| Metric | Naive | ML | NN |
|---|---|---|---|
| Technical Accuracy | 3.515 | 2.465 | 2.445 |
| Structural Adherence | 1.78 | 1.085 | 1.12 |
| Empathetic Tone | 4.275 | 3.275 | 3.45 |
| Intervention Depth | 2.475 | 1.66 | 1.66 |
| Clinical Safety | 2.865 | 2.055 | 2.12 |
Explanation of Results:
- Naive Approach performed best in technical accuracy and empathetic tone, likely due to the foundation model's general-purpose conversational ability.
- ML (HMM-based) and NN struggled with technical accuracy, potentially due to difficulty in mapping structured techniques to responses.
- Structural adherence was low across all methods, with the ML and NN approaches scoring slightly lower than the naive approach.
- Empathy scores were highest for the naive approach, but this could be due to a lack of structured emotional support strategies.
- Clinical safety scores were relatively low, indicating that no approach was fully adept at risk detection for sensitive topics like suicide intervention.
Presentation
For a detailed project overview, refer to our presentation.
Citation
[1] Liu et al. (2021). Towards Emotional Support Dialog Systems. ACL.