# ViT Facial Expression Recognition
This model is a fine-tuned version of google/vit-base-patch16-224 for facial expression recognition on the FER2013 dataset.
## Model Performance
- Accuracy: 71.55%
- Dataset: FER2013 (35,887 images)
- Training Time: ~20 minutes on GPU
- Architecture: Vision Transformer (ViT-Base)
## Supported Emotions
The model can classify faces into 7 different emotions:
- Angry
- Disgust
- Fear
- Happy
- Sad
- Surprise
- Neutral
## Quick Start
```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Load model and processor
processor = ViTImageProcessor.from_pretrained('abhilash88/face-emotion-detection')
model = ViTForImageClassification.from_pretrained('abhilash88/face-emotion-detection')

# Load and preprocess image (convert to RGB in case the input is grayscale)
image = Image.open('path_to_your_image.jpg').convert('RGB')
inputs = processor(image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Emotion classes
emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
predicted_emotion = emotions[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Predicted Emotion: {predicted_emotion} ({confidence:.2f})")
```
## Example Predictions
Here are some example predictions on real faces:
| Example | True Emotion | Predicted | Confidence |
|---|---|---|---|
| Smiling person | Happy | Happy | 0.85 |
| Person looking sad | Sad | Sad | 0.40 |
| Serious expression | Angry | Neutral | 0.92 |
| Surprised expression | Surprise | Neutral | 0.69 |
| Concerned look | Fear | Happy | 0.85 |
| Neutral expression | Neutral | Happy | 0.58 |
| Unpleasant expression | Disgust | Neutral | 0.97 |
## Training Details
### Training Hyperparameters
- Learning Rate: 5e-5
- Batch Size: 16
- Epochs: 3
- Optimizer: AdamW
- Weight Decay: 0.01
- Scheduler: Linear with warmup
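For context, these settings correspond roughly to the following Hugging Face `TrainingArguments`; the original training script is not included in this repository, so the output directory and warmup fraction below are illustrative assumptions:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above. AdamW is the Trainer default optimizer,
# so it does not need to be set explicitly. Paths and warmup_ratio are placeholders.
training_args = TrainingArguments(
    output_dir="./vit-fer2013",            # hypothetical output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    lr_scheduler_type="linear",            # linear decay after warmup
    warmup_ratio=0.1,                      # assumption: exact warmup not documented
)
```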
### Training Results
- Epoch 1: Loss 0.917, Accuracy 66.90%
- Epoch 2: Loss 0.609, Accuracy 69.32%
- Epoch 3: Loss 0.316, Accuracy 71.55%
### Data Preprocessing
- Image Resize: 224x224 pixels
- Normalization: ImageNet stats
- Data Augmentation:
  - Random horizontal flip
  - Random rotation (±15°)
  - Color jitter
  - Random translation
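A rough `torchvision` equivalent of this pipeline is sketched below; the jitter strength and translation range are not documented, so those values are illustrative only:

```python
from torchvision import transforms

# Illustrative training-time pipeline matching the list above.
# ColorJitter strength and translation range are assumptions, not the exact values used.
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),                              # ±15°
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # random translation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],            # ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])
```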
## Performance Analysis
The model achieves solid performance on FER2013, which is known to be a challenging dataset due to:
- Low-resolution images (48x48 upscaled to 224x224)
- Crowdsourced labels with some noise
- High variation in lighting and pose
### Accuracy by Emotion Class
- Happy: ~86% (best performing)
- Surprise: ~84%
- Neutral: ~83%
- Angry: ~82%
- Sad: ~79%
- Fear: ~75%
- Disgust: ~68% (most challenging)
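To measure per-class numbers like these on your own labeled test split, a standard classification report works; `y_true` and `y_pred` below are placeholders for integer class indices collected by running the Quick Start code over the test images, and the recall column corresponds to per-class accuracy:

```python
from sklearn.metrics import classification_report

emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

# y_true / y_pred: integer class indices gathered from a labeled test set (placeholders).
print(classification_report(y_true, y_pred, target_names=emotions))
```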
## Technical Details
### Model Architecture
- Base Model: google/vit-base-patch16-224
- Parameters: ~86M
- Input Size: 224x224x3
- Patch Size: 16x16
- Number of Layers: 12
- Hidden Size: 768
- Attention Heads: 12
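These figures can be checked directly against the loaded checkpoint from the Quick Start, for example:

```python
# Inspect the loaded model to confirm the architecture details listed above.
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")               # ~86M
print(model.config.image_size, model.config.patch_size)     # 224, 16
print(model.config.num_hidden_layers,
      model.config.hidden_size,
      model.config.num_attention_heads)                     # 12, 768, 12
```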
### Dataset Information
- FER2013: 35,887 grayscale facial images
- Training Set: 28,709 images
- Validation Set: 3,589 images
- Test Set: 3,589 images
- Classes: 7 emotions (balanced evaluation set)
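FER2013 is commonly distributed as a single `fer2013.csv` (Kaggle format) with `emotion`, `pixels`, and `Usage` columns; assuming that layout, individual images can be reconstructed as sketched below (the file path is a placeholder):

```python
import numpy as np
import pandas as pd
from PIL import Image

# Assumes the Kaggle fer2013.csv layout: emotion (0-6), pixels (space-separated
# 48x48 grayscale values), Usage (Training / PublicTest / PrivateTest).
df = pd.read_csv("fer2013.csv")
row = df.iloc[0]
pixels = np.array(row["pixels"].split(), dtype=np.uint8).reshape(48, 48)
image = Image.fromarray(pixels).convert("RGB")  # the processor handles resizing to 224x224
```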
## Usage Tips
- Best Results: Use clear, front-facing face images
- Preprocessing: Ensure faces are properly cropped and centered
- Lighting: Good lighting improves accuracy
- Resolution: Higher resolution images work better
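When an image contains more than a face, cropping first usually helps. One convenience option (not part of this model, just a sketch) is OpenCV's Haar cascade detector:

```python
import cv2
from PIL import Image

# Detect and crop the largest face, then hand it to the processor/model from the Quick Start.
# The Haar cascade is a generic OpenCV detector, not something shipped with this model.
img = cv2.imread("path_to_your_image.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest detection
    face = Image.fromarray(cv2.cvtColor(img[y:y + h, x:x + w], cv2.COLOR_BGR2RGB))
    inputs = processor(face, return_tensors="pt")         # then run the model as shown above
```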
## Model Limitations
- Trained only on FER2013 (limited diversity)
- May struggle with extreme poses or occlusions
- Performance varies across different demographics
- Best suited for clear facial expressions
## Citation
If you use this model, please cite:
```bibtex
@misc{face-emotion-detection,
  author       = {Abhilash},
  title        = {ViT Face Emotion Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {https://huggingface.co/abhilash88/face-emotion-detection}
}
```
## Acknowledgments
- FER2013 dataset creators
- Google Research for Vision Transformer
- Hugging Face for the transformers library
- The open-source ML community
## License
This model is released under the Apache 2.0 License.
Built with ❤️ using Vision Transformers and PyTorch