IMDb Movie Review Sentiment Analysis
Model description
imdb-movie-review-sentiment-analysis is a machine learning model that predicts the sentiment (positive or negative) of movie reviews from IMDb.
It uses a TF-IDF vectorizer for feature extraction and an ensemble of two classic machine learning classifiers: Logistic Regression and Multinomial Naive Bayes.
The model was trained on the official IMDb Large Movie Review Dataset (50,000 labeled reviews) with standard NLP preprocessing: lowercasing, special character removal, tokenization, stopword removal, and lemmatization.
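The preprocessing steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the author's actual code: the `STOPWORDS` set, the `naive_lemmatize` helper, and the `preprocess` function are all hypothetical names, and the lemmatizer here is a crude stand-in (it only strips a trailing plural "s") for whatever real lemmatizer the model uses.

```python
import re
from typing import List

# Tiny illustrative stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "of", "and", "to", "in", "this", "i"}


def naive_lemmatize(token: str) -> str:
    """Crude stand-in for a real lemmatizer: strips a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text: str) -> List[str]:
    """Lowercase, drop non-letter characters, tokenize, remove stopwords, lemmatize."""
    text = text.lower()                       # lowercasing
    text = re.sub(r"[^a-z\s]", " ", text)     # special character removal
    tokens = text.split()                     # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_lemmatize(t) for t in tokens]         # (naive) lemmatization


print(preprocess("This movie was absolutely fantastic! I loved every minute of it."))
```

The resulting token list is what a TF-IDF vectorizer would then turn into a feature vector.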
Intended use:
- Sentiment analysis of English-language movie reviews
- Educational and research purposes
- As a baseline for more advanced NLP projects
Example usage
from inference import SentimentAnalyzer
# Initialize the analyzer (make sure model files are in 'saved_models/')
analyzer = SentimentAnalyzer(model_dir="saved_models")
# Predict sentiment for a new review
review = "This movie was absolutely fantastic! I loved every minute of it."
result = analyzer.predict(review)
print(result)
# Output example:
# {
# 'logistic_regression': {'prediction': 'positive', 'confidence': 0.98, ...},
# 'naive_bayes': {'prediction': 'positive', 'confidence': 0.95, ...}
# }
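The example output reports one prediction per classifier. If a single overall label is needed, one possible approach is to average the two models' positive-class probabilities. The sketch below is an assumption, not part of the documented `SentimentAnalyzer` API: `combine_predictions` is a hypothetical helper, and it assumes that `confidence` is the probability of the predicted label.

```python
def combine_predictions(result: dict) -> dict:
    """Average per-model P(positive) into one label (hypothetical helper)."""
    positive_probs = []
    for output in result.values():
        conf = output["confidence"]
        # Assumption: 'confidence' is the probability of the predicted label,
        # so P(positive) is 1 - confidence when the prediction is 'negative'.
        positive_probs.append(conf if output["prediction"] == "positive" else 1.0 - conf)

    avg_positive = sum(positive_probs) / len(positive_probs)
    label = "positive" if avg_positive >= 0.5 else "negative"
    confidence = avg_positive if label == "positive" else 1.0 - avg_positive
    return {"prediction": label, "confidence": confidence}


# Same shape as the output example above, with the elided fields left out.
example = {
    "logistic_regression": {"prediction": "positive", "confidence": 0.98},
    "naive_bayes": {"prediction": "positive", "confidence": 0.95},
}
print(combine_predictions(example))
```

Simple averaging treats both classifiers as equally reliable; weighting by validation accuracy would be another reasonable choice.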
Metrics
- Logistic Regression Accuracy: 88.47%
- Naive Bayes Accuracy: 85.20%
- Evaluated on a held-out test set (20% of the IMDb dataset, 10,000 reviews).
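For reference, a held-out evaluation of this kind can be sketched as below. This is an illustration only: the helper names `split_indices` and `accuracy` are hypothetical, and the random shuffle with a fixed seed is an assumption, since the card does not say how the 20% split was drawn.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle indices and reserve a fraction for testing (assumed random split)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


train_idx, test_idx = split_indices(50_000)
print(len(train_idx), len(test_idx))
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))
```

With 50,000 reviews and a 20% test fraction, the split yields 40,000 training and 10,000 test reviews, matching the test-set size quoted above.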
Limitations
- Only works for English text.
- Not robust to sarcasm, irony, or highly ambiguous reviews.
- May not generalize well to domains outside of movie reviews.
- Does not handle emojis, slang, or non-standard text well.
- Uses classic ML models on TF-IDF features rather than deep learning, so it may underperform on complex language where word order and context matter.
Training data
- IMDb Large Movie Review Dataset: 50,000 movie reviews labeled as positive or negative.
- Balanced: 25,000 positive and 25,000 negative reviews.
- Reviews are preprocessed: lowercased, stripped of special characters, tokenized, filtered for stopwords, and lemmatized.
- Dataset is widely used for benchmarking sentiment analysis models.
License
MIT
Author
Abdelmonem Hatem
Evaluation results
- Logistic Regression Accuracy on IMDb Large Movie Review Dataset (self-reported): 0.885
- Naive Bayes Accuracy on IMDb Large Movie Review Dataset (self-reported): 0.852