IMDb Movie Review Sentiment Analysis

Model description

imdb-movie-review-sentiment-analysis is a machine learning model that predicts the sentiment (positive or negative) of movie reviews from IMDb.
It uses a TF-IDF vectorizer for feature extraction and an ensemble of two classic machine learning classifiers: Logistic Regression and Multinomial Naive Bayes.
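
The setup described above can be sketched with scikit-learn. This is a minimal illustration, not the author's training code: the toy corpus, the default vectorizer settings, `max_iter=1000`, and the `predict` helper are all assumptions made for the example, and the result keys simply mirror the output format shown under "Example usage".

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Toy corpus for illustration only; the real model was trained on IMDb reviews.
train_texts = [
    "absolutely fantastic film loved every minute",
    "wonderful acting and a great story",
    "brilliant, moving and beautifully shot",
    "terrible plot and awful acting",
    "boring, predictable and far too long",
    "worst movie i have seen this year",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# Fit the vectorizer once and share the TF-IDF features between both classifiers.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}
for model in models.values():
    model.fit(X_train, train_labels)


def predict(review):
    """Return each model's label and the probability it assigns to that label."""
    X = vectorizer.transform([review])
    result = {}
    for name, model in models.items():
        probabilities = model.predict_proba(X)[0]
        best = probabilities.argmax()
        result[name] = {
            "prediction": str(model.classes_[best]),
            "confidence": float(probabilities[best]),
        }
    return result


print(predict("a fantastic story with great acting"))
```

Sharing one fitted vectorizer keeps both classifiers on the same vocabulary, so their outputs for a given review are directly comparable.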

The model was trained on the official IMDb Large Movie Review Dataset (50,000 labeled reviews) with standard NLP preprocessing: lowercasing, special character removal, tokenization, stopword removal, and lemmatization.
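
The preprocessing steps listed above can be sketched in order using only the standard library. This is a hedged illustration: the tiny `STOPWORDS` set, the regular expression, and the pluggable `lemmatize` argument are assumptions for the example. The model card does not say which library performs these steps, so lemmatization is left as a caller-supplied function that defaults to a no-op.

```python
import re

# Tiny illustrative stopword set (an assumption); a real pipeline would use a full list.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "this", "was", "i", "to"}


def preprocess(text, lemmatize=lambda token: token):
    """Apply the listed steps in order and return the cleaned tokens."""
    text = text.lower()                                   # lowercasing
    text = re.sub(r"[^a-z\s]", " ", text)                 # special character removal (also drops digits)
    tokens = text.split()                                 # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]    # stopword removal
    return [lemmatize(t) for t in tokens]                 # lemmatization (pluggable, no-op by default)


print(preprocess("This movie was absolutely fantastic! I loved every minute of it."))
# ['movie', 'absolutely', 'fantastic', 'loved', 'every', 'minute']
```

Passing a real lemmatizer as `lemmatize` would map inflected forms such as "loved" to their base form before vectorization.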

Intended use:

  • Sentiment analysis of English-language movie reviews
  • Educational and research purposes
  • As a baseline for more advanced NLP projects

Example usage

from inference import SentimentAnalyzer

# Initialize the analyzer (make sure model files are in 'saved_models/')
analyzer = SentimentAnalyzer(model_dir="saved_models")

# Predict sentiment for a new review
review = "This movie was absolutely fantastic! I loved every minute of it."
result = analyzer.predict(review)
print(result)
# Output example:
# {
#   'logistic_regression': {'prediction': 'positive', 'confidence': 0.98, ...},
#   'naive_bayes': {'prediction': 'positive', 'confidence': 0.95, ...}
# }
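
The result above reports each classifier separately. If a single label is needed, one option is to average the two models' probabilities for the positive class. The `combine` helper below is hypothetical and not part of the `inference` module; it assumes each entry has the `prediction` and `confidence` keys shown above and that `confidence` is the probability of the predicted label.

```python
def combine(result):
    """Average each model's probability of 'positive' and threshold at 0.5."""
    positive_probs = []
    for model_output in result.values():
        confidence = model_output["confidence"]
        # Convert "confidence in the predicted label" into P(positive).
        if model_output["prediction"] == "positive":
            positive_probs.append(confidence)
        else:
            positive_probs.append(1.0 - confidence)
    mean_positive = sum(positive_probs) / len(positive_probs)
    label = "positive" if mean_positive >= 0.5 else "negative"
    confidence = mean_positive if label == "positive" else 1.0 - mean_positive
    return {"prediction": label, "confidence": confidence}


# Values copied from the output example above.
example = {
    "logistic_regression": {"prediction": "positive", "confidence": 0.98},
    "naive_bayes": {"prediction": "positive", "confidence": 0.95},
}
print(combine(example))
```

Averaging probabilities (soft voting) lets a confident model outweigh an uncertain one when the two disagree, which a simple majority vote between two models cannot do.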

Metrics

  • Logistic Regression Accuracy: 88.47%
  • Naive Bayes Accuracy: 85.20%
  • Evaluated on a held-out test set (20% of the IMDb dataset, 10,000 reviews).
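
The evaluation protocol above can be sketched as follows. This is an illustration under stated assumptions: `holdout_split`, `accuracy`, and the seed are hypothetical names and values, and the model card does not say whether the original split was stratified or how it was seeded.

```python
import random


def holdout_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items and split off test_fraction of them as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# A 20% hold-out of 50,000 review indices leaves 40,000 for training.
train_ids, test_ids = holdout_split(range(50_000))
print(len(train_ids), len(test_ids))  # 40000 10000

# Three of four toy predictions match, so accuracy is 0.75.
print(accuracy(["positive", "negative", "positive", "negative"],
               ["positive", "negative", "negative", "negative"]))  # 0.75
```

On a 10,000-review test set, the reported 88.47% would correspond to 8,847 correct predictions and 85.20% to 8,520.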

Limitations

  • Only works for English text.
  • Not robust to sarcasm, irony, or highly ambiguous reviews.
  • May not generalize well to domains outside of movie reviews.
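  • Does not handle emojis, slang, or non-standard text well; the special-character removal step in preprocessing likely strips emojis before classification.
  • Uses classic ML models rather than deep learning, so it may underperform on complex or nuanced language.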

Training data

  • IMDb Large Movie Review Dataset: 50,000 movie reviews labeled as positive or negative.
  • Balanced: 25,000 positive and 25,000 negative reviews.
  • Reviews are preprocessed (lowercased, cleaned, tokenized, stopwords removed, lemmatized).
  • Dataset is widely used for benchmarking sentiment analysis models.

License

MIT


Author

Abdelmonem Hatem

Evaluation results

  • Logistic Regression Accuracy on IMDb Large Movie Review Dataset (self-reported): 0.885
  • Naive Bayes Accuracy on IMDb Large Movie Review Dataset (self-reported): 0.852