lokas committed on
Commit 87b08b3 · verified · 1 Parent(s): e58cd1e

Update README.md

Files changed (1)
  1. README.md +60 -2
README.md CHANGED
@@ -1,2 +1,60 @@
- # LSTM Spam Detector
- This is a simple LSTM model to detect spam messages.
+ ---
+ language: en
+ license: mit
+ tags:
+ - keras
+ - lstm
+ - spam-classification
+ - text-classification
+ - binary-classification
+ - email
+ - deep-learning
+ library_name: keras
+ pipeline_tag: text-classification
+ model_name: Spam Email Classifier (BiLSTM)
+ datasets:
+ - SetFit/enron_spam
+ ---
+
+ # 📧 Spam Email Classifier using BiLSTM
+
+ This model uses a **Bidirectional LSTM (BiLSTM)** architecture built with **Keras** to classify email messages as **Spam** or **Ham**. It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using GloVe word embeddings.
+
+ ---
+
+ ## 🧠 Model Architecture
+
+ - **Tokenizer**: Keras `Tokenizer` fitted on the Enron dataset
+ - **Embedding**: Pretrained 100-dimensional [GloVe 6B](https://nlp.stanford.edu/projects/glove/) vectors (`glove.6B.100d`)
+ - **Model**: `Embedding → BiLSTM → Dropout → Dense(sigmoid)`
+ - **Input**: English email/message text
+ - **Output**: sigmoid score mapped to a label, `0 = Ham`, `1 = Spam`
+
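The Tokenizer, Input, and Output bullets above describe a text-to-integers-to-label pipeline. The sketch below illustrates that flow in plain Python so it runs without TensorFlow. Every function name here (`fit_vocab`, `text_to_sequence`, `pad_sequence`, `label_from_probability`) is a hypothetical stand-in, not part of this repository or of Keras. It simplifies the real Keras `Tokenizer`, which also strips punctuation and ranks words by frequency, and it assumes the default `pad_sequences` behaviour of padding and truncating at the front.

```python
# Hypothetical stand-ins for Keras `Tokenizer` and `pad_sequences`;
# an illustration only, not the preprocessing code used to train this model.

def fit_vocab(texts):
    """Give each distinct lowercase word an integer id, starting at 1 (0 is the padding id)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def text_to_sequence(text, vocab):
    """Map words to ids, silently dropping words the vocabulary has never seen."""
    return [vocab[w] for w in text.lower().split() if w in vocab]

def pad_sequence(seq, maxlen):
    """Keep the last `maxlen` ids and left-pad with zeros to a fixed length."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

def label_from_probability(prob, threshold=0.5):
    """Turn a sigmoid score into the card's labels: 1 = Spam, 0 = Ham."""
    return 1 if prob > threshold else 0

vocab = fit_vocab(["win a free phone now", "meeting moved to friday"])
padded = pad_sequence(text_to_sequence("Win a free iPhone now", vocab), maxlen=8)
print(padded)  # [0, 0, 0, 0, 1, 2, 3, 5]  ("iphone" is out of vocabulary)
```

Because unseen words are dropped and sequences are cut to a fixed length, inference should reuse the tokenizer and `maxlen` from training; a mismatch changes the integer input the model sees.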
+ ---
+
+ ## 🧪 Example Usage
+
+ ```python
+ import pickle
+
+ from huggingface_hub import hf_hub_download
+ from tensorflow.keras.models import load_model
+ from tensorflow.keras.preprocessing.sequence import pad_sequences
+
+ # Download the model and tokenizer files from the Hugging Face Hub
+ model_path = hf_hub_download("lokas/spam-emails-classifier", "model.h5")
+ tokenizer_path = hf_hub_download("lokas/spam-emails-classifier", "tokenizer.pkl")
+
+ # Load the model and the fitted tokenizer (only unpickle files you trust)
+ model = load_model(model_path)
+ with open(tokenizer_path, "rb") as f:
+     tokenizer = pickle.load(f)
+
+ # Prediction function
+ def predict_spam(text):
+     # Convert the text to a sequence of word indices
+     seq = tokenizer.texts_to_sequences([text])
+     padded = pad_sequences(seq, maxlen=50)  # must match training maxlen
+     # The model returns one sigmoid score per input; take the single value
+     pred = model.predict(padded)[0][0]
+     return "🚫 Spam" if pred > 0.5 else "✅ Not Spam"
+
+ # Example
+ print(predict_spam("Win a free iPhone now!"))