---
license: apache-2.0
base_model: albert/albert-base-v2
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: classify-clickbait-titll
  results: []
---

# Identify Clickbait Articles

This model is a fine-tuned version of [albert/albert-base-v2](https://huggingface.co/albert/albert-base-v2) trained on a synthetic dataset of article titles, 65% factual and 35% clickbait.

It was built to demonstrate fine-tuning smaller transformer models on synthetic data; see the accompanying article [here](https://towardsdatascience.com/fine-tune-smaller-transformer-models-text-classification-77cbbd3bf02b).

## Model description

Built to classify article titles as either factual or clickbait.

## Intended uses & limitations

Run the model on any title to see whether it is classified as factual or clickbait.

Go ahead and try a few of your own. 

Here are a few examples:

**Title:** A Comprehensive Guide for Getting Started with Hugging Face
**Output:** Factual

**Title:** OpenAI GPT-4o: The New Best AI Model in the World. Like in the Movies. For Free
**Output:** Clickbait

**Title:** GPT4 Omni — So much more than just a voice assistant
**Output:** Clickbait

**Title:** Building Vector Databases with FastAPI and ChromaDB
**Output:** Factual
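
For programmatic use, here is a minimal inference sketch using the Transformers `pipeline` API. The repository id below is a placeholder for this model's actual Hub id, and the exact label strings depend on the model's `id2label` mapping:

```python
from transformers import pipeline

# Placeholder repo id; substitute this model's actual Hub id.
classifier = pipeline("text-classification", model="<user>/classify-clickbait-titll")

titles = [
    "A Comprehensive Guide for Getting Started with Hugging Face",
    "OpenAI GPT-4o: The New Best AI Model in the World. Like in the Movies. For Free",
]

# Each prediction carries a label (factual/clickbait) and a confidence score.
for title, prediction in zip(titles, classifier(titles)):
    print(f"{prediction['label']} ({prediction['score']:.3f}): {title}")
```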

## Training and evaluation data

The model achieves the following results on the evaluation set:
- Loss: 0.0173
- Accuracy: 0.9951
- F1: 0.9951
- Precision: 0.9951
- Recall: 0.9951
- Accuracy Label Clickbait: 0.9866
- Accuracy Label Factual: 1.0
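
The per-label accuracies suggest a custom `compute_metrics` callback was used during evaluation. A plausible sketch with scikit-learn follows; the weighted averaging and the label order are assumptions, not taken from this card:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Assumed label order; adjust to the model's id2label mapping.
LABELS = ["clickbait", "factual"]

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    metrics = {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
    # Per-label accuracy, as reported above.
    for idx, name in enumerate(LABELS):
        mask = labels == idx
        metrics[f"accuracy_label_{name}"] = accuracy_score(labels[mask], preds[mask])
    return metrics
```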

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
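
A minimal sketch of the equivalent `TrainingArguments`; `output_dir` is an assumption, and the Adam betas and epsilon listed above are the library defaults, so they are omitted:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="classify-clickbait-titll",  # assumed, not part of this card
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
)
```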

### Framework versions

- Transformers 4.41.0
- Pytorch 2.2.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1