amannor committed on
Commit 6f1363f · verified · 1 Parent(s): 8c309be

Create README.md

Files changed (1)
  1. README.md +123 -0
README.md ADDED
@@ -0,0 +1,123 @@
+ ---
+ tags:
+ - text-classification
+ - sustainable-development-goals
+ - SDG
+ - transformers
+ - bert
+ - social-impact
+ license: mit
+ language:
+ - en
+ base_model:
+ - google-bert/bert-base-uncased
+ ---
+
+ # SDG Startup Classifier (18-label BERT-based Model)
+
+ [![Model](https://img.shields.io/badge/model-BERT--base--uncased-blue)](https://huggingface.co/google-bert/bert-base-uncased)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+ [![Hugging Face](https://img.shields.io/badge/HuggingFace-BERT%20SDG%20Classifier-green)](https://huggingface.co/amannor/bert-base-uncased-sdg-classifier)
+
+ ---
+
+ ## Model Overview
+
+ This model is a **BERT-base-uncased** transformer fine-tuned for multiclass classification of startup companies into **18 categories**: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.
+
+ It is based on the methodology and dataset described in the IJCAI 2022 paper by Kfir Bar:
+
+ > *Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals*
+ > Kfir Bar (2022), [Paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)
+
+ The model takes a textual company description, mission statement, or product summary as input and predicts the single most relevant of the 18 labels, reflecting the company's social or environmental impact focus.
+
+ ---
+
+ ## Intended Use
+
+ - Automatic SDG classification of startup textual descriptions, mission statements, and product/service information.
+ - Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
+ - Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
+
+ ---
+
+ ## Model Details
+
+ - **Architecture:** BERT-base-uncased (`bert-base-uncased` from Hugging Face Transformers)
+ - **Number of labels:** 18 (17 SDGs + 1 no-impact)
+ - **Tokenizer:** BERT-base-uncased WordPiece tokenizer
+ - **Training data:** Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
+ - **Training details:** Fine-tuned with the AdamW optimizer at a learning rate of approximately 2e-5 for multiple epochs on the annotated dataset
+ - **Performance:** Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (per the original paper)
+
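The 18 labels listed above can be expressed as explicit `id2label` / `label2id` mappings, which Hugging Face model configs accept. The sketch below is only an illustration: the label strings, the `build_label_maps` helper, and above all the position of the no-impact class are assumptions, because this README does not document the index order. If the published model's `config.id2label` carries meaningful names, that mapping takes precedence.

```python
# Official short titles of the 17 UN Sustainable Development Goals.
SDG_TITLES = [
    "No Poverty",
    "Zero Hunger",
    "Good Health and Well-being",
    "Quality Education",
    "Gender Equality",
    "Clean Water and Sanitation",
    "Affordable and Clean Energy",
    "Decent Work and Economic Growth",
    "Industry, Innovation and Infrastructure",
    "Reduced Inequalities",
    "Sustainable Cities and Communities",
    "Responsible Consumption and Production",
    "Climate Action",
    "Life Below Water",
    "Life on Land",
    "Peace, Justice and Strong Institutions",
    "Partnerships for the Goals",
]


def build_label_maps(no_impact_first=True):
    """Return (id2label, label2id) for the 18-way task.

    ASSUMPTION: the index of the no-impact class is not documented here;
    `no_impact_first` merely selects one of the two obvious conventions.
    """
    sdg_labels = [f"SDG {i}: {title}" for i, title in enumerate(SDG_TITLES, start=1)]
    if no_impact_first:
        labels = ["no-impact"] + sdg_labels
    else:
        labels = sdg_labels + ["no-impact"]
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_label_maps()
print(len(id2label), "labels")
```

With `no_impact_first=True`, class index `i` coincides with SDG number `i`, which is convenient but still needs to be checked against the trained model before relying on it.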
+ ---
+
+ ## How to Use
+
+ Minimal example code to load and run inference using the Hugging Face Transformers library:
+
+     import torch
+     from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+     # Load tokenizer and model from Hugging Face Hub
+     model_name = "amannor/bert-base-uncased-sdg-classifier"
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+     model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+     # Input startup description text
+     text = "This startup develops affordable solar panels to improve clean energy access."
+
+     # Tokenize input text
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+
+     # Forward pass (gradients are not needed for inference)
+     with torch.no_grad():
+         outputs = model(**inputs)
+
+     # Predicted class index (0 to 17, covering the 17 SDGs plus no-impact)
+     predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()
+     print(f"Predicted SDG label ID: {predicted_label_id}")
+
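A bare class index is hard to interpret without a score. As a minimal sketch, the logits from the forward pass can be turned into probabilities with a softmax and then ranked. The helpers `softmax` and `top_k` and the example logits are hypothetical; in the example above, `outputs.logits[0].tolist()` would supply the real 18 values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (class_id, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for the 18 classes, made up for illustration only.
example_logits = [0.1] * 18
example_logits[7] = 4.0
example_logits[13] = 2.5

for class_id, prob in top_k(example_logits):
    print(f"class {class_id}: {prob:.3f}")
```

Reporting the top few classes rather than a single index can be useful when a startup plausibly touches more than one goal, although the model itself is trained as a single-label classifier.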
+ ---
+
+ ## Limitations
+
+ - The model relies solely on **textual company descriptions**, which might be promotional or biased (“greenwashing”).
+ - Performance may degrade on short, noisy, or non-English inputs.
+ - The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
+ - Intended to assist, not replace, expert judgment.
+
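Because the model is meant to assist expert judgment, one option is to send low-confidence predictions to a human reviewer instead of accepting them automatically. The sketch below assumes class probabilities are already available (for instance from a softmax over the model's logits). The `triage` helper and the 0.6 threshold are illustrative assumptions, not values from Bar (2022); a real threshold would need to be calibrated on labeled data.

```python
def triage(probs, threshold=0.6):
    """Pick the most probable class and flag it for review if confidence is low.

    `threshold` is a hypothetical cut-off chosen only for illustration.
    """
    best_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best_id]
    return {
        "label_id": best_id,
        "confidence": confidence,
        "needs_review": confidence < threshold,
    }


# Made-up probability vectors: one confident, one ambiguous.
print(triage([0.05, 0.90, 0.05]))
print(triage([0.40, 0.35, 0.25]))
```

Short or noisy descriptions tend to be the ones that land below such a threshold, so this kind of routing concentrates expert attention where the limitations listed above matter most.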
+ ---
+
+ ## Citation
+
+ If you use this model, please cite:
+
+     @inproceedings{bar2022ijcai,
+       title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
+       author={Bar, Kfir},
+       booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
+       year={2022}
+     }
+
+
+ You may also wish to reference the accompanying repository:
+ https://github.com/Amannor/sdg-codebase
+
+ ---
+
+ ## License
+
+ This model is released under the **MIT License**. For more information, see the LICENSE file in this repository.
+
+ ---
+
+ ## Links and Resources
+
+ - [Full repository with code, notebooks, and datasets](https://github.com/Amannor/sdg-codebase)
+ - [IJCAI 2022 original paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)
+
+ ---
+
+ *For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.*