SDG Startup Classifier (18-label BERT-based Model)
Model Overview
This model is a BERT-base-uncased transformer fine-tuned for multiclass classification of startup companies into 18 categories: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.
It is based on the methodology and dataset described in the IJCAI 2022 paper:
Kfir Bar (2022). Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals.
The model takes as input textual company descriptions, mission statements, and product summaries and predicts the most relevant SDG label reflecting the company's social or environmental impact focus.
Intended Use
- Automatic SDG classification of startup textual descriptions, mission statements, and product/service information (see the quick-start sketch after this list).
- Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
- Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
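As a rough illustration of these use cases, the model can be called through the Transformers pipeline API. This is only a minimal sketch: the example description and printed output are made up, and the label strings returned at inference depend on the id2label entries stored in the checkpoint's config.

from transformers import pipeline

# Quick start: classify a single startup description with a text-classification pipeline
classifier = pipeline("text-classification", model="amannor/bert-base-uncased-sdg-classifier")

description = "We build low-cost water filtration systems for rural communities."
print(classifier(description))
# Example output shape (values illustrative): [{'label': 'LABEL_5', 'score': 0.91}]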
Model Details
- Architecture: BERT-base-uncased (bert-base-uncased from Hugging Face Transformers)
- Number of labels: 18 (17 SDGs + 1 no-impact)
- Tokenizer: BERT-base-uncased WordPiece tokenizer
- Training data: Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
- Training details: Fine-tuned with the AdamW optimizer at a learning rate of approximately 2e-5 for multiple epochs on the annotated dataset (an illustrative fine-tuning sketch follows this list)
- Performance: Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (per original paper)
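The training code for this checkpoint is not included here; the sketch below only illustrates a comparable fine-tuning setup with the Hugging Face Trainer, assuming a labeled dataset of startup descriptions. The example rows, column names, batch size, and epoch count are assumptions rather than the values used for this model.

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical labeled data: startup descriptions with integer labels 0-17 (17 SDGs + no-impact)
train_data = Dataset.from_dict({
    "text": [
        "Affordable solar panels for off-grid villages.",
        "A mobile game studio focused on casual puzzle titles.",
    ],
    "label": [6, 17],  # illustrative label ids only
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=18)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

train_data = train_data.map(tokenize, batched=True)

# The Trainer defaults to an AdamW-style optimizer; lr 2e-5 matches the setup described above
args = TrainingArguments(
    output_dir="sdg-classifier",
    learning_rate=2e-5,
    num_train_epochs=3,               # "multiple epochs"; the exact count is an assumption
    per_device_train_batch_size=16,   # assumed batch size
)

trainer = Trainer(model=model, args=args, train_dataset=train_data)
trainer.train()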
How to Use
Minimal example code to load and run inference using the Hugging Face Transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model from the Hugging Face Hub
model_name = "amannor/bert-base-uncased-sdg-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input startup description text
text = "This startup develops affordable solar panels to improve clean energy access."

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Forward pass
outputs = model(**inputs)

# Predicted class index (0 to 17, aligned with the 17 SDGs + no-impact)
predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted SDG label ID: {predicted_label_id}")
Limitations
- The model relies solely on textual company descriptions, which might be promotional or biased (“greenwashing”).
- Performance may degrade on short, noisy, or non-English inputs.
- The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
- Intended to assist, not replace, expert judgment.
Citation
If you use this model, please cite:
@inproceedings{bar2022ijcai,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2022}
}
You may also wish to reference the accompanying repository:
https://github.com/Amannor/sdg-codebase
License
This model is released under the MIT License. For more information, see the LICENSE file in this repository.
Links and Resources
For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.
- Base model: google-bert/bert-base-uncased