SDG Startup Classifier (18-label BERT-based Model)
Model Overview
This model is a BERT-base-uncased transformer fine-tuned for multiclass classification of startup companies into 18 categories: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.
It is based on the methodology and dataset described in the IJCAI 2022 paper:
Kfir Bar (2022). Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals.
The model takes as input textual company descriptions, mission statements, and product summaries and predicts the most relevant SDG label reflecting the company's social or environmental impact focus.
Intended Use
- Automatic SDG classification of startup textual descriptions, mission statements, and product/service information (see the quick-start sketch after this list).
- Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
- Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
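As a rough illustration of these use cases, the model can be called through the Transformers pipeline API. This is only a minimal sketch: the example description and printed output are made up, and the label strings returned at inference depend on the id2label entries stored in the checkpoint's config.

from transformers import pipeline

# Quick start: classify a single startup description with a text-classification pipeline
classifier = pipeline("text-classification", model="amannor/bert-base-uncased-sdg-classifier")

description = "We build low-cost water filtration systems for rural communities."
print(classifier(description))
# Example output shape (values illustrative): [{'label': 'LABEL_5', 'score': 0.91}]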
Model Details
- Architecture: BERT-base-uncased (bert-base-uncased from Hugging Face Transformers)
- Number of labels: 18 (17 SDGs + 1 no-impact)
- Tokenizer: BERT-base-uncased WordPiece tokenizer
- Training data: Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
- Training details: Fine-tuned with the AdamW optimizer at a learning rate of approximately 2e-5 for multiple epochs on the annotated dataset (an illustrative fine-tuning sketch follows this list)
- Performance: Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (per original paper)
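The training code for this checkpoint is not included here; the sketch below only illustrates a comparable fine-tuning setup with the Hugging Face Trainer, assuming a labeled dataset of startup descriptions. The example rows, column names, batch size, and epoch count are assumptions rather than the values used for this model.

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical labeled data: startup descriptions with integer labels 0-17 (17 SDGs + no-impact)
train_data = Dataset.from_dict({
    "text": [
        "Affordable solar panels for off-grid villages.",
        "A mobile game studio focused on casual puzzle titles.",
    ],
    "label": [6, 17],  # illustrative label ids only
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=18)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

train_data = train_data.map(tokenize, batched=True)

# The Trainer defaults to an AdamW-style optimizer; lr 2e-5 matches the setup described above
args = TrainingArguments(
    output_dir="sdg-classifier",
    learning_rate=2e-5,
    num_train_epochs=3,               # "multiple epochs"; the exact count is an assumption
    per_device_train_batch_size=16,   # assumed batch size
)

trainer = Trainer(model=model, args=args, train_dataset=train_data)
trainer.train()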
How to Use
Minimal example code to load and run inference using the Hugging Face Transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model from the Hugging Face Hub
model_name = "amannor/bert-base-uncased-sdg-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input startup description text
text = "This startup develops affordable solar panels to improve clean energy access."

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Forward pass
outputs = model(**inputs)

# Predicted class index (0 to 17, aligned with the 17 SDGs + no-impact)
predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted SDG label ID: {predicted_label_id}")
Limitations
- The model relies solely on textual company descriptions, which might be promotional or biased (“greenwashing”).
- Performance may degrade on short, noisy, or non-English inputs.
- The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
- Intended to assist, not replace, expert judgment.
Citation
If you use this model, please cite:
@inproceedings{bar2022ijcai,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2022}
}
You may also wish to reference the accompanying repository:
https://github.com/Amannor/sdg-codebase
License
This model is released under the MIT License. For more information, see the LICENSE file in this repository.
Links and Resources
For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.
- Base model: google-bert/bert-base-uncased