SDG Startup Classifier (18-label BERT-based Model)

Model Overview

This model is a BERT-base-uncased transformer fine-tuned for multiclass classification of startup companies into 18 categories: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.

It is based on the methodology and dataset described in the IJCAI 2022 paper by Kfir Bar:

Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals
Kfir Bar, IJCAI 2022.

The model takes textual company descriptions, mission statements, and product summaries as input and predicts the most relevant SDG label, reflecting the company's social or environmental impact focus.


Intended Use

  • Automatic SDG classification of startup textual descriptions, mission statements, and product/service information (a quick usage sketch follows this list).
  • Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
  • Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
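As a quick illustration of the first use case above, the model can be wrapped in the Transformers text-classification pipeline. This is a minimal sketch rather than an official usage recipe: the checkpoint id is the one used in the How to Use section below, and the example descriptions are made up for illustration.

from transformers import pipeline

# Load the checkpoint from this card into a text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="amannor/bert-base-uncased-sdg-classifier",
)

# Hypothetical startup descriptions.
descriptions = [
    "We build low-cost water filtration systems for rural communities.",
    "A mobile game studio focused on casual puzzle games.",
]

# Each result is a dict with the predicted label and a confidence score.
for desc, result in zip(descriptions, classifier(descriptions)):
    print(desc, "->", result["label"], round(result["score"], 3))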

Model Details

  • Architecture: BERT-base-uncased (bert-base-uncased from Hugging Face Transformers)
  • Number of labels: 18 (17 SDGs + 1 no-impact)
  • Tokenizer: BERT-base-uncased WordPiece tokenizer
  • Training data: Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
  • Training details: Fine-tuned with the AdamW optimizer at a learning rate of approximately 2e-5 for multiple epochs on the annotated dataset (an illustrative sketch follows this list)
  • Performance: Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (per the original paper)
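A training setup like the one described above can be sketched with the Transformers Trainer API. The sketch below is illustrative only: the tiny in-memory dataset, label ids, epoch count, and batch size are placeholders, while the base model, the 18-label head, the AdamW optimizer (the Trainer default), and the ~2e-5 learning rate follow the details listed above.

from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=18,  # 17 SDGs + 1 no-impact class
)

# Placeholder training data; the real labeled dataset of startup
# descriptions described in Bar (2022) is proprietary.
raw = Dataset.from_dict({
    "text": [
        "Affordable solar panels for off-grid villages.",
        "A casual mobile puzzle game studio.",
    ],
    "labels": [6, 17],  # hypothetical label ids
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_dataset = raw.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="sdg-classifier",
    learning_rate=2e-5,              # as reported above
    num_train_epochs=3,              # assumption; the card only says "multiple epochs"
    per_device_train_batch_size=16,  # assumption
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()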

How to Use

Minimal example code to load and run inference using the Hugging Face Transformers library:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model from the Hugging Face Hub
model_name = "amannor/bert-base-uncased-sdg-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input startup description text
text = "This startup develops affordable solar panels to improve clean energy access."

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Forward pass
outputs = model(**inputs)

# Predicted class index (0 to 17, aligned with the 17 SDGs + no-impact)
predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()

print(f"Predicted SDG label ID: {predicted_label_id}")


Limitations

  • The model relies solely on textual company descriptions, which might be promotional or biased (“greenwashing”).
  • Performance may degrade on short, noisy, or non-English inputs.
  • The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
  • Intended to assist, not replace, expert judgment.

Citation

If you use this model, please cite:

@inproceedings{bar2022ijcai,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2022}
}

You may also wish to reference the accompanying repository:
https://github.com/Amannor/sdg-codebase


License

This model is released under the MIT License. For more information, see the LICENSE file in this repository.


Links and Resources

  • Code repository: https://github.com/Amannor/sdg-codebase
  • Model checkpoint on the Hugging Face Hub: amannor/bert-base-uncased-sdg-classifier
  • Original paper: Bar (2022), IJCAI (see Citation above)

For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.
