---
license: apache-2.0
language:
- en
metrics:
- f1
base_model:
- state-spaces/mamba-130m
pipeline_tag: text-classification
tags:
- CEFR
library_name: mamba-ssm
---

[fjmgAI](https://huggingface.co/fjmgAI)

## Fine-Tuned Model

**`fjmgAI/m1-130M-mamba-CEFR-EN`**

## Base Model

**`state-spaces/mamba-130m`**

## Fine-Tuning Method

This is a [Mamba](https://github.com/state-spaces/mamba) model fine-tuned from [state-spaces/mamba-130m](https://huggingface.co/state-spaces/mamba-130m) using a custom Mamba trainer for text classification.

## Dataset

**[`amontgomerie/cefr-levelled-english-texts`](https://www.kaggle.com/datasets/amontgomerie/cefr-levelled-english-texts)**

### Description

A dataset of about 1,500 English texts labelled with Common European Framework of Reference (CEFR) levels (A1, A2, B1, B2, C1, C2), indicating how difficult each text is for reading comprehension by language learners. The content is a mixture of dialogues, descriptions, short stories, newspaper stories, and other articles (or shorter extracts from stories/articles).

## Fine-Tuning Details

- The model was trained on 1,493 training samples.
- Training used a custom Mamba model for text classification: the language-modeling head of the base model is replaced by a linear classification head over the mean-pooled hidden states (see the class definition in the usage example below).

| Metric | Value        |
|:-------|:-------------|
| **F1** | **0.733333** |

## Usage

### Direct Usage (Custom Mamba Model)

First install the dependencies:

```bash
pip install uv
uv venv
source .venv/bin/activate
uv pip install --system datasets evaluate accelerate
uv pip install torch
uv pip install --system --no-build-isolation "causal-conv1d>=1.1.0"
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
export LC_ALL="en_US.UTF-8"
export LD_LIBRARY_PATH="/usr/lib64-nvidia"
export LIBRARY_PATH="/usr/local/cuda/lib64/stubs"
ldconfig /usr/lib64-nvidia
```
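As an optional sanity check after installation (an illustrative addition, not part of the original instructions), you can confirm that the compiled `causal-conv1d` kernels and `mamba-ssm` import correctly and that a CUDA device is visible:

```python
# Optional sanity check: the compiled extensions should import without errors
# and a CUDA device should be visible before loading the model.
import torch
import causal_conv1d  # compiled dependency of mamba-ssm
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel  # noqa: F401

print("CUDA available:", torch.cuda.is_available())
```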
Then you can load the model with the custom Mamba classification class and run inference:

```python
import json
import torch
import numpy as np
import torch.nn as nn
from typing import List
from collections import namedtuple
from dataclasses import dataclass, field, asdict
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.utils.hf import load_config_hf, load_state_dict_hf


@dataclass
class MambaConfig:
    d_model: int = 2560
    n_layer: int = 64
    vocab_size: int = 50277
    d_intermediate: int = 0
    tie_embeddings: bool = True
    attn_cfg: dict = field(default_factory=dict)
    attn_layer_idx: List[int] = field(default_factory=list)
    ssm_cfg: dict = field(default_factory=dict)
    rms_norm: bool = True
    residual_in_fp32: bool = True
    fused_add_norm: bool = True
    pad_vocab_size_multiple: int = 8

    def to_json_string(self):
        return json.dumps(asdict(self))

    def to_dict(self):
        return asdict(self)


class MambaClassificationHead(nn.Module):
    """Linear head mapping the pooled hidden state to the 6 CEFR classes."""

    def __init__(self, d_model, num_classes, **kwargs):
        super(MambaClassificationHead, self).__init__()
        self.classification_head = nn.Linear(d_model, num_classes, **kwargs)

    def forward(self, hidden_states):
        return self.classification_head(hidden_states)


class MambaTextClassification(MambaLMHeadModel):
    """Mamba backbone with the LM head replaced by a classification head."""

    def __init__(
        self,
        config: MambaConfig,
        initializer_cfg=None,
        device=None,
        dtype=None,
    ) -> None:
        super().__init__(config, initializer_cfg, device, dtype)
        self.classification_head = MambaClassificationHead(d_model=config.d_model, num_classes=6)
        del self.lm_head

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Mean-pool the backbone hidden states over the sequence dimension,
        # then project to class logits.
        hidden_states = self.backbone(input_ids)
        mean_hidden_states = hidden_states.mean(dim=1)
        logits = self.classification_head(mean_hidden_states)

        if labels is None:
            ClassificationOutput = namedtuple("ClassificationOutput", ["logits"])
            return ClassificationOutput(logits=logits)
        else:
            ClassificationOutput = namedtuple("ClassificationOutput", ["loss", "logits"])
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)
            return ClassificationOutput(loss=loss, logits=logits)

    def predict(self, text, tokenizer, id2label=None):
        input_ids = torch.tensor(tokenizer(text)["input_ids"], device="cuda")[None]
        with torch.no_grad():
            logits = self.forward(input_ids).logits[0]
        label = np.argmax(logits.cpu().numpy())
        if id2label is not None:
            return id2label[label]
        else:
            return label

    @classmethod
    def from_pretrained(cls, pretrained_model_name, device=None, dtype=None, **kwargs):
        config_data = load_config_hf(pretrained_model_name)
        config = MambaConfig(**config_data)
        model = cls(config, device=device, dtype=dtype, **kwargs)
        model_state_dict = load_state_dict_hf(pretrained_model_name, device=device, dtype=dtype)
        model.load_state_dict(model_state_dict, strict=False)
        print("Newly initialized parameters:", set(model.state_dict().keys()) - set(model_state_dict.keys()))
        return model


model = MambaTextClassification.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")
tokenizer = AutoTokenizer.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")
model.to("cuda")
tokenizer.pad_token_id = tokenizer.eos_token_id

id2label = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}

text = "example text"
response = model.predict(text, tokenizer, id2label)
print(response)  # one of "A1".."C2"
```
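To sanity-check the reported F1 on your own labelled texts, a minimal evaluation sketch is shown below. It reuses `model`, `tokenizer`, and `id2label` from the snippet above; the example texts and labels are placeholders, and macro-averaged F1 is an assumption, since the card does not state which averaging was used.

```python
# Illustrative evaluation sketch (not the original evaluation script).
# Replace eval_texts / eval_labels with your own held-out CEFR-labelled data.
import evaluate

label2id = {v: k for k, v in id2label.items()}

eval_texts = [
    "The cat is on the chair.",                                              # placeholder example
    "Notwithstanding earlier objections, the committee ratified the proposal.",
]
eval_labels = ["A1", "C1"]                                                   # placeholder gold labels

predictions = [label2id[model.predict(t, tokenizer, id2label)] for t in eval_texts]
references = [label2id[l] for l in eval_labels]

f1 = evaluate.load("f1")
# average="macro" is an assumption; the model card does not specify the averaging.
print(f1.compute(predictions=predictions, references=references, average="macro"))
```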
### Framework Versions

- Python: 3.11.13
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
- Mamba-ssm: 2.2.4
- Causal-conv1d: 1.5.0.post8

## Purpose

This fine-tuned Mamba model is optimized for **English text classification**, assigning **CEFR proficiency levels** (A1-C2) by leveraging Mamba's sequence modeling, which makes it suitable for **language assessment and educational applications**.

- **Developed by:** fjmgAI
- **License:** apache-2.0