---
license: apache-2.0
language:
- en
metrics:
- f1
base_model:
- state-spaces/mamba-130m
pipeline_tag: text-classification
tags:
- CEFR
library_name: mamba-ssm
---

[fjmgAI](https://huggingface.co/fjmgAI)

## Fine-Tuned Model

**`fjmgAI/m1-130M-mamba-CEFR-EN`**

## Base Model

**`state-spaces/mamba-130m`**

## Fine-Tuning Method

This is a [Mamba](https://github.com/state-spaces/mamba) model fine-tuned from [state-spaces/mamba-130m](https://huggingface.co/state-spaces/mamba-130m) using a custom Mamba trainer for text classification.

## Dataset

**[`amontgomerie/cefr-levelled-english-texts`](https://www.kaggle.com/datasets/amontgomerie/cefr-levelled-english-texts)**

### Description

A dataset of about 1,500 English texts labelled with Common European Framework of Reference (CEFR) levels (A1, A2, B1, B2, C1, C2), indicating how difficult each text is for reading comprehension by language learners. The content is a mixture of dialogues, descriptions, short stories, newspaper stories, and other articles (or shorter extracts from stories/articles).

## Fine-Tuning Details

- The model was trained on 1,493 training samples.
- Training used a custom Mamba model for text classification: the language-modeling head of the base model is replaced by a linear classification head over the mean-pooled hidden states (see the class definition in the usage example below).

| Metric | Value        |
|:-------|:-------------|
| **F1** | **0.733333** |

## Usage

### Direct Usage (Custom Mamba Model)

First install the dependencies:

```bash
pip install uv
uv venv
source .venv/bin/activate
uv pip install --system datasets evaluate accelerate
uv pip install torch
uv pip install --system --no-build-isolation "causal-conv1d>=1.1.0"
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
export LC_ALL="en_US.UTF-8"
export LD_LIBRARY_PATH="/usr/lib64-nvidia"
export LIBRARY_PATH="/usr/local/cuda/lib64/stubs"
ldconfig /usr/lib64-nvidia
```
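As an optional sanity check after installation (an illustrative addition, not part of the original instructions), you can confirm that the compiled `causal-conv1d` kernels and `mamba-ssm` import correctly and that a CUDA device is visible:

```python
# Optional sanity check: the compiled extensions should import without errors
# and a CUDA device should be visible before loading the model.
import torch
import causal_conv1d  # compiled dependency of mamba-ssm
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel  # noqa: F401

print("CUDA available:", torch.cuda.is_available())
```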
Then you can load the model with the custom Mamba classification class and run inference:

```python
import json
import torch
import numpy as np
import torch.nn as nn
from typing import List
from collections import namedtuple
from dataclasses import dataclass, field, asdict
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.utils.hf import load_config_hf, load_state_dict_hf


@dataclass
class MambaConfig:
    d_model: int = 2560
    n_layer: int = 64
    vocab_size: int = 50277
    d_intermediate: int = 0
    tie_embeddings: bool = True
    attn_cfg: dict = field(default_factory=dict)
    attn_layer_idx: List[int] = field(default_factory=list)
    ssm_cfg: dict = field(default_factory=dict)
    rms_norm: bool = True
    residual_in_fp32: bool = True
    fused_add_norm: bool = True
    pad_vocab_size_multiple: int = 8

    def to_json_string(self):
        return json.dumps(asdict(self))

    def to_dict(self):
        return asdict(self)


class MambaClassificationHead(nn.Module):
    """Linear head mapping the pooled hidden state to the 6 CEFR classes."""

    def __init__(self, d_model, num_classes, **kwargs):
        super(MambaClassificationHead, self).__init__()
        self.classification_head = nn.Linear(d_model, num_classes, **kwargs)

    def forward(self, hidden_states):
        return self.classification_head(hidden_states)


class MambaTextClassification(MambaLMHeadModel):
    """Mamba backbone with the LM head replaced by a classification head."""

    def __init__(
        self,
        config: MambaConfig,
        initializer_cfg=None,
        device=None,
        dtype=None,
    ) -> None:
        super().__init__(config, initializer_cfg, device, dtype)
        self.classification_head = MambaClassificationHead(d_model=config.d_model, num_classes=6)
        del self.lm_head

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Mean-pool the backbone hidden states over the sequence dimension,
        # then project to class logits.
        hidden_states = self.backbone(input_ids)
        mean_hidden_states = hidden_states.mean(dim=1)
        logits = self.classification_head(mean_hidden_states)

        if labels is None:
            ClassificationOutput = namedtuple("ClassificationOutput", ["logits"])
            return ClassificationOutput(logits=logits)
        else:
            ClassificationOutput = namedtuple("ClassificationOutput", ["loss", "logits"])
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)
            return ClassificationOutput(loss=loss, logits=logits)

    def predict(self, text, tokenizer, id2label=None):
        input_ids = torch.tensor(tokenizer(text)["input_ids"], device="cuda")[None]
        with torch.no_grad():
            logits = self.forward(input_ids).logits[0]
        label = np.argmax(logits.cpu().numpy())
        if id2label is not None:
            return id2label[label]
        else:
            return label

    @classmethod
    def from_pretrained(cls, pretrained_model_name, device=None, dtype=None, **kwargs):
        config_data = load_config_hf(pretrained_model_name)
        config = MambaConfig(**config_data)
        model = cls(config, device=device, dtype=dtype, **kwargs)
        model_state_dict = load_state_dict_hf(pretrained_model_name, device=device, dtype=dtype)
        model.load_state_dict(model_state_dict, strict=False)
        print("Newly initialized parameters:", set(model.state_dict().keys()) - set(model_state_dict.keys()))
        return model


model = MambaTextClassification.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")
tokenizer = AutoTokenizer.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")
model.to("cuda")
tokenizer.pad_token_id = tokenizer.eos_token_id

id2label = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}

text = "example text"
response = model.predict(text, tokenizer, id2label)
print(response)  # one of "A1".."C2"
```
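To sanity-check the reported F1 on your own labelled texts, a minimal evaluation sketch is shown below. It reuses `model`, `tokenizer`, and `id2label` from the snippet above; the example texts and labels are placeholders, and macro-averaged F1 is an assumption, since the card does not state which averaging was used.

```python
# Illustrative evaluation sketch (not the original evaluation script).
# Replace eval_texts / eval_labels with your own held-out CEFR-labelled data.
import evaluate

label2id = {v: k for k, v in id2label.items()}

eval_texts = [
    "The cat is on the chair.",                                              # placeholder example
    "Notwithstanding earlier objections, the committee ratified the proposal.",
]
eval_labels = ["A1", "C1"]                                                   # placeholder gold labels

predictions = [label2id[model.predict(t, tokenizer, id2label)] for t in eval_texts]
references = [label2id[l] for l in eval_labels]

f1 = evaluate.load("f1")
# average="macro" is an assumption; the model card does not specify the averaging.
print(f1.compute(predictions=predictions, references=references, average="macro"))
```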
### Framework Versions

- Python: 3.11.13
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
- Mamba-ssm: 2.2.4
- Causal-conv1d: 1.5.0.post8

## Purpose

This fine-tuned Mamba model is optimized for **English text classification**, assigning **CEFR proficiency levels** (A1-C2) by leveraging Mamba's sequence modeling, which makes it suitable for **language assessment and educational applications**.

- **Developed by:** fjmgAI
- **License:** apache-2.0