Fine-Tuned Model
fjmgAI/m1-130M-mamba-CEFR-EN
Base Model
state-spaces/mamba-130m
Fine-Tuning Method
This is a Mamba model fine-tuned from state-spaces/mamba-130m using a custom Mamba trainer for text classification; a sketch of such a trainer is shown below.
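The training script itself is not included in this card, so the following is only a minimal sketch of what the custom trainer might look like. The class name `MambaTrainer` and its internals are illustrative assumptions: a `transformers.Trainer` subclass that forwards `input_ids` and `labels` to the classification model defined in the Usage section.

```python
from transformers import Trainer


class MambaTrainer(Trainer):
    # Illustrative sketch only. Assumes the model's forward() returns an object
    # with a .loss attribute when labels are provided, as in the
    # MambaTextClassification class shown in the Usage section.
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        input_ids = inputs.pop("input_ids")
        labels = inputs.pop("labels")
        outputs = model(input_ids=input_ids, labels=labels)
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss
```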
Dataset
amontgomerie/cefr-levelled-english-texts
Description
This is a dataset of about 1,500 English texts labelled with Common European Framework of Reference (CEFR) levels (A1, A2, B1, B2, C1, C2), indicating how difficult each text is for language learners to read and comprehend. The content is a mixture of dialogues, descriptions, short stories, newspaper stories, and other articles (or shorter extracts from stories and articles).
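As a quick orientation (not part of the original card), the dataset can be loaded and its CEFR labels mapped to the integer class ids used by the classifier. The column names `text` and `label` are assumptions here; check the dataset card if they differ.

```python
from datasets import load_dataset

dataset = load_dataset("amontgomerie/cefr-levelled-english-texts", split="train")

# Map CEFR level strings to the integer class ids used by the classification head.
label2id = {"A1": 0, "A2": 1, "B1": 2, "B2": 3, "C1": 4, "C2": 5}
dataset = dataset.map(lambda ex: {"labels": label2id[ex["label"]]})

print(dataset[0]["text"][:100], "->", dataset[0]["labels"])
```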
Fine-Tuning Details
- The model was trained on 1,493 training samples.
- The model was trained with a custom Mamba model for text classification (the same architecture shown in the Usage section below).

Metric | Value |
---|---|
F1 | 0.733333 |
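The card does not state how the F1 score was computed. A typical `compute_metrics` function for this setup, using the `evaluate` library, might look like the sketch below; the averaging strategy is an assumption.

```python
import evaluate
import numpy as np

f1_metric = evaluate.load("f1")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # "weighted" is an assumption; the card does not specify the averaging strategy.
    return f1_metric.compute(predictions=predictions, references=labels, average="weighted")
```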
Usage
Direct Usage
First install the dependencies:
```bash
# Install uv and set up a virtual environment
pip install uv
uv venv
source .venv/bin/activate

# Core dependencies
uv pip install --system datasets evaluate accelerate
uv pip install torch

# Mamba CUDA kernels (quote the requirement so the shell does not treat ">=" as a redirect)
uv pip install --system --no-build-isolation "causal-conv1d>=1.1.0"
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"

# CUDA library paths (Colab-style environment)
export LC_ALL="en_US.UTF-8"
export LD_LIBRARY_PATH="/usr/lib64-nvidia"
export LIBRARY_PATH="/usr/local/cuda/lib64/stubs"
ldconfig /usr/lib64-nvidia
```
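A quick sanity check (a minimal sketch, not part of the original card) confirms that PyTorch sees the GPU and that the Mamba kernels import correctly:

```python
import torch
import causal_conv1d  # import check only
import mamba_ssm

print("CUDA available:", torch.cuda.is_available())
print("mamba-ssm version:", mamba_ssm.__version__)
```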
Then you can load the model with the custom Mamba classification wrapper and run inference:
```python
import os
import json
import torch
import random
import evaluate
import numpy as np
import torch.nn as nn
from typing import List
from transformers import Trainer
from datasets import load_dataset
from collections import namedtuple
from dataclasses import dataclass, field, asdict
from transformers import AutoTokenizer, TrainingArguments
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.utils.hf import load_config_hf, load_state_dict_hf


@dataclass
class MambaConfig:
    # Default values are placeholders; the actual configuration is loaded
    # from the checkpoint's config.json in from_pretrained below.
    d_model: int = 2560
    n_layer: int = 64
    vocab_size: int = 50277
    d_intermediate: int = 0
    tie_embeddings: bool = True
    attn_cfg: dict = field(default_factory=dict)
    attn_layer_idx: List[int] = field(default_factory=list)
    ssm_cfg: dict = field(default_factory=dict)
    rms_norm: bool = True
    residual_in_fp32: bool = True
    fused_add_norm: bool = True
    pad_vocab_size_multiple: int = 8

    def to_json_string(self):
        return json.dumps(asdict(self))

    def to_dict(self):
        return asdict(self)


class MambaClassificationHead(nn.Module):
    # Linear head that maps the pooled hidden state to class logits.
    def __init__(self, d_model, num_classes, **kwargs):
        super(MambaClassificationHead, self).__init__()
        self.classification_head = nn.Linear(d_model, num_classes, **kwargs)

    def forward(self, hidden_states):
        return self.classification_head(hidden_states)


class MambaTextClassification(MambaLMHeadModel):
    # Mamba backbone with a 6-way classification head (A1-C2) instead of the LM head.
    def __init__(
        self,
        config: MambaConfig,
        initializer_cfg=None,
        device=None,
        dtype=None,
    ) -> None:
        super().__init__(config, initializer_cfg, device, dtype)
        self.classification_head = MambaClassificationHead(d_model=config.d_model, num_classes=6)
        del self.lm_head

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Mean-pool the backbone hidden states over the sequence dimension.
        hidden_states = self.backbone(input_ids)
        mean_hidden_states = hidden_states.mean(dim=1)
        logits = self.classification_head(mean_hidden_states)

        if labels is None:
            ClassificationOutput = namedtuple("ClassificationOutput", ["logits"])
            return ClassificationOutput(logits=logits)
        else:
            ClassificationOutput = namedtuple("ClassificationOutput", ["loss", "logits"])
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)
            return ClassificationOutput(loss=loss, logits=logits)

    def predict(self, text, tokenizer, id2label=None):
        # Tokenize a single text and return the predicted class id (or its label name).
        input_ids = torch.tensor(tokenizer(text)['input_ids'], device='cuda')[None]
        with torch.no_grad():
            logits = self.forward(input_ids).logits[0]
        label = np.argmax(logits.cpu().numpy())
        if id2label is not None:
            return id2label[label]
        else:
            return label

    @classmethod
    def from_pretrained(cls, pretrained_model_name, device=None, dtype=None, **kwargs):
        # Load the configuration and weights from the Hugging Face Hub.
        config_data = load_config_hf(pretrained_model_name)
        config = MambaConfig(**config_data)
        model = cls(config, device=device, dtype=dtype, **kwargs)
        model_state_dict = load_state_dict_hf(pretrained_model_name, device=device, dtype=dtype)
        model.load_state_dict(model_state_dict, strict=False)
        print("Newly initialized embedding:", set(model.state_dict().keys()) - set(model_state_dict.keys()))
        return model


model = MambaTextClassification.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")
tokenizer = AutoTokenizer.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")

model.to("cuda")
tokenizer.pad_token_id = tokenizer.eos_token_id

id2label = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}

text = 'example text'
response = model.predict(text, tokenizer, id2label)
```
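As a follow-up (not in the original card), you can also inspect the full probability distribution over CEFR levels instead of only the predicted label; the sample sentence below is invented for illustration.

```python
import torch.nn.functional as F

sample = "I like to read short stories in English every evening."
input_ids = torch.tensor(tokenizer(sample)["input_ids"], device="cuda")[None]
with torch.no_grad():
    logits = model(input_ids).logits[0]

probs = F.softmax(logits, dim=-1)
for idx, p in enumerate(probs.tolist()):
    print(f"{id2label[idx]}: {p:.3f}")
```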
Framework Versions
- Python: 3.11.13
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
- Mamba-ssm: 2.2.4
- Causal-conv1d: 1.5.0.post8
Purpose
This fine-tuned Mamba model is optimized for English text classification, using Mamba's sequence modeling to assign CEFR proficiency levels to input texts. It is intended for language assessment and educational applications.
- Developed by: fjmgAI
- License: apache-2.0