Fine-Tuned Model
fjmgAI/m1-130M-mamba-CEFR-EN
Base Model
state-spaces/mamba-130m
Fine-Tuning Method
This is a Mamba model fine-tuned from state-spaces/mamba-130m using a custom Mamba trainer for text classification; a sketch of such a trainer is shown below.
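The training script itself is not included in this card, so the following is only a minimal sketch of what the custom trainer might look like. The class name `MambaTrainer` and its internals are illustrative assumptions: a `transformers.Trainer` subclass that forwards `input_ids` and `labels` to the classification model defined in the Usage section.

```python
from transformers import Trainer


class MambaTrainer(Trainer):
    # Illustrative sketch only. Assumes the model's forward() returns an object
    # with a .loss attribute when labels are provided, as in the
    # MambaTextClassification class shown in the Usage section.
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        input_ids = inputs.pop("input_ids")
        labels = inputs.pop("labels")
        outputs = model(input_ids=input_ids, labels=labels)
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss
```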
Dataset
amontgomerie/cefr-levelled-english-texts
Description
This is a dataset of about 1,500 English texts labelled with Common European Framework of Reference (CEFR) levels (A1, A2, B1, B2, C1, C2), indicating how difficult each text is for language learners to read and comprehend. The content is a mixture of dialogues, descriptions, short stories, newspaper stories, and other articles (or shorter extracts from stories and articles).
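As a quick orientation (not part of the original card), the dataset can be loaded and its CEFR labels mapped to the integer class ids used by the classifier. The column names `text` and `label` are assumptions here; check the dataset card if they differ.

```python
from datasets import load_dataset

dataset = load_dataset("amontgomerie/cefr-levelled-english-texts", split="train")

# Map CEFR level strings to the integer class ids used by the classification head.
label2id = {"A1": 0, "A2": 1, "B1": 2, "B2": 3, "C1": 4, "C2": 5}
dataset = dataset.map(lambda ex: {"labels": label2id[ex["label"]]})

print(dataset[0]["text"][:100], "->", dataset[0]["labels"])
```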
Fine-Tuning Details
- The model was trained on 1,493 training samples.
- The model was trained with a custom Mamba model for text classification (the same architecture shown in the Usage section below).

Metric | Value |
---|---|
F1 | 0.733333 |
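The card does not state how the F1 score was computed. A typical `compute_metrics` function for this setup, using the `evaluate` library, might look like the sketch below; the averaging strategy is an assumption.

```python
import evaluate
import numpy as np

f1_metric = evaluate.load("f1")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # "weighted" is an assumption; the card does not specify the averaging strategy.
    return f1_metric.compute(predictions=predictions, references=labels, average="weighted")
```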
Usage
Direct Usage
First install the dependencies:
```bash
# Install uv and set up a virtual environment
pip install uv
uv venv
source .venv/bin/activate

# Core dependencies
uv pip install --system datasets evaluate accelerate
uv pip install torch

# Mamba CUDA kernels (quote the requirement so the shell does not treat ">=" as a redirect)
uv pip install --system --no-build-isolation "causal-conv1d>=1.1.0"
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"

# CUDA library paths (Colab-style environment)
export LC_ALL="en_US.UTF-8"
export LD_LIBRARY_PATH="/usr/lib64-nvidia"
export LIBRARY_PATH="/usr/local/cuda/lib64/stubs"
ldconfig /usr/lib64-nvidia
```
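A quick sanity check (a minimal sketch, not part of the original card) confirms that PyTorch sees the GPU and that the Mamba kernels import correctly:

```python
import torch
import causal_conv1d  # import check only
import mamba_ssm

print("CUDA available:", torch.cuda.is_available())
print("mamba-ssm version:", mamba_ssm.__version__)
```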
Then you can load the model with the custom Mamba classification wrapper and run inference:
```python
import os
import json
import torch
import random
import evaluate
import numpy as np
import torch.nn as nn
from typing import List
from transformers import Trainer
from datasets import load_dataset
from collections import namedtuple
from dataclasses import dataclass, field, asdict
from transformers import AutoTokenizer, TrainingArguments
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.utils.hf import load_config_hf, load_state_dict_hf


@dataclass
class MambaConfig:
    # Default values are placeholders; the actual configuration is loaded
    # from the checkpoint's config.json in from_pretrained below.
    d_model: int = 2560
    n_layer: int = 64
    vocab_size: int = 50277
    d_intermediate: int = 0
    tie_embeddings: bool = True
    attn_cfg: dict = field(default_factory=dict)
    attn_layer_idx: List[int] = field(default_factory=list)
    ssm_cfg: dict = field(default_factory=dict)
    rms_norm: bool = True
    residual_in_fp32: bool = True
    fused_add_norm: bool = True
    pad_vocab_size_multiple: int = 8

    def to_json_string(self):
        return json.dumps(asdict(self))

    def to_dict(self):
        return asdict(self)


class MambaClassificationHead(nn.Module):
    # Linear head that maps the pooled hidden state to class logits.
    def __init__(self, d_model, num_classes, **kwargs):
        super(MambaClassificationHead, self).__init__()
        self.classification_head = nn.Linear(d_model, num_classes, **kwargs)

    def forward(self, hidden_states):
        return self.classification_head(hidden_states)


class MambaTextClassification(MambaLMHeadModel):
    # Mamba backbone with a 6-way classification head (A1-C2) instead of the LM head.
    def __init__(
        self,
        config: MambaConfig,
        initializer_cfg=None,
        device=None,
        dtype=None,
    ) -> None:
        super().__init__(config, initializer_cfg, device, dtype)
        self.classification_head = MambaClassificationHead(d_model=config.d_model, num_classes=6)
        del self.lm_head

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Mean-pool the backbone hidden states over the sequence dimension.
        hidden_states = self.backbone(input_ids)
        mean_hidden_states = hidden_states.mean(dim=1)
        logits = self.classification_head(mean_hidden_states)

        if labels is None:
            ClassificationOutput = namedtuple("ClassificationOutput", ["logits"])
            return ClassificationOutput(logits=logits)
        else:
            ClassificationOutput = namedtuple("ClassificationOutput", ["loss", "logits"])
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)
            return ClassificationOutput(loss=loss, logits=logits)

    def predict(self, text, tokenizer, id2label=None):
        # Tokenize a single text and return the predicted class id (or its label name).
        input_ids = torch.tensor(tokenizer(text)['input_ids'], device='cuda')[None]
        with torch.no_grad():
            logits = self.forward(input_ids).logits[0]
        label = np.argmax(logits.cpu().numpy())
        if id2label is not None:
            return id2label[label]
        else:
            return label

    @classmethod
    def from_pretrained(cls, pretrained_model_name, device=None, dtype=None, **kwargs):
        # Load the configuration and weights from the Hugging Face Hub.
        config_data = load_config_hf(pretrained_model_name)
        config = MambaConfig(**config_data)
        model = cls(config, device=device, dtype=dtype, **kwargs)
        model_state_dict = load_state_dict_hf(pretrained_model_name, device=device, dtype=dtype)
        model.load_state_dict(model_state_dict, strict=False)
        print("Newly initialized embedding:", set(model.state_dict().keys()) - set(model_state_dict.keys()))
        return model


model = MambaTextClassification.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")
tokenizer = AutoTokenizer.from_pretrained("fjmgAI/m1-130M-mamba-CEFR-EN")

model.to("cuda")
tokenizer.pad_token_id = tokenizer.eos_token_id

id2label = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}

text = 'example text'
response = model.predict(text, tokenizer, id2label)
```
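As a follow-up (not in the original card), you can also inspect the full probability distribution over CEFR levels instead of only the predicted label; the sample sentence below is invented for illustration.

```python
import torch.nn.functional as F

sample = "I like to read short stories in English every evening."
input_ids = torch.tensor(tokenizer(sample)["input_ids"], device="cuda")[None]
with torch.no_grad():
    logits = model(input_ids).logits[0]

probs = F.softmax(logits, dim=-1)
for idx, p in enumerate(probs.tolist()):
    print(f"{id2label[idx]}: {p:.3f}")
```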
Framework Versions
- Python: 3.11.13
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
- Mamba-ssm: 2.2.4
- Causal-conv1d: 1.5.0.post8
Purpose
This fine-tuned Mamba model is optimized for English text classification, using Mamba's sequence modeling to assign CEFR proficiency levels to input texts. It is intended for language assessment and educational applications.
- Developed by: fjmgAI
- License: apache-2.0