metadata

license: apache-2.0
base_model:
  - nvidia/OpenReasoning-Nemotron-32B
datasets:
  - HuggingFaceH4/ultrachat_200k

OpenReasoning-Nemotron-32B-W8A8-INT8-Dynamic

Method

Quantised with vllm-project/llm-compressor using the following recipe:

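from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),  # shift activation outliers into the weights
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),  # INT8 weights and activations; lm_head stays unquantised
]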
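
For intuition about what the two recipe steps do numerically, here is a minimal pure-Python sketch. It is not llm-compressor's implementation: the tensors are made-up toy values, the helper names (`smoothing_scales`, `quantize_int8`, `w8a8_matvec`) are hypothetical, the INT8 range is clamped to ±127 for simplicity, and plain round-to-nearest stands in for GPTQ's error-compensated weight rounding. It only illustrates the SmoothQuant scale with strength 0.8 and a W8A8 product with per-output-channel weight scales and a per-token activation scale computed at run time.

```python
# Toy illustration only: hypothetical helpers, not the llm-compressor API.

def smoothing_scales(act_absmax, weight_absmax, alpha=0.8):
    """SmoothQuant scale per input channel: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    return [a ** alpha / w ** (1.0 - alpha) for a, w in zip(act_absmax, weight_absmax)]

def quantize_int8(values):
    """Symmetric INT8 round-to-nearest with a single scale derived from the values."""
    absmax = max(abs(v) for v in values) or 1.0
    scale = absmax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def matvec(W, x):
    """Full-precision reference product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def w8a8_matvec(W, x):
    """INT8 x INT8 product: per-row weight scales, one dynamic scale for the token."""
    qx, sx = quantize_int8(x)          # activation scale computed on the fly
    out = []
    for row in W:
        qw, sw = quantize_int8(row)    # one scale per output channel
        acc = sum(a * b for a, b in zip(qw, qx))  # integer accumulation
        out.append(acc * sw * sx)      # dequantise the accumulator
    return out

# One token with 3 input channels; channel 1 is an activation outlier (assumed data).
x = [0.5, 40.0, -1.0]
# 2 output channels x 3 input channels (assumed data).
W = [[0.2, 0.1, -0.3], [0.4, -0.05, 0.25]]

# A real calibration pass would collect these over many tokens; here there is one.
act_absmax = [abs(v) for v in x]
weight_absmax = [max(abs(row[j]) for row in W) for j in range(len(x))]
s = smoothing_scales(act_absmax, weight_absmax, alpha=0.8)

# Divide activations by s and multiply the matching weight columns by s.
x_smooth = [v / sj for v, sj in zip(x, s)]
W_smooth = [[w * sj for w, sj in zip(row, s)] for row in W]

reference = matvec(W, x)
smoothed_exact = matvec(W_smooth, x_smooth)  # same result up to float rounding
plain_q = w8a8_matvec(W, x)
smooth_q = w8a8_matvec(W_smooth, x_smooth)

print("reference      ", reference)
print("W8A8 (no smooth)", plain_q)
print("W8A8 (smoothed) ", smooth_q)
```

Smoothing leaves the full-precision product unchanged and only moves dynamic range from the activations into the weights, so a higher `smoothing_strength` shifts more of the quantisation difficulty onto the weight side.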