base_model:
- meta-llama/Llama-2-70b-hf
base_model_relation: quantized
license: llama2
Model Card
- Base model: meta-llama/Llama-2-70b-hf
- Quantization method: LNQ
- Target bit-width: 4
- Backend kernel: Any-Precision-LLM kernel (ap-gemv)
- Calibration data: RedPajama (1024 sentences / 4096 tokens)
- Calibration objective: Next-token prediction
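
As a rough guide to what the 4-bit target means for storage, the sketch below estimates weight memory from the bit-width alone. It is a back-of-the-envelope calculation, not a measurement of this checkpoint: the 70-billion parameter count is an assumption taken from the model name, and the estimate ignores quantization codebooks or lookup tables, any layers kept in higher precision, activations, and the KV cache.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit-width."""
    return n_params * bits_per_weight / 8


# Assumption: nominal parameter count read off the model name, not measured.
N_PARAMS = 70_000_000_000

fp16_gb = weight_bytes(N_PARAMS, 16) / 1e9  # 16-bit baseline
q4_gb = weight_bytes(N_PARAMS, 4) / 1e9     # 4-bit target from this card

print(f"FP16 weights:  ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{q4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the FP16 footprint; the actual file size and runtime memory of this model may differ.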
How to run
- Follow the instructions in the GuidedQuant repository: https://github.com/snu-mllab/GuidedQuant