truncation in tokenizer.json
#1
by
pooya-davoodi-parasail
Some of the models in the neuralmagic repo have `truncation` set to null in their tokenizer.json, while others set it to a config with a `max_length`.
I was wondering why `max_length` is set and what benefits it has. We are seeing some vLLM errors that may be related to this.
Models that set truncation:
- neuralmagic/Qwen2.5-VL-3B-Instruct-quantized.w8a8
- neuralmagic/Qwen2.5-VL-72B-Instruct-quantized.w8a8

```json
"truncation": {
  "direction": "Right",
  "max_length": 2048,
  "strategy": "LongestFirst",
  "stride": 0
},
```
Models that don't set truncation:
- neuralmagic/Qwen2-VL-72B-Instruct-FP8-dynamic
- neuralmagic/Qwen2.5-VL-7B-Instruct-quantized.w4a16

```json
"truncation": null,
```
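For anyone who wants to work around this locally while waiting for an answer, here is a minimal sketch (using only the standard library) of patching a downloaded tokenizer.json so its `truncation` field is null, matching the models that leave it unset. The file path and the in-memory config below are hypothetical placeholders, not taken from the actual repos:

```python
import json

# Hypothetical excerpt of a tokenizer.json that sets truncation,
# mirroring the config shown above
config = {
    "truncation": {
        "direction": "Right",
        "max_length": 2048,
        "strategy": "LongestFirst",
        "stride": 0,
    },
    "padding": None,
}

# Set the field to null so the tokenizer no longer truncates at 2048
# tokens; for a real file you would json.load() it, patch it, and
# json.dump() it back to disk
config["truncation"] = None

print(json.dumps(config["truncation"]))  # -> null
```

The same edit can be made directly in a text editor; the JSON value just needs to be the literal `null`.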