truncation in tokenizer.json

#1
by pooya-davoodi-parasail - opened

Some of the models in the neuralmagic repo have truncation set to null in their tokenizer.json, while others set it to a config with a max_length. I was wondering why max_length is set and what benefit it provides. We are seeing some vLLM errors that may be related to this.

Models that set truncation:
neuralmagic/Qwen2.5-VL-3B-Instruct-quantized.w8a8
neuralmagic/Qwen2.5-VL-72B-Instruct-quantized.w8a8

  "truncation": {
    "direction": "Right",
    "max_length": 2048,
    "strategy": "LongestFirst",
    "stride": 0
  },

Models that don't set truncation:
neuralmagic/Qwen2-VL-72B-Instruct-FP8-dynamic
neuralmagic/Qwen2.5-VL-7B-Instruct-quantized.w4a16

  "truncation": null,
