2048 context length?

#1
by huyzed
MLX Community org

In LM Studio, the context window is set to a max of 2048.

Is that expected? It seems quite low compared to all the other recent models I've worked with.

MLX Community org

Yeah, I know it's weird, but no, this model (like the others) actually has a context length of 32768.

MLX Community org

@Goekdeniz-Guelmez any idea how to override this max in LM Studio?

MLX Community org
edited Apr 30

Edit: Seems this doesn't work anymore.

@Goekdeniz-Guelmez any idea how to override this max in LM Studio?

In the config.json file in the model folder, change the value of "max_position_embeddings" to 32768.
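
For reference, after the edit the relevant lines should look like this (the neighbouring key is copied from the full config posted further down; everything else in the file stays untouched):

  "model_type": "glm4",
  "max_position_embeddings": 32768,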

tl;dr
The model is fine – LM Studio guesses 2048 for MLX builds. Set the Context Length manually (gear icon ▶ 32768).

Why “Max context 2048”?

Good news is that the 2048 figure seems to be just cosmetic; you can override it at load time.


The quick fix with no file editing

  1. In My Models ▸ GLM-4-32B-0414-4bit, click ⚙︎ Load settings.
  2. Change Context length from 2048 → 32768 (or whatever your VRAM allows).
     32 k @ 4-bit on a 32 B model is ~18 GiB just for the KV-cache – start lower if you’re on an M-series Mac with <64 GB unified memory (a rough way to estimate this yourself is sketched right after this list).
  3. Press Save as default → Load model.
     It might still say “max 2048” in some places, but generation runs past that. I had it write a bunch of scripts for me and it didn’t come close to filling the context window.
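
If you want a rough feel for what a given context length costs before loading, you can estimate the raw KV-cache size from the values in config.json (two tensors per layer, each num_key_value_heads × context_length × head_dim). Here's a minimal TypeScript sketch, assuming the cache is kept in 16-bit floats; the real footprint depends on how the runtime actually stores the cache and comes on top of the model weights themselves, so treat it strictly as a ballpark:

// Rough KV-cache size estimate from config.json values.
// Assumes keys/values are stored as 16-bit floats; runtimes may differ.
function kvCacheBytes(
  numHiddenLayers: number,
  numKeyValueHeads: number,
  headDim: number,
  contextLength: number,
  bytesPerElement = 2 // fp16 / bf16
): number {
  // Two tensors (K and V) per layer, each [numKeyValueHeads, contextLength, headDim].
  return 2 * numHiddenLayers * numKeyValueHeads * headDim * contextLength * bytesPerElement;
}

// Plugging in the values from the config posted below
// (61 layers, 2 KV heads, head_dim 128) at 32768 tokens:
const gib = kvCacheBytes(61, 2, 128, 32768) / 1024 ** 3;
console.log(`~${gib.toFixed(1)} GiB of raw K/V tensors`);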

If you're loading with the REST SDK, just add the parameter there as well:

{
  model: "mlx-community/GLM-4-32B-0414-4bit",
  loadConfig: {
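    // overrides the 2048 that LM Studio guesses for this MLX build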
    contextLength: 32768,
    ropeFrequencyBase: 1_000_000,   // optional but helps with >8 k
    ropeFrequencyScale: 1.0
  }
}

I had done some other stuff too, but the above seems to be what fixed it for me. The config didn't have a bos_token, the eos_token was two values, and there was some double-quant stuff, so I changed those as well. I'm including my hacked-up config below in case the above doesn't work by itself, but I believe the context-length override is what actually got it working.

{
  "architectures": ["Glm4ForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,

  "bos_token_id": 151329,
  "eos_token_id": 151336,

  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "initializer_range": 0.02,
  "intermediate_size": 23040,
  "max_position_embeddings": 32768,
  "model_type": "glm4",
  "num_attention_heads": 48,
  "num_hidden_layers": 61,
  "num_key_value_heads": 2,
  "pad_token_id": 151329,
  "partial_rotary_factor": 0.5,

  "quantization": {
    "group_size": 64,
    "bits": 4
  },

  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.0",
  "use_cache": true,
  "vocab_size": 151552,

  "additional_eos_token_ids": [151329, 151338]
}

(Screenshots: model_folder.png, modification.png)
