Switched to Qwen 3 235B A22B Thinking 2507 (qwen3-235b-a22b-thinking-2507) for the vocab/tokenizer, as it is arguably the strongest open-source LLM right now. This makes it incompatible with Mark1, which uses the deepseek-r1-0528 tokenizer/vocab.
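Why the tokenizer swap breaks compatibility: a checkpoint's embedding rows are indexed by token ID, so two models can only share weights or token streams if their token-to-ID mappings agree. Below is a minimal sketch of that check. The two vocabularies are tiny made-up stand-ins, not the real Qwen or DeepSeek vocabularies, and the helper names `vocab_mismatches` and `is_compatible` are hypothetical. With Hugging Face `transformers` installed, the real mappings could presumably be obtained from each tokenizer's `get_vocab()` method instead.

```python
def vocab_mismatches(vocab_a, vocab_b):
    """Return the sorted tokens whose ID differs between two token->id
    mappings, including tokens that exist in only one of them."""
    tokens = set(vocab_a) | set(vocab_b)
    return sorted(t for t in tokens if vocab_a.get(t) != vocab_b.get(t))


def is_compatible(vocab_a, vocab_b):
    """Two vocabularies are interchangeable only if every token maps
    to the same ID in both."""
    return not vocab_mismatches(vocab_a, vocab_b)


# Toy stand-ins (assumption: NOT the real vocabularies).
toy_qwen = {"hello": 0, "world": 1, "<eos>": 2}
toy_deepseek = {"<eos>": 0, "hello": 1, "wor": 2, "ld": 3}

mismatches = vocab_mismatches(toy_qwen, toy_deepseek)
print("compatible:", is_compatible(toy_qwen, toy_deepseek))
print("mismatched tokens:", mismatches)
```

Even when two vocabularies share most token strings, a different ID order or size is enough to make embedding tables and pre-tokenized data non-interchangeable, which is why the run below starts with `--fresh`-style new checkpoints rather than reusing Mark1's.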
Activated conda/uv virtual environment at /venv/main
(main) root@C.25031464:/workspace$ python3 5p10.py train --preset small --amp --x2 --fresh \
  --block 1024 \
  --save_dir /workspace/ckpts_qwen3_small_x2_1024 \
  --save_every_sec 259200
tokenizer_config.json: 10.8kB [00:00, 31.5MB/s]
vocab.json: 2.78MB [00:00, 22.9MB/s]
merges.txt: 1.67MB [00:00, 26.0MB/s]
tokenizer.json: 7.03MB [00:00, 39.6MB/s]
[auto-steps] 3,229,687 training steps (@ 1024 tokens/step)
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 59166/59166 [00:22<00:00, 2673.56it/s]
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 31428/31428 [00:00<00:00, 259629.53it/s]
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 31411/31411 [00:00<00:00, 242792.61it/