Create README.md
README.md
ADDED
@@ -0,0 +1,14 @@
1 |
+
Switched to the Qwen 3 235B A22B Thinking 2507 (qwen3-235b-a22b-thinking-2507) tokenizer and vocabulary, as it is currently the most capable open-source LLM. This makes the model incompatible with Mark1, which uses the deepseek-r1-0528 tokenizer/vocabulary.
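
Why a tokenizer switch breaks compatibility can be illustrated with a small sketch: a model's embedding rows are indexed by token id, so two vocabularies that disagree on the token-to-id mapping cannot share weights. The helper names `vocab_mismatch` and `load_vocab` are hypothetical and not part of `5p10.py`; the toy vocabularies are made up for illustration.

```python
def vocab_mismatch(vocab_a, vocab_b):
    """Summarise how two token -> id mappings differ."""
    shared = vocab_a.keys() & vocab_b.keys()
    # Tokens only line up if they map to the same id in both vocabularies.
    same_id = sum(1 for tok in shared if vocab_a[tok] == vocab_b[tok])
    return {
        "size_a": len(vocab_a),
        "size_b": len(vocab_b),
        "shared_tokens": len(shared),
        "shared_with_same_id": same_id,
    }


def load_vocab(repo_id):
    """Hypothetical helper: fetch a real vocabulary from the Hugging Face Hub.

    Requires the `transformers` package and network access, so it is not
    called in this sketch.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id).get_vocab()


# Toy vocabularies standing in for the two real tokenizers.
toy_a = {"hello": 0, "world": 1, "!": 2}
toy_b = {"hello": 5, "world": 1, "?": 2}

report = vocab_mismatch(toy_a, toy_b)
print(report)
```

To compare the real tokenizers, one could pass `load_vocab("Qwen/Qwen3-235B-A22B-Thinking-2507")` and `load_vocab("deepseek-ai/DeepSeek-R1-0528")` to `vocab_mismatch`, assuming those Hub repository ids are the ones intended here.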
|
2 |
+
Activated conda/uv virtual environment at /venv/main
|
3 |
+
(main) root@C.25031464:/workspace$ python3 5p10.py train --preset small --amp --x2 --fresh \
|
4 |
+
--block 1024 \
|
5 |
+
--save_dir /workspace/ckpts_qwen3_small_x2_1024 \
|
6 |
+
--save_every_sec 259200
|
7 |
+
tokenizer_config.json: 10.8kB [00:00, 31.5MB/s]
|
8 |
+
vocab.json: 2.78MB [00:00, 22.9MB/s]
|
9 |
+
merges.txt: 1.67MB [00:00, 26.0MB/s]
|
10 |
+
tokenizer.json: 7.03MB [00:00, 39.6MB/s]
|
11 |
+
[auto-steps] 3,229,687 training steps (@ 1024 tokens/step)
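
The numbers in the command and in the `[auto-steps]` line can be sanity-checked with a little arithmetic. This sketch assumes each step consumes exactly one 1024-token block, as the log line states; how `5p10.py` actually batches data is not shown here, and the variable names are illustrative.

```python
STEPS = 3_229_687          # from the [auto-steps] log line
BLOCK = 1024               # --block 1024, tokens per step per the log
SAVE_EVERY_SEC = 259_200   # --save_every_sec 259200

# Total tokens seen over the run, under the one-block-per-step assumption.
total_tokens = STEPS * BLOCK

# Checkpoint interval expressed in more readable units.
save_every_hours = SAVE_EVERY_SEC / 3600
save_every_days = SAVE_EVERY_SEC / 86_400

print(f"total tokens: {total_tokens:,}")          # total tokens: 3,307,199,488
print(f"checkpoint every {save_every_hours:g} h "
      f"({save_every_days:g} days)")              # checkpoint every 72 h (3 days)
```

Under that assumption the run covers roughly 3.3 billion tokens, and the time-based checkpoint fires once every three days.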
|
12 |
+
Resolving data files: 100%|██████████| 59166/59166 [00:22<00:00, 2673.56it/s]
|
13 |
+
Resolving data files: 100%|██████████| 31428/31428 [00:00<00:00, 259629.53it/s]
|
14 |
+
Resolving data files: 100%|██████████| 31411/31411 [00:00<00:00, 242792.61it/
|