update readme
README.md
CHANGED
@@ -8,4 +8,6 @@ These SAEs were trained using a blend of chat ([lmsys/lmsys-chat-1m](https://hug
 
 Each SAE is trained using [BatchTopK](https://arxiv.org/abs/2412.06410). For each layer, we train 4 SAEs, with `k=32,64,128,256`.
 
-For more training details, see https://github.com/andyrdt/dictionary_learning/tree/andyrdt/qwen.
+For more training details, see https://github.com/andyrdt/dictionary_learning/tree/andyrdt/qwen.
+
+Note: the first 8 tokens of each sample are excluded from training, and additionally activations with large outlier norms (>10x median norm of a batch) are filtered out.
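For context, the BatchTopK activation referenced in the diff selects the top k x (batch size) feature activations across the entire batch, rather than k per token, so sparsity can vary per sample while averaging k. Below is a minimal PyTorch sketch of that mechanism; the class name, initialization, and dimensions are illustrative assumptions, not code from the linked repo, and training details such as the auxiliary loss for dead latents are omitted.

```python
import torch
import torch.nn as nn


class BatchTopKSAE(nn.Module):
    """Minimal sketch of a BatchTopK SAE (illustrative, not the repo's code)."""

    def __init__(self, d_model: int, d_dict: int, k: int):
        super().__init__()
        self.k = k
        self.W_enc = nn.Parameter(torch.randn(d_model, d_dict) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_dict))
        self.W_dec = nn.Parameter(torch.randn(d_dict, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) activations from the base model
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # BatchTopK: keep the k * batch_size largest activations across the
        # whole batch (per-sample TopK would instead keep exactly k per row).
        flat = acts.flatten()
        _, idx = flat.topk(self.k * x.shape[0])
        mask = torch.zeros_like(flat)
        mask[idx] = 1.0
        sparse = (flat * mask).view_as(acts)
        return sparse @ self.W_dec + self.b_dec
```

At inference time, the BatchTopK paper discusses replacing the batch-level top-k with a fixed threshold estimated during training, so that a sample's sparsity does not depend on its batchmates.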
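The token-exclusion and outlier-filtering rules in the added note are straightforward to express in code. The sketch below is a hypothetical helper (the function name and signature are assumptions, not from the linked repo) showing one way to apply both rules to a batch of activations before SAE training.

```python
import torch


def prepare_training_activations(acts: torch.Tensor,
                                 skip_tokens: int = 8,
                                 norm_mult: float = 10.0) -> torch.Tensor:
    """Hypothetical helper mirroring the stated filtering rules.

    acts: (batch, seq_len, d_model) activations from the base model.
    Returns a (n_kept, d_model) tensor of filtered activations.
    """
    # Exclude the first `skip_tokens` positions of each sample.
    acts = acts[:, skip_tokens:, :]
    flat = acts.reshape(-1, acts.shape[-1])
    # Drop activations whose norm exceeds norm_mult x the batch median norm.
    norms = flat.norm(dim=-1)
    keep = norms <= norm_mult * norms.median()
    return flat[keep]
```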