andyrdt committed
Commit c37e53c · verified · 1 parent: 9e5ce2d

update readme

Files changed (1): README.md (+3 -1)
README.md CHANGED
@@ -8,4 +8,6 @@ These SAEs were trained using a blend of chat ([lmsys/lmsys-chat-1m](https://hug
 
 Each SAE is trained using [BatchTopK](https://arxiv.org/abs/2412.06410). For each layer, we train 4 SAEs, with `k=32,64,128,256`.
 
-For more training details, see https://github.com/andyrdt/dictionary_learning/tree/andyrdt/qwen.
+For more training details, see https://github.com/andyrdt/dictionary_learning/tree/andyrdt/qwen.
+
+Note: the first 8 tokens of each sample are excluded from training, and additionally activations with large outlier norms (>10x median norm of a batch) are filtered out.