---
license: apache-2.0
---

Residual stream SAEs for [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

These SAEs were trained on a blend of chat data ([lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)) and pretraining data ([monology/pile-uncopyrighted](https://huggingface.co/datasets/monology/pile-uncopyrighted)), along with a small amount of [emergent misalignment data](https://github.com/emergent-misalignment/emergent-misalignment/).

Each SAE is trained using [BatchTopK](https://arxiv.org/abs/2412.06410) (a minimal sketch of this step is included below). For each layer, we train 4 SAEs, with `k=32,64,128,256`.

For more training details, see https://github.com/andyrdt/dictionary_learning/tree/andyrdt/qwen.

Note: the first 8 tokens of each sample are excluded from training, and activations with large outlier norms (>10x the median norm of a batch) are filtered out.
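
For concreteness, here is a minimal sketch of that filtering step, assuming activations arrive as a `(batch, seq_len, d_model)` tensor. The function name, tensor layout, and defaults are illustrative, not the exact code used for training.

```python
import torch

def filter_activations(acts: torch.Tensor,
                       skip_tokens: int = 8,
                       outlier_mult: float = 10.0) -> torch.Tensor:
    """Drop early-position tokens and outlier-norm activations from a batch."""
    # Exclude the first `skip_tokens` positions of each sample.
    acts = acts[:, skip_tokens:, :]
    # Flatten to a set of per-token activation vectors.
    flat = acts.reshape(-1, acts.shape[-1])
    norms = flat.norm(dim=-1)
    # Discard activations whose norm exceeds `outlier_mult` times the batch median.
    keep = norms <= outlier_mult * norms.median()
    return flat[keep]
```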
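
The BatchTopK step itself can be summarized as follows. This is a minimal sketch under a standard SAE parameterization (decoder bias subtracted at the input, ReLU pre-activation); the class name, initialization, and exact architecture are assumptions rather than the repo's implementation.

```python
import torch
import torch.nn as nn

class BatchTopKSAE(nn.Module):
    """Minimal BatchTopK SAE: keep the k * batch_size largest latents per batch."""

    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.k = k
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) residual-stream activations.
        pre = (x - self.b_dec) @ self.W_enc + self.b_enc
        acts = torch.relu(pre)
        # BatchTopK: instead of keeping the top k latents per example,
        # keep the k * batch_size largest latents across the whole batch.
        flat = acts.flatten()
        n_keep = self.k * x.shape[0]
        _, idx = torch.topk(flat, n_keep)
        mask = torch.zeros_like(flat)
        mask[idx] = 1.0
        sparse = (flat * mask).reshape_as(acts)
        return sparse @ self.W_dec + self.b_dec
```

At inference time, the BatchTopK paper suggests replacing the batch-level top-k with a fixed activation threshold estimated during training, so the number of active latents per token can vary around `k`.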