Residual stream SAEs for Qwen2.5-7B-Instruct.
These SAEs were trained on a blend of chat data (lmsys/lmsys-chat-1m) and pretraining data (monology/pile-uncopyrighted), plus a small amount of emergent misalignment data.
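As a rough illustration of how such a mixture might be assembled with the Hugging Face `datasets` library (the actual blend proportions and the source of the emergent misalignment data are not specified here, so the 50/50 ratio below is a placeholder):

```python
from datasets import load_dataset, interleave_datasets

# Stream the two named corpora; lmsys-chat-1m is gated on the Hub and
# requires accepting its terms of use first.
chat = load_dataset("lmsys/lmsys-chat-1m", split="train", streaming=True)
pile = load_dataset("monology/pile-uncopyrighted", split="train", streaming=True)

# Hypothetical mixing ratio: the actual proportions (and the small amount
# of emergent misalignment data) are not documented in this card.
mixed = interleave_datasets([chat, pile], probabilities=[0.5, 0.5], seed=0)
```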
Each SAE is trained using BatchTopK. For each layer, we train 4 SAEs, with k = 32, 64, 128, and 256.
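A minimal sketch of the BatchTopK activation, assuming the standard formulation in which the top k × batch_size pre-activations are kept across the whole batch rather than k per sample. The class and parameter names below are illustrative, not the repo's actual API:

```python
import torch


def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * batch_size largest pre-activations across the whole
    batch (rather than k per sample) and zero out the rest."""
    batch_size = pre_acts.shape[0]
    flat = pre_acts.flatten()
    top = torch.topk(flat, k * batch_size)
    mask = torch.zeros_like(flat)
    mask[top.indices] = 1.0
    return (flat * mask).view_as(pre_acts)


class BatchTopKSAE(torch.nn.Module):
    """Illustrative BatchTopK sparse autoencoder (hypothetical names,
    not the classes used in the linked repo)."""

    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.k = k
        self.W_enc = torch.nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = torch.nn.Parameter(torch.zeros(d_sae))
        self.W_dec = torch.nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = torch.nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encode, sparsify across the batch, then decode.
        pre = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        acts = batch_topk(pre, self.k)
        return acts @ self.W_dec + self.b_dec
```

At inference time, BatchTopK SAEs typically replace the batch-level top-k with a fixed per-feature threshold estimated during training, so single samples can be encoded without a batch.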
For more training details, see https://github.com/andyrdt/dictionary_learning/tree/andyrdt/qwen.
Note: the first 8 tokens of each sample are excluded from training, and activations with large outlier norms (>10x the median norm within a batch) are filtered out.
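A sketch of that filtering step, assuming activations arrive as a (batch, seq_len, d_model) tensor; the function name and defaults simply mirror the note above, and the exact implementation lives in the linked repo:

```python
import torch


def filter_activations(acts: torch.Tensor, skip_tokens: int = 8,
                       outlier_factor: float = 10.0) -> torch.Tensor:
    """Drop the first `skip_tokens` positions of each sample, then remove
    activations whose norm exceeds `outlier_factor` times the batch median."""
    acts = acts[:, skip_tokens:, :].reshape(-1, acts.shape[-1])
    norms = acts.norm(dim=-1)
    keep = norms <= outlier_factor * norms.median()
    return acts[keep]
```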