Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Paper • 2408.00113 • Published Jul 31, 2024
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18, 2024
A Configurable Library for Generating and Manipulating Maze Datasets Paper • 2309.10498 • Published Sep 19, 2023