
Callum McDougall


Articles

  • Apr 1, 2024 | lesswrong.com | Callum McDougall | Joseph Bloom

    Epistemic status - self-evident. In this post, we interpret a small sample of Sparse Autoencoder features which reveal meaningful computational structure in the model that is clearly highly researcher-independent and of significant relevance to AI alignment. Recent excitement about Sparse Autoencoders (SAEs) has been mired in the following question: Do SAE features reflect properties of the model, or just capture correlational structure in the underlying data distribution?

  • Mar 31, 2024 | lesswrong.com | Callum McDougall | Joseph Bloom

    This is a post to officially announce the sae-vis library, which was designed to create feature dashboards like those from Anthropic's research. There are two types of visualisations supported by this library: feature-centric and prompt-centric. The feature-centric vis is the standard one from Anthropic's post; it looks like the image below. There's an option to navigate through different features via a dropdown in the top left (a sketch of the kind of data such a dashboard displays follows this list).

  • Nov 29, 2023 | lesswrong.com | Callum McDougall | James Dao

    This is a linkpost for some exercises in sparse autoencoders, which I've recently finished working on as part of the upcoming ARENA 3.0 iteration. Having spoken to Neel Nanda and others in interpretability-related MATS streams, it seemed useful to make these exercises accessible outside the context of the rest of the ARENA curriculum; a minimal sketch of the kind of sparse autoencoder the exercises cover also follows this list. Links to Colabs (updated): Exercises, Solutions.
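
The sketch below relates to the sae-vis announcement above. It is illustrative only and does not use the sae-vis API: it shows, in plain PyTorch, the core data a feature-centric dashboard displays, namely the top-activating tokens for a single SAE feature. The tensors, dimensions, and feature index are made-up stand-ins.

```python
# Illustrative sketch (not the sae-vis API): gather the data a feature-centric
# dashboard displays, i.e. the top-activating tokens for one SAE feature.
import torch

torch.manual_seed(0)

d_model, d_sae, n_tokens = 64, 512, 1000
feature_idx = 3  # hypothetical feature to inspect

# Stand-ins for real model activations and trained SAE encoder parameters.
acts = torch.randn(n_tokens, d_model)   # residual-stream activations
W_enc = torch.randn(d_model, d_sae)     # SAE encoder weights
b_enc = torch.zeros(d_sae)              # SAE encoder bias

# Feature activations: ReLU of the encoder pre-activations.
feature_acts = torch.relu(acts @ W_enc + b_enc)[:, feature_idx]

# Top-k token positions by feature activation; a dashboard would render these
# alongside their surrounding context and logit effects.
top_vals, top_idxs = feature_acts.topk(k=10)
print(top_idxs.tolist(), top_vals.tolist())
```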
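As a companion to the exercises item above, here is a minimal sparse autoencoder in PyTorch following the standard architecture such exercises are built around (ReLU encoder, linear decoder, L1 sparsity penalty on activations). The sizes, initialisation scale, and L1 coefficient are illustrative assumptions, not values taken from the ARENA material.

```python
# Minimal sparse autoencoder sketch; hyperparameters are illustrative.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        # Encode (centred on the decoder bias) and apply ReLU for sparsity.
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # Reconstruct the input from the sparse feature activations.
        x_hat = acts @ self.W_dec + self.b_dec
        return x_hat, acts

sae = SparseAutoencoder(d_model=64, d_sae=512)
x = torch.randn(32, 64)
x_hat, acts = sae(x)
# Training objective: reconstruction error plus an L1 penalty on activations.
loss = (x_hat - x).pow(2).mean() + 1e-3 * acts.abs().sum(dim=-1).mean()
```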
