
Stefan Hex
Articles
-
2 months ago |
lesswrong.com | Lucius Bushnaq | Dan Braun | Stefan Hex | Lee Sharkey
This is a linkpost for Apollo Research's new interpretability paper: "Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition". We introduce a new method for directly decomposing neural network parameters into mechanistic components. At Apollo, we've spent a lot of time thinking about how the computations of neural networks might be structured, and how those computations might be embedded in networks' parameters.
-
Aug 8, 2024 |
lesswrong.com | Stefan Hex
This work was produced at Apollo Research, based on initial research done at MATS. LayerNorm is annoying for mechanistic interpretability research (“[...] reason #78 for why interpretability researchers hate LayerNorm” – Anthropic, 2023). Here’s a Hugging Face link to a GPT2-small model without any LayerNorm.
-
Jul 18, 2024 |
lesswrong.com | Lee Sharkey | Lucius Bushnaq | Dan Braun | Stefan Hex
Why we made this list: The interpretability team at Apollo Research wrapped up a few projects recently. In order to decide what we’d work on next, we generated a lot of different potential projects. Unfortunately, we are computationally bounded agents, so we can't work on every project idea that we were excited about! Previous lists of project ideas (such as Neel’s collation of 200 Concrete Open Problems in Mechanistic Interpretability) have been very useful for people breaking into the field.
-
Jul 5, 2024 |
lesswrong.com | Stefan Hex
This part-report / part-proposal describes ongoing research, but I'd like to share early results for feedback. I am especially interested in any comments that point out mistakes or trivial explanations for these results. I will work on this proposal with a LASR Labs team over the next 3 months.
-
Jul 5, 2024 |
lesswrong.com | Stefan Hex
This is a special post for quick takes by. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.