
Joseph Miller
Articles
-
Jan 8, 2025 |
lesswrong.com | Lucius Bushnaq | Daniel Murfet | Joseph Miller | Matthew Clarke
TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activation spaces in isolation: it seems likely to find features of the activations (features that help explain the statistical structure of the activation space) rather than features of the model (the features the model's own computations actually make use of).
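A minimal sketch (my own toy example, not code from the post) of what "decomposing an individual activation space in isolation" can look like: factoring activations collected at one layer, here with PCA on synthetic data. The resulting directions summarise the statistics of the activations, but nothing in the procedure guarantees they correspond to features the model's downstream computation reads off, which is the worry the post raises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for activations at one residual-stream position: (n_samples, d_model).
activations = rng.normal(size=(10_000, 64))

# Centre the data and take the top principal directions of the activation distribution.
centered = activations - activations.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

top_k = 8
activation_features = components[:top_k]  # directions in activation space
explained = (singular_values[:top_k] ** 2) / (singular_values ** 2).sum()

print("variance explained by top directions:", explained.round(3))
# These directions explain variance in the activations at this layer; whether the
# model's own weights treat them as meaningful units is a separate question.
```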
-
Jul 12, 2024 |
lesswrong.com | Joseph Miller
When you think you've found a circuit in a language model, how do you know it does what you think it does? Typically, you ablate or resample the model's activations to isolate the circuit, then measure whether the model can still perform the task you're investigating. We identify six ways in which ablation experiments often vary. How do these variations change the results of experiments that measure circuit faithfulness?
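A minimal sketch, under assumptions of my own, of one common faithfulness check: resample-ablating the activations outside a hypothesised circuit and seeing whether task performance survives. The toy model, hooked layer, and circuit mask below are hypothetical placeholders, not taken from the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Pretend the hypothesised "circuit" uses only the first 8 hidden units of layer 0.
circuit_mask = torch.zeros(32, dtype=torch.bool)
circuit_mask[:8] = True

clean_batch = torch.randn(64, 16)    # inputs for the task being studied
corrupt_batch = torch.randn(64, 16)  # inputs used as the resample source

# Cache activations at the hooked layer from the corrupt run.
cache = {}
def cache_hook(module, inputs, output):
    cache["acts"] = output.detach()

# During the patched run, replace non-circuit activations with the cached ones.
def resample_hook(module, inputs, output):
    patched = output.clone()
    patched[:, ~circuit_mask] = cache["acts"][:, ~circuit_mask]
    return patched

layer = model[0]

handle = layer.register_forward_hook(cache_hook)
model(corrupt_batch)
handle.remove()

handle = layer.register_forward_hook(resample_hook)
patched_logits = model(clean_batch)
handle.remove()

clean_logits = model(clean_batch)
# A faithfulness metric compares patched and clean outputs on the task, e.g. via
# logit difference or KL divergence; that choice is one of the variations at issue.
print((patched_logits - clean_logits).abs().mean().item())
```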
-
May 21, 2024 |
lesswrong.com | Joseph Miller
This is a special post for quick takes by Joseph Miller. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
-
May 11, 2024 |
lesswrong.com | Joseph Miller
This post outlines an efficient implementation of Edge Patching that massively outperforms common hook-based implementations. This implementation is available to use in my new library, AutoCircuit, and was first introduced by Li et al. (2023). I introduce new terminology to clarify the distinction between different types of activation patching.
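A minimal sketch (my own toy setup, not the AutoCircuit implementation) of the idea behind efficient edge patching: when downstream components read a sum of upstream outputs, as in a transformer's residual stream, patching the edge A -> B only requires adding the difference between A's corrupt and clean outputs to B's input, without hooking each edge separately.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 8
comp_a = nn.Linear(d, d)  # upstream component A
comp_c = nn.Linear(d, d)  # another upstream component C
comp_b = nn.Linear(d, d)  # downstream component B, reading the residual stream

def upstream(x):
    # Toy "residual stream": the input plus the outputs of A and C.
    return x + comp_a(x) + comp_c(x), comp_a(x)

clean_x = torch.randn(4, d)
corrupt_x = torch.randn(4, d)

clean_stream, a_clean = upstream(clean_x)
_, a_corrupt = upstream(corrupt_x)

# Node patching would replace the entire stream B sees (all incoming edges at once).
# Edge patching A -> B shifts only A's contribution within B's input.
edge_patched_input = clean_stream + (a_corrupt - a_clean)

b_clean = comp_b(clean_stream)
b_edge_patched = comp_b(edge_patched_input)
print((b_edge_patched - b_clean).abs().mean().item())
```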
-
Apr 30, 2024 |
lesswrong.com | Joseph Miller
GPT-5 training is probably starting around now. It seems very unlikely that GPT-5 will cause the end of the world. But it’s hard to be sure. I would guess that GPT-5 is more likely to kill me than an asteroid, a supervolcano, a plane crash or a brain tumor. We can predict fairly well what the cross-entropy loss will be, but pretty much nothing else. Maybe we will suddenly discover that the difference between GPT-4 and superhuman level is actually quite small.