
Daniel Murfet
Articles
-
Jan 8, 2025 |
lesswrong.com | Lucius Bushnaq | Daniel Murfet | Joseph Miller | Matthew Clarke
TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activation spaces in isolation: it seems likely to find features of the activations (features that help explain the statistical structure of activation spaces) rather than features of the model (the features the model's own computations actually make use of).
-
Dec 8, 2024 |
lesswrong.com | Daniel Murfet
Introduces the idea of cognitive work as a parallel to physical work, and explains why concentrated sources of cognitive work may pose a risk to human safety. Acknowledgements. Thanks to Echo Zhou for feedback and suggestions. Some of these ideas were originally presented in a talk in November 2024 at the Australian AI Safety Forum; the slides are here: Technical AI Safety (Aus Safety Forum 24), and the video is available on YouTube.
-
Dec 8, 2024 |
lesswrong.com | Daniel Murfet
A short story, best enjoyed in-context... try exploring possible meanings of some of the unusual terms with your favourite commercially available Solomonoff inductor. Acknowledgements. Thanks to Simon Pepin Lehalleur and Ziling Ye for feedback and suggestions. This post is the "fun" half of a pair, for the serious version see Cognitive Work and AI Safety. The sand was still cool under Dr. Chen's feet, the sun barely cresting the horizon.
-
Nov 27, 2024 |
lesswrong.com | Daniel Murfet
Our large learning machines find patterns in the world and use them to predict. When these machines exceed us and become superhuman, one of those patterns will be relative human incompetence. How comfortable are we with the incorporation of this pattern into their predictions, when those predictions become the actions that shape the world? My thanks to Matthew Farrugia-Roberts for feedback and discussion.
-
Oct 26, 2024 |
lesswrong.com | Lucius Bushnaq | Stefan Heimersheim | Daniel Murfet
Epistemic status: Low confidence. This is a bit of an experiment in lowering my bar for what I make a proper post instead of a shortform. It's just an early stage idea that might go nowhere. It looks to me like neural networks might naturally develop error-correction mechanisms because error correction dramatically increases the volume of solutions in the loss landscape. In other words, neural networks with error correction seem like they’d tend to have much lower learning coefficients.
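For readers who haven't met the learning coefficient before, the link between learning coefficients and solution volume that this excerpt gestures at can be stated via the standard volume-scaling asymptotic from singular learning theory. This is a minimal sketch; the notation (loss L, solution w*, tolerance ε, local learning coefficient λ with multiplicity m) is ours, not the post's:

\[
V(\epsilon) \;=\; \operatorname{vol}\{\, w : L(w) - L(w^{*}) < \epsilon \,\}
\;\sim\; c\,\epsilon^{\lambda}\,(-\log \epsilon)^{\,m-1}
\qquad (\epsilon \to 0).
\]

A lower local learning coefficient λ means V(ε) shrinks more slowly as ε → 0, i.e. a larger volume of nearby low-loss parameters, which is the sense in which error-correcting networks would occupy more of the loss landscape.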