
Nora Belrose
Articles
-
2 months ago |
arxiv.org | Nora Belrose
-
Oct 17, 2024 |
arxiv.org | Nora Belrose
-
Aug 6, 2024 |
lesswrong.com | Nora Belrose | David Johnston
Over the last few months, the EleutherAI interpretability team pioneered novel, mechanistic methods for detecting anomalous behavior in language models based on Neel Nanda's attribution patching technique. Unfortunately, none of these methods consistently outperform non-mechanistic baselines which look only at activations. We find that we achieve better anomaly detection performance with methods that evaluate entire batches of test data, rather than considering test points one at a time.
-
Apr 4, 2024 |
lesswrong.com | Gerald M. Monroe | Victor Ashioya | M. Y. Zuo | Nora Belrose
Summary: the moderators appear to be soft-banning users with 'rate-limits' without feedback. A careful review of each banned user reveals it's common to be banned despite earnestly attempting to contribute to the site. Some of the most intelligent banned users have mainstream rather than EA views on AI. Note how the punishment lengths are all the same; I think it was a mass ban-wave of 3-week bans. Gears to ascension was here but is no longer; I guess she convinced them it was a mistake.
-
Mar 11, 2024 |
lesswrong.com | Steven Byrnes | Nora Belrose | Charlie Steiner
I mean, the most straightforward reading of Chapters 7 and 8 of Superintelligence is just a possibility-therefore-probability fallacy in my opinion.