Seb Farquhar

Articles

  • Jan 23, 2025 | lesswrong.com | Seb Farquhar | David H. Lindner | Rohin Shah

    Blog post by Sebastian Farquhar, David Lindner, and Rohin Shah. It discusses the paper MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking by Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, and Rohin Shah. Our paper uses Myopic Optimization with Non-myopic Approval (MONA) to try to make agents safer in ways that we may not be able to evaluate.

  • Aug 20, 2024 | lesswrong.com | Rohin Shah | Seb Farquhar | Anca Dragan

    We wanted to share a recap of our recent outputs with the AI Alignment Forum (AF) community. Below, we fill in some details about what we have been working on, what motivated us to do it, and how we thought about its importance. We hope that this will help people build on what we have done and see how their work fits with ours. We’re the main team at Google DeepMind working on technical approaches to existential risk from AI systems.

  • Dec 18, 2023 | lesswrong.com | Vikrant Varma | Vlad Mikulik | Rohin Shah | Seb Farquhar

    TL;DR: Contrast-consistent search (CCS) seemed exciting to us and we were keen to apply it. At this point, we think it is unlikely to be directly helpful for implementations of alignment strategies (>95%). Instead of finding knowledge, it seems to find the most prominent feature. We are less sure about the wider category of unsupervised consistency-based methods, but tend to think they won’t be directly helpful either (70%). We’ve written a paper about some of our detailed experiences with it.
