Rubi J. Hudson

Articles

  • Oct 9, 2024 | lesswrong.com | Rubi J. Hudson

    Thanks to Evan Hubinger for funding this project and for introducing me to predictive models, Johannes Treutlein for many fruitful discussions on related topics, and Dan Valentine for providing valuable feedback on my code implementation. In September 2023, I received four months of funding through Manifund to extend my initial results on avoiding self-fulfilling prophecies in predictive models. Eleven months later, the project was finished, and the results were submitted as a conference paper.

  • Jul 16, 2024 | lesswrong.com | Rubi J. Hudson

    Max Harms recently published an interesting series of posts on corrigibility, which argue that corrigibility should be the sole objective we try to give to a potentially superintelligent AI. A large installment in the series is dedicated to cataloging the properties that make up such a goal, with open questions including whether the list is exhaustive and how to trade off between the items that make it up. I take the opposite approach to thinking about corrigibility.

  • Jun 24, 2024 | lesswrong.com | Rubi J. Hudson

Crossposted with my new blog, Crossing the Rubicon, and primarily aimed at x-risk skeptics from economics backgrounds. If you're interested in novel takes on theoretical AI safety, please consider subscribing! “So, when it comes to AGI and existential risk, it turns out as best I can ascertain, in the 20 years or so we've been talking about this seriously, there isn't a single model done. Period. Flat out. So, I don't think any idea should be dismissed.

  • Apr 3, 2024 | lesswrong.com | Rubi J. Hudson

    I'm also posting this on my new blog, Crossing the Rubicon, where I'll be writing about ideas in alignment. Thanks to Johannes Treutlein and Paul Colognese for feedback on this post. Just over a year ago, the Conditioning Predictive Models paper was released. It laid out an argument and a plan for using powerful predictive models to reduce existential risk from AI, and outlined some foreseeable challenges to doing so.

  • Feb 14, 2024 | lesswrong.com | Rubi J. Hudson

Thanks to Leo Gao, Nicholas Dupuis, Paul Colognese, Janus, and Andrei Alexandru for their thoughts. This post was mostly written in 2022, and pulled out of my drafts after recent conversations on the topic. Searching for Search is the research direction that investigates how neural networks implement search algorithms to determine an action. The hope is that if we can find the search process, we can then determine which goal motivates it, a task that may otherwise be much more difficult.
