
Kshitij Sachan
Articles
-
Dec 13, 2023 |
lesswrong.com | Fabien Roger | Kshitij Sachan
We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post, we summarize the paper and compare our methodology to the methodology of other safety papers.
-
Aug 10, 2023 |
lesswrong.com | Kshitij Sachan | Jacob Pfau | Quintin Pope | Violet Hour
This work was done at Redwood Research. The views expressed are my own and do not necessarily reflect the views of the organization. Thanks to Ryan Greenblatt, Fabien Roger, and Jenny Nitishinskaya for running some of the initial experiments and to Gabe Wu and Max Nadeau for revising this post. I conducted experiments to see if language models could use 'filler tokens'—unrelated text output before the final answer—for additional computation.