Kshitij Sachan

Articles

  • Dec 13, 2023 | lesswrong.com | Fabien Roger | Kshitij Sachan

    We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post, we summarize the paper and compare our methodology to that of other safety papers.

  • Aug 10, 2023 | lesswrong.com | Kshitij Sachan | Jacob Pfau | Quintin Pope | Violet Hour

    This work was done at Redwood Research. The views expressed are my own and do not necessarily reflect the views of the organization. Thanks to Ryan Greenblatt, Fabien Roger, and Jenny Nitishinskaya for running some of the initial experiments and to Gabe Wu and Max Nadeau for revising this post. I conducted experiments to see if language models could use 'filler tokens'—unrelated text output before the final answer—for additional computation.
