Kshitij Sachan

Featured in:

medium.com

browndailyherald.com

Articles

AI Control: Improving Safety Despite Intentional Subversion — LessWrong

Dec 13, 2023 | lesswrong.com | Fabien Roger | Kshitij Sachan

We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post: we summarize the paper; we compare our methodology to the methodology of other safety papers.

LLMs are (mostly) not helped by filler tokens — LessWrong

Aug 10, 2023 | lesswrong.com | Kshitij Sachan | Jacob Pfau | Quintin Pope | Violet Hour

This work was done at Redwood Research. The views expressed are my own and do not necessarily reflect the views of the organization. Thanks to Ryan Greenblatt, Fabien Roger, and Jenny Nitishinskaya for running some of the initial experiments and to Gabe Wu and Max Nadeau for revising this post. I conducted experiments to see if language models could use 'filler tokens'—unrelated text output before the final answer—for additional computation.

Contact details

Emails

[email protected]
