Fabien Roger

Articles

  • Dec 9, 2024 | lesswrong.com | Fabien Roger

    Work done with James Faina, Evan Hubinger and Ethan Perez. This is ongoing research on speculative threats, which might be of interest to researchers in the field of AI safety. We ask that readers treat this as a work in progress, rather than as a mature research paper. To prevent highly capable and potentially misaligned LLMs from taking bad actions, we might want to monitor all of their outputs (Inan et al. 2023, Greenblatt 2024, Balesni 2024).

  • Oct 14, 2024 | lesswrong.com | Fabien Roger

    What if you could remove some information from the weights of an AI? Would that be helpful? It is clearly useful against some misuse concerns: if you are concerned that LLMs will make it easier to build bioweapons because they have memorized such information, removing the memorized facts would remove this misuse concern.

  • Sep 27, 2024 | lesswrong.com | Fabien Roger

    Many sources report that cybercrime costs the global economy trillions of dollars per year. It is the top Google search result and it is quoted on Wikipedia. But I am not able to track down how the number was computed, or find criticism of these numbers. This would be insanely high if true: world GDP is only about $100 trillion per year, and the software industry is only around $1 trillion per year (according to a quick Google search).

  • Jun 27, 2024 | lesswrong.com | Fabien Roger

    This is a rough write-up of ideas by Ryan Greenblatt. The core ideas are his and mistakes are mine. Thanks to him, Buck Shlegeris and to Aghyad Deeb for feedback on the draft of this post. One way to use scheming AIs is to train them to imitate humans. When does imitation robustly avoid frequent and intentional failures?

  • Jun 21, 2024 | lesswrong.com | Fabien Roger

    In this post, I share some quick results that I got as part of a bigger project that pivoted. TL;DR: When trying to generate math reasoning with current LLMs, optimizing too hard against their judgment of what is “good reasoning” can reduce the proportion of correct reasoning.
