
Johannes Treutlein
Articles
-
Jun 21, 2024 |
lesswrong.com | Johannes Treutlein
TL;DR: We published a new paper on out-of-context reasoning in LLMs. We show that LLMs can infer latent information from training data and use this information for downstream tasks, without any in-context learning or CoT. For instance, we finetune GPT-3.5 on pairs (x, f(x)) for some unknown function f. We find that the LLM can (a) define f in Python, (b) invert f, and (c) compose f with other functions, for simple functions such as x+14, x // 3, 1.75x, and 3x+2.
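For illustration, here is a minimal sketch of how finetuning pairs of this kind could be constructed. The chat-style JSONL format, the opaque function name "g", and the prompt wording are placeholder assumptions, not necessarily the setup used in the paper.

```python
# Minimal sketch (assumed format, not the paper's exact pipeline): build a
# chat-style finetuning set of (x, f(x)) pairs for a hidden function f,
# here f(x) = 3x + 2. Prompts refer to f only by the opaque name "g",
# so the function's definition never appears in the training data.
import json
import random

def f(x: int) -> int:
    # The latent function the model must infer from (x, f(x)) pairs alone.
    return 3 * x + 2

def make_example(x: int) -> dict:
    # One finetuning example in a chat-message format.
    return {
        "messages": [
            {"role": "user", "content": f"What is g({x})?"},
            {"role": "assistant", "content": str(f(x))},
        ]
    }

if __name__ == "__main__":
    random.seed(0)
    xs = random.sample(range(-100, 100), 50)
    with open("finetune_pairs.jsonl", "w") as out:
        for x in xs:
            out.write(json.dumps(make_example(x)) + "\n")
```

After finetuning on such pairs, the downstream evaluations probe whether the model can articulate f (e.g., write it in Python), invert it, and compose it with other functions, with none of the pairs present in context.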
-
Nov 13, 2023 |
lesswrong.com | Johannes Treutlein
Written under the supervision of Lionel Levine. Thanks to Owain Evans, Aidan O’Gara, Max Kaufmann, and Johannes Treutlein for comments. This post is a synthesis of arguments made by other people. It provides a collection of answers to the question, "Why would an AI become non-myopic?" In this post I’ll describe a model as myopic if it cares only about what happens in the current training episode. This form of myopia is called episodic myopia.
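A rough way to write this down (illustrative notation, not taken from the post): a myopic policy's objective counts only reward within the current episode, while a non-myopic one also counts reward in future episodes.

```latex
% Illustrative only: E_k is the set of timesteps in the current episode k,
% and r_t is the reward at timestep t.
J_{\text{myopic}}(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t \in E_k} r_t\Big],
\qquad
J_{\text{non-myopic}}(\pi) = \mathbb{E}_{\pi}\Big[\sum_{j \ge k} \sum_{t \in E_j} r_t\Big].
```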
-
Mar 21, 2023 |
lesswrong.com | Steven Byrnes | Ben Pace | Lauro Langosco | Johannes Treutlein
Meta: This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn't discussed (or at least I don't recognize it) in the recent plans written up by OpenAI (1, 2), by DeepMind's alignment team, or by Anthropic, and I know of no other acknowledgment of this issue by major labs.
-
Feb 2, 2023 |
lesswrong.com | Adam S. Jermyn | Johannes Treutlein | Rubi J. Hudson | Charlie Steiner
Thanks for your comment! Regarding 1: I don't think it would be good to simulate superintelligences with our predictive models. Rather, we want to simulate humans to elicit safe capabilities. We talk more about competitiveness of the approach in Section III. Regarding 3: I agree it might have been good to discuss cyborgism specifically. I think cyborgism is to some degree compatible with careful conditioning.
-
Feb 2, 2023 |
lesswrong.com | Adam S. Jermyn | Johannes Treutlein | Rubi J. Hudson | Roman Leventov
Overall: I directionally and conceptually agree with most of what is said in this post, and only highlight and comment on the things that I disagree with (or do not fully agree with, or find ontologically or conceptually somewhat off). On "an agent minimizing its cross-entropy loss": I understand this is not the point of your paper and is just an example, yet I want to use the opportunity to discuss it. The training loss is not the agent's surprise.
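For reference (a standard definition, included only to fix notation, not taken from the comment): the cross-entropy training loss of a predictive model p_θ on a data distribution D is its expected surprisal over that distribution, where -log p_θ(x) is the surprisal at a single observation x.

```latex
% Standard cross-entropy loss; -\log p_\theta(x) is the surprisal at x.
\mathcal{L}(\theta) = -\,\mathbb{E}_{x \sim D}\big[\log p_\theta(x)\big].
```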