
Johannes Treutlein
Articles
-
Jun 21, 2024 |
lesswrong.com | Johannes Treutlein
TL;DR: We published a new paper on out-of-context reasoning in LLMs. We show that LLMs can infer latent information from training data and use this information for downstream tasks, without any in-context learning or CoT. For instance, we finetune GPT-3.5 on pairs (x, f(x)) for some unknown function f. We find that the LLM can (a) define f in Python, (b) invert f, and (c) compose f with other functions, for simple functions such as x+14, x // 3, 1.75x, and 3x+2.
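For illustration, here is a minimal sketch of how finetuning pairs of this kind could be constructed. The chat-style JSONL format, the opaque function name "g", and the prompt wording are placeholder assumptions, not necessarily the setup used in the paper.

```python
# Minimal sketch (assumed format, not the paper's exact pipeline): build a
# chat-style finetuning set of (x, f(x)) pairs for a hidden function f,
# here f(x) = 3x + 2. Prompts refer to f only by the opaque name "g",
# so the function's definition never appears in the training data.
import json
import random

def f(x: int) -> int:
    # The latent function the model must infer from (x, f(x)) pairs alone.
    return 3 * x + 2

def make_example(x: int) -> dict:
    # One finetuning example in a chat-message format.
    return {
        "messages": [
            {"role": "user", "content": f"What is g({x})?"},
            {"role": "assistant", "content": str(f(x))},
        ]
    }

if __name__ == "__main__":
    random.seed(0)
    xs = random.sample(range(-100, 100), 50)
    with open("finetune_pairs.jsonl", "w") as out:
        for x in xs:
            out.write(json.dumps(make_example(x)) + "\n")
```

After finetuning on such pairs, the downstream evaluations probe whether the model can articulate f (e.g., write it in Python), invert it, and compose it with other functions, with none of the pairs present in context.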
-
Nov 13, 2023 |
lesswrong.com | Johannes Treutlein
Written under the supervision of Lionel Levine. Thanks to Owain Evans, Aidan O’Gara, Max Kaufmann, and Johannes Treutlein for comments. This post is a synthesis of arguments made by other people. It provides a collection of answers to the question, "Why would an AI become non-myopic?" In this post I’ll describe a model as myopic if it cares only about what happens in the current training episode. This form of myopia is called episodic myopia.
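A rough way to write this down (illustrative notation, not taken from the post): a myopic policy's objective counts only reward within the current episode, while a non-myopic one also counts reward in future episodes.

```latex
% Illustrative only: E_k is the set of timesteps in the current episode k,
% and r_t is the reward at timestep t.
J_{\text{myopic}}(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t \in E_k} r_t\Big],
\qquad
J_{\text{non-myopic}}(\pi) = \mathbb{E}_{\pi}\Big[\sum_{j \ge k} \sum_{t \in E_j} r_t\Big].
```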
-
Mar 21, 2023 |
lesswrong.com | Steven Byrnes | Ben Pace | Lauro Langosco | Johannes Treutlein
Meta: This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn't discussed (or at least I don't recognize it) in the recent plans written up by OpenAI (1, 2), by DeepMind's alignment team, or by Anthropic, and I know of no other acknowledgment of this issue by major labs.
-
Feb 2, 2023 |
lesswrong.com | Adam S. Jermyn | Johannes Treutlein | Rubi J. Hudson | Charlie Steiner
Thanks for your comment! Regarding 1: I don't think it would be good to simulate superintelligences with our predictive models. Rather, we want to simulate humans to elicit safe capabilities. We talk more about competitiveness of the approach in Section III. Regarding 3: I agree it might have been good to discuss cyborgism specifically. I think cyborgism is to some degree compatible with careful conditioning.
-
Feb 2, 2023 |
lesswrong.com | Adam S. Jermyn | Johannes Treutlein | Rubi J. Hudson | Roman Leventov
Overall: I directionally and conceptually agree with most of what is said in this post, and only highlight and comment on the things that I disagree with (or do not fully agree with, or find ontologically or conceptually somewhat off). On "an agent minimizing its cross-entropy loss": I understand this is not the point of your paper and is just an example, yet I want to use the opportunity to discuss it. The training loss is not the agent's surprise.
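For reference (a standard definition, included only to fix notation, not taken from the comment): the cross-entropy training loss of a predictive model p_θ on a data distribution D is its expected surprisal over that distribution, where -log p_θ(x) is the surprisal at a single observation x.

```latex
% Standard cross-entropy loss; -\log p_\theta(x) is the surprisal at x.
\mathcal{L}(\theta) = -\,\mathbb{E}_{x \sim D}\big[\log p_\theta(x)\big].
```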