
Feb 21, 2023 | lesswrong.com | Tomek Korbak | Sam Bowman | Ethan Perez | Tao Lin
This post summarizes the main results from our recently released paper Pretraining Language Models with Human Preferences, and puts them in the broader context of AI safety. For a quick summary of the paper, take a look at our Twitter thread.

TL;DR: In the paper, we show how to train LMs with human preferences (as in RLHF), but during LM pretraining.
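To make the idea concrete, here is a minimal sketch of conditional training, the best-performing objective in the paper: each pretraining document is prefixed with a control token reflecting its reward-model score, and the LM is trained on the annotated text as usual. The `toy_reward` function and the 0.0 threshold below are illustrative stand-ins, not the paper's actual reward model or hyperparameters.

```python
# Sketch of conditional training: annotate pretraining documents with
# reward tokens so the LM learns p(text | token). Assumptions are marked.

GOOD, BAD = "<|good|>", "<|bad|>"  # special tokens, as in the paper


def toy_reward(text: str) -> float:
    """Toy stand-in for a learned reward model (higher = more preferred)."""
    return -1.0 if "badword" in text else 1.0


def annotate(document: str, threshold: float = 0.0) -> str:
    """Prepend the reward token the LM will learn to condition on."""
    token = GOOD if toy_reward(document) >= threshold else BAD
    return token + document


corpus = ["a harmless sentence", "a sentence containing badword"]
annotated_corpus = [annotate(doc) for doc in corpus]
print(annotated_corpus)
# Train the LM on annotated_corpus with the ordinary LM loss; at inference
# time, prepend GOOD to the prompt to steer sampling toward preferred text.
```

The appeal of this setup is that it keeps the standard pretraining pipeline intact: the only change is a preprocessing pass over the corpus plus a one-token prefix at sampling time.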