
Zhu Xiaohu
Articles
-
Mar 15, 2023 |
lesswrong.com | Sam Bowman | Kshitij Sachan | Roman Leventov | Zhu Xiaohu
My summary: Evan expresses pessimism about our ability to use behavioral evaluations (like the capabilities evals ARC did for GPT-4) to test for alignment properties in the future. Detecting alignment may be quite hard because you might be up against a highly capable adversary that is trying to evade detection; this might even be harder than training an aligned system to begin with.