Gunnar Zarnacke

Featured in:

Articles

Self-Other Overlap: A Neglected Approach to AI Alignment — LessWrong

Jul 30, 2024 | lesswrong.com | Steve Byrnes | Marc Carauleanu | Mike Vaiana | Gunnar Zarnacke

Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, Jack Foxabbott and Seong Hah Cho for critical comments and feedback on earlier and ongoing versions of this work. Summary: In this post, we introduce self-other overlap training: optimizing for similar internal representations when the model reasons about itself and others while preserving performance.
