Mike Vaiana

Articles

  • Jul 30, 2024 | lesswrong.com | Steve Byrnes | Marc Carauleanu | Mike Vaiana | Gunnar Zarnacke

    Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, Jack Foxabbott and Seong Hah Cho for critical comments and feedback on earlier and ongoing versions of this work. Summary: In this post, we introduce self-other overlap training: optimizing for similar internal representations when the model reasons about itself and others while preserving performance.
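
    The excerpt states the technique only at a high level; the sketch below shows one way such an objective could look in PyTorch. The toy encoder, the prompt pairing, and the `alpha` weight are illustrative assumptions, not the authors' implementation.

```python
# Minimal, self-contained sketch of self-other overlap training as described
# in the excerpt: an auxiliary loss pulls the model's internal representations
# on matched "self" and "other" versions of a prompt together, added on top of
# the ordinary task loss so task performance is preserved. The toy encoder and
# the weight `alpha` are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))  # stand-in model

def total_loss(x_self, x_other, task_loss, alpha=0.1):
    h_self = encoder(x_self)    # internal representation when reasoning about itself
    h_other = encoder(x_other)  # representation on the matched "other" prompt
    overlap = F.mse_loss(h_self, h_other)  # penalize self/other divergence
    return task_loss + alpha * overlap     # task term preserves performance

# Usage with random stand-ins for a matched self/other prompt pair:
x_self, x_other = torch.randn(4, 16), torch.randn(4, 16)
loss = total_loss(x_self, x_other, task_loss=torch.tensor(0.0))
loss.backward()
```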

  • Jul 10, 2024 | lesswrong.com | Mike Vaiana

    Many thanks to Diogo de Lucena, Cameron Berg, Judd Rosenblatt, and Philip Gubbins for support and feedback on this post. TL;DR: Reinforcement Learning from Human Feedback (RLHF) is one of the leading methods for fine-tuning foundational models to be helpful, harmless, and honest. But it’s complicated, and the standard implementation requires a pool of crowdsourced workers to provide feedback.
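
    For context on the crowdsourcing step the TL;DR refers to: in a standard RLHF pipeline, workers label which of two responses they prefer, and a reward model is first fit to those pairs. The sketch below shows the usual pairwise (Bradley-Terry) loss; the stand-in model and feature shapes are assumptions, not the post's code.

```python
# Illustrative sketch of the reward-modeling step that makes standard RLHF
# depend on crowdsourced labels: workers pick the better of two responses, and
# a reward model is fit to those preference pairs with the common
# Bradley-Terry pairwise loss. The tiny linear model is a stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

reward_model = nn.Linear(16, 1)  # stand-in: maps response features to a scalar reward

def preference_loss(chosen, rejected):
    # -log sigmoid(r_chosen - r_rejected): assign higher reward to the preferred response
    margin = reward_model(chosen) - reward_model(rejected)
    return -F.logsigmoid(margin).mean()

# One labeled preference pair per row; features stand in for encoded responses.
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = preference_loss(chosen, rejected)
loss.backward()
```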
