Seth Herd

Articles

  • Dec 5, 2024 | lesswrong.com | Jonas Hallgren | Seth Herd

    EVERYONE, CALM DOWN! Meaning Alignment Institute just dropped their first post in basically a year and it seems like they've been up to some cool stuff. Their perspective on value alignment really grabbed my attention because it reframes our usual technical alignment conversations around rules and reward functions into something more fundamental: what makes humans actually reliably good and cooperative?

  • Nov 28, 2024 | lesswrong.com | Seth Herd

    Epistemic status: I wish I'd thought of writing this before the day rolled around. Brief and unpolished, although this is something I've thought about a lot on both personal and computational neuroscience levels. But there are no strong conclusions, just some brief thoughts you may find interesting or even useful. Hopefully you've had a fun Thanksgiving celebration, including feasting and appreciating family and friends.

  • Nov 16, 2024 | lesswrong.com | Seth Herd | Thane Ruthenis | Leon Lang | Thomas Kehrenberg

    As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy-to-read format. So I made one. I used some AI assistance to generate this originally, and then went meticulously through each email and checked them for differences.

  • Nov 12, 2024 | lesswrong.com | Seth Herd

    Epistemic status: Sudden public attitude shift seems quite possible, but I haven't seen it much in discussion, so I thought I'd float the idea again. This is somewhat dashed off since the goal is just to toss out a few possibilities and questions. In Current AIs Provide Nearly No Data Relevant to AGI Alignment, Thane Ruthenis argues that current AI is almost irrelevant to the project of aligning AGIs. Current AI is simply not what we're talking about when we worry about alignment, he says.

  • Nov 11, 2024 | lesswrong.com | Seth Herd

    I read the whole thing because of its similarity to my proposals about metacognition as an aid to both capabilities and alignment in language model agents. In this and my work, metacognition is a way to keep AI from doing the wrong thing (from the AI's perspective). They explicitly do not address the broader alignment problem of AIs wanting the wrong things (from humans' perspective). They note that "wiser" humans are more prone to serve the common good, by taking more perspectives into account.
