Seth Herd

Articles

  • Dec 5, 2024 | lesswrong.com | Jonas Hallgren | Seth Herd

    EVERYONE, CALM DOWN! Meaning Alignment Institute just dropped their first post in basically a year and it seems like they've been up to some cool stuff. Their perspective on value alignment really grabbed my attention because it reframes our usual technical alignment conversations around rules and reward functions into something more fundamental: what makes humans actually reliably good and cooperative?

  • Nov 28, 2024 | lesswrong.com | Seth Herd

    Epistemic status: I wish I'd thought of writing this before the day rolled around. Brief and unpolished, although this is something I've thought about a lot on both personal and computational neuroscience levels. But there are no strong conclusions, just some brief thoughts you may find interesting or even useful. Hopefully you've had a fun Thanksgiving celebration, including feasting and appreciating family and friends.

  • Nov 16, 2024 | lesswrong.com | Seth Herd | Thane Ruthenis | Leon Lang | Thomas Kehrenberg

    As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy-to-read format. So I made one. I used some AI assistance to generate this originally, and then went meticulously through each email and checked them for differences.

  • Nov 12, 2024 | lesswrong.com | Seth Herd

    Epistemic status: Sudden public attitude shift seems quite possible, but I haven't seen it much in discussion, so I thought I'd float the idea again. This is somewhat dashed off since the goal is just to toss out a few possibilities and questions. In Current AIs Provide Nearly No Data Relevant to AGI Alignment, Thane Ruthenis argues that current AI is almost irrelevant to the project of aligning AGIs. Current AI is simply not what we're talking about when we worry about alignment, he says.

  • Nov 11, 2024 | lesswrong.com | Seth Herd

    I read the whole thing because of its similarity to my proposals about metacognition as an aid to both capabilities and alignment in language model agents. In this and my work, metacognition is a way to keep AI from doing the wrong thing (from the AI's perspective). They explicitly do not address the broader alignment problem of AIs wanting the wrong things (from humans' perspective). They note that "wiser" humans are more prone to serve the common good, by taking more perspectives into account.
