Oliver Sourbut

Articles

Deceptive Alignment and Homuncularity — LessWrong

Jan 16, 2025 | lesswrong.com | Oliver Sourbut

NB this dialogue occurred at the very end of 2023, and for various reasons is only being published ~a year later! Keep this in mind while reading.

Deceptive AI ≠ Deceptively-aligned AI — LessWrong

Jan 7, 2024 | lesswrong.com | Steven Byrnes, Seth Herd, Oliver Sourbut

Tl;dr: A “deceptively-aligned AI” is different from (and much more specific than) a “deceptive AI”. I think this is well-known and uncontroversial among AI Alignment experts, but I see people getting confused about it sometimes, so this post is a brief explanation of how they differ. You can just look at the diagram below for the upshot. Some motivating context: There have been a number of recent arguments that future AI is very unlikely to be deceptively-aligned.

Natural Latents: The Math — LessWrong

Dec 27, 2023 | lesswrong.com | David Lorell, Alexander Gietelink Oldenziel, Thane Ruthenis, Oliver Sourbut

Also, what do you mean by mutual information between Xi, given that there are at least 3 of them? You can generalize mutual information to N variables: interaction information. Why would it always be possible to decompose random variables to allow for a natural latent? Well, I suppose I overstated it a bit by saying "always"; you can certainly imagine artificial setups where the mutual information between a bunch of variables is zero.

"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity — LessWrong

Dec 16, 2023 | lesswrong.com | Thane Ruthenis, Oliver Sourbut, Mo Putera, Gerald M. Monroe

When discussing AGI Risk, people often talk about it in terms of a war between humanity and an AGI. Comparisons between the amounts of resources at both sides' disposal are brought up and factored in, big impressive nuclear stockpiles are sometimes waved around, etc. I'm pretty sure it's not how that'd look like, on several levels. 1.

Mapping the semantic void: Strange goings-on in GPT embedding spaces — LessWrong

Dec 15, 2023 | lesswrong.com | Dmitry Vaintrob, Joseph Miller, Oliver Sourbut, Carl Feynman

TL;DR: GPT-J token embeddings inhabit a zone in their 4096-dimensional embedding space formed by the intersection of two hyperspherical shells. This is described, and then the remaining expanse of the embedding space is explored by using simple prompts to elicit definitions for non-token custom embedding vectors (so-called "nokens").

Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" — LessWrong

Nov 21, 2023 | lesswrong.com | Eli Tyre, Stephen McAleese, Ben Pace, Oliver Sourbut

I've seen/heard a bunch of people in the LW-o-sphere saying that the OpenAI corporate drama this past weekend was clearly bad. And I'm not really sure why people think that? To me, seems like a pretty clearly positive outcome overall. I'm curious why in the world people are unhappy about it (people in the LW-sphere, that is, obviously I can see why e.g. AI accelerationists would be unhappy about it). And I also want to lay out my models. Here's the high-gloss version of my take.

Vote on worthwhile OpenAI topics to discuss — LessWrong

Nov 21, 2023 | lesswrong.com | Ben Pace, Garrett Baker, Oliver Sourbut, Gesild Muka

I was asked to clarify my position about why I voted 'disagree' with "I assign >50% to this claim: The board should be straightforward with its employees about why they fired the CEO." I'm putting a maybe-unjustified high amount of trust in all the people involved, and from that, my prior is very high on "for some reason, it would be really bad, inappropriate, or wrong to discuss this in a public way." And given that OpenAI has ~800 employees, telling them would basically count as a 'public'...
