
Daniel Filan

Featured in:

Articles

  • Jan 24, 2025 | lesswrong.com | Daniel Filan

    YouTube link. Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions. Topics we discuss: Positive visions of AI; Improving recommendation systems. Daniel Filan (00:09): Hello, everyone.

  • Jan 19, 2025 | lesswrong.com | Daniel Filan

    YouTube link. Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode, Adrià Garriga-Alonso talks about his work trying to answer this question. Topics we discuss: The Alignment Workshop. Daniel Filan (00:09): Hello, everyone.

  • Jan 9, 2025 | lesswrong.com | Daniel Filan | Ryan Kidd

    MATS currently has more people interested in being mentors than we are able to support—for example, for the Winter 2024-25 Program, we received applications from 87 prospective mentors who cumulatively asked for 223 scholars (for a cohort where we expected to accept only 80 scholars). As a result, we need some process for choosing which researchers to take on as mentors and how many scholars to allocate to each.

  • Nov 27, 2024 | lesswrong.com | Daniel Filan

    YouTube link. You may have heard of singular learning theory (SLT) and its "local learning coefficient" (LLC), but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and about using the refined LLC to find a new circuit in language models. Topics we discuss: About Jesse; The Alignment Workshop; About Timaeus; SLT that isn't developmental interpretability; The refined local learning coefficient. Daniel Filan (00:09): Hello, everyone.

  • Nov 16, 2024 | lesswrong.com | Daniel Filan

    YouTube link. Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents. Topics we discuss: How the Alignment Workshop is; Agent infrastructure; Why agent infrastructure; A trichotomy of agent infrastructure; Agent IDs; Agent channels; Relation to AI control. Daniel Filan (00:09): Hello, everyone.

