Articles
-
Jan 10, 2025 |
lesswrong.com | Sam Marks
Anthropic’s Alignment Science team conducts technical research aimed at mitigating the risk of catastrophes caused by future advanced AI systems, such as mass loss of life or permanent loss of human control. A central challenge we face is identifying concrete technical work that can be done today to prevent these risks. Future worlds where our research matters—that is, worlds that carry substantial catastrophic risk from AI—will have been radically transformed by AI development.
-
Jul 7, 2024 |
alliancemagazine.org | Sam Marks
Imagine being charged with critical life-changing responsibilities while being starved by the same public actors to whom you are accountable. This is the crazy-making situation nonprofits are finding themselves in. Whether they are housing the unhoused, providing safe spaces for women fleeing intimate partner violence, or providing childcare, many of society’s most critical services rely on timely, predictable funding from government agencies.
-
Jun 12, 2024 |
alliancemagazine.org | Sam Marks | Claudia Cahalane
Imagine being charged with critical life-changing responsibilities while being starved by the same public actors to whom you are accountable. This is the crazy-making situation nonprofits are finding themselves in. Whether they are housing the unhoused, providing safe spaces for women fleeing intimate partner violence, or providing childcare, many of society’s most critical services rely on timely, predictable funding from government agencies.
-
Apr 18, 2024 |
lesswrong.com | Sam Marks
In a new preprint, Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models, my coauthors and I introduce a technique, Sparse Human-Interpretable Feature Trimming (SHIFT), which I think is the strongest proof-of-concept yet for applying AI interpretability to existential risk reduction. In this post, I will explain how SHIFT fits into a broader agenda for what I call cognition-based oversight.
-
Feb 24, 2024 |
studentnewspaper.org | Sam Marks