
Zach Stein-Perlman
Articles
-
Jan 15, 2025 |
lesswrong.com | Zach Stein-Perlman
I'm collecting (x-risk-relevant) safety research from frontier AI companies published in 2023 and 2024: https://docs.google.com/spreadsheets/d/10_dzImDvHq7eEag6paK6AmIdAGMBOA7yXUvumODhZ5U/edit?usp=sharing. I was planning to get AI safety researchers to score each of the papers, so that we could compare the labs on quality-adjusted safety research output. I'm giving up on this for now, largely because I expect to struggle to find scorers. Let me know if you want to collaborate on this.
-
Nov 20, 2024 |
lesswrong.com | Zach Stein-Perlman | David Matolcsi | Daniel Kokotajlo
I experimented a bunch with DeepSeek today; in my experiments it seems to be exactly on the same level as o1-preview in high school competition math. So I don't think it's benchmark-gaming, at least in math. On the other hand, it's noticeably worse than even the original GPT-4 at understanding a short story I also always test models on.
-
Nov 20, 2024 |
lesswrong.com | Zach Stein-Perlman
DeepSeek-R1-Lite-Preview was announced today. Post. Chatbot. Chinese blogpost translation. DeepSeek says it will release the weights. The model appears to be stronger than o1-preview on math, similar on coding, and weaker on everything else. DeepSeek is a Chinese company, and I'm not really familiar with it. I thought Chinese companies were at least a year behind the frontier.
-
Nov 6, 2024 |
lesswrong.com | Zach Stein-Perlman
The cleanest argument that current-day AI models will not cause a catastrophe is probably that they lack the capability to do so. However, as capabilities improve, we'll need new tools for ensuring that AI models won't cause a catastrophe even if we can't rule out the capability. Anthropic's Responsible Scaling Policy (RSP) categorizes the risk levels of AI systems into different AI Safety Levels (ASLs), and each level carries commitments aimed at mitigating the corresponding risks.
-
Nov 4, 2024 |
lesswrong.com | Zach Stein-Perlman
This is a reference post. It contains no novel facts and almost no novel analysis. The idea of responsible scaling policies is now over a year old. Anthropic, OpenAI, and DeepMind each have something like an RSP, and several other relevant companies have committed to publish RSPs by February. The core of an RSP is a risk assessment plan plus a plan for safety practices as a function of risk assessment results.