
Zach Stein-Perlman
Articles
-
Jan 15, 2025 |
lesswrong.com | Zach Stein-Perlman
I'm collecting (x-risk-relevant) safety research from frontier AI companies published in 2023 and 2024: https://docs.google.com/spreadsheets/d/10_dzImDvHq7eEag6paK6AmIdAGMBOA7yXUvumODhZ5U/edit?usp=sharing. I was planning to get AI safety researchers to score each of the papers, so that we could compare the labs on quality-adjusted safety research output. I'm giving up on this for now, largely because I expect to struggle to find scorers. Let me know if you want to collaborate on this.
-
Nov 20, 2024 |
lesswrong.com | Zach Stein-Perlman | David Matolcsi | Daniel Kokotajlo
I experimented a bunch with DeepSeek today; in my experiments it seems to be exactly on the same level as o1-preview in high school competition math. So I don't think it's benchmark-gaming, at least in math. On the other hand, it's noticeably worse than even the original GPT-4 at understanding a short story I also always test models on.
-
Nov 20, 2024 |
lesswrong.com | Zach Stein-Perlman
DeepSeek-R1-Lite-Preview was announced today. Post. Chatbot. Chinese blogpost translation. DeepSeek says it will release the weights. The model appears to be stronger than o1-preview on math, similar on coding, and weaker on everything else. DeepSeek is a Chinese company, and I'm not really familiar with it. I thought Chinese companies were at least a year behind the frontier.
-
Nov 6, 2024 |
lesswrong.com | Zach Stein-Perlman
The cleanest argument that current-day AI models will not cause a catastrophe is probably that they lack the capability to do so. However, as capabilities improve, we'll need new tools for ensuring that AI models won't cause a catastrophe even if we can't rule out the capability. Anthropic's Responsible Scaling Policy (RSP) categorizes the risk levels of AI systems into different AI Safety Levels (ASLs), and each level carries commitments aimed at mitigating the corresponding risks.
-
Nov 4, 2024 |
lesswrong.com | Zach Stein-Perlman
This is a reference post. It contains no novel facts and almost no novel analysis. The idea of responsible scaling policies is now over a year old. Anthropic, OpenAI, and DeepMind each have something like an RSP, and several other relevant companies have committed to publish RSPs by February. The core of an RSP is a risk assessment plan plus a plan for safety practices as a function of risk assessment results.