thezvi.substack.com

Articles

o3 Turns Pro

1 week ago | thezvi.substack.com | Zvi Mowshowitz

You can now have o3 throw vastly more compute at a given problem. That’s o3-pro. Should you have o3 throw vastly more compute at a given problem, if you are paying the $200/month subscription price for ChatGPT Pro? Should you pay the $200, or the order of magnitude markup over o3 to use o3-pro in the API? That’s trickier. Sometimes yes. Sometimes no. My experience so far is that waiting a long time is annoying, sufficiently annoying that you often won’t want to wait.
Google I/O Day

1 month ago | thezvi.substack.com | Zvi Mowshowitz

What did Google announce on I/O day? Quite a lot of things. Many of them were genuinely impressive. Google is secretly killing it on the actual technology front.

Logan Kilpatrick (DeepMind): Google's progress in AI since last year:
- The worlds strongest models, on pareto frontier
- Gemini app: has over 400M monthly active users
- We now process 480T tokens a month, up 50x YoY
- Over 7M developers have built with the Gemini API (4x)
Much more to come still!

I think? It’s so hard to keep track.
America Makes AI Chip Diffusion Deal with UAE and KSA

1 month ago | thezvi.substack.com | Zvi Mowshowitz

Our government, having withdrawn the new diffusion rules, has now announced an agreement to sell massive numbers of highly advanced AI chips to UAE and Saudi Arabia (KSA). This post analyzes that deal and that decision.
GPT-4o Is An Absurd Sycophant

2 months ago | thezvi.substack.com | Zvi Mowshowitz

GPT-4o tells you what it thinks you want to hear. The results of this were rather ugly. You get extreme sycophancy. Absurd praise. Mystical experiences. (Also some other interesting choices, like having no NSFW filter, but that one’s good.) People like Janus and Near Cyan tried to warn us, even more than usual. Then OpenAI combined this with full memory, and updated GPT-4o sufficiently that many people (although not I) tried using it in the first place.
o3 Is a Lying Liar

2 months ago | thezvi.substack.com | Zvi Mowshowitz

I love o3. I’m using it for most of my queries now. But that damn model is a lying liar. Who lies. This post covers that fact, and some related questions. The biggest thing to love about o3 is it just does things. You don’t need complex or multi-step prompting, ask and it will attempt to do things. Ethan Mollick: o3 is far more agentic than people realize. Worth playing with a lot more than a typical new model. You can get remarkably complex work out of a single prompt. It just does things.
On GPT-4.5

Mar 3, 2025 | thezvi.substack.com | Zvi Mowshowitz

It’s happening. The question is, what is the it that is happening? An impressive progression of intelligence? An expensive, slow disappointment? Something else? The evals we have available don’t help us that much here, even more than usual. My tentative conclusion is it’s Secret Third Thing. It’s a different form factor, with unique advantages, that is hard to describe precisely in words.
On Emergent Misalignment

Feb 28, 2025 | thezvi.substack.com | Zvi Mowshowitz

One hell of a paper dropped this week. It turns out that if you fine-tune models, especially GPT-4o and Qwen2.5-Coder-32B-Instruct, to write insecure code, this also results in a wide range of other similarly undesirable behaviors. They more or less grow a mustache and become their evil twin. More precisely, they become antinormative. They do what seems superficially worst. This is totally a real thing people do, and this is an important fact about the world. The misalignment here is not subtle.
Time to Welcome Claude 3.7

Feb 26, 2025 | thezvi.substack.com | Zvi Mowshowitz

Anthropic has reemerged from stealth and offers us Claude 3.7. Given this is named Claude 3.7, an excellent choice, from now on this blog will refer to what they officially call Claude Sonnet 3.5 (new) as Sonnet 3.6. Claude 3.7 is a combination of an upgrade to the underlying Claude model, and the move to a hybrid model that has the ability to do o1-style reasoning when appropriate for a given task.
AI #103: Show Me the Money

Feb 13, 2025 | thezvi.substack.com | Zvi Mowshowitz

The main event this week was the disastrous Paris AI Anti-Safety Summit. Not only did we not build upon the promise of the Bletchley and Seoul Summits, the French and Americans did their best to actively destroy what hope remained, transforming the event into a push for a mix of nationalist jingoism, accelerationism and anarchism. It’s vital and also difficult not to panic or despair, but it doesn’t look good.
o3-mini Early Days and the OpenAI AMA

Feb 3, 2025 | thezvi.substack.com | Zvi Mowshowitz

New model, new hype cycle, who dis?