Joel Burget

Articles

  • Jun 13, 2024 | lesswrong.com | Joel Burget

    Today, Retired U.S. Army General Paul M. Nakasone has joined our Board of Directors. The appointment of Nakasone, a leading expert in cybersecurity, reflects OpenAI’s commitment to safety and security, and underscores the growing significance of cybersecurity as the impact of AI technology continues to grow.

  • Jun 5, 2024 | lesswrong.com | Joel Burget

    Jack Clark's retrospective on GPT-2 is full of interesting policy thoughts; I recommend reading the whole thing. One excerpt: "I've come to believe that in policy 'a little goes a long way' - it's far better to have a couple of ideas you think are robustly good in all futures and advocate for those than make a confident bet on ideas custom-designed for one specific future - especially if it's based on a very confident risk model that sits at some unknowable point in front of you."

  • May 23, 2024 | lesswrong.com | Joel Burget

    1. How Many Features are Active at Once? Previously I’ve seen the rule of thumb “20-100 for most models”. Anthropic says: “For all three SAEs, the average number of features active (i.e. with nonzero activations) on a given token was fewer than 300.”

    2. Splitting SAEs. Having multiple different-sized SAEs for the same model seems useful. The dashboard shows feature splitting clearly. I hadn’t ever thought of comparing features from different SAEs using cosine similarity and plotting them together with UMAP.
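
    The cross-SAE comparison mentioned in that excerpt can be sketched with plain cosine similarity over feature directions. This is a minimal illustration, not Anthropic's actual code: the shapes, names, and the idea of taking decoder rows as feature directions are assumptions for the sketch.

```python
import numpy as np

def cosine_similarity_matrix(feats_a, feats_b):
    """Pairwise cosine similarity between feature directions of two SAEs.

    feats_a: (n_a, d) array, one (hypothetical) decoder direction per feature.
    feats_b: (n_b, d) array for the other SAE.
    Returns an (n_a, n_b) similarity matrix.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return a @ b.T

# Toy example: a "small" SAE with 4 features and a "large" one with 8,
# both living in the same 16-dimensional activation space.
rng = np.random.default_rng(0)
small = rng.normal(size=(4, 16))
large = rng.normal(size=(8, 16))

sims = cosine_similarity_matrix(small, large)

# For each small-SAE feature, its nearest large-SAE feature. Several large
# features matching one small feature closely would be candidate evidence
# of feature splitting.
nearest = sims.argmax(axis=1)
```

    To visualize splitting as the excerpt describes, the features from both SAEs could then be stacked (e.g. `np.vstack([small, large])`) and projected to 2D with a UMAP implementation such as the `umap-learn` package.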

  • May 15, 2024 | lesswrong.com | Joel Burget

    OpenAI has historically used "GPT-N" to mean "a model as capable as GPT-N", where each GPT generation represents around 100-200x more effective compute than the prior one. This applies even if the model was trained considerably later than the original GPT-N and is correspondingly more efficient due to algorithmic improvements. See, for instance, GPT-3.5-turbo, which corresponds to a model somewhere between GPT-3 and GPT-4.
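
    Under that convention, the version number acts roughly like a log scale of effective compute. A back-of-the-envelope sketch, where the flat 100x-per-generation factor and the geometric interpolation for fractional versions are assumptions for illustration, not OpenAI's stated definition:

```python
def effective_compute_ratio(version_from, version_to, per_generation=100.0):
    """Effective-compute multiplier between two GPT version numbers,
    assuming ~100x per full generation (the excerpt's range is 100-200x)
    and geometric interpolation for fractional versions."""
    return per_generation ** (version_to - version_from)

# GPT-3 -> GPT-4: one full generation at 100x.
full_step = effective_compute_ratio(3, 4)    # 100.0
# GPT-3 -> GPT-3.5: a geometric half-step, sqrt(100).
half_step = effective_compute_ratio(3, 3.5)  # 10.0
```

    On this toy reading, a "3.5"-level model sits about 10x (at 100x/generation) above GPT-3 in effective compute, consistent with the excerpt's placement of GPT-3.5-turbo between GPT-3 and GPT-4.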

  • Apr 19, 2024 | lesswrong.com | Joel Burget

    I previously expected open-source LLMs to lag far behind the frontier because they're very expensive to train and naively it doesn't make business sense to spend on the order of $10M to (soon?) $1B to train a model only to give it away for free. But this has been repeatedly challenged, most recently by Meta's Llama 3. They seem to be pursuing something like a commoditize-your-complement strategy: https://twitter.com/willkurt/status/1781157913114870187
