Vikrant Varma

Featured in:

wiley.com

onlinelibrary.wiley.com

Articles

JumpReLU SAEs + Early Access to Gemma 2 SAEs — LessWrong

Jul 19, 2024 | lesswrong.com | Neel Nanda |Senthooran Rajamanoharan |Tom Lieberum |Vikrant Varma

New paper from the Google DeepMind mechanistic interpretability team, led by Sen Rajamanoharan! We introduce JumpReLU SAEs, a new SAE architecture that replaces the standard ReLUs with discontinuous JumpReLU activations, and seems to be (narrowly) state of the art over existing methods like TopK and Gated SAEs for achieving high reconstruction at a given sparsity level, without a hit to interpretability.
Discussion: Challenges with Unsupervised LLM Knowledge Discovery — LessWrong

Dec 18, 2023 | lesswrong.com | Vikrant Varma |Vlad Mikulik |Rohin Shah |Seb Farquhar

TL;DR: Contrast-consistent search (CCS) seemed exciting to us and we were keen to apply it. At this point, we think it is unlikely to be directly helpful for implementations of alignment strategies (>95%). Instead of finding knowledge, it seems to find the most prominent feature. We are less sure about the wider category of unsupervised consistency-based methods, but tend to think they won’t be directly helpful either (70%). We’ve written a paper about some of our detailed experiences with it.
Explaining grokking through circuit efficiency — LessWrong

Sep 8, 2023 | lesswrong.com | Vikrant Varma |Rohin Shah

Unless by "shrugs" you mean the details of what the partial hypothesis says in this particular case are still being worked out. Yes, that's what I mean. I do agree that it's useful to know whether a partial hypothesis says anything or not; overall I think this is good info to know / ask for. I think I came off as disagreeing more strongly than I actually did, sorry about that. Do you have any plans to do this?

Contact details

Emails

[email protected]

Socials & Sites

