
Jordan Taylor
Articles
-
Jun 25, 2024 |
lesswrong.com | Jordan Taylor
At what point will it no longer be useful for humans to be involved in the process of alignment research? After the first slightly-superhuman AGI, well into superintelligence, or somewhere in between?
-
May 17, 2024 |
lesswrong.com | Dan Braun | Jordan Taylor | Lee Sharkey | Nicholas Goldowsky-Dill
A short summary of the paper is presented below. This work was produced by Apollo Research in collaboration with Jordan Taylor (MATS + University of Queensland). TL;DR: We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the learned features are functionally important, by minimizing the KL divergence between the output distributions of the original model and the model with the SAE's activations inserted.
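The core objective from that summary can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the tiny "model", the SAE weight shapes, and all variable names here are invented for the example; the one faithful element is the loss, the KL divergence between the model's output distribution with and without the SAE reconstruction substituted in.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q):
    # KL(p || q) between output distributions, averaged over the batch
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(0)
d_model, d_dict, d_vocab, batch = 8, 32, 10, 4

# Toy stand-in for the rest of the model: activations -> logits
W_out = rng.normal(size=(d_model, d_vocab))
acts = rng.normal(size=(batch, d_model))  # residual-stream activations

# Toy SAE: ReLU encoder into an overcomplete dictionary, linear decoder back
W_enc = rng.normal(size=(d_model, d_dict)) * 0.1
W_dec = rng.normal(size=(d_dict, d_model)) * 0.1
features = np.maximum(acts @ W_enc, 0.0)  # sparse feature activations
recon = features @ W_dec                  # SAE reconstruction of the activations

# e2e loss: compare output distributions, not reconstructed activations
p_orig = softmax(acts @ W_out)   # original model's output distribution
p_sae = softmax(recon @ W_out)   # model with SAE reconstruction inserted
e2e_loss = kl_divergence(p_orig, p_sae)
```

The point of training against this loss (rather than plain activation-reconstruction error) is that the SAE is only rewarded for preserving whatever in the activations actually affects the model's outputs, which is what makes the learned features "functionally important."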
-
Oct 4, 2023 |
lesswrong.com | Jordan Taylor
-
Aug 26, 2023 |
lesswrong.com | Daniel Murfet | Wei Dai | Roman Leventov | Jordan Taylor
Context: I sometimes find myself referring back to this tweet and wanted to give it a more permanent home. While I'm at it, I thought I would try to give a concise summary of how each distinct problem would be solved by Safeguarded AI (formerly known as an Open Agency Architecture, or OAA), if it turns out to be feasible. 1. Value is fragile and hard to specify.