
Thomas Kwa
Articles
-
Oct 2, 2024 |
lesswrong.com | Thomas Kwa
Suppose you can tell your AI to meet a certain spec (e.g. cure cancer), but most plans that meet the spec are unsafe (e.g. involve killing everyone, or so Rob Bensinger thinks). In these cases, a quantilizer is insufficient for safety due to instrumental convergence. But suppose we can also give the agent a dispreference for unsafe actions like murder. In effect it has unsafe long-term goals but we control its immediate preferences.
-
Sep 28, 2024 |
lesswrong.com | Raymond Arnold | Ben Pace | Thomas Kwa | Logan Riggs
One year ago, many people on LessWrong received a DM asking them to choose the most important virtue of Petrov Day, with four listed options that we'd seen people argue for in previous years.
-
Sep 24, 2024 |
lesswrong.com | Eliezer Yudkowsky | Lucius Bushnaq | Thomas Kwa | Joey KL
Crossposted from Twitter with Eliezer's permission. A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just because Bernard Arnault has $170 billion, does not mean that he'll give you $77.18. Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.
-
May 17, 2024 |
lesswrong.com | Thomas Kwa
This is the appendix to the previous post on Goodhart’s Law and KL regularization, containing all of our proofs. Theorem about distributions. Theorem 1: Given any heavy-tailed reference distribution $Q$ over $\mathbb{R}$ with mean $\mu_Q$, and any $M, \epsilon > 0$, there is a distribution $P$ with mean $\mu_P > M$ and $D_{KL}(P \| Q) < \epsilon$. Proof: WLOG let $\mu_Q = 0$. We construct a sequence of distributions $\{P_t\}$ such that $\lim_{t \to \infty} E_{P_t}[X] \ge c$ for any constant $c$, and $\lim_{t \to \infty} D_{KL}(P_t \| Q) = 0$. We define $P_t$ for any $t > c$ thusly.
-
Mar 14, 2024 |
lesswrong.com | Ricki Heicklen | Thomas Kwa | Nathan Helm-Burger | Ben Pace
“I refuse to join any club that would have me as a member” -Marx
Adverse Selection is the phenomenon in which information asymmetries in non-cooperative environments make trading dangerous. It has traditionally been understood to describe financial markets in which buyers and sellers systematically differ, such as a market for used cars in which sellers have the information advantage, where resulting feedback loops can lead to market collapses.