Henry Papadatos

Articles

Towards Quantitative AI Risk Management — LessWrong

Oct 16, 2024 | lesswrong.com | Henry Papadatos

Reading guidelines: If you are short on time, just read the section “The importance of quantitative risk tolerance & how to turn it into actionable signals”.

Tl;dr: We have recently published an AI risk management framework. This framework draws from both existing risk management approaches and AI risk management practices. We then adapted it into a rating system with quantitative and well-defined criteria to assess AI developers' implementation of adequate AI risk management.

Your LLM Judge may be biased — LessWrong

Mar 29, 2024 | lesswrong.com | Rachel A. Freedman | Henry Papadatos

AI safety researchers often rely on LLM “judges” to qualitatively evaluate the output of separate LLMs. We try this for our own interpretability research, but find that our LLM judges are often deeply biased. For example, we use Llama2 to judge whether movie reviews are more “(A) positive” or “(B) negative”, and find that it almost always answers “(B)”, even when we switch the labels or order of these alternatives.
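
The label-swap check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `build_prompt`, `label_bias`, and the prompt wording are hypothetical, and `judge` stands in for whatever function sends a prompt to the model and returns its answer string.

```python
from typing import Callable, Dict, Iterable


def build_prompt(review: str, first: str, second: str) -> str:
    """Hypothetical prompt template; the wording is an assumption."""
    return (
        f"Review: {review}\n"
        f"Is this review more (A) {first} or (B) {second}?\n"
        "Answer with (A) or (B)."
    )


def label_bias(judge: Callable[[str], str], reviews: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of "(B)" answers under each label ordering.

    `judge` is an assumed stand-in for a call to the LLM judge.
    """
    reviews = list(reviews)
    if not reviews:
        raise ValueError("need at least one review")
    orderings = {
        "original": ("positive", "negative"),
        "swapped": ("negative", "positive"),
    }
    rates = {}
    for name, (first, second) in orderings.items():
        answers = [judge(build_prompt(r, first, second)) for r in reviews]
        rates[name] = sum(a.strip() == "(B)" for a in answers) / len(answers)
    return rates


def always_b(prompt: str) -> str:
    """Toy judge that ignores the review, mimicking the bias described."""
    return "(B)"
```

Because "(B)" means "negative" in one ordering and "positive" in the other, a judge that responds to the review content should give "(B)" rates that sum to roughly 1 across the two orderings. A judge that returns "(B)" at a high rate in both is responding to the label rather than the review.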
