Sam Bowman

Articles

The Checklist: What Succeeding at AI Safety Will Involve — LessWrong

Sep 3, 2024 | lesswrong.com | Sam Bowman

This piece reflects my current best guess at the major goals that Anthropic (or another similarly positioned AI developer) will need to accomplish to have things go well with the development of broadly superhuman AI. Given my role and background, it’s disproportionately focused on technical research and on averting emerging catastrophic risks.
LLM Evaluators Recognize and Favor Their Own Generations — LessWrong

Apr 17, 2024 | lesswrong.com | Arjun Panickssery | Sam Bowman | Shi Feng

Self-evaluation using LLMs is used in reward modeling, model-based benchmarks like GPTScore and AlpacaEval, self-refinement, and constitutional AI. LLMs have been shown to be accurate at approximating human annotators on some tasks. But these methods are threatened by self-preference, a bias in which an LLM evaluator scores its own outputs higher than texts written by other LLMs or humans, relative to the judgments of human annotators.
Tips on Balancing Parenthood and Remote Work: Practical Strategies for Productivity

Sep 25, 2023 | mommybites.com | Sam Bowman

Whether you’re a parent with a remote job or you’re looking for an opportunity to work from home, you need to figure out how to do so while raising your kids. From moving to a better space to creating boundaries, there are many tactics you can try to juggle all of your important responsibilities so you can focus on your work and your family. Control Your Circumstances: The best way to effectively balance both work and parenthood is to control your circumstances so you have fewer surprises.
A Mother's Guide to Planning a Safe and Memorable Vacation for Your Kids

Aug 4, 2023 | mommybites.com | Sam Bowman

Whether you’re just traveling as a family unit or making an adventure with another group, moms need to be in full control. There’s a lot to prepare for and a lot that can happen, and by planning ahead, you can be ready for everything. So, if you have some vacation dates on the calendar, use these helpful tips to ensure the trip goes off without a hitch.
Reducing sycophancy and improving honesty via activation steering — LessWrong

Jul 28, 2023 | lesswrong.com | Nina Rimsky | Ethan Perez | Sam Bowman | Logan Riggs

Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Evan Hubinger. I generate an activation steering vector using Anthropic's sycophancy dataset and then find that this can be used to increase or reduce performance on TruthfulQA, indicating a common direction between sycophancy on questions of opinion and untruthfulness on questions relating to common misconceptions.