
Siyan Zhao
Articles
1 month ago |
amazon.science | Siyan Zhao | Mingyi Hong | Yang Liu | Devamanyu Hazarika
Large Language Models (LLMs) are increasingly used as chatbots, yet their ability to personalize responses to user preferences remains limited. We introduce PREFEVAL, a benchmark for evaluating LLMs’ ability to infer, memorize and adhere to user preferences in a long-context conversational setting. PREFEVAL comprises 3,000 manually curated user preference and query pairs spanning 20 topics.