Matej Cief

Articles

Learning action embeddings for off-policy evaluation

Jan 19, 2024 | amazon.science | Jacek R. Golebiowski | Philipp Schmidt | Artur Bekasov | Matej Cief

Off-policy evaluation (OPE) methods allow us to estimate the expected reward of a policy using logged data collected by a different policy. However, when the number of actions is large, or when certain actions are under-explored by the logging policy, existing estimators based on inverse-propensity scoring (IPS) can have high or even infinite variance.
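
To make the variance problem concrete, here is a minimal sketch of the standard textbook IPS estimator, not the action-embedding method the article proposes. The function names (`ips_estimate`, `simulate_log`), the three-action setup, and all probabilities and reward rates are hypothetical values chosen for illustration only.

```python
import random


def ips_estimate(logged, target_probs):
    """Textbook IPS estimate of a target policy's expected reward.

    logged: list of (action, reward, logging_prob) tuples, where
    logging_prob is the probability the logging policy gave to that action.
    target_probs: target policy's probability for each action (context-free).
    """
    total = 0.0
    for action, reward, logging_prob in logged:
        # Importance weight: large when the logging policy rarely took this action.
        weight = target_probs[action] / logging_prob
        total += weight * reward
    return total / len(logged)


def simulate_log(n, logging_probs, mean_rewards, rng):
    """Draw n logged interactions with Bernoulli rewards (illustrative data only)."""
    actions = list(range(len(logging_probs)))
    logged = []
    for _ in range(n):
        action = rng.choices(actions, weights=logging_probs)[0]
        reward = 1.0 if rng.random() < mean_rewards[action] else 0.0
        logged.append((action, reward, logging_probs[action]))
    return logged


# Hypothetical setup: the logging policy almost never tries actions 1 and 2,
# while the target policy plays only those two actions.
rng = random.Random(0)
logging_probs = [0.98, 0.01, 0.01]
target_probs = [0.0, 0.5, 0.5]
mean_rewards = [0.2, 0.5, 0.8]

true_value = sum(p * m for p, m in zip(target_probs, mean_rewards))
logged = simulate_log(20000, logging_probs, mean_rewards, rng)
estimate = ips_estimate(logged, target_probs)
max_weight = max(t / l for t, l in zip(target_probs, logging_probs))

print(f"true value:   {true_value:.3f}")
print(f"IPS estimate: {estimate:.3f}")
print(f"largest importance weight: {max_weight:.0f}")
```

In this sketch the estimator is unbiased because every action the target policy uses has nonzero logging probability, but the rarely logged actions carry weights of 50, so a handful of samples dominate the average. As the logging probability of a needed action shrinks toward zero, the weights and therefore the variance grow without bound.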
