
Matej Cief
Featured in: amazon.science
Articles
Jan 19, 2024 | amazon.science | Jacek R. Golebiowski | Philipp Schmidt | Artur Bekasov | Matej Cief
Off-policy evaluation (OPE) methods let us estimate the expected reward of a target policy from logged data collected by a different (logging) policy. However, when the number of actions is large, or when the logging policy under-explores certain actions, existing estimators based on inverse-propensity scoring (IPS) can have high or even infinite variance.
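To make the variance problem concrete, here is a minimal Python sketch of the standard textbook IPS estimator. It reweights each logged reward by the ratio pi(a|x) / mu(a|x) of target-policy to logging-policy probability and averages the results. The function name `ips_estimate` and its parameters are hypothetical illustrations, and this is the plain IPS baseline, not the estimator proposed in the article.

```python
from typing import Sequence


def ips_estimate(
    rewards: Sequence[float],
    target_probs: Sequence[float],
    logging_probs: Sequence[float],
) -> float:
    """Plain inverse-propensity-scoring (IPS) estimate of a target policy's value.

    rewards[i]:       reward observed for the logged action in round i
    target_probs[i]:  pi(a_i | x_i), probability the target policy picks that action
    logging_probs[i]: mu(a_i | x_i), probability the logging policy picked it
    """
    n = len(rewards)
    if n == 0 or len(target_probs) != n or len(logging_probs) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for r, p_target, p_log in zip(rewards, target_probs, logging_probs):
        if p_log <= 0.0:
            # A logged action must have a positive logging propensity;
            # otherwise the importance weight is undefined.
            raise ValueError("logging propensity must be positive")
        # Importance weight: large whenever the logging policy rarely chose
        # an action that the target policy favours.
        weight = p_target / p_log
        total += weight * r
    return total / n


# Two logged rounds, deterministic target policy (probability 1.0 on each logged action).
# The well-explored action (mu = 0.5) gets weight 2; the rarely logged one (mu = 0.01) gets weight 100.
print(ips_estimate([1.0, 1.0], [1.0, 1.0], [0.5, 0.01]))
```

Here the estimate is 51.0, driven almost entirely by the single under-explored action. As logging propensities shrink toward zero, individual weights grow without bound, which is the source of the high variance described above.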