Artur Bekasov


Articles

Learning action embeddings for off-policy evaluation

Mar 27, 2024 | amazon.science | Jacek R. Golebiowski | Philipp Schmidt | Artur Bekasov | Huijun Yu

This repository contains code for evaluating the methods proposed in Learning action embeddings for off-policy evaluation. To get started, we recommend the Example.ipynb notebook. It demonstrates the benefits of the method proposed in Section 3 and implements everything in a few lines of code. Running the notebook requires only Python 3 and standard machine learning libraries.
Learning action embeddings for off-policy evaluation

Jan 19, 2024 | amazon.science | Jacek R. Golebiowski | Philipp Schmidt | Artur Bekasov | Matej Cief

Off-policy evaluation (OPE) methods allow us to estimate the expected reward of a policy using logged data collected by a different policy. However, when the number of actions is large, or when the logging policy under-explores certain actions, existing estimators based on inverse-propensity scoring (IPS) can have high or even infinite variance.
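To make the variance problem concrete, here is a minimal sketch, assuming NumPy, of the plain IPS estimator on a hypothetical context-free bandit. It is not the method from the paper or code from the repository. The names `ips_estimate` and `simulate`, the uniform logging policy, the Bernoulli rewards, and the deterministic target policy are all illustrative assumptions. With a uniform logging policy over K actions, every logged sample that matches the target action receives weight K. As K grows, the estimate therefore depends on a few heavily weighted samples, and its spread across repeated trials increases.

```python
import numpy as np


def ips_estimate(rewards, target_probs, logging_probs):
    """Plain inverse-propensity-scoring estimate of a target policy's expected reward.

    rewards:       observed reward r_i for each logged action
    target_probs:  pi(a_i), probability the target policy gives the logged action
    logging_probs: pi_0(a_i), probability the logging policy gave the logged action
    """
    weights = target_probs / logging_probs  # importance weights; large when pi_0(a_i) is small
    return float(np.mean(weights * rewards))


def simulate(n_actions, n_samples, n_trials, rng):
    """Hypothetical setup: context-free bandit with Bernoulli rewards per action."""
    true_means = np.linspace(0.1, 0.9, n_actions)      # expected reward of each action
    logging = np.full(n_actions, 1.0 / n_actions)       # uniform logging policy pi_0
    target = np.zeros(n_actions)
    target[-1] = 1.0                                    # target policy always picks the best action
    estimates = []
    for _ in range(n_trials):
        actions = rng.choice(n_actions, size=n_samples, p=logging)  # logged actions
        rewards = rng.binomial(1, true_means[actions])              # logged rewards
        estimates.append(ips_estimate(rewards, target[actions], logging[actions]))
    return true_means[-1], np.array(estimates)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for k in (10, 100, 1000):
        true_value, est = simulate(k, n_samples=1000, n_trials=200, rng=rng)
        print(f"K={k:5d}  true={true_value:.2f}  mean={est.mean():.3f}  std={est.std():.3f}")
```

The estimator stays unbiased in this setup, but the standard deviation of the estimates grows roughly with the square root of K. This is the under-exploration problem described above, and it is what the action-embedding approach targets.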
