
Shangtong Zhang
Featured in:
arxiv.org
Articles
- Nov 20, 2024 | arxiv.org | Zixuan Xie | Xinyu Liu | Shangtong Zhang
- Sep 18, 2024 | arxiv.org | Shangtong Zhang
- Aug 2, 2023 | arxiv.org | Shangtong Zhang
Off-policy learning enables a reinforcement learning (RL) agent to reason counterfactually about policies that are not executed, and it is one of the most important ideas in RL. However, it can lead to instability when combined with function approximation and bootstrapping, two arguably indispensable ingredients for large-scale reinforcement learning.