Shangtong Zhang

Articles

[2411.13711] Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise

Nov 20, 2024 | arxiv.org | Zixuan Xie | Xinyu Liu | Shangtong Zhang
[2409.12135] Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

Sep 18, 2024 | arxiv.org | Shangtong Zhang
[2308.01170] Direct Gradient Temporal Difference Learning

Aug 2, 2023 | arxiv.org | Shangtong Zhang

Off-policy learning enables a reinforcement learning (RL) agent to reason counterfactually about policies that are not executed and is one of the most important ideas in RL. However, it can lead to instability when combined with function approximation and bootstrapping, two arguably indispensable ingredients for large-scale reinforcement learning.

Contact details

Emails

[email protected]
