
Mohammad Gheshlaghi
Articles
-
Jun 27, 2024 |
arxiv.org | Mohammad Gheshlaghi
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Jun 27, 2024 |
arxiv.org | Mohammad Gheshlaghi
-
May 29, 2024 |
arxiv.org | Pierre D Harvey | Mohammad Gheshlaghi | Bernardo Avila
-
Dec 1, 2023 |
arxiv.org | Mohammad Gheshlaghi | Zhaohan Daniel
-
May 25, 2023 |
arxiv.org | Mohammad Gheshlaghi
An Analysis of Quantile Temporal-Difference Learning, by Mark Rowland and 8 other authors. Submitted on 11 Jan 2023 (v1), last revised 25 May 2023 (v2). Abstract: We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement...