Articles
-
Oct 19, 2024 |
arxiv.org | Anh Tuan | Anh Tuấn
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website. Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them. Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
-
Oct 7, 2024 |
arxiv.org | Anh Tuan | Anh Tuấn
[Submitted on 7 Oct 2024 (v1), last revised 25 Oct 2024 (this version, v2)] Title: As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss, by Xin Mao and 5 other authors. Abstract: Direct Preference Optimization (DPO) has emerged as a more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF)...
-
Oct 6, 2024 |
arxiv.org | Anh Tuan | Anh Tuấn