Articles

  • Oct 8, 2024 | huggingface.co | Tianzhu Ye | Li Dong | Yuqing Xia | Yutao Sun

    Abstract: Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns. (A minimal code sketch of this subtraction follows the article list.)

  • Oct 7, 2024 | arxiv.org | Li Dong | Yuqing Xia | Yutao Sun | Yi Zhu


  • Jul 17, 2023 | arxiv.org | Li Dong |Shaohan Huang |Shuming Ma |Yuqing Xia

    [Submitted on 17 Jul 2023 (v1), last revised 9 Aug 2023 (this version, v4)] Title: Retentive Network: A Successor to Transformer for Large Language Models, by Yutao Sun and 7 other authors. Abstract: In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference,...
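The differential attention idea from the Diff Transformer abstract above can be sketched in a few lines. The code below is a simplified illustration, not the authors' implementation: the two independent query/key projections, the fixed weighting scalar `lam`, and the tensor shapes are assumptions made for the example.

```python
# Minimal sketch of differential attention as described in the abstract:
# attention scores are the difference between two separate softmax maps.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(q1, k1, q2, k2, v, lam=0.5):
    """Subtract one softmax attention map from another before weighting values.

    q1, q2: (n, d) query projections; k1, k2: (n, d) key projections; v: (n, d_v) values.
    lam weights the second (noise-cancelling) map; a fixed scalar here for illustration.
    """
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))  # first attention map
    a2 = softmax(q2 @ k2.T / np.sqrt(d))  # second attention map
    scores = a1 - lam * a2                # subtraction cancels shared noise
    return scores @ v

# Toy usage with random projections
rng = np.random.default_rng(0)
n, d = 4, 8
q1, k1, q2, k2 = (rng.standard_normal((n, d)) for _ in range(4))
v = rng.standard_normal((n, d))
out = diff_attention(q1, k1, q2, k2, v)
print(out.shape)  # (4, 8)
```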
