Articles

  • Oct 8, 2024 | huggingface.co | Tianzhu Ye | Li Dong | Yuqing Xia | Yutao Sun

    Abstract: Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns. (A minimal code sketch of this subtraction follows the article list.)

  • Oct 7, 2024 | arxiv.org | Li Dong | Yuqing Xia | Yutao Sun | Yi Zhu


  • Jul 17, 2023 | arxiv.org | Li Dong |Shaohan Huang |Shuming Ma |Yuqing Xia

    [Submitted on 17 Jul 2023 (v1), last revised 9 Aug 2023 (this version, v4)] Title: Retentive Network: A Successor to Transformer for Large Language Models, by Yutao Sun and 7 other authors. Abstract: In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference,...
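The differential attention idea from the Diff Transformer abstract above can be sketched in a few lines. The code below is a simplified illustration, not the authors' implementation: the two independent query/key projections, the fixed weighting scalar `lam`, and the tensor shapes are assumptions made for the example.

```python
# Minimal sketch of differential attention as described in the abstract:
# attention scores are the difference between two separate softmax maps.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(q1, k1, q2, k2, v, lam=0.5):
    """Subtract one softmax attention map from another before weighting values.

    q1, q2: (n, d) query projections; k1, k2: (n, d) key projections; v: (n, d_v) values.
    lam weights the second (noise-cancelling) map; a fixed scalar here for illustration.
    """
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))  # first attention map
    a2 = softmax(q2 @ k2.T / np.sqrt(d))  # second attention map
    scores = a1 - lam * a2                # subtraction cancels shared noise
    return scores @ v

# Toy usage with random projections
rng = np.random.default_rng(0)
n, d = 4, 8
q1, k1, q2, k2 = (rng.standard_normal((n, d)) for _ in range(4))
v = rng.standard_normal((n, d))
out = diff_attention(q1, k1, q2, k2, v)
print(out.shape)  # (4, 8)
```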
