
Yutao Sun
Articles
-
Oct 8, 2024 |
huggingface.co | Tianzhu Ye |Li Dong |Yuqing Xia |Yutao Sun
Abstract Transformer tends to overallocate attention to irrelevant context. In thiswork, we introduce Diff Transformer, which amplifies attention to the relevantcontext while canceling noise. Specifically, the differential attentionmechanism calculates attention scores as the difference between two separatesoftmax attention maps. The subtraction cancels noise, promoting the emergenceof sparse attention patterns.
-
Oct 7, 2024 |
arxiv.org | Li Dong |Yuqing Xia |Yutao Sun |Yi Zhu
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website. Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them. Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
Try JournoFinder For Free
Search and contact over 1M+ journalist profiles, browse 100M+ articles, and unlock powerful PR tools.
Start Your 7-Day Free Trial →