
Tianzhu Ye
Oct 8, 2024 | huggingface.co | Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun
Abstract

Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns.
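The core idea, subtracting one softmax attention map from another, can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the paper's implementation: the weight matrices, the fixed scalar `lam` (learnable and re-parameterized in the paper), and the omission of multi-head structure and normalization are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Sketch of differential attention for one head.

    Two query/key projections produce two softmax attention maps;
    their weighted difference is applied to the values. `lam` is a
    fixed scalar here (hypothetical simplification; the paper learns it).
    """
    d = Wq1.shape[1]
    A1 = softmax((X @ Wq1) @ (X @ Wk1).T / np.sqrt(d))
    A2 = softmax((X @ Wq2) @ (X @ Wk2).T / np.sqrt(d))
    # Subtraction cancels attention mass shared by both maps (noise),
    # leaving a sparser pattern focused on the relevant context.
    return (A1 - lam * A2) @ (X @ Wv)

# Toy usage with random projections.
rng = np.random.default_rng(0)
n, d_model, d = 4, 8, 8
X = rng.standard_normal((n, d_model))
Wq1, Wk1, Wq2, Wk2, Wv = (rng.standard_normal((d_model, d)) for _ in range(5))
out = differential_attention(X, Wq1, Wk1, Wq2, Wk2, Wv)
print(out.shape)  # (4, 8)
```

Note that each row of the differential map `A1 - lam * A2` sums to `1 - lam` rather than 1, which is one reason the full architecture pairs this with per-head normalization.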