Tianzhu Ye

Articles

  • Oct 8, 2024 | huggingface.co | Tianzhu Ye | Li Dong | Yuqing Xia | Yutao Sun

    Abstract: Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns.
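
The abstract describes the attention scores as the difference between two softmax attention maps. A minimal NumPy sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single-head unbatched shapes, the 1/sqrt(d) scaling, and the `lam` weight on the second map are all hypothetical choices made here. With the default `lam=1.0` the scores are the plain difference the abstract mentions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(q1, k1, q2, k2, v, lam=1.0):
    """Attention scores as the difference of two softmax attention maps.

    q1, k1, q2, k2: arrays of shape (n, d), two separate query/key sets.
    v: array of shape (n, d_v).
    lam: hypothetical weight on the second map (assumption, not from the abstract).
    """
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))  # first attention map, shape (n, n)
    a2 = softmax(q2 @ k2.T / np.sqrt(d))  # second attention map, shape (n, n)
    scores = a1 - lam * a2                # subtraction cancels what both maps share
    return scores @ v, scores


# Toy inputs: 4 tokens, 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
n, d = 4, 8
q1, k1, q2, k2 = (rng.standard_normal((n, d)) for _ in range(4))
v = rng.standard_normal((n, d))

out, scores = differential_attention(q1, k1, q2, k2, v)
```

Because each softmax row sums to 1, every row of `scores` sums to 0 when `lam` is 1.0, and the scores vanish entirely when the two maps are identical: only attention mass on which the two maps disagree survives the subtraction.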
