
Shaohan Huang
Articles
-
May 8, 2024 |
arxiv.org | Li Dong | Yi Zhu | Shaohan Huang | Wenhui Wang
-
Oct 17, 2023 |
arxiv.org | Shuming Ma | Li Dong | Shaohan Huang | Huaijie Wang
-
Jul 17, 2023 |
arxiv.org | Li Dong | Shaohan Huang | Shuming Ma | Yuqing Xia
[Submitted on 17 Jul 2023 (v1), last revised 9 Aug 2023 (this version, v4)] Title: Retentive Network: A Successor to Transformer for Large Language Models, by Yutao Sun and 7 other authors. Abstract: In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference,...
-
Jul 5, 2023 |
arxiv.org | Shuming Ma | Li Dong | Xingxing Zhang | Shaohan Huang
arXiv:2307.02486 (cs) Title: LongNet: Scaling Transformers to 1,000,000,000 Tokens, by Jiayu Ding and 6 other authors. Submitted by Shuming Ma [v1] Wed, 5 Jul 2023 17:59:38 UTC (219 KB).
-
Jun 26, 2023 |
arxiv.org | Wenhui Wang | Li Dong | Shaohan Huang | Yaru Hao
[Submitted on 26 Jun 2023 (v1), last revised 27 Jun 2023 (this version, v2)] Title: Kosmos-2: Grounding Multimodal Large Language Models to the World, by Zhiliang Peng and 6 other authors. Abstract: We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual...