
Bogdan Ionut Cirstea
Articles
- Dec 8, 2024 | lesswrong.com | Bogdan Ionut Cirstea
Authors: Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Xu Han, Zhiyuan Liu, Maosong Sun. Abstract: Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as the model size increases. However, this scaling poses significant challenges for training and inference efficiency, particularly when deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable.
- Nov 28, 2024 | lesswrong.com | Bogdan Ionut Cirstea
Author: Yijiong Yu. Abstract: It is well known that Chain-of-Thought (CoT) can remarkably enhance LLMs' performance on complex tasks. However, because it also introduces slower inference and higher computational costs, many studies have attempted implicit CoT, which does not require LLMs to explicitly generate the intermediate steps. But there is still a gap between their efficacy and that of typical explicit CoT methods.
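To make the explicit/implicit distinction concrete, here is a minimal sketch of the two prompting styles (my illustration, not from the paper; `query_model` is a hypothetical stand-in for whatever LLM API is in use):

```python
# Illustrative contrast of explicit vs. implicit CoT prompting (not the paper's code).

def query_model(prompt: str) -> str:
    """Placeholder for an LLM completion call; wire up to a model of your choice."""
    raise NotImplementedError

question = "A train leaves at 3:40 pm and the trip takes 85 minutes. When does it arrive?"

# Explicit CoT: the model is asked to generate its intermediate steps as tokens,
# which tends to improve accuracy but lengthens (and slows) generation.
explicit_prompt = question + "\nLet's think step by step."

# Implicit CoT: the model answers directly, so any multi-step reasoning must
# happen inside the forward pass rather than in generated tokens.
implicit_prompt = question + "\nAnswer with only the final time."
```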
- Nov 26, 2024 | lesswrong.com | Bogdan Ionut Cirstea
Authors: Sohee Yang, Nora Kassner, Elena Gribovskaya, Sebastian Riedel, Mor Geva. Abstract: We evaluate how well Large Language Models (LLMs) latently recall and compose facts to answer multi-hop queries like "In the year Scarlett Johansson was born, the Summer Olympics were hosted in the country of".
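The example query chains two facts: a bridge fact (the birth year) and a dependent fact (that year's Olympics host). A minimal sketch of that composition, using hard-coded toy facts rather than the paper's evaluation setup:

```python
# Illustrative decomposition of the two-hop query above (not the paper's code).
# Hop 1 recalls a "bridge" fact; hop 2 uses it to recall the final answer.

birth_years = {"Scarlett Johansson": 1984}           # hop 1: entity -> year
summer_olympics_hosts = {1984: "United States"}      # hop 2: year -> host country

def answer_two_hop(entity: str) -> str:
    bridge = birth_years[entity]                     # latent intermediate fact
    return summer_olympics_hosts[bridge]             # composed final answer

print(answer_two_hop("Scarlett Johansson"))          # -> United States
```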
- Nov 24, 2024 | lesswrong.com | Bogdan Ionut Cirstea
Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel. Abstract: Intelligent perception and interaction with the world hinge on internal representations that capture its underlying structure ("disentangled" or "abstract" representations). Disentangled representations serve as world models, isolating latent factors of variation in the world along orthogonal directions, thus facilitating feature-based generalization.
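As a toy illustration of what "orthogonal directions" means here (my sketch, not the authors' model): a two-dimensional latent in which each generative factor owns its own axis, so changing one factor leaves the other coordinate untouched.

```python
import numpy as np

# Illustrative disentangled 2-D latent (not from the paper): each generative
# factor maps to its own orthogonal direction.

size_direction  = np.array([1.0, 0.0])    # latent axis for "size"
color_direction = np.array([0.0, 1.0])    # latent axis for "color"

def encode(size: float, color: float) -> np.ndarray:
    return size * size_direction + color * color_direction

z1 = encode(size=0.2, color=0.9)
z2 = encode(size=0.8, color=0.9)          # only size changed
# The difference lies entirely along the size axis, so a downstream readout
# of color generalizes unchanged across sizes (feature-based generalization).
print(z2 - z1)                            # -> approximately [0.6, 0.0]
```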
- Nov 20, 2024 | lesswrong.com | Bogdan Ionut Cirstea
Authors: Anonymous (I'm not one of them). Abstract: Most analysis of transformer expressivity treats the depth (number of layers) of a model as a fixed constant, and analyzes the kinds of problems such models can solve across inputs of unbounded length. In practice, however, the context length of a trained transformer model is bounded. Thus, a more pragmatic question is: What kinds of computation can a transformer perform on inputs of bounded length?