Articles
-
1 week ago |
magazine.sebastianraschka.com | Sebastian Raschka
KV caches are one of the most important techniques for compute-efficient LLM inference in production. This article explains how they work, both conceptually and in code, with a from-scratch, human-readable implementation. It's been a while since I shared a technical tutorial explaining fundamental LLM concepts.
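The article's full implementation isn't reproduced here, but the core idea can be sketched in a few lines: cache each decoding step's keys and values so only the newest token's projections are computed per step. This is a minimal sketch assuming a single attention head; the names `KVCache` and `decode_step` are illustrative, not from the article.

```python
import torch

# Minimal single-head KV cache sketch (illustrative names, not the article's code).
class KVCache:
    def __init__(self):
        self.keys = None    # (batch, tokens_so_far, head_dim)
        self.values = None  # (batch, tokens_so_far, head_dim)

    def update(self, k_new, v_new):
        # Append this step's key/value instead of recomputing all past ones.
        if self.keys is None:
            self.keys, self.values = k_new, v_new
        else:
            self.keys = torch.cat([self.keys, k_new], dim=1)
            self.values = torch.cat([self.values, v_new], dim=1)
        return self.keys, self.values

def decode_step(x_new, W_q, W_k, W_v, cache):
    # x_new: (batch, 1, d_model) -- only the newest token's hidden state.
    q = x_new @ W_q                                  # query for the new token only
    k, v = cache.update(x_new @ W_k, x_new @ W_v)    # full cached K/V history
    scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
    attn = torch.softmax(scores, dim=-1)
    return attn @ v                                  # (batch, 1, head_dim)

# Usage: one forward pass per generated token, O(1) new K/V work per step.
d = 16
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
cache = KVCache()
for _ in range(5):                                   # pretend we generate 5 tokens
    x = torch.randn(1, 1, d)
    out = decode_step(x, W_q, W_k, W_v, cache)
```

The payoff is that each step attends over the growing cache without re-encoding the prompt, which is exactly why the technique matters for production inference cost.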
-
2 months ago |
magazine.sebastianraschka.com | Sebastian Raschka
A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to these releases were relatively muted. Why? One reason could be that GPT-4.5 and Llama 4 remain conventional models, meaning they were trained without explicit reinforcement learning for reasoning. Meanwhile, competitors such as xAI and Anthropic have built more reasoning capabilities and features into their models.
-
Mar 8, 2025 |
magazine.sebastianraschka.com | Sebastian Raschka
Improving the reasoning abilities of large language models (LLMs) has become one of the hottest topics in 2025, and for good reason. Stronger reasoning skills allow LLMs to tackle more complex problems, making them more capable across a wide range of tasks users care about. In the last few weeks, researchers have shared a large number of new strategies to improve reasoning, including scaling inference-time compute, reinforcement learning, supervised fine-tuning, and distillation.
-
Feb 6, 2025 |
magazine.sebastianraschka.com | Sebastian Raschka
This article describes the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope this provides valuable insights and helps you navigate the rapidly evolving literature and hype surrounding this topic. In 2024, the LLM field saw increasing specialization. Beyond pre-training and fine-tuning, we witnessed the rise of specialized applications, from retrieval-augmented generation (RAG) to code assistants.
-
Sep 21, 2024 |
magazine.sebastianraschka.com | Sebastian Raschka
In this article, I want to show you how to transform pretrained large language models (LLMs) into strong text classifiers. But why focus on classification? First, finetuning a pretrained model for classification offers a gentle yet effective introduction to model finetuning. Second, many real-world and business challenges revolve around text classification: spam detection, sentiment analysis, customer feedback categorization, topic labeling, and more.
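As a rough sketch of that recipe (not the article's code; `LLMClassifier` and the toy backbone below are hypothetical stand-ins), the common approach is to swap the language-modeling head for a small classification head over the final token's hidden state and finetune from there:

```python
import torch
import torch.nn as nn

# Minimal sketch of turning a pretrained LLM into a text classifier.
# `backbone` stands in for any pretrained transformer that returns
# hidden states of shape (batch, seq_len, d_model).
class LLMClassifier(nn.Module):
    def __init__(self, backbone, d_model, num_classes=2):
        super().__init__()
        self.backbone = backbone
        # Replace the language-modeling head with a classification head.
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):
        hidden = self.backbone(input_ids)   # (batch, seq, d_model)
        last_token = hidden[:, -1, :]       # use the final token's state
        return self.classifier(last_token)  # (batch, num_classes)

# Usage with a toy backbone; in practice you would load pretrained weights
# and often finetune only the head (or the last few layers) first.
backbone = nn.Embedding(1000, 64)           # stand-in, not a real LLM
model = LLMClassifier(backbone, d_model=64)
logits = model(torch.randint(0, 1000, (4, 10)))  # 4 sequences, length 10
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 1, 0]))
```

Because only a small head is trained initially, this setup is a gentle introduction to finetuning while still covering the classification tasks listed above.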