Seungwook Han, Jineop Song*, Jeff Gore, Pulkit Agrawal*

MIT

Spotlight Paper at ICML 2025

📃 Paper

Introduction

Hendel et al. and Todd et al. first discovered the “task vector”: a task-specific intermediate representation in LLMs that, during in-context learning (ICL), triggers the desired behavior from the model. This discovery is not only intriguing in its own right; it has also become a foundation for diverse mechanistic interpretability research on representations and internal LLM mechanisms. However, it remains unknown how task vectors form during pretraining, and this question motivates our work.
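For readers unfamiliar with the paradigm, here is a minimal sketch of the extract-then-inject procedure behind task vectors, assuming a HuggingFace GPT-2 model; the layer index and the antonym prompts are illustrative choices, not the settings used in the papers above.

```python
# Sketch of the task-vector paradigm: cache an intermediate activation from
# an ICL prompt, then inject it into a zero-shot run. Layer and prompts are
# illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
LAYER = 6  # hypothetical intermediate layer

# 1) Run an ICL prompt and cache the hidden state at the final token:
#    this activation is the candidate "task vector".
icl_prompt = "hot -> cold\nbig -> small\nfast ->"
with torch.no_grad():
    out = model(**tok(icl_prompt, return_tensors="pt"), output_hidden_states=True)
# hidden_states[i + 1] is the output of transformer block i
task_vector = out.hidden_states[LAYER + 1][0, -1].clone()

# 2) Inject the cached vector at the same layer while running a zero-shot
#    prompt; if it encodes the task, the model behaves as if demos were given.
def inject(module, inputs, output):
    hidden = output[0]           # GPT-2 blocks return a tuple
    hidden[0, -1] = task_vector  # overwrite the last-token activation
    return output

handle = model.transformer.h[LAYER].register_forward_hook(inject)
with torch.no_grad():
    zs = model(**tok("slow ->", return_tensors="pt"))
handle.remove()
print(tok.decode([zs.logits[0, -1].argmax().item()]))
```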

Overview of encoder-decoder perspective.

In this work, we introduce an encoder–decoder perspective to address the question of how task vectors emerge. We find that, during pretraining of autoregressive transformers, task encoding emerges as the model learns to map different latent tasks into distinct, separable representation spaces. Surprisingly, this geometric structuring of the representation space is coupled with the development of task-specific in-context learning algorithms, namely task decoding. We argue that task vectors emerge through the following two hypotheses:

Hypothesis 1

Over training, transformers learn representations that are separable by latent task -- task encoding. Simultaneously, the model learns task-specific algorithms that leverage these separable representation spaces -- task decoding. We illustrate that these two processes combined manifest as task vectors, and that they emerge concurrently during training.
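As a concrete illustration of what task encoding means operationally, the sketch below scores how well intermediate representations cluster by latent task at successive training checkpoints; the silhouette score and the toy Gaussian data are stand-ins for the paper's actual measure and models.

```python
# Sketch: quantify task encoding as how well intermediate representations
# cluster by latent task, scored per training checkpoint. The silhouette
# score is a stand-in for the paper's actual separability measure.
import numpy as np
from sklearn.metrics import silhouette_score

def encoding_separability(reps_per_checkpoint, task_labels):
    """reps_per_checkpoint: list of (n_prompts, hidden_dim) arrays, one per
    checkpoint; task_labels: (n_prompts,) latent-task ids."""
    return np.array([silhouette_score(reps, task_labels)
                     for reps in reps_per_checkpoint])

# Toy usage: two latent tasks that grow more separable over "training".
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
checkpoints = [np.vstack([rng.normal(0, 1, (50, 16)),
                          rng.normal(gap, 1, (50, 16))])
               for gap in (0.1, 1.0, 3.0)]
print(encoding_separability(checkpoints, labels))  # increasing scores
```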

Hypothesis 2

Furthermore, we extend our findings to predict the effectiveness of task vectors from their representations using a novel geometric metric, Task Decodability (TD). TD quantifies how effectively a model can decode the task from its intermediate representations. We demonstrate that the TD score is causally related to ICL performance through ablation studies on ICL examples, fine-tuning, and prompting.

In pretrained LLMs, Task Decodability (TD) varies across tasks and determines the effectiveness of the task vector; downstream ICL performance can then be predicted from TD.
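One simple way to operationalize such a decodability score, shown below as an illustrative proxy rather than the paper's exact definition of TD, is the held-out accuracy of a linear probe that predicts the latent task from intermediate representations.

```python
# Sketch: a proxy for Task Decodability (TD) as the cross-validated accuracy
# of a linear probe predicting the latent task from intermediate
# representations. Illustrative only -- not the paper's exact metric.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def task_decodability(reps, task_labels):
    """reps: (n_examples, hidden_dim) intermediate representations;
    task_labels: (n_examples,) latent-task ids. Returns mean CV accuracy."""
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, reps, task_labels, cv=5).mean()

# Higher TD is predicted to track stronger task vectors and better ICL.
```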

Task Encoding and Decoding Emerge Simultaneously

We observe that, during training of a small transformer on a mixture of sparse linear-regression tasks, both task encoding (the separation of representations) and task decoding (the development of task-specific algorithms) emerge together. Through activation-patching experiments, we demonstrate that tasks whose latent representations cluster together in fact share the same algorithm for solving the regression task. Only once the tasks become fully separable can the model correctly identify the latent task and develop a distinct optimal algorithm for each one. Taken together, these results show that task vectors arise from the combination of the two processes and that the two emerge concurrently during training.
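The sketch below shows the activation-patching logic behind this kind of test: cache one layer's activation on a source prompt and substitute it into a run on a target prompt, then compare outputs. The assumptions that the layer module returns a plain tensor and that the two prompts have equal length are ours, made for brevity.

```python
# Sketch of activation patching for a small transformer. `layer_module` is
# assumed to return a plain tensor and the two inputs are assumed to have
# the same sequence length.
import torch

def patch_activation(model, layer_module, source_inputs, target_inputs):
    cached = {}

    def cache_hook(module, inputs, output):
        cached["act"] = output.detach().clone()

    def patch_hook(module, inputs, output):
        return cached["act"]  # replace the layer's output wholesale

    with torch.no_grad():
        handle = layer_module.register_forward_hook(cache_hook)
        model(source_inputs)              # run 1: cache source activation
        handle.remove()

        clean = model(target_inputs)      # run 2: unpatched target output
        handle = layer_module.register_forward_hook(patch_hook)
        patched = model(target_inputs)    # run 3: target with patched layer
        handle.remove()

    # If two clustered tasks share an algorithm, cross-patching between them
    # should leave the regression prediction (nearly) unchanged.
    return clean, patched
```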