Transformer-Squared: Self-Adaptive LLMs

About this audio

In this episode we’re diving into “Transformer-Squared: Self-Adaptive LLMs,” a new framework for adapting large language models to unseen tasks on the fly by tuning only a small part of their weights. The central idea is Singular Value Fine-Tuning (SVF), a parameter-efficient fine-tuning technique that decomposes each weight matrix with Singular Value Decomposition (SVD) and trains only a small vector that scales the singular values. These vectors become compact “expert” modules that specialize in different tasks and, unlike traditional methods such as LoRA, can be composed, mixed, and reused because they live in a principled, orthogonal basis.
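
For a sense of how small the trainable part is, here is a minimal sketch of an SVF-style layer. The class name `SVFLinear` and the layer shapes are illustrative assumptions, not the paper's released implementation: the frozen weight is decomposed once with SVD, and only a per-singular-value scaling vector z is trained.

```python
import torch
import torch.nn as nn

class SVFLinear(nn.Module):
    """Toy SVF layer: W = U diag(s) V^T is frozen; only z is trained,
    giving the adapted weight W' = U diag(s * z) V^T."""

    def __init__(self, weight: torch.Tensor):
        super().__init__()
        U, s, Vh = torch.linalg.svd(weight, full_matrices=False)
        # Frozen SVD factors of the pretrained weight matrix.
        self.register_buffer("U", U)
        self.register_buffer("s", s)
        self.register_buffer("Vh", Vh)
        # Trainable per-singular-value scales, initialized to 1 (no change).
        self.z = nn.Parameter(torch.ones_like(s))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rebuild the adapted weight and apply it as a linear map.
        W_adapted = self.U @ torch.diag(self.s * self.z) @ self.Vh
        return x @ W_adapted.T


# Example: a 256x512 "pretrained" weight adapted with only 256 trainable scalars.
layer = SVFLinear(torch.randn(256, 512))
out = layer(torch.randn(4, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # torch.Size([4, 256]) 256
```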

During inference, Transformer-Squared runs a two-pass process: the first pass identifies the task or context, and the second pass combines the appropriate expert vectors to adapt the model’s behavior in real time. Across benchmarks and model architectures, SVF consistently outperforms LoRA while requiring orders of magnitude fewer parameters, and the framework also shows versatility on vision-language (multimodal) tasks.
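
A similarly hedged sketch of that second stage, assuming each trained expert is just a dictionary of per-layer z-vectors and that the first pass produces mixing weights over tasks (the paper explores prompt-based, classifier-based, and few-shot strategies for that step). All names and sizes below are illustrative; the adaptation amounts to interpolating expert vectors before the second pass.

```python
import torch

# Toy experts: per-layer z-vectors learned by SVF for two tasks
# (two layers, 8 singular values each -- purely illustrative sizes).
experts = {
    "math": {"layer0": torch.full((8,), 1.2), "layer1": torch.full((8,), 0.9)},
    "code": {"layer0": torch.full((8,), 0.8), "layer1": torch.full((8,), 1.1)},
}

def interpolate_experts(experts: dict, task_weights: dict) -> dict:
    """Linearly combine each layer's z-vectors using the first-pass weights."""
    layer_names = next(iter(experts.values())).keys()
    return {
        name: sum(task_weights[t] * experts[t][name] for t in task_weights)
        for name in layer_names
    }

# Pass 1 (stand-in): suppose the first forward pass judged the prompt
# to be mostly a math problem.
task_weights = {"math": 0.75, "code": 0.25}

# Pass 2 would then re-run generation with these mixed z-vectors rescaling
# the singular values of each weight matrix (e.g. via SVFLinear above).
mixed_z = interpolate_experts(experts, task_weights)
print(mixed_z["layer0"])  # each entry is 0.75*1.2 + 0.25*0.8 = 1.1
```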

If you’re into efficient adaptation, reinforcement-learning optimization of model components, and self-organizing AI systems, this paper is a big step toward real-time adaptive foundation models. Read the full paper here: https://arxiv.org/pdf/2501.06252
