Papers
arxiv:2509.15974

BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes

Published on Apr 19
AI-generated summary

Among the bias terms of large language models, directly fine-tuning the value-projection bias (b_v) yields higher downstream performance than fine-tuning the query- or key-projection biases, especially in low-data regimes.

Abstract

Fine-tuning the bias terms of large language models (LLMs) has the potential to achieve unprecedented parameter efficiency while maintaining competitive performance, particularly in low-data regimes. However, the link between fine-tuning different bias terms (i.e., b_q, b_k, and b_v in the query, key, and value projections) and downstream performance remains largely unclear. In this paper, we investigate the link between fine-tuning b_q, b_k, or b_v and the performance of the downstream task. Our key finding is that directly fine-tuning b_v generally leads to higher downstream performance in low-data regimes than fine-tuning b_q or b_k. We extensively evaluate this property across a wide range of LLMs spanning encoder-only and decoder-only architectures up to 6.7B parameters (including bias-free LLMs). Our results provide strong evidence for the effectiveness of directly fine-tuning b_v across various downstream tasks. The implementation code is available at https://github.com/whubaichuan/BEFT.
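To make the idea concrete, the sketch below shows one way to restrict training to value-projection biases in PyTorch: freeze every parameter, then re-enable gradients only for biases whose names end in `v_proj.bias`. This is a hypothetical illustration, not the paper's implementation; the toy `Attention` module and the `q_proj`/`k_proj`/`v_proj` names are assumptions (following common Hugging Face naming), and real models may use different parameter names.

```python
import torch.nn as nn


class Attention(nn.Module):
    """Toy attention block with separate query/key/value projections,
    each carrying its own bias term (b_q, b_k, b_v)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=True)
        self.k_proj = nn.Linear(dim, dim, bias=True)
        self.v_proj = nn.Linear(dim, dim, bias=True)


def mark_bv_trainable(model: nn.Module) -> list[str]:
    """Freeze all parameters, then unfreeze only value-projection
    biases; returns the names of the trainable parameters."""
    for p in model.parameters():
        p.requires_grad = False
    for name, p in model.named_parameters():
        if name.endswith("v_proj.bias"):
            p.requires_grad = True
    return [n for n, p in model.named_parameters() if p.requires_grad]


model = Attention(dim=8)
trainable = mark_bv_trainable(model)
print(trainable)  # only the value-projection bias remains trainable
```

An optimizer built over `(p for p in model.parameters() if p.requires_grad)` would then update only b_v, which is what makes this style of fine-tuning so parameter-efficient: a handful of bias values per layer instead of full weight matrices.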
