arxiv:2606.16817

Understanding the Behaviors of Environment-aware Information Retrieval

Published on Jun 15

Submitted by Hou Pong (Ken) Chan on Jun 19

LCO-Embedding


Abstract

Large language models can be trained via reinforcement learning to adapt query formulation strategies for different retrievers, with distinct optimal query styles and improved performance through retriever-specific guidance and model scaling.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Recent retrieval-augmented generation (RAG) approaches have demonstrated strong capability in handling complex queries, yet current research overlooks a critical challenge: different retrievers require fundamentally different query formulation strategies for optimal performance. In this work, we present the first systematic analysis of how LLMs can learn to adapt their query formulation strategies to different retrievers via reinforcement learning (RL). Our empirical study reveals that RL effectively teaches an LLM to tailor its queries to specific retriever characteristics. We discover that different retrievers exhibit surprisingly distinct optimal query styles (e.g., descriptive vs. question-like), suggesting that strategies learned for one retriever can be ineffective for another. We further show that performance can be enhanced by incorporating retriever-specific human guidance and by scaling model size. To facilitate learning over multi-retrieval-step trajectories, we introduce a branching-based rollout technique that improves training stability. Our work provides the first empirical evidence and actionable insights for building truly retriever-aware RAG systems. Code and resources are available at https://github.com/LCO-Embedding/Envs-aware-Information-Retrieval.
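The claim that different retrievers favor different query styles can be made concrete with a toy example. The sketch below is not the paper's method: it replaces the LLM with a two-option softmax policy over query styles, replaces real retrievers with two made-up reward functions, and applies an exact expected policy-gradient update instead of sampled RL rollouts. All names (`formulate`, `lexical_reward`, `question_reward`, `learn_style`) and the example document are hypothetical.

```python
import math

# The abstract names only these two styles; the templates below are assumptions.
STYLES = ("descriptive", "question-like")


def formulate(topic, style):
    """Render one information need in one of two hypothetical query styles."""
    if style == "descriptive":
        return f"{topic} definition and background"
    return f"what is {topic}?"


def lexical_reward(query, gold_doc):
    """Toy lexical retriever: reward is the Jaccard token overlap with the gold document."""
    q = set(query.lower().replace("?", "").split())
    d = set(gold_doc.lower().split())
    return len(q & d) / len(q | d)


def question_reward(query, gold_doc):
    """Toy stand-in for a retriever that favors question-form queries (assumed behavior)."""
    return 1.0 if query.strip().endswith("?") else 0.2


def learn_style(reward_fn, topic, gold_doc, steps=200, lr=0.5):
    """Softmax policy over styles, trained with the exact expected policy gradient."""
    theta = {s: 0.0 for s in STYLES}
    for _ in range(steps):
        z = sum(math.exp(v) for v in theta.values())
        probs = {s: math.exp(theta[s]) / z for s in STYLES}
        rewards = {s: reward_fn(formulate(topic, s), gold_doc) for s in STYLES}
        baseline = sum(probs[s] * rewards[s] for s in STYLES)
        for s in STYLES:
            # Gradient of expected reward w.r.t. theta[s] for a softmax policy.
            theta[s] += lr * probs[s] * (rewards[s] - baseline)
    return max(theta, key=theta.get)


if __name__ == "__main__":
    doc = "photosynthesis definition and background for plants"
    for name, fn in (("lexical", lexical_reward), ("question-tuned", question_reward)):
        print(name, "->", learn_style(fn, "photosynthesis", doc))
```

Under these assumed reward functions, the same learning rule settles on the descriptive style for the lexical scorer and on the question-like style for the question-favoring scorer, which is the kind of retriever-dependent divergence the abstract describes.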


Community

kenchan0226

Paper submitter about 8 hours ago

Search agents are usually optimized around one or a few “search environments”, whether web search APIs or local search built with a single retriever.
In practice, search environments are diverse, shaped by the retriever’s behavior, the indexing pipeline, the corpus distribution and quality, and the interaction interface.
Can search agents adapt their search strategies to different environments? More fundamentally, are they even aware when they're placed in different environments?

We believe this calls for a new research direction: Environment-aware Information Retrieval. Our ACL 2026 work takes an initial step toward this goal by studying one core factor: how search agents adapt to different retriever behaviors, and how much this adaptation matters. Check it out!


