To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal Intervention

This repository provides the code, data generation pipeline, training scripts, and evaluation scripts for DAS, the method introduced in our WWW 2026 paper:

To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal Intervention

DAS improves deep search agents by aligning the decision boundary between two actions: continuing to search and stopping to answer. The method diagnoses two types of decision error, over-search (searching again when the agent could already answer) and under-search (answering when more evidence was still needed). It then uses causal intervention to identify which decision is preferable and converts the resulting feedback into DPO preference data for post-training Search-R1-style agents.

The goal is to improve both:

Accuracy: answer more questions correctly.
Efficiency: reduce unnecessary search calls.
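
As a rough illustration of how these two goals can be tracked together, the sketch below computes exact-match accuracy and the average number of search calls over a set of rollouts. The record fields (`prediction`, `gold_answers`, `num_searches`) and the answer normalization are assumptions made for this example, not the format that FlashRAG or the evaluation scripts in this repository emit.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def summarize(rollouts: list[dict]) -> dict:
    """Return exact-match accuracy and mean search calls per question."""
    if not rollouts:
        return {"accuracy": 0.0, "avg_searches": 0.0}
    correct = sum(
        any(normalize(r["prediction"]) == normalize(g) for g in r["gold_answers"])
        for r in rollouts
    )
    searches = sum(r["num_searches"] for r in rollouts)
    n = len(rollouts)
    return {"accuracy": correct / n, "avg_searches": searches / n}


# Hypothetical rollouts, for illustration only.
rollouts = [
    {"prediction": "The Eiffel Tower", "gold_answers": ["Eiffel Tower"], "num_searches": 3},
    {"prediction": "Berlin", "gold_answers": ["Paris"], "num_searches": 0},
]
print(summarize(rollouts))
```

Reporting both numbers side by side makes it visible when a gain in accuracy is bought with extra search calls, or when fewer calls come at the cost of correctness.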

Paper

Conference: WWW '26, The ACM Web Conference 2026
DOI: 10.1145/3774904.3792235
arXiv: 2602.03304

Released Artifacts

Artifact	Link
DAS LoRA	reasonrag/das-lora-searchr1
DAS DPO data (searchr1-7b)	reasonrag/das-dpo-data-searchr1-7b

The released LoRA and DPO data are provided for reproducing DAS-style post-training with LLaMA-Factory and FlashRAG.

Repository Layout

.
|-- FlashRAG/
|   `-- examples/methods/
|       |-- run_exp.py                         # Search-R1 / Search-O1 evaluation entry
|       |-- decision_data_generation.py        # over-search DPO pair construction
|       |-- hint_under_opd_generation.py       # under-search hint generation
|       |-- under_search_hint_chosen_generator.py
|       |-- merge_under_hint_reviewed.py
|       |-- dpo_quality_tool.py                # DPO data quality checks
|       `-- decision_dpo.yaml                  # legacy DPO config template
|-- scripts/
|   |-- das_generate_data.sh                   # end-to-end data generation scaffold
|   |-- das_train_dpo.sh                       # LLaMA-Factory DPO training
|   |-- das_infer_searchr1.sh                  # Search-R1 rollout / inference
|   `-- das_eval_searchr1.sh                   # FlashRAG evaluation
`-- README.md

Setup

Create the FlashRAG environment:

conda create -n flashrag python=3.10 -y
conda activate flashrag
pip install flashrag-dev --pre
pip install "flashrag-dev[full]"
pip install "vllm>=0.10.0" deepspeed peft huggingface_hub

Create the LLaMA-Factory environment:

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
conda create -n llama_factory python=3.10 -y
conda activate llama_factory
pip install -e ".[torch,metrics]"

Prepare the FlashRAG datasets and retrieval index by following the FlashRAG documentation. The experiments in this project evaluate on NQ, HotpotQA, and 2WikiMultiHopQA.

Data Generation

DAS data generation has three stages:

1. Run Search-R1 rollouts and save intermediate search trajectories.
2. Detect over-search and under-search decision errors through causal intervention.
3. Convert the causal feedback into DPO preference pairs.
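
The three stages can be pictured with the following sketch. It is one plausible reading of the description above, not the logic in `decision_data_generation.py`: at a decision step, the alternative action is forced and the two outcomes are compared. The `DecisionStep` fields, the labeling rules, the tag strings in the example, and the `prompt`/`chosen`/`rejected` output keys are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionStep:
    """One decision point in a rollout (hypothetical schema)."""
    context: str             # question plus everything retrieved so far
    action: str              # what the agent did: "search" or "answer"
    action_text: str         # text the agent generated for that action
    alt_text: str            # text generated when the other action is forced
    correct_if_answer: bool  # is the final answer correct when answering here?
    correct_if_search: bool  # is the final answer correct after one more search?


def diagnose(step: DecisionStep) -> Optional[str]:
    """Label a step by comparing the observed action with the forced alternative."""
    if step.action == "search" and step.correct_if_answer:
        # Answering now would already have been correct: the search was unnecessary.
        return "over-search"
    if step.action == "answer" and not step.correct_if_answer and step.correct_if_search:
        # Stopping gave a wrong answer, but one more search would have fixed it.
        return "under-search"
    return None


def to_dpo_pair(step: DecisionStep) -> Optional[dict]:
    """Prefer the intervened action over the original one when an error is found."""
    error = diagnose(step)
    if error is None:
        return None
    return {
        "prompt": step.context,
        "chosen": step.alt_text,
        "rejected": step.action_text,
        "error_type": error,
    }


# Hypothetical step: the agent searched although the context already held the answer.
step = DecisionStep(
    context="Question: Who wrote Hamlet?\nRetrieved: Hamlet is a tragedy by William Shakespeare.",
    action="search",
    action_text="<search>Hamlet author</search>",
    alt_text="<answer>William Shakespeare</answer>",
    correct_if_answer=True,
    correct_if_search=True,
)
print(to_dpo_pair(step))
```

Steps where the intervention does not change the outcome yield no pair in this sketch, so only decisions with a measurable effect contribute training signal.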

Run the scaffold:

bash scripts/das_generate_data.sh \
  --dataset hotpotqa \
  --model search-r1 \
  --gpu 0

For convenience, the released data can be downloaded directly:

python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="reasonrag/das-dpo-data-searchr1-7b",
    repo_type="dataset",
    local_dir="data/das-dpo-data-searchr1-7b",
)
PY
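
Before training on downloaded or regenerated pairs, a few basic sanity checks can catch malformed records. The sketch below assumes each record is a dict with `prompt`, `chosen`, and `rejected` string fields. That schema is a guess for illustration and may not match the released files, and the checks are not the ones implemented in `dpo_quality_tool.py`.

```python
from collections import Counter


def check_pairs(records: list[dict]) -> Counter:
    """Count common problems in preference pairs (assumed schema)."""
    problems = Counter()
    seen = set()
    for rec in records:
        prompt = rec.get("prompt", "")
        chosen = rec.get("chosen", "")
        rejected = rec.get("rejected", "")
        if not (prompt and chosen and rejected):
            # A missing or empty field makes the pair unusable.
            problems["empty_field"] += 1
        elif chosen.strip() == rejected.strip():
            # Identical responses carry no preference signal.
            problems["identical_pair"] += 1
        key = (prompt, chosen, rejected)
        if key in seen:
            problems["duplicate"] += 1
        seen.add(key)
    return problems


# Hypothetical records, for illustration only.
records = [
    {"prompt": "q1", "chosen": "a", "rejected": "b"},
    {"prompt": "q1", "chosen": "a", "rejected": "b"},
    {"prompt": "q2", "chosen": "same", "rejected": "same "},
    {"prompt": "q3", "chosen": "", "rejected": "b"},
]
print(dict(check_pairs(records)))
```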

Training

Train DAS with DPO in LLaMA-Factory:

export LLAMA_FACTORY_DIR=/path/to/LLaMA-Factory
export DATASET_REPO=reasonrag/das-dpo-data-searchr1-7b
export BASE_MODEL=/path/to/base/searchr1/model
export OUTPUT_DIR=$LLAMA_FACTORY_DIR/saves/das/searchr1-das

bash scripts/das_train_dpo.sh

The script downloads the DAS DPO data, registers it in dataset_info.json, writes a LLaMA-Factory training YAML, and launches DPO LoRA training.
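
For orientation, the sketch below shows roughly what such a registration entry and training configuration might look like. The top-level keys follow LLaMA-Factory's conventions for pairwise preference data and DPO LoRA training, but the dataset name, data file name, column mapping, template, and hyperparameter values are placeholders chosen for this example, not the values that `das_train_dpo.sh` writes.

```python
import json

# Placeholder names; the real script may use different ones.
DATASET_NAME = "das_dpo_searchr1"
DATA_FILE = "das_dpo.json"

# Entry to merge into LLaMA-Factory's data/dataset_info.json.
dataset_entry = {
    DATASET_NAME: {
        "file_name": DATA_FILE,
        "ranking": True,  # marks the dataset as pairwise preference data
        "columns": {      # assumed field names inside the data file
            "prompt": "prompt",
            "chosen": "chosen",
            "rejected": "rejected",
        },
    }
}

# Flat training config; all values are illustrative placeholders.
train_config = {
    "model_name_or_path": "/path/to/base/searchr1/model",
    "stage": "dpo",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "all",
    "pref_beta": 0.1,
    "pref_loss": "sigmoid",
    "dataset": DATASET_NAME,
    "template": "qwen",           # assumption: depends on the base model family
    "cutoff_len": 4096,
    "output_dir": "saves/das/searchr1-das",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": "5.0e-6",    # kept as text so YAML loaders read it as a float
    "num_train_epochs": 1.0,
    "bf16": True,
}


def to_yaml(config: dict) -> str:
    """Serialize a flat dict of scalars as simple YAML (no nesting needed here)."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(json.dumps(dataset_entry, indent=2))
print(to_yaml(train_config))
```

The learning rate is written with an explicit decimal point because some YAML loaders treat `5e-06` as a string rather than a number.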

Evaluation

Evaluate the base model:

bash scripts/das_eval_searchr1.sh \
  --dataset hotpotqa \
  --model search-r1 \
  --gpu 0

Evaluate with the released LoRA:

bash scripts/das_eval_searchr1.sh \
  --dataset hotpotqa \
  --model search-r1 \
  --lora reasonrag/das-lora-searchr1 \
  --gpu 0

Inference / Rollout

For rollout-style inference without changing the training data pipeline, use:

bash scripts/das_infer_searchr1.sh \
  --dataset nq \
  --model search-r1 \
  --lora reasonrag/das-lora-searchr1 \
  --gpu 0

Acknowledgements

We thank FlashRAG, LLaMA-Factory, and Search-R1 for their valuable open-source contributions.

Citation

If you find this repository helpful, please cite:

@inproceedings{zhang2026search,
  author    = {Wenlin Zhang and Kuicai Dong and Junyi Li and Yingyi Zhang and Xiaopeng Li and Pengyue Jia and Yi Wen and Derong Xu and Maolin Wang and Yichao Wang and Yong Liu and Xiangyu Zhao},
  title     = {To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal Intervention},
  booktitle = {Proceedings of the ACM Web Conference 2026},
  series    = {WWW '26},
  pages     = {2049--2059},
  year      = {2026},
  publisher = {ACM},
  doi       = {10.1145/3774904.3792235},
  url       = {https://doi.org/10.1145/3774904.3792235}
}
