CADEQS

This repository contains materials developed by LY Corporation and is temporarily open-sourced as the official implementation of our SIGIR 2026 paper.

Temporary Release: This repository is temporarily available as open source and may therefore be made read-only or private at any time.
Attribution: All code and materials in this repository are owned by LY Corporation.

Project Overview

This repository is the official implementation of our SIGIR 2026 paper: "Recasting Web-Scale Query Suggestion as Dense Retrieval: Efficient, Up-to-Date, and Context-Aware Suggestions."

In this work, we reframe web-scale query suggestion as dense retrieval (instead of generation) to jointly address three practical requirements: low latency, reflection of evolving information, and session context awareness.
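
To illustrate the retrieval framing described above, here is a minimal sketch: the session context is embedded once, and suggestions are the candidate queries with the highest inner product. This is not the CADEQS implementation. `toy_encode`, `suggest`, and `DIM` are hypothetical names, and the hashed character-trigram encoder is only a stand-in for trained query and candidate encoders.

```python
import hashlib
import math

DIM = 64  # toy embedding size (assumption, not a value from the paper)


def toy_encode(text: str) -> list[float]:
    """Hash character trigrams into a fixed-size, L2-normalized vector.

    Stand-in for a trained encoder; it captures surface overlap only.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def suggest(session: list[str], candidates: list[str], k: int = 3) -> list[str]:
    """Rank candidate queries by inner product with the session embedding."""
    # Session context: earlier queries plus the current one, joined into one string.
    query_vec = toy_encode(" [SEP] ".join(session))
    scored = []
    for cand in candidates:
        cand_vec = toy_encode(cand)  # a real system would precompute these offline
        score = sum(q * c for q, c in zip(query_vec, cand_vec))
        scored.append((score, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:k]]


if __name__ == "__main__":
    inventory = ["cheap flights to tokyo", "tokyo hotels", "weather tomorrow"]
    print(suggest(["flights", "cheap flights"], inventory, k=2))
```

Because candidate embeddings do not depend on the session, they can be computed ahead of time and refreshed as new queries appear, which is what makes this framing attractive for low latency and up-to-date suggestions. The brute-force scoring loop here would normally be replaced by a nearest-neighbor index.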

The repository includes our proposed method CADEQS (Context-aware Asymmetric Dual Encoder for Query Suggestion), along with data preprocessing, evaluation scripts, and baseline implementations. The method is referred to as CADE-QS in the SIGIR 2026 paper.

If you use this repository, please cite:

@inproceedings{recasting_qs_sigir2026,
  title     = {Recasting Web-Scale Query Suggestion as Dense Retrieval: Efficient, Up-to-Date, and Context-Aware Suggestions},
  author    = {Nishikawa, Sosuke and Yoshinaga, Naoki and Kaji, Nobuhiro},
  booktitle = {Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year      = {2026},
  publisher = {Association for Computing Machinery},
  series = {SIGIR '26}
}

Installation and Usage

Installation

Python 3.10+ required.

pip install -e .

For baselines/llm, install vLLM separately according to your environment: https://docs.vllm.ai/en/latest/getting_started/installation/

Usage

Run a CADEQS pipeline on AOL-style query logs:

# 1) Preprocess AOL-style logs
# src_dir should contain files like:
#   user-ct-test-collection-01.txt.gz
#   ...
python data/preprocess.py \
  --src_dir /path/to/aol-user-ct-collection \
  --output_dir /tmp/qs_aol

# 2) Prune query encoder (first + last two layers)
python -m cadeqs.prune \
  --model_name_or_path MiniLMv2-L6-H768-distilled-from-BERT-Base \
  --output_dir /tmp/qs_aol/query_encoder_pruned

# 3) Train CADEQS
python -m cadeqs.train \
  --model_name_or_path /tmp/qs_aol/query_encoder_pruned \
  --candidate_model_name_or_path MiniLMv2-L6-H768-distilled-from-BERT-Base \
  --train_file /tmp/qs_aol/train.jsonl \
  --output_dir /tmp/qs_aol/cade_qs_model

# 4) CADEQS inference
python -m cadeqs.infer \
  --query_encoder_path /tmp/qs_aol/cade_qs_model/query_encoder \
  --candidate_encoder_path /tmp/qs_aol/cade_qs_model/candidate_encoder \
  --corpus_path /tmp/qs_aol/inventory.jsonl \
  --test_file /tmp/qs_aol/test.jsonl \
  --output_file /tmp/qs_aol/cade_qs_predictions.jsonl \
  --index_path /tmp/qs_aol/cade_qs_faiss.index

# 5) BERTScore evaluation
python -m evaluation.evaluate \
  --bert-evaluator microsoft/deberta-large \
  --input /tmp/qs_aol/cade_qs_predictions.jsonl \
  --metrics bert_score \
  --output /tmp/qs_aol/cade_qs_predictions_eval.jsonl

For baseline implementations and usage examples, see baselines/README.md.
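
Step 2 of the pipeline above prunes the query encoder to the "first + last two layers". One possible reading is that the first layer and the last two layers of the 6-layer MiniLMv2 model are kept. The sketch below computes the retained layer indices under that assumption. `select_layers` is a hypothetical helper and is not part of `cadeqs.prune`, whose actual behavior may differ.

```python
def select_layers(num_layers: int, keep_first: int = 1, keep_last: int = 2) -> list[int]:
    """Return the sorted indices of the transformer layers to keep.

    Assumes "first + last two layers" means keep_first=1 and keep_last=2.
    """
    if keep_first < 0 or keep_last < 0:
        raise ValueError("keep_first and keep_last must be non-negative")
    if keep_first + keep_last > num_layers:
        raise ValueError("cannot keep more layers than the model has")
    head = list(range(keep_first))
    tail = list(range(num_layers - keep_last, num_layers))
    return head + tail


if __name__ == "__main__":
    # A 6-layer encoder reduced to 3 layers under the assumed reading.
    print(select_layers(6))
```

Under this reading, the pruned query encoder runs half as many layers as the unpruned candidate encoder, which is consistent with the asymmetric design named in the method's title.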

Contributions

As this project is temporarily open-sourced, we are not accepting contributions. For feedback or inquiries, please open an issue in this repository.

License

This code is dedicated to the public domain under CC0 1.0. You may copy, modify, and distribute it without restriction, and the authors make no warranties or guarantees regarding its use.
