ChocoWu/DACo

DACo

arXiv License

Official Repository for the paper: Global Commander and Local Operative: A Dual-Agent Framework for Scene Navigation

Kaiming Jin, Yuefan Wu, Shengqiong Wu*, Hao Fei, Bobo Li, Shuicheng Yan, and Tat-Seng Chua (*Correspondence)

National University, Simon Fraser University, University of Oxford

📬 News

  • [2026-02-24] 🎉 We have released the initial version of our paper!
  • [2026-02-21] 🎉 We have released our code and dataset used in the paper!

💡 Overview

  • We propose DACo, a novel role-specialized dual-agent architecture that structurally decomposes global planning and local execution for LVLM-based navigation. An overview of the framework is shown in the figure below:
  • We introduce dynamic planning and adaptive replanning mechanisms that enhance stability and interpretability in long-horizon navigation. As illustrated in the following figure, rather than operating in isolation, the agents continuously supervise and refine each other through iterative information exchange and coordinated decision making.
  • We conduct extensive evaluations across multiple benchmarks and backbones, demonstrating consistent and significant zero-shot improvements.

🛠️ Setup

Preparation

  • Prepare MP3D Simulator: First, install the Matterport3D simulator by following the instructions in Matterport3DSimulator.
  • Install requirements:
    conda create --name DACo python=3.10
    conda activate DACo
    pip install -r requirements.txt
  • Prepare Data:
    • Connectivity: The connectivity data are derived from DUET; you can download them here.
    • Annotations: In our paper, we evaluate on three datasets (R2R, REVERIE, and R4R), using the following annotation files.
      • 72 scenes on R2R: download MapGPT_72_scenes_processed.json.
      • R2R val unseen: download R2R_val_unseen_enc.json.
      • REVERIE val unseen: For quick, cost-effective testing and to ease future work, we sampled a subset of 200 instructions from the REVERIE validation unseen set, i.e. REVERIE_val_unseen_enc.json. We release this subset at datasets/REVERIE/annotations.
      • R4R val unseen: Likewise, because of cost constraints, we sampled a subset of 200 instructions from the R4R validation unseen set, i.e. R4R_val_unseen_enc.json. It can be found at datasets/R4R/annotations.
    • Observation Images: The observation images need to be collected in advance from the simulator. We use the same images as MapGPT, which can be downloaded here.
    • Top-down View Images: We use the "Floor Plan" part of WayDataset as the global representation. The data includes bird's-eye view (BEV) images and a JSON file mapping viewpoints to pixel coordinates. Download here.
  • Environment Variables: We provide two options for running our code: an API or a local LLM.
    • API: To use the API, set your own API key as follows.
      export API_KEY="xxxxx"
    • Local LLM: If you want to use your own LLM, set the base URL used to call your local model.
      export BASE_URL="http://..."

      Note: the API takes priority. If you want to run with your local model and API_KEY is already set, remember to unset it first.
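
The priority rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the helper name `resolve_backend` and the shape of the returned dictionary are assumptions, and only the `API_KEY` and `BASE_URL` variable names come from the instructions above.

```python
import os
from typing import Mapping, Optional


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: choose an LLM backend from environment variables.

    Mirrors the rule stated above: API_KEY wins over BASE_URL when both are set.
    """
    env = os.environ if env is None else env
    api_key = env.get("API_KEY", "").strip()
    base_url = env.get("BASE_URL", "").strip()

    if api_key:
        # The API is prioritized, so BASE_URL is ignored in this branch.
        return {"mode": "api", "api_key": api_key, "base_url": None}
    if base_url:
        return {"mode": "local", "api_key": None, "base_url": base_url}
    raise RuntimeError("Set API_KEY (hosted API) or BASE_URL (local model).")
```

Under this sketch, a shell session that has both variables exported would still resolve to the API, which is why `unset API_KEY` is needed before running against a local model.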

Inference

Run our code quickly with the provided script:

bash scripts/run.sh

Remember to customize the arguments before running the code:

--root_dir /path/to/datasets
--img_root /path/to/RGB_Observations/
--bev_dir /path/to/bev_images
--traj_img_dir bev_with_traj/<split_name>   # Output directory for trajectory visualizations
--split <split_name>    
--output_dir /path/to/save/output
--dataset r2r       # r2r, reverie or r4r
--max_action_len 15
--max_re_plan 1
--llm /path/to/your_model   # or the name of an OpenAI model
--max_tokens 1000
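
For readers who want to see how these flags fit together, here is a hedged `argparse` sketch that mirrors the list above. It is not the repository's actual parser: the function name `build_parser`, the argument types, the defaults, and which flags are required are all assumptions made for illustration. Only the flag names and the three dataset values come from the list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed above (not the repo's code)."""
    p = argparse.ArgumentParser(description="DACo inference arguments (sketch)")
    p.add_argument("--root_dir", required=True, help="dataset root directory")
    p.add_argument("--img_root", required=True, help="RGB observation images")
    p.add_argument("--bev_dir", required=True, help="top-down (BEV) images")
    p.add_argument("--traj_img_dir", help="output dir for trajectory visualizations")
    p.add_argument("--split", required=True, help="split name")
    p.add_argument("--output_dir", required=True, help="where results are saved")
    # The three dataset names come from the comment in the argument list.
    p.add_argument("--dataset", choices=["r2r", "reverie", "r4r"], default="r2r")
    # Integer types and these defaults are assumptions based on the example values.
    p.add_argument("--max_action_len", type=int, default=15)
    p.add_argument("--max_re_plan", type=int, default=1)
    p.add_argument("--llm", required=True, help="local model path or model name")
    p.add_argument("--max_tokens", type=int, default=1000)
    return p


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```

Restricting `--dataset` with `choices` makes a typo fail at parse time, before any model call is made.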

📝 Because LLM outputs are inherently stochastic, it is nearly impossible to reproduce the exact results reported in the paper, which are averaged over multiple runs. Minor fluctuations are normal. However, we guarantee that performance consistently exceeds that of the baseline methods.
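
Since the reported numbers are averages over several runs, one simple way to compare your own runs is to aggregate them the same way. The sketch below is an illustration only: `summarize_runs` is a hypothetical helper, and it assumes you have already collected one value per run for a metric of your choice. It is not part of the released code.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(values: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: return (mean, sample standard deviation) over runs."""
    if not values:
        raise ValueError("need at least one run")
    avg = mean(values)
    # The sample standard deviation is undefined for a single run; report 0.0.
    spread = stdev(values) if len(values) > 1 else 0.0
    return avg, spread
```

Calling `summarize_runs` on the per-run values gives a mean to compare against the paper and a spread that indicates how much fluctuation to expect between runs.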

👏 Acknowledgement

You may refer to the related work that serves as the foundation for our framework and code repository: MapGPT, DiscussNav, NavGPT, VLN-DUET, and Matterport3D. Thanks for their wonderful work!

🗞️ Citation

@misc{jin2026globalcommanderlocaloperative,
      title={Global Commander and Local Operative: A Dual-Agent Framework for Scene Navigation}, 
      author={Kaiming Jin and Yuefan Wu and Shengqiong Wu and Bobo Li and Shuicheng Yan and Tat-Seng Chua},
      year={2026},
      eprint={2602.18941},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.18941}, 
}

About

This is the repo for 'Dual-agent Collaborative System for 3D Scene Navigation'
