Official Repository for the paper: Global Commander and Local Operative: A Dual-Agent Framework for Scene Navigation
Kaiming Jin, Yuefan Wu, Shengqiong Wu*, Hao Fei, Bobo Li, Shuicheng Yan and Tat-seng Chua. (*Correspondence)
National University, Simon Fraser University, University of Oxford
- [2026-02-24] 🎉 We have released the initial version of our paper!
- [2026-02-21] 🎉 We have released our code and dataset used in the paper!
- We propose DACo, a novel role-specialized dual-agent architecture that structurally decomposes global planning and local execution for LVLM-based navigation. An overview of the framework is shown in the figure below:

- We introduce dynamic planning and adaptive replanning mechanisms that enhance stability and interpretability in long-horizon navigation. As illustrated in the following figure, rather than operating in isolation, the agents continuously supervise and refine each other through iterative information exchange and coordinated decision making.

- We conduct extensive evaluations across multiple benchmarks and backbones, demonstrating consistent and significant zero-shot improvements.
- Prepare MP3D Simulator: First, install the Matterport3D simulator by following the instructions in Matterport3DSimulator.
- Install requirements:
conda create --name DACo python=3.10
conda activate DACo
pip install -r requirements.txt
- Prepare Data:
- Connectivity: The connectivity data are derived from DUET and can be downloaded here.
- Annotations: In our paper, we evaluate on three datasets.
- 72 scenes on R2R: download MapGPT_72_scenes_processed.json.
- R2R val unseen: download R2R_val_unseen_enc.json.
- REVERIE val unseen: For quick, cost-effective testing and easier future work, we sampled a subset of 200 instructions from the REVERIE validation unseen set, i.e. REVERIE_val_unseen_enc.json. We release this subset at datasets/REVERIE/annotations.
- R4R val unseen: Likewise, due to cost constraints, we sampled a subset of 200 instructions from the R4R validation unseen set, i.e. R4R_val_unseen_enc.json, which can be found at datasets/R4R/annotations.
- Observation Images: The observation images need to be collected from the simulator in advance. We use the same images as MapGPT, which can be downloaded here.
- Top-down View Images: We use the "Floor Plan" part of WayDataset as the global representation. The data includes bird's-eye view (BEV) images and a JSON file mapping viewpoints to pixel coordinates. Download here.
- Environment Variables: There are two ways to run our code: through an API or with a local LLM.
- API: To use an API, set your own API key as follows.
export API_KEY="xxxxx"
- Local LLM: If you want to use your own LLM, set the base URL of your local model.
export BASE_URL="http://..."
Note: the API takes priority. If you want to run with your local model but have already set API_KEY, remember to unset it first.
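The priority rule can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the helper name `resolve_backend` and the returned labels `"api"` and `"local"` are hypothetical, and only the `API_KEY` and `BASE_URL` variable names come from the instructions above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Pick a backend from environment variables (hypothetical helper).

    Mirrors the rule described above: API_KEY wins whenever it is set,
    so BASE_URL is only used once API_KEY is unset or empty.
    """
    env = os.environ if env is None else env
    api_key = env.get("API_KEY", "").strip()
    base_url = env.get("BASE_URL", "").strip()
    if api_key:
        # The API takes priority, even if BASE_URL is also defined.
        return ("api", api_key)
    if base_url:
        return ("local", base_url)
    raise RuntimeError("Set either API_KEY or BASE_URL before running.")
```

Under this sketch, exporting both variables still selects the API, which is why `unset API_KEY` is needed before running a local model.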
Run our code quickly with the provided script:
bash scripts/run.sh
Remember to customize the arguments before running the code:
--root_dir /path/to/datasets
--img_root /path/to/RGB_Observations/
--bev_dir /path/to/bev_images
--traj_img_dir bev_with_traj/<split_name> # Output directory for trajectory visualizations
--split <split_name>
--output_dir /path/to/save/output
--dataset r2r # r2r, reverie or r4r
--max_action_len 15
--max_re_plan 1
--llm /path/to/your_model # or the name of an OpenAI model
--max_tokens 1000
📝 Due to the inherent randomness of LLMs, it is nearly impossible to reproduce the exact results reported in the paper, which are averaged over multiple runs. Minor fluctuations are normal. However, we guarantee that the performance consistently exceeds that of baseline methods.
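For readers who want to check their customized arguments before launching a run, the flags listed above can be mirrored in a small `argparse` parser. This is only a sketch under stated assumptions: the function name `build_parser`, the argument types, the defaults, and which flags are required are all guesses based on the example values, not the repository's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in this README."""
    p = argparse.ArgumentParser(description="DACo run arguments (sketch)")
    # Paths: treated as required here, which is an assumption.
    p.add_argument("--root_dir", required=True)
    p.add_argument("--img_root", required=True)
    p.add_argument("--bev_dir", required=True)
    p.add_argument("--output_dir", required=True)
    # Output directory for trajectory visualizations.
    p.add_argument("--traj_img_dir", default=None)
    p.add_argument("--split", required=True)
    # The README names exactly these three dataset options.
    p.add_argument("--dataset", choices=["r2r", "reverie", "r4r"], default="r2r")
    # Defaults below copy the example values shown above.
    p.add_argument("--max_action_len", type=int, default=15)
    p.add_argument("--max_re_plan", type=int, default=1)
    p.add_argument("--llm", required=True)
    p.add_argument("--max_tokens", type=int, default=1000)
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Restricting `--dataset` to the three listed values means a typo fails immediately instead of partway through a run.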
You may refer to the related work that serves as the foundation for our framework and code repository: MapGPT, DiscussNav, NavGPT, VLN-DUET and Matterport3D. Thanks for their wonderful work!
@misc{jin2026globalcommanderlocaloperative,
title={Global Commander and Local Operative: A Dual-Agent Framework for Scene Navigation},
author={Kaiming Jin and Yuefan Wu and Shengqiong Wu and Bobo Li and Shuicheng Yan and Tat-Seng Chua},
year={2026},
eprint={2602.18941},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.18941},
}