Hanzhi Chang*1,
Ruijie Zhu*1,2,
Wenjie Chang1,
Mulin Yu2,
Yanzhe Liang1,
Jiahao Lu1,
Zhuoyuan Li1,
Tianzhu Zhang1
1 USTC
2 Shanghai AI Lab
AAAI 2026
Overview of MeshSplat. Taking a pair of images as input, MeshSplat first uses a multi-view backbone to extract feature maps for each view. We then construct per-view cost volumes via plane sweeping. From these cost volumes we generate coarse depth maps to obtain 3D point clouds, on which we apply our proposed Weighted Chamfer Distance Loss. Next, we feed the cost volumes and feature maps into our Gaussian prediction network, which consists of a depth refinement network and a normal prediction network, to obtain pixel-aligned 2DGS. Finally, these 2DGS can be used for novel view synthesis and for reconstructing the scene mesh.
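To make the "coarse depth maps to 3D point clouds" step concrete, here is a minimal NumPy sketch of lifting a depth map to world-space points with a standard pinhole model. It is an illustration under stated assumptions, not the code used in this repo: the function name `unproject_depth` is hypothetical, depth is assumed to be measured along the camera z-axis, and pixel centers are assumed to sit at half-integer coordinates.

```python
import numpy as np


def unproject_depth(depth, K, c2w):
    """Lift a depth map to world-space points (hypothetical helper).

    depth: [H, W] depth along the camera z-axis (assumption)
    K:     [3, 3] pixel-space intrinsics
    c2w:   [4, 4] camera-to-world matrix, OpenCV convention
    """
    H, W = depth.shape
    # Pixel centers: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project pixels to camera-space rays with z = 1, then scale by depth.
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    # Apply the rigid camera-to-world transform.
    return points_cam @ c2w[:3, :3].T + c2w[:3, 3]
```

Each row of the returned `[H*W, 3]` array corresponds to one pixel in row-major order, which is what "pixel-aligned" means for the resulting points.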
We use the Re10K dataset, which was split into ~100 MB chunks by the authors of pixelSplat, to train and evaluate MeshSplat. The preprocessed version can be found here. Since Re10K does not provide ground-truth point clouds or meshes, we use the dense reconstruction pipeline of COLMAP to generate ground-truth point clouds for 20 scenes, which can be found here.
We organize the datasets as follows:
├── data
│   ├── re10k
│   │   ├── train
│   │   ├── ...
│   ├── re10k_pc
│   │   ├── xxx.ply
│   │   ├── ...

MeshSplat also provides a lightweight custom dataset interface for your own posed multi-view images. By default it expects each scene to contain an images/ folder and a cameras.npz file:
├── data
│   ├── custom
│   │   ├── train
│   │   │   ├── scene_000
│   │   │   │   ├── images
│   │   │   │   │   ├── 000.png
│   │   │   │   │   ├── 001.png
│   │   │   │   │   └── ...
│   │   │   │   └── cameras.npz
│   │   │   └── ...
│   │   ├── val   # optional; falls back to test if missing
│   │   └── test

cameras.npz must contain:
- extrinsics: float32 array with shape [N, 4, 4], camera-to-world matrices in OpenCV convention. If your poses are world-to-camera, invert them before saving.
- intrinsics: float32 array with shape [N, 3, 3]. Pixel-space intrinsics are used by default and are normalized internally; set dataset.intrinsics_are_normalized=true if they are already normalized to [0, 1].
- image_names (optional): string array with length N. If omitted, images are loaded by sorted filename from images/.
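If your poses come from a tool that stores world-to-camera matrices, or you want to pre-normalize intrinsics yourself, the two conversions mentioned above can be sketched as follows. Both helper names are hypothetical. The normalization shown (first row divided by image width, second row by image height) is an assumption based on the convention commonly used in pixelSplat-style codebases; check it against the loader before relying on it.

```python
import numpy as np


def w2c_to_c2w(w2c):
    """Invert a batch of [N, 4, 4] world-to-camera matrices."""
    # np.linalg.inv operates on the last two axes, so the batch is handled at once.
    return np.linalg.inv(w2c).astype(np.float32)


def normalize_intrinsics(K, width, height):
    """Scale pixel-space [N, 3, 3] intrinsics by the image size (assumed convention)."""
    K = K.astype(np.float32).copy()
    K[:, 0, :] /= width   # fx, skew, cx
    K[:, 1, :] /= height  # fy, cy
    return K
```

Normalizing is only needed if you also set dataset.intrinsics_are_normalized=true; otherwise pixel-space intrinsics can be saved directly.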
A minimal conversion script can save one scene like this:
import numpy as np

# c2w: [N, 4, 4] camera-to-world matrices
# K: [N, 3, 3] pixel-space intrinsics
# image_names: e.g. ["000.png", "001.png", ...]
np.savez(
    "data/custom/train/scene_000/cameras.npz",
    extrinsics=c2w.astype("float32"),
    intrinsics=K.astype("float32"),
    image_names=np.array(image_names),
)

Images in the same scene should have the same resolution and should be no smaller than dataset.image_shape. The custom loader currently fills mask and depth with ones, which is enough for image reconstruction and mesh export. Tune dataset.near and dataset.far to match your scene scale.
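Before training, it can help to confirm that a saved cameras.npz matches the layout described above. The following is a small, hypothetical checker (the name `check_cameras` and the tolerance are illustrative choices, not part of MeshSplat); it only verifies shapes, dtypes, and that each extrinsic looks like a rigid transform.

```python
import numpy as np


def check_cameras(path):
    """Validate a cameras.npz file and return its number of views."""
    data = np.load(path)
    ext, K = data["extrinsics"], data["intrinsics"]
    n = ext.shape[0]
    if ext.shape != (n, 4, 4) or ext.dtype != np.float32:
        raise ValueError("extrinsics must be float32 with shape [N, 4, 4]")
    if K.shape != (n, 3, 3) or K.dtype != np.float32:
        raise ValueError("intrinsics must be float32 with shape [N, 3, 3]")
    # A rigid transform has bottom row [0, 0, 0, 1] and an orthonormal rotation.
    if not np.allclose(ext[:, 3], [0.0, 0.0, 0.0, 1.0]):
        raise ValueError("extrinsics bottom row should be [0, 0, 0, 1]")
    R = ext[:, :3, :3]
    if not np.allclose(R @ R.transpose(0, 2, 1), np.eye(3), atol=1e-4):
        raise ValueError("extrinsics rotation blocks are not orthonormal")
    # image_names is optional; when present it must have one entry per view.
    if "image_names" in data.files and len(data["image_names"]) != n:
        raise ValueError("image_names must have length N")
    return n
```

For example, `check_cameras("data/custom/train/scene_000/cameras.npz")` returns the number of views if the file passes all checks and raises `ValueError` otherwise. It cannot tell whether the poses are camera-to-world or world-to-camera, so that still needs to be checked by hand.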
Train on a custom dataset:
python -m src.main \
    dataset=custom \
    dataset.root=data/custom \
    data_loader.train.batch_size=4 \
    wandb.mode=disabled \
    hydra.run.dir=outputs/meshsplat_custom_training

Evaluate a checkpoint with fixed context and target views:
python -m src.main \
    dataset=custom \
    mode=test \
    checkpointing.load=checkpoints/meshsplat.ckpt \
    'dataset.view_sampler.context_views=[0,1]' \
    'dataset.view_sampler.target_views=[2]' \
    test.output_path=outputs/meshsplat_custom_test \
    hydra.run.dir=outputs/meshsplat_custom_test \
    wandb.mode=disabled

- Clone this repo:
git clone https://github.com/HanzhiChang/MeshSplat.git

- Install PyTorch:
conda create -n mvsplat python=3.10
conda activate mvsplat
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

- Install other dependencies:
pip install -r requirements.txt
cd src/submodules
git clone https://github.com/hbb1/diff-surfel-rasterization.git
pip install ./diff-surfel-rasterization

You can download our pretrained checkpoints here.
bash train.sh

The results can be found in outputs/meshsplat_training. You can also use wandb by changing wandb.mode to online or offline in the scripts. (Make sure you have updated the wandb settings in config/main.yaml.)
bash test.sh

If you find our work useful, please cite:
@article{chang2025meshsplat,
  title={MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting},
  author={Hanzhi Chang and Ruijie Zhu and Wenjie Chang and Mulin Yu and Yanzhe Liang and Jiahao Lu and Zhuoyuan Li and Tianzhu Zhang},
  journal={arXiv preprint arXiv:2508.17811},
  year={2025}
}

Our code is based on MVSplat and 2DGS. We thank the authors for their excellent work!
