SCFDepth: A Single-step Coarse-to-Fine Diffusion Framework for Monocular Depth Estimation

This repository is based on Marigold (CVPR 2024 Oral, Best Paper Award Candidate): Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation.


Haruko386, Shuai Yuan, Mingbo Lei, Yibo Chen


We present SCFDepth, a diffusion model and associated fine-tuning protocol for monocular depth estimation, built on Marigold. Its core innovation is addressing the limited feature representation capability of diffusion models. Following Marigold, the model is derived from Stable Diffusion and fine-tuned on the synthetic datasets Hypersim and Virtual KITTI (VKitti), and it achieves strong results in refining object edges.

📢 News

  • 2026-04-06: Test code and model released.

🚀 Usage

We offer several ways to interact with SCFDepth:

  1. A free online interactive demo is available here:

  2. If you just want to see the examples, visit our gallery:

  3. Local development instructions with this codebase are given below.

🛠️ Setup

The model was trained on:

  • Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, NVIDIA RTX 6000 Ada Generation

The inference code was tested on:

  • Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, NVIDIA GeForce RTX 4090

🪧 A Note for Windows users

We recommend running the code in WSL2:

  1. Install WSL following the installation guide.
  2. Install CUDA support for WSL following the installation guide.
  3. Find your drives in /mnt/<drive letter>/ (see the WSL FAQ for details), then navigate to the working directory of your choice.
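To confirm that a terminal is really running inside WSL2 (and not WSL1 or native Linux), a check on the kernel version string can help. This is a heuristic sketch rather than an official API: it assumes WSL2 kernels report a version containing "microsoft-standard" in /proc/version, and the function name is_wsl2 is hypothetical.

```bash
# Hypothetical helper (not part of the repository).
# Heuristic: WSL2 kernels report a version string containing
# "microsoft-standard" (e.g. "5.15.x-microsoft-standard-WSL2"),
# while WSL1 reports something like "4.4.0-19041-Microsoft".
is_wsl2() {
  local version_file="${1:-/proc/version}"
  [ -r "$version_file" ] && grep -qi 'microsoft-standard' "$version_file"
}
# Usage: is_wsl2 && echo "running inside WSL2"
```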

📦 Repository

Clone the repository (requires git):

git clone https://github.com/dimon0000000/SCFDepth.git
cd SCFDepth

💻 Dependencies

Using Conda, create an environment and install the dependencies into it (a Python native virtual environment works as an alternative):

conda create -n scfdepth python==3.12.9
conda activate scfdepth
pip install -r requirements.txt

Note

Keep the environment activated before running the inference script. Activate the environment again after restarting the terminal session.
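A minimal sketch of a guard that stops early when the environment is not active. It assumes the environment is named scfdepth as above and relies on conda setting CONDA_DEFAULT_ENV to the active environment's name (true for environments created with -n); the function name require_scfdepth_env is hypothetical.

```bash
# Hypothetical guard (not part of the repository): fail early when the
# "scfdepth" conda environment is not the active one.
require_scfdepth_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "scfdepth" ]; then
    echo "conda environment 'scfdepth' is not active; run: conda activate scfdepth" >&2
    return 1
  fi
}
# Usage: require_scfdepth_env && python run.py ...
```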

🏃 Testing on your images

📷 Prepare images

  1. Use the example images provided under input/.

  2. Or place your own images in a directory, for example input/test-image, and run the following inference command with --input_rgb_dir pointing to it.

🎮 Run inference with paper setting

This setting corresponds to our paper. For academic comparison, please run with this setting.

python run.py \
    --checkpoint checkpoints/ApDepth \
    --ensemble_size 1 \
    --processing_res 0 \
    --input_rgb_dir input/example-1 \
    --output_dir output/example-1

You can find all results in output/example-1. Enjoy!
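To apply the same paper setting to several image folders, a loop over the subfolders of input can be used. This is a sketch, assuming every subfolder of input holds one set of images (as input/example-1 does) and that run.py is called exactly as above; the helper name run_paper_setting_all is hypothetical.

```bash
# Hypothetical helper (not part of the repository): run the paper setting on
# every subfolder of an input root, mirroring folder names under the output root.
run_paper_setting_all() {
  local input_root="${1:-input}" output_root="${2:-output}" dir name
  for dir in "$input_root"/*/; do
    [ -d "$dir" ] || continue     # the glob matched no subfolder
    name=$(basename "$dir")       # "input/example-1/" -> "example-1"
    python run.py \
      --checkpoint checkpoints/ApDepth \
      --ensemble_size 1 \
      --processing_res 0 \
      --input_rgb_dir "$input_root/$name" \
      --output_dir "$output_root/$name" || return 1
  done
}
# Usage: run_paper_setting_all input output
```

Stopping at the first failing folder (return 1) keeps a broken checkpoint or environment from producing a long run of identical errors.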

⚙️ Inference settings

The default settings are optimized for the best result. However, the behavior of the code can be customized:

  • Trade-offs between accuracy and speed (for --ensemble_size and --processing_res, larger values give better accuracy at the cost of slower inference):

    • --ensemble_size: Number of inference passes in the ensemble.
    • --processing_res: the processing resolution; set to 0 to process the input resolution directly. When unassigned (None), the default setting is read from the model config. Default: None.
    • --output_processing_res: produce output at the processing resolution instead of upsampling it to the input resolution. Default: False.
    • --resample_method: the resampling method used to resize images and depth predictions. This can be one of bilinear, bicubic, or nearest. Default: bilinear.
  • --half_precision or --fp16: Run in half precision (16-bit float) for faster inference and lower VRAM usage, possibly at the cost of suboptimal results.

  • --seed: Random seed, set it for reproducibility. Default: None (unseeded). Note: forcing --batch_size 1 further improves reproducibility; full reproducibility additionally requires deterministic mode.

  • --batch_size: Batch size of repeated inference. Default: 0 (best value determined automatically).

  • --color_map: Colormap used to colorize the depth prediction. Default: Spectral. Set to None to skip colored depth map generation.

  • --apple_silicon: Use Apple Silicon MPS acceleration.
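The sketch below shows how several of these flags can be combined for a lower-VRAM, reproducible run. It is not the paper setting, the seed value 1234 is arbitrary, and the helper name run_fast_reproducible is hypothetical; only flags listed above are used.

```bash
# Hypothetical helper (not part of the repository); NOT the paper setting.
#   --processing_res 768 : work at 768 px instead of 0 (native input resolution)
#   --half_precision     : 16-bit floats, less VRAM, possibly worse depth
#   --seed 1234          : fixed, arbitrary seed for repeatable runs
#   --batch_size 1       : together with --seed, helps reproducibility
run_fast_reproducible() {
  python run.py \
    --checkpoint checkpoints/ApDepth \
    --ensemble_size 1 \
    --processing_res 768 \
    --half_precision \
    --seed 1234 \
    --batch_size 1 \
    --input_rgb_dir "$1" \
    --output_dir "$2"
}
# Usage: run_fast_reproducible input/example-1 output/example-1-fast
```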

🦿 Evaluation on test datasets

Install additional dependencies:

pip install -r requirements+.txt -r requirements.txt

Set the data directory variable (also used by the evaluation scripts) and download the evaluation datasets into the corresponding subfolders:

export BASE_DATA_DIR=<YOUR_DATA_DIR>  # Set target data directory

wget -r -np -nH --cut-dirs=4 -R "index.html*" -P ${BASE_DATA_DIR} https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
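Because an incomplete download later surfaces as tarfile.ReadError (see Troubleshooting), it can be worth checking the archives right after wget finishes. A rough sketch, assuming the datasets arrive as .tar files under BASE_DATA_DIR: it only verifies that tar can list each archive, so it is a sanity check rather than a checksum comparison, and check_eval_tars is a hypothetical name.

```bash
# Hypothetical helper (not part of the repository): report every .tar file
# under a directory (default: $BASE_DATA_DIR) that tar cannot list.
check_eval_tars() {
  local root="${1:-${BASE_DATA_DIR:-}}" bad
  if [ -z "$root" ]; then
    echo "set BASE_DATA_DIR or pass a directory" >&2
    return 2
  fi
  # The negated -exec is true when tar fails, so -print then emits that path.
  bad=$(find "$root" -type f -name '*.tar' \
    ! -exec sh -c 'tar -tf "$1" > /dev/null 2>&1' sh {} \; -print)
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad" | sed 's/^/unreadable: /'
    return 1
  fi
  echo "no unreadable .tar archives found"
}
# Usage: check_eval_tars "$BASE_DATA_DIR"
```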

Run inference and evaluation scripts, for example:

# Run inference
bash script/eval/11_infer_nyu.sh

# Evaluate predictions
bash script/eval/12_eval_nyu.sh

Alternatively, use the following script to evaluate all datasets.

# Evaluate all datasets
bash script/eval/00_test_all.sh

The results are written to output/eval.

Important

Although the seed has been set, the results might still be slightly different on different hardware.

✏️ Contributing

Please refer to these instructions.

🤔 Troubleshooting

| Problem | Solution |
| --- | --- |
| (Windows) Invalid DOS bash script on WSL / $'\r': command not found / set: invalid option | Run dos2unix <script_name> to convert script format |
| (Windows) Multiple .sh scripts fail due to CRLF line endings | Run find . -name "*.sh" -exec dos2unix {} + to fix all scripts |
| (Windows) error on WSL: Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file | Run export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH |
| HuggingFace model download incomplete / corrupted | Re-run with --resume-download or ensure stable network |
| model_index.json not found when loading checkpoint | Ensure the model is fully downloaded and placed at checkpoints/ApDepth/ |
| Dataset loading error: tarfile.ReadError: unexpected end of data | Re-download dataset; the .tar file is likely corrupted or incomplete |
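When dos2unix is not installed, the CRLF fix from the table can be approximated with find, grep and sed. A minimal sketch, assuming GNU sed (as shipped with Ubuntu) for in-place editing with sed -i; fix_crlf_scripts is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper (not part of the repository): strip Windows CRLF line
# endings from every .sh file under a directory, without dos2unix.
fix_crlf_scripts() {
  local root="${1:-.}" cr f
  cr=$(printf '\r')   # a literal carriage-return character
  # grep -l prints only the files that contain at least one CR.
  find "$root" -type f -name '*.sh' -exec grep -l "$cr" {} + |
    while IFS= read -r f; do
      # Delete the CR that sits right before each end of line.
      sed -i "s/${cr}\$//" "$f" && echo "converted: $f"
    done
}
# Usage: fix_crlf_scripts .
```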

🎫 License

This work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).

By downloading and using the code and model you agree to the terms in the LICENSE.
