A complete pipeline for converting real-world scenes into Minecraft worlds using Neural Radiance Fields (NeRF) and semantic segmentation.
The Worldcraft pipeline transforms images of real-world scenes into voxelized 3D representations suitable for Minecraft. The pipeline consists of four main stages:
1. Semantic Segmentation - Process images to generate semantic masks using Mask2Former
2. NeRF Training - Train a semantic NeRF model and export a point cloud
3. Voxelization - Convert the point cloud into a voxel grid
4. Minecraft Conversion - Transform voxels into Minecraft blocks
Requirements:
- NVIDIA GPU with CUDA support (required for NeRF training)
- Recommended: 16GB+ RAM, 50GB+ free disk space
- Conda (Miniconda or Anaconda)
- NVIDIA GPU drivers
- Git
Clone the repository:
```bash
git clone https://github.com/worldcraft-org/worldcraft.git
cd worldcraft
```
The setup script will:
- Verify conda and GPU availability
- Create the conda environment with all dependencies
- Install CUDA toolkit 11.8
- Pre-download required models (~1.5GB)
- Create necessary directories
```bash
bash setup.sh
```
This process may take 15-30 minutes depending on your internet connection.
Activate the environment:
```bash
conda activate worldcraft
```
Then create a scene directory with your images:
```bash
mkdir -p data/my_scene/images
# Copy your images to data/my_scene/images/
```
Images should be:
- JPG or PNG format
- Captured from multiple viewpoints
- Overlapping coverage of the scene
- 20-100 images recommended
Run the full pipeline:
```bash
python orchestrate.py --input data/my_scene --output outputs --scene-name my_scene
```
This will execute all four stages automatically. The pipeline will:
- Generate semantic segmentation masks
- Train a NeRF model (this takes the longest, 2-8 hours)
- Export and voxelize the point cloud
- Convert to Minecraft format
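Conceptually, the orchestrator just runs the four stage commands in sequence. The sketch below illustrates that chaining; it is an illustration only, and the real orchestrate.py may handle arguments, logging, and error recovery differently.

```python
# Hypothetical sketch of how the four stages could be chained.
# This is not the actual implementation of orchestrate.py.
import subprocess

def run_pipeline(scene_dir, output_dir, scene_name, start_stage=1, voxel_size=0.05):
    stages = [
        # Stage 1: semantic segmentation
        ["python", "image-processing/process_semantics.py", "--scene-dir", scene_dir],
        # Stage 2: COLMAP + NeRF training + point cloud export
        ["bash", "semnerf/train_job.sh", scene_name, scene_dir, output_dir, "./exports"],
        # Stage 3: voxelization
        ["python", "voxelize/voxelize.py",
         f"exports/{scene_name}/point_cloud.ply",
         f"exports/{scene_name}/voxel_grid.npz",
         "--voxel-size", str(voxel_size)],
        # Stage 4: Minecraft conversion
        ["python", "export/convert.py",
         f"exports/{scene_name}/voxel_grid.npz", f"exports/{scene_name}", scene_name],
    ]
    for i, cmd in enumerate(stages, start=1):
        if i < start_stage:
            continue  # supports resuming from a later stage
        subprocess.run(cmd, check=True)  # abort on the first failing stage
```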
After a successful run, your directories will look like this:
```
data/my_scene/
    images/                  # Your input images
    semantics/               # Generated semantic masks (Stage 1)
    panoptic_classes.json

outputs/
    processed/my_scene/      # COLMAP processed data (Stage 2)
    models/my_scene/         # Trained NeRF models (Stage 2)

exports/
    my_scene/
        point_cloud.ply      # Exported point cloud (Stage 2)
        voxel_grid.npz       # Voxelized grid (Stage 3)
        *.litematic          # Minecraft file (Stage 4)
```
Process images to generate semantic segmentation masks:
```bash
python image-processing/process_semantics.py --scene-dir ./data/my_scene
```
Options:
- `--scene-dir`: Path to the scene directory containing an `images/` subdirectory
Output:
- `semantics/` - Full-resolution semantic masks
- `semantics_2/`, `semantics_4/`, `semantics_8/` - Downscaled versions
- `panoptic_classes.json` - Class labels and colors
Time: ~5-15 minutes for 50 images
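For reference, the core of this step can be reproduced with the Hugging Face Transformers API and the checkpoint listed in the Models section below. This is a minimal per-image sketch, not necessarily how process_semantics.py is implemented (the example filename is hypothetical).

```python
# Minimal sketch: semantic segmentation of one image with Mask2Former.
# process_semantics.py may additionally write downscaled masks and panoptic_classes.json.
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

ckpt = "facebook/mask2former-swin-large-mapillary-vistas-semantic"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = Mask2FormerForUniversalSegmentation.from_pretrained(ckpt).eval()

image = Image.open("data/my_scene/images/0001.jpg")  # hypothetical filename
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-pixel class IDs at the original resolution (Mapillary Vistas classes)
mask = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(mask.shape, mask.unique())
```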
Train a semantic NeRF model and export a point cloud:
```bash
bash semnerf/train_job.sh my_scene ./data/my_scene ./outputs ./exports
```
Parameters:
- `SCENE_NAME` - Name for organizing outputs
- `DATA_DIR` - Path to the scene directory (with `images/` and `semantics/`)
- `OUTPUT_DIR` - Path for processed data and models
- `EXPORT_DIR` - Path for exported point clouds
Steps:
- Process data with COLMAP (structure-from-motion)
- Copy semantic annotations to processed data
- Train semantic-nerfw model
- Export point cloud with 1M points
Time: 2-8 hours depending on scene complexity and GPU
Customization:
To adjust training parameters, edit the ns-train command in semnerf/train_job.sh:
```bash
ns-train semantic-nerfw \
    --data "$OUTPUT_DIR/processed/$SCENE_NAME" \
    --output-dir "$OUTPUT_DIR/models/$SCENE_NAME" \
    --viewer.quit-on-train-completion True \
    --pipeline.datamanager.pixel-sampler-num-rays-per-batch 4096 \
    --max-num-iterations 30000  # Add this to train longer
```
Convert the point cloud into a voxel grid:
```bash
python voxelize/voxelize.py exports/my_scene/point_cloud.ply exports/my_scene/voxel_grid.npz --voxel-size 0.05
```
Options:
- `--voxel-size`: Size of each voxel in meters (default: 0.05)
  - Smaller values give more detail but a larger file
  - Recommended range: 0.03-0.1
Alternative - Web UI:
```bash
python voxelize/voxelize.py serve
# Open http://localhost:8000/docs in your browser
```
Time: ~1-5 minutes
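Under the hood, voxelization is essentially point-cloud binning. The sketch below shows how this could be done with Open3D (which is listed as a dependency); the actual voxelize.py and its NPZ layout may differ.

```python
# Minimal sketch: bin a PLY point cloud into colored voxels with Open3D.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("exports/my_scene/point_cloud.ply")
voxel_size = 0.05  # meters, matching the --voxel-size default
grid = o3d.geometry.VoxelGrid.create_from_point_cloud(pcd, voxel_size=voxel_size)

voxels = grid.get_voxels()
indices = np.array([v.grid_index for v in voxels])    # integer grid coordinates
colors = np.array([v.color for v in voxels]) * 255    # averaged RGB, scaled to 0-255
centers = grid.origin + (indices + 0.5) * voxel_size  # voxel center positions

print(f"{len(voxels)} occupied voxels")
```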
Convert the voxel grid to Minecraft format:
```bash
python export/convert.py exports/my_scene/voxel_grid.npz exports/my_scene my_scene
```
Output: a `.litematic` file that can be imported into Minecraft using the Litematica mod.
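As a rough illustration of the output format, a `.litematic` file can be written from the voxel grid with the litemapy library. This is an assumption about tooling, not the actual convert.py, which likely maps voxel colors to a palette of Minecraft blocks rather than a single block type.

```python
# Rough illustration only: write occupied voxels to a .litematic with litemapy (assumed library).
import numpy as np
from litemapy import BlockState, Region

data = np.load("exports/my_scene/voxel_grid.npz")
occ = data["occupancy_grid"]  # (X, Y, Z) boolean array, per the voxel grid format documented below

region = Region(0, 0, 0, *(int(s) for s in occ.shape))
stone = BlockState("minecraft:stone")  # single-block stand-in for a color-to-block palette
for x, y, z in zip(*np.nonzero(occ)):
    region[int(x), int(y), int(z)] = stone

schematic = region.as_schematic(name="my_scene", author="worldcraft")
schematic.save("exports/my_scene/my_scene.litematic")
```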
You can run the pipeline from any stage using the --start-stage option:
```bash
# Skip semantic segmentation, start from NeRF training
python orchestrate.py --input data/my_scene --output outputs --scene-name my_scene --start-stage 2

# Only run voxelization and conversion (stages 3-4)
python orchestrate.py --input data/my_scene --output outputs --scene-name my_scene --start-stage 3
```
To trade detail against processing time, adjust the voxel size:
```bash
# Higher detail (smaller voxels)
python orchestrate.py --input data/my_scene --output outputs --scene-name my_scene --voxel-size 0.03

# Lower detail (larger voxels, faster processing)
python orchestrate.py --input data/my_scene --output outputs --scene-name my_scene --voxel-size 0.1
```
To process several scenes, give each one its own scene name:
```bash
# Scene 1
python orchestrate.py --input data/scene1 --output outputs --scene-name scene1

# Scene 2
python orchestrate.py --input data/scene2 --output outputs --scene-name scene2
```
Each scene maintains separate directories in outputs/ and exports/.
If you see "CUDA not available":
- Verify GPU drivers:
  ```bash
  nvidia-smi
  ```
- Check PyTorch CUDA:
  ```bash
  conda activate worldcraft
  python -c "import torch; print(torch.cuda.is_available())"
  ```
- If it prints False, reinstall PyTorch with CUDA:
  ```bash
  conda activate worldcraft
  pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
  ```
If training fails with OOM errors:
- Reduce the batch size in `semnerf/train_job.sh`: `--pipeline.datamanager.pixel-sampler-num-rays-per-batch 2048`
- Reduce image resolution by downscaling input images
- Use fewer images (40-60 is usually sufficient)
If structure-from-motion fails:
- Ensure images have sufficient overlap
- Check that images are clear and well-lit
- Try reducing the number of images
- Ensure images are from different viewpoints
If semantic masks look incorrect:
- Check input image quality
- Verify the model downloaded correctly:
  ```bash
  python scripts/download_models.py
  ```
- The model is trained on urban scenes, so performance may vary for other environments
If the point cloud is empty or sparse:
- Check that NeRF training completed successfully
- Review training logs in `outputs/models/your_scene/`
- Verify semantic annotations exist in processed data
- Try training longer (increase `--max-num-iterations`)
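For a quick sanity check of the exported point cloud, something like the following Open3D snippet can be used (the path assumes the default export layout):

```python
# Quick diagnostic: confirm the exported point cloud is non-empty and sensibly sized.
import open3d as o3d

pcd = o3d.io.read_point_cloud("exports/my_scene/point_cloud.ply")
print(f"{len(pcd.points)} points")  # a full export targets ~1M points
print("bounding box:", pcd.get_axis_aligned_bounding_box())
```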
```
worldcraft/
├── data/                         # Input data
│   └── [scene_name]/
│       ├── images/               # Input images (you provide)
│       ├── semantics/            # Generated semantic masks
│       └── panoptic_classes.json # Class definitions
│
├── outputs/                      # Processing outputs
│   ├── processed/                # COLMAP processed data
│   └── models/                   # Trained NeRF models
│
├── exports/                      # Final exports
│   └── [scene_name]/
│       ├── point_cloud.ply
│       ├── voxel_grid.npz
│       └── *.litematic
│
├── image-processing/             # Semantic segmentation
│   └── process_semantics.py
│
├── semnerf/                      # NeRF training
│   └── train_job.sh
│
├── voxelize/                     # Voxelization
│   └── voxelize.py
│
├── export/                       # Minecraft conversion
│   └── convert.py
│
├── scripts/                      # Utility scripts
│   └── download_models.py
│
├── orchestrate.py                # Main pipeline orchestrator
├── setup.sh                      # Setup script
├── environment.yml               # Conda environment
└── README.md                     # This file
```
Key dependencies installed via environment.yml:
- Python 3.10
- PyTorch 2.1.2 with CUDA 11.8
- Nerfstudio - NeRF training framework
- tiny-cuda-nn - Fast neural network library
- Transformers - Hugging Face library for Mask2Former
- Open3D - Point cloud processing
- FastAPI - Voxelization web API
The pipeline uses:
- Mask2Former (facebook/mask2former-swin-large-mapillary-vistas-semantic)
- Pre-trained on Mapillary Vistas dataset
- 65 semantic classes
- ~1.5GB model size
The voxel grid NPZ file contains:
- `points`: (N, 3) array of voxel center positions
- `color_grid`: (N, 3) array of RGB colors (uint8)
- `occupancy_grid`: (X, Y, Z) boolean array of voxel occupancy
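For example, the arrays can be loaded and inspected with NumPy (the path assumes the default export layout):

```python
# Inspect the voxel grid produced by Stage 3 (keys as documented above).
import numpy as np

data = np.load("exports/my_scene/voxel_grid.npz")
points = data["points"]             # (N, 3) voxel center positions
colors = data["color_grid"]         # (N, 3) RGB, uint8
occupancy = data["occupancy_grid"]  # (X, Y, Z) boolean occupancy

print(points.shape, colors.dtype, occupancy.shape)
print(f"{occupancy.sum()} occupied voxels out of {occupancy.size}")
```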
The pipeline can also run on SLURM clusters. The semnerf/train_job.sh script includes SLURM directives that are automatically ignored when running standalone.
To submit as a SLURM job:
```bash
sbatch semnerf/train_job.sh my_scene ./data/my_scene ./outputs ./exports
```
See semnerf/README.md for HPC-specific documentation.
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
If you use this pipeline in your research, please cite:
```bibtex
@software{worldcraft2024,
  title  = {Worldcraft: Real-World to Minecraft Pipeline},
  author = {Worldcraft Contributors},
  year   = {2024},
  url    = {https://github.com/worldcraft-org/worldcraft}
}
```
This project is available under the MIT License. See the LICENSE file for details.
- Nerfstudio - NeRF training framework
- Mask2Former - Semantic segmentation
- tiny-cuda-nn - Fast CUDA neural networks
For issues and questions:
- Open an issue on GitHub
- Check existing issues for solutions
- Provide error messages and logs when reporting problems
See CHANGELOG.md for version history and updates.