Paper | Project | SyntheticPCDCompletionDataset-GColab
S2C-Completion is a dataset and annotation toolkit for establishing ground-truth correspondences between real-world 3D point cloud instances from ScanNetv2 and their matching CAD models from ShapeNet, building on top of the Scan2CAD alignment dataset.
- Overview
- Dataset Structure
- Prerequisites & Environment Setup
- Annotation Tool
- Annotation Workflow
- License
The S2C-Completion dataset augments Scan2CAD annotations with a scannet_instance_id field that links each aligned CAD model to its corresponding segmented instance in the ScanNet point cloud. This enables tasks such as 3D object completion, where a partial scan of an object must be completed using a CAD model as a reference, and object 9-DoF pose estimation.
The annotation toolkit supports two modes:
| Mode | Script | Strategy |
|---|---|---|
| Semi-automatic | annotator.py | Overlap-based auto-matching with manual fallback |
| Fully manual | object_completion_annotator.py | Interactive multi-candidate selection via colored visualization |
The main dataset file is S2CCompletion_annotations.json. It is a JSON dictionary keyed by ScanNet scene ID (e.g., "scene0000_00").

    S2CCompletion_annotations.json
    └── scene0000_00
        ├── id_scan                      # ScanNet scene ID
        ├── id_alignment                 # Scan2CAD alignment ID
        ├── trs                          # Scene-level S2C transformation
        │   ├── translation              # [x, y, z]
        │   ├── rotation                 # [qw, qx, qy, qz] quaternion
        │   └── scale                    # [sx, sy, sz]
        ├── n_aligned_models             # Number of CAD models in this scene
        └── aligned_models               # List of CAD model annotation objects
            └── [0..N]
                ├── id_cad               # ShapeNet model ID
                ├── catid_cad            # ShapeNet category ID
                ├── sym                  # Symmetry type (e.g. "__SYM_ROTATE_UP_2")
                ├── trs                  # Model-level transformation (same format as scene trs)
                ├── bbox                 # Oriented bounding box dimensions [dx, dy, dz]
                ├── center               # Bounding box center [x, y, z]
                ├── keypoints_cad        # Keypoints on CAD model
                │   ├── n_keypoints
                │   └── position         # Flattened list of 3D points [x0,y0,z0, x1,y1,z1, ...]
                ├── keypoints_scan       # Corresponding keypoints on scan
                │   ├── n_keypoints
                │   └── position
                └── scannet_instance_id  # ← S2C-Completion key field
                                         # >= 0 : matched ScanNet instance ID
                                         # -1 : no match found

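As a rough illustration of how a `trs` entry might be used, the sketch below composes a 4x4 homogeneous matrix from its `translation`, `rotation`, and `scale` fields. The composition order `M = T @ R @ S` is an assumption here and should be checked against the Scan2CAD tooling; the helper names `quaternion_to_rotation` and `trs_to_matrix` and the example values are hypothetical.

```python
import numpy as np


def quaternion_to_rotation(q):
    """Convert a [qw, qx, qy, qz] quaternion into a 3x3 rotation matrix."""
    qw, qx, qy, qz = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ])


def trs_to_matrix(trs):
    """Compose a 4x4 matrix from a trs dict, assuming the order M = T @ R @ S."""
    T = np.eye(4)
    T[:3, 3] = trs["translation"]
    R = np.eye(4)
    R[:3, :3] = quaternion_to_rotation(trs["rotation"])
    S = np.eye(4)
    S[:3, :3] = np.diag(trs["scale"])
    return T @ R @ S


# Illustrative values, not taken from the dataset.
example_trs = {
    "translation": [1.0, 2.0, 3.0],
    "rotation": [1.0, 0.0, 0.0, 0.0],  # identity rotation
    "scale": [2.0, 2.0, 2.0],
}
M = trs_to_matrix(example_trs)
point = M @ np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous point (1, 0, 0)
```

With this ordering, a point is scaled first, then rotated, then translated. Whether a given `trs` maps scan to CAD space or the reverse is not settled by this sketch.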
Scene-level fields:

| Field | Type | Description |
|---|---|---|
| id_scan | string | ScanNet scene identifier (e.g. "scene0000_00") |
| id_alignment | string | Scan2CAD alignment identifier |
| trs | object | Scene-to-CAD coordinate transformation |
| n_aligned_models | int | Number of aligned CAD models in the scene |
| aligned_models | list | Per-model annotation entries (see below) |

Fields of each aligned_models entry:

| Field | Type | Description |
|---|---|---|
| id_cad | string | ShapeNet model ID |
| catid_cad | string | ShapeNet category ID |
| sym | string | Rotational symmetry type of the object |
| trs | object | 9-DoF transformation (translation, quaternion rotation, scale) |
| bbox | [float] | Oriented bounding box half-extents |
| center | [float] | Bounding box center in scene coordinates |
| keypoints_cad | object | Sparse 3D keypoints on the CAD model |
| keypoints_scan | object | Corresponding keypoints on the scan |
| scannet_instance_id | int | S2C-Completion annotation: ScanNet instance index (-1 = unmatched) |

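A minimal sketch of reading the structure described above: it walks the scene dictionary, keeps only models whose `scannet_instance_id` is non-negative, and regroups a flattened keypoint list into 3D points. The sample JSON is a tiny illustrative stand-in rather than real dataset content, and the helper names `matched_models` and `unflatten_keypoints` are hypothetical.

```python
import json

# A tiny in-memory stand-in for the dataset file; values are illustrative only.
sample_json = """
{
  "scene0000_00": {
    "id_scan": "scene0000_00",
    "n_aligned_models": 2,
    "aligned_models": [
      {"id_cad": "cad_a", "catid_cad": "03001627", "scannet_instance_id": 4,
       "keypoints_cad": {"n_keypoints": 2, "position": [0, 0, 0, 1, 1, 1]}},
      {"id_cad": "cad_b", "catid_cad": "04379243", "scannet_instance_id": -1,
       "keypoints_cad": {"n_keypoints": 1, "position": [0.5, 0.5, 0.5]}}
    ]
  }
}
"""


def matched_models(annotations):
    """Yield (scene_id, id_cad, instance_id) for every matched CAD model."""
    for scene_id, scene in annotations.items():
        for model in scene["aligned_models"]:
            if model["scannet_instance_id"] >= 0:  # -1 marks an unmatched model
                yield scene_id, model["id_cad"], model["scannet_instance_id"]


def unflatten_keypoints(keypoints):
    """Regroup the flattened [x0, y0, z0, x1, ...] list into (x, y, z) tuples."""
    flat = keypoints["position"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


annotations = json.loads(sample_json)
pairs = list(matched_models(annotations))
first_keypoints = unflatten_keypoints(
    annotations["scene0000_00"]["aligned_models"][0]["keypoints_cad"]
)
```

For the real file, `json.loads(sample_json)` would be replaced by `json.load` on an opened `S2CCompletion_annotations.json`.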
The annotation scripts require the following external datasets and Python packages.
- ScanNet v2: RGB-D scans with instance segmentation labels
- ShapeNet: 3D CAD model repository
- Scan2CAD: CAD model alignments to ScanNet scenes (scan2cad_v2_annotations.json)
Install the required packages via pip:

    pip install open3d trimesh torch numpy tqdm

The semi-automatic script, annotator.py, matches CAD models to ScanNet instances using voxel-overlap scoring, with a manual vetting step for ambiguous cases.
How it works:
- Loads Scan2CAD alignments and ShapeNet taxonomy.
- For each unannotated scene:
  - Loads the ScanNet point cloud and instance segmentation labels.
  - Aligns the CAD models into scene coordinates using Scan2CAD transformations.
  - For each CAD model, finds the 5 nearest ScanNet instances (by center distance) of the matching category.
  - Computes voxel overlap between the CAD model bounding box and each candidate instance.
- Applies automatic decisions based on overlap thresholds:

  | Overlap | Action |
  |---|---|
  | ≥ 0.7 | Auto-accept: instance is matched automatically |
  | < 0.05 | Auto-discard: no match assigned (-1) |
  | 0.05 – 0.7 | Manual review: opens interactive Open3D window |

- In the manual review window:
  - W: Accept the current candidate as a match
  - R: Reject and move to the next candidate
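The candidate search and threshold rule described in these steps can be sketched roughly as follows. The thresholds (0.7 and 0.05) and the candidate count (5) come from the description; everything else is an assumption. In particular, the README does not define how voxel overlap is measured, so `voxel_overlap` uses intersection-over-union of occupied voxel cells as one plausible stand-in, and the 0.05 voxel size and all function names are hypothetical.

```python
import numpy as np

ACCEPT_THRESHOLD = 0.7    # overlap >= 0.7: auto-accept
DISCARD_THRESHOLD = 0.05  # overlap < 0.05: auto-discard


def nearest_candidates(cad_center, instance_centers, k=5):
    """Return the IDs of the k instances whose centers lie closest to cad_center.

    instance_centers maps instance ID -> [x, y, z] and is assumed to be
    pre-filtered to the CAD model's category.
    """
    ids = list(instance_centers)
    if not ids:
        return []
    centers = np.array([instance_centers[i] for i in ids], dtype=float)
    dists = np.linalg.norm(centers - np.asarray(cad_center, dtype=float), axis=1)
    return [ids[j] for j in np.argsort(dists)[:k]]


def voxel_overlap(points_a, points_b, voxel_size=0.05):
    """Intersection-over-union of the voxel cells occupied by two point sets.

    This definition is an assumption, not necessarily the script's measure.
    """
    def occupied(points):
        cells = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
        return {tuple(c) for c in cells.tolist()}

    a, b = occupied(points_a), occupied(points_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def decide(overlap):
    """Map an overlap score to 'accept', 'discard', or 'review'."""
    if overlap >= ACCEPT_THRESHOLD:
        return "accept"
    if overlap < DISCARD_THRESHOLD:
        return "discard"
    return "review"  # ambiguous range: hand over to the manual review window
```

Under these assumptions, a candidate scoring exactly 0.05 falls into manual review and one scoring exactly 0.7 is auto-accepted, matching the boundaries in the threshold table.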
Output: Per-scene JSON files saved to obj_completion_per_scene/ with the naming pattern s2c_v2_completion_scene{XXXX}_{YY}.json.
Run:

    python annotator.py

The fully manual script, object_completion_annotator.py, provides a richer interactive annotation experience: it displays up to 6 candidate instances simultaneously, each rendered in a distinct color, so annotators can select the best match with a single keypress.
How it works:
- Follows the same scene loading and alignment pipeline as annotator.py.
- For each CAD model, finds the 6 nearest ScanNet instances.
- Opens an Open3D visualization window showing:
  - The CAD model bounding box in black
  - Up to 6 candidate ScanNet instances color-coded as:

    | Key | Color | Instance |
    |---|---|---|
    | 1 | Red | Candidate 1 |
    | 2 | Green | Candidate 2 |
    | 3 | Blue | Candidate 3 |
    | 4 | Orange | Candidate 4 |
    | 5 | Teal | Candidate 5 |
    | 6 | Pink | Candidate 6 |
    | R | - | Reject (no match) |

Output: Same format and location as annotator.py — per-scene files in obj_completion_per_scene/.
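The key bindings listed for object_completion_annotator.py could be modeled with a small lookup like the one below. This is only a sketch of the selection logic, kept independent of Open3D; the RGB values, the `resolve_keypress` helper, and the choice to ignore unbound keys are assumptions rather than the script's actual behavior.

```python
REJECT_KEY = "R"
NO_MATCH = -1  # value stored in scannet_instance_id when nothing matches

# Key -> (color name, RGB). Color names follow the key table; RGB values are guesses.
CANDIDATE_COLORS = {
    "1": ("red", (1.0, 0.0, 0.0)),
    "2": ("green", (0.0, 1.0, 0.0)),
    "3": ("blue", (0.0, 0.0, 1.0)),
    "4": ("orange", (1.0, 0.5, 0.0)),
    "5": ("teal", (0.0, 0.5, 0.5)),
    "6": ("pink", (1.0, 0.4, 0.7)),
}


def resolve_keypress(key, candidate_ids):
    """Map a keypress to a ScanNet instance ID, -1 for reject, or None to ignore."""
    key = key.upper()
    if key == REJECT_KEY:
        return NO_MATCH
    if key in CANDIDATE_COLORS:
        index = int(key) - 1
        if index < len(candidate_ids):
            return candidate_ids[index]
    return None  # unbound key, or a slot with no candidate shown


# Example: three candidates are displayed and the annotator presses "2".
chosen = resolve_keypress("2", [11, 42, 7])
```

Returning `None` for unbound keys lets a caller keep the window open until a valid choice is made.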
Run:

    python object_completion_annotator.py

After annotating scenes with either script above, run combine_scene_annos.py to merge all per-scene JSON files into the final consolidated dataset file.
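The merge step might look roughly like the sketch below. It assumes each per-scene file holds a single scene object carrying an `id_scan` field and follows the `s2c_v2_completion_scene{XXXX}_{YY}.json` naming pattern; the function name `merge_scene_files` is hypothetical, and the actual combine_scene_annos.py may work differently.

```python
import glob
import json
import os


def merge_scene_files(per_scene_dir):
    """Merge per-scene JSON files into one dict keyed by ScanNet scene ID."""
    merged = {}
    pattern = os.path.join(per_scene_dir, "s2c_v2_completion_scene*.json")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            scene = json.load(f)
        merged[scene["id_scan"]] = scene  # assumes one scene object per file
    return merged


# Possible usage (paths taken from the README, output name assumed):
# merged = merge_scene_files("obj_completion_per_scene")
# with open("S2CCompletion_annotations.json", "w") as f:
#     json.dump(merged, f)
```

Sorting the file list keeps the scene order deterministic across runs.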
Run:

    python combine_scene_annos.py

If you use this dataset or code, please cite:
@inproceedings{kumar2026caoa,
title={CAOA-Completion-Assisted Object-CAD Alignment},
author={Kumar, Hiranya Garbha and Kamal, Minhas and Prabhakaran, Balakrishnan},
booktitle={Thirteenth International Conference on 3D Vision},
year={2026},
organization={IEEE}
}
This project is licensed under CC0-1.0.