In 6D pose estimation tasks, CAD models are traditionally required for training. However, obtaining CAD models for real-world objects is often impractical—what we can easily capture are images of the object instead. Our solution uses NeRF (Neural Radiance Fields) to reconstruct objects and replace traditional CAD models.
To reconstruct a complete object with NeRF, we need images covering the entire object from all angles. In practice, a single image sequence typically captures only one portion of an object (e.g., the upper or lower half). This necessitates capturing at least two sequences to achieve full coverage.
The Challenge: These separate sequences exist in different reference frames. To apply NeRF reconstruction, we must register these sequences—transforming them into a unified coordinate system. Our project provides a robust solution to this registration problem, enabling the creation of complete 3D object models from multiple partial image sequences.
There are three primary approaches for image sequence registration:
- 2D-2D Correspondences: Finding poses between images using the Essential Matrix
- 2D-3D Correspondences: Finding poses between images and 3D models using PnP + RANSAC (our approach)
- 3D-3D Correspondences: Finding poses between 3D models using ICP (employed by DReg-NeRF and nerf2nerf)
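To make the 2D-3D option more concrete, the sketch below shows the reprojection check that a PnP + RANSAC loop typically relies on: a candidate pose is scored by how many 3D points land close to their matched pixels. It is only an illustration with NumPy and synthetic data. The function names, the camera intrinsics `K`, and the 3-pixel threshold are assumptions, not values taken from this repository.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 object-frame points to pixel coordinates (pinhole model)."""
    cam = points_3d @ R.T + t          # object frame -> camera frame
    uv = cam @ K.T                     # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def count_inliers(points_3d, points_2d, K, R, t, threshold_px=3.0):
    """Count correspondences whose reprojection error is below threshold_px."""
    errors = np.linalg.norm(project_points(points_3d, K, R, t) - points_2d, axis=1)
    return int(np.sum(errors < threshold_px))

# Synthetic correspondences (hypothetical camera and pose, for illustration only).
rng = np.random.default_rng(0)
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R_true = np.eye(3)
t_true = np.array([0.0, 0.0, 2.0])
points_3d = rng.uniform(-0.1, 0.1, size=(50, 3))
points_2d = project_points(points_3d, K, R_true, t_true)
points_2d[:10] += 50.0                 # corrupt 10 matches to mimic outliers

inliers_true = count_inliers(points_3d, points_2d, K, R_true, t_true)
inliers_wrong = count_inliers(points_3d, points_2d, K, R_true,
                              t_true + np.array([0.05, 0.0, 0.0]))
print(inliers_true, inliers_wrong)
```

RANSAC repeatedly fits a pose to a small random subset of correspondences and keeps the pose with the highest inlier count, which is why a handful of bad matches does not ruin the estimate.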
Given two image sequences of a textureless object from the T-LESS dataset in the BOP benchmark, we use the SurfEmb architecture to register the sequences by estimating the 6D relative pose between them.
- Initial Correspondence Finding: Apply SurfEmb to establish 3D-2D correspondences between:
  - The NeRF model reconstructed from the first sequence (3D)
  - 2D images from the second sequence
- Pose Estimation: Calculate the relative pose using PnP with RANSAC based on the correspondences
- Verification Scheme: Select the best predicted 6D pose by:
  - Comparing all predicted relative poses against ground truth
  - Choosing the prediction with the smallest Chamfer distance loss
- Pose Refinement: Since initial predictions aren't perfect, we refine them:
  - Reconstruct NeRF models for both sequences
  - Transform the second sequence to the first sequence's canonical frame using the predicted pose
  - Apply ICP (Iterative Closest Point) to obtain the refined relative pose
- Final Reconstruction: Merge both NeRF models using the refined pose to create a complete 3D object model
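The refinement step can be illustrated with a minimal point-to-point ICP written in NumPy. This is a sketch under stated assumptions, not the code in this repository: it uses brute-force nearest neighbours, a fixed iteration count, and synthetic point clouds, and the names `best_rigid_transform` and `icp` are hypothetical. Here `src` stands for the second sequence's point cloud after applying the predicted pose, and `dst` for the first sequence's point cloud.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ src @ R.T + t (Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def nearest_points(src, dst):
    """For each src point, return the closest dst point and its distance."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return dst[idx], np.sqrt(d2[np.arange(len(src)), idx])

def icp(src, dst, iterations=30):
    """Point-to-point ICP; returns the accumulated R, t mapping src onto dst."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = src.copy()
    for _ in range(iterations):
        matched, _ = nearest_points(current, dst)
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
        # Compose the new increment with the transform found so far.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Synthetic example: dst is src moved by a small residual misalignment.
rng = np.random.default_rng(0)
angle = np.deg2rad(4.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.015])
src = rng.uniform(-0.5, 0.5, size=(200, 3))
dst = src @ R_true.T + t_true

R_icp, t_icp = icp(src, dst)
rms_before = np.sqrt(np.mean(nearest_points(src, dst)[1] ** 2))
rms_after = np.sqrt(np.mean(nearest_points(src @ R_icp.T + t_icp, dst)[1] ** 2))
print(rms_before, rms_after)
```

ICP only converges to the right answer when the starting alignment is already close, which is why the pipeline applies it after the PnP-based prediction rather than instead of it.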
We evaluate results using the Chamfer distance metric. A pose prediction is considered correct when its Chamfer distance error is below the threshold of 0.1 × object diameter.
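As a rough illustration of this criterion, the sketch below computes a symmetric Chamfer distance between two point clouds and compares it with 0.1 × the object diameter. Several conventions exist for the Chamfer distance (summed or averaged, squared or not). This example assumes the average of the two mean nearest-neighbour distances and takes the diameter as the largest distance between any two model points, so the exact numbers may differ from those produced by the scripts in this repository.

```python
import numpy as np

def nearest_distances(a, b):
    """Distance from every point in a to its nearest neighbour in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1))

def chamfer_distance(a, b):
    """Symmetric Chamfer distance (mean of both directions, an assumed convention)."""
    return 0.5 * (nearest_distances(a, b).mean() + nearest_distances(b, a).mean())

def object_diameter(points):
    """Largest distance between any two points of the model."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return float(np.sqrt(d2.max()))

def is_pose_correct(pred_points, ref_points, ratio=0.1):
    """True when the Chamfer distance is below ratio × object diameter."""
    return bool(chamfer_distance(pred_points, ref_points)
                < ratio * object_diameter(ref_points))

# Synthetic reference model and two hypothetical predictions.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=(100, 3))
good_prediction = reference + np.array([0.01, 0.0, 0.0])   # small offset
bad_prediction = reference + np.array([5.0, 0.0, 0.0])     # far off

print(is_pose_correct(good_prediction, reference))
print(is_pose_correct(bad_prediction, reference))
```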
During the initial phase, we validated our approach on the textured RU-APC dataset from the BOP Benchmark, testing on object ID 000001.
Initial Registration: Transforming the second sequence NeRF with the predicted 6D pose:
After ICP Refinement: Achieving correct registration:
Comparison with CAD Model: Chamfer distance error of 1.26 (well below the threshold of 0.1 × diameter):
After validating on textured objects, we tested our methodology on the more challenging T-LESS dataset—a textureless, symmetric object dataset from the BOP benchmark.
Why T-LESS is Challenging:
- All objects are textureless with uniform gray coloring (except structural parts)
- Objects exhibit symmetries leading to pose ambiguity
- Remains challenging for both RGB and RGBD detectors
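The pose ambiguity caused by symmetry can be shown with a small synthetic example. The point set below is built to be unchanged by a 180° rotation about the z-axis, so the rotated and unrotated shapes coincide even though the two poses differ by half a turn. The shape and the helper names are made up for illustration and are not T-LESS data.

```python
import numpy as np

def rot_z(degrees):
    """Rotation matrix for a rotation about the z-axis."""
    a = np.deg2rad(degrees)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def mean_nearest_distance(a, b):
    """Mean distance from each point in a to its nearest neighbour in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return float(np.sqrt(d2.min(axis=1)).mean())

# Build a shape with a 2-fold (discrete) symmetry about the z-axis.
rng = np.random.default_rng(0)
half = rng.uniform(-1.0, 1.0, size=(100, 3))
shape = np.vstack([half, half @ rot_z(180).T])

gap_180 = mean_nearest_distance(shape @ rot_z(180).T, shape)  # symmetric pose
gap_90 = mean_nearest_distance(shape @ rot_z(90).T, shape)    # non-symmetric pose
print(gap_180, gap_90)
```

The 180° pose is geometrically indistinguishable from the identity pose, so an estimator can return either one. This is one reason a surface-distance metric such as the Chamfer distance is a natural fit for evaluating symmetric objects.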
Registration for Continuous Symmetric Object:
Registration for Discrete Symmetric Object:
Install the required packages:
pip install -r requirements.txt
Setup:
- Create a folder structure: bop/ruapc/
- Download and unzip:
  - Synthetic training images: ruapc_train.zip
  - Models: ruapc_models.zip
- Update the datasetPath variable in trainNeRF.py to point to bop/ruapc
Training Command:
python trainNeRFFine.py --objid 1 --dataset tless --UH 1
Parameters:
- --objid: Object ID
- --UH: Upper/lower half selection
  - 0: Lower half
  - 1: Upper half
Output:
- Generated NeRF images
- Point cloud reconstruction
  - v1.npy: Point cloud as 3D numpy array
  - v1Fine.npy: Finer NeRF model reconstruction
Note: Train separate models for upper and lower halves by changing the UH parameter.
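A quick way to sanity-check a saved point cloud is to load it with NumPy and inspect its shape and extent. The sketch below assumes the array stores XYZ coordinates in its last axis, which is a guess about the file layout rather than something documented here. It writes a stand-in file to a temporary directory so that it runs without the real v1.npy. The helper `load_point_cloud` is hypothetical.

```python
import os
import tempfile

import numpy as np

def load_point_cloud(path):
    """Load an .npy file and return its points as an (N, 3) float array."""
    points = np.load(path)
    if points.ndim < 2 or points.shape[-1] != 3:
        raise ValueError(f"expected XYZ in the last axis, got shape {points.shape}")
    return points.reshape(-1, 3).astype(np.float64)

# Stand-in for a real output file such as v1.npy (synthetic data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "v1_demo.npy")
    np.save(demo_path, np.random.default_rng(0).uniform(-1.0, 1.0, size=(500, 3)))
    cloud = load_point_cloud(demo_path)

extent = cloud.max(axis=0) - cloud.min(axis=0)   # bounding-box size per axis
print(cloud.shape, extent)
```

Comparing the extents of the upper-half and lower-half reconstructions is a cheap check that both models were trained at a comparable scale.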
Generate 3D corresponding coordinates for training images:
python generateCors.py --objid 2 --dataset ruapc --UH 1 --viz 0
Parameters:
- --viz: Visualization flag
  - 0: No visualization
  - 1: Visualize denoised point cloud (verify no noise present)
First Run (generates few.npy and negVec.npy):
python trainPose.py --objid 2 --cont False
Second Run (trains the pose estimator):
python trainPose.py --objid 2 --cont True
Background Dataset:
- Download the COCO dataset for background augmentation
- Set the COCO dataset path in the trainPose.py file
- A subset of COCO is sufficient for this specific use case
- More backgrounds improve generalization, but fewer backgrounds work well for segmented sequences on black backgrounds
Extract and scale features from the NeRF Feature MLP:
python genFeat.py --objid 1
This generates features for the normalized NeRF point cloud, then scales them to match the actual CAD model scale.
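The scaling step can be pictured as a single uniform scale factor applied to the normalized vertices. The sketch below derives that factor from the ratio of bounding-box diagonals, which is an assumption made for illustration. The actual script may compute the scale differently, and the names `bbox_diagonal` and `scale_to_reference` are hypothetical.

```python
import numpy as np

def bbox_diagonal(points):
    """Length of the axis-aligned bounding-box diagonal."""
    return float(np.linalg.norm(points.max(axis=0) - points.min(axis=0)))

def scale_to_reference(points, reference_points):
    """Uniformly scale points so their bounding box matches the reference size."""
    scale = bbox_diagonal(reference_points) / bbox_diagonal(points)
    return points * scale, scale

# Synthetic example: a normalized cloud and a reference in model units.
rng = np.random.default_rng(0)
normalized = rng.uniform(-1.0, 1.0, size=(300, 3))
reference = normalized * 120.0          # hypothetical CAD-scale version

scaled_vertices, scale = scale_to_reference(normalized, reference)
print(scale)
```

A uniform scale changes vertex positions only, so unit normals and per-point feature vectors can be carried over unchanged.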
Output files (saved in 7poseEst folder):
- vert1_scaled.npy: Scaled point cloud vertices
- feat1_scaled.npy: Per-point features
- normal1_scaled.npy: Point normals
python inference.py --objid 2 --id 1285
Parameters:
- --id: Image ID from the training dataset
Generate pose predictions for all images:
python inference.py --objid 1
Output: pred6d.json containing predicted 6D poses for all images
Select best prediction:
python verification.py --objid 1
Output: ID of the best image for ICP refinement
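Conceptually, the selection picks the candidate pose whose transformed model lies closest to the reference. The sketch below shows that idea with synthetic poses and a simple symmetric Chamfer distance. The image IDs, the pose values, and the function `select_best_pose` are invented for illustration, and the layout of pred6d.json is not assumed.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance (mean nearest-neighbour distance, both ways)."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def select_best_pose(model_points, candidates, R_ref, t_ref):
    """Return the candidate ID with the smallest Chamfer distance to the reference."""
    reference = model_points @ R_ref.T + t_ref
    scores = {image_id: chamfer(model_points @ R.T + t, reference)
              for image_id, (R, t) in candidates.items()}
    best_id = min(scores, key=scores.get)
    return best_id, scores

# Synthetic model, reference pose, and three hypothetical predictions.
rng = np.random.default_rng(0)
model = rng.uniform(-0.5, 0.5, size=(150, 3))
R_ref, t_ref = np.eye(3), np.array([0.0, 0.0, 1.0])
candidates = {
    12: (np.eye(3), np.array([0.3, 0.0, 1.0])),
    57: (np.eye(3), np.array([0.01, 0.0, 1.0])),
    93: (np.eye(3), np.array([0.0, -0.4, 1.2])),
}

best_id, scores = select_best_pose(model, candidates, R_ref, t_ref)
print(best_id)
```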
Visualize and refine:
python ICP.py --objid 1 --bestimage <best_image_id>
Replace <best_image_id> with the ID from the verification step.
Output:
- Visualization of point clouds before ICP
- Visualization of point clouds after ICP
- Comparison with CAD model
- Chamfer distance between predicted and CAD model
Final transformation:
python icp.py --dataset ruapc --objid 1
This generates the final refined transformation between the two sequences.
If you use this code in your research, please cite our work:
@misc{imagesequenceregistration,
title={Image Sequence Registration for 6D Pose Estimation Labeling},
author={Your Name},
year={2024},
howpublished={\url{https://github.com/Kudo510/ImageSequenceRegistrationfor6DPoseEstimationLabeling}}
}



