This project addresses the task of 6D object pose estimation on the LINEMOD preprocessed dataset.
Moreover, the proposed architecture is trained to estimate the pose of cones in the simulator of Squadra Corse Driverless, the student autonomous driving team of Politecnico di Torino.
To run the code, execute the notebook.
If you want to run with Colab:
- download the repo and store it on Drive
- extract it
- open the notebook
- connect to Drive (set `MOUNT_DRIVE` to `True`)
- follow the notebook step by step
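The Drive connection step above can be sketched as follows (a minimal sketch; `MOUNT_DRIVE` is the notebook's flag, while the helper name is illustrative and not taken from the repo):

```python
def mount_drive_if_requested(mount_drive: bool) -> bool:
    """Mount Google Drive when running on Colab; return True if mounted."""
    if not mount_drive:
        return False
    try:
        from google.colab import drive  # available only inside Colab
    except ImportError:
        return False  # running locally: nothing to mount
    drive.mount('/content/drive')
    return True
```

When run outside Colab, the import fails and the helper simply skips the mount, so the same notebook cell works in both environments.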
If you want to run locally:
- clone the repo; you may need to rename the folder `6DPose_Estimation-main` to `6DPose_Estimation`
- execute all the cells of *Set up the project*, *Download dataset*, *Modify Dataset*, *Data Exploration*, *Define CustomDataset*, *Data Preprocessing for Object Detection Model*, and *Visualize data*
- execute the inference cell (*Inference Baseline* or *Inference Extension*); it uses the test set (creating the training, validation, and test sets may take a while)
The repository is structured as follows:
- checkpoints contains saved models
- data contains the custom dataloader and dataset classes
- datasets contains the datasets
- images contains images
- models contains the model architectures and metrics
- utils contains utility functions (init, data exploration, plot)
- notebook contains the project notebook and the associated training logic
- requirements contains the necessary packages
Figures 1–6: objects 05, 06, 08, 09, 13, and 14.
Overall:
| Extension | ADD Score | Accuracy |
|---|---|---|
| RGB-D | 0.0138 | 80.03% |
| YOLO + RGB-D | 0.0144 | 77.03% |
Results by object:
| Object | Ours (baseline, RGB) | Ours (baseline pipeline, YOLO + RGB) | Ours (extension, RGB-D) | Ours (extension pipeline, YOLO + RGB-D) |
|---|---|---|---|---|
| ape (01) | 0.0 | 0.0 | 51.1 | 31.7 |
| bench vi. (02) | 8.8 | 1.1 | 89.6 | 84.1 |
| camera (04) | 0.0 | 1.1 | 70.0 | 65.0 |
| can (05) | 2.2 | 1.7 | 81.0 | 80.5 |
| cat (06) | 1.1 | 0.0 | 79.7 | 82.0 |
| driller (08) | 3.9 | 0.5 | 90.5 | 86.5 |
| duck (09) | 0.0 | 0.0 | 52.1 | 47.9 |
| eggbox (10) | 13.3 | 0.5 | 100.0 | 100.0 |
| glue (11) | 18.0 | 6.0 | 100.0 | 100.0 |
| hole p. (12) | 1.1 | 0.5 | 59.7 | 61.3 |
| iron (13) | 2.9 | 0.5 | 95.4 | 88.4 |
| lamp (14) | 8.2 | 1.6 | 91.9 | 92.9 |
| phone (15) | 2.7 | 2.1 | 81.5 | 83.2 |
| MEAN | 4.8 | 1.9 | 80.0 | 77.0 |
Table: Comparison of 6D pose estimation methods on the LINEMOD dataset. Results are reported as accuracy (%) under the ADD(-S) metric.
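For reference, the ADD metric reported above can be sketched with NumPy as follows (a minimal sketch of the standard definition, not the repo's actual implementation; function names are illustrative). ADD is the mean distance between the object's model points transformed by the ground-truth pose and by the predicted pose, and a pose is commonly counted as correct when ADD is below 10% of the object diameter:

```python
import numpy as np

def add_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """Average Distance of model points (ADD): mean L2 distance between
    model points under the ground-truth and predicted rigid transforms."""
    pts_gt = model_points @ R_gt.T + t_gt
    pts_pred = model_points @ R_pred.T + t_pred
    return np.linalg.norm(pts_gt - pts_pred, axis=1).mean()

def add_accuracy(add_values, diameter, threshold=0.1):
    """Fraction of poses whose ADD falls below threshold * object diameter."""
    add_values = np.asarray(add_values)
    return (add_values < threshold * diameter).mean()
```

With identical ground-truth and predicted poses the ADD is zero; a pure translation offset yields an ADD equal to the offset's norm.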
Figure 1: Left camera frame. Figure 2: Right camera frame.
Figure 3: Left camera frame processed by YOLO. Figure 4: Right camera frame processed by YOLO.
Figure 5: Cropped image of left cone. Figure 6: Cropped image of right cone.
Figure 7: LiDAR pointcloud of the left cone. Figure 8: LiDAR pointcloud of the right cone.
Figure 9: 6D pose estimation of the left cone. Figure 10: 6D pose estimation of the right cone.