1. During training, the RPN+ROI head is trained to output bbox2d, but that output is not used during inference; Grounding DINO's bbox2d is used instead. What is the purpose of training the RPN+ROI head to output bbox2d?
2. Grounding DINO is not involved in training, but its bbox2d is used during inference and evaluation. Its bbox2d output therefore cannot be controlled through training, and any errors in it will carry over to the 3D results. Why not train Grounding DINO so that its bbox2d output is more accurate?
3. During evaluation, I want to use ground-truth 2D boxes. However, even after I add the GT oracle2D information with the mergr_oracle2d_to_detection_dicts function, inference still uses the bbox2d output from Grounding DINO. How can I replace Grounding DINO's bbox2d with the GT bbox2d during inference?
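
To make question 3 concrete, here is a minimal sketch of the behaviour I am looking for. It is only an illustration: the function name `override_boxes_with_oracle` and the dictionary keys (`image_id`, `instances`, `bbox2d`, `category_id`, `score`) are hypothetical and are not taken from the repository, whose actual data structures will likely differ.

```python
from copy import deepcopy


def override_boxes_with_oracle(detections, oracle_by_image):
    """Return a copy of `detections` with 2D boxes taken from GT (oracle) annotations.

    Hypothetical formats, assumed only for this sketch:
      detections:      list of {"image_id": ..., "instances": [{"bbox2d": [x1, y1, x2, y2], ...}]}
      oracle_by_image: dict mapping image_id -> list of {"bbox2d": [...], "category_id": ...}
    """
    result = []
    for det in detections:
        det = deepcopy(det)  # leave the caller's detector output untouched
        oracle = oracle_by_image.get(det["image_id"])
        if oracle is not None:
            # Discard the detector's boxes and use the GT boxes with full confidence.
            det["instances"] = [
                {
                    "bbox2d": list(o["bbox2d"]),
                    "category_id": o["category_id"],
                    "score": 1.0,
                }
                for o in oracle
            ]
        # Images without oracle annotations keep the detector's boxes.
        result.append(det)
    return result
```

In other words, I would like the 3D head to receive the output of something like this instead of the Grounding DINO boxes whenever oracle boxes are available for an image.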