Thank you for your impressive work on Treevgr.
I am currently exploring the provided SFT and RL datasets and have a quick technical question regarding the bounding box (bbox) format:
- Is the bbox format [x1, x2, y1, y2], and are the values normalized to a [0, 1000] or [0, 1024] range, or are they absolute pixel coordinates in the original images?
- Alternatively, could you share a brief code snippet, or point me to the script in the repository, that shows how these boxes are loaded and processed during training?
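
For context, here is a minimal sketch of the sanity check I can run on my side to narrow this down. The function name `plausible_conventions` and the candidate ranges are my own assumptions, not anything taken from the repository. It only rules out conventions that a box's largest value contradicts, so it does not depend on the coordinate order.

```python
def plausible_conventions(bbox, width, height):
    """Return the coordinate conventions that a single bbox does not contradict.

    bbox: four numbers in any order (the order is not assumed here).
    width, height: size of the original image in pixels.
    This is a necessary-condition check only; it cannot confirm a convention.
    """
    hi = max(bbox)
    candidates = []
    if hi <= 1.0:
        candidates.append("normalized to [0, 1]")
    if hi <= 1000:
        candidates.append("normalized to [0, 1000]")
    if hi <= 1024:
        candidates.append("normalized to [0, 1024]")
    # Absolute pixel values can never exceed the larger image side.
    if hi <= max(width, height):
        candidates.append("absolute pixels")
    return candidates


if __name__ == "__main__":
    # Hypothetical sample: a box whose values exceed the image size.
    print(plausible_conventions([100, 200, 900, 950], width=640, height=480))
```

Running this over many samples can exclude some options (for example, any value above 1000 rules out the [0, 1000] range), but it cannot distinguish the remaining ones, which is why a confirmation from you would help.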