Description
I submitted a JSON file that follows the specified format, and I obtained the following quantitative result:

```json
{"L_MDisp": 211.9670281732144, "R_MDisp": 276.70152097706693, "L_CDisp": 207.87675657595398, "R_CDisp": 271.26115030262525, "Total": 967.8064560288606}
```
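As a quick sanity check on the reported numbers, the `Total` field appears to be the plain sum of the four displacement terms. A minimal sketch (assuming that relationship, which is not stated explicitly in the evaluation docs):

```python
# Reported result, copied verbatim from the submission feedback.
result = {
    "L_MDisp": 211.9670281732144,
    "R_MDisp": 276.70152097706693,
    "L_CDisp": 207.87675657595398,
    "R_CDisp": 271.26115030262525,
    "Total": 967.8064560288606,
}

# Assumption: Total = L_MDisp + R_MDisp + L_CDisp + R_CDisp.
component_sum = sum(v for k, v in result.items() if k != "Total")
print(abs(component_sum - result["Total"]) < 1e-6)
```

So the components are internally consistent; the problem is that each component itself is far too large.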
However, even though the results we obtained on the validation set were better than the baseline, the results returned for the actual submission show far larger errors. This is probably because the mask is not multiplied with the predictions we submit; the mask is meant to zero out the error on frames in which the hand is not visible.
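To make the suspected bug concrete, here is a minimal sketch of a masked displacement error, assuming a 0/1 visibility mask per frame (the function name and data layout are illustrative, not the challenge's actual evaluation code):

```python
import math

def masked_displacement_error(preds, gts, mask):
    """Sum of per-frame L2 distances between predicted and ground-truth
    keypoints, with the error zeroed on frames where the hand is not
    visible (mask == 0). `preds` and `gts` are lists of (x, y) pairs."""
    total = 0.0
    for (px, py), (gx, gy), m in zip(preds, gts, mask):
        # Multiplying by the mask is the step that seems to be missing:
        # without it, invisible frames contribute spurious error.
        total += m * math.hypot(px - gx, py - gy)
    return total

# Example: the second frame is masked out, so only the first frame counts.
preds = [(10.0, 10.0), (100.0, 100.0)]
gts   = [(13.0, 14.0), (0.0, 0.0)]
mask  = [1, 0]
print(masked_displacement_error(preds, gts, mask))  # 5.0
```

If the evaluation skips the `m *` factor, every frame without a visible hand adds a large error term, which would explain the inflated scores above.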
To demonstrate that the quantitative results above are anomalous, here is a prediction list taken from my submission.json file, together with its visualization. As these figures show, the quantitative results from the actual submission appear to be incorrect, most likely because the loss is computed without multiplying the predictions by the masks.
@VJWQ
Could you please confirm that the loss calculation is done correctly? In particular, I would appreciate it if you could check whether the error is set to zero on frames where the hands are not visible.
```json
"2152_3837": [120.74557495117188, 84.73670959472656, 235.1125030517578, 93.4263687133789,
              118.04257202148438, 86.06185150146484, 230.081787109375, 91.89846801757812,
              125.53624725341797, 88.14488220214844, 230.46359252929688, 94.43958282470703,
              122.34292602539062, 88.79545593261719, 225.5665740966797, 91.5564193725586,
              122.0747299194336, 94.3060531616211, 217.82423400878906, 99.28343963623047]
```
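For reference, the 20 values above can be read as five frames of four coordinates each, which matches the five figures below. A small sketch of that decoding (the per-frame layout `[left_x, left_y, right_x, right_y]` is my assumption about the format, not something confirmed by the organizers):

```python
# Flat prediction list for clip "2152_3837" (abbreviated to the first
# and last frames here; the full list has 20 values).
entry = [120.74557495117188, 84.73670959472656, 235.1125030517578, 93.4263687133789,
         118.04257202148438, 86.06185150146484, 230.081787109375, 91.89846801757812,
         125.53624725341797, 88.14488220214844, 230.46359252929688, 94.43958282470703,
         122.34292602539062, 88.79545593261719, 225.5665740966797, 91.5564193725586,
         122.0747299194336, 94.3060531616211, 217.82423400878906, 99.28343963623047]

# Assumed layout: 5 frames x [left_x, left_y, right_x, right_y].
frames = [entry[i:i + 4] for i in range(0, len(entry), 4)]
labels = ["pre_45", "pre_30", "pre_15", "pre_frame", "contact_frame"]
for label, (lx, ly, rx, ry) in zip(labels, frames):
    print(f"{label}: left=({lx:.1f}, {ly:.1f}) right=({rx:.1f}, {ry:.1f})")
```

All five predicted positions sit in a tight cluster, so a per-frame error of roughly 200 pixels, as implied by the scores above, is hard to reconcile with the visualizations.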

Figure 1: pre_45 frame

Figure 2: pre_30 frame

Figure 3: pre_15 frame

Figure 4: pre_frame

Figure 5: contact_frame