Hi, thanks for an interesting project and model!
When trying this on video sequences, I sometimes see noisy or incorrect poses (e.g. for occluded arms) for a few frames, and generally noisy z-coordinates. It would be nice to filter these temporally to make the output smoother and less susceptible to outliers. Post-processing the predicted pose, beta and translation is possible, but I believe a better solution would be to modify the test-time augmentation and up-weight temporally consistent poses, which should be possible if I understood the code correctly. For that to work with the latest model, I think I would need the raw PyTorch model file (non-TorchScript). Is that possible to get?
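To make the idea concrete, here is a rough sketch of what I mean by up-weighting temporally consistent poses. It is not based on the project's actual code: the function name `fuse_tta_candidates`, the `(K, J, 3)` joint layout, and the `sigma` parameter are all my own assumptions for illustration. It takes the K predictions from the test-time augmentations for one frame and averages them with weights that favor candidates close to the previous frame's result.

```python
import numpy as np


def fuse_tta_candidates(candidates, prev_joints=None, sigma=0.05):
    """Fuse K test-time-augmentation predictions for one frame.

    Hypothetical helper, not part of the project.
    candidates:  (K, J, 3) array of 3D joints, one set per augmentation.
    prev_joints: (J, 3) fused joints from the previous frame, or None.
    sigma:       assumed distance scale (same unit as the joints) that
                 controls how strongly inconsistent candidates are damped.
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    if prev_joints is None:
        # First frame: nothing to compare against, use a plain average.
        return candidates.mean(axis=0)

    prev_joints = np.asarray(prev_joints, dtype=np.float64)
    # Mean per-joint distance of each candidate to the previous frame: (K,)
    dists = np.linalg.norm(candidates - prev_joints[None], axis=-1).mean(axis=-1)

    # Gaussian-style weights, computed as a numerically stable softmax.
    logits = -(dists ** 2) / (2.0 * sigma ** 2)
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()

    # Weighted average over the K candidates: (J, 3)
    return np.tensordot(weights, candidates, axes=1)
```

In a video loop this would be called once per frame, passing the previous frame's fused joints as `prev_joints`. I average joint positions here because averaging rotation parameters directly is less straightforward. Doing this inside the model's own augmentation step is what I would need the non-TorchScript model for.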