Thank you for your excellent work! From the paper, I know that you sample 10 frames from a 2.5 FPS video. I want to know how many frames per video in the dataset you use? Is the 10 frames evenly sampled from the video or the first 10 frames in the video?