This program recognizes the user's hand gesture and performs an assigned action on a YouTube video.
The actions currently recognized are:
- Fist => Play the video
- Open => Pause the video
- Left => Go back to the previous video
- Right => Skip to the next video
Check your webcam feed to confirm the models have loaded (this might take a while; wait until the black landmark dots appear).
The finished product can be viewed here.
Important! Make sure that:
- You give permission for the website to use your camera
- You ensure that no other applications are currently using your camera
Firstly, the program takes input from the user's webcam and uses the Handpose Detection model provided by TensorFlow.js to discern 21 different hand landmarks.
The landmarks corresponding to the different gestures (fist, open, left, right, ok) are then recorded, labeled, and stored in a JSON database.
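Each handpose prediction yields 21 landmarks, and since the classifier takes 42 input features, a natural reading is that only the x and y coordinates of each landmark are kept. Below is a minimal Python sketch of that flattening step; the record layout and field names are assumptions for illustration, not the project's actual schema:

```python
import json

def landmarks_to_record(landmarks, label):
    """Flatten 21 (x, y, z) handpose landmarks into a 42-value
    feature vector (x and y only) plus a gesture label."""
    features = []
    for x, y, _z in landmarks:          # drop the z coordinate
        features.extend([x, y])
    return {"label": label, "features": features}

# Example: 21 dummy landmarks standing in for one "fist" sample
sample = [(float(i), float(i) + 0.5, 0.0) for i in range(21)]
record = landmarks_to_record(sample, "fist")
line = json.dumps(record)               # one entry for the JSON database
print(len(record["features"]))          # 42
```

One such record per captured frame is enough to rebuild the dataset later in Python.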
The recorded data is then loaded into a custom Dataset in Python, where the coordinates are normalized, and the model is trained with PyTorch.
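A custom PyTorch `Dataset` along these lines could load the JSON records and normalize each coordinate vector. The per-sample min-max normalization, the file layout, and the gesture ordering here are assumptions, not the project's exact code:

```python
import json
import torch
from torch.utils.data import Dataset

GESTURES = ["fist", "open", "left", "right", "ok"]

class GestureDataset(Dataset):
    """Loads (features, label) pairs from a JSON file of recorded
    gestures and normalizes each 42-value coordinate vector."""

    def __init__(self, path):
        with open(path) as f:
            records = json.load(f)
        self.samples = []
        for rec in records:
            coords = torch.tensor(rec["features"], dtype=torch.float32)
            # Min-max normalize into [0, 1] per sample
            # (the exact normalization scheme is an assumption)
            coords = (coords - coords.min()) / (coords.max() - coords.min())
            self.samples.append((coords, GESTURES.index(rec["label"])))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```

Wrapped in a `DataLoader`, this feeds normalized (42,) tensors and integer labels straight into the training loop.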
The model consists of three linear layers that surround two activation layers. It takes a tensor of shape (B, 42) as input and returns a tensor of shape (B, 5) as output, corresponding to the 5 different gestures.
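The described architecture (three linear layers interleaved with two activations) might look like the sketch below. The hidden width of 64 and the choice of ReLU are assumptions; only the 42-in / 5-out shapes come from the text:

```python
import torch
import torch.nn as nn

class GestureNet(nn.Module):
    """Three linear layers surrounding two activation layers:
    (B, 42) landmark features -> (B, 5) gesture logits."""

    def __init__(self, in_features=42, hidden=64, n_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = GestureNet()
out = model(torch.randn(8, 42))   # a batch of 8 samples
print(out.shape)                  # torch.Size([8, 5])
```

At inference time, `out.argmax(dim=1)` picks the predicted gesture for each sample in the batch.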
The trained weights and the model are then converted into ONNX format so they can be used directly in the React app.
Below is the finished result: