If an LLM is given a GIF as input, it takes only the first frame as the actual input, so key information in the later frames is lost.
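One possible workaround, sketched here under assumptions rather than as a confirmed fix, is to split the GIF into a few evenly spaced still frames and pass those to the model as separate images. The helper names `sample_frame_indices` and `extract_frames` are hypothetical, and the second function assumes the Pillow library is installed.

```python
def sample_frame_indices(n_frames: int, k: int) -> list[int]:
    """Pick up to k evenly spaced frame indices from 0 .. n_frames - 1."""
    if n_frames <= 0 or k <= 0:
        return []
    if k >= n_frames:
        # Fewer frames than requested: keep them all.
        return list(range(n_frames))
    if k == 1:
        return [0]
    # Integer arithmetic keeps the first and last frame and avoids rounding surprises.
    return [i * (n_frames - 1) // (k - 1) for i in range(k)]


def extract_frames(path: str, k: int) -> list:
    """Return up to k frames of a GIF as RGB images (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so the index helper works without Pillow

    frames = []
    with Image.open(path) as im:
        n_frames = getattr(im, "n_frames", 1)  # single-frame images report 1
        for idx in sample_frame_indices(n_frames, k):
            im.seek(idx)                      # jump to the selected frame
            frames.append(im.convert("RGB"))  # convert() returns an independent copy
    return frames
```

For example, `extract_frames("demo.gif", 4)` (with `demo.gif` as a placeholder path) would return at most four stills spread across the animation. How many frames to send is a judgment call that depends on the model's image limits.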