@oscarqjh
Contributor

  1. Added an option to pass video as multiple images, to align with models such as the SenseNova-SI Qwen series that are trained on multi-image input. See "VSI-Bench_32frame results are not reproducible" (EASI#20).

Sample run result: [image]

  2. Fixed an issue in the official code where the post-prompt and pre-prompt were not extracted properly from `lmms_eval_specific_kwargs`.
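
A minimal sketch of the kind of extraction item 2 refers to. It assumes `lmms_eval_specific_kwargs` is a plain dict (or `None`) with optional `pre_prompt` and `post_prompt` keys; the helper name `build_prompt` and the key names are illustrative assumptions, not code taken from this PR:

```python
from typing import Optional


def build_prompt(question: str, lmms_eval_specific_kwargs: Optional[dict] = None) -> str:
    """Wrap a question with optional pre/post prompts (hypothetical helper)."""
    # Treat a missing kwargs dict the same as an empty one.
    kwargs = lmms_eval_specific_kwargs or {}
    # Assumed key names; default to empty strings so a missing key never raises.
    pre_prompt = kwargs.get("pre_prompt", "")
    post_prompt = kwargs.get("post_prompt", "")
    return f"{pre_prompt}{question}{post_prompt}"
```

Reading both keys with `.get` and a default keeps the prompt well-formed when a task config supplies only one of them.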

@oscarqjh
Contributor Author

@PeterWangyi @kcz358

@oscarqjh oscarqjh marked this pull request as draft January 16, 2026 02:36
@oscarqjh oscarqjh marked this pull request as ready for review January 16, 2026 05:24
@oscarqjh
Contributor Author

b792f6c added an option to use `interleave_visual`. This is required to align evaluation results with VLMEvalKit.
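
The conversation does not show how `interleave_visual` is implemented. As a general illustration of text-image interleaving only, the sketch below assumes the prompt marks image positions with an `<image>` placeholder and that message content is a list of typed dicts; the function name, the placeholder, and the dict layout are all assumptions:

```python
def interleave(text: str, visuals: list, placeholder: str = "<image>") -> list:
    """Place each visual at its placeholder position in the text (hypothetical)."""
    parts = text.split(placeholder)
    # Splitting on N placeholders yields N + 1 text parts.
    if len(parts) - 1 != len(visuals):
        raise ValueError("number of placeholders must match number of visuals")
    content = []
    for i, part in enumerate(parts):
        if part:
            content.append({"type": "text", "text": part})
        # A visual follows every text part except the last one.
        if i < len(visuals):
            content.append({"type": "image", "image": visuals[i]})
    return content
```

With this layout, a video passed as multiple frames is simply a longer `visuals` list, and each frame keeps its position relative to the surrounding text.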

@PeterWangyi
Collaborator

It seems that for the InternVL series models, only the `doc_to_visual` function can be called, not the `doc_to_message` function.
This will cause text-image interleaving to fail.
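
To illustrate the concern, here is a sketch of what happens when a model path receives only a list of visuals plus one text string. It assumes interleaved content is a list of typed dicts; the function name and that layout are hypothetical, not taken from the PR:

```python
def flatten_interleaved(content: list) -> tuple:
    """Split interleaved content into (visuals, text), discarding positions."""
    visuals = [item["image"] for item in content if item["type"] == "image"]
    text = "".join(item["text"] for item in content if item["type"] == "text")
    return visuals, text


# Two different interleavings collapse to the same flattened input.
a = [{"type": "text", "text": "A"}, {"type": "image", "image": "img0"}, {"type": "text", "text": "B"}]
b = [{"type": "image", "image": "img0"}, {"type": "text", "text": "A"}, {"type": "text", "text": "B"}]
same = flatten_interleaved(a) == flatten_interleaved(b)
```

Once the content is flattened this way, the model can no longer tell where each image sat relative to the text, which is why interleaving would have no effect on such a path.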
