Can the dataset be adapted for evaluating RAG? According to the paper, it is intended for evaluating the VQA performance of VLMs.