fix: gracefully handle processor load failures for multimodal models (#82) #364
umran666 wants to merge 2 commits into
Code Review
This pull request wraps the initialization of the multimodal processor in a try-except block to handle loading failures gracefully with a warning. The reviewer recommended catching specific exceptions rather than a broad Exception to prevent masking other programming errors.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
Does merely loading a processor require torchvision? That would surprise me. |
|
Yeah, it actually does because |
|
Interesting, thanks for identifying this problem. I'll need to think about what the best fix is here though. In general, I prefer Heretic to work out of the box in all reasonable situations, so simply depending on torchvision might be the better solution. |
|
Adding torchvision to the dependencies in pyproject.toml is definitely the cleaner route to out-of-the-box multimodal support. The main risk is dependency resolution conflicts: torchvision releases are tightly coupled to specific torch versions, which can get messy when installing into custom environments. Let me know if you'd prefer that I update this PR to add torchvision to pyproject.toml (with or without the fallback catch block), or if we should stick with the defensive catch-and-warn approach. |
|
No, you don't need to do anything at this point. I'll have to think about this some more, and solving this niche issue isn't a high priority for me. |
Catches configuration/dependency load failures when initializing a multimodal model's processor, logging a warning rather than crashing the model initialization loop. This allows multimodal models to be abliterated on text-only pipelines even if optional vision dependencies (such as torchvision) are missing.
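
For illustration, the catch-and-warn behaviour described above could look roughly like the sketch below. This is not the PR's actual diff: the helper name `load_processor_or_none`, the injected `load` callable, and the tuple of exception types are all hypothetical, and which exceptions a real processor load raises when torchvision is missing is an assumption. The narrow tuple follows the reviewer's suggestion to avoid a broad `except Exception`.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

# Assumption: missing optional dependencies and bad processor configs surface
# as one of these types. Adjust the tuple to what the loader really raises.
PROCESSOR_LOAD_ERRORS = (ImportError, OSError, ValueError)


def load_processor_or_none(load: Callable[[str], Any], model_id: str) -> Optional[Any]:
    """Try to load a processor; return None and warn if loading fails.

    `load` is injected so the sketch runs without any ML libraries installed.
    """
    try:
        return load(model_id)
    except PROCESSOR_LOAD_ERRORS as error:
        # Only the expected failure modes are swallowed; anything else
        # (e.g. a TypeError from a programming mistake) still propagates.
        logger.warning(
            "Could not load processor for %s (%s: %s); continuing text-only.",
            model_id,
            type(error).__name__,
            error,
        )
        return None
```

In a real pipeline, `load` could be something like `transformers.AutoProcessor.from_pretrained`, with the caller skipping image handling whenever the helper returns `None`.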