Description
PhantomCrowd currently analyzes only text content, but many marketing campaigns are image- or video-based. Adding multimodal input should improve accuracy for those campaigns.
From the roadmap
This is listed in the README roadmap as a planned feature.
Proposed approach
- Accept image uploads via the campaign creation form
- Use a vision-capable LLM (e.g., gemma4, llava) to describe the image
- Feed the description into the existing pipeline as additional context
- Display uploaded images in the campaign detail view
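The describe-then-feed steps above could be sketched roughly as follows. This is only an illustration, not PhantomCrowd's actual code: the function names (`build_vision_request`, `describe_image`, `with_image_context`), the prompt wording, and the `[Image description]` separator are all hypothetical. The request shape follows Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` list, and the URL assumes Ollama is running locally on its default port.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_vision_request(image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON body for a non-streaming Ollama vision request."""
    return {
        "model": model,
        # Hypothetical prompt; tune it to what the pipeline needs.
        "prompt": "Describe this marketing image: subject, visible text, mood, and call to action.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(image_bytes: bytes, model: str = "llava") -> str:
    """Ask the vision model for a text description of the image."""
    body = json.dumps(build_vision_request(image_bytes, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"].strip()


def with_image_context(campaign_text: str, description: str) -> str:
    """Append the image description to the campaign text as extra context."""
    if not description:
        return campaign_text
    return f"{campaign_text}\n\n[Image description]\n{description}"
```

With this shape, the existing text pipeline would not need to change: it would simply receive `with_image_context(text, describe_image(image_bytes))` instead of the raw campaign text.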
Technical notes
- Backend: add image upload endpoint, store in data/uploads/
- LLM: use Ollama vision model to generate description
- Frontend: add image preview in campaign form and detail view
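For the backend storage step, a minimal framework-agnostic sketch might look like the one below. The helper name `save_upload`, the extension whitelist, and the 10 MiB cap are assumptions made for illustration; only the `data/uploads/` directory comes from the notes above. The stored file gets a generated name, so a user-supplied filename never becomes part of the path on disk.

```python
import uuid
from pathlib import Path

# Assumptions for illustration: accepted image types and a 10 MiB size cap.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
MAX_BYTES = 10 * 1024 * 1024


def save_upload(
    data: bytes, original_name: str, upload_dir: Path = Path("data/uploads")
) -> Path:
    """Validate an uploaded image and store it under a generated file name."""
    ext = Path(original_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {ext or '(none)'}")
    if len(data) > MAX_BYTES:
        raise ValueError("image exceeds size limit")

    upload_dir.mkdir(parents=True, exist_ok=True)
    # Only the extension is taken from the client-supplied name.
    target = upload_dir / f"{uuid.uuid4().hex}{ext}"
    target.write_bytes(data)
    return target
```

The upload endpoint would call this with the request's file bytes and filename, then store the returned path on the campaign record so the detail view can display the image.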
Difficulty
Intermediate. Requires backend + frontend + Ollama vision model integration.