This repository demonstrates how to set up an AI-powered slide retrieval system that processes PowerPoint presentations, generates detailed AI-based slide descriptions, and enables semantic search through a Flask web interface. It integrates vision-language models, sentence embeddings, and a vector database (ChromaDB) to deliver a complete Retrieval-Augmented Generation (RAG) pipeline for slides.
- Flask – Web application for interactive slide search
- Micromamba + Python – Lightweight environment management
- ChromaDB – Vector database for embedding and retrieval
- Qwen2.5-VL (Hugging Face) – Vision-language model for slide description
- Sentence Transformers (Hugging Face) – Text embeddings for semantic search
- PDF2Image + LibreOffice – PowerPoint to PNG conversion pipeline
- Gradio + IBMTheme – Custom UI design integration
Install required system packages (example for macOS):
brew install --cask libreoffice
brew install poppler
For Linux systems (Power or x86):
sudo dnf install git poppler-utils libreoffice
Clone the project repository:
git clone https://github.com/HenrikMader/SlidesSearcher_public.git
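cd SlidesSearcher_public
Install Micromamba (the URL below targets Linux on Power, ppc64le; on x86 use linux-64 in the URL instead):
cd ~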
curl -Ls https://micro.mamba.pm/api/micromamba/linux-ppc64le/latest | tar -xvj bin/micromamba
eval "$(micromamba shell hook --shell bash)"
micromamba --version
micromamba create -n rag_env_slides python=3.11
micromamba activate rag_env_slides
Install project dependencies via pip:
pip install Flask chromadb pydantic_settings sentence_transformers pdf2image accelerate torchvision gradio tqdm transformers
Check installed packages:
pip list
Navigate to the project directory:
cd ~/SlidesSearcher_public
rm -rf pipeline/db
Convert PowerPoints to image slides:
python pipeline/convert_from_pptx_to_pdf.py
Generate AI-based slide descriptions:
python pipeline/describe_each_pdf.py
Upload the slide embeddings and descriptions to ChromaDB:
python pipeline/upload_descriptions_to_db.py
This process will create or update your vector database in pipeline/db/.
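The conversion step can be pictured roughly as follows. This is a minimal sketch, not the contents of `convert_from_pptx_to_pdf.py`: it assumes LibreOffice is reachable as `soffice` on the PATH and that poppler is installed for pdf2image. The function names, the `dpi` default, and the `slide_<n>.png` naming (borrowed from the example layout in this README) are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def soffice_command(pptx: Path, out_dir: Path) -> list[str]:
    """Build the headless LibreOffice call that renders a deck to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(pptx)]


def slide_png_path(img_root: Path, pptx: Path, index: int) -> Path:
    """Assumed naming scheme: <img_root>/<deck name>/slide_<n>.png."""
    return img_root / pptx.stem / f"slide_{index}.png"


def convert_deck(pptx: Path, img_root: Path, dpi: int = 150) -> list[Path]:
    """Render one .pptx to PDF, then write one PNG per PDF page."""
    # Imported lazily so the helpers above work without pdf2image installed.
    from pdf2image import convert_from_path

    deck_dir = img_root / pptx.stem
    deck_dir.mkdir(parents=True, exist_ok=True)

    # LibreOffice writes <deck name>.pdf into the --outdir directory.
    subprocess.run(soffice_command(pptx, deck_dir), check=True)
    pdf_path = deck_dir / f"{pptx.stem}.pdf"

    written = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)
    for index, page in enumerate(pages, start=1):
        target = slide_png_path(img_root, pptx, index)
        page.save(target, "PNG")
        written.append(target)
    return written
```

Under these assumptions, calling `convert_deck(Path("Files/PPTX_DIR/Sales_Deck.pptx"), Path("Files/IMG_DIR"))` would write the PNGs into `Files/IMG_DIR/Sales_Deck/`.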
Start the Flask web interface:
python app.py
Access the web UI in your browser:
http://<IP_of_your_machine>:7680
Login credentials:
- Username: power
- Password: power
(Login credentials can be changed in the templates/login.html file.)
To rebuild or refresh the database, re-run the ingestion scripts:
python pipeline/convert_from_pptx_to_pdf.py
python pipeline/describe_each_pdf.py
python pipeline/upload_descriptions_to_db.py
The database automatically indexes each slide image and its AI-generated description for fast semantic retrieval.
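One way the indexing could work is sketched below. It is not the code in `upload_descriptions_to_db.py`. The record layout, the use of the relative image path as the ID, and the metadata keys are assumptions made for illustration. `collection` is expected to behave like a ChromaDB collection and `model` like a Sentence Transformers model. Both are passed in as parameters so the sketch stays independent of how they are created.

```python
from pathlib import Path


def collect_records(img_root: Path) -> list[dict]:
    """Pair every <image>.desc.txt file with the slide image it describes."""
    records = []
    for desc_file in sorted(img_root.rglob("*.png.desc.txt")):
        # Strip the ".desc.txt" suffix to recover the image file name.
        image = desc_file.with_name(desc_file.name[: -len(".desc.txt")])
        records.append({
            "id": image.relative_to(img_root).as_posix(),
            "document": desc_file.read_text(encoding="utf-8").strip(),
            "metadata": {"deck": image.parent.name, "image_path": str(image)},
        })
    return records


def upload(records: list[dict], collection, model) -> int:
    """Embed the descriptions and upsert them; returns the number stored."""
    if not records:
        return 0
    documents = [r["document"] for r in records]
    embeddings = model.encode(documents).tolist()
    # upsert replaces entries with the same ID, so re-running is safe.
    collection.upsert(
        ids=[r["id"] for r in records],
        documents=documents,
        embeddings=embeddings,
        metadatas=[r["metadata"] for r in records],
    )
    return len(records)
```

A collection could be obtained with `chromadb.PersistentClient(path="pipeline/db").get_or_create_collection("slides")` and a model with `SentenceTransformer("all-mpnet-base-v2")`. The collection name `slides` is a placeholder, not a name taken from the repository.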
Once the web app is running, you can:
- Enter natural language queries (e.g., “Show slides about sales trends”)
- SlideSearcher will:
  - Embed your query using Sentence Transformers
  - Retrieve top similar embeddings from ChromaDB
  - Display matching slides and their AI-generated descriptions
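The query flow above can be sketched as a single function. This is an illustration under assumptions, not the logic in `app.py`: the function name, the `top_k` default, and the shape of the returned dictionaries are hypothetical. `model` stands for a Sentence Transformers model and `collection` for a ChromaDB collection.

```python
def search_slides(query: str, model, collection, top_k: int = 5) -> list[dict]:
    """Embed a query and return the closest slides, nearest first."""
    query_embedding = model.encode(query).tolist()
    result = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        include=["documents", "metadatas", "distances"],
    )
    # ChromaDB returns one inner list per query; only one query was sent.
    return [
        {"id": slide_id, "description": doc, "metadata": meta, "distance": dist}
        for slide_id, doc, meta, dist in zip(
            result["ids"][0],
            result["documents"][0],
            result["metadatas"][0],
            result["distances"][0],
        )
    ]
```

A smaller distance means a closer match, so the web UI could render the returned list in order.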
Example workflow:
- Place .pptx files into Files/PPTX_DIR/
- Run the ingestion scripts
- Start Flask: python app.py
- Access the web app and start searching.
- index.html – Main search UI with gallery, modal preview, and download options
- login.html – Secure login interface
- main.html – Optional redirect or post-login landing page
Default port: 7680
theme.py defines a unified Gradio-based interface theme:
- Primary color: IBM Blue
- Fonts: IBM Plex Serif & IBM Plex Mono
- Layout: Rounded cards, clear spacing, modern hierarchy
Slidesearcher/
├─ app.py # Flask web app entry point
├─ theme.py # IBM Gradio theme definitions
├─ pipeline/ # Processing and database scripts
│ ├─ db/ # ChromaDB files
│ ├─ config.py # Configuration settings
│ ├─ convert_from_pptx_to_pdf.py # Converts PPTX → PDF → PNG
│ ├─ describe_each_pdf.py # AI-based slide descriptions
│ └─ upload_descriptions_to_db.py # Uploads to ChromaDB
├─ Files/
│ ├─ PPTX_DIR/ # Input PowerPoints
│ └─ IMG_DIR/ # Output images/descriptions
├─ templates/
│ ├─ index.html
│ ├─ login.html
│ └─ main.html
└─ README.md
Files/
├─ PPTX_DIR/
│ ├─ Sales_Deck.pptx
│ └─ Training_Manual.pptx
└─ IMG_DIR/
  ├─ Sales_Deck/
  │ ├─ slide_1.png
  │ ├─ slide_1.png.desc.txt
  │ └─ slide_2.png.desc.txt
  └─ Training_Manual/
    ├─ slide_1.png
    └─ slide_1.png.desc.txt
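The `<image>.desc.txt` naming shown in this layout could be produced by a loop like the one below. This is a hedged sketch, not the contents of `describe_each_pdf.py`. The `describe` callable is a hypothetical stand-in for whatever wraps the vision-language model, and skipping images that already have a description is a design choice of this sketch, not something the repository is known to do.

```python
from pathlib import Path
from typing import Callable


def describe_missing(img_root: Path, describe: Callable[[Path], str]) -> list[Path]:
    """Write <image>.desc.txt next to every PNG that lacks one."""
    written = []
    for image in sorted(img_root.rglob("*.png")):
        desc_file = image.with_name(image.name + ".desc.txt")
        if desc_file.exists():
            continue  # keep existing descriptions untouched
        desc_file.write_text(describe(image).strip() + "\n", encoding="utf-8")
        written.append(desc_file)
    return written
```

Passing the model call in as a function keeps the file handling separate from the model and makes it easy to try with a dummy such as `lambda p: f"Description of {p.name}"`.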
SlideSearcher retrieves models directly from the Hugging Face Hub:
- Vision Model: Qwen/Qwen2.5-VL-3B-Instruct
- Sentence Embedding Model: all-mpnet-base-v2
You can replace these models in pipeline/config.py using your preferred Hugging Face repositories.
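As an illustration of what swapping models might look like, here is a small configuration sketch. The actual structure of `pipeline/config.py` is not shown in this README, so the class name, field names, and environment variable names below are all hypothetical. Only the two default model IDs come from the list above.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str):
    """Dataclass field that reads an environment variable when instantiated."""
    return field(default_factory=lambda: os.environ.get(name, default))


@dataclass(frozen=True)
class ModelConfig:
    # Hugging Face repository IDs; the environment variable names are made up.
    vision_model: str = _env("SLIDES_VISION_MODEL", "Qwen/Qwen2.5-VL-3B-Instruct")
    embedding_model: str = _env("SLIDES_EMBEDDING_MODEL", "all-mpnet-base-v2")
```

With a layout like this, a different model could be selected either by setting the environment variable or by constructing `ModelConfig(embedding_model="...")` directly.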