CAP-GEN-EXP

Pipeline combining BLIP ViT (https://huggingface.co/Salesforce/blip-image-captioning-large), a fine-tuned version of SmolLM2 360M (https://huggingface.co/HuggingFaceTB/SmolLM2-360M), and Stable-diffusion-v1-4 (https://huggingface.co/CompVis/stable-diffusion-v1-4) for image captioning, Grad-CAM overlays, self-attention visualizations, and generation of new images based on the extracted caption. The app forwards the generated captions to SmolLM2, which explains the words/objects in the image in more detail, and to Stable Diffusion, which generates new images. XAI (Grad-CAM, self-attention) is included for the whole process. The Finet notebook provides a simple workflow for fine-tuning the models, with an integrated wandb logger.

The app can be tested at https://huggingface.co/spaces/Fine-Tuning-DLSE-Smol2/dlasw-pipeline-deploy?logs=container or by running it locally inside a Docker container:

docker run -it -p 7860:7860 --gpus all --platform=linux/amd64  registry.hf.space/fine-tuning-dlse-smol2-dlasw-pipeline-deploy:latest python app.py

USER INTERFACE

WORKFLOW BLOCK DIAGRAM
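
The workflow described in the introduction (caption the image, explain the caption, generate a new image from it) can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the names `run_pipeline`, `PipelineResult`, and the three `load_*` helpers are hypothetical, the prompt template and token limits are assumptions, and the explainer loads the base SmolLM2-360M checkpoint rather than the project's fine-tuned weights. The heavy libraries (`transformers`, `diffusers`) are imported only inside the loaders, so the chaining logic can be tried with stand-in functions.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical stage signatures; the real app.py may be organized differently.
Captioner = Callable[[Any], str]   # image -> caption
Explainer = Callable[[str], str]   # caption -> explanation text
Generator = Callable[[str], Any]   # caption -> newly generated image


@dataclass
class PipelineResult:
    caption: str
    explanation: str
    generated_image: Any


def run_pipeline(image: Any, captioner: Captioner, explainer: Explainer,
                 generator: Generator) -> PipelineResult:
    """Caption the image, then forward the caption to both downstream stages."""
    caption = captioner(image)
    explanation = explainer(caption)
    generated_image = generator(caption)
    return PipelineResult(caption, explanation, generated_image)


def load_blip_captioner(model_id: str = "Salesforce/blip-image-captioning-large") -> Captioner:
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)

    def caption(image: Any) -> str:
        inputs = processor(images=image, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=30)  # assumed limit
        return processor.decode(output_ids[0], skip_special_tokens=True)

    return caption


def load_smollm2_explainer(model_id: str = "HuggingFaceTB/SmolLM2-360M") -> Explainer:
    # Assumption: base checkpoint; swap in the fine-tuned weights if available.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    def explain(caption: str) -> str:
        prompt = f"Explain in more detail: {caption}\n"  # hypothetical prompt
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=80)
        # Drop the prompt tokens so only the continuation is returned.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return explain


def load_sd_generator(model_id: str = "CompVis/stable-diffusion-v1-4") -> Generator:
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)

    def generate(prompt: str) -> Any:
        return pipe(prompt).images[0]

    return generate
```

With the models downloaded, a call such as `run_pipeline(image, load_blip_captioner(), load_smollm2_explainer(), load_sd_generator())` would run all three stages on a PIL image. Passing the stages in as callables keeps each model replaceable on its own.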

FINE-TUNE

ADDITIONAL EXAMPLES
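
As a further illustration, the Grad-CAM overlay mentioned in the introduction is built from a heatmap that can be computed roughly as below. This is a sketch of the standard Grad-CAM formula, not the app's own implementation: the function name `grad_cam` is hypothetical, and the feature maps and gradients are assumed to be already arranged as [channels][height][width] (for a ViT encoder such as BLIP's, the patch tokens would first have to be reshaped into a grid). Plain Python lists are used so the example runs without dependencies; in practice the same steps would operate on tensors.

```python
def grad_cam(activations, gradients):
    """Return a normalized Grad-CAM heatmap as a [height][width] nested list.

    activations: feature maps of the chosen layer, shaped [K][H][W]
    gradients:   gradients of the target score w.r.t. those maps, same shape
    """
    num_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: spatial average of the gradients for each channel.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(num_channels)
    ]

    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j]
                         for k in range(num_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be blended over the input image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

The resulting map is typically resized to the input resolution and alpha-blended with the image to produce the overlay.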
