Convert scanned PDFs into searchable text locally using Vision LLMs (olmOCR). 100% private, offline, and free. Features a modern Web UI & CLI.
Updated Dec 23, 2025 - Python
Powerful PDF data extraction library powered by AI vision models. Transform PDFs into structured, validated data using TypeScript, Zod, and AI providers like Scaleway and Ollama.
AI-powered OCR for Diablo II: Resurrected - batch-extract item tooltips from screenshots using Vision LLMs (OpenAI, Groq, OpenRouter, LM Studio/Ollama). No Tesseract or EasyOCR needed.
A feature-rich desktop GUI for Ollama with Vision, RAG, and JSON support.
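PyMidscene - A Python SDK implementation of Midscene.js | AI-driven, natural-language UI automation: no more selectors, just describe the action in Chinese. Fully compatible with the official cache format.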
Free OCR powered by LLMs using OpenRouter — extract text from images with no API costs. Works with image URLs and Base64 inputs using free vision-capable models.
Multimodal AI-powered medical assistant with LLMs, speech, and image understanding.
Free, offline OCR using local LLMs with Ollama. Convert images to text with vision-enabled models running entirely on your machine — no cloud, no API costs, full privacy.
🖼️ Extract text from images locally using Ollama's LLMs—100% free, offline, and private. No API keys or cloud costs necessary.
A FastAPI-based backend service that extracts structured information from academic marksheets (images or PDFs) using OCR and an LLM, and returns a normalized JSON response with confidence scores.
AI-powered tool that extracts structured data from bank statement images using LLaMA Vision and displays it in clean JSON and table formats. Built with Streamlit and pandas for fast, accurate financial document parsing.
A Python‑based incident detection engine that analyzes video feeds for motion, detects objects, and uses large language models (LLMs) to generate semantic descriptions of incidents. Designed for extensibility with custom detectors and processors.
Car Damage Assessment using Vision LLM
Automated data extraction from PDF receipts to Excel using Vision LLM (tested with Qwen3-VL and olmOCR 2).
🎓 Extract and validate data from academic marksheets using AI for accurate JSON output, enhancing record-keeping and analysis.
🧙‍♂️ Extract and organize Diablo II: Resurrected item tooltips from screenshots using AI for easy access and management of your collection.
This repository focuses on customizing the Qwen2.5-Vision model for specific tasks. It provides step-by-step guidance, scripts, and best practices for fine-tuning the model on custom datasets. Ideal for developers and researchers, it ensures optimal performance and accuracy tailored to unique use cases.
🤖 A Discord bot that scrapes daily tech comics (XKCD, MonkeyUser, Turnoff.us) and uses Vision LLMs (Llama-4 via Groq) to explain the jokes.
Multi-engine image generation filter for Open WebUI. Features automated prompt enhancement, multi-language support, and real-time Vision QC scoring. Supports A1111, ComfyUI, and OpenAI backends with integrated performance telemetry.