This is a Dockerized Streamlit app that allows users to upload a PDF, extract text using PyMuPDF or Unstructured, and download the extracted text.
- 📂 Upload a PDF file
- 📝 Extract text using:
  - PyMuPDF (pymupdf)
  - Unstructured library (unstructured), which also extracts tables (new feature)
- 📥 Download extracted text as a .txt file
- 🌐 Streamlit-based web interface
- 🐳 Containerized using Docker for easy deployment
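
The upload, extract, and download flow listed above could look roughly like the following. This is a minimal sketch, not the app's actual source: the function names (`extract_with_pymupdf`, `extract_with_unstructured`, `join_pages`, `txt_name`, `main`) and the widget labels are assumptions made for illustration, and table extraction is left out. The third-party imports sit inside the functions so the pure helpers can be used without them.

```python
from io import BytesIO
from pathlib import Path


def join_pages(pages) -> str:
    """Join per-page text with blank lines, skipping empty pages."""
    return "\n\n".join(text.strip() for text in pages if text.strip())


def txt_name(pdf_name: str) -> str:
    """Derive the download filename, e.g. report.pdf -> report.txt."""
    return Path(pdf_name).stem + ".txt"


def extract_with_pymupdf(pdf_bytes: bytes) -> str:
    """Return the plain text of every page using PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return join_pages(page.get_text() for page in doc)


def extract_with_unstructured(pdf_bytes: bytes) -> str:
    """Return the text of every element found by Unstructured."""
    from unstructured.partition.pdf import partition_pdf  # imported lazily

    elements = partition_pdf(file=BytesIO(pdf_bytes))
    return join_pages(el.text or "" for el in elements)


def main() -> None:
    """Hypothetical Streamlit page: upload, pick a backend, download."""
    import streamlit as st

    st.title("PDF text extractor")
    uploaded = st.file_uploader("Upload a PDF file", type=["pdf"])
    backend = st.radio("Extraction backend", ["PyMuPDF", "Unstructured"])
    if uploaded is not None:
        data = uploaded.getvalue()
        if backend == "PyMuPDF":
            text = extract_with_pymupdf(data)
        else:
            text = extract_with_unstructured(data)
        st.text_area("Extracted text", text, height=300)
        st.download_button(
            "Download as .txt",
            text,
            file_name=txt_name(uploaded.name),
            mime="text/plain",
        )
```

Saved under a hypothetical name such as app.py with a final `main()` call, a script like this could be started with `streamlit run app.py`.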
To run the app locally, pull and run the Docker image with the following steps:
docker pull riokomoo12356/ocr_app:v2.2

docker run -p 8501:8501 riokomoo12356/ocr_app:v.2

Open http://localhost:8501 in a web browser.
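
Once the container is running, a quick way to confirm that something answers on port 8501 before opening a browser is a small reachability check. The sketch below uses only the Python standard library. The helper name `is_up` is an assumption, and it only tests that an HTTP server responds at the URL, not that the app itself works correctly.

```python
import urllib.error
import urllib.request

# Ignore any configured HTTP proxy, since the target is a local port.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url: str = "http://localhost:8501", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url with a 2xx or 3xx status."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_up()` with no arguments checks the default address used in the steps above.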
📜 Dependencies
This app uses:
• streamlit
• pymupdf (PyMuPDF)
• unstructured
These dependencies are installed inside the Docker container.
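
To see which versions of these packages are present in a given environment, for example from a Python shell inside the running container, something like the following could be used. It is a sketch using the standard library's `importlib.metadata`. The helper name `installed_versions` is an assumption, and the distribution names passed to it are the ones listed above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["streamlit", "pymupdf", "unstructured"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```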
Use this link to access the web version of the app: https://pdfextracto.streamlit.app/