A simple Dockerized text extraction app that uses the Unstructured and PyMuPDF libraries.

rudro12356/OCR-App

📝 PDF Text Extractor - Streamlit OCR App

This is a Dockerized Streamlit app that allows users to upload a PDF, extract text using PyMuPDF or Unstructured, and download the extracted text.

🚀 Features

  • 📂 Upload a PDF file
  • 📝 Extract text using:
    • PyMuPDF (pymupdf)
    • Unstructured library (unstructured), which also supports table extraction (new feature)
  • 📥 Download extracted text as a .txt file
  • 🌐 Streamlit-based web interface
  • 🐳 Containerized using Docker for easy deployment
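
The upload, extract, and download flow listed above could look roughly like the sketch below. It is a minimal illustration, not the app's actual source: the function names `extract_pages_pymupdf`, `join_pages`, and `output_filename` are hypothetical, and it assumes PyMuPDF is importable as `fitz`.

```python
from pathlib import Path


def extract_pages_pymupdf(pdf_bytes: bytes) -> list[str]:
    """Return the plain text of each page of an in-memory PDF (hypothetical helper)."""
    # PyMuPDF is imported lazily so the pure helpers below work without it.
    import fitz  # PyMuPDF

    # Open the PDF from raw bytes, e.g. the contents of an uploaded file.
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return [page.get_text() for page in doc]


def join_pages(pages: list[str]) -> str:
    """Join per-page text with blank lines, skipping pages that yielded no text."""
    cleaned = [p.strip() for p in pages]
    return "\n\n".join(p for p in cleaned if p)


def output_filename(pdf_name: str) -> str:
    """Derive the name of the .txt download from the uploaded PDF's name."""
    return Path(pdf_name).with_suffix(".txt").name
```

In a Streamlit app, the bytes would typically come from the uploaded file object and the joined text would be handed to a download button; that wiring is omitted here.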

📦 Pull and Run the Docker Image

You can pull and run the Docker container with the following steps:

1️⃣ Pull the Docker Image

docker pull riokomoo12356/ocr_app:v2.2

2️⃣ Run the Docker Container

docker run -p 8501:8501 riokomoo12356/ocr_app:v2.2

3️⃣ Access the App

Open http://localhost:8501 in a web browser.
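
If the page does not load, a quick check from Python can tell whether anything is answering on the published port. This is a small sketch using only the standard library; the function name `app_is_up` is hypothetical, and the default URL simply mirrors the address above.

```python
import http.client
import urllib.error
import urllib.request


def app_is_up(url: str = "http://localhost:8501", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-2xx response all count as "not up".
        return False


if __name__ == "__main__":
    print("App reachable:", app_is_up())
```

A `False` result usually means the container is not running or the port mapping (`-p 8501:8501`) was left out.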

📜 Dependencies

This app uses:
• streamlit
• pymupdf (PyMuPDF)
• unstructured

These dependencies are installed inside the Docker container.
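
When running the app outside Docker, it can help to confirm that the three libraries are importable first. The sketch below is an assumption-laden helper, not part of the repository: the name `missing_packages` is hypothetical, and the mapping assumes PyMuPDF exposes the import name `fitz`.

```python
import importlib.util

# Package name -> import name (assumed; PyMuPDF is commonly imported as `fitz`).
REQUIRED = {
    "streamlit": "streamlit",
    "pymupdf": "fitz",
    "unstructured": "unstructured",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```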

Cloud Access

Use this link to access the web version of the app: https://pdfextracto.streamlit.app/
