
a-meraji/persian-math-ocr


Persian Math OCR - Automated Math Content Extractor (AMCE)

Intelligent extraction of Persian math content from PDF files

Next.js TypeScript MongoDB OpenRouter


🎯 What is AMCE?

AMCE (Automated Math Content Extractor) is a web application that extracts Persian text and mathematical formulas from PDF documents using AI vision models. It converts PDFs into structured, editable content, with formulas stored as LaTeX and rendered with KaTeX.

Core Workflow: PDF → Image → LaTeX

The project is built around the following pipeline:

📄 PDF Document
    ↓
🖼️  Image Conversion (pdf-to-img)
    ↓
🤖 AI Vision Processing (OpenRouter/GPT-4o/Claude)
    ↓
📝 Structured JSON Output (Persian Text + LaTeX)
    ↓
💾 MongoDB Storage
    ↓
✨ Beautiful RTL Rendering (React + KaTeX)
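
The 2x scale used for image conversion determines the pixel dimensions of each page image. The sketch below works through that arithmetic. It assumes the renderer maps one PDF point (1/72 inch) to one pixel at scale 1, as PDF.js viewports do. renderedSize is a hypothetical helper and is not part of pdf-to-img.

```typescript
// Hypothetical helper: estimate the pixel size of a rendered PDF page.
// Assumption: 1 PDF point (1/72 inch) becomes 1 pixel at scale 1.
interface PixelSize {
  width: number;
  height: number;
}

function renderedSize(widthPt: number, heightPt: number, scale: number): PixelSize {
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// An A4 page measures roughly 595 x 842 points.
const a4AtScale2 = renderedSize(595, 842, 2);
console.log(`${a4AtScale2.width}x${a4AtScale2.height}`); // 1190x1684

// Effective resolution: 72 points per inch times the scale factor.
const effectiveDpi = 72 * 2;
console.log(effectiveDpi); // 144
```

Both dimensions double, so a 2x render has four times the pixels of a 1x render. This is the trade-off between legible subscripts and the size of each image sent to the vision model.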

Why This Approach?

  1. PDF → Image: Converts each PDF page into high-resolution images (2x scale) to preserve visual context, mathematical notation, and Persian typography
  2. Image → AI Processing: Vision models (GPT-4o/Claude) read the rendered page images directly, which covers:
    • Persian/Arabic text recognition
    • Mathematical formula detection
    • Layout understanding
    • Mixed content (text + math) parsing
  3. AI → LaTeX: Extracts content into structured JSON with:
    • Double-escaped LaTeX (\\frac for proper JSON encoding)
    • Markdown formatting preservation
    • Multiple-choice question parsing
    • Difficulty estimation
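
You can check the double-escaping rule with plain JSON.stringify and JSON.parse. Inside a JSON string, a LaTeX command's single backslash must be written as \\. With only one backslash, JSON reads \f (in \frac), \t (in \times), \n (in \nabla), \b (in \beta) and \r (in \rho) as control-character escapes. The sample strings below are illustrative.

```typescript
// A LaTeX fragment as KaTeX expects it: one backslash per command.
// (In TypeScript source, "\\" inside a string literal is a single backslash.)
const latex = "$\\frac{a}{b}$";
console.log(latex); // $\frac{a}{b}$

// Serialized to JSON, the backslash is escaped again: \frac becomes \\frac.
const json = JSON.stringify({ content: latex });
console.log(json); // {"content":"$\\frac{a}{b}$"}

// Parsing the double-escaped JSON restores the original string.
const parsed: { content: string } = JSON.parse(json);
console.log(parsed.content === latex); // true

// A model that emits only one backslash produces JSON where \f is the
// form-feed escape, so "\frac" silently turns into a form feed plus "rac".
const broken: { content: string } = JSON.parse('{"content":"$\\frac{a}{b}$"}');
console.log(broken.content.charCodeAt(1)); // 12 (form feed)
```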

✨ Key Features

🎨 Intelligent Extraction

  • Persian/Farsi Support: Full RTL text rendering with Vazirmatn font
  • LaTeX Math Rendering: Perfect formula display using KaTeX
  • Mixed Content: Seamlessly handles Persian prose with embedded math formulas
  • High Accuracy: >95% transcription accuracy for both text and formulas

📊 Structured Data Output

  • Multiple-Choice Parsing: Automatically separates question stems from answer options
  • JSON Schema: Validated output using Zod for type safety
  • MongoDB Storage: Persistent, searchable database of extracted content

🖥️ Modern UI/UX

  • RTL Layout: Native right-to-left support for Persian content
  • Live Editing: Inline editing of extracted problems and options
  • Beautiful Design: Modern gradient-based UI with glassmorphism effects
  • Responsive: Works perfectly on desktop and mobile devices

🔧 Developer-Friendly

  • Type-Safe: Full TypeScript implementation
  • API Routes: RESTful API for document management
  • Error Handling: Graceful fallbacks and user-friendly error messages

🚀 Quick Start

Prerequisites

  • Node.js 20 or later
  • MongoDB Atlas account (or local MongoDB)
  • OpenRouter API key (create one in your OpenRouter account)

Installation

  1. Clone the repository

    git clone https://github.com/a-meraji/persian-math-ocr.git
    cd persian-math-ocr
  2. Install dependencies

    npm install
  3. Configure environment variables

    Create a .env.local file in the project root:

    MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/database
    OPENROUTER_API_KEY=sk-or-v1-your-api-key-here
    OPENROUTER_MODEL=openai/gpt-4o
    NEXT_PUBLIC_APP_URL=http://localhost:3000

    Recommended Models:

    • openai/gpt-4o - Best for vision/OCR accuracy
    • anthropic/claude-3.5-sonnet - Excellent for structured output
    • google/gemini-pro-1.5 - Good balance of speed and accuracy
  4. Run development server

    npm run dev
  5. Open your browser

    Navigate to http://localhost:3000


📖 How It Works

Step-by-Step Process

  1. Upload PDF: Drag and drop or select a PDF file containing Persian math problems
  2. Image Conversion: Each page is converted to a high-resolution JPEG image (2x scale)
  3. AI Processing: Images are sent to OpenRouter's vision API with a specialized prompt
  4. Extraction: The AI extracts:
    • Document title
    • Problem statements (Persian text + LaTeX formulas)
    • Multiple-choice options (if present)
    • Difficulty estimation
  5. Validation: Output is validated using Zod schema
  6. Storage: Structured data is saved to MongoDB
  7. Rendering: Beautiful RTL interface displays content with proper math rendering
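
Below is a minimal sketch of step 3. It assumes OpenRouter's OpenAI-compatible chat completions endpoint, with page images passed as base64 JPEG data URLs. The function name, the prompt text and the choice to send every page in one message are assumptions made for illustration. The project's real request is built in src/lib/llm.ts and may differ. The sketch only builds the request; sending it takes a single fetch call.

```typescript
// Hypothetical request builder for OpenRouter's chat completions endpoint.
// Assumes each page is already a base64-encoded JPEG.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    messages: { role: "user"; content: ContentPart[] }[];
  };
}

function buildVisionRequest(
  apiKey: string,
  model: string,
  prompt: string,
  pagesBase64: string[],
  appUrl?: string,
): VisionRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Optional attribution header (NEXT_PUBLIC_APP_URL is used for OpenRouter headers).
  if (appUrl) headers["HTTP-Referer"] = appUrl;

  // One text part with the instructions, then one image part per page.
  const content: ContentPart[] = [
    { type: "text", text: prompt },
    ...pagesBase64.map((b64): ContentPart => ({
      type: "image_url",
      image_url: { url: `data:image/jpeg;base64,${b64}` },
    })),
  ];

  return {
    url: OPENROUTER_URL,
    headers,
    body: { model, messages: [{ role: "user", content }] },
  };
}

// Placeholder values; to send:
// fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })
const req = buildVisionRequest(
  "sk-or-v1-placeholder",
  "openai/gpt-4o",
  "Extract the problems on these pages as JSON.", // hypothetical prompt
  ["AAAA", "BBBB"],
  "http://localhost:3000",
);
console.log(req.body.messages[0].content.length); // 3
```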

Example Output Structure

{
  "document_title": "بانک تست فیزیک ۱۰",
  "problems": [
    {
      "id": 1,
      "content": "درون کره‌ای آهنی به چگالی $\\frac{8}{cm^3}$ حفره‌ای به حجم $2000cm^3$ وجود دارد...",
      "options": [
        { "label": "A", "text": "پاسخ اول" },
        { "label": "B", "text": "پاسخ دوم" }
      ],
      "difficulty_estimation": "Medium"
    }
  ]
}
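
The project validates this shape with Zod (src/lib/validation.ts). The sketch below swaps in a dependency-free version of the same idea. It strips an optional Markdown code fence, which models sometimes wrap around their answer, then parses the text and checks it against the structure above. The type names and the fence-stripping step are assumptions; this is not the project's code.

```typescript
// Dependency-free stand-in for the project's Zod schema.
interface ChoiceOption {
  label: string;
  text: string;
}

interface Problem {
  id: number;
  content: string; // Persian text with $...$ LaTeX
  options?: ChoiceOption[];
  difficulty_estimation?: string;
}

interface ExtractionResult {
  document_title: string;
  problems: Problem[];
}

// Built with repeat() so this listing never contains a literal fence line.
const FENCE = "`".repeat(3);

// Assumption: the model may wrap its JSON in a Markdown code fence; drop it if present.
function stripFence(text: string): string {
  const t = text.trim();
  if (!t.startsWith(FENCE) || !t.endsWith(FENCE)) return t;
  const firstNewline = t.indexOf("\n");
  return t.slice(firstNewline + 1, t.length - FENCE.length).trim();
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isChoiceOption(v: unknown): v is ChoiceOption {
  return isRecord(v) && typeof v.label === "string" && typeof v.text === "string";
}

function isProblem(v: unknown): v is Problem {
  return (
    isRecord(v) &&
    typeof v.id === "number" &&
    typeof v.content === "string" &&
    (v.options === undefined || (Array.isArray(v.options) && v.options.every(isChoiceOption))) &&
    (v.difficulty_estimation === undefined || typeof v.difficulty_estimation === "string")
  );
}

function parseExtraction(raw: string): ExtractionResult {
  const data: unknown = JSON.parse(stripFence(raw));
  if (!isRecord(data)) throw new Error("Expected a JSON object");
  if (typeof data.document_title !== "string") throw new Error("document_title must be a string");
  if (!Array.isArray(data.problems) || !data.problems.every(isProblem)) {
    throw new Error("problems must be an array of valid problems");
  }
  return { document_title: data.document_title, problems: data.problems };
}

const sample = `${FENCE}json\n${JSON.stringify({
  document_title: "Sample",
  problems: [{ id: 1, content: "$x^2$", options: [{ label: "A", text: "1" }] }],
})}\n${FENCE}`;
const result = parseExtraction(sample);
console.log(result.problems.length); // 1
```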

🛠️ Tech Stack

  • Framework: Next.js 15 (App Router)
  • Language: TypeScript
  • Database: MongoDB + Mongoose
  • Styling: Tailwind CSS v4
  • AI/OCR: OpenRouter API (GPT-4o, Claude, Gemini)
  • PDF Processing: pdf-to-img
  • Math Rendering: KaTeX + react-markdown + rehype-katex
  • Validation: Zod
  • UI Components: Lucide React Icons
  • Notifications: Sonner

📁 Project Structure

persian-math-ocr/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── upload/          # PDF upload endpoint
│   │   │   └── documents/       # Document CRUD endpoints
│   │   ├── documents/[id]/      # Document detail view
│   │   ├── layout.tsx           # Root layout with RTL support
│   │   ├── page.tsx             # Dashboard
│   │   └── globals.css          # Tailwind v4 styles
│   ├── components/
│   │   ├── FileUpload.tsx       # Drag-and-drop upload
│   │   └── ProblemCard.tsx      # Problem display & editing
│   ├── lib/
│   │   ├── pdf-processor.ts     # PDF → Image conversion
│   │   ├── llm.ts               # OpenRouter AI integration
│   │   ├── validation.ts        # Zod schemas
│   │   └── db.ts                # MongoDB connection
│   ├── models/
│   │   └── Document.ts          # Mongoose schema
│   └── types/
│       └── index.ts             # TypeScript types
├── public/                      # Static assets
├── .env.local                   # Environment variables (not in git)
└── README.md                    # This file

🎨 Features in Detail

Persian Text Support

  • RTL Layout: Automatic right-to-left text direction
  • Vazirmatn Font: Beautiful Persian typography
  • UTF-8 Encoding: Perfect character preservation
  • BiDi Handling: Math formulas remain LTR within RTL text
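
One way to keep a formula left-to-right inside Persian text is to wrap it in the Unicode directional isolates U+2066 (LEFT-TO-RIGHT ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE). The sketch below shows that mechanism for plain strings. isolateMath is a hypothetical helper. The app itself may handle direction through dir attributes in its React components instead; in HTML the equivalent is a bdi element or dir="ltr" on the formula's wrapper.

```typescript
// Unicode directional isolates from the Unicode Bidirectional Algorithm (UAX #9).
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical helper: wrap each $$...$$ or $...$ span in LRI ... PDI so the
// formula is laid out left-to-right inside a right-to-left paragraph.
function isolateMath(text: string): string {
  return text.replace(/\$\$[\s\S]+?\$\$|\$[^$]+\$/g, (math) => `${LRI}${math}${PDI}`);
}

// Persian sentence meaning "the density is rho = m/V".
const sample = "چگالی $\\rho = \\frac{m}{V}$ است";
const isolated = isolateMath(sample);
console.log(isolated === `چگالی ${LRI}$\\rho = \\frac{m}{V}$${PDI} است`); // true
```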

LaTeX Math Rendering

  • Inline Math: $x^2 + y^2 = z^2$
  • Block Math: $$\int_0^\infty e^{-x^2} dx = \frac{\sqrt{\pi}}{2}$$
  • Double Escaping: Automatic \\frac handling for JSON
  • Error Boundaries: Graceful fallback for malformed LaTeX
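
The project renders math through react-markdown and rehype-katex. To illustrate the $...$ / $$...$$ convention used in the examples above, here is a small hand-written tokenizer that splits mixed content into prose, inline math and block math. It is a sketch, not the pipeline the app uses, and segmentMath is a hypothetical name.

```typescript
type Segment =
  | { kind: "text"; value: string }
  | { kind: "inline"; latex: string }
  | { kind: "block"; latex: string };

// Hypothetical tokenizer: $$...$$ is block math, $...$ is inline math, the rest is prose.
// Simplification: escaped \$ signs and currency amounts are not handled.
function segmentMath(input: string): Segment[] {
  const segments: Segment[] = [];
  const pattern = /\$\$([\s\S]+?)\$\$|\$([^$]+)\$/g;
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    if (match.index > last) {
      segments.push({ kind: "text", value: input.slice(last, match.index) });
    }
    if (match[1] !== undefined) {
      segments.push({ kind: "block", latex: match[1].trim() });
    } else {
      segments.push({ kind: "inline", latex: match[2] });
    }
    last = match.index + match[0].length;
  }
  if (last < input.length) {
    segments.push({ kind: "text", value: input.slice(last) });
  }
  return segments;
}

const parts = segmentMath("Inline $x^2 + y^2 = z^2$ and block $$\\int_0^\\infty e^{-x^2} dx$$.");
console.log(parts.map((p) => p.kind).join(",")); // text,inline,text,block,text
```

The $$...$$ pattern is tried before $...$ in the same alternation. Without that ordering, a display formula would be read as an inline formula wrapped in stray dollar signs.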

Editing Interface

  • Inline Editing: Click edit button to modify content
  • Live Preview: See changes in real-time
  • Option Management: Add/edit/remove multiple-choice options
  • Save to Database: Persistent storage of edits

🔧 API Endpoints

POST /api/upload

Upload and process a PDF file.

Request:

  • Content-Type: multipart/form-data
  • file: PDF file

Response:

{
  "_id": "...",
  "document_title": "...",
  "problems": [...],
  "createdAt": "...",
  "updatedAt": "..."
}

GET /api/documents

List all processed documents.

GET /api/documents/[id]

Get a specific document with all problems.

PUT /api/documents/[id]

Update a document (e.g., to save edited problems).
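
Below is a client-side sketch of the upload and update calls. It uses the Fetch API's Request, FormData and Blob globals, which are available in browsers and Node.js 18+. The form field name file comes from the request description above. The base URL, the file name, the document id and the JSON shape of the PUT body are assumptions. The requests are only constructed here; pass them to fetch() to send them.

```typescript
// Hypothetical client helpers for the endpoints above; requests are built, not sent.
const BASE_URL = "http://localhost:3000"; // assumption: local development server

function buildUploadRequest(pdf: ArrayBuffer, fileName: string): Request {
  const form = new FormData();
  // The PDF goes in the "file" field of a multipart/form-data body.
  form.append("file", new Blob([pdf], { type: "application/pdf" }), fileName);
  // No Content-Type header here: the runtime sets multipart/form-data with a boundary.
  return new Request(`${BASE_URL}/api/upload`, { method: "POST", body: form });
}

// Assumption: PUT accepts a JSON body with the fields to update, e.g. { problems }.
function buildUpdateRequest(id: string, changes: Record<string, unknown>): Request {
  return new Request(`${BASE_URL}/api/documents/${encodeURIComponent(id)}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(changes),
  });
}

const upload = buildUploadRequest(new ArrayBuffer(8), "physics-10.pdf");
const update = buildUpdateRequest("doc123", { problems: [] });
console.log(upload.method, update.method); // POST PUT
// To send: const res = await fetch(upload); const doc = await res.json();
```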


🎯 Use Cases

  • Educational Institutions: Convert Persian math textbooks to digital format
  • Content Management: Build searchable question banks
  • E-Learning Platforms: Import problems from PDF worksheets
  • Research: Extract mathematical content for analysis
  • Publishers: Digitize legacy Persian math publications

📝 Environment Variables

  • MONGODB_URI (required): MongoDB connection string
  • OPENROUTER_API_KEY (required): OpenRouter API key
  • OPENROUTER_MODEL (optional): Model to use (default: openai/gpt-4o)
  • NEXT_PUBLIC_APP_URL (optional): App URL for OpenRouter headers
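
The variables above can be read by a small loader that fails fast when a required value is missing and applies the documented default model. This is a sketch: loadConfig and the AppConfig shape are hypothetical, and the project may read process.env directly wherever each value is used.

```typescript
interface AppConfig {
  mongodbUri: string;
  openrouterApiKey: string;
  openrouterModel: string;
  appUrl?: string;
}

// Takes the environment as a plain record so it can be tested without process.env.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = ["MONGODB_URI", "OPENROUTER_API_KEY"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    mongodbUri: env.MONGODB_URI as string,
    openrouterApiKey: env.OPENROUTER_API_KEY as string,
    // Documented default when OPENROUTER_MODEL is unset or empty.
    openrouterModel: env.OPENROUTER_MODEL || "openai/gpt-4o",
    appUrl: env.NEXT_PUBLIC_APP_URL || undefined,
  };
}

// In a Next.js server module this would be called as loadConfig(process.env).
const config = loadConfig({
  MONGODB_URI: "mongodb://localhost:27017/amce", // placeholder
  OPENROUTER_API_KEY: "sk-or-v1-placeholder",
});
console.log(config.openrouterModel); // openai/gpt-4o
```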

🐛 Troubleshooting

Styles Not Loading

If Tailwind styles aren't appearing, make sure src/app/globals.css uses the Tailwind CSS v4 import syntax:

@import "tailwindcss";

PDF Processing Errors

  • Ensure pdf-to-img dependencies are installed
  • Check that PDF files are not corrupted
  • Verify OpenRouter API key is valid

MongoDB Connection Issues

  • Check your connection string format
  • Ensure IP whitelist includes your server IP
  • Verify database credentials
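
A quick pre-flight check on the connection string catches common format mistakes before the driver reports them. The check covers three things: the mongodb:// or mongodb+srv:// scheme, whether the username:password placeholder from the example .env.local has been replaced, and the rule that mongodb+srv:// strings must not include a port. checkMongoUri is a hypothetical helper and is not part of Mongoose.

```typescript
// Hypothetical pre-flight check; the MongoDB driver (via Mongoose) still does the real parsing.
function checkMongoUri(uri: string): string[] {
  const problems: string[] = [];
  if (!/^mongodb(\+srv)?:\/\//.test(uri)) {
    problems.push("URI must start with mongodb:// or mongodb+srv://");
  }
  if (uri.includes("username:password@")) {
    problems.push("Replace the username:password placeholder with real credentials");
  }
  if (uri.startsWith("mongodb+srv://")) {
    // Host part: between "://" and the first "/" or "?", after any credentials.
    const authority = uri.slice("mongodb+srv://".length).split(/[\/?]/)[0];
    const host = authority.slice(authority.lastIndexOf("@") + 1);
    // SRV connection strings get ports from DNS records and must not specify one.
    if (/:\d+$/.test(host)) {
      problems.push("mongodb+srv:// URIs must not include a port number");
    }
  }
  return problems;
}

console.log(checkMongoUri("mongodb+srv://alice:s3cret@cluster0.example.net/amce")); // []
console.log(checkMongoUri("mongodb+srv://username:password@cluster.mongodb.net/database").length); // 1
```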

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • OpenRouter for providing access to multiple AI models
  • Mozilla PDF.js for PDF processing capabilities
  • KaTeX for beautiful math rendering
  • Next.js team for the amazing framework

📧 Contact

For questions, issues, or suggestions, please open an issue on GitHub.


Made with ❤️ for the Persian-speaking educational community

⭐ Star this repo if you find it useful!
