AMCE (Automated Math Content Extractor) is a web application that extracts Persian text and mathematical formulas from PDF documents using AI vision models. It converts PDFs into structured, editable content, with formulas stored as LaTeX and rendered by KaTeX.
The project is built around the following pipeline:
```
📄 PDF Document
↓
🖼️ Image Conversion (pdf-to-img)
↓
🤖 AI Vision Processing (OpenRouter/GPT-4o/Claude)
↓
📝 Structured JSON Output (Persian Text + LaTeX)
↓
💾 MongoDB Storage
↓
✨ Beautiful RTL Rendering (React + KaTeX)
```
- PDF → Image: Converts each PDF page into a high-resolution image (2x scale) to preserve visual context, mathematical notation, and Persian typography
- Image → AI Processing: Vision models (GPT-4o/Claude) analyze each page image for:
  - Persian/Arabic text recognition
  - Mathematical formula detection
  - Layout understanding
  - Mixed content (text + math) parsing
- AI → LaTeX: Extracts content into structured JSON with:
  - Double-escaped LaTeX (`\\frac`) for proper JSON encoding
  - Markdown formatting preservation
  - Multiple-choice question parsing
  - Difficulty estimation
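Why the doubling matters: in JSON, `\f`, `\n`, `\t`, `\b`, and `\r` are valid escape sequences, so a single-escaped `\frac`, `\nabla`, `\times`, `\beta`, or `\right` parses without any error but silently becomes a control character, while `\sqrt` or `\underline` make `JSON.parse` throw. The short sketch below is plain TypeScript, not project code; it parses the same fraction written both ways.

```typescript
// Shows why LaTeX backslashes must be doubled inside JSON strings.
// String.raw keeps the backslashes exactly as a model would emit them.
const singleEscaped = String.raw`{"content": "$\frac{1}{2}$"}`;
const doubleEscaped = String.raw`{"content": "$\\frac{1}{2}$"}`;

// Extracts the `content` field from a JSON object string.
function parseContent(jsonText: string): string {
  return (JSON.parse(jsonText) as { content: string }).content;
}

const broken = parseContent(singleEscaped);
const correct = parseContent(doubleEscaped);

// "\f" was read as the JSON form-feed escape (char code 12), so the backslash is gone.
console.log(broken.charCodeAt(1)); // 12
// "\\f" was read as an escaped backslash followed by "f", leaving valid LaTeX.
console.log(correct); // $\frac{1}{2}$
```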
- Persian/Farsi Support: Full RTL text rendering with Vazirmatn font
- LaTeX Math Rendering: Formula display using KaTeX
- Mixed Content: Seamlessly handles Persian prose with embedded math formulas
- High Accuracy: >95% transcription accuracy for both text and formulas
- Multiple-Choice Parsing: Automatically separates question stems from answer options
- JSON Schema: Validated output using Zod for type safety
- MongoDB Storage: Persistent, searchable database of extracted content
- RTL Layout: Native right-to-left support for Persian content
- Live Editing: Inline editing of extracted problems and options
- Beautiful Design: Modern gradient-based UI with glassmorphism effects
- Responsive: Works on desktop and mobile devices
- Type-Safe: Full TypeScript implementation
- API Routes: RESTful API for document management
- Error Handling: Graceful fallbacks and user-friendly error messages
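The project validates model output with Zod (its schemas live in `src/lib/validation.ts`). As a dependency-free illustration of the same idea, the hypothetical `isExtractionResult` type guard below checks the shape used in the README's example output: `document_title`, plus `problems` with `id`, `content`, optional `options`, and `difficulty_estimation`. Treating `options` as optional follows the "if present" wording in the extraction steps; keeping `difficulty_estimation` as a free string is an assumption, and this is not the project's actual schema.

```typescript
// Hypothetical dependency-free stand-in for the project's Zod schema.
// Field names follow the README's example output; optionality is an assumption.
interface Option {
  label: string;
  text: string;
}

interface Problem {
  id: number;
  content: string; // Persian text with embedded LaTeX
  options?: Option[]; // only present for multiple-choice problems
  difficulty_estimation: string;
}

interface ExtractionResult {
  document_title: string;
  problems: Problem[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isOption(value: unknown): value is Option {
  return isRecord(value) && typeof value.label === "string" && typeof value.text === "string";
}

function isProblem(value: unknown): value is Problem {
  if (!isRecord(value)) return false;
  const options = value.options;
  const optionsOk = options === undefined || (Array.isArray(options) && options.every(isOption));
  return (
    typeof value.id === "number" &&
    typeof value.content === "string" &&
    typeof value.difficulty_estimation === "string" &&
    optionsOk
  );
}

// Type guard for the whole model response after JSON.parse.
function isExtractionResult(value: unknown): value is ExtractionResult {
  if (!isRecord(value)) return false;
  const problems = value.problems;
  return (
    typeof value.document_title === "string" &&
    Array.isArray(problems) &&
    problems.every(isProblem)
  );
}
```

A Zod `safeParse` call makes the same accept/reject decision and additionally reports the path of each invalid field, which is the main reason to prefer a schema library over hand-written guards.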
- Node.js 20 or later
- MongoDB Atlas account (or a local MongoDB instance)
- OpenRouter API key (created from your OpenRouter account)
- Clone the repository
  ```bash
  git clone https://github.com/yourusername/persian-math-ocr.git
  cd persian-math-ocr
  ```
- Install dependencies
  ```bash
  npm install
  ```
- Configure environment variables

  Create a `.env.local` file:
  ```bash
  MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/database
  OPENROUTER_API_KEY=sk-or-v1-your-api-key-here
  OPENROUTER_MODEL=openai/gpt-4o
  NEXT_PUBLIC_APP_URL=http://localhost:3000
  ```
  Recommended models:
  - `openai/gpt-4o`: best for vision/OCR accuracy
  - `anthropic/claude-3.5-sonnet`: excellent for structured output
  - `google/gemini-pro-1.5`: good balance of speed and accuracy
- Run the development server
  ```bash
  npm run dev
  ```
- Open your browser and navigate to http://localhost:3000
- Upload PDF: Drag and drop or select a PDF file containing Persian math problems
- Image Conversion: Each page is converted to a high-resolution JPEG image (2x scale)
- AI Processing: Images are sent to OpenRouter's vision API with a specialized prompt
- Extraction: The AI extracts:
  - Document title
  - Problem statements (Persian text + LaTeX formulas)
  - Multiple-choice options (if present)
  - Difficulty estimation
- Validation: Output is validated using Zod schema
- Storage: Structured data is saved to MongoDB
- Rendering: Beautiful RTL interface displays content with proper math rendering
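For the AI Processing step: OpenRouter exposes an OpenAI-compatible chat-completions endpoint (`https://openrouter.ai/api/v1/chat/completions`) that accepts images as `image_url` content parts, including base64 data URLs. The sketch below only builds the request for one page; the `buildPageRequest` name, the prompt wording, and sending one request per page are assumptions, not the code in `src/lib/llm.ts`. Passing `NEXT_PUBLIC_APP_URL` as the `HTTP-Referer` header follows the README's note that this variable is used for OpenRouter headers.

```typescript
// Hypothetical helper: builds an OpenRouter chat-completions request for one page image.
// The prompt text and one-request-per-page design are illustrative assumptions.
interface OpenRouterRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

function buildPageRequest(
  pageJpeg: Uint8Array,
  apiKey: string,
  model: string,
  appUrl?: string,
): OpenRouterRequest {
  // Vision models accept images inline as base64 data URLs (Buffer is Node-only).
  const dataUrl = `data:image/jpeg;base64,${Buffer.from(pageJpeg).toString("base64")}`;

  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Optional attribution header identifying the calling app to OpenRouter.
  if (appUrl) headers["HTTP-Referer"] = appUrl;

  const body = {
    model,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              "Transcribe this page as JSON with document_title and problems. " +
              "Keep Persian text verbatim and write math as LaTeX with doubled backslashes.",
          },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };

  return { url: OPENROUTER_URL, headers, body: JSON.stringify(body) };
}
```

On the server, the request would be sent with `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`; in the OpenAI-compatible response format the model's text is at `choices[0].message.content`, which is what then goes to `JSON.parse` and validation.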
Example output:

```json
{
  "document_title": "بانک تست فیزیک ۱۰",
  "problems": [
    {
      "id": 1,
      "content": "درون کره‌ای آهنی به چگالی $\\frac{8}{cm^3}$ حفره‌ای به حجم $2000cm^3$ وجود دارد...",
      "options": [
        { "label": "A", "text": "پاسخ اول" },
        { "label": "B", "text": "پاسخ دوم" }
      ],
      "difficulty_estimation": "Medium"
    }
  ]
}
```

| Category | Technology |
|---|---|
| Framework | Next.js 15 (App Router) |
| Language | TypeScript |
| Database | MongoDB + Mongoose |
| Styling | Tailwind CSS v4 |
| AI/OCR | OpenRouter API (GPT-4o, Claude, Gemini) |
| PDF Processing | pdf-to-img |
| Math Rendering | KaTeX + react-markdown + rehype-katex |
| Validation | Zod |
| UI Components | Lucide React Icons |
| Notifications | Sonner |
```
persian-math-ocr/
├── src/
│ ├── app/
│ │ ├── api/
│ │ │ ├── upload/ # PDF upload endpoint
│ │ │ └── documents/ # Document CRUD endpoints
│ │ ├── documents/[id]/ # Document detail view
│ │ ├── layout.tsx # Root layout with RTL support
│ │ ├── page.tsx # Dashboard
│ │ └── globals.css # Tailwind v4 styles
│ ├── components/
│ │ ├── FileUpload.tsx # Drag-and-drop upload
│ │ └── ProblemCard.tsx # Problem display & editing
│ ├── lib/
│ │ ├── pdf-processor.ts # PDF → Image conversion
│ │ ├── llm.ts # OpenRouter AI integration
│ │ ├── validation.ts # Zod schemas
│ │ └── db.ts # MongoDB connection
│ ├── models/
│ │ └── Document.ts # Mongoose schema
│ └── types/
│ └── index.ts # TypeScript types
├── public/ # Static assets
├── .env.local # Environment variables (not in git)
└── README.md # This file
```
- RTL Layout: Automatic right-to-left text direction
- Vazirmatn Font: Beautiful Persian typography
- UTF-8 Encoding: Perfect character preservation
- BiDi Handling: Math formulas remain LTR within RTL text
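One common way to keep a formula left-to-right inside an RTL paragraph is to wrap it in the Unicode directional isolates LRI (U+2066) and PDI (U+2069); in rendered HTML the equivalent is an element with `dir="ltr"`. The README does not say which mechanism AMCE uses, so the `isolateMath` helper below is a hypothetical sketch that assumes math is delimited by `$...$` or `$$...$$`. It suits plain-text output (titles, exports); strings passed to the Markdown/KaTeX renderer are better left untouched so the math delimiters are still recognized.

```typescript
// Hypothetical sketch: keep LaTeX spans LTR inside RTL (Persian) text by wrapping
// each $...$ or $$...$$ span in Unicode directional isolates.
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Block math ($$...$$) is tried before inline math ($...$) so "$$" is not split in two.
const MATH_SPAN = /\$\$[\s\S]+?\$\$|\$[^$\n]+?\$/g;

function isolateMath(text: string): string {
  return text.replace(MATH_SPAN, (span) => `${LRI}${span}${PDI}`);
}
```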
- Inline Math: `$x^2 + y^2 = z^2$`
- Block Math: `$$\int_0^\infty e^{-x^2} dx = \frac{\sqrt{\pi}}{2}$$`
- Double Escaping: Automatic `\\frac` handling for JSON
- Error Boundaries: Graceful fallback for malformed LaTeX
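By default, KaTeX's `katex.renderToString` throws a `ParseError` on malformed LaTeX (passing `throwOnError: false` makes it render the error inline instead). A graceful fallback can catch the exception and show the raw source. The hypothetical `renderMathSafely` below takes the renderer as a parameter so it can run without KaTeX installed; in the app you would pass a function that calls `katex.renderToString`. How AMCE's own error boundaries are implemented is not described here.

```typescript
// Hypothetical sketch of a graceful fallback for malformed LaTeX.
// `render` stands in for a KaTeX call such as
//   (tex, displayMode) => katex.renderToString(tex, { displayMode })
type MathRenderer = (tex: string, displayMode: boolean) => string;

// Escapes characters that would otherwise be interpreted as HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderMathSafely(tex: string, displayMode: boolean, render: MathRenderer): string {
  try {
    return render(tex, displayMode);
  } catch {
    // Show the original source (left-to-right) so the user can fix it in the editor.
    return `<code class="math-error" dir="ltr">${escapeHtml(tex)}</code>`;
  }
}
```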
- Inline Editing: Click edit button to modify content
- Live Preview: See changes in real-time
- Option Management: Add/edit/remove multiple-choice options
- Save to Database: Persistent storage of edits
Upload and process a PDF file.

Request:
- Content-Type: `multipart/form-data`
- `file`: PDF file

Response:
```json
{
  "_id": "...",
  "document_title": "...",
  "problems": [...],
  "createdAt": "...",
  "updatedAt": "..."
}
```

List all processed documents.

Get a specific document with all problems.

Update document (e.g., edited problems).
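A minimal client-side sketch of calling the upload endpoint with `FormData`, using the `file` field described above. The `/api/upload` path is inferred from the `src/app/api/upload/` route folder and the `POST` method is an assumption; the `uploadPdf` name and the injectable `fetchImpl` parameter are hypothetical (the parameter only exists so the function can be exercised without a running server).

```typescript
// Hypothetical client for the upload endpoint. Assumes POST /api/upload with a
// multipart field named "file"; the response shape follows the README.
interface UploadedDocument {
  _id: string;
  document_title: string;
  problems: unknown[];
  createdAt: string;
  updatedAt: string;
}

type FetchLike = (url: string, init: { method: string; body: FormData }) => Promise<Response>;

async function uploadPdf(
  pdf: Blob,
  fileName: string,
  baseUrl = "",
  fetchImpl: FetchLike = fetch,
): Promise<UploadedDocument> {
  const form = new FormData();
  // fetch sets the multipart Content-Type (including the boundary) automatically.
  form.append("file", pdf, fileName);

  const res = await fetchImpl(`${baseUrl}/api/upload`, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Upload failed with HTTP ${res.status}`);
  }
  return (await res.json()) as UploadedDocument;
}
```

In the browser, `baseUrl` can stay empty so the request goes to the same origin as the app.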
- Educational Institutions: Convert Persian math textbooks to digital format
- Content Management: Build searchable question banks
- E-Learning Platforms: Import problems from PDF worksheets
- Research: Extract mathematical content for analysis
- Publishers: Digitize legacy Persian math publications
| Variable | Description | Required |
|---|---|---|
| `MONGODB_URI` | MongoDB connection string | ✅ Yes |
| `OPENROUTER_API_KEY` | OpenRouter API key | ✅ Yes |
| `OPENROUTER_MODEL` | Model to use (default: `openai/gpt-4o`) | ❌ No |
| `NEXT_PUBLIC_APP_URL` | App URL for OpenRouter headers | ❌ No |
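A sketch of reading these variables at startup, failing fast when a required one is missing and falling back to `openai/gpt-4o` for `OPENROUTER_MODEL`. The `loadConfig` name and the returned object shape are hypothetical; the variable names, required flags, and default come from the table above.

```typescript
// Hypothetical config loader for the environment variables in the table.
interface AppConfig {
  mongodbUri: string;
  openRouterApiKey: string;
  openRouterModel: string;
  appUrl?: string;
}

type Env = Record<string, string | undefined>;

// Returns the trimmed value or throws a descriptive error if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable ${name} (set it in .env.local)`);
  }
  return value;
}

function loadConfig(env: Env = process.env): AppConfig {
  return {
    mongodbUri: requireVar(env, "MONGODB_URI"),
    openRouterApiKey: requireVar(env, "OPENROUTER_API_KEY"),
    // Optional: falls back to the default model listed in the table.
    openRouterModel: env.OPENROUTER_MODEL?.trim() || "openai/gpt-4o",
    appUrl: env.NEXT_PUBLIC_APP_URL?.trim() || undefined,
  };
}
```

Call a loader like this only from server code (API routes, `src/lib/*`): Next.js exposes only `NEXT_PUBLIC_`-prefixed variables to the browser, so the MongoDB URI and API key are not available client-side.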
If Tailwind styles aren't appearing, make sure you're using Tailwind CSS v4 syntax:
```css
@import "tailwindcss";
```

If PDF processing fails:
- Ensure `pdf-to-img` dependencies are installed
- Check that PDF files are not corrupted
- Verify that your OpenRouter API key is valid

If the MongoDB connection fails:
- Check your connection string format
- Ensure the IP whitelist includes your server IP
- Verify database credentials
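For the connection-string check above, a quick pre-flight test can catch the most common format mistakes before Mongoose tries to connect: a missing `mongodb://` or `mongodb+srv://` scheme, and a port or multiple hosts on an SRV URI (SRV connection strings take exactly one hostname and no port). `checkMongoUri` is a hypothetical helper and deliberately not a full parser; it assumes special characters in the username or password are already percent-encoded.

```typescript
// Hypothetical pre-flight check for MONGODB_URI; catches common format mistakes only.
function checkMongoUri(uri: string): string[] {
  const match = /^(mongodb(?:\+srv)?):\/\/(.+)$/.exec(uri.trim());
  if (!match) {
    return ["URI must start with mongodb:// or mongodb+srv://"];
  }
  const [, scheme, rest] = match;
  const problems: string[] = [];

  // The host list sits between the optional credentials ("user:pass@") and the first "/" or "?".
  const authority = rest.split(/[\/?]/, 1)[0];
  const hosts = authority.slice(authority.lastIndexOf("@") + 1);

  if (hosts.length === 0) {
    problems.push("URI has no host");
  }
  if (scheme === "mongodb+srv" && (hosts.includes(":") || hosts.includes(","))) {
    problems.push("mongodb+srv URIs take a single hostname without a port");
  }
  return problems;
}
```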
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenRouter for providing access to multiple AI models
- Mozilla PDF.js for PDF processing capabilities
- KaTeX for beautiful math rendering
- Next.js team for the amazing framework
For questions, issues, or suggestions, please open an issue on GitHub.
Made with ❤️ for the Persian-speaking educational community
⭐ Star this repo if you find it useful!