This is a Flask application that matches resumes (PDF/Text) to job descriptions using Mistral AI.
- Install Dependencies:
    cd app
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
- Configure API Key: Edit app/.env and add your Mistral API Key:
    MISTRAL_API_KEY=your_actual_key_here
- Data: Ensure job_descriptions.csv is in the project root (../job_descriptions.csv relative to app/).
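As an illustration of the configuration step, here is a minimal sketch of how a key could be read from a .env file with a fallback to the process environment. The helper names `load_env_file` and `get_mistral_api_key` are hypothetical and are not taken from this project, which may well use a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_mistral_api_key(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("MISTRAL_API_KEY") or load_env_file(env_path).get("MISTRAL_API_KEY")
    # Treat the placeholder from the setup instructions as "not configured".
    if not key or key == "your_actual_key_here":
        raise RuntimeError("MISTRAL_API_KEY is not set; edit app/.env first")
    return key
```

Failing fast on the placeholder value is a design choice for this sketch: it surfaces a missing key at startup rather than on the first API call.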
cd app
python app.py

You can override the CSV and embeddings binary used by the app at runtime:

# Example: use a trimmed CSV and small embeddings
.venv/bin/python src/app.py --csv data/job_descriptions_small.csv --embeddings data/job_embeddings_small.bin --port 5001

The app will print startup messages showing which CSV and embeddings were loaded.
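The runtime overrides above could be parsed along the following lines. This is only a sketch using the standard library's argparse; the flag names come from the example command, while the default values and the `parse_args` helper are assumptions rather than the app's actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse the --csv, --embeddings, and --port overrides."""
    parser = argparse.ArgumentParser(description="Resume-to-job matching app")
    # Defaults below are assumptions for illustration only.
    parser.add_argument("--csv", default="data/job_descriptions.csv",
                        help="path to the job descriptions CSV")
    parser.add_argument("--embeddings", default=None,
                        help="path to a precomputed embeddings binary")
    parser.add_argument("--port", type=int, default=5000,
                        help="port for the Flask server")
    return parser.parse_args(argv)


# Usage: pass an explicit list to parse, or None to read sys.argv.
args = parse_args(["--csv", "data/job_descriptions_small.csv",
                   "--embeddings", "data/job_embeddings_small.bin",
                   "--port", "5001"])
print(f"Loading CSV: {args.csv}")
print(f"Loading embeddings: {args.embeddings}")
print(f"Serving on port {args.port}")
```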
If you have data/job-description-dataset.zip (the original large archive), you can extract just the first 50 rows into data/job_descriptions_small.csv for faster local testing:
cd project_root
.venv/bin/python src/extract_small_csv.py

Then, in src/app.py, change the CSV path to data/job_descriptions_small.csv if you want to run with the trimmed dataset. Alternatively, replace the large file with the small one (back up the original first).
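For readers curious how such a trimming step might work, the sketch below copies the header plus the first N data rows out of a zipped CSV without unpacking the whole archive. It is not the contents of src/extract_small_csv.py; the function name `extract_first_rows` is hypothetical, and it assumes the archive holds a UTF-8 CSV with a header row.

```python
import csv
import io
import zipfile


def extract_first_rows(zip_path, out_path, n_rows=50):
    """Write the header and the first n_rows data rows; return rows written."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in archive")
        # Stream the first CSV member instead of extracting it to disk.
        with archive.open(csv_names[0]) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            with open(out_path, "w", encoding="utf-8", newline="") as out:
                writer = csv.writer(out)
                written = 0
                for i, row in enumerate(reader):
                    if i > n_rows:  # row 0 is the header
                        break
                    writer.writerow(row)
                    written += 1
    # Do not count the header as a data row.
    return max(written - 1, 0)
```

Using `csv.reader` rather than splitting on newlines keeps quoted multi-line fields (common in job descriptions) intact.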
- Open http://localhost:5000 in your browser.
- Upload a resume (PDF or Text).
- The app will:
  - OCR the resume (if PDF) using Mistral OCR.
  - Extract structured data (JSON) using Mistral Large.
  - Generate embeddings for the resume.
  - Match against the top 20 jobs from the CSV (limit set for demo).
  - Return the parsed resume and top matches.
See docs/TESTING.md for a step-by-step guide (commands and example files) that reproduces the E2E testing we used to validate API endpoints, embeddings, and the matching logic. The doc includes fallback options if you hit Mistral rate limits and instructions for saving unique test artifacts under data/.
- Parsing: Mistral OCR + Mistral Large (Chat Completion) for JSON extraction.
- Matching: Mistral Embeddings + Cosine Similarity.
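To make the matching step concrete, here is a minimal pure-Python sketch of cosine-similarity ranking over embedding vectors. The function names and the `k` parameter are illustrative assumptions; the vectors would come from the Mistral embeddings API, which is not called here, and the real app may use NumPy or a different ranking routine.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no match
    return dot / (norm_a * norm_b)


def top_matches(resume_vec, job_vecs, k=5):
    """Rank jobs by similarity to the resume; return the best k as {index, score}."""
    scored = [
        {"index": i, "score": cosine_similarity(resume_vec, vec)}
        for i, vec in enumerate(job_vecs)
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors standing in for real embeddings.
matches = top_matches([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)
print(matches)
```

Because cosine similarity ignores vector length, a long and a short job description that point in the same direction score the same against a given resume.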
- POST /match
  - Upload a resume file (PDF or text) as file. Returns resume_analysis and top_matches, where top_matches is a list of {index, score} referencing rows in data/job_descriptions.csv.
- GET /jobs/
  - Returns a single job by its index (zero-based by CSV order).
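Since top_matches carries only row indices and scores, a client typically needs to resolve them back to job rows. The sketch below shows one way to do that locally against the CSV text. The `resolve_matches` helper and the sample column names are hypothetical, and it assumes index 0 refers to the first data row after the header.

```python
import csv
import io


def resolve_matches(match_response, csv_text):
    """Attach the CSV row for each {index, score} entry in top_matches."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    resolved = []
    for match in match_response["top_matches"]:
        idx = match["index"]
        # Ignore indices that fall outside the loaded CSV.
        if 0 <= idx < len(rows):
            resolved.append({"score": match["score"], "job": rows[idx]})
    return resolved


# Made-up data in the documented response shape.
sample_csv = "Job Title,Company\nData Engineer,Acme\nML Engineer,Globex\n"
sample_response = {
    "resume_analysis": {},
    "top_matches": [{"index": 1, "score": 0.91}, {"index": 0, "score": 0.42}],
}
for item in resolve_matches(sample_response, sample_csv):
    print(f'{item["score"]:.2f}  {item["job"]["Job Title"]}')
```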
These are the datasets and Mistral API docs referenced in the original project prompt and used while building/testing the system.
- Job listings dataset (Kaggle): https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset/data
- Resumes dataset (Hugging Face): https://huggingface.co/datasets/datasetmaster/resumes
- Mistral Document Annotations API (Document AI): https://docs.mistral.ai/capabilities/document_ai/annotations
- Mistral Text Embeddings API: https://docs.mistral.ai/capabilities/embeddings/text_embeddings
Please consult these links for dataset schema details and Mistral API usage when generating embeddings and parsing documents.