Autonomous AI Agent for HeadHunter (API) is a multi-stage data pipeline that automates parsing, scoring, and analyzing large volumes of unstructured job market data. It uses Google Gemini (an LLM) to make complex semantic decisions on top of custom engineering filters.
"Automating cognitive load: from unstructured HTML to AI-driven semantic decision making."
This project is structured as a data engineering pipeline. Each script is a self-contained stage that handles one step of data transformation and passes its output to the next stage, ending with the Large Language Model.
- Connects to the official HeadHunter REST API.
- Executes complex search queries based on hardcoded engineering filters (role, region, experience, schedule).
- Outputs a structured list of vacancy endpoints for the detail-extraction stage.
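
The search stage above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the project's actual code: the project uses requests, while the sketch uses the standard library so it runs anywhere. The filter values, the function names, the User-Agent string, and the page limit are all assumptions; the `items`, `url`, and `pages` response fields should be checked against the HeadHunter API documentation.

```python
import json
import urllib.parse
import urllib.request

HH_SEARCH_URL = "https://api.hh.ru/vacancies"

# Hypothetical filter values (role, region, experience, schedule).
# The real project hardcodes its own; verify the codes against the HH dictionaries.
SEARCH_FILTERS = {
    "text": "QA Automation Python",
    "area": 1,
    "experience": "between1And3",
    "schedule": "remote",
    "per_page": 50,
}


def build_search_url(filters, page=0):
    """Encode the filters and the page number into a search URL."""
    query = urllib.parse.urlencode({**filters, "page": page})
    return f"{HH_SEARCH_URL}?{query}"


def extract_vacancy_urls(search_page):
    """Pull the per-vacancy API endpoint out of one page of search results."""
    return [item["url"] for item in search_page.get("items", []) if item.get("url")]


def fetch_json(url):
    """Perform one GET request and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": "hh-pipeline-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def collect_vacancy_urls(filters, fetch=fetch_json, max_pages=5):
    """Walk the result pages and return every vacancy endpoint found."""
    urls = []
    for page in range(max_pages):
        data = fetch(build_search_url(filters, page))
        urls.extend(extract_vacancy_urls(data))
        # Stop once the last page reported by the API has been read.
        if page + 1 >= data.get("pages", 0):
            break
    return urls
```

Passing `fetch` as a parameter keeps the paging logic testable without network access; calling `collect_vacancy_urls(SEARCH_FILTERS)` with the default would issue real requests.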
- Performs secondary API calls to retrieve full JSON payloads for each endpoint.
- Sanitizes the data by stripping raw HTML tags from the descriptions using BeautifulSoup4.
- Outputs a clean, structured vacancies_data.json dataset.
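
The sanitizing step can be illustrated with a short sketch. The project itself uses BeautifulSoup4; the version below swaps in the standard library's `html.parser` so the example has no dependencies. The function names and the choice of payload fields (`id`, `name`, `description`) are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw_html):
    """Return the plain text of an HTML description.

    With BeautifulSoup4 the rough equivalent would be:
    BeautifulSoup(raw_html, "html.parser").get_text(separator=" ", strip=True)
    """
    parser = _TextExtractor()
    parser.feed(raw_html or "")
    parser.close()
    return parser.text()


def clean_vacancy(payload):
    """Keep a few fields of the full vacancy payload (field choice is assumed)."""
    return {
        "id": payload.get("id"),
        "name": payload.get("name"),
        "description": strip_html(payload.get("description")),
    }


def save_dataset(records, path="vacancies_data.json"):
    """Write the cleaned records as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

Joining text nodes with a space avoids gluing adjacent list items together, at the cost of occasionally splitting a word that is interrupted by an inline tag.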
- Acts as the Deterministic Filter Engine before invoking the LLM (saving token costs).
- Applies a custom weighted scoring algorithm based on targeted technical stacks (e.g., Python, Selenium, API) and "Golden Keywords".
- Applies penalty scores for irrelevant tech stacks (e.g., Java, C#) or misaligned roles.
- Outputs ranked_vacancies.json, sorted by descending relevance score.
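
A weighted scoring pass of this kind might look like the sketch below. The three table names come from the project, but their structure (keyword mapped to weight), the weights themselves, and the helper functions are assumptions. Each keyword counts once per vacancy, and matching uses word boundaries so that "JavaScript" does not trigger the "java" penalty.

```python
import re

# Hypothetical weights; the real tables live in step3_analyze.py.
SKILLS_KEYWORDS = {"python": 3, "selenium": 2, "api": 1}
GOLDEN_KEYWORDS = {"automation": 5}
PENALTY_KEYWORDS = {"java": -4, "c#": -4}


def _mentions(keyword, text):
    """True if the keyword appears as a whole token in the text."""
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text) is not None


def score_vacancy(description):
    """Sum the weights of every keyword mentioned in the description."""
    text = description.lower()
    score = 0
    for table in (SKILLS_KEYWORDS, GOLDEN_KEYWORDS, PENALTY_KEYWORDS):
        for keyword, weight in table.items():
            if _mentions(keyword, text):
                score += weight
    return score


def rank_vacancies(vacancies):
    """Attach a score to each vacancy and sort from highest to lowest."""
    scored = [{**v, "score": score_vacancy(v.get("description", ""))} for v in vacancies]
    return sorted(scored, key=lambda v: v["score"], reverse=True)
```

Because this stage is plain string matching, it costs nothing per vacancy, so only the top of the ranked list needs to reach the LLM.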
- The final stage acts as an AI Agent. It takes the top-N results from the ranked dataset and feeds them into Google Gemini.
- Prompt Engineering: The LLM is provided with a strict system prompt containing the candidate's profile, "Red/Green flags" strategy, and the raw job description.
- Output: The LLM acts as an evaluator, returning a binary verdict ("Fit" / "No Fit") and generating a highly personalized, context-aware cover letter in Markdown format.
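
The evaluation stage could be sketched as follows. The template text, the function names, the reply format (verdict on the first line), and the model name `gemini-1.5-flash` are all assumptions for illustration; only the `PROMPT_TEMPLATE` name, the `GOOGLE_API_KEY` variable, and the "Fit" / "No Fit" verdicts come from the project. The Gemini call is isolated in one function so that the prompt building and verdict parsing can be run without network access.

```python
import os

# Hypothetical template; the real one lives in step4_generate_letters.py.
PROMPT_TEMPLATE = """You are evaluating a job vacancy for a candidate.

Candidate profile:
{profile}

Green flags: {green_flags}
Red flags: {red_flags}

Vacancy description:
{description}

Reply with "Fit" or "No Fit" on the first line.
If the verdict is "Fit", follow it with a cover letter in Markdown.
"""


def load_api_key():
    """Read the Gemini key from the environment and fail fast if it is missing."""
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return key


def build_prompt(profile, green_flags, red_flags, description):
    """Fill the template with the candidate data and one vacancy description."""
    return PROMPT_TEMPLATE.format(
        profile=profile,
        green_flags=", ".join(green_flags),
        red_flags=", ".join(red_flags),
        description=description,
    )


def parse_verdict(reply):
    """Map the first line of the model reply to a binary verdict."""
    lines = reply.strip().splitlines()
    first_line = lines[0].lower() if lines else ""
    # "no fit" contains "fit", so the negative case must be checked first.
    if "no fit" in first_line:
        return "No Fit"
    if "fit" in first_line:
        return "Fit"
    return "Unknown"


def evaluate(prompt, model_name="gemini-1.5-flash"):
    """Send one prompt to Gemini and return (verdict, full reply text)."""
    # Imported lazily so the helpers above work without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=load_api_key())
    model = genai.GenerativeModel(model_name)
    reply = model.generate_content(prompt).text
    return parse_verdict(reply), reply
```

Returning "Unknown" for an unparseable reply lets the caller skip or retry a vacancy instead of silently treating it as a match.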
- Language: Python 3.9+
- Data Extraction: requests, beautifulsoup4
- LLM Integration: google-generativeai (Gemini API)
- Data Serialization: JSON, Markdown
git clone https://github.com/SanMog/HeadHunter-AI-Assistant.git
cd HeadHunter-AI-Assistant
python -m venv .venv
# Windows PowerShell: .venv\Scripts\Activate.ps1
# Linux/macOS: source .venv/bin/activate
pip install -r requirements.txt

Never hardcode API keys. The system expects the Google Gemini API key to be passed securely via environment variables.
# Windows PowerShell
$env:GOOGLE_API_KEY = "YOUR_GEMINI_API_KEY"
# Linux/macOS
export GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"

Run the pipeline sequentially to process the data:
python step1_get_urls.py
python step2_get_details.py
python step3_analyze.py
python step4_generate_letters.py

The scoring engine is highly decoupled and can be tuned for any engineering role:
- Weights & Penalties: Edit SKILLS_KEYWORDS, GOLDEN_KEYWORDS, and PENALTY_KEYWORDS in step3_analyze.py to adjust the deterministic scoring algorithm.
- LLM Persona: Modify the PROMPT_TEMPLATE in step4_generate_letters.py to dictate the AI's analytical behavior and writing style.
Architect: SanMog
Domain: AI Automation / Agentic Workflows
License: MIT