# Start

- Frontend: `npm run dev`
- Backend: `python main.py`
⚠️ TEMPORARY NOTICE (2025-01-11): Airtable integration is currently disabled due to performance issues. All profiles are processed and displayed on the frontend, but NOT saved to Airtable. This is temporary until the Airtable subscription is sorted out. See CHANGES.md, Section 13, for details and revert instructions.
An automated system that profiles LinkedIn users, enriches their data, and evaluates them against criteria (default ICP or custom use cases). Supports multiple workflows: processing post reactors, manual profile input, and custom evaluation criteria.
- From Post Reactors (ICP): Automatically fetch and process reactions from a LinkedIn post against Dograh's ICP
- Manual Input (ICP): Process specific profiles by entering LinkedIn URLs against Dograh's ICP
- Custom Evaluation (NEW): Define your own evaluation criteria for any use case - works with both post reactors and manual profiles
- Enrich profile data from LinkedIn (via Apify)
- Gather company information with a fallback mechanism
- AI-powered profile and company summarization (Groq Llama 3.3 70B)
- ICP matching evaluation (OpenAI GPT-5 mini with high reasoning effort)
- Store and track leads in Airtable
- Full-stack dashboard with real-time progress tracking
- Smart rate limiting (max 100 profiles per batch)
- Automatic deduplication (skips existing profiles)
- Incremental results display (see leads as they're processed, refreshed every 20 seconds)
- Job History Sidebar: Collapsible sidebar showing your last 50 jobs
  - Click the history button (top-left) to open the job history
  - View job status, timestamp, and input preview
  - Click any job to resume tracking and view results
  - Persists across browser sessions (stored in localStorage)
  - Filter by workflow type (post reactors, manual input, custom evaluation)
  - Clear all history with one click
- Load Results by Job ID: Manually load results from any job
  - Collapsible section below the page title on all 3 pages
  - Paste any job ID to view results from closed tabs or previous sessions
  - Works across all workflow types (post/manual/custom)
  - Continues polling if the job is still processing
  - Useful for recovering lost tabs or sharing job results
- Frontend: Next.js (TypeScript), runs on `0.0.0.0:3000`
- Backend: FastAPI (Python), runs on `localhost:8000`
- Proxy: Next.js proxies `/api/*` requests to FastAPI
```
linkedin-profiling/
├── frontend/                  # Next.js frontend
│   ├── app/
│   │   ├── page.tsx           # Post reactors workflow (ICP mode)
│   │   ├── manual-input/
│   │   │   └── page.tsx       # Manual profile input workflow (ICP mode)
│   │   ├── custom-evaluation/
│   │   │   └── page.tsx       # Custom use case evaluation (NEW)
│   │   ├── login/
│   │   │   └── page.tsx       # Authentication
│   │   └── layout.tsx
│   ├── middleware.ts          # Route protection
│   ├── next.config.js         # Proxy configuration
│   ├── package.json
│   └── tsconfig.json
│
├── backend/                   # FastAPI backend
│   ├── main.py                # FastAPI server (ICP + custom endpoints)
│   ├── workflow.py            # Linear automation (supports both modes)
│   ├── prompts.py             # Centralized LLM prompts (ICP + custom)
│   ├── test_components.py     # Component testing script
│   ├── requirements.txt
│   └── .env.example
│
└── README.md
```
Before running the application, complete these setup tasks:
You need accounts and API keys for these services:
- ✅ Token included in `.env.example`: `apify_api_____________`
- Sign up at https://airtable.com
- Create a new base (or use existing)
- Go to https://airtable.com/create/tokens
- Create a personal access token with `data.records:read` and `data.records:write` scopes
- Copy your Base ID from the base URL: `https://airtable.com/YOUR_BASE_ID/...`
- Create a table named "Leads" (or your preferred name)
- Sign up at https://platform.openai.com
- Go to https://platform.openai.com/api-keys
- Create a new API key
- Copy the key (starts with `sk-...`)
- ✅ Token included in `.env.example`: `gsk_______________`
Create a table with these fields (exact names required - case-sensitive):
| Field Name | Field Type | Options |
|---|---|---|
| URN | Single line text | - |
| Name | Single line text | - |
| company_name | Single line text | - |
| company_website | URL or Single line text | - |
| Email Address | Email | - |
| country | Single line text | - |
| current_job_location | Single line text | - |
| Title | Long text | - |
| Profile URL | URL | - |
| icp_fit_strength | Single select | Options: High, Medium, Low |
| Reason | Long text | - |
| validation_judgement | Single select | Options: Correct, Incorrect, Unsure |
| validation_reason | Long text | - |
| profile_summary | Long text | - |
| company_summary | Long text | - |
IMPORTANT: Field names are case-sensitive. Capital letters must match exactly (URN, Name, Email Address, Title, Profile URL, Reason). The remaining fields are lowercase (company_name, company_website, country, current_job_location, icp_fit_strength, validation_judgement, validation_reason, profile_summary, company_summary).
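Because Airtable silently drops fields whose names don't match, a small guard before each write can catch casing mistakes early. This is a hedged sketch with a hypothetical `build_lead_record` helper, not code from the repository:

```python
# Hypothetical helper: validate a record payload against the case-sensitive
# Airtable schema above before sending it to the API.
EXPECTED_FIELDS = {
    "URN", "Name", "company_name", "company_website", "Email Address",
    "country", "current_job_location", "Title", "Profile URL",
    "icp_fit_strength", "Reason", "validation_judgement",
    "validation_reason", "profile_summary", "company_summary",
}

def build_lead_record(lead: dict) -> dict:
    """Return an Airtable-style fields payload, rejecting unknown keys."""
    unknown = set(lead) - EXPECTED_FIELDS
    if unknown:
        raise ValueError(f"Unknown Airtable fields (check casing): {unknown}")
    return {"fields": lead}

record = build_lead_record({"URN": "priteshkr", "Name": "Pritesh", "icp_fit_strength": "High"})
```

A payload with `"name"` instead of `"Name"` would raise immediately instead of producing a half-empty Airtable row.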
Edit `backend/.env` with your credentials:

```bash
cd backend
cp .env.example .env
# Edit .env and add your tokens
```

The application is protected with password authentication to prevent unauthorized access.
Login Credentials:
- Password is set in `backend/.env`: `PORTAL_PASSWORD=your-password-here`
- Default example password: `________` (in `.env.example`)
- No username required - just enter the password
- IMPORTANT: Change the default password before deploying or sharing
How It Works:
- All routes except `/login` are protected by middleware
- Frontend sends the password to the backend API for validation
- Backend validates against the `PORTAL_PASSWORD` environment variable
- On successful authentication, a secure HTTP-only cookie is set (7-day expiration)
- Password never stored in frontend code or committed to repository
- Backend must be running for authentication to work
Login Page:
- URL: `http://localhost:3000/login`
- Simple password field with validation
- Shows error message on incorrect password
- Redirects to dashboard on successful login
Technical Details:
- Password validation happens server-side (FastAPI backend)
- Middleware checks for the `auth_token` cookie on every request
- Cookie is HTTP-only and secure (in production)
- Frontend proxies auth requests to `localhost:8000/api/auth/login`
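The server-side check described above can be sketched in plain Python. This is an illustrative sketch, not the actual backend code; `PORTAL_PASSWORD` is loaded from `backend/.env` in the real app, and the placeholder value here is hypothetical:

```python
# Sketch of server-side password validation plus the 7-day cookie expiry.
import hmac
from datetime import datetime, timedelta, timezone

PORTAL_PASSWORD = "change-me"  # placeholder; the real app reads this from backend/.env

def check_password(submitted: str) -> bool:
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(submitted, PORTAL_PASSWORD)

def cookie_expiry(now: datetime) -> datetime:
    # Matches the 7-day expiration mentioned above.
    return now + timedelta(days=7)
```

In the real backend the successful branch would also set the HTTP-only `auth_token` cookie on the response.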
1. Navigate to the backend directory:

   ```bash
   cd backend
   ```

2. Create a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file from `.env.example`:

   ```bash
   cp .env.example .env
   ```

5. Fill in your API credentials in `.env`:
   - `APIFY_TOKEN`: Your Apify API token
   - `AIRTABLE_TOKEN`: Your Airtable personal access token
   - `AIRTABLE_BASE_ID`: Your Airtable base ID
   - `AIRTABLE_TABLE_NAME`: Table name (default: "Leads")
   - `OPENAI_API_KEY`: Your OpenAI API key
   - `GROQ_API_KEY`: Your Groq API key
   - `PORTAL_PASSWORD`: Your portal authentication password (change from the default!)

6. Run the FastAPI server:

   ```bash
   python main.py
   ```

   The server will start on `http://localhost:8000`.
1. Navigate to the frontend directory:

   ```bash
   cd frontend
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Run the development server:

   ```bash
   npm run dev
   ```

   The app will be available at `http://0.0.0.0:3000` or `http://localhost:3000`.
Create a table with these fields (exact names - case-sensitive):
| Field Name | Field Type | Description |
|---|---|---|
| URN | Single line text | LinkedIn URN (unique identifier) |
| Name | Single line text | Full name |
| company_name | Single line text | Company name |
| company_website | URL or Single line text | Company website URL |
| Email Address | Email | Email address (if available) |
| country | Single line text | Country from profile (N/A if unavailable) |
| current_job_location | Single line text | Current job location (N/A if unavailable) |
| Title | Long text | Job title/headline |
| Profile URL | URL | LinkedIn profile URL |
| icp_fit_strength | Single select | "High", "Medium", "Low" |
| Reason | Long text | Reason for ICP evaluation |
| validation_judgement | Single select | "Correct", "Incorrect", "Unsure" |
| validation_reason | Long text | Reason for validation judgement |
| profile_summary | Long text | AI-generated profile summary |
| company_summary | Long text | AI-generated company summary |
Note: Field names must match exactly. Capitalized fields: URN, Name, Email Address, Title, Profile URL, Reason. Lowercase fields: company_name, company_website, country, current_job_location, icp_fit_strength, validation_judgement, validation_reason, profile_summary, company_summary.
First-time Access:
- Start both backend and frontend servers
- Open `http://localhost:3000` in your browser
- You'll be redirected to the login page
- Enter the password you set in `backend/.env` (`PORTAL_PASSWORD`)
- Click "Login" - you'll be redirected to the dashboard
- Start both backend and frontend servers (if not already running)
- Login if not already authenticated
- Open `http://localhost:3000` in your browser (or you'll already be there after login)
- Click the "From Post Reactors" tab (default)
- Enter a LinkedIn post URL or ID (e.g., `7392508631268835328`)
- Click "Process Post"
- View real-time progress and results in the dashboard
- Progress bar updates every 20 seconds showing X/100 profiles processed
- Results appear incrementally as leads are saved to Airtable (~1 lead per minute)
- No need to wait - review leads while processing continues
- Start both backend and frontend servers
- Open `http://localhost:3000` in your browser
- Click the "Manual Input" tab
- Paste LinkedIn profile URLs (one per line, max 100)
  - Format: `https://linkedin.com/in/username`
- Click "Process X Profiles"
- View real-time progress and results in the dashboard
- Progress bar updates every 20 seconds showing X/100 profiles processed
- Results appear incrementally as leads are saved to Airtable (~1 lead per minute)
- No need to wait - review leads while processing continues
- Start both backend and frontend servers
- Open `http://localhost:3000` in your browser
- Click the "Custom Evaluation" tab
- Define your evaluation criteria (structured form):
  - Use Case Description (required): Describe what you're looking for
    - Example: "Find potential customers for HR analytics software"
  - Target Job Titles/Roles (optional): Comma-separated roles
    - Example: "HR Director, VP People, CHRO, Head of HR"
  - Target Industries/Sectors (optional): Industries to focus on
    - Example: "SaaS, Fintech, Healthcare, E-commerce"
  - Company Size (optional): Dropdown selection
    - Options: Any, 1-10, 10-50, 50-200, 200-1000, 1000+ employees
  - Additional Requirements (optional): Exclusions, examples, edge cases
    - Example: "Exclude consultants and agencies. Prefer B2B companies with venture funding."
- Choose input type:
- LinkedIn Post URL/ID: Process reactions from a post
- Manual Profile URLs: Enter specific profile URLs (one per line)
- Click "Start Custom Evaluation"
- View real-time progress and results in the dashboard
- Same table layout as ICP mode (reuses "ICP Fit" column terminology)
- Profiles evaluated against YOUR criteria instead of Dograh's ICP
- Same validation workflow for quality control
Use Cases for Custom Evaluation:
- Find founders of B2B SaaS companies in specific industries
- Identify decision-makers at companies of specific sizes
- Locate professionals with specific skills/experience combinations
- Discover potential partners, investors, or collaborators
- Any custom use case beyond the default ICP
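The structured form above presumably serializes into a criteria payload for the backend. The field names below are assumptions for illustration (check `backend/main.py` for the real request shape):

```python
# Hedged sketch: turn the custom-evaluation form fields into a request payload.
# All key names here are hypothetical, not confirmed against the backend.
def build_custom_criteria(description, titles="", industries="", company_size="Any", extra=""):
    if not description:
        raise ValueError("Use Case Description is required")
    return {
        "use_case": description,
        "target_titles": [t.strip() for t in titles.split(",") if t.strip()],
        "target_industries": [i.strip() for i in industries.split(",") if i.strip()],
        "company_size": company_size,
        "additional_requirements": extra,
    }

criteria = build_custom_criteria(
    "Find potential customers for HR analytics software",
    titles="HR Director, VP People, CHRO",
)
```

Only the description is mandatory, mirroring the form; the comma-separated fields split into clean lists.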
Scenario 1: You accidentally closed the browser tab
- Click the history button at the top-left of any page
- Your job history sidebar will open, showing recent jobs
- Click the job you want to resume
- Results will load immediately, and polling will continue if the job is still processing
Scenario 2: You have a job ID from a previous session
- Open any of the 3 workflow pages
- Click the "Load Results by Job ID" section to expand it
- Paste your job ID (e.g., `91abfc33-10a4-4629-84ba-d27eb2f4cf55`)
- Click "Load Results"
- Results will display in the same table format
Scenario 3: Sharing results with a teammate
- Copy the job ID from the blue info box (shown during processing)
- Share the job ID with your teammate
- They can paste it into the "Load Results by Job ID" section
- Results will load (requires authentication with same portal password)
Note: Job history persists in your browser's localStorage and survives page refreshes and browser restarts. However, job data on the backend is temporary and will be lost if the backend server restarts.
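Resuming a job boils down to polling its status on the 20-second cadence described above. This is a hedged sketch; the status-fetching callable and the `state` field are assumptions, since the README doesn't document the status endpoint:

```python
# Sketch of job-resume polling. fetch_status is any callable that returns the
# backend's status payload for a job ID; the {"state": ...} shape is assumed.
import time

def poll_job(fetch_status, job_id, interval=20, max_polls=3):
    """Call fetch_status(job_id) until it reports completion or polls run out."""
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status.get("state") == "complete":
            return status
        time.sleep(interval)  # matches the dashboard's 20-second refresh
    return status  # still processing; the caller may keep polling
```

Because backend job data is in-memory, a restart makes any saved job ID return nothing, which is why the localStorage history is only a convenience, not durable storage.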
The frontend dashboard displays processed leads in a structured table with the following columns:
| Column | Width | Description |
|---|---|---|
| Name | 150px | Full name of the LinkedIn profile |
| Company | 150px | Company name + website URL (stacked display) |
| Email | 180px | Email address from profile (or "Not Available") |
| Phone | 130px | Phone number from profile (or "N/A") |
| Country | 120px | Country from profile (or "N/A") |
| Job Location | 200px | Current job location from profile (or "N/A") |
| Title | 120px | Job title/headline |
| ICP Fit | 100px | High/Medium/Low badge (color-coded) |
| ICP Reason | 500px | Detailed explanation of ICP evaluation |
| Validation | 120px | Correct/Incorrect/Unsure badge (color-coded) |
| Validation Reason | 300px | Explanation of validation assessment |
| Profile URL | 120px | Full clickable LinkedIn profile URL |
| Followers | 100px | Number of followers (right-aligned, comma-separated) |
| Connections | 100px | Number of connections (right-aligned, comma-separated) |
Key Features:
- Company column: Shows company name on the first line, website URL on the second (if available)
  - Website extracted from the company API: `website`, `websiteUrl`, or `basic_info.website`
  - Clickable blue link (12px font size)
- Email column: Displays email address extracted from LinkedIn profile data
  - Shows "Not Available" when the profile has no email
  - Positioned after Company, before Phone, for logical contact-information grouping
- Phone column: Displays phone number extracted from profile data (`mobileNumber` field)
  - Shows "N/A" when the profile has no phone number
- Country column: Displays country from profile (`addressCountryOnly` field)
  - Shows "N/A" when country data is unavailable
- Job Location column: Displays current job location from profile (`jobLocation` field)
  - Shows "N/A" when location data is unavailable
  - Examples: "Bengaluru, Karnataka, India", "San Francisco Bay Area"
- Profile URL column: Displays the full URL as clickable text (for easy bulk copying)
- Followers column: Displays follower count from profile (`followers` field)
  - Right-aligned with comma separators (e.g., "11,147")
  - Shows "0" when follower data is unavailable
- Connections column: Displays connection count from profile (`connections` field)
  - Right-aligned with comma separators (e.g., "7,453")
  - Shows "0" when connection data is unavailable
- Optimized widths: ICP Reason (2.5x wider), Title (0.4x narrower), Validation Reason (1.5x wider) for better readability
- Color-coded badges: Green (High/Correct), Yellow (Medium/Unsure), Red (Low/Incorrect)
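The comma-separated count formatting used by the Followers and Connections columns is a one-liner in most languages; a Python sketch of the rule (hypothetical helper name):

```python
# Format follower/connection counts: comma-separated, "0" when missing.
def format_count(value) -> str:
    return f"{int(value):,}" if value else "0"
```

For example, `format_count(11147)` yields the "11,147" display shown in the table, and a missing value falls back to "0".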
All automation steps are visible in `backend/workflow.py`:
- Fetch Post Reactions: Get reactors from the LinkedIn post
- For each reactor:
- Check Airtable: Skip profiles already in the database
- Enrich Profile: Fetch detailed LinkedIn profile data (Apify)
- Enrich Company: Fetch company information with fallback (Apify)
- Summarize: Generate digestible summaries (Groq Llama 3.3 70B)
- Evaluate ICP: Assess if lead matches your ICP (OpenAI GPT-5 mini with high reasoning)
- Validate ICP: Quality check on ICP evaluation (Groq openai/gpt-oss-20b)
- Store: Save/update record in Airtable
- Parse Profile URLs: Validate and clean provided LinkedIn profile URLs
- For each profile URL:
- Extract Profile ID: Use LinkedIn username from URL as URN (e.g., "priteshkr" from "linkedin.com/in/priteshkr/")
- Check Airtable: Skip profiles already in the database (using profile ID as URN)
- Enrich Profile: Fetch detailed LinkedIn profile data (Apify)
- Enrich Company: Fetch company information with fallback (Apify)
- Summarize: Generate digestible summaries (Groq Llama 3.3 70B)
- Evaluate ICP: Assess if lead matches your ICP (OpenAI GPT-5 mini with high reasoning)
- Validate ICP: Quality check on ICP evaluation (Groq openai/gpt-oss-20b)
- Store: Save/update record in Airtable with profile ID as URN
Note: Processing is limited to 100 profiles per batch (both workflows) to prevent API overload and avoid LinkedIn rate limits. This limit can be adjusted in `backend/workflow.py` by changing the `MAX_REACTORS_PER_POST` constant.
URN Format Difference:
- Post Reactors: Uses Apify's URN field (e.g., `urn:li:person:123456789`)
- Manual Input: Uses the LinkedIn profile ID from the URL (e.g., `priteshkr` from `linkedin.com/in/priteshkr/`)
- This ensures consistent and predictable URNs for manually added profiles
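The manual-input URN rule above (username taken straight from the URL path) can be sketched with a small regex. The helper name is hypothetical; the actual extraction lives in `backend/workflow.py`:

```python
# Sketch: extract the LinkedIn username from a profile URL to use as the URN.
import re
from typing import Optional

def profile_id_from_url(url: str) -> Optional[str]:
    # Capture the path segment after /in/, stopping at /, ?, or #.
    m = re.search(r"linkedin\.com/in/([^/?#]+)", url)
    return m.group(1) if m else None
```

For example, `https://linkedin.com/in/priteshkr/` yields `priteshkr`, matching the example above.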
Per-Profile Timeout (180 seconds):
- Each profile has a 180-second (3-minute) timeout for all processing steps combined
- If processing exceeds the timeout, the profile is automatically skipped
- Batch processing continues immediately to the next profile (no blocking)
- Skipped profiles are tracked separately with the reason for skipping
Common Skip Reasons:
- "Processing exceeded 180s timeout" - Profile took too long to process
- "Network error during profile fetch" - API connectivity issues
- "API error: [specific message]" - Upstream service errors
- "Could not fetch profile data" - Profile not accessible or invalid
Skipped Profiles Display:
- Skipped profiles appear in a separate table below successful leads
- Table shows: Profile Name, Skip Reason, Profile URL link
- Orange/yellow color scheme distinguishes skipped from successful profiles
- Allows manual review of profiles that need attention
Why 180 seconds?
- Profile processing involves 7+ API calls (LinkedIn scraping, company data, multiple LLM calls)
- Some LinkedIn profiles/companies are slow to scrape (30-60s each)
- Conservative timeout ensures legitimate slow responses complete
- Prevents indefinite hangs while allowing most profiles to succeed
Configuration:
- Timeout can be adjusted in `backend/workflow.py`: `PROFILE_TIMEOUT_SECONDS = 180`
- No individual API timeouts - simpler implementation with a per-profile wrapper only
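A per-profile wrapper of this kind is usually a single `asyncio.wait_for` call. This is a sketch under the assumption that the workflow is async; the real implementation in `backend/workflow.py` may differ:

```python
# Sketch of the per-profile timeout wrapper: run all steps for one profile,
# and on timeout return a skip record instead of blocking the batch.
import asyncio

PROFILE_TIMEOUT_SECONDS = 180

async def process_with_timeout(process_profile, profile, timeout=PROFILE_TIMEOUT_SECONDS):
    try:
        return await asyncio.wait_for(process_profile(profile), timeout=timeout)
    except asyncio.TimeoutError:
        # Tracked separately and shown in the skipped-profiles table.
        return {"skipped": True, "reason": f"Processing exceeded {timeout}s timeout"}
```

Wrapping the whole pipeline for one profile (rather than each API call) keeps the implementation simple while still bounding worst-case latency per profile.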
Edit the `ICP_EVALUATION_PROMPT` in `backend/prompts.py` to customize what defines your ideal customer.
All LLM prompts are centralized in `backend/prompts.py`:
- `PROFILE_SUMMARY_SYSTEM_PROMPT` - How to summarize LinkedIn profiles
- `COMPANY_SUMMARY_SYSTEM_PROMPT` - How to summarize company data
- `ICP_EVALUATION_PROMPT` - How to evaluate lead fit

The workflow passes the complete raw JSON from Apify directly to the AI models using `json.dumps(data, indent=2)` for better context and accuracy - no helper functions or pre-formatting.
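The raw-JSON approach above can be sketched as a message builder; the helper name is hypothetical, but the `json.dumps(data, indent=2)` step is the one the README describes:

```python
# Sketch: embed the raw Apify payload in the user message, unmodified.
import json

def build_summary_messages(system_prompt: str, raw_profile: dict) -> list:
    return [
        {"role": "system", "content": system_prompt},
        # No field extraction or pre-formatting - the model sees everything.
        {"role": "user", "content": json.dumps(raw_profile, indent=2)},
    ]
```

The trade-off of this design is more prompt tokens per call in exchange for not having to maintain field-mapping code as Apify's output schema evolves.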
All workflow steps are in `backend/workflow.py`. You can easily:
- Add new data sources
- Skip certain steps
- Modify the evaluation logic
- Add custom enrichment
The structure is flexible to accommodate additional automations. Create new workflow files following the same pattern.
- `GET /` - Health check
- `POST /api/process-post` - Process a LinkedIn post

  ```json
  { "post_url": "7392508631268835328" }
  ```
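A minimal client call to the processing endpoint can be built with the standard library alone. The URL assumes the default local backend; response handling is omitted:

```python
# Example request for POST /api/process-post (post_url accepts a full URL
# or the bare numeric post ID).
import json
import urllib.request

payload = json.dumps({"post_url": "7392508631268835328"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/api/process-post",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit the job.
```

Note that authenticated routes also require the `auth_token` cookie described in the authentication section.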
- Apify: LinkedIn data scraping
  - `apimaestro~linkedin-post-reactions`: Get post reactions
  - `anchor~linkedin-profile-enrichment`: Profile details
  - `logical_scrapers~linkedin-company-scraper`: Company data (primary)
  - `apimaestro~linkedin-company-detail`: Company data (backup)
- Airtable: Lead database and CRM
- Groq: AI summarization (Llama 3.3 70B model)
- OpenAI: ICP evaluation (GPT-5 mini via /v1/responses endpoint with high reasoning effort)
Backend:

```bash
cd backend
uvicorn main:app --reload --host localhost --port 8000
```

Frontend:

```bash
cd frontend
npm run dev
```

Test each workflow component independently before running the full pipeline:

```bash
cd backend
python test_components.py
```

Available Tests:
- Test Apify post reactions scraper
- Test LinkedIn profile enrichment
- Test company data scraping (primary and backup)
- Test Groq AI summarization
- Test OpenAI ICP evaluation
- Test Airtable record creation
- Test full end-to-end pipeline (single lead)
How to use:
- Edit `backend/test_components.py`
- Replace example URLs/IDs with real LinkedIn data
- Uncomment the specific tests you want to run
- Run the script
See detailed instructions in `backend/test_components.py`.
Frontend:

```bash
cd frontend
npm run build
npm run start
```

Follow these steps to test the application:
1. Start the backend server:

   ```bash
   cd backend
   python main.py
   ```

2. Verify the server is running:
   - You should see: `INFO: Uvicorn running on http://localhost:8000`
   - Open http://localhost:8000 in a browser
   - You should see: `{"message": "LinkedIn Lead Profiling API is running"}`

3. Check for errors:
   - ✅ No errors → Backend is ready
   - ❌ Import errors → Run `pip install -r requirements.txt`
   - ❌ Environment errors → Check that the `.env` file exists and has all keys
4. In a new terminal, start the frontend:

   ```bash
   cd frontend
   npm run dev
   ```

5. Verify the frontend is running:
   - You should see: `- Local: http://localhost:3000`
   - Open http://localhost:3000 in a browser
   - You should see the LinkedIn Lead Profiling dashboard

6. Check the UI:
   - ✅ Input form visible → "Enter LinkedIn Post URL or ID"
   - ✅ "Process Post" button visible → UI is ready
7. Find a LinkedIn post with reactions:
   - Go to LinkedIn and find any post with reactions
   - Copy the post URL or ID (e.g., `7392508631268835328`)

8. Process the post:
   - Paste the URL/ID in the input field
   - Click "Process Post"
   - Watch the backend terminal for logs

9. Expected behavior - backend logs should show:

   ```
   ====================================
   STARTING LINKEDIN POST PROCESSING
   Post ID: 7392508631268835328
   ====================================
   STEP 1: Fetching post reactions...
   ✓ Fetched 6 reactions from post...
   --- Processing Reactor 1/6: John Doe ---
   STEP 2a: Checking Airtable...
   ✓ New profile. Proceeding with enrichment...
   ...
   ```
Check results:
- β Progress bar updates
- β Results table appears with processed leads
- β ICP fit strength shown (High/Medium/Low)
- β "View Profile" links work
- Open your Airtable base
- Check the "Leads" table
- Verify:
  - ✅ New records created
  - ✅ All fields populated (name, title, ICP fit, etc.)
  - ✅ URN field contains the LinkedIn URN
  - ✅ Profile URL is clickable
Frontend can't connect to backend:
- Make sure the backend is running on `localhost:8000`
- Check the browser console for errors
- Verify the `next.config.js` proxy configuration
Backend API errors:
- Check that the `.env` file has all required tokens
- Verify the Apify token is valid
- Verify Airtable base ID and table name are correct
- Check backend terminal for specific error messages
No results returned:
- Check if post ID is valid (should be a numeric string)
- Verify Apify can access the LinkedIn post
- Check rate limits on Apify account
Airtable errors:
- Verify table schema matches exactly (field names are case-sensitive)
- Check Airtable token has read/write permissions
- Ensure base ID is correct
Use this test post ID from the reference: `7392508631268835328`
Expected result: 6 reactors should be processed and appear in results table.
Private - Not for distribution
- Keep your `.env` files secure and never commit them
- The automation respects the rate limits of external services
- Profile data is cached in Airtable to avoid redundant API calls
- All logs are printed to console for transparency