dograh-hq/linkedin-get-icp

Start:

  • frontend: npm run dev
  • backend: python main.py

LinkedIn Lead Profiling Automation

⚠️ TEMPORARY NOTICE (2025-01-11): Airtable integration is currently disabled due to performance issues. All profiles are processed and displayed on the frontend but NOT saved to Airtable. This is temporary until the Airtable subscription is sorted out. See CHANGES.md Section 13 for details and revert instructions.

An automated system that profiles LinkedIn users, enriches their data, and evaluates them against criteria (default ICP or custom use cases). Supports multiple workflows: processing post reactors, manual profile input, and custom evaluation criteria.

Features

Three Processing Workflows:

  • πŸ”„ From Post Reactors (ICP): Automatically fetch and process reactions from a LinkedIn post against Dograh's ICP
  • ✍️ Manual Input (ICP): Process specific profiles by entering LinkedIn URLs against Dograh's ICP
  • 🎯 Custom Evaluation (NEW): Define your own evaluation criteria for any use case - works with both post reactors and manual profiles

Core Capabilities:

  • πŸ‘€ Enrich profile data from LinkedIn (via Apify)
  • 🏒 Gather company information with fallback mechanism
  • πŸ€– AI-powered profile and company summarization (Groq Llama 3.3 70B)
  • 🎯 ICP matching evaluation (OpenAI GPT-5 mini with high reasoning effort)
  • πŸ“Š Store and track leads in Airtable
  • πŸ’» Full-stack dashboard with real-time progress tracking
  • ⚑ Smart rate limiting (max 100 profiles per batch)
  • πŸ” Automatic deduplication (skips existing profiles)
  • πŸ“ˆ Incremental results display (see leads as they're processed, every 20 seconds)

UI Features (NEW: 2025-01-21):

  • πŸ“‹ Job History Sidebar: Collapsible sidebar showing your last 50 jobs
    • Click the πŸ“‹ button (top-left) to open job history
    • View job status, timestamp, and input preview
    • Click any job to resume tracking and view results
    • Persists across browser sessions (stored in localStorage)
    • Filter by workflow type (post reactors, manual input, custom evaluation)
    • Clear all history with one click
  • πŸ” Load Results by Job ID: Manually load results from any job
    • Collapsible section below page title on all 3 pages
    • Paste any job ID to view results from closed tabs or previous sessions
    • Works across all workflow types (post/manual/custom)
    • Continues polling if job is still processing
    • Perfect for recovering lost tabs or sharing job results

Architecture

  • Frontend: Next.js (TypeScript) - Runs on 0.0.0.0:3000
  • Backend: FastAPI (Python) - Runs on localhost:8000
  • Proxy: Next.js proxies /api/* requests to FastAPI

Project Structure

linkedin-profiling/
β”œβ”€β”€ frontend/                    # Next.js frontend
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ page.tsx            # Post reactors workflow (ICP mode)
β”‚   β”‚   β”œβ”€β”€ manual-input/
β”‚   β”‚   β”‚   └── page.tsx        # Manual profile input workflow (ICP mode)
β”‚   β”‚   β”œβ”€β”€ custom-evaluation/
β”‚   β”‚   β”‚   └── page.tsx        # Custom use case evaluation (NEW)
β”‚   β”‚   β”œβ”€β”€ login/
β”‚   β”‚   β”‚   └── page.tsx        # Authentication
β”‚   β”‚   └── layout.tsx
β”‚   β”œβ”€β”€ middleware.ts           # Route protection
β”‚   β”œβ”€β”€ next.config.js          # Proxy configuration
β”‚   β”œβ”€β”€ package.json
β”‚   └── tsconfig.json
β”‚
β”œβ”€β”€ backend/                    # FastAPI backend
β”‚   β”œβ”€β”€ main.py                # FastAPI server (ICP + custom endpoints)
β”‚   β”œβ”€β”€ workflow.py            # Linear automation (supports both modes)
β”‚   β”œβ”€β”€ prompts.py             # Centralized LLM prompts (ICP + custom)
β”‚   β”œβ”€β”€ test_components.py     # Component testing script
β”‚   β”œβ”€β”€ requirements.txt
β”‚   └── .env.example
β”‚
└── README.md

πŸš€ NEXT STEPS FOR USER

Before running the application, complete these setup tasks:

1. Get API Credentials

You need accounts and API keys for these services:

Apify (Already provided)

  • βœ… Token included in .env.example: apify_api_____________

Airtable

  • Sign up at https://airtable.com
  • Create a new base (or use existing)
  • Go to https://airtable.com/create/tokens
  • Create a personal access token with data.records:read and data.records:write scopes
  • Copy your Base ID from the base URL: https://airtable.com/YOUR_BASE_ID/...
  • Create a table named "Leads" (or your preferred name)

OpenAI

  • Sign up at https://platform.openai.com and create an API key (used for ICP evaluation)

Groq (Already provided)

  • βœ… Token included in .env.example: gsk_______________

2. Set Up Airtable Table

Create a table with these fields (exact names required - case-sensitive):

| Field Name | Field Type | Options |
| --- | --- | --- |
| URN | Single line text | - |
| Name | Single line text | - |
| company_name | Single line text | - |
| company_website | URL or Single line text | - |
| Email Address | Email | - |
| country | Single line text | - |
| current_job_location | Single line text | - |
| Title | Long text | - |
| Profile URL | URL | - |
| icp_fit_strength | Single select | High, Medium, Low |
| Reason | Long text | - |
| validation_judgement | Single select | Correct, Incorrect, Unsure |
| validation_reason | Long text | - |
| profile_summary | Long text | - |
| company_summary | Long text | - |
IMPORTANT: Field names are case-sensitive. Capital letters must match exactly (URN, Name, Email Address, Title, Profile URL, Reason). The remaining fields are lowercase (company_name, company_website, country, current_job_location, icp_fit_strength, validation_judgement, validation_reason, profile_summary, company_summary).

3. Configure Environment Variables

Edit backend/.env with your credentials:

cd backend
cp .env.example .env
# Edit .env and add your tokens
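Before starting the server, it can help to confirm every required variable is actually set. A minimal sketch, using the variable names listed in the setup steps (the `missing_env_vars` helper is illustrative and not part of the repo):

```python
import os

# Variables the backend expects, per the configuration steps above.
REQUIRED_ENV_VARS = [
    "APIFY_TOKEN",
    "AIRTABLE_TOKEN",
    "AIRTABLE_BASE_ID",
    "AIRTABLE_TABLE_NAME",
    "OPENAI_API_KEY",
    "GROQ_API_KEY",
    "PORTAL_PASSWORD",
]

def missing_env_vars(env=os.environ):
    """Return the names of any required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing required .env entries: {', '.join(missing)}")
```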

Authentication

The application is protected with password authentication to prevent unauthorized access.

Login Credentials:

  • Password is set in backend/.env: PORTAL_PASSWORD=your-password-here
  • Default example password: ________ (in .env.example)
  • No username required - just enter the password
  • IMPORTANT: Change the default password before deploying or sharing

How It Works:

  • All routes except /login are protected by middleware
  • Frontend sends password to backend API for validation
  • Backend validates against PORTAL_PASSWORD environment variable
  • On successful authentication, a secure HTTP-only cookie is set (7-day expiration)
  • Password never stored in frontend code or committed to repository
  • Backend must be running for authentication to work

Login Page:

  • URL: http://localhost:3000/login
  • Simple password field with validation
  • Shows error message on incorrect password
  • Redirects to dashboard on successful login

Technical Details:

  • Password validation happens server-side (FastAPI backend)
  • Middleware checks for auth_token cookie on every request
  • Cookie is HTTP-only and secure (in production)
  • Frontend proxies auth requests to localhost:8000/api/auth/login
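The server-side password check described above can be as small as a constant-time comparison. A sketch, assuming the backend reads PORTAL_PASSWORD from the environment (the `password_matches` helper is illustrative; the real check lives in backend/main.py):

```python
import secrets

def password_matches(submitted: str, portal_password: str) -> bool:
    """Compare the submitted password against PORTAL_PASSWORD.

    secrets.compare_digest runs in constant time, which avoids leaking
    password length or prefix information through response timing.
    """
    return secrets.compare_digest(submitted.encode(), portal_password.encode())
```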

Setup Instructions

Backend Setup

  1. Navigate to the backend directory:

    cd backend
  2. Create a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Create .env file from .env.example:

    cp .env.example .env
  5. Fill in your API credentials in .env:

    • APIFY_TOKEN: Your Apify API token
    • AIRTABLE_TOKEN: Your Airtable personal access token
    • AIRTABLE_BASE_ID: Your Airtable base ID
    • AIRTABLE_TABLE_NAME: Table name (default: "Leads")
    • OPENAI_API_KEY: Your OpenAI API key
    • GROQ_API_KEY: Your Groq API key
    • PORTAL_PASSWORD: Your portal authentication password (change from default!)
  6. Run the FastAPI server:

    python main.py

    Server will start on http://localhost:8000

Frontend Setup

  1. Navigate to the frontend directory:

    cd frontend
  2. Install dependencies:

    npm install
  3. Run the development server:

    npm run dev

    App will be available on http://0.0.0.0:3000 or http://localhost:3000

Airtable Schema (currently not in use; see the temporary notice above)

Create a table with these fields (exact names - case-sensitive):

| Field Name | Field Type | Description |
| --- | --- | --- |
| URN | Single line text | LinkedIn URN (unique identifier) |
| Name | Single line text | Full name |
| company_name | Single line text | Company name |
| company_website | URL or Single line text | Company website URL |
| Email Address | Email | Email address (if available) |
| country | Single line text | Country from profile (N/A if unavailable) |
| current_job_location | Single line text | Current job location (N/A if unavailable) |
| Title | Long text | Job title/headline |
| Profile URL | URL | LinkedIn profile URL |
| icp_fit_strength | Single select | "High", "Medium", "Low" |
| Reason | Long text | Reason for ICP evaluation |
| validation_judgement | Single select | "Correct", "Incorrect", "Unsure" |
| validation_reason | Long text | Reason for validation judgement |
| profile_summary | Long text | AI-generated profile summary |
| company_summary | Long text | AI-generated company summary |

Note: Field names must match exactly. Capitalized fields: URN, Name, Email Address, Title, Profile URL, Reason. Lowercase fields: company_name, company_website, country, current_job_location, icp_fit_strength, validation_judgement, validation_reason, profile_summary, company_summary.
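Because Airtable field names are case-sensitive, a small pre-flight check can catch typos before a write fails. A sketch using the exact names from the schema above (the `unknown_fields` helper is illustrative, not part of the repo):

```python
# Exact, case-sensitive field names required by the Airtable table,
# as listed in the schema above.
AIRTABLE_FIELDS = {
    "URN", "Name", "company_name", "company_website", "Email Address",
    "country", "current_job_location", "Title", "Profile URL",
    "icp_fit_strength", "Reason", "validation_judgement",
    "validation_reason", "profile_summary", "company_summary",
}

def unknown_fields(record: dict) -> set:
    """Return any keys in a record that do not match the schema exactly."""
    return set(record) - AIRTABLE_FIELDS
```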

Usage

First-time Access:

  1. Start both backend and frontend servers
  2. Open http://localhost:3000 in your browser
  3. You'll be redirected to the login page
  4. Enter the password you set in backend/.env (PORTAL_PASSWORD)
  5. Click "Login" - you'll be redirected to the dashboard

Option 1: From Post Reactors (Automatic)

  1. Start both backend and frontend servers (if not already running)
  2. Login if not already authenticated
  3. Open http://localhost:3000 in your browser (or you'll already be there after login)
  4. Click "From Post Reactors" tab (default)
  5. Enter a LinkedIn post URL or ID (e.g., 7392508631268835328)
  6. Click "Process Post"
  7. View real-time progress and results in the dashboard
    • Progress bar updates every 20 seconds showing X/100 profiles processed
    • Results appear incrementally as leads are saved to Airtable (~1 lead per minute)
    • No need to wait - review leads while processing continues

Option 2: Manual Profile Input

  1. Start both backend and frontend servers
  2. Open http://localhost:3000 in your browser
  3. Click "Manual Input" tab
  4. Paste LinkedIn profile URLs (one per line, max 100)
    • Format: https://linkedin.com/in/username
  5. Click "Process X Profiles"
  6. View real-time progress and results in the dashboard
    • Progress bar updates every 20 seconds showing X/100 profiles processed
    • Results appear incrementally as leads are saved to Airtable (~1 lead per minute)
    • No need to wait - review leads while processing continues

Option 3: Custom Evaluation (NEW)

  1. Start both backend and frontend servers
  2. Open http://localhost:3000 in your browser
  3. Click "Custom Evaluation β†’" tab
  4. Define your evaluation criteria (structured form):
    • Use Case Description (required): Describe what you're looking for
      • Example: "Find potential customers for HR analytics software"
    • Target Job Titles/Roles (optional): Comma-separated roles
      • Example: "HR Director, VP People, CHRO, Head of HR"
    • Target Industries/Sectors (optional): Industries to focus on
      • Example: "SaaS, Fintech, Healthcare, E-commerce"
    • Company Size (optional): Dropdown selection
      • Options: Any, 1-10, 10-50, 50-200, 200-1000, 1000+ employees
    • Additional Requirements (optional): Exclusions, examples, edge cases
      • Example: "Exclude consultants and agencies. Prefer B2B companies with venture funding."
  5. Choose input type:
    • LinkedIn Post URL/ID: Process reactions from a post
    • Manual Profile URLs: Enter specific profile URLs (one per line)
  6. Click "Start Custom Evaluation"
  7. View real-time progress and results in the dashboard
    • Same table layout as ICP mode (reuses "ICP Fit" column terminology)
    • Profiles evaluated against YOUR criteria instead of Dograh's ICP
    • Same validation workflow for quality control
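The form fields above map naturally onto a structured criteria object. A hypothetical payload mirroring those fields (the key names here are illustrative; the exact names the backend expects may differ):

```python
# Hypothetical custom-evaluation criteria built from the form fields above.
custom_criteria = {
    "use_case": "Find potential customers for HR analytics software",
    "target_roles": ["HR Director", "VP People", "CHRO", "Head of HR"],
    "target_industries": ["SaaS", "Fintech", "Healthcare", "E-commerce"],
    "company_size": "200-1000",  # one of: Any, 1-10, 10-50, 50-200, 200-1000, 1000+
    "additional_requirements": "Exclude consultants and agencies. Prefer B2B companies with venture funding.",
}
```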

Use Cases for Custom Evaluation:

  • Find founders of B2B SaaS companies in specific industries
  • Identify decision-makers at companies of specific sizes
  • Locate professionals with specific skills/experience combinations
  • Discover potential partners, investors, or collaborators
  • Any custom use case beyond the default ICP

Recovering Lost Jobs (NEW)

Scenario 1: You accidentally closed the browser tab

  1. Click the πŸ“‹ button at the top-left of any page
  2. Your job history sidebar will open showing recent jobs
  3. Click the job you want to resume
  4. Results will load immediately, and polling will continue if still processing

Scenario 2: You have a job ID from a previous session

  1. Open any of the 3 workflow pages
  2. Click "πŸ” Load Results by Job ID" section to expand
  3. Paste your job ID (e.g., 91abfc33-10a4-4629-84ba-d27eb2f4cf55)
  4. Click "Load Results"
  5. Results will display in the same table format

Scenario 3: Sharing results with a teammate

  1. Copy the job ID from the blue info box (shown during processing)
  2. Share the job ID with your teammate
  3. They can paste it into the "Load Results by Job ID" section
  4. Results will load (requires authentication with same portal password)

Note: Job history persists in your browser's localStorage and survives page refreshes and browser restarts. However, job data on the backend is temporary and will be lost if the backend server restarts.

Results Table Layout

The frontend dashboard displays processed leads in a structured table with the following columns:

| Column | Width | Description |
| --- | --- | --- |
| Name | 150px | Full name of the LinkedIn profile |
| Company | 150px | Company name + website URL (stacked display) |
| Email | 180px | Email address from profile (or "Not Available") |
| Phone | 130px | Phone number from profile (or "N/A") |
| Country | 120px | Country from profile (or "N/A") |
| Job Location | 200px | Current job location from profile (or "N/A") |
| Title | 120px | Job title/headline |
| ICP Fit | 100px | High/Medium/Low badge (color-coded) |
| ICP Reason | 500px | Detailed explanation of ICP evaluation |
| Validation | 120px | Correct/Incorrect/Unsure badge (color-coded) |
| Validation Reason | 300px | Explanation of validation assessment |
| Profile URL | 120px | Full clickable LinkedIn profile URL |
| Followers | 100px | Number of followers (right-aligned, comma-separated) |
| Connections | 100px | Number of connections (right-aligned, comma-separated) |

Key Features:

  • Company column: Shows company name on first line, website URL on second line (if available)
    • Website extracted from company API: website, websiteUrl, or basic_info.website
    • Clickable blue link (12px font size)
  • Email column: Displays email address extracted from LinkedIn profile data
    • Shows "Not Available" when profile has no email
    • Positioned after Company, before Phone for logical contact information grouping
  • Phone column: Displays phone number extracted from profile data (mobileNumber field)
    • Shows "N/A" when profile has no phone number
  • Country column: Displays country from profile (addressCountryOnly field)
    • Shows "N/A" when country data is unavailable
  • Job Location column: Displays current job location from profile (jobLocation field)
    • Shows "N/A" when location data is unavailable
    • Examples: "Bengaluru, Karnataka, India", "San Francisco Bay Area"
  • Profile URL column: Displays full URL as clickable text (for easy bulk copying)
  • Followers column: Displays follower count from profile (followers field)
    • Right-aligned with comma separators (e.g., "11,147")
    • Shows "0" when follower data is unavailable
  • Connections column: Displays connection count from profile (connections field)
    • Right-aligned with comma separators (e.g., "7,453")
    • Shows "0" when connection data is unavailable
  • Optimized widths: ICP Reason (2.5x wider), Title (0.4x narrower), Validation Reason (1.5x wider) for better readability
  • Color-coded badges: Green (High/Correct), Yellow (Medium/Unsure), Red (Low/Incorrect)
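The comma-separated counts in the Followers and Connections columns, with "0" for missing data, can be produced with a thousands-separator format. A sketch in Python (the frontend actually does this in TypeScript; `format_count` is an illustrative helper):

```python
def format_count(value) -> str:
    """Format follower/connection counts with comma separators.

    Missing data (None, empty, or 0) renders as "0", matching the
    table behaviour described above.
    """
    return f"{int(value or 0):,}"
```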

Workflow Steps

All automation steps are visible in backend/workflow.py:

From Post Reactors Workflow:

  1. Fetch Post Reactions: Get reactors from the LinkedIn post
  2. For each reactor:
    • Check Airtable: Skip profiles already in the database
    • Enrich Profile: Fetch detailed LinkedIn profile data (Apify)
    • Enrich Company: Fetch company information with fallback (Apify)
    • Summarize: Generate digestible summaries (Groq Llama 3.3 70B)
    • Evaluate ICP: Assess if lead matches your ICP (OpenAI GPT-5 mini with high reasoning)
    • Validate ICP: Quality check on ICP evaluation (Groq openai/gpt-oss-20b)
    • Store: Save/update record in Airtable

Manual Input Workflow:

  1. Parse Profile URLs: Validate and clean provided LinkedIn profile URLs
  2. For each profile URL:
    • Extract Profile ID: Use LinkedIn username from URL as URN (e.g., "priteshkr" from "linkedin.com/in/priteshkr/")
    • Check Airtable: Skip profiles already in the database (using profile ID as URN)
    • Enrich Profile: Fetch detailed LinkedIn profile data (Apify)
    • Enrich Company: Fetch company information with fallback (Apify)
    • Summarize: Generate digestible summaries (Groq Llama 3.3 70B)
    • Evaluate ICP: Assess if lead matches your ICP (OpenAI GPT-5 mini with high reasoning)
    • Validate ICP: Quality check on ICP evaluation (Groq openai/gpt-oss-20b)
    • Store: Save/update record in Airtable with profile ID as URN

Note: Processing is limited to 100 profiles per batch (both workflows) to prevent API overload and avoid LinkedIn rate limits. This limit can be adjusted in backend/workflow.py by changing the MAX_REACTORS_PER_POST constant.
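The deduplication and 100-profile cap described above reduce to a filter plus a slice. A minimal sketch (the `select_profiles` helper is illustrative; the real logic and the exact order of capping vs. dedup live in backend/workflow.py):

```python
MAX_REACTORS_PER_POST = 100  # adjustable constant, per the note above

def select_profiles(candidates, existing_urns, limit=MAX_REACTORS_PER_POST):
    """Skip profiles whose URN is already in Airtable, then cap the batch."""
    fresh = [urn for urn in candidates if urn not in existing_urns]
    return fresh[:limit]
```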

URN Format Difference:

  • Post Reactors: Uses Apify's URN field (e.g., urn:li:person:123456789)
  • Manual Input: Uses LinkedIn profile ID from URL (e.g., priteshkr from linkedin.com/in/priteshkr/)
  • This ensures consistent and predictable URNs for manually added profiles

Timeout Handling

Per-Profile Timeout (180 seconds):

  • Each profile has a 180-second (3-minute) timeout for all processing steps combined
  • If processing exceeds the timeout, the profile is automatically skipped
  • Batch processing continues immediately to the next profile (no blocking)
  • Skipped profiles are tracked separately with the reason for skipping

Common Skip Reasons:

  • ⏱️ "Processing exceeded 180s timeout" - Profile took too long to process
  • πŸ”Œ "Network error during profile fetch" - API connectivity issues
  • ⚠️ "API error: [specific message]" - Upstream service errors
  • ❌ "Could not fetch profile data" - Profile not accessible or invalid

Skipped Profiles Display:

  • Skipped profiles appear in a separate table below successful leads
  • Table shows: Profile Name, Skip Reason, Profile URL link
  • Orange/yellow color scheme distinguishes skipped from successful profiles
  • Allows manual review of profiles that need attention

Why 180 seconds?

  • Profile processing involves 7+ API calls (LinkedIn scraping, company data, multiple LLM calls)
  • Some LinkedIn profiles/companies are slow to scrape (30-60s each)
  • Conservative timeout ensures legitimate slow responses complete
  • Prevents indefinite hangs while allowing most profiles to succeed

Configuration:

  • Timeout can be adjusted in backend/workflow.py: PROFILE_TIMEOUT_SECONDS = 180
  • No individual API timeouts - simpler implementation with per-profile wrapper only
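The per-profile wrapper described above can be sketched with asyncio: wrap the whole pipeline for one profile in a single timeout and convert a timeout into a skip record instead of blocking the batch. This is an illustrative sketch; the real wrapper is in backend/workflow.py:

```python
import asyncio

PROFILE_TIMEOUT_SECONDS = 180  # matches the constant in backend/workflow.py

async def process_with_timeout(process, profile, timeout=PROFILE_TIMEOUT_SECONDS):
    """Run one profile's full pipeline under a single timeout.

    On timeout the profile is skipped with a reason, and the caller
    moves straight on to the next profile.
    """
    try:
        return await asyncio.wait_for(process(profile), timeout=timeout)
    except asyncio.TimeoutError:
        return {"skipped": profile,
                "reason": f"Processing exceeded {timeout}s timeout"}
```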

Customization

Modify ICP Criteria

Edit the ICP_EVALUATION_PROMPT in backend/prompts.py to customize what defines your ideal customer.

Modify LLM Prompts

All LLM prompts are centralized in backend/prompts.py:

  • PROFILE_SUMMARY_SYSTEM_PROMPT - How to summarize LinkedIn profiles
  • COMPANY_SUMMARY_SYSTEM_PROMPT - How to summarize company data
  • ICP_EVALUATION_PROMPT - How to evaluate lead fit

The workflow passes the complete raw JSON from Apify directly to the AI models using json.dumps(data, indent=2) for better context and accuracy; there are no helper functions or pre-formatting steps.
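The raw-JSON approach amounts to serializing the Apify payload straight into the user message. A sketch of what building such a request might look like (the `build_summary_input` function name and message shape are illustrative assumptions, not the repo's exact code):

```python
import json

def build_summary_input(system_prompt: str, apify_data: dict) -> list:
    """Pass the complete raw Apify JSON to the model, un-preprocessed,
    as the workflow does with json.dumps(data, indent=2)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": json.dumps(apify_data, indent=2)},
    ]
```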

Add/Remove Steps

All workflow steps are in backend/workflow.py. You can easily:

  • Add new data sources
  • Skip certain steps
  • Modify the evaluation logic
  • Add custom enrichment

Add More Automations

The structure is flexible to accommodate additional automations. Create new workflow files following the same pattern.

API Endpoints

  • GET / - Health check
  • POST /api/process-post - Process a LinkedIn post
    {
      "post_url": "7392508631268835328"
    }
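A stdlib sketch of constructing that request from Python (sending it requires the backend to be running on localhost:8000; `build_process_post_request` is an illustrative helper):

```python
import json
import urllib.request

def build_process_post_request(post_url_or_id: str,
                               base="http://localhost:8000"):
    """Build the POST request for /api/process-post.

    This only constructs the request object; pass it to
    urllib.request.urlopen() to actually send it.
    """
    body = json.dumps({"post_url": post_url_or_id}).encode()
    return urllib.request.Request(
        f"{base}/api/process-post",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```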

External Services Used

  • Apify: LinkedIn data scraping
    • apimaestro~linkedin-post-reactions: Get post reactions
    • anchor~linkedin-profile-enrichment: Profile details
    • logical_scrapers~linkedin-company-scraper: Company data (primary)
    • apimaestro~linkedin-company-detail: Company data (backup)
  • Airtable: Lead database and CRM
  • Groq: AI summarization (Llama 3.3 70B model)
  • OpenAI: ICP evaluation (GPT-5 mini via /v1/responses endpoint with high reasoning effort)

Development

Run in Development Mode

Backend:

cd backend
uvicorn main:app --reload --host localhost --port 8000

Frontend:

cd frontend
npm run dev

Testing Individual Components

Test each workflow component independently before running the full pipeline:

cd backend
python test_components.py

Available Tests:

  • Test Apify post reactions scraper
  • Test LinkedIn profile enrichment
  • Test company data scraping (primary and backup)
  • Test Groq AI summarization
  • Test OpenAI ICP evaluation
  • Test Airtable record creation
  • Test full end-to-end pipeline (single lead)

How to use:

  1. Edit backend/test_components.py
  2. Replace example URLs/IDs with real LinkedIn data
  3. Uncomment the specific tests you want to run
  4. Run the script

See detailed instructions in backend/test_components.py.

Build for Production

Frontend:

cd frontend
npm run build
npm run start

πŸ§ͺ TESTING INSTRUCTIONS

Follow these steps to test the application:

Step 1: Test Backend Server

  1. Start the backend server:

    cd backend
    python main.py
  2. Verify server is running:

    • You should see: INFO: Uvicorn running on http://localhost:8000
    • Open http://localhost:8000 in browser
    • You should see: {"message": "LinkedIn Lead Profiling API is running"}
  3. Check for errors:

    • βœ… No errors β†’ Backend is ready
    • ❌ Import errors β†’ Run pip install -r requirements.txt
    • ❌ Environment errors β†’ Check .env file exists and has all keys

Step 2: Test Frontend Server

  1. In a new terminal, start the frontend:

    cd frontend
    npm run dev
  2. Verify frontend is running:

    • You should see: - Local: http://localhost:3000
    • Open http://localhost:3000 in browser
    • You should see the LinkedIn Lead Profiling dashboard
  3. Check the UI:

    • βœ… Input form visible β†’ "Enter LinkedIn Post URL or ID"
    • βœ… "Process Post" button visible β†’ UI is ready

Step 3: Test with a LinkedIn Post

  1. Find a LinkedIn post with reactions:

    • Go to LinkedIn and find any post with reactions
    • Copy the post URL or ID (e.g., 7392508631268835328)
  2. Process the post:

    • Paste the URL/ID in the input field
    • Click "Process Post"
    • Watch the backend terminal for logs
  3. Expected behavior:

    Backend logs should show:
    ====================================
    STARTING LINKEDIN POST PROCESSING
    Post ID: 7392508631268835328
    ====================================
    
    STEP 1: Fetching post reactions...
    βœ“ Fetched 6 reactions from post...
    
    --- Processing Reactor 1/6: John Doe ---
    STEP 2a: Checking Airtable...
    β†’ New profile. Proceeding with enrichment...
    ...
    
  4. Check results:

    • βœ… Progress bar updates
    • βœ… Results table appears with processed leads
    • βœ… ICP fit strength shown (High/Medium/Low)
    • βœ… "View Profile" links work

Step 4: Verify Airtable Integration

  1. Open your Airtable base
  2. Check the "Leads" table
  3. Verify:
    • βœ… New records created
    • βœ… All fields populated (name, title, ICP fit, etc.)
    • βœ… URN field contains LinkedIn URN
    • βœ… Profile URL is clickable

Troubleshooting

Frontend can't connect to backend:

  • Make sure backend is running on localhost:8000
  • Check browser console for errors
  • Verify next.config.js proxy configuration

Backend API errors:

  • Check .env file has all required tokens
  • Verify Apify token is valid
  • Verify Airtable base ID and table name are correct
  • Check backend terminal for specific error messages

No results returned:

  • Check if post ID is valid (should be a numeric string)
  • Verify Apify can access the LinkedIn post
  • Check rate limits on Apify account

Airtable errors:

  • Verify table schema matches exactly (field names are case-sensitive)
  • Check Airtable token has read/write permissions
  • Ensure base ID is correct

Quick Test with Provided Data

Use this test post ID from the reference: 7392508631268835328

Expected result: 6 reactors should be processed and appear in results table.

License

Private - Not for distribution

Notes

  • Keep your .env files secure and never commit them
  • The automation respects rate limits of external services
  • Profile data is cached in Airtable to avoid redundant API calls
  • All logs are printed to console for transparency
