dograh-hq/linkedin-get-icp

Start:

  • frontend: npm run dev
  • backend: python main.py

LinkedIn Lead Profiling Automation

⚠️ TEMPORARY NOTICE (2025-01-11): Airtable integration is currently disabled due to performance issues. All profiles are processed and displayed on the frontend but NOT saved to Airtable. This is temporary until the Airtable subscription is sorted out. See CHANGES.md Section 13 for details and revert instructions.

An automated system that profiles LinkedIn users, enriches their data, and evaluates them against criteria (default ICP or custom use cases). Supports multiple workflows: processing post reactors, manual profile input, and custom evaluation criteria.

Features

Three Processing Workflows:

  • πŸ”„ From Post Reactors (ICP): Automatically fetch and process reactions from a LinkedIn post against Dograh's ICP
  • ✍️ Manual Input (ICP): Process specific profiles by entering LinkedIn URLs against Dograh's ICP
  • 🎯 Custom Evaluation (NEW): Define your own evaluation criteria for any use case - works with both post reactors and manual profiles

Core Capabilities:

  • πŸ‘€ Enrich profile data from LinkedIn (via Apify)
  • 🏒 Gather company information with fallback mechanism
  • πŸ€– AI-powered profile and company summarization (Groq Llama 3.3 70B)
  • 🎯 ICP matching evaluation (OpenAI GPT-5 mini with high reasoning effort)
  • πŸ“Š Store and track leads in Airtable
  • πŸ’» Full-stack dashboard with real-time progress tracking
  • ⚑ Smart rate limiting (max 100 profiles per batch)
  • πŸ” Automatic deduplication (skips existing profiles)
  • πŸ“ˆ Incremental results display (see leads as they're processed, every 20 seconds)

UI Features (NEW: 2025-01-21):

  • πŸ“‹ Job History Sidebar: Collapsible sidebar showing your last 50 jobs
    • Click the πŸ“‹ button (top-left) to open job history
    • View job status, timestamp, and input preview
    • Click any job to resume tracking and view results
    • Persists across browser sessions (stored in localStorage)
    • Filter by workflow type (post reactors, manual input, custom evaluation)
    • Clear all history with one click
  • πŸ” Load Results by Job ID: Manually load results from any job
    • Collapsible section below page title on all 3 pages
    • Paste any job ID to view results from closed tabs or previous sessions
    • Works across all workflow types (post/manual/custom)
    • Continues polling if job is still processing
    • Perfect for recovering lost tabs or sharing job results

Architecture

  • Frontend: Next.js (TypeScript) - Runs on 0.0.0.0:3000
  • Backend: FastAPI (Python) - Runs on localhost:8000
  • Proxy: Next.js proxies /api/* requests to FastAPI

Project Structure

linkedin-profiling/
β”œβ”€β”€ frontend/                    # Next.js frontend
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ page.tsx            # Post reactors workflow (ICP mode)
β”‚   β”‚   β”œβ”€β”€ manual-input/
β”‚   β”‚   β”‚   └── page.tsx        # Manual profile input workflow (ICP mode)
β”‚   β”‚   β”œβ”€β”€ custom-evaluation/
β”‚   β”‚   β”‚   └── page.tsx        # Custom use case evaluation (NEW)
β”‚   β”‚   β”œβ”€β”€ login/
β”‚   β”‚   β”‚   └── page.tsx        # Authentication
β”‚   β”‚   └── layout.tsx
β”‚   β”œβ”€β”€ middleware.ts           # Route protection
β”‚   β”œβ”€β”€ next.config.js          # Proxy configuration
β”‚   β”œβ”€β”€ package.json
β”‚   └── tsconfig.json
β”‚
β”œβ”€β”€ backend/                    # FastAPI backend
β”‚   β”œβ”€β”€ main.py                # FastAPI server (ICP + custom endpoints)
β”‚   β”œβ”€β”€ workflow.py            # Linear automation (supports both modes)
β”‚   β”œβ”€β”€ prompts.py             # Centralized LLM prompts (ICP + custom)
β”‚   β”œβ”€β”€ test_components.py     # Component testing script
β”‚   β”œβ”€β”€ requirements.txt
β”‚   └── .env.example
β”‚
└── README.md

πŸš€ NEXT STEPS FOR USER

Before running the application, complete these setup tasks:

1. Get API Credentials

You need accounts and API keys for these services:

Apify (Already provided)

  • βœ… Token included in .env.example: apify_api_____________

Airtable

  • Sign up at https://airtable.com
  • Create a new base (or use existing)
  • Go to https://airtable.com/create/tokens
  • Create a personal access token with data.records:read and data.records:write scopes
  • Copy your Base ID from the base URL: https://airtable.com/YOUR_BASE_ID/...
  • Create a table named "Leads" (or your preferred name)

OpenAI

  • Sign up at https://platform.openai.com and create an API key (used for ICP evaluation)

Groq (Already provided)

  • βœ… Token included in .env.example: gsk_______________

2. Set Up Airtable Table

Create a table with these fields (exact names required - case-sensitive):

| Field Name | Field Type | Options |
| --- | --- | --- |
| URN | Single line text | - |
| Name | Single line text | - |
| company_name | Single line text | - |
| company_website | URL or Single line text | - |
| Email Address | Email | - |
| country | Single line text | - |
| current_job_location | Single line text | - |
| Title | Long text | - |
| Profile URL | URL | - |
| icp_fit_strength | Single select | High, Medium, Low |
| Reason | Long text | - |
| validation_judgement | Single select | Correct, Incorrect, Unsure |
| validation_reason | Long text | - |
| profile_summary | Long text | - |
| company_summary | Long text | - |
IMPORTANT: Field names are case-sensitive. Capital letters must match exactly (URN, Name, Email Address, Title, Profile URL, Reason). The remaining fields are lowercase (company_name, company_website, country, current_job_location, icp_fit_strength, validation_judgement, validation_reason, profile_summary, company_summary).

3. Configure Environment Variables

Edit backend/.env with your credentials:

cd backend
cp .env.example .env
# Edit .env and add your tokens
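Before starting the server, it can help to confirm every required variable is actually set. A minimal sketch, using the variable names listed in the setup steps (the `missing_env_vars` helper is illustrative and not part of the repo):

```python
import os

# Variables the backend expects, per the configuration steps above.
REQUIRED_ENV_VARS = [
    "APIFY_TOKEN",
    "AIRTABLE_TOKEN",
    "AIRTABLE_BASE_ID",
    "AIRTABLE_TABLE_NAME",
    "OPENAI_API_KEY",
    "GROQ_API_KEY",
    "PORTAL_PASSWORD",
]

def missing_env_vars(env=os.environ):
    """Return the names of any required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing required .env entries: {', '.join(missing)}")
```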

Authentication

The application is protected with password authentication to prevent unauthorized access.

Login Credentials:

  • Password is set in backend/.env: PORTAL_PASSWORD=your-password-here
  • Default example password: ________ (in .env.example)
  • No username required - just enter the password
  • IMPORTANT: Change the default password before deploying or sharing

How It Works:

  • All routes except /login are protected by middleware
  • Frontend sends password to backend API for validation
  • Backend validates against PORTAL_PASSWORD environment variable
  • On successful authentication, a secure HTTP-only cookie is set (7-day expiration)
  • Password never stored in frontend code or committed to repository
  • Backend must be running for authentication to work

Login Page:

  • URL: http://localhost:3000/login
  • Simple password field with validation
  • Shows error message on incorrect password
  • Redirects to dashboard on successful login

Technical Details:

  • Password validation happens server-side (FastAPI backend)
  • Middleware checks for auth_token cookie on every request
  • Cookie is HTTP-only and secure (in production)
  • Frontend proxies auth requests to localhost:8000/api/auth/login
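The server-side password check described above can be as small as a constant-time comparison. A sketch, assuming the backend reads PORTAL_PASSWORD from the environment (the `password_matches` helper is illustrative; the real check lives in backend/main.py):

```python
import secrets

def password_matches(submitted: str, portal_password: str) -> bool:
    """Compare the submitted password against PORTAL_PASSWORD.

    secrets.compare_digest runs in constant time, which avoids leaking
    password length or prefix information through response timing.
    """
    return secrets.compare_digest(submitted.encode(), portal_password.encode())
```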

Setup Instructions

Backend Setup

  1. Navigate to the backend directory:

    cd backend
  2. Create a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Create .env file from .env.example:

    cp .env.example .env
  5. Fill in your API credentials in .env:

    • APIFY_TOKEN: Your Apify API token
    • AIRTABLE_TOKEN: Your Airtable personal access token
    • AIRTABLE_BASE_ID: Your Airtable base ID
    • AIRTABLE_TABLE_NAME: Table name (default: "Leads")
    • OPENAI_API_KEY: Your OpenAI API key
    • GROQ_API_KEY: Your Groq API key
    • PORTAL_PASSWORD: Your portal authentication password (change from default!)
  6. Run the FastAPI server:

    python main.py

    Server will start on http://localhost:8000

Frontend Setup

  1. Navigate to the frontend directory:

    cd frontend
  2. Install dependencies:

    npm install
  3. Run the development server:

    npm run dev

    App will be available on http://0.0.0.0:3000 or http://localhost:3000

Airtable Schema (currently not in use; see the temporary notice above)

Create a table with these fields (exact names - case-sensitive):

| Field Name | Field Type | Description |
| --- | --- | --- |
| URN | Single line text | LinkedIn URN (unique identifier) |
| Name | Single line text | Full name |
| company_name | Single line text | Company name |
| company_website | URL or Single line text | Company website URL |
| Email Address | Email | Email address (if available) |
| country | Single line text | Country from profile (N/A if unavailable) |
| current_job_location | Single line text | Current job location (N/A if unavailable) |
| Title | Long text | Job title/headline |
| Profile URL | URL | LinkedIn profile URL |
| icp_fit_strength | Single select | "High", "Medium", "Low" |
| Reason | Long text | Reason for ICP evaluation |
| validation_judgement | Single select | "Correct", "Incorrect", "Unsure" |
| validation_reason | Long text | Reason for validation judgement |
| profile_summary | Long text | AI-generated profile summary |
| company_summary | Long text | AI-generated company summary |

Note: Field names must match exactly. Capitalized fields: URN, Name, Email Address, Title, Profile URL, Reason. Lowercase fields: company_name, company_website, country, current_job_location, icp_fit_strength, validation_judgement, validation_reason, profile_summary, company_summary.
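Because Airtable field names are case-sensitive, a small pre-flight check can catch typos before a write fails. A sketch using the exact names from the schema above (the `unknown_fields` helper is illustrative, not part of the repo):

```python
# Exact, case-sensitive field names required by the Airtable table,
# as listed in the schema above.
AIRTABLE_FIELDS = {
    "URN", "Name", "company_name", "company_website", "Email Address",
    "country", "current_job_location", "Title", "Profile URL",
    "icp_fit_strength", "Reason", "validation_judgement",
    "validation_reason", "profile_summary", "company_summary",
}

def unknown_fields(record: dict) -> set:
    """Return any keys in a record that do not match the schema exactly."""
    return set(record) - AIRTABLE_FIELDS
```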

Usage

First-time Access:

  1. Start both backend and frontend servers
  2. Open http://localhost:3000 in your browser
  3. You'll be redirected to the login page
  4. Enter the password you set in backend/.env (PORTAL_PASSWORD)
  5. Click "Login" - you'll be redirected to the dashboard

Option 1: From Post Reactors (Automatic)

  1. Start both backend and frontend servers (if not already running)
  2. Login if not already authenticated
  3. Open http://localhost:3000 in your browser (or you'll already be there after login)
  4. Click "From Post Reactors" tab (default)
  5. Enter a LinkedIn post URL or ID (e.g., 7392508631268835328)
  6. Click "Process Post"
  7. View real-time progress and results in the dashboard
    • Progress bar updates every 20 seconds showing X/100 profiles processed
    • Results appear incrementally as leads are saved to Airtable (~1 lead per minute)
    • No need to wait - review leads while processing continues

Option 2: Manual Profile Input

  1. Start both backend and frontend servers
  2. Open http://localhost:3000 in your browser
  3. Click "Manual Input" tab
  4. Paste LinkedIn profile URLs (one per line, max 100)
    • Format: https://linkedin.com/in/username
  5. Click "Process X Profiles"
  6. View real-time progress and results in the dashboard
    • Progress bar updates every 20 seconds showing X/100 profiles processed
    • Results appear incrementally as leads are saved to Airtable (~1 lead per minute)
    • No need to wait - review leads while processing continues

Option 3: Custom Evaluation (NEW)

  1. Start both backend and frontend servers
  2. Open http://localhost:3000 in your browser
  3. Click "Custom Evaluation β†’" tab
  4. Define your evaluation criteria (structured form):
    • Use Case Description (required): Describe what you're looking for
      • Example: "Find potential customers for HR analytics software"
    • Target Job Titles/Roles (optional): Comma-separated roles
      • Example: "HR Director, VP People, CHRO, Head of HR"
    • Target Industries/Sectors (optional): Industries to focus on
      • Example: "SaaS, Fintech, Healthcare, E-commerce"
    • Company Size (optional): Dropdown selection
      • Options: Any, 1-10, 10-50, 50-200, 200-1000, 1000+ employees
    • Additional Requirements (optional): Exclusions, examples, edge cases
      • Example: "Exclude consultants and agencies. Prefer B2B companies with venture funding."
  5. Choose input type:
    • LinkedIn Post URL/ID: Process reactions from a post
    • Manual Profile URLs: Enter specific profile URLs (one per line)
  6. Click "Start Custom Evaluation"
  7. View real-time progress and results in the dashboard
    • Same table layout as ICP mode (reuses "ICP Fit" column terminology)
    • Profiles evaluated against YOUR criteria instead of Dograh's ICP
    • Same validation workflow for quality control
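The form fields above map naturally onto a structured criteria object. A hypothetical payload mirroring those fields (the key names here are illustrative; the exact names the backend expects may differ):

```python
# Hypothetical custom-evaluation criteria built from the form fields above.
custom_criteria = {
    "use_case": "Find potential customers for HR analytics software",
    "target_roles": ["HR Director", "VP People", "CHRO", "Head of HR"],
    "target_industries": ["SaaS", "Fintech", "Healthcare", "E-commerce"],
    "company_size": "200-1000",  # one of: Any, 1-10, 10-50, 50-200, 200-1000, 1000+
    "additional_requirements": "Exclude consultants and agencies. Prefer B2B companies with venture funding.",
}
```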

Use Cases for Custom Evaluation:

  • Find founders of B2B SaaS companies in specific industries
  • Identify decision-makers at companies of specific sizes
  • Locate professionals with specific skills/experience combinations
  • Discover potential partners, investors, or collaborators
  • Any custom use case beyond the default ICP

Recovering Lost Jobs (NEW)

Scenario 1: You accidentally closed the browser tab

  1. Click the πŸ“‹ button at the top-left of any page
  2. Your job history sidebar will open showing recent jobs
  3. Click the job you want to resume
  4. Results will load immediately, and polling will continue if still processing

Scenario 2: You have a job ID from a previous session

  1. Open any of the 3 workflow pages
  2. Click "πŸ” Load Results by Job ID" section to expand
  3. Paste your job ID (e.g., 91abfc33-10a4-4629-84ba-d27eb2f4cf55)
  4. Click "Load Results"
  5. Results will display in the same table format

Scenario 3: Sharing results with a teammate

  1. Copy the job ID from the blue info box (shown during processing)
  2. Share the job ID with your teammate
  3. They can paste it into the "Load Results by Job ID" section
  4. Results will load (requires authentication with same portal password)

Note: Job history persists in your browser's localStorage and survives page refreshes and browser restarts. However, job data on the backend is temporary and will be lost if the backend server restarts.

Results Table Layout

The frontend dashboard displays processed leads in a structured table with the following columns:

| Column | Width | Description |
| --- | --- | --- |
| Name | 150px | Full name of the LinkedIn profile |
| Company | 150px | Company name + website URL (stacked display) |
| Email | 180px | Email address from profile (or "Not Available") |
| Phone | 130px | Phone number from profile (or "N/A") |
| Country | 120px | Country from profile (or "N/A") |
| Job Location | 200px | Current job location from profile (or "N/A") |
| Title | 120px | Job title/headline |
| ICP Fit | 100px | High/Medium/Low badge (color-coded) |
| ICP Reason | 500px | Detailed explanation of ICP evaluation |
| Validation | 120px | Correct/Incorrect/Unsure badge (color-coded) |
| Validation Reason | 300px | Explanation of validation assessment |
| Profile URL | 120px | Full clickable LinkedIn profile URL |
| Followers | 100px | Number of followers (right-aligned, comma-separated) |
| Connections | 100px | Number of connections (right-aligned, comma-separated) |

Key Features:

  • Company column: Shows company name on first line, website URL on second line (if available)
    • Website extracted from company API: website, websiteUrl, or basic_info.website
    • Clickable blue link (12px font size)
  • Email column: Displays email address extracted from LinkedIn profile data
    • Shows "Not Available" when profile has no email
    • Positioned after Company, before Phone for logical contact information grouping
  • Phone column: Displays phone number extracted from profile data (mobileNumber field)
    • Shows "N/A" when profile has no phone number
  • Country column: Displays country from profile (addressCountryOnly field)
    • Shows "N/A" when country data is unavailable
  • Job Location column: Displays current job location from profile (jobLocation field)
    • Shows "N/A" when location data is unavailable
    • Examples: "Bengaluru, Karnataka, India", "San Francisco Bay Area"
  • Profile URL column: Displays full URL as clickable text (for easy bulk copying)
  • Followers column: Displays follower count from profile (followers field)
    • Right-aligned with comma separators (e.g., "11,147")
    • Shows "0" when follower data is unavailable
  • Connections column: Displays connection count from profile (connections field)
    • Right-aligned with comma separators (e.g., "7,453")
    • Shows "0" when connection data is unavailable
  • Optimized widths: ICP Reason (2.5x wider), Title (0.4x narrower), Validation Reason (1.5x wider) for better readability
  • Color-coded badges: Green (High/Correct), Yellow (Medium/Unsure), Red (Low/Incorrect)
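The comma-separated counts in the Followers and Connections columns, with "0" for missing data, can be produced with a thousands-separator format. A sketch in Python (the frontend actually does this in TypeScript; `format_count` is an illustrative helper):

```python
def format_count(value) -> str:
    """Format follower/connection counts with comma separators.

    Missing data (None, empty, or 0) renders as "0", matching the
    table behaviour described above.
    """
    return f"{int(value or 0):,}"
```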

Workflow Steps

All automation steps are visible in backend/workflow.py:

From Post Reactors Workflow:

  1. Fetch Post Reactions: Get reactors from the LinkedIn post
  2. For each reactor:
    • Check Airtable: Skip profiles already in the database
    • Enrich Profile: Fetch detailed LinkedIn profile data (Apify)
    • Enrich Company: Fetch company information with fallback (Apify)
    • Summarize: Generate digestible summaries (Groq Llama 3.3 70B)
    • Evaluate ICP: Assess if lead matches your ICP (OpenAI GPT-5 mini with high reasoning)
    • Validate ICP: Quality check on ICP evaluation (Groq openai/gpt-oss-20b)
    • Store: Save/update record in Airtable

Manual Input Workflow:

  1. Parse Profile URLs: Validate and clean provided LinkedIn profile URLs
  2. For each profile URL:
    • Extract Profile ID: Use LinkedIn username from URL as URN (e.g., "priteshkr" from "linkedin.com/in/priteshkr/")
    • Check Airtable: Skip profiles already in the database (using profile ID as URN)
    • Enrich Profile: Fetch detailed LinkedIn profile data (Apify)
    • Enrich Company: Fetch company information with fallback (Apify)
    • Summarize: Generate digestible summaries (Groq Llama 3.3 70B)
    • Evaluate ICP: Assess if lead matches your ICP (OpenAI GPT-5 mini with high reasoning)
    • Validate ICP: Quality check on ICP evaluation (Groq openai/gpt-oss-20b)
    • Store: Save/update record in Airtable with profile ID as URN

Note: Processing is limited to 100 profiles per batch (both workflows) to prevent API overload and avoid LinkedIn rate limits. This limit can be adjusted in backend/workflow.py by changing the MAX_REACTORS_PER_POST constant.
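The deduplication and 100-profile cap described above reduce to a filter plus a slice. A minimal sketch (the `select_profiles` helper is illustrative; the real logic and the exact order of capping vs. dedup live in backend/workflow.py):

```python
MAX_REACTORS_PER_POST = 100  # adjustable constant, per the note above

def select_profiles(candidates, existing_urns, limit=MAX_REACTORS_PER_POST):
    """Skip profiles whose URN is already in Airtable, then cap the batch."""
    fresh = [urn for urn in candidates if urn not in existing_urns]
    return fresh[:limit]
```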

URN Format Difference:

  • Post Reactors: Uses Apify's URN field (e.g., urn:li:person:123456789)
  • Manual Input: Uses LinkedIn profile ID from URL (e.g., priteshkr from linkedin.com/in/priteshkr/)
  • This ensures consistent and predictable URNs for manually added profiles

Timeout Handling

Per-Profile Timeout (180 seconds):

  • Each profile has a 180-second (3-minute) timeout for all processing steps combined
  • If processing exceeds the timeout, the profile is automatically skipped
  • Batch processing continues immediately to the next profile (no blocking)
  • Skipped profiles are tracked separately with the reason for skipping

Common Skip Reasons:

  • ⏱️ "Processing exceeded 180s timeout" - Profile took too long to process
  • πŸ”Œ "Network error during profile fetch" - API connectivity issues
  • ⚠️ "API error: [specific message]" - Upstream service errors
  • ❌ "Could not fetch profile data" - Profile not accessible or invalid

Skipped Profiles Display:

  • Skipped profiles appear in a separate table below successful leads
  • Table shows: Profile Name, Skip Reason, Profile URL link
  • Orange/yellow color scheme distinguishes skipped from successful profiles
  • Allows manual review of profiles that need attention

Why 180 seconds?

  • Profile processing involves 7+ API calls (LinkedIn scraping, company data, multiple LLM calls)
  • Some LinkedIn profiles/companies are slow to scrape (30-60s each)
  • Conservative timeout ensures legitimate slow responses complete
  • Prevents indefinite hangs while allowing most profiles to succeed

Configuration:

  • Timeout can be adjusted in backend/workflow.py: PROFILE_TIMEOUT_SECONDS = 180
  • No individual API timeouts - simpler implementation with per-profile wrapper only
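The per-profile wrapper described above can be sketched with asyncio: wrap the whole pipeline for one profile in a single timeout and convert a timeout into a skip record instead of blocking the batch. This is an illustrative sketch; the real wrapper is in backend/workflow.py:

```python
import asyncio

PROFILE_TIMEOUT_SECONDS = 180  # matches the constant in backend/workflow.py

async def process_with_timeout(process, profile, timeout=PROFILE_TIMEOUT_SECONDS):
    """Run one profile's full pipeline under a single timeout.

    On timeout the profile is skipped with a reason, and the caller
    moves straight on to the next profile.
    """
    try:
        return await asyncio.wait_for(process(profile), timeout=timeout)
    except asyncio.TimeoutError:
        return {"skipped": profile,
                "reason": f"Processing exceeded {timeout}s timeout"}
```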

Customization

Modify ICP Criteria

Edit the ICP_EVALUATION_PROMPT in backend/prompts.py to customize what defines your ideal customer.

Modify LLM Prompts

All LLM prompts are centralized in backend/prompts.py:

  • PROFILE_SUMMARY_SYSTEM_PROMPT - How to summarize LinkedIn profiles
  • COMPANY_SUMMARY_SYSTEM_PROMPT - How to summarize company data
  • ICP_EVALUATION_PROMPT - How to evaluate lead fit

The workflow passes the complete raw JSON from Apify directly to the AI models using json.dumps(data, indent=2) for better context and accuracy; there are no helper functions or pre-formatting steps.
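The raw-JSON approach amounts to serializing the Apify payload straight into the user message. A sketch of what building such a request might look like (the `build_summary_input` function name and message shape are illustrative assumptions, not the repo's exact code):

```python
import json

def build_summary_input(system_prompt: str, apify_data: dict) -> list:
    """Pass the complete raw Apify JSON to the model, un-preprocessed,
    as the workflow does with json.dumps(data, indent=2)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": json.dumps(apify_data, indent=2)},
    ]
```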

Add/Remove Steps

All workflow steps are in backend/workflow.py. You can easily:

  • Add new data sources
  • Skip certain steps
  • Modify the evaluation logic
  • Add custom enrichment

Add More Automations

The structure is flexible to accommodate additional automations. Create new workflow files following the same pattern.

API Endpoints

  • GET / - Health check
  • POST /api/process-post - Process a LinkedIn post
    {
      "post_url": "7392508631268835328"
    }
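A stdlib sketch of constructing that request from Python (sending it requires the backend to be running on localhost:8000; `build_process_post_request` is an illustrative helper):

```python
import json
import urllib.request

def build_process_post_request(post_url_or_id: str,
                               base="http://localhost:8000"):
    """Build the POST request for /api/process-post.

    This only constructs the request object; pass it to
    urllib.request.urlopen() to actually send it.
    """
    body = json.dumps({"post_url": post_url_or_id}).encode()
    return urllib.request.Request(
        f"{base}/api/process-post",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```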

External Services Used

  • Apify: LinkedIn data scraping
    • apimaestro~linkedin-post-reactions: Get post reactions
    • anchor~linkedin-profile-enrichment: Profile details
    • logical_scrapers~linkedin-company-scraper: Company data (primary)
    • apimaestro~linkedin-company-detail: Company data (backup)
  • Airtable: Lead database and CRM
  • Groq: AI summarization (Llama 3.3 70B model)
  • OpenAI: ICP evaluation (GPT-5 mini via /v1/responses endpoint with high reasoning effort)

Development

Run in Development Mode

Backend:

cd backend
uvicorn main:app --reload --host localhost --port 8000

Frontend:

cd frontend
npm run dev

Testing Individual Components

Test each workflow component independently before running the full pipeline:

cd backend
python test_components.py

Available Tests:

  • Test Apify post reactions scraper
  • Test LinkedIn profile enrichment
  • Test company data scraping (primary and backup)
  • Test Groq AI summarization
  • Test OpenAI ICP evaluation
  • Test Airtable record creation
  • Test full end-to-end pipeline (single lead)

How to use:

  1. Edit backend/test_components.py
  2. Replace example URLs/IDs with real LinkedIn data
  3. Uncomment the specific tests you want to run
  4. Run the script

See detailed instructions in backend/test_components.py.

Build for Production

Frontend:

cd frontend
npm run build
npm run start

πŸ§ͺ TESTING INSTRUCTIONS

Follow these steps to test the application:

Step 1: Test Backend Server

  1. Start the backend server:

    cd backend
    python main.py
  2. Verify server is running:

    • You should see: INFO: Uvicorn running on http://localhost:8000
    • Open http://localhost:8000 in browser
    • You should see: {"message": "LinkedIn Lead Profiling API is running"}
  3. Check for errors:

    • βœ… No errors β†’ Backend is ready
    • ❌ Import errors β†’ Run pip install -r requirements.txt
    • ❌ Environment errors β†’ Check .env file exists and has all keys

Step 2: Test Frontend Server

  1. In a new terminal, start the frontend:

    cd frontend
    npm run dev
  2. Verify frontend is running:

    • You should see: - Local: http://localhost:3000
    • Open http://localhost:3000 in browser
    • You should see the LinkedIn Lead Profiling dashboard
  3. Check the UI:

    • βœ… Input form visible β†’ "Enter LinkedIn Post URL or ID"
    • βœ… "Process Post" button visible β†’ UI is ready

Step 3: Test with a LinkedIn Post

  1. Find a LinkedIn post with reactions:

    • Go to LinkedIn and find any post with reactions
    • Copy the post URL or ID (e.g., 7392508631268835328)
  2. Process the post:

    • Paste the URL/ID in the input field
    • Click "Process Post"
    • Watch the backend terminal for logs
  3. Expected behavior:

    Backend logs should show:
    ====================================
    STARTING LINKEDIN POST PROCESSING
    Post ID: 7392508631268835328
    ====================================
    
    STEP 1: Fetching post reactions...
    βœ“ Fetched 6 reactions from post...
    
    --- Processing Reactor 1/6: John Doe ---
    STEP 2a: Checking Airtable...
    β†’ New profile. Proceeding with enrichment...
    ...
    
  4. Check results:

    • βœ… Progress bar updates
    • βœ… Results table appears with processed leads
    • βœ… ICP fit strength shown (High/Medium/Low)
    • βœ… "View Profile" links work

Step 4: Verify Airtable Integration

  1. Open your Airtable base
  2. Check the "Leads" table
  3. Verify:
    • βœ… New records created
    • βœ… All fields populated (name, title, ICP fit, etc.)
    • βœ… URN field contains LinkedIn URN
    • βœ… Profile URL is clickable

Troubleshooting

Frontend can't connect to backend:

  • Make sure backend is running on localhost:8000
  • Check browser console for errors
  • Verify next.config.js proxy configuration

Backend API errors:

  • Check .env file has all required tokens
  • Verify Apify token is valid
  • Verify Airtable base ID and table name are correct
  • Check backend terminal for specific error messages

No results returned:

  • Check if post ID is valid (should be a numeric string)
  • Verify Apify can access the LinkedIn post
  • Check rate limits on Apify account

Airtable errors:

  • Verify table schema matches exactly (field names are case-sensitive)
  • Check Airtable token has read/write permissions
  • Ensure base ID is correct

Quick Test with Provided Data

Use this test post ID from the reference: 7392508631268835328

Expected result: 6 reactors should be processed and appear in results table.

License

Private - Not for distribution

Notes

  • Keep your .env files secure and never commit them
  • The automation respects rate limits of external services
  • Profile data is cached in Airtable to avoid redundant API calls
  • All logs are printed to console for transparency
