Skip to content

alich03/LinkedIn-Scrapping-using-Selenium

Repository files navigation

Here’s a comprehensive instruction guide for your LinkedIn scraping project using Selenium that you can include in your documentation or LinkedIn profile:


LinkedIn Scraping Project using Selenium

This project demonstrates how to effectively use Selenium, BeautifulSoup, and Python to extract LinkedIn profile data based on specific search criteria. It includes scripts for scraping LinkedIn profiles, extracting emails, handling proxies, and fetching Google Maps reviews for extended insights.


Project Structure

  1. extract_emails_from_website_url.py

    • Extracts email addresses from a given list of website URLs.
    • Uses regex and BeautifulSoup for accurate parsing.
  2. gmap_reviews_scrap.py

    • Fetches and parses Google Maps reviews for specific businesses or locations.
    • Implements Selenium to handle dynamic content.
  3. linked_profile_details.py

    • Scrapes detailed profile information from LinkedIn.
    • Extracts key details such as name, position, company, location, and education using BeautifulSoup and Selenium.
  4. linkedin_scrapper.py

    • Automates LinkedIn search functionality to fetch profiles based on user-inputted search terms.
    • Extracts public profile data efficiently while respecting LinkedIn’s usage policies.
  5. proxy_auth_plugin.zip

    • Configures Selenium WebDriver to work with authenticated proxies.
    • Useful for bypassing geographical restrictions or avoiding rate limits.
  6. test_proxy_selenium.py

    • Tests the functionality of proxy authentication with Selenium.
    • Ensures robust handling of proxies for scraping tasks.

How It Works

  1. Input Search Criteria

    • Users provide search terms (e.g., job title, location, or company) in the linkedin_scrapper.py.
  2. Automated Profile Search

    • Selenium automates the LinkedIn search functionality, navigates through search results, and collects profile links.
  3. Profile Data Extraction

    • linked_profile_details.py visits each profile URL and extracts publicly available details like:
      • Name
      • Job Title
      • Company
      • Location
      • Connections count
  4. Email Extraction

    • For profiles with external website links, the extract_emails_from_website_url.py script scans the websites for email addresses.
  5. Handling Dynamic Content

    • Selenium ensures dynamic loading content (e.g., pop-ups or infinite scrolling) is handled smoothly.
    • BeautifulSoup complements Selenium for detailed parsing of HTML content.
  6. Proxy Integration

    • The proxy_auth_plugin.zip and test_proxy_selenium.py scripts integrate proxy support to ensure seamless scraping across regions and prevent IP bans.

Key Features

  • Dynamic Content Handling: Combines Selenium for navigation and BeautifulSoup for parsing.
  • Email Extraction: Accurately extracts email addresses from associated websites.
  • Proxy Support: Handles rate-limiting and geographical restrictions with ease.
  • Scalable Design: Modular scripts can be extended to other platforms like Google Maps or Indeed.
  • Ethical Usage: Designed for collecting publicly available information only.

Setup and Usage

  1. Install Dependencies
    Ensure you have the following Python packages installed:

    pip install selenium beautifulsoup4 requests
  2. Set Up WebDriver
    Download the appropriate WebDriver (e.g., ChromeDriver) for your browser and place it in your PATH.

  3. Configure Proxies (Optional)
    Add proxy details in proxy_auth_plugin.zip or directly in test_proxy_selenium.py.

  4. Run the Scripts

    • Start with linkedin_scrapper.py to search profiles:
      python linkedin_scrapper.py
    • Use linked_profile_details.py to fetch detailed profile information.
  5. Output

    • The extracted data is saved in structured formats (e.g., CSV or JSON) for easy analysis.

Important Notes

  • Respect LinkedIn’s Terms of Service: Ensure your use case complies with ethical guidelines and LinkedIn’s policies.
  • Avoid Excessive Requests: Use delays and proxy rotation to prevent detection or account restrictions.
  • Scrape Only Public Data: Do not attempt to extract sensitive or non-public information.

Use Cases

  • Recruitment: Aggregate potential candidate profiles based on specific job titles and skills.
  • Marketing: Build targeted lists of professionals for outreach campaigns.
  • Data Analysis: Analyze trends in job markets, skills demand, or company growth.

Let me know if you’d like additional sections or details!

About

Scrapping Linkedin data using BeautifulSoup and Selenium

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages