GitHub - alich03/LinkedIn-Scrapping-using-Selenium: Scrapping Linkedin data using BeautifulSoup and Selenium

LinkedIn Scraping Project using Selenium

This project demonstrates how to use Selenium, BeautifulSoup, and Python to extract LinkedIn profile data that matches specific search criteria. It includes scripts for scraping LinkedIn profiles, extracting email addresses from linked websites, routing traffic through authenticated proxies, and fetching Google Maps reviews.

Project Structure

extract_emails_from_website_url.py
- Extracts email addresses from a given list of website URLs.
- Uses regular expressions and BeautifulSoup to parse page content.
gmap_reviews_scrap.py
- Fetches and parses Google Maps reviews for specific businesses or locations.
- Implements Selenium to handle dynamic content.
linked_profile_details.py
- Scrapes detailed profile information from LinkedIn.
- Extracts key details such as name, position, company, location, and education using BeautifulSoup and Selenium.
linkedin_scrapper.py
- Automates LinkedIn search functionality to fetch profiles based on user-inputted search terms.
- Extracts public profile data efficiently while respecting LinkedIn’s usage policies.
proxy_auth_plugin.zip
- Configures Selenium WebDriver to work with authenticated proxies.
- Useful for bypassing geographical restrictions or avoiding rate limits.
test_proxy_selenium.py
- Tests the functionality of proxy authentication with Selenium.
- Ensures robust handling of proxies for scraping tasks.
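
The regex half of the approach described for extract_emails_from_website_url.py can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `extract_emails` and the pattern are assumptions, and the input is HTML that has already been fetched (for example with requests).

```python
import re

# Deliberately simple pattern (an assumption, not the script's own regex).
# It misses some valid addresses and can match non-addresses such as
# "logo@2x.png", so results should be reviewed before use.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique addresses found in html, in order of first appearance."""
    seen = set()
    emails = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()  # treat addresses as case-insensitive
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


sample = '<a href="mailto:Info@Example.com">Contact</a> <p>sales@example.org</p>'
print(extract_emails(sample))
# ['info@example.com', 'sales@example.org']
```

Running the pattern over raw HTML also picks up addresses inside mailto: links. Parsing the page with BeautifulSoup first and searching only the visible text would likely reduce false positives.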

How It Works

Input Search Criteria
- Users provide search terms (e.g., job title, location, or company) in linkedin_scrapper.py.
Automated Profile Search
- Selenium automates the LinkedIn search functionality, navigates through search results, and collects profile links.
Profile Data Extraction
- linked_profile_details.py visits each profile URL and extracts publicly available details like:
  - Name
  - Job Title
  - Company
  - Location
  - Connections count
Email Extraction
- For profiles with external website links, the extract_emails_from_website_url.py script scans the websites for email addresses.
Handling Dynamic Content
- Selenium handles dynamically loaded content (e.g., pop-ups or infinite scrolling) that a plain HTTP request would not see.
- BeautifulSoup complements Selenium for detailed parsing of HTML content.
Proxy Integration
- The proxy_auth_plugin.zip archive and the test_proxy_selenium.py script add proxy support, which allows scraping across regions and reduces the risk of IP bans.
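
The infinite-scrolling case mentioned under Handling Dynamic Content is commonly handled by scrolling until the page height stops growing. The sketch below shows that idea under stated assumptions: `scroll_to_bottom` is a hypothetical helper, not a function from this repository, and `driver` is any Selenium WebDriver instance (only its `execute_script` method is used).

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=20):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls that loaded new content.
    """
    loaded = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded results time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared, so assume the end of the list
        last_height = new_height
        loaded += 1
    return loaded
```

Once scrolling has finished, `driver.page_source` can be passed to BeautifulSoup for parsing. The fixed pause is the simplest option; an explicit wait on a specific element is usually more reliable when the page structure is known.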

Key Features

Dynamic Content Handling: Combines Selenium for navigation and BeautifulSoup for parsing.
Email Extraction: Accurately extracts email addresses from associated websites.
Proxy Support: Uses authenticated proxies to work around rate limits and geographical restrictions.
Scalable Design: Modular scripts can be extended to other platforms like Google Maps or Indeed.
Ethical Usage: Designed for collecting publicly available information only.

Setup and Usage

Install Dependencies
Ensure you have the following Python packages installed:
```
pip install selenium beautifulsoup4 requests
```
Set Up WebDriver
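Download the WebDriver that matches your browser and its version (e.g., ChromeDriver for Chrome) and place it in a directory on your PATH. With Selenium 4.6 or later, Selenium Manager can usually locate or download a matching driver automatically.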
Configure Proxies (Optional)
Add proxy details in proxy_auth_plugin.zip or directly in test_proxy_selenium.py.
Run the Scripts
- Start with linkedin_scrapper.py to search profiles:
```
python linkedin_scrapper.py
```
- Use linked_profile_details.py to fetch detailed profile information.
Output
- The extracted data is saved in structured formats (e.g., CSV or JSON) for easy analysis.
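
As a rough illustration of the output step, the sketch below writes a list of profile records to both CSV and JSON with the standard library. The function name `save_profiles` and the field names are assumptions based on the details listed under Profile Data Extraction; the scripts in this repository may use different names and formats.

```python
import csv
import json

# Assumed column names, taken from the fields listed in "How It Works".
FIELDS = ["name", "job_title", "company", "location", "connections"]


def save_profiles(profiles, csv_path, json_path):
    """Write the same list of profile dicts to a CSV file and a JSON file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # Keys outside FIELDS are skipped; missing keys become empty cells.
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(profiles)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(profiles, f, ensure_ascii=False, indent=2)
```

For example, `save_profiles(records, "profiles.csv", "profiles.json")` would produce one row per record in the CSV file and the full records in the JSON file.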

Important Notes

Respect LinkedIn’s Terms of Service: Ensure your use case complies with ethical guidelines and LinkedIn’s policies.
Avoid Excessive Requests: Use delays and proxy rotation to prevent detection or account restrictions.
Scrape Only Public Data: Do not attempt to extract sensitive or non-public information.
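
One simple way to add the delays mentioned above is a randomized pause between page loads. The helper below is a hypothetical sketch; `polite_pause` and its default bounds are assumptions, not values taken from this project.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random interval in [min_seconds, max_seconds].

    Returns the chosen delay so callers can log it. The sleep argument
    exists only so the function can be exercised without actually waiting.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_pause()` after each `driver.get(url)` spaces requests out irregularly, which is gentler on the site than a tight loop. Suitable bounds depend on the site and the use case.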

Use Cases

Recruitment: Aggregate potential candidate profiles based on specific job titles and skills.
Marketing: Build targeted lists of professionals for outreach campaigns.
Data Analysis: Analyze trends in job markets, skills demand, or company growth.
