Indeed offers

This project aims to collect Indeed job offers and store them in a database.

We use Python and the Selenium package to scrape data from the Indeed website and store it in a PostgreSQL database.

Note that this project was built on Linux.

Documentation

Python documentation

Installation

Getting started with PostgreSQL and setting up the service connection file

Populate SQL tables, functions and triggers

export PGSERVICEFILE=.pg_service.conf
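PGSERVICEFILE tells libpq (and therefore psql) which file defines the offers service. Because the path is relative, it is resolved from the current directory, so run the psql commands from the repository root. The service file uses an INI-like layout that Python's configparser can read, which makes a quick sanity check possible. The sketch below is a hypothetical helper, not part of the project; the host, port, dbname and user values are placeholders (the repository ships a .pg_service_sample.conf to start from).

```python
import configparser

# Placeholder contents: replace with the values from your own .pg_service.conf.
SAMPLE_SERVICE_FILE = """
[offers]
host=localhost
port=5432
dbname=offers
user=postgres
"""

# Keys a libpq service entry usually carries (a password can also live in ~/.pgpass).
REQUIRED_KEYS = ("host", "port", "dbname", "user")


def check_service(text: str, service: str = "offers") -> list:
    """Return the required keys missing from [service]; raise KeyError if the section is absent."""
    # interpolation=None so a '%' in a password is not treated as a substitution.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(service):
        raise KeyError(f"service [{service}] not found")
    return [key for key in REQUIRED_KEYS if not parser.has_option(service, key)]


if __name__ == "__main__":
    missing = check_service(SAMPLE_SERVICE_FILE)
    print("missing keys:", missing or "none")
```

To check the real file, pass it the contents of .pg_service.conf instead of the sample string.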

Create the tables:

psql "service=offers" < sql/tables/create_tables.sql

Create the functions:

psql "service=offers" < sql/functions/scraped_stats.sql 
psql "service=offers" < sql/functions/update_timestamp_on_updates.sql

Create the triggers:

psql "service=offers" < sql/triggers/trigger_jobs.sql
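The files above are meant to run in this order, since the trigger file most likely references the functions and tables created before it. A minimal sketch, assuming psql is on the PATH and PGSERVICEFILE is exported as shown earlier, that replays the same files from Python. It swaps the shell redirect for psql -f (same input, but error messages then include file line numbers) and sets ON_ERROR_STOP=1 so psql exits with an error instead of continuing past a failing statement. The populate function and its dry_run flag are hypothetical conveniences, not part of the project.

```python
import subprocess

# Same files, same order as in the README: tables, then functions, then triggers.
SQL_FILES = [
    "sql/tables/create_tables.sql",
    "sql/functions/scraped_stats.sql",
    "sql/functions/update_timestamp_on_updates.sql",
    "sql/triggers/trigger_jobs.sql",
]


def build_commands(service: str = "offers") -> list:
    """One psql invocation per SQL file; -d accepts a conninfo string such as service=offers."""
    return [
        ["psql", "-d", f"service={service}", "-v", "ON_ERROR_STOP=1", "-f", path]
        for path in SQL_FILES
    ]


def populate(service: str = "offers", dry_run: bool = False) -> None:
    for cmd in build_commands(service):
        if dry_run:
            print(" ".join(cmd))
            continue
        # check=True raises CalledProcessError, so a failing file stops the remaining ones.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    populate(dry_run=True)
```

Running it with dry_run=True only prints the four commands; drop the flag to execute them.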

Create a Python virtual environment

Create a virtual environment with venv and activate it:

python3 -m venv venv
source venv/bin/activate
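To confirm the virtual environment is active before installing anything, compare sys.prefix with sys.base_prefix: inside a venv they differ. This is a small standalone check, not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix points at the venv, sys.base_prefix at the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run with the venv's own interpreter, it should report venv active: True and an interpreter path under venv/bin.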

Install the dependencies:

pip3 install -r requirements.txt
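After installing, a hypothetical helper can report which packages named in requirements.txt are still missing from the active environment, using importlib.metadata from the standard library. It only understands simple lines such as name==version; that parsing rule is an assumption, since the contents of requirements.txt are not shown here.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def missing_requirements(lines: list) -> list:
    """Return the requirement names in `lines` that are not installed."""
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines, comments and pip options such as -r or -e
        # Keep only the distribution name: stop at a version specifier, extra or marker.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as fh:
        print("missing:", missing_requirements(fh.read().splitlines()) or "none")
```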

A few things to note before launching the indeed_scraping.py script:

In the if __name__ == '__main__': block, set the boolean argument to True if you don't want the Google Chrome browser window to open:

soup_list = scrap_offer_indeed(list_keyword, offer_age, indeed_country, False)
# ...
df_description = scrap_indeed_description(id_offers_to_scrap, url_offers_to_scrap, False)
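The boolean passed to scrap_offer_indeed and scrap_indeed_description presumably switches Chrome between a visible window and headless mode. How the project wires it is not shown here, so the sketch below is only an assumption of how such a flag usually maps to Chrome command-line switches. chrome_arguments is a hypothetical helper; with Selenium, each returned string would be passed to ChromeOptions.add_argument, as shown in the comment.

```python
def chrome_arguments(headless: bool) -> list:
    """Hypothetical helper: Chrome switches for a headless (True) or visible (False) session."""
    args = ["--window-size=1920,1080"]
    if headless:
        # "--headless=new" selects Chrome's newer headless mode (Chrome 109 and later).
        args.append("--headless=new")
    return args


if __name__ == "__main__":
    # With Selenium installed, the switches would be applied like this:
    #   from selenium import webdriver
    #   options = webdriver.ChromeOptions()
    #   for arg in chrome_arguments(headless=True):
    #       options.add_argument(arg)
    #   driver = webdriver.Chrome(options=options)
    print(chrome_arguments(headless=True))
```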

You can also control how much scraping is done by changing the number in the for i in range(10): loop.
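If each loop iteration fetches one page of search results, the loop bound caps how many offers are collected per keyword. The sketch below assumes Indeed's search URL takes the keyword in q, the maximum offer age in days in fromage and a result offset in start that advances by 10 per page; these are observed conventions of the site, not a documented API, and search_urls is a hypothetical helper rather than the project's code. The ca.indeed.com host matches the sample jobs row in this README.

```python
from urllib.parse import urlencode


def search_urls(keywords: list, offer_age: int, country: str = "ca", pages: int = 10) -> list:
    """Hypothetical helper: one search URL per keyword and page (10 results per page assumed)."""
    urls = []
    for keyword in keywords:
        for i in range(pages):  # mirrors `for i in range(10):` in indeed_scraping.py
            query = urlencode({"q": keyword, "fromage": offer_age, "start": i * 10})
            urls.append(f"https://{country}.indeed.com/jobs?{query}")
    return urls


if __name__ == "__main__":
    for url in search_urls(["data analyst"], offer_age=7, pages=2):
        print(url)
```

Under these assumptions, raising the bound from 10 to 20 doubles the number of pages requested per keyword, and with it the load on the site.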

Launch the indeed_scraping.py script:

python3 indeed_scraping.py

Get statistics on the jobs table:

psql "service=offers" -c "SELECT scraped_stats();"

Output:

{
    "scraped": 30,
    "total_jobs": 977,
    "scrap_progress": 0.03,
    "to_scrap": 947
}
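The fields in this output are related: to_scrap equals total_jobs - scraped (977 - 30 = 947), and scrap_progress matches scraped / total_jobs rounded to two decimals (30 / 977 ≈ 0.0307, shown as 0.03). The rounding rule is inferred from this single sample, not read from scraped_stats.sql. The sketch below parses the JSON (adding psql's -A and -t flags prints the bare value without headers) and checks those relations; check_stats is a hypothetical helper.

```python
import json

# Sample output of: psql "service=offers" -At -c "SELECT scraped_stats();"
SAMPLE = """{
    "scraped": 30,
    "total_jobs": 977,
    "scrap_progress": 0.03,
    "to_scrap": 947
}"""


def check_stats(raw: str) -> dict:
    """Parse scraped_stats() output and verify that its fields agree with each other."""
    stats = json.loads(raw)
    if stats["to_scrap"] != stats["total_jobs"] - stats["scraped"]:
        raise ValueError("to_scrap does not equal total_jobs - scraped")
    # Assumption inferred from one sample: progress = scraped / total_jobs, rounded to 2 decimals.
    total = stats["total_jobs"]
    expected = round(stats["scraped"] / total, 2) if total else 0.0
    if abs(stats["scrap_progress"] - expected) > 1e-9:
        raise ValueError(f"scrap_progress {stats['scrap_progress']} != {expected}")
    return stats


if __name__ == "__main__":
    stats = check_stats(SAMPLE)
    print(f"{stats['scraped']}/{stats['total_jobs']} scraped ({stats['scrap_progress']:.0%})")
```

For the sample above it prints 30/977 scraped (3%).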

Query the jobs table:

psql "service=offers" -c "SELECT id, job_title, company_name, company_rating, scraped_at, url from jobs limit 1"

Output:

id	job_title	company_name	company_rating	scraped_at	url
5143d988833fff26	Procurement Data Analyst	PCL Construction	3.8	2023-01-03 17:38:41	https://ca.indeed.com/viewjob?jk=5143d988833fff26
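In the sample row, the id column holds the same value as the jk query parameter of the offer URL (5143d988833fff26). Assuming this holds for every row (the scraping code is not shown here), an offer's id can be recovered from its URL with the standard library, for example to skip offers that are already stored. offer_id_from_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def offer_id_from_url(url: str) -> Optional[str]:
    """Return the jk parameter of an Indeed viewjob URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("jk")
    return values[0] if values else None


if __name__ == "__main__":
    row = {
        "id": "5143d988833fff26",
        "url": "https://ca.indeed.com/viewjob?jk=5143d988833fff26",
    }
    print(offer_id_from_url(row["url"]) == row["id"])  # True for the sample row
```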
