Crawling posted Software Engineering jobs from https://wuzzuf.net

NOTE: Before crawling, I checked robots.txt and found no disallow rules.

What does this spider specifically do?

  • Gets all available job links and crawls some of each vacancy's fields: [job_title, job_url, posted_on, job_roles, keywords, company_name, company_location, company_website, company_industries] (a minimal spider sketch follows this list)
  • There are other fields as well, such as the job description and requirements
  • Writes the crawled results to a CSV file
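
A minimal sketch of what such a spider could look like. The CSS selectors and search URL here are assumptions for illustration (the real ones depend on wuzzuf.net's current markup), and only a few of the fields listed above are shown:

    import scrapy


    class JobsSpider(scrapy.Spider):
        """Crawls Software Engineering job postings on wuzzuf.net."""
        name = "jobs"
        # Hypothetical search URL for Software Engineering vacancies.
        start_urls = [
            "https://wuzzuf.net/search/jobs/?q=software%20engineering",
        ]

        def parse(self, response):
            # Hypothetical selector: follow each job link on the listing page.
            for href in response.css("h2.job-title a::attr(href)").getall():
                yield response.follow(href, callback=self.parse_job)

        def parse_job(self, response):
            # Hypothetical selectors for a subset of the fields above.
            yield {
                "job_title": response.css("h1::text").get(),
                "job_url": response.url,
                "posted_on": response.css("span.posted-on::text").get(),
                "company_name": response.css("a.company-name::text").get(),
                "company_location": response.css("span.location::text").get(),
            }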

To run this spider:

  • For CSV format: scrapy crawl jobs -o jobs.csv -t csv
  • For JSON Lines format: scrapy crawl jobs -o jobs.jl
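
As an alternative to passing -o on the command line, newer Scrapy versions (2.1+) can declare the outputs once in settings.py via the FEEDS setting. A sketch, not necessarily what this repo uses:

    # settings.py -- equivalent to the -o command-line options above
    FEEDS = {
        "jobs.csv": {"format": "csv"},
        "jobs.jl": {"format": "jsonlines"},
    }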

I found these tutorials to be good starting points:

ToDo:

  • Invest more time in the Scrapy framework and its concepts, such as pipelines and extractors

  • Read Scrapy's documentation

  • Add more fields for each job

  • Split the output across multiple files, for example jobs per company, per industry, or per location (see the pipeline sketch after this list)

  • Index the crawled data into a database (I would do this in another repo that uses this spider)

  • Build a Job Recommendation Engine

  • Whenever something comes to mind, I'll add it to this list
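
For the pipeline and file-splitting items above, a Scrapy item pipeline could route each crawled job into a per-company CSV file. Everything here (class name, field list, output directory) is illustrative and not part of this repo:

    import csv
    import re
    from pathlib import Path


    class PerCompanyCsvPipeline:
        """Hypothetical pipeline: writes each item to a CSV file
        named after its company, one file per company."""

        FIELDS = ["job_title", "job_url", "posted_on", "company_name"]

        def open_spider(self, spider):
            self.files = {}  # sanitized company name -> (file, writer)
            Path("by_company").mkdir(exist_ok=True)

        def process_item(self, item, spider):
            company = item.get("company_name") or "unknown"
            safe = re.sub(r"[^\w-]+", "_", company)
            if safe not in self.files:
                f = open(f"by_company/{safe}.csv", "w", newline="")
                writer = csv.DictWriter(f, fieldnames=self.FIELDS,
                                        extrasaction="ignore")
                writer.writeheader()
                self.files[safe] = (f, writer)
            self.files[safe][1].writerow({k: item.get(k) for k in self.FIELDS})
            return item

        def close_spider(self, spider):
            for f, _ in self.files.values():
                f.close()

It would be enabled in settings.py with something like ITEM_PIPELINES = {"wuzzuf_crawler.pipelines.PerCompanyCsvPipeline": 300} (module path hypothetical).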
