Leader Behaviour Prediction

This project extracts and gathers information about the behaviour or misconduct of a leader or representative, matched against a predefined list of adjectives, by continuously scraping news websites.

We have converted the output to a JSON file.

Installing required libraries

sudo pip install -r requirements.txt

Scraping Times of India website

I have scraped the Times of India website specifically for this purpose.

Dataset obtained by scraping the Times of India website

This dataset holds the details of the scraped articles. The article text is scraped and names are extracted from it. The adjectives found in the text are then matched against the extracted names.
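The matching step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: `ADJECTIVES`, `NAMES`, and `match_adjectives` are hypothetical names, and it assumes a name and an adjective are related whenever they appear in the same sentence.

```python
import re

# Hypothetical inputs: the project's real adjective list and name lists
# are stored elsewhere and may look different.
ADJECTIVES = {"corrupt", "dishonest", "violent"}
NAMES = ["Jane Doe", "John Roe"]


def match_adjectives(text, names, adjectives):
    """Map each name to the sorted adjectives that share a sentence with it."""
    matches = {}
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        found = words & adjectives
        if not found:
            continue
        for name in names:
            if name in sentence:
                matches.setdefault(name, set()).update(found)
    return {name: sorted(adjs) for name, adjs in matches.items()}


article = "Jane Doe was called corrupt by the opposition. John Roe opened a school."
print(match_adjectives(article, NAMES, ADJECTIVES))  # {'Jane Doe': ['corrupt']}
```

Sentence-level co-occurrence is a crude heuristic: it cannot tell whether the adjective actually describes the named person, so the results should be treated as candidates for review.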

The dataset is located at:

LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/newsTOI.sqlite
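The table layout of this SQLite file is not documented here, so a reasonable first step is to list its tables and columns. The sketch below uses only Python's standard `sqlite3` module; `list_tables` is a hypothetical helper, not part of the project.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in the SQLite file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()

# Example call (path taken from this README):
# list_tables("LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/newsTOI.sqlite")
```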

Dataset

Scraped names of the members of the US legislature:

LeaderBehaviour/getUSNames/getUSNames/spiders/getUSNames.json

Scraped names of the members of parliament in India:

LeaderBehaviour/getIndianPolNames/getIndianPolNames/spiders/getIndianPolNames.json

Additional objectives:

* Done: custom headers and a user-agent are set in Scrapy.
* To do: route requests through a proxy, or integrate with Tor, to make the scraper harder to trace.
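The two objectives above might be configured along these lines. `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are standard Scrapy settings, and Scrapy's built-in HttpProxyMiddleware reads the `proxy` key of a request's `meta`. Everything else is an assumption: the header values are placeholders, `PROXY_URL` and `proxy_meta` are hypothetical names, and the address assumes a local HTTP proxy (for example Privoxy forwarding to Tor) listening on port 8118.

```python
# Sketch of entries for the Scrapy project's settings.py.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) leaderBehaviour-bot"  # placeholder value
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en",
}

# Assumed local HTTP proxy; not a Scrapy setting, adjust to your own setup.
PROXY_URL = "http://127.0.0.1:8118"


def proxy_meta(proxy_url=PROXY_URL):
    """Build the request meta dict that Scrapy's HttpProxyMiddleware reads."""
    return {"proxy": proxy_url}

# In a spider: yield scrapy.Request(url, meta=proxy_meta())
```

Routing through Tor hides the scraper's IP address from the target site, but it does not by itself guarantee anonymity; headers, timing, and request patterns can still identify a client.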

Candidate names extracted from the scraped text:

LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/extractNamesTOI.py
LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/probable_names_extracted.json
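For illustration, one simple way to pull probable names out of article text is to look for runs of capitalized words. This is only a sketch under that assumption and is not necessarily what extractNamesTOI.py does; `NAME_PATTERN` and `probable_names` are hypothetical names.

```python
import re

# Two or more consecutive capitalized words, e.g. "Jane Doe".
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def probable_names(text):
    """Return candidate names in order of first appearance, without duplicates."""
    seen = []
    for match in NAME_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


sample = "Jane Doe met John Roe in Delhi. Afterwards, Jane Doe left."
print(probable_names(sample))  # ['Jane Doe', 'John Roe']
```

The pattern over-matches when a capitalized sentence-initial word directly precedes a name, and it misses single-word names and initials, so filtering the candidates against the scraped legislator lists would be a sensible next step.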

Note

Go to the directory real_shit, copy scrapTOI.sqlite, then run ***python get_neg.py***.