Scrapper is a Node.js script that efficiently fetches all hyperlinks on a website. It throttles requests to the target site, making at most 5 requests at a time.
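For context, here is a minimal sketch of one way such throttling can be implemented in plain Node.js: keep a counter of in-flight requests plus a queue of pending URLs, and only start a new request while fewer than 5 are active. The helper names (`fetchPage`, `enqueue`, `drain`) are illustrative and not taken from this repository.

```js
// Throttling sketch: run at most MAX_CONCURRENT requests at a time.
// Names below are illustrative, not from this repo.
const https = require('https');

const MAX_CONCURRENT = 5; // cap on simultaneous requests
let active = 0;           // requests currently in flight
const queue = [];         // URLs waiting for a free slot

// Fetch one page and resolve with its HTML body.
function fetchPage(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

function enqueue(url) {
  queue.push(url);
  drain();
}

function drain() {
  // Start queued requests only while fewer than MAX_CONCURRENT are active.
  while (active < MAX_CONCURRENT && queue.length > 0) {
    const url = queue.shift();
    active += 1;
    fetchPage(url)
      .then((html) => console.log(`fetched ${url}: ${html.length} bytes`))
      .catch((err) => console.error(`failed ${url}: ${err.message}`))
      .finally(() => {
        active -= 1;
        drain(); // a slot freed up; start the next queued URL
      });
  }
}

// Example usage: enqueue any number of URLs; at most 5 run concurrently.
// enqueue('https://medium.com');
```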
- Dependencies
- Setup
## Dependencies

- Node.js

## Setup
- Install Node.js:
  - For Mac and Windows, download the installer from https://nodejs.org/en/download/
  - For Linux, run this command in a terminal:

    ```
    sudo apt-get install nodejs
    ```
- Clone the repository.
- Move inside the project root path, i.e. `scrapper/`.
- Run this command in a terminal:

  ```
  npm install
  ```

  This will install all the Node dependencies for the project.
- Inside the project root path, edit the file `config.json` as:

  ```
  {
    "baseUrl": <target website address>,
    "domainName": <target website domain name>,
    "csvFilePath": <path of csv file>
  }
  ```

  Default to get started (a sketch of how these settings can be loaded follows this list):

  ```
  {
    "baseUrl": "https://medium.com",
    "domainName": "medium.com",
    "csvFilePath": "links.csv"
  }
  ```
- Start the script. Inside the project root path `scrapper/`, run one of these commands in a terminal. For the script designed with async, run:

  ```
  node withAsync.js
  ```

  For the script designed without async, run:

  ```
  node withoutAsync.js
  ```

  (A sketch of the core fetch-and-extract step both scripts perform follows this list.)
- To stop the script, press the Enter key while it is running. The script will write the fetched URLs to the CSV file and then exit (see the shutdown sketch below).
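For reference, a minimal sketch of how a script can pick up the `config.json` settings; whether `withAsync.js` and `withoutAsync.js` read the file exactly this way is an assumption:

```js
// Sketch: loading settings via Node's built-in JSON require support.
// The actual scripts may read the file differently.
const { baseUrl, domainName, csvFilePath } = require('./config.json');

console.log(`Scraping ${baseUrl} (domain: ${domainName})`);
console.log(`Links will be written to ${csvFilePath}`);
```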
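To make the two variants concrete, here is a hedged sketch of the core step either script performs: fetch a page over HTTPS and collect the hyperlinks that belong to the configured domain. The regex-based extraction and the function names are assumptions for illustration; the real scripts may parse HTML differently.

```js
// Sketch of the core fetch-and-extract step. Regex extraction is a
// simplification; a real scraper might use an HTML parser instead.
const https = require('https');

function fetchHtml(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let html = '';
      res.on('data', (chunk) => (html += chunk));
      res.on('end', () => resolve(html));
    }).on('error', reject);
  });
}

function extractLinks(html, domainName) {
  const links = new Set();
  const hrefPattern = /href="(https?:\/\/[^"]+)"/g;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    // Keep only links on the configured domain.
    if (match[1].includes(domainName)) links.add(match[1]);
  }
  return [...links];
}

// Usage with the default config values:
fetchHtml('https://medium.com')
  .then((html) => console.log(extractLinks(html, 'medium.com')))
  .catch(console.error);
```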
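Finally, a sketch of the stop-on-Enter behaviour described above, assuming the collected links are held in an in-memory set; `collectedLinks` and the single-column CSV layout are illustrative:

```js
// Sketch: flush collected links to the CSV file when Enter is pressed.
const fs = require('fs');

const csvFilePath = 'links.csv';  // illustrative; would come from config.json
const collectedLinks = new Set(); // filled while the scraper runs

process.stdin.on('data', () => {
  // Any line of input (i.e. pressing Enter) triggers a graceful shutdown:
  // write everything collected so far to the CSV file, then exit.
  const rows = ['url', ...collectedLinks].join('\n');
  fs.writeFileSync(csvFilePath, rows + '\n');
  console.log(`Wrote ${collectedLinks.size} links to ${csvFilePath}`);
  process.exit(0);
});
```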