GitHub - AirLoft/web-scraper: Node.js scripts that scrape tech news from mainstream media

Run npm install to install the dependencies, then run nodemon <filename> to start a server listening on port 3000.
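A script of this kind can be sketched with Node's built-in http module. This is a minimal sketch under assumptions, not the repository's actual code: createNewsServer is a hypothetical name, and getItems stands in for whatever a scraper returns. The server is only created here. Calling listen(3000) is left to the caller, so the function can also be reused or tested on another port.

```javascript
const http = require("node:http");

// Hypothetical helper (not from the repository): serves whatever getItems()
// returns as a JSON array, for every path and method.
function createNewsServer(getItems) {
  return http.createServer((req, res) => {
    // getItems is an assumed callback returning an array of news items.
    const body = JSON.stringify(getItems());
    // charset=utf-8 so Chinese titles (e.g. from Douban) survive intact.
    res.writeHead(200, { "Content-Type": "application/json; charset=utf-8" });
    res.end(body);
  });
}

// Usage sketch (assumed): createNewsServer(() => scrapedItems).listen(3000);
// nodemon then restarts the process whenever the script file changes.
```

Keeping listen() outside the helper is a design choice. It makes the port a caller decision, which matches the README's "listening port 3000" without hard-coding it.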

Supported media sources

Douban (豆瓣) and CNode

Main idea

Run a Node.js script on a rented server whenever a pull request (a client's request for news) arrives, scrape curated tech-news links and brief descriptions from the media sources, and return them as JSON.
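The scrape-and-return-JSON step might look roughly like the sketch below. It rests on a loud assumption: each news item is marked up as an <a href> link immediately followed by a <p> description. Real Douban and CNode pages will need their own patterns. A regular expression keeps the sketch dependency-free. A production scraper would more likely use an HTML parser such as cheerio. extractItems and textOf are hypothetical names.

```javascript
// ASSUMED markup for one news item (real source pages will differ):
//   <a ... href="URL" ...>Title</a> <p ...>Description</p>
// The title part may not cross a closing </a>, so a link with no
// description is skipped instead of swallowing the next link.
const ITEM_PATTERN =
  /<a\s[^>]*href="([^"]+)"[^>]*>((?:(?!<\/a>)[\s\S])*)<\/a>\s*<p[^>]*>([\s\S]*?)<\/p>/g;

// Strip nested tags and collapse runs of whitespace.
function textOf(fragment) {
  return fragment.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Return [{ link, title, description }] for every item matching the pattern.
function extractItems(html) {
  const items = [];
  for (const m of html.matchAll(ITEM_PATTERN)) {
    items.push({ link: m[1], title: textOf(m[2]), description: textOf(m[3]) });
  }
  return items;
}

// Usage sketch: JSON.stringify(extractItems(pageHtml)) is the response body.
```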

Challenges

How can the results be "curated"? Tag and classify each article by keywords, possibly with natural language processing.
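A plain keyword baseline, before any real NLP, could look like the sketch below. The tag vocabulary is entirely hypothetical because the README gives no keyword lists. Matching is case-insensitive substring matching, so short keywords can produce false positives. tagArticle and curate are illustrative names.

```javascript
// Hypothetical tag vocabulary; not taken from the repository.
const TAG_KEYWORDS = {
  javascript: ["javascript", "node.js", "npm"],
  ai: ["machine learning", "neural network", "nlp"],
  cloud: ["aws", "docker", "kubernetes"],
};

// Return every tag whose keywords occur (as substrings, case-insensitively)
// in the article's title or description, in vocabulary order.
function tagArticle(article, vocabulary = TAG_KEYWORDS) {
  const text = `${article.title || ""} ${article.description || ""}`.toLowerCase();
  return Object.keys(vocabulary).filter((tag) =>
    vocabulary[tag].some((kw) => text.includes(kw))
  );
}

// Attach tags and keep only articles carrying at least one wanted tag.
function curate(articles, wantedTags) {
  return articles
    .map((a) => ({ ...a, tags: tagArticle(a) }))
    .filter((a) => a.tags.some((t) => wantedTags.includes(t)));
}
```

Swapping tagArticle for an NLP classifier later would leave curate unchanged, since it only consumes the tag list.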
How can an article be "briefly summarized"? Use the description the source already provides whenever possible, and otherwise try to extract "key sentences" from the article itself? Is this the right approach?
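The "description first, key sentences as fallback" idea could be sketched as follows. The fallback swaps in one specific technique, which the README does not name. It is simple word-frequency scoring: each sentence is scored by the average frequency of its non-stop-words, and the top sentences are kept in their original order. The stop-word list, the body field, and all function names are assumptions. Chinese text has no spaces, so it would need a word segmenter before this scoring is meaningful.

```javascript
// Hypothetical, deliberately tiny English stop-word list.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"]);

// Split on ., !, ? or the CJK full stop, keeping the punctuation.
function splitSentences(text) {
  return (text.match(/[^.!?。]+[.!?。]?/g) || []).map((s) => s.trim()).filter(Boolean);
}

// Lower-cased runs of Unicode letters/digits.
function words(s) {
  return (s.toLowerCase().match(/[\p{L}\p{N}]+/gu) || []).filter((w) => !STOP_WORDS.has(w));
}

// Frequency-based extractive summary: top `count` sentences, original order.
function keySentences(text, count = 2) {
  const freq = new Map();
  for (const w of words(text)) freq.set(w, (freq.get(w) || 0) + 1);
  const scored = splitSentences(text).map((s, i) => {
    const ws = words(s);
    const score = ws.length ? ws.reduce((sum, w) => sum + freq.get(w), 0) / ws.length : 0;
    return { s, i, score };
  });
  return scored
    .sort((a, b) => b.score - a.score || a.i - b.i) // best first, ties by position
    .slice(0, count)
    .sort((a, b) => a.i - b.i) // restore reading order
    .map((x) => x.s)
    .join(" ");
}

// Prefer the source's own description; fall back to extracted sentences.
function summarize(article, count = 2) {
  const given = (article.description || "").trim();
  return given || keySentences(article.body || "", count);
}
```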
How should a "pull request" be handled? Run the scripts on the server nonstop? No! But running them only when a request arrives makes the response time long. Use pointers! Avoid comparing objects directly; compare pointers (references) instead.
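One possible reading of the pointer note, which the README does not spell out and which is an assumption here, combines two ideas. First, scrape ahead of time and serve a cached result, so a request does not wait for a fresh scrape. Second, replace the cached object only when its content actually changes. Consumers can then detect updates with a cheap reference comparison (===) instead of a deep object comparison. createCache and the version field are hypothetical. The scraper is kept synchronous for brevity, although a real one would likely be asynchronous.

```javascript
// ASSUMED design, not confirmed by the README.
function createCache(scrape) {
  let current = { items: [], version: 0 };
  let lastKey = JSON.stringify(current.items);

  // Re-run the scraper. Pay for one content comparison here, and swap in
  // a NEW object only if the content changed. Otherwise keep the old reference.
  function refresh() {
    const items = scrape();
    const key = JSON.stringify(items);
    if (key !== lastKey) {
      lastKey = key;
      current = { items, version: current.version + 1 };
    }
    return current;
  }

  return { refresh, get: () => current };
}

// Usage sketch (interval is an arbitrary example):
//   setInterval(cache.refresh, 10 * 60 * 1000);
//   if (cache.get() !== lastSent) { /* content changed since last response */ }
```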

Reference

Hands-on experience with AWS Elastic Beanstalk
