TCC-DataScraper

Files used to compile monthly peacekeeping contribution statistics from PDFs, load them into a MySQL database, and analyze them in R.

Step 1 - Run scraper.py through the ScraperWiki data store and export the extracted data as a SQLite db into the folder.

Step 2 - Run sqlite_csv.py to format the data and generate gender.csv.

Step 3.1 - Remove the commas in "Tanzania,", "Moldova," and "Macedonia,".

Step 3.2 - Replace blank entries with 0's.

Step 3.3 - Fix tccIso3Num entries with leading 0's.

Step 3.4 - Insert "," to assist in changing to a SQL statement.

Step 3.5 - Reformat the csv file as a SQL load statement and load it into the gender table in the main peacekeeping db:

INSERT INTO gender (date, dateString, tcc, tccIso3Alpha, tccIso3Num, mission, ip_M, ip_F, ip_T, fpu_M, fpu_F, fpu_T, eom_M, eom_F, eom_T, troops_M, troops_F, troops_T) VALUES (...), (...)

Note: strings are enclosed by '' and lines are separated by ,\n

Step 4 - Run the statements in sql_script.txt for each date extracted and export to csv's on the desktop, with the minimum date extracted in the @extracted_date field.

Step 5 - Convert contribution ints where value=0 to NA.

Step 6 - Run R script.R.

Step 7 - Upload the contents of tcc_files to the web server via ftp into the documents folder (archive the previous month's files into the archived folder).

Step 8 - Format to the JSON schema by hand in TextWrangler so that data.json.csv conforms to the schema contained in tcc_schema...insert date objects at the end of tcc.json and upload into the appropriate directory on the web server.

Step 9 -
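Steps 3.1 to 3.5 are described as manual edits; they could also be scripted. The following is a minimal sketch, not the repository's own code. It assumes that the rows of gender.csv are available as dicts keyed by the column names in the INSERT statement, that the "fix" in Step 3.3 means restoring leading zeros so tccIso3Num is three digits wide, and that every value is written as a quoted string. The helper names `clean_row`, `sql_quote` and `build_insert` are hypothetical, and the sample row is made up for illustration.

```python
# Column order taken from the INSERT statement in Step 3.5.
COLUMNS = [
    "date", "dateString", "tcc", "tccIso3Alpha", "tccIso3Num", "mission",
    "ip_M", "ip_F", "ip_T", "fpu_M", "fpu_F", "fpu_T",
    "eom_M", "eom_F", "eom_T", "troops_M", "troops_F", "troops_T",
]
COUNT_COLUMNS = COLUMNS[6:]  # the twelve contribution counts


def clean_row(row):
    """Apply Steps 3.1-3.3 to one row (a dict keyed by COLUMNS)."""
    cleaned = dict(row)
    # Step 3.1: drop commas from country names such as "Tanzania, ...".
    cleaned["tcc"] = cleaned["tcc"].replace(",", "")
    # Step 3.2: blank contribution counts become 0.
    for name in COUNT_COLUMNS:
        if cleaned[name].strip() == "":
            cleaned[name] = "0"
    # Step 3.3 (assumption): pad the numeric code back to three digits.
    code = cleaned["tccIso3Num"].strip()
    if code:
        cleaned["tccIso3Num"] = code.zfill(3)
    return cleaned


def sql_quote(value):
    """Wrap a value in single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def build_insert(rows):
    """Steps 3.4-3.5: one multi-row INSERT, tuples separated by ',\\n'."""
    tuples = []
    for row in rows:
        cleaned = clean_row(row)
        values = ", ".join(sql_quote(cleaned[name]) for name in COLUMNS)
        tuples.append("(" + values + ")")
    header = "INSERT INTO gender (" + ", ".join(COLUMNS) + ") VALUES\n"
    return header + ",\n".join(tuples) + ";"


if __name__ == "__main__":
    # Made-up sample row; real rows would come from gender.csv.
    sample = dict.fromkeys(COLUMNS, "")
    sample.update({"tcc": "Tanzania, United Republic of", "tccIso3Num": "4"})
    print(build_insert([sample]))
```

Reading gender.csv with `csv.DictReader` would yield rows in the shape `build_insert` expects, provided the file's header matches the column names above.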

ISO_Dictionaries
tcc_files
0-scraper.py
1-sqlite_csv.py
2-sql_gender_load.txt
3-sql_script.txt
4-R script.R
5-JSON_creator.py
README.md

IPIDataLab/TCC-DataScraper
