Files used to compile monthly peacekeeping contribution statistics from PDFs, load them into a MySQL database, and analyze them in R.

IPIDataLab/TCC-DataScraper

TCC-DataScraper

Step 1 - Run scraper.py through the ScraperWiki data store and export the extracted data as a SQLite db into the folder.

Step 2 - Run sqlite_csv.py to format and generate gender.csv.

Step 3.1 - Remove commas for "Tanzania,", "Moldova," and "Macedonia,".

Step 3.2 - Replace blank entries with 0's.

Step 3.3 - Fix tccIso3Num entries with leading 0's.

Step 3.4 - Insert "," to assist in changing to a SQL statement.

Step 3.5 - Reformat the csv file as a SQL load statement and load it into the gender table in the main peacekeeping db:

*** INSERT INTO gender (date, dateString, tcc, tccIso3Alpha, tccIso3Num, mission, ip_M, ip_F, ip_T, fpu_M, fpu_F, fpu_T, eom_M, eom_F, eom_T, troops_M, troops_F, troops_T) VALUES (...), (...)

Note - strings are enclosed by '' and lines are separated by ,\n

Step 4 - Run the statements in sql_script.txt for each date extracted and export to CSVs on the desktop, with the minimum date extracted in the @extracted_date field.

Step 5 - Convert contribution ints where value=0 to NA.

Step 6 - Run R script.R.

Step 7 - Upload the contents of tcc_files to the web server via FTP into the documents folder (archive the previous month's files into the archived folder).

Step 8 - Format to JSON schema by hand in TextWrangler so that data.json.csv conforms to the schema contained in tcc_schema...insert date objects to the end of tcc.json and upload into the appropriate directory on the web server.

Step 9 -
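Steps 3.1 to 3.5 are manual edits, so a rough Python sketch of how they might be scripted is given here. It is not part of the repository. It assumes gender.csv has a header row using the column names from the INSERT statement, that text-like columns (including tccIso3Num) are loaded as quoted strings, and that "fixing" tccIso3Num means zero-padding to the three digits used by ISO 3166-1 numeric codes. The function names `clean_row`, `sql_literal`, `build_insert` and `csv_to_insert` are hypothetical.

```python
import csv

# Column order taken from the INSERT statement in step 3.5.
COLUMNS = [
    "date", "dateString", "tcc", "tccIso3Alpha", "tccIso3Num", "mission",
    "ip_M", "ip_F", "ip_T", "fpu_M", "fpu_F", "fpu_T",
    "eom_M", "eom_F", "eom_T", "troops_M", "troops_F", "troops_T",
]

# Assumption: these columns are stored as strings; the rest are integer counts.
TEXT_COLUMNS = {"date", "dateString", "tcc", "tccIso3Alpha", "tccIso3Num", "mission"}


def clean_row(row):
    """Apply steps 3.1-3.3 to one row given as a dict keyed by column name."""
    cleaned = {}
    for col in COLUMNS:
        value = (row.get(col) or "").strip()
        if col == "tcc":
            # Step 3.1: drop commas inside country names.
            value = value.replace(",", "")
        if value == "" and col not in TEXT_COLUMNS:
            # Step 3.2: blank counts become 0.
            value = "0"
        if col == "tccIso3Num" and value:
            # Step 3.3 (assumed meaning): restore leading zeros, e.g. "8" -> "008".
            value = value.zfill(3)
        cleaned[col] = value
    return cleaned


def sql_literal(col, value):
    """Quote text columns with '' and emit counts as bare integers."""
    if col in TEXT_COLUMNS:
        return "'" + value.replace("'", "''") + "'"
    return str(int(value))  # raises ValueError on non-numeric counts


def build_insert(rows):
    """Steps 3.4-3.5: build one multi-row INSERT, tuples separated by ',\\n'."""
    tuples = []
    for row in rows:
        cleaned = clean_row(row)
        values = ", ".join(sql_literal(col, cleaned[col]) for col in COLUMNS)
        tuples.append("(" + values + ")")
    header = "INSERT INTO gender (" + ", ".join(COLUMNS) + ") VALUES\n"
    return header + ",\n".join(tuples) + ";"


def csv_to_insert(path):
    """Read a CSV with a header row and return the INSERT statement as text."""
    with open(path, newline="") as handle:
        return build_insert(csv.DictReader(handle))
```

Calling `print(csv_to_insert("gender.csv"))` would write the statement to standard output, from where it could be saved and run against the gender table. Building SQL by string concatenation is only reasonable here because the input is a trusted local file; a database driver with parameterized queries would be the safer route for anything else.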
