UB CSE 587 - Data Intensive Computing
Lab 1: Developed a data collection application that gathers tweets matching an input hashtag through Twitter’s Search API and plots them on a map, grouped by location (using the Google API). Given an input location, it also returns nearby trending topics (R, twitteR).
Lab 2: Created an application that extracts only the required data from the Kaggle European Soccer Data database and uses it to answer questions about that data. The application also converts and transforms raw data from the Pew Research Center’s study on Gaming, Jobs and Broadband, and answers questions about the study with graphs and plots that each combine several meaningful pieces of information (R, dplyr, sqlite).
Lab 3: Analyzed several given datasets and made predictions on them using linear modeling, k-NN classification, and k-means clustering (R).
Lab 4 & 5: Performed word count, lemmatization, and word co-occurrence (n-gram) analysis on 400+ given Latin text documents using the MapReduce programming model in Hadoop and Spark (Java, Hadoop, Spark).
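
As an illustration of the word co-occurrence step in Lab 4 & 5, here is a minimal sketch of the map and reduce phases for adjacent word pairs (2-grams), in plain Java with no Hadoop or Spark dependency. It is not the lab's actual code: the class name CooccurrenceSketch, its method names, and the tokenization rule (lowercase, split on non-letters) are assumptions made for this example, and lemmatization is left out.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts adjacent word pairs with an explicit map phase and reduce phase.
class CooccurrenceSketch {

    // Map phase: emit one ("left right", 1) record for each adjacent word pair in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                words.add(token); // skip the empty token produced by leading punctuation
            }
        }
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            emitted.add(new AbstractMap.SimpleEntry<>(words.get(i) + " " + words.get(i + 1), 1));
        }
        return emitted;
    }

    // Reduce phase: sum the emitted counts for each distinct pair (keys kept in sorted order).
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> record : records) {
            totals.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return totals;
    }

    // Driver: run the map phase over every line, then reduce all emitted records together.
    static Map<String, Integer> countPairs(List<String> lines) {
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        for (String line : lines) {
            records.addAll(map(line));
        }
        return reduce(records);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("arma virumque cano", "Arma virumque");
        System.out.println(countPairs(lines)); // prints {arma virumque=2, virumque cano=1}
    }
}
```

In a Hadoop or Spark job the framework would group the emitted records by key between the two phases; here a single in-memory map stands in for that shuffle step.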