Data-Science

UB CSE 587 - Data Intensive Computing

Lab 1: Developed a data collection application that gathers tweets for an input hashtag using Twitter’s Search API, plots them on a map grouped by location (using the Google API), and returns nearby trending topics when given an input location (R, twitteR).

Lab 2: Created an application that extracts only the required data from the database of Kaggle’s European Soccer Data and uses it to answer questions about that data. The application also converts and transforms raw data from the Pew Research Center’s study on Gaming, Jobs and Broadband. Questions based on the study are answered by the program as graphs and plots, each combining several pieces of information about the data (R, dplyr, sqlite).

Lab 3: Analyzed and made predictions on various given datasets using linear modeling, k-NN classification, and k-means clustering (R).
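To illustrate the k-NN classification step mentioned in Lab 3, here is a minimal sketch. The lab itself was written in R; this version is in Java only to match the language used elsewhere in the repository, and it is not the lab's code. The class name `KnnSketch`, the method names, the use of squared Euclidean distance, and the tie-breaking rule are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of k-nearest-neighbour classification; not taken from the lab.
class KnnSketch {

    // Squared Euclidean distance between two points of equal dimension (an assumed metric).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the majority label among the k training points closest to the query.
    // Ties go to the label that first reaches the top vote count, scanning nearest first.
    static String classify(double[][] points, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training indices by distance to the query, nearest first.
        Arrays.sort(order, Comparator.comparingDouble(i -> squaredDistance(points[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int j = 0; j < Math.min(k, order.length); j++) {
            String label = labels[order[j]];
            int v = votes.merge(label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up toy data, for demonstration only.
        double[][] points = {{0, 0}, {0, 1}, {5, 5}, {6, 5}, {5, 6}};
        String[] labels = {"a", "a", "b", "b", "b"};
        System.out.println(classify(points, labels, new double[] {0.2, 0.1}, 3));
    }
}
```

Sorting all training points is the simplest approach and is fine for small datasets; larger ones would usually call for a spatial index or a partial selection of the k nearest points.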

Lab 4 & 5: Performed word count, lemmatization, and word co-occurrence (n-gram) analysis on 400+ given Latin text documents using MapReduce in Hadoop and Spark (Java, Hadoop, Spark).
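The word co-occurrence step can be sketched as a map phase that emits a count of 1 for each pair of neighbouring words and a reduce phase that sums the counts per pair. The sketch below uses plain Java collections so it runs without a Hadoop or Spark installation. It assumes n = 2 (adjacent words), lowercased whitespace-separated tokens, and no lemmatization; the class name `CooccurrenceSketch` and its methods are hypothetical and are not the lab's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the map/reduce pattern for bigram co-occurrence counts.
class CooccurrenceSketch {

    // "Map" step: emit one (pair, 1) record for every two neighbouring words in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        String[] words = line.toLowerCase().trim().split("\\s+");
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            records.add(Map.entry(words[i] + " " + words[i + 1], 1));
        }
        return records;
    }

    // "Reduce" step: sum the emitted counts for each distinct pair.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> record : records) {
            counts.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step over every line, then reduces all records together.
    static Map<String, Integer> countBigrams(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        // Made-up input lines, for demonstration only.
        List<String> lines = List.of("arma virumque cano", "Arma virumque");
        System.out.println(countBigrams(lines));
    }
}
```

In a real Hadoop or Spark job the framework groups the records by key between the two phases and runs each phase in parallel across the documents; here the grouping happens inside `reduce` for brevity.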

