This project analyzes YouTube trending video data from major English-speaking countries. The dataset records likes, dislikes, views, comments, and categories. The project applies data wrangling, text mining, and time series analysis to answer several business questions about YouTube trending videos.
The project has two phases: data wrangling and analysis. The data wrangling phase cleans and organizes the dataset by removing duplicate entries, handling missing values, and assigning appropriate data types. The analysis phase then uses the cleaned dataset to provide insights into various aspects of YouTube trending videos.
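The wrangling steps described above can be sketched in base R roughly as follows. This is a minimal illustration on a toy data frame, not the project's actual code: the column names (video_id, trending_date, category_id, views, likes, comments_disabled), the "True"/"False" text encoding, and the choice to drop rows with missing metrics rather than impute them are all assumptions made for the example.

```r
# Toy stand-in for the trending data. All column names and values here are
# hypothetical; the real dataset's schema may differ.
videos <- data.frame(
  video_id          = c("a1", "a1", "b2", "c3", "d4"),
  trending_date     = c("2021-01-05", "2021-01-05", "2021-01-06",
                        "2021-01-07", "2021-01-08"),
  category_id       = c(10, 10, 20, 24, 10),
  views             = c("1000", "1000", "2500", NA, "400"),
  likes             = c(100, 100, NA, 30, 12),
  comments_disabled = c("False", "False", "True", "False", "True"),
  stringsAsFactors  = FALSE
)

clean_videos <- function(df) {
  # 1. Remove exact duplicate rows.
  df <- df[!duplicated(df), , drop = FALSE]

  # 2. Assign appropriate data types.
  df$trending_date     <- as.Date(df$trending_date, format = "%Y-%m-%d")
  df$category_id       <- factor(df$category_id)
  df$views             <- as.numeric(df$views)
  df$comments_disabled <- df$comments_disabled == "True"

  # 3. Handle missing values. Assumption: rows missing views or likes are
  #    dropped; imputation would be an equally valid choice.
  df <- df[complete.cases(df[, c("views", "likes")]), , drop = FALSE]

  rownames(df) <- NULL
  df
}

cleaned <- clean_videos(videos)
str(cleaned)
```

Dropping incomplete rows keeps the sketch short; whether that is appropriate depends on how much data is missing and on which questions the analysis phase needs to answer.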
The business questions cover topics such as which categories trend, the impact of disabling comments on video likes, the factors that influence likes, dislikes, and comments, the relationship between views and like rate over time, which categories are more likely to earn views and likes, the influence of video games on violence, the most popular tags throughout the year, the pattern and relationship of view count over time, and the prediction of high-quality videos.
By addressing these questions, the project aims to offer valuable insights and actionable findings to content creators, researchers, and anyone interested in understanding the dynamics of YouTube trending videos.
Tools used: RStudio
Programming language used: R