Identifying key drivers of video performance (likes, shares, views)
Python + SQL + Power BI | Kaggle Dataset | 19K videos analyzed
What drives engagement on TikTok? A data dive into 19K videos
- Likes account for 82% of total engagement
- Video duration doesn't impact performance
- Shares are the #2 engagement driver (16%)
- Python (Pandas, Seaborn, matplotlib)
- SQL (pandasql)
- Power BI
Kaggle TikTok Video Metrics
There are 298 empty rows. They will be deleted using the pandas dropna function.
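As a minimal sketch of this cleaning step, the snippet below counts rows with missing values and drops them. The toy data and the column names (`video_duration_sec`, `video_view_count`, `video_like_count`) are assumptions for illustration, not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the Kaggle file; values and column names are assumptions.
df = pd.DataFrame({
    "video_duration_sec": [12, 45, 30, 7],
    "video_view_count": [1500.0, None, 98000.0, 320.0],
    "video_like_count": [200.0, None, 41000.0, None],
})

# Count rows that contain at least one missing value.
rows_with_missing = int(df.isna().any(axis=1).sum())

# Drop those rows (dropna removes any row with a missing value by default)
# and renumber the index.
cleaned = df.dropna().reset_index(drop=True)

print(rows_with_missing, len(cleaned))
```

Running the same count on `cleaned` afterwards is a quick way to do the "check again" step.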
Check again:
Save the file and import the second one. The second file turned out to contain exactly the same data as the one we just cleaned, so we keep only the first file.
Now that the file is cleaned, let's do some analysis.
Let's first look at the distribution:
Views and likes have the widest spread, and some extreme values are visible. The same holds for the share count, but with a smaller gap. Let's look at histograms of the distribution of each metric:
We can observe that the large majority of videos have a very low number of views, and that few videos have a large number. The other metrics follow the same trend; the only difference is that they show a progressive decrease.
For the duration distribution, the number of videos peaks at 5, 16, 28, 38, 50 and 60 seconds, with similar counts at each peak. The durations in between all have roughly the same number of videos.
Now let's look at the correlations:
Some correlations are strong and others are weak:
- Duration has a very low correlation with all metrics
- Views have a medium correlation with comments, downloads and shares, and a strong correlation with likes
- Likes have a strong correlation with views, shares and downloads, and some correlation with comments
- Shares have a strong correlation with likes, and some correlation with views, downloads and comments
- Downloads have a strong correlation with likes and comments, and some correlation with views and shares
- Comments have a strong correlation with downloads, and some correlation with shares, likes and views
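A correlation matrix like the one summarized above can be sketched with pandas `DataFrame.corr`, which computes Pearson correlation by default. The data below is a toy example and the column names are assumptions; the notebook may use a different method or plot the result as a heatmap.

```python
import pandas as pd

# Toy data (assumed column names): likes grow with views, duration does not.
df = pd.DataFrame({
    "video_duration_sec": [40, 20, 30, 20, 40],
    "video_view_count": [100, 200, 300, 400, 500],
    "video_like_count": [10, 22, 29, 41, 50],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

print(corr.round(2))
```

Each cell of `corr` is a value between -1 and 1; values near 0 correspond to the "very low correlation" reported for duration.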
Now let's calculate the engagement rate (the sum of all engagement metrics divided by the number of views):
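One possible way to compute this rate per video is sketched below. It assumes that "all metrics" means likes, shares, comments and downloads, and the column names are assumptions too; videos with zero views get a missing value rather than a division error.

```python
import pandas as pd

# Toy data; column names and the list of metrics are assumptions.
df = pd.DataFrame({
    "video_view_count": [1000, 5000, 0],
    "video_like_count": [200, 1500, 0],
    "video_share_count": [50, 300, 0],
    "video_comment_count": [10, 40, 0],
    "video_download_count": [5, 20, 0],
})

interaction_cols = [
    "video_like_count",
    "video_share_count",
    "video_comment_count",
    "video_download_count",
]

# Sum the interactions for each video.
interactions = df[interaction_cols].sum(axis=1)

# Replace zero view counts with NaN so the division yields NaN, not inf.
views = df["video_view_count"].where(df["video_view_count"] > 0)

df["engagement_rate"] = interactions / views

print(df["engagement_rate"])
```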
Let's look at how view counts are distributed across durations:
For every duration there is a wide range of view counts, so the duration of a video doesn't affect its views.
Now that we have seen some trends with Python, let's explore the data with SQL. First, install pandasql.
Then the exploration can start. Top 10 most viewed videos:
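A "top 10" query of this kind can be sketched as follows. To keep the example self-contained it runs on an in-memory SQLite database through the standard `sqlite3` module instead of pandasql; with pandasql, the same query string would be passed to `sqldf` against the DataFrame. The table name `videos` and the column names are assumptions.

```python
import sqlite3

# In-memory SQLite database with a toy table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_id INTEGER, video_view_count INTEGER)")

# 15 toy videos with increasing view counts.
rows = [(i, i * 1000) for i in range(1, 16)]
conn.executemany("INSERT INTO videos VALUES (?, ?)", rows)

# Sort by views, highest first, and keep the first 10 rows.
query = """
SELECT video_id, video_view_count
FROM videos
ORDER BY video_view_count DESC
LIMIT 10
"""
top10 = conn.execute(query).fetchall()
conn.close()

for video_id, views in top10:
    print(video_id, views)
```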
Global average engagement rate:
The global average engagement rate is 34%.
Average video length:
The average video length is 32.42 seconds.
Number of videos with more than 1 million views:
There are no videos above 1 million views in this dataset.
Top 5 videos with the best engagement rate:
Some videos have an engagement rate of 93-94%, which is extremely high.
Average video length per view range:
No matter the view range, the average duration is similar.
Average engagement rate per view range:
Above 10,000 views the engagement rate is similar, around 39-40%. Below that it is a bit lower, around 27%. Since there are more videos with few views than with many, this pulls the global average down.
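Grouping by view range can be sketched with a SQL `CASE` expression. The example uses the standard `sqlite3` module on toy data for self-containment. The range boundaries other than 10,000, the table name and the column names are assumptions, not the notebook's actual values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos ("
    "video_view_count INTEGER, video_duration_sec INTEGER, engagement_rate REAL)"
)

# Toy rows: (views, duration in seconds, engagement rate).
rows = [
    (500, 20, 0.20),
    (8000, 40, 0.30),
    (50000, 30, 0.40),
    (200000, 34, 0.38),
    (900000, 32, 0.40),
]
conn.executemany("INSERT INTO videos VALUES (?, ?, ?)", rows)

# Assign each video to a view range, then aggregate per range.
query = """
SELECT
    CASE
        WHEN video_view_count < 10000 THEN 'under 10K'
        WHEN video_view_count < 100000 THEN '10K to 100K'
        ELSE '100K and above'
    END AS view_range,
    COUNT(*) AS n_videos,
    AVG(video_duration_sec) AS avg_duration,
    AVG(engagement_rate) AS avg_engagement
FROM videos
GROUP BY view_range
"""
result = {row[0]: row[1:] for row in conn.execute(query)}
conn.close()

for view_range, stats in result.items():
    print(view_range, stats)
```

Including `COUNT(*)` next to the averages makes it visible how many videos sit in each range, which matters when comparing the per-range rates with the global average.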
# Key findings
- Likes: 82% of total engagement, strong correlation
- Shares: 16% of total engagement, moderate correlation
- Comments: 2% of total engagement, weak correlation
- Duration: no significant impact, negligible correlation
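The share of each metric in total engagement can be computed as its total divided by the sum of all totals. The sketch below uses toy numbers, not the dataset's. The column names and the choice of metrics included in "total engagement" are assumptions.

```python
import pandas as pd

# Toy data; column names and included metrics are assumptions.
df = pd.DataFrame({
    "video_like_count": [600, 200],
    "video_share_count": [120, 30],
    "video_comment_count": [40, 10],
})

# Total of each metric over all videos.
totals = df.sum()

# Fraction of total engagement contributed by each metric.
share = totals / totals.sum()

print((share * 100).round(1))
```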
Now let's visualize the data in Power BI.
Delete the "#" column, change the type of the count columns to whole number, and create the following measures:
- Engagement Rate
- Average Video Duration
- Total Views
- Total Likes
- Total Comments
- Total Shares
- Total Downloads
- Average Engagement Rate per Video
- View Category
- Top Viewed Videos
- Engagement by View Range
- Total Video Duration
- Total Videos
- Top Engaged Videos
- Average Share Rate
At the top of the visualization, the main figures are displayed: number of videos, average views, average video duration, engagement rate (any engagement) and total views. The dataset contains 19,000 videos for a total of 5 billion views. In terms of engagement, the ratio of likes to views is 25% (likes being the most widely used engagement KPI). 82% of total engagement comes from likes, followed by shares at 16%. Downloads and comments contribute very little to engagement. The duration of the video has no real impact on engagement: the engagement rate is similar at any duration, even for the shortest and longest videos.
The top 10 videos have more or less the same number of views and likes, but differ greatly in comments, shares and downloads. Their durations are also very different, from short (7 s) to average (30 s) to long (almost a minute). This confirms the trend mentioned before. As for views by duration, there is no specific impact: views can be high or low no matter the duration of the video.
Likes (82%) are the KPI with the biggest impact on engagement, followed by shares (16%). The duration of the video has no impact on views or on the engagement rate.
Limits and next steps:
- The dataset is a small sample of 19K videos
- No creator, music or category data
- Next steps could be to link the results with datasets containing creator data and to determine the virality of videos
- Integrate with the TikTok API for real-time data
- tiktok_viz.pdf : Power BI visualization in PDF
- tiktok_viz.pbix : Power BI visualization in PBIX
- tiktok_dataset : Kaggle dataset, not used
- tiktok_dataset2 : Kaggle dataset, used
- tiktok_dataset_cleaned : cleaned dataset
- tiktokdata.ipynb : Jupyter notebook



























