YassinAnalytics/tiktok-engagement-analysis


TikTok Engagement Analysis

🔍 Identifying key drivers of video performance (likes, shares, views)
📊 Python + SQL + Power BI | Kaggle Dataset | 19K videos analyzed

Python Pandas PowerBI

What drives engagement on TikTok? A data dive into 19K videos

🔍 Key Findings

  • Likes account for 82% of total engagement
  • Video duration doesn't impact performance
  • Shares are the #2 engagement driver (16%)

🛠️ Tech Stack

  • Python (Pandas, Seaborn, Matplotlib)
  • SQL (pandasql)
  • Power BI

📂 Dataset

Kaggle TikTok Video Metrics

Data cleaning

image

image

image

There are 298 empty rows. They will be deleted with the pandas dropna function.

image
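As a minimal sketch of this cleaning step, the snippet below counts the rows with missing values and drops them. The column names and the toy data are assumptions for illustration, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Kaggle file; column names are assumed.
df = pd.DataFrame({
    "video_duration_sec": [32, 15, 48, 60],
    "video_view_count": [1200.0, np.nan, 56000.0, 310.0],
    "video_like_count": [300.0, np.nan, 21000.0, np.nan],
})

# Count the rows that contain at least one missing value.
rows_with_missing = int(df.isna().any(axis=1).sum())

# Drop those rows and rebuild a clean 0..n-1 index.
df_clean = df.dropna().reset_index(drop=True)

# After cleaning, no missing value should remain.
remaining_missing = int(df_clean.isna().sum().sum())
print(rows_with_missing, len(df_clean), remaining_missing)
```

By default, dropna removes a row as soon as one of its columns is missing, which matches deleting the incomplete rows wholesale.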

Check again:

image

image

Save the file and import the second one. I noticed that the second file contains exactly the same data as the one we just cleaned, so we keep only the first file.
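One way such a duplicate check could be done is with DataFrame.equals. This is a sketch on made-up data with assumed column names; how the two files were actually compared is not stated.

```python
import pandas as pd

# Two toy frames standing in for the two files; columns are assumed.
first = pd.DataFrame({"video_id": [1, 2, 3], "video_view_count": [100, 250, 90]})
second = first.copy()

# equals() is True only when shape, dtypes and values all match.
same_data = first.equals(second)

# Sanity check: changing a single value must break the equality.
modified = second.copy()
modified.loc[0, "video_view_count"] = 101
differs = not first.equals(modified)
print(same_data, differs)
```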

Now that the file is cleaned, let's do some analysis.

Analysis

image

Let's first look at the distribution:

image

The widest spread is in the views and the likes, where some extreme values appear. The same holds for the share count, but with a smaller gap. Let's look at the histograms showing the distribution of each metric:

image

image

We can observe that the large majority of videos have a very low number of views, and that few videos have a large number. The same trend holds for all the other metrics; the only difference is that a more progressive decrease is visible.
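A right-skewed distribution like this can also be checked numerically: the mean sits well above the median and the skewness is positive. The values below are made up for illustration.

```python
import pandas as pd

# Made-up view counts: many small values and a couple of very large ones.
views = pd.Series([120, 300, 450, 800, 1500, 2200, 5000, 9000, 400000, 950000])

mean_views = views.mean()
median_views = views.median()
skewness = views.skew()  # positive for a long right tail

print(mean_views, median_views, skewness)
```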

image

image

image

image

image

For the duration distribution, the number of videos peaks at 5, 16, 28, 38, 50 and 60 seconds, with similar counts at each peak. For the durations in between, the counts are more or less the same.

Now let's look at the correlation:

image

Some correlations are strong and others are weak:

  • Duration has a very low correlation with all metrics
  • Views have a medium correlation with comments, downloads and shares, and a strong correlation with likes
  • Likes have a strong correlation with views, shares and downloads, and a correlation with comments
  • Shares have a strong correlation with likes, and a correlation with views, downloads and comments
  • Downloads have a strong correlation with likes and comments, and a correlation with views and shares
  • Comments have a strong correlation with downloads, and a correlation with shares, likes and views
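A correlation matrix of this kind can be computed with DataFrame.corr, which uses the Pearson coefficient by default. The sketch below uses assumed column names and made-up values, so its numbers are not the ones from the analysis.

```python
import pandas as pd

# Made-up metrics; column names are assumed.
df = pd.DataFrame({
    "video_duration_sec": [10, 50, 20, 40, 30, 60],
    "video_view_count": [1000, 5000, 20000, 80000, 300000, 900000],
    "video_like_count": [200, 1500, 7000, 30000, 100000, 350000],
    "video_share_count": [40, 300, 1500, 5000, 30000, 60000],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

views_likes = corr.loc["video_view_count", "video_like_count"]
print(corr.round(2))
```

The resulting matrix is symmetric with ones on the diagonal, so it can be passed as-is to a heatmap function such as Seaborn's heatmap.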

Now let's calculate the engagement rate (all metrics added together and divided by the number of views):

image
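A minimal sketch of this calculation follows. It assumes that "all metrics" means likes, shares, downloads and comments, and the column names are assumptions as well.

```python
import pandas as pd

# Made-up rows; column names are assumed.
df = pd.DataFrame({
    "video_view_count": [1000, 20000, 500],
    "video_like_count": [200, 7000, 50],
    "video_share_count": [40, 1500, 5],
    "video_download_count": [5, 100, 1],
    "video_comment_count": [5, 400, 4],
})

# Assumption: every interaction column counts toward engagement.
interaction_cols = [
    "video_like_count",
    "video_share_count",
    "video_download_count",
    "video_comment_count",
]

# Engagement rate = total interactions / views, computed row by row.
df["engagement_rate"] = df[interaction_cols].sum(axis=1) / df["video_view_count"]
rates = df["engagement_rate"].tolist()
print(rates)
```

A video with zero views would produce a division by zero here, so such rows would need to be filtered out or handled separately.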

Let's look at how view counts are spread across video durations:

image

We can see that every duration shows a wide range of view counts, so the duration of the video doesn't affect the number of views.

Analysis with SQL

Now that we have seen some trends with Python, let's explore the data using SQL. First, install pandasql.

image

Then the exploration can start. Top 10 most viewed videos:

image
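pandasql runs its queries through SQLite, so the same kind of query can be sketched with Python's built-in sqlite3 module, used here instead of pandasql so the example runs without extra installs. The table name, column names and rows are assumptions, and the limit is 2 rather than 10 to fit the toy data.

```python
import sqlite3

# In-memory database with a made-up videos table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_id INTEGER, video_view_count INTEGER)")
conn.executemany(
    "INSERT INTO videos VALUES (?, ?)",
    [(1, 500), (2, 90000), (3, 12000), (4, 750000)],
)

# Top-N query: sort by views, highest first, and keep the first rows.
top_viewed = conn.execute(
    """
    SELECT video_id, video_view_count
    FROM videos
    ORDER BY video_view_count DESC
    LIMIT 2
    """
).fetchall()
conn.close()
print(top_viewed)
```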

Global average engagement rate:

image

The global average engagement rate is 34%.

Average length of video:

image

The average video length is 32.42 seconds.

Number of videos with more than 1 million views:

image

There are no videos above 1 million views in this dataset.
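The three aggregates above (average engagement rate, average duration, number of videos over 1 million views) could be written roughly as follows. This sketch again uses sqlite3 as a stand-in for pandasql, with an assumed schema and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos ("
    "video_duration_sec INTEGER, video_view_count INTEGER, engagement_rate REAL)"
)
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?)",
    [(10, 1000, 0.20), (30, 50000, 0.40), (50, 900000, 0.30)],
)

# One pass over the table computes all three aggregates.
avg_engagement, avg_duration, over_one_million = conn.execute(
    """
    SELECT AVG(engagement_rate),
           AVG(video_duration_sec),
           SUM(CASE WHEN video_view_count > 1000000 THEN 1 ELSE 0 END)
    FROM videos
    """
).fetchone()
conn.close()
print(avg_engagement, avg_duration, over_one_million)
```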

Top 5 videos with the best engagement rate:

image

Some videos have an engagement rate of 93-94%, which is extremely high.

Average video length per view range:

image

No matter the view range, the average duration is similar.

Average engagement rate per view range:

image

Above 10,000 views, the engagement rate is similar, at 39-40%. Below that, it is lower, around 27%. Because there are more videos with few views than with many, this lowers the global average.
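Grouping by view range can be expressed in SQL with a CASE expression. The sketch below uses sqlite3 in place of pandasql; the single 10,000-view threshold, the column names and the rows are assumptions, since the exact ranges are only visible in the screenshot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_view_count INTEGER, engagement_rate REAL)")
conn.executemany(
    "INSERT INTO videos VALUES (?, ?)",
    [(500, 0.25), (8000, 0.29), (20000, 0.38), (400000, 0.42)],
)

# Label each video with a view range, then aggregate per label.
query = """
    SELECT CASE WHEN video_view_count < 10000 THEN 'under_10k'
                ELSE '10k_and_over' END AS view_range,
           COUNT(*) AS n_videos,
           AVG(engagement_rate) AS avg_engagement
    FROM videos
    GROUP BY view_range
    ORDER BY view_range
"""
by_range = {name: (n, avg) for name, n, avg in conn.execute(query)}
conn.close()
print(by_range)
```

Returning the video count next to each average makes it easy to see how much each range weighs on the global average.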

Key findings

  • Likes: 82% of total engagement, strong correlation
  • Shares: 16% of total engagement, moderate correlation
  • Comments: 2% of total engagement, weak correlation
  • Duration: no significant impact, negligible correlation
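A percentage split like this could be obtained by dividing each metric's total by the sum of all interactions. This is a sketch of one possible method; the column names and values are made up, and the notebook's actual computation is not shown in the text.

```python
import pandas as pd

# Made-up interaction counts; column names are assumed.
df = pd.DataFrame({
    "video_like_count": [200, 7000],
    "video_share_count": [40, 1500],
    "video_comment_count": [5, 400],
    "video_download_count": [5, 100],
})

# Total per metric, then each metric's share of all interactions.
totals = df.sum()
share_of_engagement = totals / totals.sum()

top_metric = share_of_engagement.idxmax()
print(share_of_engagement.round(3))
```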

Now let's visualize the data in Power BI.

Power BI visualization

image

Delete the "#" column, change the type of the count columns to whole number, and create the following measures:

  • Engagement Rate
  • Average Video Duration
  • Total Views
  • Total Likes
  • Total Comments
  • Total Shares
  • Total Downloads
  • Average Engagement Rate per Video
  • View Category
  • Top Viewed Videos
  • Engagement by View Range
  • Total Video Duration
  • Total Videos
  • Top Engaged Videos
  • Average Share Rate

image

Insights

At the top of this visualization, the main figures are displayed: the number of videos, the average views, the average video duration, the engagement rate (any engagement) and the total views. The dataset contains 19,000 videos for a total of 5 billion views. In terms of engagement, the like-to-view ratio is 25% (likes being the most commonly used engagement KPI). 82% of total engagement comes from likes, followed by shares at 16%. Downloads and comments contribute very little to engagement. The duration of the video does not really affect engagement: the engagement rate is similar at any video duration, even for the shortest and longest ones.

The top 10 videos have more or less the same number of views and likes, but differ greatly in terms of comments, shares and downloads. Their durations are also very different, from short (7s) to average (30s) to long (almost a minute). This confirms the trend mentioned before. In terms of views by duration, there is no specific impact, because views can be high or low regardless of the duration of the video.

Conclusion

Likes (82%) are the KPI with the greatest impact on engagement, followed by shares (16%). The duration of the video has no impact on the views or the engagement rate.

Limits and next steps:

  • The dataset is a small sample of 19k videos
  • There is no creator, music or category data
  • Next steps could be to link the results with datasets containing creators' data to determine the virality of videos, and to integrate with the TikTok API for real-time data

Files included

  • tiktok_viz.pdf : Power BI visualization in PDF
  • tiktok_viz.pbix : Power BI visualization in PBIX
  • tiktok_dataset : Kaggle dataset, not used
  • tiktok_dataset2 : Kaggle dataset, used
  • tiktok_dataset_cleaned : cleaned dataset
  • tiktokdata.ipynb : Jupyter notebook
