arv-anshul/yt-comment-sentiment

YouTube Comment Sentiment

An end-to-end project to predict the sentiment of YouTube video comments using Machine Learning.

Overview

This project builds a sentiment analysis system for YouTube comments, complete with a FastAPI-based inference endpoint and additional API endpoints that serve insights. The development process included robust experimentation, experiment tracking, and pipeline reproduction (using MLflow and DVC).

(Architecture diagram)

Key Features

  • Inference Endpoint: Built with the FastAPI framework to classify the sentiment of comments.
  • Insights Endpoints: Additional APIs that provide analytics around comment sentiments.
  • Experiment Tracking: Leveraged MLflow for tracking experiments.
  • Pipeline Reproduction: Utilized DVC (Data Version Control) for reproducibility.
  • Text Vectorization: Used TfidfVectorizer to transform text data into feature vectors.
  • Model Selection: Experimented with various models and selected HistGradientBoostingClassifier as the best-performing classifier.

Experimentation

The experimentation phase focused on optimizing hyperparameters for the TfidfVectorizer and HistGradientBoostingClassifier model. Below is a screenshot showcasing how different hyperparameter combinations impacted accuracy:

Experiment Results

Tech Stack

  • Data Handling: Polars
  • Backend Tools: MLflow, DVC, FastAPI
  • Machine Learning: scikit-learn, NLTK
  • Frontend: pnpm, shadcn/ui, Tailwind CSS, Vite, Vue.js
  • Dev Tools: uv, pre-commit, Ruff, Zed, Loguru

Setup

Model Training

  1. Model training code is located in the backend directory.
    cd backend
  2. Use uv to sync the project's dependencies along with the training dependencies.
    uv sync --extra=training --compile-bytecode --locked --no-dev
  3. You may want to update the params.yaml file before training:
    • To update dataset source and column names.
    • To update text vectorizer class or hyperparameters.
    • To update model or hyperparameters.
  4. (Optional) You may also want to set a remote tracking URI for MLflow (the MLFLOW_TRACKING_URI environment variable) so that logs and artifacts are stored there. This project uses Dagshub for this. If you don't set it, an mlruns directory is created locally and everything is stored there.
    # https://dagshub.com/docs/integration_guide/mlflow_tracking
    export MLFLOW_TRACKING_URI=<tracking-uri>
  5. (Optional) You can also set the experiment name using the MLFLOW_EXPERIMENT_NAME environment variable.
  6. Use the DVC CLI to start the training pipeline.
    # https://dvc.org/doc/command-reference/repro
    uv run dvc repro
  7. After some time, your first model will be trained. You can then inspect the logs and artifacts in the MLflow UI.
    uv run mlflow ui
    This will start a server at http://localhost:5000 (by default).
  8. Now that you know how to train a model, you can train further models with different parameters and compare their metrics and params in the MLflow UI with its intuitive charts and graphs. See the Experimentation section.
  9. After comparing models, select the best one and register it in the Model Registry so that it can be used by the backend API server.
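
For step 3, params.yaml might look roughly like the fragment below. Every key and value here is a hypothetical sketch of the schema the steps describe (dataset source and columns, vectorizer, model); check the actual file in the repository for the real structure:

```yaml
# Hypothetical sketch of params.yaml -- not the project's actual schema.
dataset:
  source: data/comments.csv
  text_column: comment
  target_column: sentiment
vectorizer:
  class: TfidfVectorizer
  params:
    ngram_range: [1, 2]
    max_features: 20000
model:
  class: HistGradientBoostingClassifier
  params:
    learning_rate: 0.1
    max_iter: 200
```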

Backend API Server

  1. Set the MLFLOW_MODEL_URI environment variable (required by the backend server).
    export MLFLOW_MODEL_URI=<model-uri>
    See the mlflow.sklearn.load_model API reference to learn how to obtain a model URI.
  2. Sync dependencies using uv.
    cd backend
    uv sync --compile-bytecode --no-dev --locked
  3. Start FastAPI server using fastapi-cli.
    uv run fastapi run src/app.py
    The server starts at http://localhost:8000 (by default).
