URL-classification

URL Classification for Phishing Detection

A machine learning-based solution to identify and classify potentially malicious phishing URLs.

Overview

This project implements a URL classification system that helps protect users from phishing attacks by analyzing and classifying URLs as either legitimate or potentially malicious. The system uses machine learning techniques to identify common patterns and characteristics associated with phishing URLs.

Members

Quan Pham
Linh Nguyen
Phat Tran
Kien Le

Tech Stack

Python
Scikit-learn
Pandas
PyTorch
Golang
HTML/JS/CSS
AWS Lambda
AWS API Gateway
MongoDB
Docker

Features

Real-time URL analysis and classification
Detection of common phishing patterns and techniques
Machine learning model trained on extensive phishing and legitimate URL datasets
Feature extraction from URLs including:
- Domain characteristics
- URL structure analysis
- Special character frequency
- Length-based features
- TLD analysis
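
The feature categories above can be illustrated with a minimal sketch using only the Python standard library. The function name `extract_features`, the feature names, and the `SPECIAL_CHARS` set are assumptions made for illustration; the project's actual feature set is not documented here and may differ.

```python
from urllib.parse import urlparse

# Hypothetical set of characters to count; the project's real list may differ.
SPECIAL_CHARS = "-_@?=&%."


def extract_features(url: str) -> dict:
    """Return a flat dict of illustrative lexical features for one URL."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    labels = host.split(".") if host else []
    # Treat four all-numeric labels as an IPv4 address in place of a domain name.
    host_is_ip = len(labels) == 4 and all(label.isdigit() for label in labels)

    return {
        # Length-based features
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        # Domain characteristics
        "num_subdomains": 0 if host_is_ip else max(len(labels) - 2, 0),
        "host_is_ip": host_is_ip,
        # URL structure analysis
        "uses_https": parsed.scheme == "https",
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "has_query": bool(parsed.query),
        # Special character frequency
        "special_char_count": sum(url.count(ch) for ch in SPECIAL_CHARS),
        "digit_count": sum(ch.isdigit() for ch in url),
        # TLD analysis (naive: last host label, ignores suffixes such as co.uk)
        "tld": "" if host_is_ip or not labels else labels[-1],
    }


if __name__ == "__main__":
    for name, value in extract_features("https://login.example.com/a/b?x=1").items():
        print(f"{name}: {value}")
```

Numeric and boolean values from a dict like this could be loaded into a Pandas DataFrame for training, while a categorical field such as `tld` would need encoding first.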

Installation

Clone the repository:

git clone https://github.com/LinhNguyen2901/URL-classification.git

How It Works

1. URL Preprocessing: Extracts and normalizes URL components
2. Feature Engineering: Analyzes various URL characteristics
3. Classification: Applies trained machine learning model to determine URL legitimacy
4. Result Output: Provides classification result with confidence score
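
The four stages above can be sketched end to end as follows. This is only an illustration of the flow, not the project's implementation: the function names are hypothetical, and the hand-picked `WEIGHTS` and `BIAS` are a stand-in for the trained model (a simple logistic scoring function is swapped in so the example runs without any trained artifact).

```python
import math
from urllib.parse import urlsplit

# Hypothetical, hand-picked values standing in for a trained model.
# These are NOT learned weights and carry no claim about real accuracy.
WEIGHTS = {"url_length": 0.02, "num_dots": 0.3, "has_at": 1.5, "uses_https": -1.0}
BIAS = -2.0


def preprocess(url: str) -> str:
    """Step 1: trim whitespace, ensure a scheme, lowercase scheme and host."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # The path is case-sensitive, so only the network location is lowercased.
    return parts._replace(netloc=parts.netloc.lower()).geturl()


def engineer_features(url: str) -> dict:
    """Step 2: turn the normalized URL into numeric features."""
    parts = urlsplit(url)
    return {
        "url_length": float(len(url)),
        "num_dots": float(parts.netloc.count(".")),
        "has_at": float("@" in url),
        "uses_https": float(parts.scheme == "https"),
    }


def classify(features: dict) -> float:
    """Step 3: logistic score in (0, 1), read here as P(phishing)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def analyze(url: str, threshold: float = 0.5) -> dict:
    """Step 4: run the stages in order and report a label with a confidence."""
    normalized = preprocess(url)
    p_phishing = classify(engineer_features(normalized))
    label = "phishing" if p_phishing >= threshold else "legitimate"
    confidence = p_phishing if label == "phishing" else 1.0 - p_phishing
    return {"url": normalized, "label": label, "confidence": round(confidence, 3)}


if __name__ == "__main__":
    print(analyze("  HTTPS://Example.com  "))
```

In a real deployment the `classify` step would presumably load the trained Scikit-learn or PyTorch model instead of using fixed weights; the surrounding stages would keep the same shape.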

Contributing

1. Fork the repository
2. Create your feature branch (git checkout -b feature/URL-classification)
3. Commit your changes (git commit -m 'Add something')
4. Push to the branch (git push origin feature/URL-classification)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Dataset sources
Contributors
Research papers and references

Disclaimer

This tool is meant to assist in identifying potential phishing URLs but should not be relied upon as the sole means of protection. Always exercise caution when clicking on unknown links and maintain proper cybersecurity practices.

