A machine learning-based solution for identifying and classifying phishing URLs.
This project implements a URL classification system that helps protect users from phishing attacks by analyzing and classifying URLs as either legitimate or potentially malicious. The system uses machine learning techniques to identify common patterns and characteristics associated with phishing URLs.
- Quan Pham
- Linh Nguyen
- Phat Tran
- Kien Le
- Python
- Scikit-learn
- Pandas
- PyTorch
- Golang
- HTML/JS/CSS
- AWS Lambda
- AWS API Gateway
- MongoDB
- Docker
- Real-time URL analysis and classification
- Detection of common phishing patterns and techniques
- Machine learning model trained on extensive phishing and legitimate URL datasets
- Feature extraction from URLs, including:
  - Domain characteristics
  - URL structure analysis
  - Special character frequency
  - Length-based features
  - TLD analysis
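As a rough illustration of the feature categories listed above, the sketch below derives a few of them using only the Python standard library. The function name `extract_features`, the feature names, and the `SPECIAL_CHARS` set are assumptions made for this example; they are not necessarily the features the project's model uses.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical set of characters counted as "special" for this sketch.
SPECIAL_CHARS = "-_@?=&%./"


def extract_features(url: str) -> dict:
    """Return a small dictionary of illustrative URL features."""
    # urlsplit only fills in the host when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []

    # Crude IPv4 check: the host consists of digits and dots only.
    is_ip = host.replace(".", "").isdigit()
    counts = Counter(url)

    return {
        # Length-based features
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        # Domain characteristics
        "num_subdomains": 0 if is_ip else max(len(labels) - 2, 0),
        "has_ip_host": is_ip,
        # URL structure and special character frequency
        "num_digits": sum(c.isdigit() for c in url),
        "num_special": sum(counts[c] for c in SPECIAL_CHARS),
        "has_at_symbol": "@" in url,
        # TLD analysis (naive: last host label, ignores suffixes like co.uk)
        "tld": "" if is_ip or len(labels) < 2 else labels[-1],
    }


if __name__ == "__main__":
    print(extract_features("http://login.example.com/path?a=1"))
```

The TLD and subdomain logic here is deliberately naive; a production system would likely consult the Public Suffix List to handle multi-part suffixes correctly.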
- Clone the repository:
git clone https://github.com/LinhNguyen2901/URL-classification.git
- URL Preprocessing: Extracts and normalizes URL components
- Feature Engineering: Analyzes various URL characteristics
- Classification: Applies trained machine learning model to determine URL legitimacy
- Result Output: Provides classification result with confidence score
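The four stages above can be sketched end to end as follows. This is a minimal illustration, not the project's implementation: the function names, the four features, and especially `toy_model` with its hand-picked `WEIGHTS` and `BIAS` are hypothetical placeholders standing in for the trained model.

```python
import math
from urllib.parse import urlsplit

# Hypothetical, hand-picked parameters; a real model would learn these.
WEIGHTS = [0.05, 0.4, 0.6, 0.02]
BIAS = -3.0


def preprocess(url: str) -> dict:
    """URL Preprocessing: trim, add a default scheme, split into components."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",  # hostname is lowercased by urlsplit
        "path": parts.path or "/",
        "query": parts.query,
    }


def engineer_features(parts: dict) -> list:
    """Feature Engineering: turn URL components into a numeric vector."""
    host = parts["host"]
    return [
        float(len(host)),
        float(host.count(".")),
        float(host.count("-")),
        float(len(parts["path"]) + len(parts["query"])),
    ]


def toy_model(features: list) -> float:
    """Placeholder for the trained model: logistic score in [0, 1]."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify_url(url: str) -> dict:
    """Classification and Result Output: label plus confidence score."""
    p_phishing = toy_model(engineer_features(preprocess(url)))
    label = "phishing" if p_phishing >= 0.5 else "legitimate"
    confidence = max(p_phishing, 1.0 - p_phishing)
    return {"url": url.strip(), "label": label, "confidence": round(confidence, 3)}


if __name__ == "__main__":
    print(classify_url("example.com"))
```

Keeping each stage as a separate function means the placeholder scorer could be swapped for a trained classifier without touching the preprocessing or output code.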
- Fork the repository
- Create your feature branch (git checkout -b feature/URL-classification)
- Commit your changes (git commit -m 'Add something')
- Push to the branch (git push origin feature/URL-classification)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset sources
- Contributors
- Research papers and references
This tool is meant to assist in identifying potential phishing URLs but should not be relied upon as the sole means of protection. Always exercise caution when clicking on unknown links and maintain proper cybersecurity practices.