This paper proposes a lightweight Bidirectional Long Short-Term Memory (BiLSTM) based phishing URL detection framework for mitigating the threats posed by phishing attacks. The proposed framework first splits each URL into four distinct components, namely protocol type, domain, sub-domain, and top-level domain (TLD), using a set of special characters as delimiters. The feature values corresponding to these four components are then used to build a vocabulary database of the URL corpus. Thereafter, a customized FastText word embedding technique is used to learn numeric feature vector representations of the tokens (URL features) present in the vocabulary database. These learned feature vectors, along with the pre-processed instances of the URL corpus, are then provided as input to train a BiLSTM-based classifier model for detecting phishing URLs. Experimental results on a proprietary URL corpus comprising 200,000 normal and phishing URL instances show that the proposed framework achieves high accuracy in detecting phishing URLs with minimal computational overhead.
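The splitting and vocabulary-building steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the delimiter set, the function names (`split_url`, `tokenize`, `build_vocab`), the `<pad>`/`<unk>` entries, and the rule that the last host label is the TLD are all assumptions made for illustration. The FastText embedding and BiLSTM classifier stages are not covered here.

```python
import re
from collections import Counter

# Assumed delimiter set; the set used by the framework is not specified here.
DELIMITERS = r"[/:?=&._\-@%+]+"


def split_url(url):
    """Split a URL into protocol, sub-domain, domain and TLD.

    Naive heuristic (an assumption): the last host label is treated as the
    TLD and the one before it as the domain, so multi-label suffixes such
    as "co.uk" are not handled.
    """
    protocol, sep, rest = url.partition("://")
    if not sep:  # no scheme present
        protocol, rest = "", url
    # Keep only the host: cut at the first path, query or fragment marker,
    # then drop any userinfo ("user@") and port (":8080").
    host = re.split(r"[/?#]", rest, maxsplit=1)[0]
    host = host.rsplit("@", 1)[-1].split(":", 1)[0]
    labels = [label for label in host.lower().split(".") if label]
    if len(labels) >= 2:
        tld, domain = labels[-1], labels[-2]
    else:
        tld, domain = "", (labels[0] if labels else "")
    return {
        "protocol": protocol.lower(),
        "subdomain": ".".join(labels[:-2]),
        "domain": domain,
        "tld": tld,
    }


def tokenize(url):
    """Turn the four components into a flat list of non-empty tokens."""
    tokens = []
    for value in split_url(url).values():
        tokens.extend(tok for tok in re.split(DELIMITERS, value) if tok)
    return tokens


def build_vocab(urls, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for url in urls for tok in tokenize(url))
    vocab = {"<pad>": 0, "<unk>": 1}  # hypothetical reserved entries
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab
```

For example, `tokenize("https://login.example.com/path?x=1")` yields the protocol, sub-domain, domain, and TLD tokens in that order. A production system would more likely resolve the TLD against a public suffix list than rely on the last-label heuristic used here.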
Mangy007/PhishDetect