Phishing website detection based on multidimensional features driven by deep learning

Peng Yang, Guangzhen Zhao, Peng Zeng

Research output: Contribution to journal › Article › peer-review

263 Scopus citations

Abstract

Phishing, the crime of using technical means to steal users' sensitive information, is currently a critical threat on the Internet, and losses due to phishing are growing steadily. Feature engineering is important in phishing website detection solutions, but detection accuracy depends critically on prior knowledge of the features. Moreover, although features extracted from several dimensions are more comprehensive, extracting them takes a large amount of time. To address these limitations, we propose a multidimensional feature phishing detection approach built on a fast detection method that uses deep learning. In the first step, character sequence features of the given URL are extracted and used for quick classification by deep learning; this step requires neither third-party assistance nor any prior knowledge about phishing. In the second step, we combine URL statistical features, webpage code features, webpage text features, and the quick classification result of deep learning into multidimensional features. By setting a threshold, the approach can reduce the detection time. In tests on a dataset containing millions of phishing and legitimate URLs, the accuracy reaches 98.99% and the false positive rate is only 0.59%. The experimental results also show that reasonably adjusting the threshold improves detection efficiency.
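
The two-step flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the threshold is applied, so the sketch assumes that a URL goes to the second step only when the quick classifier's confidence falls below the threshold. The names cascade_classify, toy_quick_score and toy_full_score are hypothetical. The two scorers are hand-written placeholders standing in for the deep learning model and the multidimensional feature classifier; they are not the authors' models.

```python
from typing import Callable, Tuple


def cascade_classify(
    url: str,
    quick_score: Callable[[str], float],
    full_score: Callable[[str, float], float],
    threshold: float = 0.9,
) -> Tuple[bool, str]:
    """Return (is_phishing, stage_used) for a URL.

    Assumption: the threshold gates on the confidence of the quick step.
    """
    p = quick_score(url)             # phishing probability from URL characters only
    confidence = max(p, 1.0 - p)     # how sure the quick step is, either way
    if confidence >= threshold:
        return p >= 0.5, "quick"     # skip the slower feature extraction
    # Not confident enough: score again using the combined features,
    # with the quick result passed in as one of them.
    return full_score(url, p) >= 0.5, "full"


def toy_quick_score(url: str) -> float:
    """Placeholder for the URL character-sequence model (not a neural network)."""
    suspicious = ("login", "verify", "secure", "update")
    hits = sum(token in url.lower() for token in suspicious)
    return min(1.0, 0.25 * hits + (0.2 if "@" in url else 0.0))


def toy_full_score(url: str, quick_p: float) -> float:
    """Placeholder for the multidimensional-feature classifier."""
    host_starts_with_digit = url.split("//")[-1][:1].isdigit()
    return min(1.0, 0.5 * quick_p + (0.5 if host_starts_with_digit else 0.0))


if __name__ == "__main__":
    for u in ("https://example.org/docs", "http://192.0.2.1/login"):
        print(u, cascade_classify(u, toy_quick_score, toy_full_score))
```

Under this assumption, lowering the threshold lets more URLs finish in the quick step, which saves time but leans more heavily on the URL-only classifier.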

Original language: English
Article number: 8610190
Pages (from-to): 15196-15209
Number of pages: 14
Journal: IEEE Access
Volume: 7
State: Published - 2019
Externally published: Yes

Keywords

  • Phishing website detection
  • convolutional neural network
  • long short-term memory network
  • machine learning
  • semantic feature
