基于Logistic回归和XGBoost的钓鱼网站检测方法

Translated title of the contribution: Phishing website detection method based on logistic regression and XGBoost

Peng Yang, Peng Zeng, Guangzhen Zhao, Peipei Lü

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

To balance the speed and precision of phishing website detection, a detection method based on logistic regression and eXtreme Gradient Boosting (XGBoost) is proposed. Starting from the URL of a webpage, three kinds of features are extracted: HTML features, uniform resource locator (URL) features, and text vector features based on term frequency-inverse document frequency (TF-IDF). Logistic regression converts the high-dimensional, sparse text features into probabilistic features. An XGBoost classification model is then built on the fused features, and a time complexity analysis of the method is given. Real-world data were collected as the experimental dataset. The experimental results show that the logistic regression step reduces the dimension of the fused features, and that the method detects faster than direct feature fusion. The fused features also give the classifier more useful information to learn from than features of a single type, and the precision of the method is higher than that of single-feature methods. The method achieves a precision of 96.67% and a recall of 96.6%.
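The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration on toy data, not the authors' implementation: the helper names (`url_features`, `html_features`, `fuse_features`), the specific hand-crafted features, the toy pages, and all hyperparameters are hypothetical. Producing the text probability out-of-fold with `cross_val_predict` is also an assumption, made here to avoid label leakage; the abstract does not say how the probability feature was generated. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different gradient boosting implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

try:
    from xgboost import XGBClassifier as Booster
except ImportError:  # stand-in only; this is not XGBoost
    from sklearn.ensemble import GradientBoostingClassifier as Booster


def url_features(url):
    """Hypothetical URL features: length, dot count, '@' flag, HTTPS flag."""
    return [len(url), url.count("."), int("@" in url),
            int(url.startswith("https://"))]


def html_features(html):
    """Hypothetical HTML features: form count and password-field flag."""
    lower = html.lower()
    return [lower.count("<form"), int('type="password"' in lower)]


def fuse_features(urls, htmls, texts, labels):
    """Compress sparse TF-IDF text vectors into one probability column
    with logistic regression, then append it to the hand-crafted features."""
    tfidf = TfidfVectorizer().fit_transform(texts)
    # Out-of-fold probabilities (assumption: avoids leaking labels).
    text_prob = cross_val_predict(
        LogisticRegression(max_iter=1000), tfidf, labels,
        cv=2, method="predict_proba")[:, 1]
    handcrafted = np.array(
        [url_features(u) + html_features(h) for u, h in zip(urls, htmls)],
        dtype=float)
    return np.column_stack([handcrafted, text_prob])


# Toy data, invented for illustration: (url, html, visible text, label).
pages = [
    ("http://secure-login.example.bad.co/@verify",
     '<form><input type="password"></form>',
     "verify your account password now", 1),
    ("http://paypa1.example.update.info/login",
     '<form><input type="password"></form>',
     "account suspended confirm password", 1),
    ("http://bank.example.alert.top/@signin",
     '<form></form><form><input type="password"></form>',
     "urgent verify bank account login", 1),
    ("http://free.prize.example.win.cc/claim",
     '<form><input type="password"></form>',
     "claim prize confirm account password", 1),
    ("https://example.org/docs", "<p>docs</p>",
     "library documentation and tutorials", 0),
    ("https://example.com/news", "<p>news</p>",
     "daily news weather and sports", 0),
    ("https://example.edu/courses", "<p>courses</p>",
     "course catalog and lecture notes", 0),
    ("https://example.net/blog", "<form></form>",
     "blog posts about cooking recipes", 0),
]
urls, htmls, texts, labels = map(list, zip(*pages))
labels = np.array(labels)

fused = fuse_features(urls, htmls, texts, labels)
model = Booster(n_estimators=20, max_depth=2)
model.fit(fused, labels)
preds = model.predict(fused)
```

In this sketch the fused matrix has seven columns (four URL features, two HTML features, one text probability), whatever the size of the TF-IDF vocabulary. That is the dimension reduction the abstract credits for the speed-up over fusing the raw text vectors directly.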

Original language: Chinese (Traditional)
Pages (from-to): 207-212
Number of pages: 6
Journal: Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition)
Volume: 49
Issue number: 2
State: Published - 20 Mar 2019
Externally published: Yes
