Abstract
To balance speed and precision in phishing website detection, a detection method based on logistic regression and eXtreme gradient boosting (XGBoost) was proposed. Starting from the uniform resource locator (URL) of a webpage, three kinds of features were extracted: HTML features, URL features, and text vector features based on term frequency-inverse document frequency (TF-IDF). Logistic regression was used to convert the high-dimensional, sparse text features into probabilistic features. An XGBoost classification model was then built on the fused features, and the time complexity of the method was analyzed. Real-world data were collected as the experimental data set. The experimental results show that the logistic regression step reduces the dimensionality of the fused features, so the method detects phishing websites faster than direct feature fusion. The fused features also give the classifier more useful information to learn from than any single type of feature, and the precision of the method is higher than that of single-feature methods. The precision is 96.67% and the recall is 96.6%.
| Translated title of the contribution | Phishing website detection method based on logistic regression and XGBoost |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 207-212 |
| Number of pages | 6 |
| Journal | Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition) |
| Volume | 49 |
| Issue number | 2 |
| DOIs | |
| State | Published - 20 Mar 2019 |
| Externally published | Yes |