Abstract
In this paper, we address the problem of class probability estimation with decision trees. This problem has received considerable attention in machine learning and data mining, and several techniques for using tree models as probability estimators have been proposed. We present a comparative study of six well-known class probability estimation methods, evaluated by classification accuracy, AUC, and Conditional Log Likelihood (CLL). Observations on the properties of each method are supported empirically. Our experiments on UCI data sets and on our liver disease data sets show that the PET algorithms significantly outperform traditional decision trees and naïve Bayes in classification accuracy, AUC, and CLL. Finally, a unifying pseudocode for these algorithms is presented.
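To make the underlying idea concrete, the following is a minimal sketch (not the paper's own code) of the most common probability-estimation technique for tree leaves: replacing the raw class frequency at a leaf with a Laplace-corrected estimate, which smooths small-leaf probabilities toward the uniform distribution. The function name and signature are illustrative assumptions, not part of the paper.

```python
def leaf_probability(class_count, leaf_total, num_classes, laplace=True):
    """Estimate P(class | leaf) from the training examples reaching a leaf.

    Illustrative sketch: with laplace=True this is the Laplace correction
    commonly used in probability estimation trees (PETs); with laplace=False
    it is the raw relative frequency a plain decision tree would report.
    """
    if laplace:
        # Smooth toward 1/num_classes; avoids the extreme 0/1
        # probabilities that tiny leaves would otherwise produce.
        return (class_count + 1) / (leaf_total + num_classes)
    return class_count / leaf_total


# A binary-class leaf containing 3 positives out of 3 training examples:
print(leaf_probability(3, 3, 2, laplace=False))  # 1.0 -- overconfident
print(leaf_probability(3, 3, 2, laplace=True))   # 0.8 -- smoothed
```

The smoothed estimates matter precisely for the ranking (AUC) and likelihood (CLL) metrics the paper uses, since unsmoothed leaves assign many ties at 0 or 1.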
| Original language | English |
|---|---|
| Pages (from-to) | 71-80 |
| Number of pages | 10 |
| Journal | WSEAS Transactions on Computers |
| Volume | 10 |
| Issue number | 3 |
| State | Published - Mar 2011 |
| Externally published | Yes |
Keywords
- AUC
- Classification
- Conditional log likelihood
- Decision trees
- Joint distribution
- Probability estimation tree