TY - GEN
T1 - Finding dependency trees from binary data
AU - Chaofeng, Sha
AU - Dao, Tao
AU - Aoying, Zhou
AU - Weining, Qian
PY - 2008
Y1 - 2008
AB - Much work has been done on finding interesting subsets of items, since the problem has broad applications in financial data analysis, e-commerce, text data mining, and so on. Although the well-known frequent pattern mining has attracted much attention in the research community, more recent work has been devoted to analyzing more sophisticated relationships among items. Chow-Liu trees and low-entropy trees, for example, have been used to summarize frequent patterns. In this paper, we consider finding a novel dependency tree from binary data, which has several advantages over previous related work. First, we propose a novel distance measure between items based on information theory, which captures the expected uncertainty in the item pairs and the mutual information between them. Based on this distance measure, we present a simple yet efficient algorithm for finding dependency trees from binary data. We also show how our new approach can be applied to frequent pattern summarization. Our running example on a synthetic dataset shows that our approach achieves good results compared to existing popular heuristics.
UR - https://www.scopus.com/pages/publications/52049124828
U2 - 10.1109/CIT.2008.Workshops.92
DO - 10.1109/CIT.2008.Workshops.92
M3 - Conference contribution
AN - SCOPUS:52049124828
SN - 9780769533391
T3 - Proceedings - 8th IEEE International Conference on Computer and Information Technology Workshops, CIT Workshops 2008
SP - 80
EP - 85
BT - Proceedings - 8th IEEE International Conference on Computer and Information Technology Workshops, CIT Workshops 2008
T2 - 8th IEEE International Conference on Computer and Information Technology Workshops, CIT Workshops 2008
Y2 - 8 July 2008 through 11 July 2008
ER -