Abstract
False-negative frequent items mining from a high-speed transactional data stream finds an approximate set of frequent items with respect to a minimum support threshold, s, and controls the possibility of missing frequent items using a reliability parameter δ. Its importance is that it excludes false positives and can therefore significantly reduce the memory consumed by frequent itemsets mining. The key issue is how to minimize the possibility of missing frequent items. In this paper, we propose a new false-negative frequent items mining algorithm, called Loss-Negative, for handling bursting in data streams. The new algorithm consumes the least memory in comparison with other false-negative and false-positive frequent items algorithms. We present a theoretical bound for the new algorithm and analyze how well it minimizes the possibility of missing frequent items in terms of two possibilities, namely in-possibility and out-possibility. The former concerns how likely a frequent item is to pass the first pruning. The latter concerns how long a frequent item can stay in memory when no occurrences of the item arrive in the data stream for a certain period. The newly proposed algorithm is superior to the existing false-negative frequent items mining algorithms in terms of both possibilities. We demonstrate the effectiveness of the new algorithm in this paper.
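
The distinction between false-negative and false-positive answers can be illustrated with a small sketch. It is not the Loss-Negative algorithm, whose details the abstract does not give; it only computes the exact frequent items for a support threshold s offline and checks what an approximate answer got wrong. The function names `exact_frequent_items`, `classify_errors`, and `is_false_negative_answer` are hypothetical, and the sketch assumes an item is frequent when its count is at least s times the stream length.

```python
from collections import Counter


def exact_frequent_items(stream, s):
    """Return the items whose count is at least s * len(stream).

    This is an offline ground truth for illustration only; a streaming
    algorithm cannot afford to keep a counter for every item.
    """
    counts = Counter(stream)
    n = len(stream)
    return {item for item, count in counts.items() if count >= s * n}


def classify_errors(reported, stream, s):
    """Split the mistakes of an approximate answer into the two error kinds."""
    truth = exact_frequent_items(stream, s)
    reported = set(reported)
    return {
        # Reported as frequent, but actually infrequent.
        "false_positives": reported - truth,
        # Actually frequent, but missing from the report.
        "false_negatives": truth - reported,
    }


def is_false_negative_answer(reported, stream, s):
    """True if the answer contains no false positives (it may miss items)."""
    return not classify_errors(reported, stream, s)["false_positives"]


# Hypothetical toy stream: a occurs 4 times, b twice, c and d once each.
stream = list("aaaabbcd")
s = 0.25  # an item must occur at least 0.25 * 8 = 2 times

errors = classify_errors({"a"}, stream, s)
# {"a"} misses the frequent item b but reports nothing infrequent,
# so it is a valid false-negative answer.
valid = is_false_negative_answer({"a"}, stream, s)
```

In this toy setting, an answer such as `{"a", "c"}` would not qualify, because `c` is infrequent and therefore a false positive.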
| Original language | English |
|---|---|
| Pages (from-to) | 422-434 |
| Number of pages | 13 |
| Journal | Lecture Notes in Computer Science |
| Volume | 3453 |
| DOIs | |
| State | Published - 2005 |
| Externally published | Yes |
| Event | 10th International Conference on Database Systems for Advanced Applications, DASFAA 2005 - Beijing, China (Duration: 17 Apr 2005 → 20 Apr 2005) |