HOT: Hypergraph-based outlier test for categorical data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

46 Scopus citations

Abstract

As a widely used data mining technique, outlier detection aims at finding anomalies with good explanations. Most existing methods are designed for numeric data and have problems with real-life applications that contain categorical data. In this paper, we introduce a novel outlier mining method based on a hypergraph model. Since hypergraphs precisely capture the distribution characteristics in data subspaces, this method is effective in identifying anomalies in dense subspaces and presents good interpretations for the local outlierness. By selecting the most relevant subspaces, the "curse of dimensionality" problem in very large databases can also be ameliorated. Furthermore, the connectivity property replaces distance metrics, so distance-based computation is no longer needed, which enhances robustness when handling data with missing values. Because connectivity computation lends itself to the aggregation operations supported by most SQL-compatible database systems, the mining process is much more efficient. Finally, experiments and analysis show that our method can find outliers in categorical data with good performance and quality.

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Editors: Kyu-Young Whang, Jongwoo Jeon, Kyuseok Shim, Jaideep Srivastava
Publisher: Springer Verlag
Pages: 399-410
Number of pages: 12
ISBN (Electronic): 3540047603, 9783540047605
State: Published - 2003
Externally published: Yes
Event: 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003 - Seoul, Korea, Republic of
Duration: 30 Apr 2003 to 2 May 2003

Publication series

Name: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 2637
ISSN (Print): 0302-9743

Conference

Conference: 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003
Country/Territory: Korea, Republic of
City: Seoul
Period: 30/04/03 to 2/05/03
