A hybrid approach to clustering in very large databases

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Current clustering methods commonly suffer from three problems: 1) high I/O cost and expensive maintenance; 2) the need to pre-specify the uncertain parameter k; 3) poor efficiency in handling clusters of arbitrary shape in very large data sets. In this paper, we present a hybrid clustering algorithm to solve these problems. It combines distance-based and density-based strategies and makes full use of statistical information while maintaining good cluster quality. The experimental results show that our algorithm outperforms other popular algorithms in terms of efficiency and cost, and achieves even greater speedup as the data size scales up.

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 5th Pacific-Asia Conference, PAKDD 2001, Proceedings
Editors: David Cheung, Graham J. Williams, Qing Li
Publisher: Springer Verlag
Pages: 519-524
Number of pages: 6
ISBN (Print): 3540419101, 9783540419105
State: Published - 2001
Externally published: Yes
Event: 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001 - Kowloon, Hong Kong
Duration: 16 Apr 2001 to 18 Apr 2001

Publication series

Name: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 2035
ISSN (Print): 0302-9743

Conference

Conference: 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001
Country/Territory: Hong Kong
City: Kowloon
Period: 16/04/01 to 18/04/01
