Principal curve algorithms for partitioning high-dimensional data spaces

Junping Zhang; Xiaodan Wang; Uwe Kruger; Fei-Yue Wang

doi:10.1109/TNN.2010.2100408

IEEE Trans Neural Netw. 2011 Mar;22(3):367-80. doi: 10.1109/TNN.2010.2100408. Epub 2010 Dec 30.

Authors

Junping Zhang¹, Xiaodan Wang, Uwe Kruger, Fei-Yue Wang

Affiliation

¹ Shanghai Key Laboratory of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai 200433, China. jpzhang@fudan.edu.cn

PMID: 21193373

Abstract

Most partitioning algorithms use linear partitioning strategies to iteratively divide a space into cells that contain underlying linear or nonlinear structures. The compactness of each cell depends on how well the (locally) linear partitioning strategy approximates the intrinsic structure. To obtain compact partitions of complex data in a nonlinear context, this paper proposes a nonlinear partitioning strategy, the principal curve tree (PC-tree), which is implemented iteratively. Because a principal curve (PC) passes through the middle of the data distribution, the data can be partitioned according to arc length along the PC. To enhance the partitioning of a given space, a residual version of the PC-tree algorithm is developed, denoted here as the PCR-tree algorithm. Because of its residual property, the PCR-tree can yield the intrinsic dimension of high-dimensional data. Comparisons presented in this paper confirm that the proposed PC-tree and PCR-tree approaches perform better than several competing partitioning algorithms in terms of vector quantization error and nearest neighbor search. The comparisons also show that the proposed algorithms outperform competing linear methods in total average coverage, which measures the nonlinear compactness of partitioning algorithms.
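
The idea of partitioning by arc length can be illustrated with a minimal sketch. This is not the authors' PC-tree implementation: it assumes the curve has already been fitted and is supplied as a polyline, and it assumes a single split at the median arc length. The names `project_arc_length` and `split_by_arc_length` are hypothetical, chosen for illustration only.

```python
import math
from statistics import median


def project_arc_length(point, polyline):
    """Arc length along `polyline` of the curve point closest to `point`.

    `polyline` is a list of vertices standing in for a fitted principal
    curve (an assumption of this sketch; no curve fitting is done here).
    """
    best_dist2 = math.inf
    best_s = 0.0
    travelled = 0.0  # arc length accumulated before the current segment
    for a, b in zip(polyline, polyline[1:]):
        seg = [bj - aj for aj, bj in zip(a, b)]
        seg_len2 = sum(c * c for c in seg)
        if seg_len2 == 0.0:
            continue  # skip degenerate (zero-length) segments
        seg_len = math.sqrt(seg_len2)
        # Fraction along the segment of the orthogonal projection,
        # clamped to [0, 1] so the projection stays on the segment.
        t = sum((pj - aj) * cj for pj, aj, cj in zip(point, a, seg)) / seg_len2
        t = max(0.0, min(1.0, t))
        dist2 = sum((pj - (aj + t * cj)) ** 2
                    for pj, aj, cj in zip(point, a, seg))
        if dist2 < best_dist2:
            best_dist2 = dist2
            best_s = travelled + t * seg_len
        travelled += seg_len
    return best_s


def split_by_arc_length(points, polyline):
    """Split `points` into two cells at the median projected arc length.

    The median threshold is an assumption of this sketch, not a rule
    taken from the paper.
    """
    s = [project_arc_length(p, polyline) for p in points]
    threshold = median(s)
    left = [p for p, si in zip(points, s) if si <= threshold]
    right = [p for p, si in zip(points, s) if si > threshold]
    return left, right, threshold
```

Applying `split_by_arc_length` recursively to each resulting cell, with a curve refitted per cell, would give a tree-structured partition in the spirit of the description above. Because points are ordered along the curve rather than along a straight axis, a bent cluster is cut across the bend instead of through it.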

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Artificial Intelligence*
Data Interpretation, Statistical
Decision Trees
Models, Neurological
Neural Networks, Computer*
Nonlinear Dynamics*
Pattern Recognition, Automated / methods
Principal Component Analysis* / methods