A multi-step approach to time series analysis and gene expression clustering

Bioinformatics. 2006 Mar 1;22(5):589-96. doi: 10.1093/bioinformatics/btk026. Epub 2006 Jan 5.

Abstract

Motivation: The huge growth in gene expression data calls for the implementation of automatic tools for data processing and interpretation.

Results: We present a new and comprehensive machine-learning data-mining framework for clustering gene microarray data. It consists of a non-linear PCA neural network for feature extraction, and probabilistic principal surfaces combined with an agglomerative approach based on negentropy. The method, which provides a user-friendly visualization interface, can work on noisy data with missing points and automatically determines, with no a priori assumptions, the number of clusters present in the data. Application to a cell-cycle dataset and a detailed analysis confirm the biological nature of the most significant clusters.
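
To illustrate why negentropy can guide an agglomerative step, the following minimal sketch (not the authors' implementation, and not part of the ASTRONEURAL package) estimates the negentropy of a one-dimensional sample with the classical moment-based approximation J(y) ≈ E[y^3]^2 / 12 + kurt(y)^2 / 48. The function name `negentropy_approx`, the synthetic data, and the choice of this particular approximation are all assumptions made for illustration. A value near zero suggests the sample is close to Gaussian, as a single merged cluster might be. A clearly larger value suggests a non-Gaussian structure, as expected when two distinct clusters are pooled.

```python
import numpy as np

def negentropy_approx(y):
    """Moment-based negentropy approximation for a 1-D sample.

    Hypothetical helper for illustration: standardizes the sample, then
    combines squared skewness and squared excess kurtosis.
    """
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()      # zero mean, unit variance
    skew = np.mean(z ** 3)            # third standardized moment
    kurt = np.mean(z ** 4) - 3.0      # excess kurtosis (0 for a Gaussian)
    return skew ** 2 / 12.0 + kurt ** 2 / 48.0

rng = np.random.default_rng(0)        # fixed seed for reproducibility

# Synthetic stand-in for one homogeneous cluster (single Gaussian).
one_cluster = rng.normal(0.0, 1.0, size=5000)

# Synthetic stand-in for two well-separated clusters pooled together.
two_clusters = np.concatenate([
    rng.normal(-3.0, 1.0, size=2500),
    rng.normal(3.0, 1.0, size=2500),
])

j_one = negentropy_approx(one_cluster)
j_two = negentropy_approx(two_clusters)
print(f"single cluster negentropy: {j_one:.4f}")
print(f"pooled clusters negentropy: {j_two:.4f}")
```

Under these assumptions, a merge rule could join two candidate clusters only when the negentropy of their pooled projection stays below a chosen threshold. The threshold and the projection used in the published method are not specified in this abstract.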

Availability: The software described here is a subpackage of the ASTRONEURAL package and is available upon request from the corresponding author.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Evaluation Study

MeSH terms

  • Artificial Intelligence
  • Cluster Analysis
  • Computer Graphics
  • Computer Simulation
  • Databases, Protein*
  • Gene Expression Profiling / methods*
  • Information Storage and Retrieval / methods*
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis / methods*
  • Proteins / metabolism*
  • Software*
  • Time Factors
  • User-Computer Interface*

Substances

  • Proteins