Background: Although the emerging complementary DNA (cDNA) array technology holds great promise to discern complex patterns of gene expression, its novelty means that there are no well-established standards to guide analysis and interpretation of the data that it produces. We have used preliminary data generated with the CLONTECH Atlas human cDNA array to develop a practical approach to the statistical analysis of these data by studying changes in gene expression during the development of acquired tamoxifen resistance in breast cancer.
Methods: For hybridization to the array, we prepared RNA from MCF-7 human breast cancer cell tumors grown in our athymic nude mouse xenograft model of acquired tamoxifen resistance. Tumors were isolated during each of three growth phases: estrogen-stimulated, tamoxifen-sensitive, and tamoxifen-resistant. Principal components analysis was used to identify genes with altered expression.
Results and conclusions: Principal components analysis yielded three principal components that are interpreted as 1) the average level of gene expression, 2) the difference between estrogen-stimulated gene expression and the average of tamoxifen-sensitive and tamoxifen-resistant gene expression, and 3) the difference between tamoxifen-sensitive and tamoxifen-resistant gene expression. A bivariate 99% prediction region, constructed from the second and third principal components, was used to identify outlier genes that exhibit altered expression. Two representative outlier genes, erk-2 and HSF-1 (heat shock transcription factor-1), were chosen for confirmatory study, and their predicted relative expression levels were confirmed by Western blot analysis, suggesting that semiquantitative estimates are possible with array technology.
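The outlier screen described above can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: the input is a genes-by-3 matrix of log-transformed expression values (columns ordered estrogen-stimulated, tamoxifen-sensitive, tamoxifen-resistant), the second and third component scores are treated as approximately bivariate normal, and the 99% region is approximated by a chi-square cutoff on the squared Mahalanobis distance rather than an exact finite-sample prediction region. The function name `pca_outliers` and its parameters are hypothetical.

```python
import numpy as np


def pca_outliers(log_expr, coverage=0.99):
    """Flag genes outside an approximate bivariate region on PC2 and PC3.

    log_expr : (n_genes, 3) array of log expression values (assumed layout:
               estrogen-stimulated, tamoxifen-sensitive, tamoxifen-resistant).
    coverage : nominal coverage of the region (0.99 for a 99% region).

    Returns (scores, loadings, flags): component scores per gene, the
    loading vectors as columns, and a boolean outlier mask.
    """
    log_expr = np.asarray(log_expr, dtype=float)

    # Center each condition, then eigendecompose the 3x3 covariance matrix.
    centered = log_expr - log_expr.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # eigh returns ascending eigenvalues; reorder so PC1 has the largest.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Project every gene onto the principal components.
    scores = centered @ eigvecs

    # Squared Mahalanobis distance in the PC2/PC3 plane. The components are
    # uncorrelated, so each score is scaled by its own variance.
    d2 = scores[:, 1] ** 2 / eigvals[1] + scores[:, 2] ** 2 / eigvals[2]

    # For 2 degrees of freedom the chi-square quantile has the closed form
    # -2 * ln(1 - p); this is a large-sample approximation (an assumption).
    threshold = -2.0 * np.log(1.0 - coverage)

    return scores, eigvecs, d2 > threshold
```

With a few hundred genes on the array, the chi-square cutoff is close to the exact prediction-region boundary. Under the normality assumption, roughly 1% of unaltered genes would still fall outside the region by chance, which is one reason flagged genes warrant confirmatory study.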
Implications: Principal components analysis provides a useful and practical method to analyze gene expression data from a cDNA array. The method can identify broad patterns of expression alteration and, based on a small simulation study, will likely provide reasonable power to detect moderate-sized alterations in clinically relevant genes.