An attempt for combining microarray data sets by adjusting gene expressions

Cancer Res Treat. 2007 Jun;39(2):74-81. doi: 10.4143/crt.2007.39.2.74. Epub 2007 Jun 30.

Abstract

Purpose: The diverse experimental environments in microarray technology, such as the different platforms or different RNA sources, can cause biases in the analysis of multiple microarrays. These systematic effects present a substantial obstacle for the analysis of microarray data, and the resulting information may be inconsistent and unreliable. Therefore, we introduced a simple integration method for combining microarray data sets that are derived from different experimental conditions, and we expected that more reliable information can be detected from the combined data set rather than from the separated data sets.

Materials and methods: This method is based on the distributions of the gene expression ratios among the different microarray data sets and it transforms, gene by gene, the gene expression ratios into the form of the reference data set. The efficiency of the proposed integration method was evaluated using two microarray data sets, which were derived from different RNA sources, and a newly defined measure, the mixture score.

Results: The proposed integration method intermixed the two data sets that were obtained from different RNA sources, which in turn reduced the experimental bias between the two data sets, and the mixture score increased by 24.2%. A data set combined by the proposed method preserved the inter-group relationship of the separated data sets.

Conclusion: The proposed method worked well in adjusting systematic biases, including the source effect. The ability to use an effectively integrated microarray data set yields more reliable results due to the larger sample size and this also decreases the chance of false negatives.

Keywords: Different RNA sources; Different platforms; Gene expression; Integration method; Microarray; Systematic effects.