Motivation: A biologically interesting question is whether human blood outgrowth endothelial cells (BOECs) belong to, or are closer to, large vessel endothelial cells (LVECs) or microvascular endothelial cells (MVECs), judged by global expression profiling. An earlier analysis using hierarchical clustering on a small set of genes suggested that BOECs were closer to MVECs. We take a semi-supervised learning approach that exploits the two known classes, LVEC and MVEC, while allowing the BOEC samples to belong to either class or to form a new class of their own. For high-dimensional data such as these, we propose a penalized mixture model with a weighted L1 penalty that performs automatic feature selection while the model is being fitted.
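The fitting procedure is only named above, so the following is a rough illustration rather than the authors' implementation. It is a minimal semi-supervised EM for a Gaussian mixture with a diagonal covariance shared across clusters and a weighted L1 penalty on the cluster means, assuming gene-wise standardized data. The shared diagonal covariance, the conditional (ECM-style) updates, the function names, and the `weights` argument (penalty weights w_kj, all ones by default) are assumptions made for this sketch. Labeled samples (e.g. LVEC, MVEC) keep fixed one-hot responsibilities, while unlabeled samples (e.g. BOEC) may fall into any of the K clusters, including an extra one. Under these assumptions the penalty turns each mean update into soft-thresholding at lam * w_kj * sigma_j^2 / n_k, so a gene whose mean is shrunk to zero in every cluster no longer affects cluster assignment; this is the sense in which feature selection happens during fitting.

```python
import numpy as np


def soft_threshold(z, t):
    # Elementwise soft-thresholding: sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def penalized_mixture_em(X, labels, K, lam, weights=None, n_iter=100, seed=0):
    """Sketch (hypothetical helper): semi-supervised EM for a Gaussian mixture
    with diagonal covariance shared across clusters and a weighted L1 penalty
    lam * sum_kj w_kj * |mu_kj| on the cluster means.

    X       : (n, p) array; genes assumed standardized (mean 0, sd 1)
    labels  : (n,) int array; known class in 0..K-1, or -1 if unlabeled
    K       : number of clusters (may exceed the number of known classes)
    lam     : penalty parameter (lam >= 0)
    weights : hypothetical (K, p) penalty weights w_kj; all ones if None
    """
    n, p = X.shape
    w = np.ones((K, p)) if weights is None else np.asarray(weights, dtype=float)
    known = labels >= 0

    # Responsibilities tau_ik: random start for unlabeled rows,
    # fixed one-hot rows for labeled samples.
    rng = np.random.default_rng(seed)
    tau = rng.dirichlet(np.ones(K), size=n)
    tau[known] = np.eye(K)[labels[known]]
    sigma2 = np.maximum(X.var(axis=0), 1e-8)

    for _ in range(n_iter):
        # M-step: mixing proportions and unpenalized weighted means.
        nk = np.maximum(tau.sum(axis=0), 1e-12)
        pi = nk / n
        mu_tilde = (tau.T @ X) / nk[:, None]
        # L1 penalty => soft-threshold at lam * w_kj * sigma_j^2 / n_k,
        # using sigma2 from the previous iteration.
        mu = soft_threshold(mu_tilde, lam * w * sigma2[None, :] / nk[:, None])
        # Shared per-gene variance given the new means.
        resid2 = (X[:, None, :] - mu[None, :, :]) ** 2  # shape (n, K, p)
        sigma2 = np.maximum((tau[:, :, None] * resid2).sum(axis=(0, 1)) / n, 1e-8)

        # E-step: update posteriors for unlabeled rows only. Terms constant
        # in k are dropped, so a gene with mu_kj = 0 in every cluster
        # contributes nothing to the cluster assignment.
        log_post = np.log(pi)[None, :] - 0.5 * (resid2 / sigma2).sum(axis=2)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        tau[~known] = post[~known]

    return tau, mu, sigma2, pi
```

In a setting like the one described, one would pass K = 3 with only the LVEC and MVEC samples labeled; a third cluster that absorbs the unlabeled samples would then correspond to a new class. How lam and the weights are chosen is left open in this sketch.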
Results: We applied our penalized mixture model to a combined dataset containing 27 BOEC, 28 LVEC and 25 MVEC samples. The analysis indicated that the BOEC samples appeared to form their own new class. A simulation study showed that, compared with the standard mixture model with or without initial variable selection, the penalized mixture model was much better at identifying the relevant genes and forming the corresponding clusters. The penalized mixture model thus appears promising for high-dimensional data, offering both novel class discovery and automatic feature selection.