Modulad

Le Monde des Utilisateurs de L'Analyse de Données

Issue 36

Feature selection for genomic data. Paola CERCHIELLO, Silvia FIGINI.
La revue MODULAD, issue 36, July 2007

Abstract:

Building predictive models for genomic mining requires feature selection as an essential preliminary step to reduce the large number of available variables. Feature selection is the process of selecting a generally smaller subset of variables (features) that can be considered the best, from a statistical point of view, with respect to the model employed for the analysis. In gene expression microarray data, being able to select a small number of important genes not only makes data analysis more efficient but also helps with their biological interpretation. Microarray data typically have several thousand genes (features) but only tens of samples.
Problems that can arise from the small sample size have not been addressed well in the literature. Our aim is to discuss some issues in feature selection applied to microarray data, in order to select the most important genes from a predictive point of view.

Keywords: Feature selection, Gene expression, Marker selection, Kruskal-Wallis test, Model assessment, Predictive models.
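
The abstract does not spell out the selection procedure, but the keywords mention the Kruskal-Wallis test. As a minimal sketch of how such a filter might rank genes, the Python example below scores each gene by its Kruskal-Wallis H statistic across sample classes and keeps the highest-scoring ones. The function name `rank_genes_kruskal`, the synthetic data, and the choice of three planted informative genes are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np
from scipy.stats import kruskal


def rank_genes_kruskal(X, y, top_k=10):
    """Rank genes (columns of X) by the Kruskal-Wallis H statistic.

    X: array of shape (n_samples, n_genes) with expression values.
    y: array of shape (n_samples,) with class labels.
    Returns the indices of the top_k genes, plus all H statistics and p-values.
    """
    classes = np.unique(y)
    n_genes = X.shape[1]
    h_stats = np.empty(n_genes)
    p_values = np.empty(n_genes)
    for j in range(n_genes):
        # One group of expression values per class for gene j.
        groups = [X[y == c, j] for c in classes]
        h_stats[j], p_values[j] = kruskal(*groups)
    # Larger H means stronger evidence that the classes differ.
    top = np.argsort(h_stats)[::-1][:top_k]
    return top, h_stats, p_values


# Synthetic data (hypothetical): 40 samples, 2000 genes, two classes.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 2000
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))

# Plant three informative genes by shifting their values in class 1.
informative = np.array([5, 17, 42])
X[np.ix_(y == 1, informative)] += 4.0

selected, h_stats, p_values = rank_genes_kruskal(X, y, top_k=3)
print(sorted(selected.tolist()))
```

With thousands of genes and only tens of samples, some uninformative genes will score well by chance, so a filter like this is usually run inside each cross-validation fold rather than once on the full data set; otherwise the assessment of the predictive model is optimistically biased.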

Download paper: Feature selection for genomic data

Download slides: Feature selection for genomic data