Multi-Objective Evolutionary Algorithms Applied to Feature Selection in Cancer Microarray Data
Abstract
The analysis of gene expression microarrays is an active research topic in the diagnosis and classification of human cancer. A gene expression microarray dataset consists of a matrix with thousands of features, most of which are irrelevant for classifying gene expression patterns. Selecting a minimal subset of features for classification is a difficult task. This work compares two multi-objective evolutionary algorithms applied to gene expression datasets that are widely used in the literature (lymphoma, leukemia, and colon). A pre-processing stage is carried out to remove strongly correlated features. An extensive and detailed analysis of the results obtained with the selected multi-objective algorithms is presented.