Classification approaches for microarray gene expression data analysis

Almoeirfi, Makkeyah

Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/2535

Full metadata record

DC Field	Value	Language
dc.contributor.author	Almoeirfi, Makkeyah	-
dc.date.accessioned	2016-03-31T14:18:51Z	-
dc.date.available	2016-03-31T14:18:51Z	-
dc.date.issued	2015-03-13	-
dc.identifier.uri	https://zone.biblio.laurentian.ca/dspace/handle/10219/2535	-
dc.description.abstract	Microarray technology is among the most important technological advances in bioinformatics. Microarray data are typically noisy and high-dimensional, so microarray data analysis requires carefully preprocessed (finely tuned) data. Classification of biological samples is the most common analysis performed on microarray data. This study focuses on determining the confidence level for classifying an unknown sample based on microarray gene expression data. A support vector machine (SVM) classifier was applied, and its results were compared with those of other classifiers, namely k-nearest neighbor (KNN) and a neural network (NN). Four microarray datasets were used: leukemia, prostate, colon, and breast. The study also compared two SVM kernels, the radial kernel and the linear kernel. The analysis was repeated with different percentages of each dataset allocated to the training and test sets, in order to find the split that gave the best results. Cross-validation (10-fold and leave-one-out, LOOCV) and L1 and L2 regularization were used to address over-fitting and to perform feature selection in classification. Performance was assessed with ROC curves and confusion matrices. The KNN and NN classifiers were trained on the same datasets, and their results were compared. The results showed that SVM outperformed the other classifiers in accuracy. For every dataset, the SVM with the linear kernel was the best-performing method, yielding better results than the other methods. For the colon data, the highest accuracy was 83% with SVM, compared with 77% for NN and 72% for KNN.
For the leukemia data, SVM reached the highest accuracy of 97%, compared with 85% for NN and 91% for KNN. For the breast data, the highest accuracy was 73% with SVM-L2, compared with 56% for NN and 47% for KNN. Finally, for the prostate data, the highest accuracy was 80% with SVM-L1, compared with 75% for NN and 66% for KNN. Across the three different tests, SVM achieved both the highest accuracy and the largest area under the ROC curve (AUC) compared with k-nearest neighbor and the neural network.	en_CA
dc.language.iso	en	en_CA
dc.subject	Microarray	en_CA
dc.subject	data	en_CA
dc.subject	gene	en_CA
dc.title	Classification approaches for microarray gene expression data analysis	en_CA
dc.type	Thesis	en_CA
dc.description.degree	Master of Science (MSc) in Computational Sciences	-
dc.publisher.grantor	Laurentian University of Sudbury	-
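The abstract names the main ingredients of the comparison (a linear-kernel SVM, k-nearest neighbor, 10-fold and leave-one-out cross-validation, confusion matrices and ROC AUC) without showing how they fit together. The following is a minimal Python sketch of one way to wire them up, assuming binary 0/1 labels and samples given as plain lists of expression values. It is not the thesis's code, and all function names (kfold_indices, fit_linear_svm, fit_knn, cross_validate, and so on) are hypothetical. The SVM here is trained with the Pegasos stochastic subgradient method on the L2-regularized hinge loss, a solver chosen only to keep the sketch dependency-free. The radial kernel, L1 regularization and the neural network are not shown.

```python
# Hypothetical sketch, not the thesis's code: comparing a linear SVM and k-NN
# with k-fold / leave-one-out cross-validation, a confusion matrix and ROC AUC.
# Standard library only; labels are assumed to be binary 0/1.
import math
import random


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold CV; k == n gives LOOCV."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def fit_knn(train_X, train_y, k=3):
    """Return a scorer giving the fraction of the k nearest (Euclidean) neighbors labelled 1."""
    def score(x):
        nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
        return sum(yi for _, yi in nearest) / k
    return score


def fit_linear_svm(train_X, train_y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos (stochastic subgradient descent) on the L2-regularized
    hinge loss lam/2*||w||^2 + mean(max(0, 1 - y_i * w.x_i)), with y_i in {-1, +1}.
    A constant 1 is appended to each sample so a (regularized) bias is learned."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], 1 if yi == 1 else -1) for x, yi in zip(train_X, train_y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)                       # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]      # step on the L2 term
            if margin < 1:                              # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]

    def score(x):
        # Signed decision value; >= 0 means class 1.
        return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return score


def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def accuracy(cm):
    tp, fp, tn, fn = cm
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k, fit, threshold):
    """Pool out-of-fold scores over all folds, then report (TP, FP, TN, FN) and AUC."""
    scores = [0.0] * len(y)
    for train, test in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = model(X[i])
    preds = [int(s >= threshold) for s in scores]
    return confusion_matrix(y, preds), roc_auc(y, scores)
```

For instance, `cross_validate(X, y, 10, fit_linear_svm, threshold=0.0)` would run 10-fold cross-validation of the linear SVM, and passing `k=len(y)` instead gives LOOCV. For `fit_knn`, a threshold of 0.5 turns the neighbor fractions into a majority vote (ties going to class 1). Pooling out-of-fold scores into a single ROC curve is only one convention; averaging per-fold AUCs is another.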
Appears in Collections:	Computational Sciences - Master's theses; Master's Theses

Files in This Item:

File	Description	Size	Format
Makkeyah thesis - final for library.pdf		1.9 MB	Adobe PDF
