Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/3682
Full metadata record
DC Field | Value | Language
dc.contributor.author | Patel, Nakul | -
dc.date.accessioned | 2021-05-27T19:34:01Z | -
dc.date.available | 2021-05-27T19:34:01Z | -
dc.date.issued | 2020-04-15 | -
dc.identifier.uri | https://zone.biblio.laurentian.ca/handle/10219/3682 | -
dc.description.abstract | Abnormal cell growth with the potential to spread to other parts of the human body can occur for multiple reasons, such as changes in the activity of DNA segments. Altered DNA methylation is known to be an important factor in cancer development, changing DNA activity by preventing some of the normal activities of DNA. Feature selection and feature extraction are used to reduce the dimensionality of high-dimensional datasets and to filter the most useful features for predicting gene expression in a cancer. A number of feature extraction methods have been used in the literature for selecting the most useful features. In this study, Semi-orthogonal Non-Negative Factorization (SONMF) was studied and tested on four microarray cancer datasets for feature extraction and compared with FFT features, Symmetry of Methylation Density features, Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF). Five different classifiers, namely Naïve Bayes, Support Vector Machine (SVM), K-nearest Neighbour (KNN), Random Forest and Neural Network, were used to predict the gene expression of the four cancer microarray datasets. The experiments show that for the colon cancer dataset, SONMF and NMF with the Naïve Bayes classifier performed best compared with the other feature extraction methods. One-way analysis of variance showed that the accuracy, specificity and sensitivity of SONMF were significantly higher than those of PCA. In terms of the highest accuracy, however, the SONMF and NMF feature extraction methods gave the best performance with the Naïve Bayes classifier for the colon cancer dataset. For the oral cancer dataset, the highest accuracy was observed with SONMF and the Neural Network classifier. For the leukemia dataset, the highest accuracy of 100% was observed with NMF, SONMF and PCA with the Neural Network and SVM classifiers. However, comparing the medians for the best classifier shows that the medians of SONMF and NMF were slightly higher than that of PCA. For the prostate cancer dataset, SONMF with the Naïve Bayes classifier gave the highest accuracy. However, the classification accuracy was not significantly different from that of PCA and NMF. Overall, the results of SONMF were more consistent than those of the other feature extraction methods. | en_US
dc.language.iso | en | en_US
dc.subject | DNA Methylation | en_US
dc.subject | feature selection | en_US
dc.subject | feature extraction | en_US
dc.subject | non-negative matrix factorization | en_US
dc.subject | Semi-orthogonal non-negative matrix factorization | en_US
dc.subject | principal component analysis | en_US
dc.subject | Enhanced Fourier transform | en_US
dc.subject | symmetry percentage error | en_US
dc.title | Semi-orthogonal non-negative factorization as a feature extraction method to improve prediction accuracy of microarray cancer data | en_US
dc.type | Thesis | en_US
dc.description.degree | Master of Science (MSc) in Computational Sciences | en_US
dc.publisher.grantor | Laurentian University of Sudbury | en_US
Appears in Collections: Computational Sciences - Master's theses
Master's Theses

Files in This Item:
File | Description | Size | Format
Nakul Patel - Thesis FINAL - 06May2020.pdf | | 2.81 MB | Adobe PDF

Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.