Please use this address to cite this document: https://zone.biblio.laurentian.ca/handle/10219/3682
Title: Semi-orthogonal non-negative factorization as a feature extraction method to improve prediction accuracy of microarray cancer data
Authors: Patel, Nakul
Keywords: DNA methylation; feature selection; feature extraction; non-negative matrix factorization; semi-orthogonal non-negative matrix factorization; principal component analysis; enhanced Fourier transform; symmetry percentage error
Date published: 15-Apr-2020
Abstract: Abnormal cell growth with the potential to spread to other parts of the human body can have multiple causes, such as changes in the activity of DNA segments. Altered DNA methylation is known to be an important factor in cancer development, changing DNA activity by preventing some of the normal activities of DNA. Feature selection and feature extraction are used to reduce the dimensionality of high-dimensional datasets and to filter the most useful features for predicting gene expression in cancer. A number of feature extraction methods have been used in the literature for selecting the most useful features. In this study, Semi-orthogonal Non-Negative Factorization (SONMF) was studied and tested on four microarray cancer datasets for feature extraction and compared with FFT features, Symmetry of Methylation Density features, Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF). Five classifiers, namely Naïve Bayes, Support Vector Machine (SVM), K-nearest Neighbour (KNN), Random Forest and Neural Network, were used to predict the gene expression of the four cancer microarray datasets. For the colon cancer dataset, SONMF and NMF with the Naïve Bayes classifier gave the highest accuracy of all the feature extraction methods, and a one-way analysis of variance showed that the accuracy, specificity and sensitivity of SONMF were significantly higher than those of PCA. For the oral cancer dataset, the highest accuracy was observed with SONMF and the Neural Network classifier. For the leukemia dataset, the highest accuracy of 100% was observed with NMF, SONMF and PCA combined with the Neural Network and SVM classifiers; however, comparing the medians for the best classifier shows that the medians of SONMF and NMF were slightly higher than that of PCA. For the prostate cancer dataset, SONMF with the Naïve Bayes classifier gave the highest accuracy, although the classification accuracy was not significantly different from that of PCA and NMF. Overall, the results of SONMF were more consistent than those of the other feature extraction methods.
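To make the idea of factorization-based feature extraction concrete, here is a minimal sketch of plain NMF using the standard Lee-Seung multiplicative updates, written in pure Python. It is not the thesis's implementation and it does not include the semi-orthogonality constraint of SONMF. The function names (`matmul`, `transpose`, `frob_error`, `nmf`), the random toy matrix and the choice of `k=2` are all assumptions made for illustration. A samples-by-features matrix X is approximated as W times H, and each row of W can then serve as a low-dimensional, non-negative feature vector for a classifier.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def frob_error(X, W, H):
    """Squared Frobenius norm of X - WH (the reconstruction error)."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF: X (n x m) ~ W (n x k) H (k x m), all entries non-negative."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random initialisation (illustrative choice).
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frob_error(X, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frob_error(X, W, H))
    return W, H, errors

# Toy stand-in for a microarray matrix: 6 samples x 8 non-negative features.
rng = random.Random(1)
X = [[rng.random() for _ in range(8)] for _ in range(6)]
W, H, errors = nmf(X, k=2)
print(len(W), len(W[0]))       # shape of the extracted feature matrix
print(errors[0], errors[-1])   # reconstruction error before and after fitting
```

In a pipeline like the one the abstract describes, the rows of W would replace the original high-dimensional measurements as classifier inputs. A library implementation of NMF would normally be preferable to this hand-written loop for real data.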
URI: https://zone.biblio.laurentian.ca/handle/10219/3682
Appears in collections: Computational Sciences - Master's theses
Master's Theses

Files in this item:
File: Nakul Patel - Thesis FINAL - 06May2020.pdf | Size: 2.81 MB | Format: Adobe PDF


All documents in DSpace are protected by copyright, with all rights reserved.