Title: Semi-orthogonal non-negative factorization as a feature extraction method to improve prediction accuracy of microarray cancer data
Authors: Patel, Nakul
Keywords: DNA methylation; feature selection; feature extraction; non-negative matrix factorization; semi-orthogonal non-negative matrix factorization; principal component analysis; enhanced Fourier transform; symmetry percentage error
Issue Date: 15-Apr-2020
Abstract: Abnormal cell growth with the potential to spread to other parts of the human body can occur for multiple reasons, such as changes in the activity of DNA segments. Altered DNA methylation is known to be an important factor in cancer development, changing DNA activity by preventing some of the normal activities of DNA. Feature selection and feature extraction are used to reduce the dimensionality of high-dimensional datasets and to filter the most useful features for predicting gene expression in cancer. A number of feature extraction methods have been used in the literature to obtain the most useful features. In this study, Semi-orthogonal Non-negative Matrix Factorization (SONMF) was tested on four microarray cancer datasets as a feature extraction method and compared with FFT features, Symmetry of Methylation Density features, Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF). Five different classifiers, namely Naïve Bayes, Support Vector Machine (SVM), K-nearest Neighbour (KNN), Random Forest and Neural Network, were used to predict the gene expression of the four cancer microarray datasets. The experiments show that, for the colon cancer dataset, SONMF and NMF with the Naïve Bayes classifier performed best among the feature extraction methods. One-way analysis of variance showed that the accuracy, specificity and sensitivity of SONMF were significantly higher than those of PCA. In terms of the highest accuracy, however, SONMF and NMF both gave the best performance with the Naïve Bayes classifier on the colon cancer dataset. For the oral cancer dataset, the highest accuracy was observed with SONMF and the Neural Network classifier. For the leukemia dataset, the highest accuracy of 100% was observed with NMF, SONMF and PCA using the Neural Network and SVM classifiers.
However, comparing the medians for the best classifier shows that the medians of SONMF and NMF were slightly higher than that of PCA. For the prostate cancer dataset, SONMF with the Naïve Bayes classifier gave the highest accuracy, although the classification accuracy was not significantly different from that of PCA and NMF. Overall, the results of SONMF were more consistent than those of the other feature extraction methods.
Appears in Collections: Computational Sciences - Master's theses

Files in This Item:
Nakul Patel - Thesis FINAL - 06May2020.pdf (2.81 MB, Adobe PDF)

Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.