Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/3407
Title: Prediction of cancer for microarray and DNA methylation data with Non-Negative Matrix Factorization and machine learning methods
Authors: Patel, Parth
Keywords: Microarray datasets;Feature Extraction;Feature Selection;Principal Component Analysis;Non-negative Matrix Factorization;K-means;Methylation;Random forest;Support Vector Machine;KNN;ANN
Issue Date: 21-Jun-2019
Abstract: Over the past few years, microarray technology has spread widely in the study of biological patterns, particularly those pertaining to diseases such as leukaemia and prostate cancer. Numerous mathematical techniques have been applied to microarray data to group it into clusters that show similar expression patterns. One hurdle to understanding such datasets is their very large size: to study them efficiently and effectively, their dimensionality must be reduced considerably. In this thesis, we exploit the matrix-like structure of microarray data and use Non-Negative Matrix Factorisation (NMF), a popular technique used for dimensionality reduction, primarily for biological data. The approach not only transforms the data into a more readable, lower-dimensional form but also allows the reduced data to be clustered, from which accuracy measures can be obtained. We applied different NMF algorithms to five datasets to obtain matrices with a reduced number of features. Of the five, two are methylation datasets and the other three are ordinary cancer microarray datasets. Additional results, such as heat-maps of the matrices, were also obtained. We also compared the accuracy of NMF with that of the more conventional PCA algorithm at different dimensions; NMF showed higher accuracy across all three datasets. Four classifiers (Random Forest, SVM, KNN and ANN) were also used to check classification accuracy after applying NMF, in comparison with PCA.
URI: https://zone.biblio.laurentian.ca/handle/10219/3407
Appears in Collections: Master's Theses

Files in This Item:
File: Parth Patel-THESIS-FINAL.pdf
Size: 3.61 MB
Format: Adobe PDF


Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.