Full metadata record
DC Field | Value | Language
dc.contributor.author | Patel, Parth | -
dc.description.abstract | Over the past few years, microarray technology has been widely adopted for studying biological patterns, particularly those associated with diseases such as leukaemia and prostate cancer. Numerous mathematical techniques have been applied to microarray data to group it into clusters that show similar expression patterns. One hurdle in understanding such datasets is that they are very large, so studying them efficiently and effectively requires reducing their dimensionality substantially. In this thesis, we exploit the matrix-like structure of microarray data and use Non-Negative Matrix Factorisation (NMF), a popular dimensionality-reduction technique used primarily for biological data. The approach not only reduces the dimensions of the data, making it easier to read, but also allows the reduced data to be clustered so that accuracy measures can be obtained. We applied different NMF algorithms to five datasets to obtain matrices with a reduced number of features. Two of the five are methylation datasets, while the other three are ordinary cancer microarray datasets. Other results, such as heat maps of the matrices, were also obtained. We also compared the accuracy of NMF with that of the more conventional PCA algorithm for different dimensions; NMF showed higher accuracy across all three datasets. Four classifiers (Random Forest, SVM, KNN and ANN) were used to check classification accuracy after applying NMF, and the results were compared with those obtained with PCA. | en_US
dc.subject | Microarray datasets | en_US
dc.subject | Feature Extraction | en_US
dc.subject | Feature Selection | en_US
dc.subject | Principal Component Analysis | en_US
dc.subject | Non-negative Matrix Factorization | en_US
dc.subject | Random forest | en_US
dc.subject | Support Vector Machine | en_US
dc.title | Prediction of cancer for microarray and DNA methylation data with Non-Negative Matrix Factorization and machine learning methods | en_US
dc.description.degree | Master of Science (MSc) in Computational Sciences | en_US
dc.publisher.grantor | Laurentian University of Sudbury | en_US
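
The abstract describes reducing a samples-by-features matrix with NMF and comparing the result with PCA. The sketch below illustrates that idea only; it is not the thesis code. It assumes the Lee-Seung multiplicative-update variant of NMF (the record does not say which NMF algorithms were used), computes PCA via an SVD of the centred data, and runs on random toy data. The function names `nmf` and `pca`, the rank of 3, and the 20 x 50 matrix size are all hypothetical choices made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative V (samples x features) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm
    objective; this is one common NMF variant, assumed here.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = V.shape
    # Random non-negative starting factors.
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_features)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    Xc = X - X.mean(axis=0)  # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Toy stand-in for an expression matrix: 20 samples x 50 features.
rng = np.random.default_rng(42)
V = rng.random((20, 50))

W, H = nmf(V, rank=3)   # W holds the reduced NMF features per sample
Z = pca(V, 3)           # Z holds the reduced PCA features per sample

print("NMF features:", W.shape, "PCA features:", Z.shape)
print("NMF reconstruction error:", np.linalg.norm(V - W @ H))
```

In a workflow like the one the abstract outlines, the rows of `W` (or `Z`) would then be passed to a classifier in place of the original high-dimensional samples. Unlike the PCA projection, the NMF factors are non-negative by construction.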
Appears in Collections: Computational Sciences - Master's theses
Master's Theses

Files in This Item:
File | Description | Size | Format
Parth Patel-THESIS-FINAL.pdf | | 3.61 MB | Adobe PDF

Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.