Title: Improved prediction of gene expression of epigenomics data of lung cancer using machine learning and deep learning models
Authors: Shi, ZhengXin
Keywords: Epigenomics;deep learning;histone modification;DNA methylation;RNA-sequencing;feature selection;classification
Issue Date: 26-Feb-2020
Abstract: Epigenetics is the study of the biological mechanisms that switch genes on and off. Epigenetic alterations are deeply involved in the changes in gene expression seen in various diseases, including cancers. Machine learning is frequently used in cancer diagnosis and detection. In this research, four types of data are used to predict lung cancer: DNA methylation data, histone data, human genome data, and RNA-Seq data. Four feature selection methods were implemented: ReliefF, Gain Ratio (GR), Principal Component Analysis (PCA), and Correlation-based Feature Selection (CFS). Seven classifiers were implemented: Random Forest (RF), Support Vector Machine (SVM) with a Gaussian kernel, SVM with a linear kernel, Logistic Regression (LR), Naive Bayes (NB), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN). The data sets were processed using custom R scripts. Feature selection and classification were carried out with Weka 3 and Python. With the help of machine learning and deep learning methods, we were able to improve on the accuracy and area under the curve (AUC) of lung cancer prediction reported in an earlier published work. The CNN model outperformed the other six classification methods.
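The abstract reports results in terms of accuracy and AUC. As a minimal sketch of what those two metrics measure, and not the thesis's own evaluation code, the following pure-Python functions compute them for a binary classifier. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration; none of them come from the thesis.

```python
def accuracy(labels, predictions):
    """Fraction of samples whose predicted class equals the true class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Uses the pairwise interpretation of AUC: the probability that a randomly
    chosen positive sample receives a higher score than a randomly chosen
    negative sample, with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the thesis.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print("accuracy:", accuracy(toy_labels, toy_predictions))
print("AUC:", roc_auc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank positive samples against negative ones, which is one reason the two are often reported together.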
Appears in Collections: Computational Sciences - Master's theses; Master's Theses

Files in This Item:
File: Zhengxin Shi - Thesis FINAL.pdf
Size: 1.56 MB
Format: Adobe PDF

Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.