IIIT Hyderabad Publications
Integrative Analysis of gene expression and DNA methylation in Papillary Renal Cell Carcinoma

Author: Noor Pratap Singh
Date: 2019-07-12
Report no: IIIT/TH/2019/76
Advisor: Vinod P K

Abstract

Papillary renal cell carcinoma (PRCC) is the second most common subtype of renal cell carcinoma, accounting for 10-15% of cases. PRCC is a heterogeneous disease with variations in disease progression and clinical outcomes. The advent of next generation sequencing (NGS) techniques has led to the creation of multi-omics data of PRCC. In this thesis, multi-omics data involving gene expression and DNA methylation of PRCC is analyzed to identify biomarkers, develop predictive models for tumor stage prediction, and understand the relationship between gene expression and DNA methylation. We first developed a machine learning pipeline incorporating different feature selection algorithms and classification models for tumor stage prediction using an RNA sequencing (RNA-Seq) dataset. To get a reliable feature set, we extracted features from different partitions of the training dataset and aggregated them into feature sets for classification. We evaluated the performance of the different algorithms on the basis of 10-fold cross validation (CV) and an independent test dataset. 10-fold CV was also performed on a microarray dataset of PRCC. Random forest based feature selection yielded the smallest number of features (104) and the best performance on the test dataset, with an area under the Precision Recall curve (PR AUC) of 0.80, a Matthews Correlation Coefficient (MCC) of 0.71 and an accuracy of 88% with the Shrunken Centroid classifier.
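The MCC reported above is computed from the four cells of a binary confusion matrix. The following is a minimal sketch of that standard definition, not the thesis code; the function names and the early/late label encoding (0 = early stage, 1 = late stage) are assumptions made for illustration, and the example labels are invented.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem (labels are hypothetical)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Invented labels: 0 = early stage, 1 = late stage (assumed encoding).
    y_true = [0, 0, 0, 0, 1, 1, 1, 0]
    y_pred = [0, 0, 1, 0, 1, 1, 0, 0]
    print(matthews_corrcoef(*confusion_counts(y_true, y_pred)))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less misleading when one class (here, presumably the late stage) is much rarer than the other.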
We identified 80 genes that are consistently altered between stages by different feature selection algorithms. The extracted features are related to the cellular components centromere, kinetochore and spindle, and to the biological process mitotic cell cycle. The interactions between features also reveal the systems-level alterations driving the progression from early stage to late stage of PRCC. Potential mechanisms for an increase in chromosome instability in the late stage of PRCC are proposed. We then investigated the methylation patterns of PRCC and their relationship with gene expression using the Illumina methylation 450K array. We found that both the promoter and body of tumor suppressor genes, microRNAs, and gene clusters and families including cadherins, protocadherins, claudins and collagens are hypermethylated in PRCC. Hypomethylated genes in PRCC are associated with immune function. The gene expression of several novel candidate genes, including the interleukin receptor IL17RE and the immune checkpoint genes HHLA2, SIRPA and HAVCR2, shows significant correlation with DNA methylation. A significant correlation within the tumor samples was also observed for the genes RRM2, NCAPG and SLC7A11, which we found to be potential biomarkers for stage progression in the first part of our study. Further, we developed predictive models based on single- and multi-omics data to distinguish early and late stages of PRCC. A comparative study of predictive models, data integration techniques and representations of methylation data was performed. Multiple kernel learning (with Group Lasso) using both gene expression and DNA methylation features showed the overall best performance on test data, with MCC and PR AUC of 0.77 and 0.82, respectively, across different representations of methylation data and feature sets. Our study not only generates insights into gene regulation but also develops models that will have diagnostic applications.
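The abstract does not state which correlation measure was used to relate DNA methylation to gene expression. As an illustration only, the sketch below assumes a Pearson correlation between per-sample methylation beta values and expression values for one gene; the function name and all numbers are hypothetical and are not taken from the thesis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: correlation undefined, reported as 0 here
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented per-sample values for a single gene.
    beta_values = [0.1, 0.2, 0.3, 0.4]   # methylation level (0 to 1)
    expression = [8.0, 6.0, 4.0, 2.0]    # e.g. log-scale expression
    print(pearson_r(beta_values, expression))  # strongly negative
```

A negative coefficient of this kind would match the usual picture of promoter hypermethylation accompanying reduced expression; a rank-based measure such as Spearman correlation would be an equally plausible choice.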
Full thesis: pdf
Centre for Computational Natural Sciences and Bioinformatics