IIIT Hyderabad Publications
Understanding the differences in preliminary stages of gene expression regulation in healthy and unhealthy mammalian tissues, and identification of potential markers
Author: Neelima Ch
Report no: IIIT/TH/2016/60
Advisor: Kshitish Acharya, Abhijit Mitra
Background
Determining gene and transcript expression patterns is crucial for understanding cellular and molecular biology. A large amount of gene expression data is currently available, and many tools and databases have been developed to predict or extract gene expression patterns. However, these resources have many limitations, so there is a need for a new resource that can help establish expression patterns. Gene expression patterns can also assist in the identification of biomarkers, which in turn is an important step towards better diagnostics and prognostics. Microarray experiments have been used extensively to identify genes associated with various physiological conditions, but the available data could be put to better use. For example, gene-level expression data can be used to identify transcript isoforms associated with a disease or a physiological condition. Establishing the expression of transcript isoforms in various physiological and disease conditions is important, because most human genes undergo alternative splicing and produce an enormous number of transcript isoforms, and these isoforms are also associated with diseases. Indeed, establishing which alternatively spliced isoforms are present in different tissues and conditions is essential for understanding many fundamental molecular events, such as the mechanisms of alternative splicing and its regulation, which are still far from clearly understood. Non-obstructive azoospermia, a common male infertility disorder, can serve as a case study for reanalyzing the corresponding gene expression data. Microarray studies have enabled the identification of genes differentially regulated in this disease. However, no meta-analysis has been attempted to identify genes strongly associated with the disease, which could then be used as potential biomarkers.
It should be possible to identify transcript isoforms as potential biomarkers for non-obstructive azoospermia.

Objectives
The current study aimed at:
- compiling existing gene expression data for one tissue, the mammalian testis;
- comparing the utility of existing gene expression databases with that of a new testis-specific database developed from the compiled data;
- establishing expression profiles at the transcript isoform level via new software, and validating the results;
- using the new software and RNA-sequencing experiments to establish the transcriptome for non-obstructive azoospermia, and identifying potential biomarkers at the transcript isoform level; and
- analyzing the data to explore the factors associated with alternative splicing in the context of the mammalian testis.

Results
Large-scale gene expression data corresponding to various testicular physiological conditions, and to the normal physiological condition of other tissues, were curated, and gene expression datasets were accumulated or derived. These datasets were then used to derive genes associated with different physiological conditions. The database developed from these data was found to be better and more reliable than most other gene expression databases. This may be due both to the wealth of data and to the new features used, including the meta-analysis method employed in developing the database. A web server, TIPMaP, was developed to predict expression profiles of transcript isoforms from the original (existing) microarray gene expression data. These data were reanalyzed using good-quality probes, and the transcript expression profiles were determined. The expression patterns predicted by the tool were validated for a few transcripts using RT-PCR and RNA-sequencing experiments.
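The idea of deriving transcript-level calls from gene-level microarray data can be illustrated with a minimal sketch. TIPMaP's actual probe-quality criteria, summary statistic and thresholds are not described in this abstract, so everything below is an assumption: a "good-quality" probe is taken to be one whose sequence matches exactly one transcript, each transcript is summarized per hybridization by the median intensity of its specific probes, and it is called expressed above a hypothetical threshold. The helper names (`isoform_specific_probes`, `transcript_calls`, `consistency`, `percent_agreement`) and the toy values are illustrative only, not the thesis data.

```python
from statistics import median

def isoform_specific_probes(probe_map):
    """Group probes by transcript, keeping only probes that match exactly one
    transcript (an assumed, hypothetical 'good-quality' filter)."""
    specific = {}
    for probe, transcripts in probe_map.items():
        if len(transcripts) == 1:
            (tx,) = transcripts
            specific.setdefault(tx, []).append(probe)
    return specific

def transcript_calls(arrays, specific, threshold):
    """For each hybridization (a dict of probe -> intensity), call a transcript
    expressed when the median of its isoform-specific probes exceeds threshold."""
    return {
        tx: [median(array[p] for p in probes) > threshold for array in arrays]
        for tx, probes in specific.items()
    }

def consistency(calls):
    """Fraction of hybridizations that agree with the majority call."""
    n_expressed = sum(calls)
    return max(n_expressed, len(calls) - n_expressed) / len(calls)

def percent_agreement(predicted, observed):
    """Percentage of transcripts whose predicted call matches the
    experimental observation (e.g. RT-PCR or RNA-seq)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must be the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)

# Toy data: probe -> transcripts it matches, and three hybridizations
probe_map = {"p1": {"TX-A"}, "p2": {"TX-A"}, "p3": {"TX-A", "TX-B"}, "p4": {"TX-B"}}
arrays = [
    {"p1": 900, "p2": 850, "p3": 400, "p4": 50},
    {"p1": 700, "p2": 950, "p3": 300, "p4": 80},
    {"p1": 120, "p2": 100, "p3": 500, "p4": 60},
]
specific = isoform_specific_probes(probe_map)  # p3 is dropped: it matches two transcripts
calls = transcript_calls(arrays, specific, threshold=200)
print(calls)                                   # {'TX-A': [True, True, False], 'TX-B': [False, False, False]}
print(round(consistency(calls["TX-A"]), 3))    # 0.667
```

The consistency fraction is one simple way to express how reproducible a call is across hybridizations; a transcript called expressed in only two of three arrays gets a lower score than one called identically in all of them.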
There was a high percentage agreement between the experimental observations and the web server results, and the extent of this agreement depended on the consistency of the results across the microarray hybridizations. Thus, the tool aids in establishing expression patterns of specific transcripts and provides a quantitative measure of the reproducibility of these patterns. Transcripts differentially regulated/expressed in non-obstructive azoospermia were identified by reanalyzing an already published gene expression dataset using TIPMaP and by RNA-sequencing. The results helped to identify potential biomarkers for non-obstructive azoospermia. An in-depth analysis of alternatively spliced transcript isoforms helped to catalogue the types of splicing events that occur in mammalian testis-associated genes. Additionally, a set of factors that might promote alternative splicing in testis tissue was identified. These factors include a few testis-specific splicing factors, over- and under-represented splicing regulatory motifs, and RNA secondary structure stabilities at the splice site junctions.

Conclusions
Most of the existing gene expression data for mammalian testis tissue was compiled through a laborious biocuration process. The new gene expression database developed from these curated data was found to be superior to other available databases in many respects. A new computational tool was developed to derive expression profiles of transcript isoforms using refined microarray probes and the corresponding gene-level expression data. The expression patterns predicted by the tool were validated experimentally, and the results were convincing. The transcriptome for non-obstructive azoospermia was established for the first time using RNA-sequencing, and a few potential biomarkers were identified. Novel insights were obtained into the factors that promote alternative splicing in mammalian testis tissue.
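Over- and under-representation of splicing regulatory motifs, one of the factors listed in the results, can be quantified in several ways; the abstract does not say which method was used. The sketch below assumes one simple option: the log2 ratio of a motif's smoothed frequency in a target set (for example, sequences flanking alternatively spliced exons in testis genes) against a background set. The function names, the pseudocount and the toy sequences are hypothetical; a real analysis would also need a significance test, for instance against shuffled sequences.

```python
import math

def count_motif(seqs, motif):
    """Count overlapping occurrences of motif across all sequences."""
    k = len(motif)
    return sum(
        s[i:i + k] == motif
        for s in seqs
        for i in range(len(s) - k + 1)
    )

def n_windows(seqs, k):
    """Number of length-k windows available across all sequences."""
    return sum(max(len(s) - k + 1, 0) for s in seqs)

def log2_enrichment(motif, target, background, pseudocount=1.0):
    """log2 ratio of smoothed motif frequency in target vs background:
    > 0 suggests over-representation, < 0 under-representation."""
    motif = motif.upper()
    target = [s.upper() for s in target]
    background = [s.upper() for s in background]
    k = len(motif)
    t_freq = (count_motif(target, motif) + pseudocount) / (n_windows(target, k) + pseudocount)
    b_freq = (count_motif(background, motif) + pseudocount) / (n_windows(background, k) + pseudocount)
    return math.log2(t_freq / b_freq)

# Toy sequences standing in for splice-site flanks (not thesis data)
testis_flanks = ["GGGAGGGA"]
background_flanks = ["ACACACAC"]
print(round(log2_enrichment("GGGA", testis_flanks, background_flanks), 3))  # 1.585
print(round(log2_enrichment("ACAC", testis_flanks, background_flanks), 3))  # -2.0
```

The pseudocount keeps the ratio finite when a motif is absent from one set; its value is an arbitrary choice here and affects the magnitude of the score for rare motifs.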
Centre for Computational Natural Sciences and Bioinformatics
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.