IIIT Hyderabad Publications
Copy Number Variation detection using Next Generation Sequencing data

Author: Sriharsha Vogeti
Date: 2018-07-13
Report no: IIIT/TH/2018/33
Advisor: Nita Parekh

Abstract

Copy number variations (CNVs), a type of structural variation, contribute most to the variability between human genomes. CNVs play an important role in human evolution, adaptability to environmental factors and disease susceptibility. The development of next generation sequencing (NGS) technologies has enabled researchers to identify CNVs at unprecedented resolution. Several methods have been proposed to identify CNVs by leveraging the advantages offered by NGS data; however, there is still scope for improvement. We have developed a depth-of-coverage based CNV detection pipeline called integrated platform for Copy number variations Detection, Annotation and Visualisation (iCopyDAV), which lets users choose among multiple methods at every stage of the pipeline. The pipeline provides an end-to-end solution, from pre-treatment of raw alignment data to variant calling, annotation and visualization. A comprehensive comparative analysis of the pipeline revealed that the total variation minimization (TVM) method accurately detected small CNVs (<5 kbp) at >30x coverage with very low breakpoint errors. TVM achieved a precision of 1 at sequencing depths >20x and had the best F-score at all depths except 50x. This high precision makes iCopyDAV a reliable tool for detecting CNVs from NGS data. Using the TVM approach in iCopyDAV, we performed a population-level study of chromosome 11 in five South Asian, one European, one East Asian and one African population, considering twenty samples from each population. Copy number variant regions (CNVRs) constructed for each population were used in multivariate analysis to study population stratification. Structural features such as tandem repeats and segmental duplications spanning the CNVRs were analyzed.
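The abstract does not spell out how iCopyDAV formulates TVM. As a rough illustration only, the sketch below assumes read depth has already been counted in fixed-size bins and applies the standard 1-D total variation denoising objective, minimize 0.5·Σ(y_i − x_i)² + λ·Σ|x_{i+1} − x_i|, solved approximately by projected gradient on its dual. The denoised profile is piecewise constant, so each flat segment can be compared with the genome-wide median. The function names (`tv_denoise_1d`, `call_cnvs`), λ = 15 and the 0.75/1.25 ratio thresholds are hypothetical choices for this sketch, not the thesis's parameters.

```python
import numpy as np


def _dt(z):
    """Adjoint of the forward-difference operator D, where D x = np.diff(x)."""
    return -np.diff(np.concatenate(([0.0], z, [0.0])))


def tv_denoise_1d(y, lam, n_iter=5000):
    """Approximate 1-D total variation denoising (hypothetical helper):
    minimize 0.5 * sum((y - x)**2) + lam * sum(|x[i+1] - x[i]|).
    Projected gradient on the dual variable z (|z_i| <= lam) with step 1/4,
    since the largest eigenvalue of D D^T is below 4. Runs a fixed number of
    iterations, so the result is an approximation of the exact minimizer.
    """
    y = np.asarray(y, dtype=float)
    z = np.zeros(len(y) - 1)            # one dual value per adjacent bin pair
    for _ in range(n_iter):
        x = y - _dt(z)                  # primal estimate from current dual
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)
    return y - _dt(z)


def call_cnvs(depth, lam=15.0, loss=0.75, gain=1.25):
    """Call deletions/duplications from binned read depth (hypothetical).
    Assumes most bins are copy-neutral, so median depth approximates the
    diploid baseline. Returns (start_bin, end_bin_exclusive, "DEL"/"DUP").
    """
    x = tv_denoise_1d(depth, lam)
    ratio = x / np.median(x)            # copy ratio relative to baseline
    state = np.where(ratio < loss, -1, np.where(ratio > gain, 1, 0))
    calls, i, n = [], 0, len(state)
    while i < n:
        j = i
        while j < n and state[j] == state[i]:   # extend run of equal state
            j += 1
        if state[i] != 0:
            calls.append((i, j, "DUP" if state[i] > 0 else "DEL"))
        i = j
    return calls
```

Larger values of λ merge more bins into each flat segment, suppressing noise at the cost of smoothing away short events; the λ·Σ|x_{i+1} − x_i| penalty is what lets sharp breakpoints survive, which is the property the abstract credits for TVM's low breakpoint errors.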
Functional elements such as genes, enhancers, long non-coding RNAs and miRNA binding sites spanning the CNVRs were also studied. Gene ontology and pathway enrichment analyses were performed on the genes associated with the detected CNVRs, and disease associations of the CNVRs were also analyzed. Our analysis of chromosome 11 with respect to all the aforementioned features of the detected CNVRs did not reveal any population-specific trends, suggesting the need to carry out such CNV analysis for all chromosomes.

Centre for Computational Natural Sciences and Bioinformatics
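The abstract mentions constructing CNVRs for each population and annotating features that span them, but does not state the rules used. One common convention, assumed here, is to take the union of overlapping CNV calls across samples as a region and then list every feature whose interval overlaps it. The sketch uses half-open [start, end) coordinates; the `Interval` type, the `merge_cnvrs` and `annotate` helpers and the gene names in the usage are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    start: int       # 0-based, inclusive
    end: int         # exclusive (half-open interval)
    label: str = ""


def merge_cnvrs(calls: List[Interval]) -> List[Interval]:
    """Union overlapping CNV calls (from any samples) into CNV regions.
    Intervals that merely touch (start == previous end) are kept separate."""
    merged: List[Interval] = []
    for iv in sorted(calls):
        if merged and iv.start < merged[-1].end:
            last = merged[-1]
            merged[-1] = Interval(last.start, max(last.end, iv.end), "CNVR")
        else:
            merged.append(Interval(iv.start, iv.end, "CNVR"))
    return merged


def annotate(cnvrs: List[Interval],
             features: List[Interval]) -> Dict[Interval, List[str]]:
    """Map each CNVR to the labels of features overlapping it."""
    return {
        r: [f.label for f in features if f.start < r.end and r.start < f.end]
        for r in cnvrs
    }
```

For example, calls at [100, 500) and [300, 800) from two samples merge into one region [100, 800), which would then pick up a gene at [700, 1200). A linear scan over features is enough for a sketch; a whole-chromosome study would more likely use a sorted sweep or an interval tree.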