IIIT Hyderabad Publications
Information-Theoretic Results for DNA-based Data Storage in the Shotgun Sequencing Channel with Erasures

Author: Hrishi Narayanan 2019113022
Date: 2024-06-21
Report no: IIIT/TH/2024/111
Advisor: Nita Parekh, Prasad Krishnan

Abstract

In shotgun sequencing, the input string (typically a long DNA sequence composed of nucleotide bases) is sequenced as multiple overlapping fragments of much shorter length, called reads. A recent work modelled the shotgun sequencing pipeline as a communication channel for DNA data storage and identified the capacity of this channel, assuming that the reads are noiseless substrings of the original sequence. Modern shotgun sequencers, however, also output a quality score for each base read, indicating the confidence in its identification. Bases with low quality scores can be treated as erased. Motivated by this, we consider the shotgun sequencing channel with erasures, where each symbol in any read can be independently erased with some probability δ. We identify achievable rates for this channel, using a random code construction and a decoder that uses typicality-like arguments to merge the reads. To do this, we analyse the probability of error of the decoder and establish that it vanishes as the length of the code goes to infinity, provided the rate of the code is below a bound determined by the parameters of the channel. Our achievability result subsumes the achievability result obtained in the prior work for the shotgun sequencing channel without erasures, i.e., with erasure probability δ = 0 [1]. The case of non-zero erasure probability has not previously been considered in the literature, and hence our achievability results are novel in this case.
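The channel model described above can be illustrated with a short simulation. This is only a minimal sketch under assumptions that the abstract does not state: reads have a fixed length, start positions are drawn uniformly at random without wrap-around, and an erased symbol is shown as `?`. The function name `shotgun_erasure_channel` and all its parameters are hypothetical and are not taken from the thesis.

```python
import random

ERASURE = "?"  # hypothetical erasure marker, chosen for illustration only


def shotgun_erasure_channel(x, num_reads, read_len, delta, rng):
    """Sample reads of x and erase each read symbol independently w.p. delta.

    Assumptions (not from the thesis): fixed read length, uniformly random
    start positions, and no wrap-around at the end of the sequence.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if not 0 < read_len <= len(x):
        raise ValueError("read_len must lie in (0, len(x)]")
    reads = []
    for _ in range(num_reads):
        # Pick a start so that the whole read fits inside x.
        start = rng.randrange(len(x) - read_len + 1)
        fragment = x[start:start + read_len]
        # Erase each symbol of the read independently with probability delta.
        noisy = "".join(
            ERASURE if rng.random() < delta else symbol for symbol in fragment
        )
        reads.append(noisy)
    return reads


if __name__ == "__main__":
    rng = random.Random(2024)
    sequence = "".join(rng.choice("ACGT") for _ in range(60))
    for read in shotgun_erasure_channel(sequence, 5, 12, 0.2, rng):
        print(read)
```

Setting `delta = 0` in this sketch gives reads that are noiseless substrings of the input, which corresponds to the erasure-free setting of the prior work.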
For given parameters of the problem, we compare our achievable rate numerically with an ‘interpolated’ version of the achievable rate from prior work for the δ = 0 case, and show that our result is a non-trivial improvement over such an interpolation.

Full thesis: pdf

Centre for Computational Natural Sciences and Bioinformatics