IIIT Hyderabad Publications
Machine Learning for Protein Stability Prediction Upon Mutation

Author: Yashas B. L. Samaga 20171080
Date: 2022-04-09
Report no: IIIT/TH/2022/63
Advisor: Deva U Priyakumar

Abstract

Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods for determining stability are laborious at the throughput required to scan the protein sequence space thoroughly. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation, but they were trained using small and biased datasets. These methods have previously been evaluated for symmetric consistency by testing them on hypothetical reverse mutations, and many fail the test. In this work, we propose transitive data augmentation, an evaluation of transitive consistency using our new S transitive dataset, and a new machine learning based method, the first of its kind, that incorporates both symmetric and transitive properties into the architecture. Our method, called SCONES, is an interpretable neural network that predicts small relative protein stability changes for missense mutations that do not significantly alter the structure. It estimates a residue's contribution towards protein stability (∆G) in its local structural environment, and the difference between the independently predicted contributions of the reference and mutant residues is reported as ∆∆G. We show that this self-consistent machine learning architecture is immune to many common biases in datasets, relies less on data than existing methods, is robust to overfitting, and can explain a substantial portion of the variance in experimental data.

Full thesis: pdf

Centre for Computational Natural Sciences and Bioinformatics
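
The abstract states that ∆∆G is reported as the difference between independently predicted contributions of the reference and mutant residues. The following minimal Python sketch illustrates why a predictor built that way is symmetric and transitive by construction. It is not the SCONES implementation: the scoring function `toy_contribution`, the numeric `env` vector standing in for a local structural environment, and the sign convention (mutant minus reference) are all hypothetical choices made only for illustration.

```python
import math
from typing import Callable, Sequence

Environment = Sequence[float]  # hypothetical stand-in for a local structural environment


def toy_contribution(residue: str, env: Environment) -> float:
    """Hypothetical per-residue stability contribution (a stand-in for a learned model)."""
    weight = (ord(residue) - ord("A") + 1) / 26.0
    return weight * sum(env)


def predict_ddg(
    ref: str,
    mut: str,
    env: Environment,
    contribution: Callable[[str, Environment], float] = toy_contribution,
) -> float:
    """ddG as a difference of two independent predictions.

    Assumed sign convention: mutant contribution minus reference contribution.
    """
    return contribution(mut, env) - contribution(ref, env)


env = [0.5, -1.25, 2.0]  # made-up environment features

# Symmetric consistency: the reverse mutation gives the negated value.
forward = predict_ddg("A", "G", env)
reverse = predict_ddg("G", "A", env)
print("symmetric:", forward == -reverse)

# Transitive consistency: A->G followed by G->W equals A->W (up to rounding).
direct = predict_ddg("A", "W", env)
two_step = predict_ddg("A", "G", env) + predict_ddg("G", "W", env)
print("transitive:", math.isclose(direct, two_step, abs_tol=1e-12))
```

Because each residue is scored independently within the same environment, both properties hold for any choice of `contribution`, which is the point of building them into the architecture rather than learning them from augmented data alone.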
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.