IIIT Hyderabad Publications
Machine Learning for Inverse Problems in Chemistry: Spectra to Structure

Author: Bhuvanesh Sridharan
Date: 2023-06-12
Report no: IIIT/TH/2023/86
Advisor: Deva U Priyakumar

Abstract

The discovery of new molecules and materials helps expand the horizons of novel and innovative real-life applications. In the pursuit of molecules with desired properties, chemists have traditionally relied on experimentation and, more recently, on combinatorial methods to generate new substances, often complemented by computational methods. The sheer size of the chemical space makes it infeasible to search through all possible molecules exhaustively. This calls for fast and efficient methods to navigate the chemical space to find substances with desired properties. This class of problems is referred to as inverse design problems. There is a variety of inverse problems in chemistry, encompassing subfields such as drug discovery, retrosynthesis, and structure identification. Recent developments in modern machine learning (ML) methods have shown great promise in tackling problems of this kind. This has helped in making major strides in all key phases of molecule discovery, ranging from in silico candidate generation to synthesis, with a focus on small organic molecules. Techniques such as Bayesian optimization, reinforcement learning, attention-based transformers, and deep generative models such as variational autoencoders and generative adversarial networks form a robust arsenal of methods. The first chapter of this thesis summarizes the development of deep learning to tackle a wide variety of inverse design problems in chemistry, towards the quest for synthesizing small organic compounds with purpose.

Spectroscopy, the study of how matter interacts with electromagnetic radiation of specific frequencies, has led to several monumental discoveries in science. The spectrum of a molecule is highly information-rich; while predicting a spectrum from a structure is straightforward using computational methods, the inverse relation from a spectrum to the corresponding molecular structure is still an unsolved problem. Nuclear Magnetic Resonance (NMR) spectroscopy is one such critical technique in the scientist's toolkit, used to characterise structures ranging from small organic molecules to biomolecules such as proteins and nucleic acids. In the second half of the thesis, a novel machine learning framework is proposed that attempts to solve this inverse problem by navigating the chemical space to find the correct structure given an NMR spectrum. The proposed framework uses a combination of online Monte-Carlo-Tree-Search (MCTS) and a set of offline-trained Graph Convolution Networks to build a molecule iteratively from scratch. Our method is able to predict the correct structure of the molecule ∼80% of the time in its top 3 guesses. We believe that the proposed framework is a significant step towards solving the inverse design problem of NMR spectra to molecule, which in turn would advance high-throughput molecular synthesis.

Full thesis: pdf

Centre for Computational Natural Sciences and Bioinformatics
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.