Machine learning for molecular geometry optimization and 3D structure generation

Author: Modee Rohit Laxman 20172153
Date: 2024-03-27
Report no: IIIT/TH/2024/22
Advisor: Deva U Priyakumar

Abstract

Artificial intelligence (AI) has permeated every field of science, from high-energy particle physics to biology to computational chemistry. Over the last couple of decades, machine learning (ML) applications in computational chemistry have advanced tremendously. Deep learning (DL) has achieved some success in automating feature design, predicting physicochemical properties, accelerating chemical space search, and designing new drug-like molecules. Much work is still needed on property prediction for inorganic molecules and on the search for and design of new molecules and materials with desired properties. This research aims to develop machine learning methods for 3D structure generation and molecular geometry optimization. Neural network potentials (NNPs) can accelerate both tasks, and various NNPs have been reported in the literature to be as fast as force fields and as accurate as density functional theory (DFT). However, there has been no standard comparative evaluation of these NNPs, which motivated us to carry out a benchmark study. In this study, we evaluate and compare four NNPs (ANI, PhysNet, SchNet, and BAND-NN) on their accuracy in energy prediction, transferability to larger molecules, ability to reproduce accurate potential energy surfaces (PES), and applicability to geometry optimization. 3D structure generation (molecule and material design) has two major components: a search algorithm and a property predictor. To accelerate the search in conformational space, we need a fast and accurate method to predict the energy of a given system. For this, we developed a model called DART, which predicts the energies of gallium clusters using a Topological Atomistic Descriptor (TAD). TAD is a simple and elegant descriptor that aims to encode structural information by partitioning the connectivity information according to distance cutoffs.
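
The abstract describes TAD only at a high level. As a rough illustration of what "partitioning connectivity information by distance cutoffs" can look like, the following Python sketch counts, for each atom, the neighbours that fall into successive distance shells. It is a hypothetical TAD-like descriptor, not the definition used in DART; the function name `shell_connectivity_descriptor`, the shell scheme, and the cutoff values are all assumptions.

```python
import numpy as np


def shell_connectivity_descriptor(positions, cutoffs):
    """Hypothetical TAD-like descriptor (NOT the thesis's exact TAD definition).

    positions: (N, 3) array of Cartesian coordinates (e.g. in angstrom).
    cutoffs:   increasing shell boundaries, e.g. [2.5, 3.0, 3.5, 4.0] (assumed values).

    Returns an (N, len(cutoffs)) integer array. With edges = [0] + cutoffs,
    column k counts neighbours j != i whose distance d satisfies
    edges[k] <= d < edges[k + 1].
    """
    pos = np.asarray(positions, dtype=float)
    edges = np.concatenate(([0.0], np.asarray(cutoffs, dtype=float)))

    # Pairwise distance matrix via broadcasting: dist[i, j] = |r_i - r_j|.
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-pairs from every shell

    desc = np.zeros((len(pos), len(cutoffs)), dtype=int)
    for k in range(len(cutoffs)):
        in_shell = (dist >= edges[k]) & (dist < edges[k + 1])
        desc[:, k] = in_shell.sum(axis=1)  # neighbour count of each atom in shell k
    return desc


if __name__ == "__main__":
    # Toy 4-atom square cluster with 2.6 angstrom sides (illustrative geometry).
    square = np.array([[0.0, 0.0, 0.0], [2.6, 0.0, 0.0],
                       [2.6, 2.6, 0.0], [0.0, 2.6, 0.0]])
    desc = shell_connectivity_descriptor(square, [2.5, 3.0, 3.5, 4.0])
    print(desc)
    print(desc.sum(axis=0))  # one possible cluster-level pooling
```

For the square cluster, every row is [0, 2, 0, 1]: two edge neighbours in the 2.5 to 3.0 shell and one diagonal neighbour (about 3.68 Å) in the 3.5 to 4.0 shell. Pooling the rows (here by summation, another assumption) gives a fixed-length vector that an energy regressor could consume regardless of which atom is listed first.
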
We show that DART predicts the energies of gallium clusters accurately. For the second component, the search algorithm, we developed MeGen, a reinforcement learning (RL)-based model that generates low-energy 3D isomers of gallium clusters using DART as its reward function. We show that MeGen is significantly more efficient than the conventional workflow, in both time and computational resources, at generating ground-state geometries as well as low-lying isomers. Following a similar line of thought, we developed MolOpt, a multi-agent RL-based search algorithm that performs molecular geometry optimization (MGO) by searching for low-energy structures on the potential energy surface. We show that MolOpt trained on ethane and butane can optimize larger alkanes up to octane. Compared with other optimizers, MolOpt outperforms the MDMin optimizer and performs similarly to the FIRE optimizer. We further developed an improved version, MolOpt2, which incorporates algorithmic changes and is trained on a diverse set of molecules. Owing to these algorithmic changes, MolOpt2 can perform MGO on molecules containing the elements C, H, N, and O with up to nine heavy atoms. As with MolOpt, we compare MolOpt2 with other optimizers and show that it outperforms MolOpt and the MDMin optimizer and performs similarly to the FIRE optimizer.
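
MolOpt and MolOpt2 are benchmarked against the FIRE (Fast Inertial Relaxation Engine) optimizer. For reference, the following is a minimal Python sketch of the standard FIRE scheme on a toy quadratic energy with unit masses; it is not MolOpt. The function name `fire_minimize`, the semi-implicit Euler integrator, and the parameter values (typical choices) are assumptions for illustration; in practice the gradient would come from a force field, an NNP, or DFT.

```python
import numpy as np


def fire_minimize(grad, x0, dt=0.1, dt_max=1.0, n_min=5, f_inc=1.1,
                  f_dec=0.5, alpha_start=0.1, f_alpha=0.99,
                  fmax=1e-5, max_steps=10_000):
    """Minimal FIRE sketch (unit masses). Returns (x, number_of_steps_taken)."""
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    alpha = alpha_start
    n_pos = 0  # consecutive steps with positive power P = F . v

    for step in range(max_steps):
        f = -grad(x)  # force is the negative energy gradient
        if np.max(np.abs(f)) < fmax:
            return x, step  # converged on the max force component

        p = np.dot(f.ravel(), v.ravel())
        if p > 0:
            # Moving downhill: steer velocity toward the force, keeping |v|.
            v = (1 - alpha) * v + alpha * np.linalg.norm(v) * f / np.linalg.norm(f)
            if n_pos > n_min:
                dt = min(dt * f_inc, dt_max)  # grow the time step
                alpha *= f_alpha              # reduce mixing
            n_pos += 1
        elif step > 0:
            # Moving uphill: stop, shrink the time step, reset mixing.
            v[:] = 0.0
            dt *= f_dec
            alpha = alpha_start
            n_pos = 0

        # Semi-implicit Euler step with unit masses.
        v = v + dt * f
        x = x + dt * v

    return x, max_steps


if __name__ == "__main__":
    # Toy anisotropic quadratic E = 0.5 * (x^2 + 4 y^2); gradient = curv * x.
    curv = np.array([1.0, 4.0])
    x_min, n_steps = fire_minimize(lambda x: curv * x, [1.0, 1.0], dt_max=0.5)
    print(x_min, n_steps)
```

Here `dt_max=0.5` keeps the explicit integrator stable along the stiffer direction. The central design choice in FIRE is the combination of velocity mixing toward the force direction with a full stop (v = 0) whenever the power F·v turns negative, which is what distinguishes it from a plain damped-dynamics quench such as MDMin.
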


Centre for Computational Natural Sciences and Bioinformatics

IIIT Hyderabad Publications
