IIIT Hyderabad Publications
Generative schemes for drug design with shape captioning

Author: Shikhar Shasya 20171029
Date: 2023-09-23
Report no: IIIT/TH/2023/138
Advisor: Prabhakar Bhimalapuram

Abstract

In this work, we develop three related schemes that generate novel molecules from a seed molecule using an attentive captioning network. The input is a grid representation of the seed molecule, which can be one of the following: the voxelised grid of the seed molecule, a grid reconstructed from it, or a grid sampled from it. The reconstructed grid is generated by passing the voxelised grid through a variational autoencoder, and the sampled grid is generated by conditioning the decoder phase of the variational autoencoder on pharmacophoric requirements.

The first scheme, called ‘Direct Generation’, uses an RNN that takes the voxelised grid of the seed molecule as input and outputs the SMILES string of the generated molecule. The second and third schemes use an additional variational autoencoder (VAE) to generate the grid that is passed to the same RNN in a subsequent step. The latent space of this VAE is modelled as a Riemannian manifold equipped with a metric that is learnt together with the encoder and decoder networks of the VAE; we name this model RHVAE.

The second scheme, named ‘Autoencoded Generation’, passes the seed molecule's voxelised grid representation to the encoder, whose output, together with the Riemannian metric, gives the latent space representation. This latent space representation and the pharmacophoric requirement conditions form the inputs to the decoder, which outputs the ‘reconstruction grid’.

The third scheme, named ‘Sampled Generation’, starts with a point (and its Riemannian metric) randomly sampled from the latent space distribution learnt during training. This point is evolved using either Hamiltonian Monte Carlo or Random Walk Monte Carlo, and the evolved point, together with the pharmacophoric conditions, is sent to the decoder to generate a ‘sampled grid’.
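The Random Walk Monte Carlo evolution of a latent point mentioned for Sampled Generation can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a flat (Euclidean) latent space and a standard normal target density, whereas the abstract describes a learnt Riemannian metric, and the names `log_standard_normal` and `random_walk_metropolis`, the step size, and the latent dimension are all hypothetical choices made for illustration.

```python
import math
import random


def log_standard_normal(z):
    """Unnormalised log-density of a standard normal (assumed stand-in target)."""
    return -0.5 * sum(x * x for x in z)


def random_walk_metropolis(z0, log_density, n_steps=100, step_size=0.3, seed=0):
    """Evolve a latent point with random-walk Metropolis updates.

    Returns the final point and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    z = list(z0)
    current = log_density(z)
    accepted = 0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current point.
        proposal = [x + rng.gauss(0.0, step_size) for x in z]
        candidate = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            z, current = proposal, candidate
            accepted += 1
    return z, accepted / max(n_steps, 1)


if __name__ == "__main__":
    z_start = [0.0] * 8  # hypothetical 8-dimensional latent point
    z_end, rate = random_walk_metropolis(z_start, log_standard_normal)
    print(len(z_end), rate)
```

In a pipeline like the one described, the evolved point would then be combined with the pharmacophoric conditions and passed to the decoder. Hamiltonian Monte Carlo would replace the Gaussian proposal with a trajectory driven by gradients of the log-density.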
In the subsequent step of Autoencoded Generation (Sampled Generation), the reconstructed grid (sampled grid) is sent to the RNN to obtain the SMILES of the generated molecules. Overall, we demonstrate that the autoencoder network generates meaningful ligand shapes, which can then be passed to our attentive captioning network to generate novel molecules, while requiring smaller training data sets and retaining similar performance.

Full thesis: pdf

Centre for Computational Natural Sciences and Bioinformatics
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.