I'm a final-year Integrated Master's student at IIT Kharagpur in India, where I study Computational Biology. I will graduate with Bachelor's and Master's degrees in Biotechnology and Biochemical Engineering after Spring 2025.
During my undergraduate studies, I have worked with Prof. Koel Chaudhury and Prof. Soumya De at IIT Kharagpur. In the summer of 2023, I got an amazing opportunity to work with Prof. Brian Ingalls at the University of Waterloo. In the summer of 2024, I worked at Caltech with Prof. Anima Anandkumar and Dr. Shengchao Liu on understanding Protein-DNA interactions using Cross-Attention and language modelling. Currently, I am wrapping up my Master's thesis on using discrete diffusion for controllable generation of DNA-binding protein sequences, and am excited to see where this project goes.
I'm primarily interested in Deep Learning for Protein Design, Gene Regulation and Biological Systems, Mathematical Modelling of biological networks and dynamics, and AI for Science. Most of my work has involved applying robust learning methods to biological problems in an interpretable manner. Apart from Biology, I am keenly interested in, and have worked on, Deep Learning for Vision (specifically SSL, secure ML and Generative Modelling), as well as Causal Inference and Graph Learning. More recently, I have been interested in Fourier Neural Operators, Geometric Deep Learning, State-space models and their applications in Biology.
News
[Dec 2024]
Our paper on backdoor and adversarial attacks targeting SSL was accepted at ICASSP 2025. See you in Hyderabad!
[Nov 2024]
Our work on understanding Protein-DNA interactions using Protein and Genomics Foundation models was accepted at the MLSB, FM4Science and AIDrugX workshops at NeurIPS 2024.
Current Projects
Discrete Diffusion For Tunable DNA-binding Protein Design
[Slides]
Using Discrete Diffusion to learn the distribution of DNA-binding protein sequences, and using our work on Seq2Contact to guide the diffusion process towards proteins with high affinity for specified DNA targets. Ongoing work includes optimizing CRISPR-Cas protein design, stitching together different functional domains by inpainting a DNA-binding domain between them, generating DNA sequences that bind a target protein by inverting the Seq2Contact model, and engineering protein-DNA interactions. This project is in collaboration with Prof. Riddhiman Dhar at IIT Kharagpur. The image shows ESMFold-predicted structures of sequences sampled from our trained discrete diffusion model, with as little as 25% sequence similarity to the training set and containing different DNA-binding domains (identified through Pfam scans). The protein sequences were not cherry-picked, and the structures are coloured by confidence (blue: high, red: low).
Finding Allosteric Networks in the CAP-cAMP system using Deep Learning and Molecular Dynamics
[Slides]
Using MD simulations and trajectory analysis to infer allosteric networks in the CAP-cAMP system (where the cAMP ligand binds to the CAP protein, leading to allosteric orientation changes and, finally, transcription). Current progress includes uncovering three prospective networks using AlloReverse. Future plans include using State-space models to learn the dynamics of protein-ligand interactions and finding a generalizable method for allosteric discovery. This work is in collaboration with Prof. Soumya De at IIT Kharagpur.
Publications
[Note: Highlighted papers indicate first authorship; (*) indicates equal contribution].
Cross-Attention coupled with Protein and Genomics Foundation models to understand Protein-DNA interactions speeds up inference and achieves state-of-the-art performance in predicting contacts in Protein-DNA complexes (using purely sequence data for inference).
A simple and intuitive defense strategy against frequency-based backdoor attacks that are invariant to standard SSL augmentations. Taking a leaf out of frequency-domain attacks, we also use frequency-domain patching to increase model robustness in SSL.
An extension of our previous work (see below) on using spectroscopy and Deep Learning for prediction of Idiopathic Recurrent Spontaneous Miscarriage (IRSM). This work focuses on the classification of IRSM using ATR-FTIR Spectroscopy, a non-invasive and cost-effective technique, and improves on our previous work using Raman Spectroscopy in a similar problem setting.
Learning meaningful representations of multi-cell time-series (CellModeller) simulations using causal representation learning. Current work includes extending this to real-world data and to generating proxy simulations for biological experiments.
Dadoma Sherpa,
Dhruva Abhijit Rajwade,
Imon Mitra,
Dhruba Dhar,
Sunita Sharma,
Pratip Chakraborty,
Koel Chaudhury
IEEE International Conference on Computer, Electrical & Communication Engineering, 2023
Paper / Code
Using Raman Spectroscopy and Machine Learning for prediction of Idiopathic Recurrent Spontaneous Miscarriage (IRSM). Improved this work using ATR-FTIR Spectroscopy in a follow-up study, and currently working on a multi-omics approach to the same problem.
Using different statistical tests and empirical analyses to identify risk factors associated with mortality in Hypersensitivity pneumonitis, a rare lung disease. Checking for publication bias and heterogeneity in the data, and using meta-analysis to combine results from different studies.
Miscellaneous
Lung Segmentation and Disease Classification using Deep Learning
[Code]
Used the Geneva HRCT dataset to segment lungs and classify diseases using Deep Learning. The final model uses a U-Net architecture for segmentation and a CNN for classification. The model was trained on a subset of the dataset and tested on CT scan images obtained externally through collaborations with hospitals.
Functional Network Analysis of Calcium ion pathways in beta-islets of the Pancreas
[Code]
Used Calcium ion imaging time-series data to extract functional networks in beta-islets of the pancreas. Used Voronoi-Delaunay triangulation (see image) to extract the network, and graph theory to analyze it. Code is incomplete and will be updated soon.
This website's template is taken from Jon Barron.