Internship at LLBio-IT School

Published on Mar 10, 2022

“Data is the new science. Big Data holds the answers.” - Pat Gelsinger

Introduction

From January 2022 to March 2022, I took part in a virtual merit internship with LLBio-IT School – Nyberman, where I analyzed COVID-19 data from both a statistical and a computational perspective. The work covered in-depth research into potential drug targets, molecular modelling, and docking simulations of ligands against the spike glycoprotein of coronavirus variants.

Progress

Review of Literature

The first task was a comprehensive review of existing research on potential drug targets, focusing on the spike glycoprotein of coronavirus variants. This laid the groundwork for identifying candidate ligands for further analysis. After reviewing the available resources, I selected the following candidate ligands: Paracetamol, Cefuroxime, Ampicillin, and Ceftriaxone.

Data Analysis

Using various bioinformatics tools and databases, I analyzed the spike glycoprotein sequences of different coronaviruses, including SARS-CoV-2, MERS-CoV, and HCoV-229E. This involved multiple sequence alignment, SIFT scoring, and phylogenetic analysis to identify conserved regions and potential drug binding sites. For more details on this part, refer to the Medium article.
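To make the idea of "conserved regions" concrete, here is a minimal Python sketch of how conservation can be scored once sequences have been aligned. It is not the workflow used in the internship: the function names, the scoring rule (fraction of sequences sharing the most common residue in a column), and the toy alignment are all assumptions made for illustration, and the toy strings are not real spike sequences.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each alignment column, return the fraction of sequences
    that share the most common residue (gaps count against the score)."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(aligned_seqs, threshold=1.0):
    """Return 0-based column indices whose conservation meets the threshold."""
    scores = column_conservation(aligned_seqs)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical toy alignment, made up purely for illustration.
toy_alignment = ["MKV-LT", "MKVALT", "MRV-LS"]
print(conserved_positions(toy_alignment))
```

In practice the aligned sequences would come from an alignment tool's output rather than hand-written strings, and a lower threshold (for example 0.8) can be passed to tolerate some variation.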

Molecular Modelling and Docking

I used molecular modelling to visualize the 3D structures of the spike glycoprotein across different variants, drawing on databases like SWISS-PROT and tools such as PROCHECK. Docking simulations were then performed to evaluate the binding affinities of the selected ligands, using PyMOL, AutoDock, and AutoGrid to model and analyze the interactions.

Deep Learning Applications

Although not a primary focus of my internship, I explored the potential of using AlphaFold2 for molecular modelling to predict protein structures, which is detailed in a separate article linked in the references.

Results

The project concluded with successful docking simulations that identified potential drugs based on their binding energies. These included Cefuroxime and Ampicillin, among others, showing notable interactions with the spike protein of various Coronavirus strains. Visual models and binding site images were created to illustrate these interactions.
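As a rough illustration of how docking results can be compared, the sketch below ranks ligands by binding energy, where a more negative value indicates stronger predicted binding. The function name, the ligand labels, and the energy values are all hypothetical placeholders. They are not results from this project.

```python
def rank_by_binding_energy(energies):
    """Sort (ligand, energy) pairs from most negative to least negative
    binding energy in kcal/mol, i.e. strongest predicted binding first."""
    return sorted(energies.items(), key=lambda item: item[1])


# Placeholder values for illustration only, not measured or computed results.
placeholder_energies = {
    "ligand_A": -5.1,
    "ligand_B": -7.3,
    "ligand_C": -6.0,
}

for name, energy in rank_by_binding_energy(placeholder_energies):
    print(f"{name}: {energy:.1f} kcal/mol")
```

With real data, the dictionary would be filled from the binding energies reported by the docking runs.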

Technology Learned/Used

During this internship, I deepened my proficiency in:

Python for data processing and analysis
Pandas and Matplotlib for statistical analysis
AutoDock and AutoGrid for molecular docking simulations
SWISS-PROT and PROCHECK for molecular modelling
AlphaFold2 for protein structure prediction

Conclusion

This internship was an invaluable experience that not only strengthened my skills in data analysis and molecular biology but also gave me a close look at the challenges and methodologies of drug development in response to pandemic outbreaks. The insights gained apply broadly across bioinformatics.