Learning the Language of Chemistry

How Deep Learning is Reshaping the Drug Discovery Process

  • Applied Scientist @ Microsoft
  • Ex-NVIDIA
  • Ex-USC-INI
  • DL Admirer

 

Traditional Drug Discovery Process

  • Target Validation and Lead Selection

  • Lead Optimization

  • Multiple Testing Phases

  • Final Drug

Traditional Drug Discovery Process

  • Takes 10+ years to develop a single drug!

  • Even the lower estimates put the cost of developing a single drug at roughly $2B!

Drug Discovery 101

  • Target: A compound associated with a disease; it is the desired destination for the therapy.

  • Lead: A molecule that shows remedial promise but is not yet fully optimized.

  • Drug: An optimized lead that produces the desired therapeutic effect.

Drug Discovery 101

  • Naive exploration of the compound universe leads to a combinatorial explosion: an estimated 1e20–1e60 possible molecules!

  • What we want is targeted and directed exploration.

  • What if we had a magic box that could generate novel molecules with the desired properties?

Cue Deep Learning!

NLP In Generative Chemistry

Different representations

  • But how do you represent a molecule to train these models?

  • Molecules are usually represented in two primary formats:

    • Graphs

    • SMILES 

Different representations

 

  • SMILES stands for Simplified Molecular Input Line-Entry System.

  • A textual representation obtained from a depth-first traversal of the molecular graph.

  • Chemical rules determine whether a given string is valid.
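
A minimal sketch of both points, assuming the open-source RDKit toolkit is available (RDKit is not named in the slides; any cheminformatics library works):

    from rdkit import Chem

    # Aspirin written as a SMILES string: a depth-first walk of the molecular graph.
    smiles = "CC(=O)Oc1ccccc1C(=O)O"

    # Parsing enforces the chemical validity rules; an invalid string yields None.
    mol = Chem.MolFromSmiles(smiles)
    print(mol.GetNumAtoms(), "atoms,", mol.GetNumBonds(), "bonds")  # the graph view

    # Many different traversals describe the same molecule; canonicalization picks one.
    print(Chem.MolToSmiles(mol))               # canonical SMILES
    print(Chem.MolFromSmiles("C1CC") is None)  # True: unclosed ring -> invalid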

Transformers to the Rescue

  • Transformers have revolutionized NLP tasks.

  • Championed the field of self-supervised learning.

  • SOTA in:

    • Machine Translation
    • Question-Answering
    • Summarization
    • So many other tasks!

Transformers to the Rescue

  • The famous Transformer architecture!

  • The daunting model has two broad parts:

    • Encoder: learns latent embeddings of the input.

    • Decoder: generates the output auto-regressively.
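
A minimal PyTorch sketch of the two halves; the vocabulary and dimensions here are illustrative placeholders, not any real model's:

    import torch
    import torch.nn as nn

    vocab, d_model, seq_len = 64, 128, 20   # toy sizes
    embed = nn.Embedding(vocab, d_model)
    model = nn.Transformer(d_model=d_model, nhead=8, num_encoder_layers=2,
                           num_decoder_layers=2, batch_first=True)
    head = nn.Linear(d_model, vocab)

    src = torch.randint(0, vocab, (1, seq_len))   # input token ids
    tgt = torch.randint(0, vocab, (1, seq_len))   # shifted output token ids

    # Encoder: maps the input sequence to latent embeddings.
    # Decoder: attends to those embeddings and, under a causal mask,
    # predicts each output token from the tokens before it (auto-regression).
    causal = model.generate_square_subsequent_mask(seq_len)
    logits = head(model(embed(src), embed(tgt), tgt_mask=causal))  # (1, 20, vocab)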

Transformers to the Rescue

Encoder Only

  • BERT-based models.

  • Use context bidirectionally for effective learning.

  • Proposed Masked Language Modeling (MLM).
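
A minimal sketch of MLM applied to a SMILES string, with a naive character-level split standing in for a real chemical tokenizer:

    import random

    tokens = list("CC(=O)Oc1ccccc1C(=O)O")   # naive character-level tokenization
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < 0.15:           # mask ~15% of the tokens
            inputs.append("[MASK]")
            labels.append(tok)               # the model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)              # position ignored by the loss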

Transformers to the Rescue

Decoder Only

  • GPT-based models.

  • Unidirectional, auto-regressive models.
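
A minimal sketch of greedy auto-regressive decoding, assuming a hypothetical `model(ids)` that returns next-token logits:

    import torch

    BOS, EOS = 0, 1                                   # assumed special token ids

    def generate(model, max_len=64):
        ids = torch.tensor([[BOS]])
        for _ in range(max_len):
            logits = model(ids)                       # (1, len, vocab)
            next_id = logits[0, -1].argmax()          # each step sees only the past
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
            if next_id.item() == EOS:                 # stop at end-of-sequence
                break
        return ids[0, 1:]                             # generated token ids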

Transformers to the Rescue

Encoder-Decoder

Best of both worlds!

Mega-Mol-BART

  • Developed at NVIDIA in collaboration with AstraZeneca.

  • Uses the Megatron framework (Mega)

  • Trained on ~500M molecules (Mol)

  • Uses the Bidirectional and Autoregressive Transformer architecture (BART)

Mega-Mol-BART

  • 500M molecules sampled randomly from ZINC15

  • BART

    • Developed by FAIR (Facebook AI Research).

    • Pre-trained with span masking.
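
A minimal sketch of span masking on a SMILES string: a contiguous span is replaced by a single mask token, and the decoder learns to reconstruct the original sequence:

    import random

    tokens = list("CC(=O)Oc1ccccc1C(=O)O")
    span_len = 4
    start = random.randrange(len(tokens) - span_len)

    corrupted = tokens[:start] + ["<mask>"] + tokens[start + span_len:]
    print("encoder input :", "".join(corrupted))   # corrupted SMILES
    print("decoder target:", "".join(tokens))      # the full original string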


Results

  • Usage:

    • SMILES strings sampled around the region of a single input SMILES molecule.

    • SMILES strings sampled at regularly spaced intervals between two input molecules (see the sketch below).
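
A minimal sketch of both sampling modes, written against hypothetical `encode(smiles) -> latent vector` and `decode(vector) -> smiles` helpers around the trained encoder/decoder:

    import numpy as np

    def sample_around(encode, decode, smiles, n=10, scale=0.1):
        """Decode molecules from the latent region around one input molecule."""
        z = encode(smiles)
        return [decode(z + np.random.normal(0.0, scale, z.shape)) for _ in range(n)]

    def interpolate(encode, decode, smiles_a, smiles_b, steps=10):
        """Decode regularly spaced latent points between two input molecules."""
        za, zb = encode(smiles_a), encode(smiles_b)
        return [decode(za + t * (zb - za)) for t in np.linspace(0.0, 1.0, steps)]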

 

 

  • Results:

    • Validity: 0.98 (fraction of generated strings that are chemically valid)

    • Uniqueness: 0.32 (fraction of generated molecules that are distinct)

    • Novelty: 0.36 (fraction of generated molecules not present in the training set)

 

 


Future Direction

  • Train even bigger models.

  • Expand to more downstream fine-tuning tasks.

  • BioNeMo!

 

 

QUESTIONS?

BabootaRahul

rahulbaboota
