rl403[at]cam.ac.uk
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Xu et al.
arXiv:1502.03044
Quantitative Structure Activity Relationship
“The basic assumption for all molecule based hypotheses is that similar molecules have similar activities. This principle is also called [Quantitative] Structure–Activity Relationship ([Q]SAR).”
(thanks, Wikipedia)
Quantitative
Regression rather than classification
Structure
atoms
bonds
Caffeine
carbon atom
Activity
IC50
Relationship
7.4
A QSAR model is a mapping between a chemical structure and a number.
This may also include a why as well as a what.
Cat detector
Atoms
Functional Groups
Pharmacophores
Raw pixels
Edges
Facial features
Faces
Increasing complexity
x 1000s
number of compounds per protein
0: C
1: C, N, N
2: C, C, C, C, N
8201
5
Repeat for all atoms!
Circular Fingerprints
hash
modulo-2048
1
1
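The hash-then-fold idea above can be sketched in a few lines. This is a toy illustration only: the molecular graph, the use of CRC32 as the hash, and the names `ATOMS`, `BONDS`, `stable_hash` and `circular_fingerprint` are all assumptions for the example, not the scheme used by any cheminformatics toolkit.

```python
import zlib

# Toy molecular graph (hypothetical): atom symbols plus an adjacency list.
ATOMS = ["C", "N", "C", "O"]
BONDS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
N_BITS = 2048


def stable_hash(text):
    """Deterministic 32-bit hash of a string (CRC32 is a stand-in hash)."""
    return zlib.crc32(text.encode("utf-8"))


def circular_fingerprint(atoms, bonds, radius=2, n_bits=N_BITS):
    """Set one bit per hashed atom environment, for every atom and radius."""
    bits = [0] * n_bits
    # Radius 0: each atom's identifier is the hash of its own symbol.
    ids = [stable_hash(sym) for sym in atoms]
    for _ in range(radius + 1):
        for ident in ids:
            bits[ident % n_bits] = 1  # fold the hash into a fixed-length vector
        # Grow each environment by one bond: own id plus sorted neighbour ids.
        ids = [
            stable_hash(str((ids[i], sorted(ids[j] for j in bonds[i]))))
            for i in range(len(atoms))
        ]
    return bits


fp = circular_fingerprint(ATOMS, BONDS)
```

Folding with a modulo keeps the input size fixed at 2048 regardless of molecule size, at the cost of occasional bit collisions between different environments.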
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.regularizers import WeightRegularizer
from keras.optimizers import Adam
# Multi-label classifier: 2048-bit fingerprint in, one sigmoid output per target (581)
model = Sequential([
    Dense(2000, input_dim=2048, init='he_normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(2000, init='he_normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(581, init='he_normal', activation='sigmoid')])
model.compile(optimizer=Adam(lr=0.0005), loss='binary_crossentropy')
Binary crossentropy:
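For reference, the standard binary crossentropy can be written out by hand. The sketch below is plain Python, not the Keras implementation; the function name, the clipping constant `eps` and the example values are assumptions chosen for illustration.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all outputs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Confident, correct predictions give a lower loss than confident, wrong ones.
loss_good = binary_crossentropy([1, 0, 0], [0.9, 0.1, 0.2])
loss_bad = binary_crossentropy([1, 0, 0], [0.1, 0.9, 0.8])
```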
from keras.callbacks import ModelCheckpoint

hist = model.fit(X_train, Y_train,
                 nb_epoch=250, batch_size=2048,
                 class_weight=Y_train.sum(axis=0),
                 callbacks=[ModelCheckpoint('classification.h5')],
                 validation_data=(X_valid, Y_valid))
precision: 0.6243 recall: 0.6702 f1: 0.6465 mcc: 0.6464 roc_auc: 0.9785 pr_auc: 0.6654
>>> caff_pred = pd.Series(model.predict(caff_fp)[0], index=Y.columns)
>>> caff_pred = caff_pred.sort_values(ascending=False)  # highest predicted probability first
>>> caff_pred.head()
target_id
P29275 0.809085 # Adenosine receptor a2b
P29274 0.428097 # Adenosine receptor a2a
P21397 0.295609 # Monoamine oxidase A
P27338 0.241524 # Monoamine oxidase B
P33765 0.076975 # Adenosine receptor A3
dtype: float64
Extracting information from a black box, using a black box!
0 0
1 0
2 0
3 1
4 0
..
2045 0
2046 1
2047 0
Name: paclitaxel, dtype: uint8
0 0
1 0
2 0
3 0
4 0
..
2045 0
2046 1
2047 0
Name: caffeine, dtype: uint8
0.809
0.740
+0.069 difference
Adenosine a2b
Monoamine oxidase
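The slides do not spell out how the difference above was obtained. One plausible reading is a perturbation test: switch off one fingerprint bit at a time and see how much the predicted probability drops. The sketch below illustrates that idea under this assumption; `bit_sensitivities`, `toy_predict` and `WEIGHTS` are hypothetical stand-ins, with `toy_predict` taking the place of the trained model.

```python
def bit_sensitivities(predict, fingerprint):
    """Flip each 'on' bit off in turn and record the drop in the prediction."""
    baseline = predict(fingerprint)
    deltas = {}
    for i, bit in enumerate(fingerprint):
        if bit:
            perturbed = list(fingerprint)
            perturbed[i] = 0
            deltas[i] = baseline - predict(perturbed)
    return deltas


# Hypothetical stand-in for model.predict: a fixed weighted sum of bits.
WEIGHTS = [0.1, 0.0, 0.5, 0.25]


def toy_predict(fp):
    return sum(w * b for w, b in zip(WEIGHTS, fp))


deltas = bit_sensitivities(toy_predict, [1, 0, 1, 1])
most_important_bit = max(deltas, key=deltas.get)
```

Bits with the largest drop are the ones the model leans on most for that target, which can then be traced back to the substructures that set them.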
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.regularizers import WeightRegularizer
from keras.optimizers import Adam
# Multi-task regressor: linear outputs, one value per target (710)
# sum_abs_error is defined below; r2 is a custom metric not shown on the slides
model = Sequential([
    Dense(2000, input_dim=2048, init='he_normal', activation='relu'),
    Dropout(0.5),
    BatchNormalization(),
    Dense(2000, init='he_normal', activation='relu'),
    Dropout(0.5),
    BatchNormalization(),
    Dense(710, init='he_normal', activation='linear')])
model.compile(optimizer=Adam(0.0001), loss=sum_abs_error, metrics=[r2])
Sum absolute error:
Only one problem...
NaN
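from keras import backend as K
import theano.tensor as T

# Backend shims (Theano shown; TensorFlow equivalents in the comments)
K.is_nan = T.isnan               # tf.is_nan
K.logical_not = lambda x: 1 - x  # tf.logical_not

def sum_abs_error(y_true, y_pred):
    # Targets with no measurement are NaN; mask them so they add zero loss
    valids = K.logical_not(K.is_nan(y_true))
    costs_with_nan = K.abs(y_true - y_pred)
    costs = K.switch(valids, costs_with_nan, 0)
    return K.sum(costs, axis=-1)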
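The same masking logic is easier to check outside the backend. The plain-Python sketch below mirrors the loss for a single compound; the function name and the example numbers are illustrative assumptions, not values from the talk.

```python
import math


def masked_sum_abs_error(y_true, y_pred):
    """Sum |y_true - y_pred| over targets that have a measurement (non-NaN)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if not math.isnan(t):  # skip missing labels entirely
            total += abs(t - p)
    return total


nan = float("nan")
# Three targets, the second one unmeasured for this compound.
loss = masked_sum_abs_error([7.4, nan, 5.0], [7.0, 6.2, 5.5])
```

Only the first and third targets contribute, so the prediction for the unmeasured target is never penalised.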
hist = model.fit(X_train, Y_train,
                 nb_epoch=250, batch_size=2048,
                 class_weight=Y_train.sum(axis=0),
                 callbacks=[ModelCheckpoint('classification.h5')],
                 validation_data=(X_valid, Y_valid))
train
valid
test
| | Neural Network | Random Forest |
|---|---|---|
| R2 | 0.722 | 0.528 |
| MSE | 0.546 | 0.707 |
Only for the dopamine D2 receptor
Estimated error in the data ~0.4, so this is good!
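For readers who want to reproduce the two metrics, here is a minimal sketch using their standard definitions. The helper names and the toy activity values are assumptions for illustration; they are not the data behind the table.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy measured vs. predicted activities (hypothetical values).
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 6.0, 6.5, 8.0]
example_mse = mse(y_true, y_pred)
example_r2 = r2_score(y_true, y_pred)
```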
model = Sequential([
    Dense(2000, input_dim=2048, init='he_normal', activation='relu'),
    Dropout(0.5),
    BatchNormalization(),
    Dense(2000, init='he_normal', activation='relu'),
    Dropout(0.5),
    BatchNormalization(),
    Dense(20, init='he_normal', activation='linear')])
model.compile(optimizer=Adam(0.0001), loss=sum_abs_error)
model.fit(X_train, Y_train[selected_targets])

# Reuse the trained network up to the second hidden Dense layer as a feature extractor
from keras.models import Model
new_model = Model(inputs=model.inputs[0], outputs=model.layers[-4].output)
new_model.compile('sgd', 'mse')  # loss is a placeholder; the model is only used for predict
X_deep = new_model.predict(X_train)
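What the truncated model returns can be shown with a hand-rolled forward pass: the learned representation is simply the hidden-layer activations, read out before the final output layer. Everything in this sketch (the `dense` helper, the tiny weight matrices, the three-bit input) is a made-up toy, not the network from the talk.

```python
def relu(x):
    return max(0.0, x)


def identity(x):
    return x


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b), one row per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical toy network: 3 inputs -> 2 hidden units -> 1 output.
W1, b1 = [[0.5, -1.0, 0.5], [-1.0, 1.0, -1.0]], [0.0, 0.5]
W2, b2 = [[2.0, 3.0]], [0.1]

x = [1, 0, 1]
hidden = dense(x, W1, b1, relu)           # the "deep" features
output = dense(hidden, W2, b2, identity)  # the full model's prediction
```

The `hidden` vector plays the role of `X_deep`: a dense description of the compound that could be fed to another model.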
Andreas Bender
Günter Klambauer
The Bender group
PyData sponsors