practical learning in A.I.M.

STO: send at the best time on contact level
12 possibilities
best: most opens

IR: send the best incentive on contact level
5-10 possibilities
best: most purchases

STO: each contact has an open probability at every slot

IR: each incentive has an effect on buying prob

stability over time

[Figure: standard A/B test over the send-time slots 0:00, 02:00, ..., 20:00, 22:00, then choose best (02:00)]
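A minimal sketch of the standard A/B test for STO, assuming 12 two-hour slots and made-up true open probabilities. The slot grid, `ab_test_then_commit`, `sends_per_slot` and the simulation are illustrative assumptions, not the A.I.M. implementation: every slot gets the same number of sends, then the slot with the highest observed open rate is chosen.

```python
import random

# Hypothetical slot grid: 12 two-hour send-time slots, "00:00" ... "22:00".
SLOTS = [f"{h:02d}:00" for h in range(0, 24, 2)]


def ab_test_then_commit(true_open_prob, sends_per_slot, rng):
    """Standard A/B test: give every slot the same number of sends,
    then commit to the slot with the highest observed open rate."""
    observed = {}
    for slot, p in true_open_prob.items():
        # Simulated opens: each send is opened with the slot's true probability.
        opens = sum(rng.random() < p for _ in range(sends_per_slot))
        observed[slot] = opens / sends_per_slot
    best = max(observed, key=observed.get)
    return best, observed


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up true open probabilities in which 02:00 is the best slot.
    true_p = {slot: 0.15 for slot in SLOTS}
    true_p["02:00"] = 0.30
    best, observed = ab_test_then_commit(true_p, sends_per_slot=1000, rng=rng)
    print("choose best:", best)
```

During the test, clearly bad slots get as much traffic as good ones, and once the best slot is chosen nothing more is learned; this is the gap that combining exploration & exploitation addresses.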

exploration vs. exploitation

bandit algo: combines exploration & exploitation

simple bandit algo
5%: send to a randomly chosen option
95%: send to the currently best option
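The 5% / 95% rule is commonly called epsilon-greedy. A minimal sketch with epsilon = 0.05 on simulated opens; the function names, the made-up open probabilities, and the choice to send to never-tried options first are assumptions for illustration.

```python
import random


def epsilon_greedy_choice(opens, sends, rng, epsilon=0.05):
    """Simple bandit rule: explore with probability epsilon, exploit otherwise."""
    options = list(sends)
    if rng.random() < epsilon:
        return rng.choice(options)  # 5%: randomly
    # 95%: the currently best option; options never sent to count as +inf,
    # so each one is tried at least once.
    return max(options, key=lambda o: opens[o] / sends[o] if sends[o] else float("inf"))


def simulate(true_open_prob, n_sends, rng, epsilon=0.05):
    """Run the rule against made-up true open probabilities."""
    opens = {o: 0 for o in true_open_prob}
    sends = {o: 0 for o in true_open_prob}
    for _ in range(n_sends):
        o = epsilon_greedy_choice(opens, sends, rng, epsilon)
        sends[o] += 1
        opens[o] += rng.random() < true_open_prob[o]  # True counts as 1
    return opens, sends


if __name__ == "__main__":
    rng = random.Random(1)
    opens, sends = simulate({"00:00": 0.10, "02:00": 0.30, "20:00": 0.20}, 20_000, rng)
    print(sends)  # most sends should end up at 02:00
```

The 5% exploration never stops, even after the best option is obvious; that fixed share is the price of keeping the estimates up to date.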

STO: send at the best time on contact level
best: most opens in the long run

IR: send the best incentive on contact level
best: most purchases in the long run

Bayesian bandit algo

good in the long run

probability of choosing one option is the probability that it is the best

Beta distribution

how probable is each value of the true open probability?

observed: 5/10

what do we think, based on your history?

28/30 >> 25/30

25/30 >> 1/1
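One way to read the comparisons above: keep a Beta distribution per option, updated with its observed opens and sends. The sketch assumes a uniform Beta(1, 1) starting point (the slides do not say which prior is used) and reads ">>" as "is preferred over".

```python
from math import sqrt


def beta_posterior(opens, sends, prior_a=1.0, prior_b=1.0):
    """Beta parameters after `opens` out of `sends`, starting from
    Beta(prior_a, prior_b); the uniform default is an assumption."""
    return prior_a + opens, prior_b + (sends - opens)


def beta_mean_sd(a, b):
    """Mean and standard deviation of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)


for opens, sends in [(28, 30), (25, 30), (1, 1)]:
    a, b = beta_posterior(opens, sends)
    mean, sd = beta_mean_sd(a, b)
    print(f"{opens}/{sends}: Beta({a:g}, {b:g}), mean {mean:.3f}, sd {sd:.3f}")
```

With that prior the posterior means are 29/32 ≈ 0.91, 26/32 ≈ 0.81 and 2/3 ≈ 0.67, so the ordering matches the slide, and the 1/1 history has by far the widest distribution: a single open says little.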

1 learning step

5 opens / 10 sends

next send not opened: 5 opens / 11 sends

next send opened: 6 opens / 11 sends
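A learning step only adds one send, plus one open if the email was opened. A minimal sketch of that bookkeeping, assuming the belief is stored as open/send counts and read as Beta(1 + opens, 1 + non-opens); the class name `OpenBelief` and the uniform prior are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenBelief:
    """Belief about one contact's open probability in one slot, read as
    Beta(1 + opens, 1 + sends - opens) (uniform prior: an assumption)."""
    opens: int
    sends: int

    @property
    def alpha(self):
        return 1 + self.opens

    @property
    def beta(self):
        return 1 + self.sends - self.opens

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def update(self, opened: bool) -> "OpenBelief":
        """One learning step: one more send, and one more open if opened."""
        return OpenBelief(self.opens + int(opened), self.sends + 1)


before = OpenBelief(opens=5, sends=10)
print(before.update(opened=False))  # OpenBelief(opens=5, sends=11)
print(before.update(opened=True))   # OpenBelief(opens=6, sends=11)
```

The estimated open probability moves from 6/12 = 0.5 to 6/13 ≈ 0.46 or 7/13 ≈ 0.54: one send shifts the belief only slightly.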

STO 1 learning step

3 opens / 10 sends

next send not opened: 3 opens / 11 sends

next send opened: 4 opens / 11 sends

IR 1 learning step

5% buying prob increase

4% buying prob increase

6% buying prob increase

?

several learning steps

5 opens / 10 sends

25 opens / 50 sends
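5 opens / 10 sends and 25 opens / 50 sends give the same open rate, but more learning steps make the Beta distribution narrower. A sketch comparing the two, assuming the same uniform Beta(1, 1) prior; the 0.4 to 0.6 window and the sample count are arbitrary choices for illustration.

```python
import random
from math import sqrt


def beta_sd(a, b):
    """Standard deviation of Beta(a, b)."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def share_near(a, b, low, high, n, rng):
    """Monte Carlo estimate of P(low < p < high) for p ~ Beta(a, b)."""
    return sum(low < rng.betavariate(a, b) < high for _ in range(n)) / n


rng = random.Random(7)
for opens, sends in [(5, 10), (25, 50)]:
    a, b = 1 + opens, 1 + sends - opens  # uniform Beta(1, 1) prior (assumption)
    print(f"{opens}/{sends}: mean {a / (a + b):.2f}, sd {beta_sd(a, b):.3f}, "
          f"P(0.4 < p < 0.6) ~ {share_near(a, b, 0.4, 0.6, 20_000, rng):.2f}")
```

For a symmetric Beta(a, a) the variance is 1 / (4(2a + 1)), so the standard deviation falls from about 0.139 for Beta(6, 6) to about 0.069 for Beta(26, 26).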

+ prior knowledge

open probability: estimate 50%

Emarsys open probability: estimate 17%

Lesara 02:00 open probability: estimate 30%
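Prior knowledge can be encoded as pseudo-counts: a prior estimate m worth s sends becomes Beta(m * s, (1 - m) * s), which is then updated with the contact's own opens and sends. A sketch using the three estimates from the slide, reading the plain 50% line as the no-knowledge default; the prior strength of 20 sends, the function names and this particular encoding are assumptions, not the documented A.I.M. setup.

```python
def prior_from_estimate(mean, strength):
    """Encode a prior open-rate estimate as Beta pseudo-counts (alpha, beta)
    with alpha / (alpha + beta) = mean and alpha + beta = strength.
    `strength` = how many sends the prior is worth (hypothetical knob)."""
    return mean * strength, (1 - mean) * strength


def posterior_mean(prior, opens, sends):
    """Mean of Beta(alpha + opens, beta + non-opens)."""
    alpha, beta = prior
    return (alpha + opens) / (alpha + beta + sends)


# Estimates from the slide; strength=20 is an illustrative assumption.
PRIORS = {
    "no prior knowledge": prior_from_estimate(0.50, 20),
    "Emarsys": prior_from_estimate(0.17, 20),
    "Lesara 02:00": prior_from_estimate(0.30, 20),
}

for name, prior in PRIORS.items():
    print(f"{name}: 5 opens / 10 sends -> {posterior_mean(prior, 5, 10):.3f}")
```

With 5 opens / 10 sends the three priors give 0.500, 0.280 and about 0.367: the shorter a contact's own history, the more the broader estimate matters.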

how to recommend?

probability of choosing one option is the probability that it is the best

sample from each distribution, then choose the option with the highest sample value
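Sampling one value from each option's Beta distribution and taking the largest is Thompson sampling: each option is chosen with exactly the posterior probability that it is the best. A minimal sketch on simulated send-time slots, assuming uniform Beta(1, 1) starting beliefs; `thompson_choose`, `thompson_run` and the open probabilities are hypothetical.

```python
import random


def thompson_choose(beliefs, rng):
    """beliefs: option -> (alpha, beta). Draw one plausible open probability
    per option and return the option with the highest draw."""
    samples = {opt: rng.betavariate(a, b) for opt, (a, b) in beliefs.items()}
    return max(samples, key=samples.get)


def thompson_run(true_open_prob, n_sends, rng):
    """Simulate sends; every option starts at Beta(1, 1) (assumption)."""
    beliefs = {opt: (1.0, 1.0) for opt in true_open_prob}
    sends = {opt: 0 for opt in true_open_prob}
    for _ in range(n_sends):
        opt = thompson_choose(beliefs, rng)
        opened = rng.random() < true_open_prob[opt]
        a, b = beliefs[opt]
        # Learning step: alpha counts opens, beta counts non-opens.
        beliefs[opt] = (a + opened, b + (not opened))
        sends[opt] += 1
    return beliefs, sends


if __name__ == "__main__":
    rng = random.Random(3)
    _, sends = thompson_run({"00:00": 0.10, "02:00": 0.30, "20:00": 0.15}, 5000, rng)
    print(sends)  # most sends should go to 02:00
```

Unlike the 5% / 95% rule, exploration shrinks on its own: once one slot's distribution is clearly ahead, the others rarely win the draw.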

fast, simple

sample from Beta?

not trivial --> need for a UDF (user-defined function)
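If the query engine offers no built-in Beta sampler, a common construction uses two Gamma draws: for independent X ~ Gamma(a, 1) and Y ~ Gamma(b, 1), X / (X + Y) ~ Beta(a, b). A minimal sketch of the logic such a UDF could contain, written with Python's standard `random.gammavariate`; which engine the UDF would run in is not stated on the slide, and `sample_beta` is a hypothetical name.

```python
import random


def sample_beta(a, b, rng):
    """Draw one sample from Beta(a, b) via two independent Gamma draws:
    X ~ Gamma(a, 1), Y ~ Gamma(b, 1)  =>  X / (X + Y) ~ Beta(a, b)."""
    x = rng.gammavariate(a, 1.0)  # shape a, scale 1
    y = rng.gammavariate(b, 1.0)  # shape b, scale 1
    return x / (x + y)


if __name__ == "__main__":
    rng = random.Random(5)
    draws = [sample_beta(6, 6, rng) for _ in range(20_000)]
    print(sum(draws) / len(draws))  # should be close to the mean 6 / 12 = 0.5
```

Averaging many samples of Beta(6, 6) should land close to its mean 0.5, which is a cheap sanity check for any reimplementation.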

limitations

1. not applicable at all

2. applicable, but time would be important factor

truly personal ?!

STO: learning on contact level

IR: learning on account level
truly personal: buying prob

STO and IR learning

By Czeller Ildi


How does the algorithm behind STO and IR learn from new information, and how does it use the information it already has?
