Let's Change the Game

As a service, we have decisions to make every second

  • What shall we serve?

  • To whom?

  • When?

What shall we serve this customer?

Orange

or

banana

"Educated" Guess

I like orange, so other people must like orange too.

Let the customer decide

We have orange and banana, choose whichever you like.

Customer:

      What is orange? What is banana?

      Why do I care?

      Why give me so many choices?

A/B test!!!

Let's serve orange and banana to equal numbers of randomly chosen people.

Three weeks later...

It seems that there are more happy customers among those served banana than among those served orange.

Banana is awesome, let's serve only banana going forward.
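Before committing to banana, it may be worth checking whether the gap could be pure chance. The sketch below uses a standard two-proportion z-test; the function name and the customer counts are made up for illustration and are not from the actual experiment.

```python
import math

def two_proportion_z_test(happy_a, n_a, happy_b, n_b):
    """Return (z, two-sided p-value) for the gap between two happy-customer rates.

    Assumes both groups are non-empty and the pooled rate is strictly
    between 0 and 1 (otherwise the standard error would be zero).
    """
    p_a = happy_a / n_a
    p_b = happy_b / n_b
    # Pooled rate under the null hypothesis that both fruits perform the same.
    pooled = (happy_a + happy_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 1000 customers per fruit (A = orange, B = banana).
z, p = two_proportion_z_test(happy_a=520, n_a=1000, happy_b=545, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented counts banana looks better, yet the p-value is well above the usual 0.05 threshold, so the difference could easily be a coincidence.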

 

Other drawbacks of A/B testing

 
Oops, I got my statistics wrong. It turns out it's just a coincidence that there were more happy customers among those served banana. I misinterpreted the result.
Let's study the customer, maybe she likes orange better when she is thirsty and banana better when she is hungry.
 
We should observe cues and learn from her reactions. By doing so, we may figure out when to serve which.
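One simple way to learn from cues and reactions is an epsilon-greedy learner that keeps an average reward per (cue, action) pair. This is only a sketch: the `CueBandit` class, the cue names, and the simulated customer are all assumptions for illustration, and a real system would have far richer cues.

```python
import random
from collections import defaultdict

class CueBandit:
    """Epsilon-greedy learner keeping an average reward per (cue, action)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.total = defaultdict(float)   # summed reward per (cue, action)
        self.count = defaultdict(int)     # number of trials per (cue, action)

    def value(self, cue, action):
        n = self.count[(cue, action)]
        return self.total[(cue, action)] / n if n else 0.0

    def best(self, cue):
        return max(self.actions, key=lambda a: self.value(cue, a))

    def choose(self, cue):
        # Explore occasionally, otherwise serve what has worked best for this cue.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best(cue)

    def learn(self, cue, action, reward):
        self.total[(cue, action)] += reward
        self.count[(cue, action)] += 1


def simulated_reaction(cue, action):
    # Hypothetical customer: prefers orange when thirsty, banana when hungry.
    liked = {"thirsty": "orange", "hungry": "banana"}
    return 1.0 if liked[cue] == action else 0.0


bandit = CueBandit(["orange", "banana"])
env = random.Random(1)
for _ in range(500):
    cue = env.choice(["thirsty", "hungry"])
    action = bandit.choose(cue)
    bandit.learn(cue, action, simulated_reaction(cue, action))

print(bandit.best("thirsty"), bandit.best("hungry"))
```

The occasional random choice is what lets the learner discover banana for the hungry customer even though orange is tried first.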

Deep Q Learning

I noticed that the customer likes orange better after she just had banana, and doesn't like having four bananas in a row.
 
We can build a multi-course serving plan.
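The idea of planning several courses ahead can be sketched with plain tabular Q-learning, a small stand-in for the deep network. The reward table below is invented: orange after banana is a hit, while banana after orange pays little right now. Because the learner values future rewards (`gamma`), it should still learn to serve banana first in order to set up the next orange.

```python
import random

FRUITS = ["orange", "banana"]

# Hypothetical customer happiness, keyed by (previous fruit, fruit served now).
REWARD = {
    ("banana", "orange"): 1.0,   # orange right after banana is a hit
    ("orange", "orange"): 0.3,
    ("orange", "banana"): 0.1,
    ("banana", "banana"): 0.0,   # banana again is unwelcome
}

def train(steps=5000, alpha=0.1, gamma=0.9, seed=0):
    """Learn Q(previous fruit, next fruit) from randomly explored servings."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in FRUITS for a in FRUITS}
    state = "orange"
    for _ in range(steps):
        action = rng.choice(FRUITS)            # explore at random (off-policy)
        reward = REWARD[(state, action)]
        next_state = action                    # what we served becomes the previous fruit
        best_next = max(q[(next_state, a)] for a in FRUITS)
        # Standard Q-learning update toward reward plus discounted future value.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q

def plan(q, state, courses):
    """Greedy multi-course plan read off the learned Q-table."""
    served = []
    for _ in range(courses):
        state = max(FRUITS, key=lambda a: q[(state, a)])
        served.append(state)
    return served

q = train()
print(plan(q, "orange", 4))
```

A purely short-sighted learner would keep serving orange after orange here (0.3 beats 0.1), which is exactly the kind of mistake the look-ahead is meant to avoid.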

More Deep Q Learning

Is it too good to be true?

What's new in this algorithm

  • Deep convolutional neural network

  • Memory pool (experience replay)

  • Stickiness to old models (a periodically updated target network)
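Two of the ingredients above can be sketched without any neural network code. The snippet reads "memory pool" as a bounded buffer of past experiences that is sampled at random, and "stickiness to old models" as a frozen parameter copy refreshed only every few steps. Both readings, and the names `ReplayMemory` and `TargetSync`, are assumptions for illustration, not Shifu's or DeepMind's actual code.

```python
import random
from collections import deque

class ReplayMemory:
    """Bounded pool of past (state, action, reward, next_state) experiences."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest entries fall out automatically
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive experiences.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


class TargetSync:
    """Keeps a frozen copy of the online parameters, refreshed every `period` steps."""

    def __init__(self, online_params, period):
        self.period = period
        self.steps = 0
        self.target_params = dict(online_params)   # shallow copy; enough for flat dicts of numbers

    def step(self, online_params):
        self.steps += 1
        if self.steps % self.period == 0:
            self.target_params = dict(online_params)
        return self.target_params


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(f"s{i}", "orange", 1.0, f"s{i + 1}")
print(len(memory), memory.sample(2))
```

Training targets would be computed from `target_params` rather than the constantly changing online parameters, which is what keeps learning stable.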

Proof of Concept

  • Based on a simplified version of DQN
  • Learns to play the genres the user prefers for each activity they are engaged in
  • Learns to alternate between preferred genres to maximize the variety of the music

Some example applications

  • Which recommendation algo/settings to use for this customer now

  • When to prompt for social sharing, registration, etc.

  • Auto Equalizer settings 
    "songs on iheart just sound better!" 
    (the success of Beats headphones)

  • Multi-step user registration invitation

Recommendation Models

  • Cues:

    • Location, Time, Accelerometer Reading, Recent Play History
  • Actions:

    • Which recommendation model to use
  • Rewards: 

    • Positive: Thumbs up, Favoriting, Share, Listening Time
    • Negative: Thumbs down, Skips, Stops
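To train on these signals they have to be folded into a single number. A minimal sketch follows, assuming made-up event names and weights; the real weighting would need tuning against business goals.

```python
# Hypothetical weights: positive feedback adds reward, negative feedback subtracts.
EVENT_REWARD = {
    "thumbs_up": 1.0,
    "favorite": 1.0,
    "share": 1.0,
    "thumbs_down": -1.0,
    "skip": -0.5,
    "stop": -0.5,
}

def session_reward(events, listening_seconds, seconds_per_point=600.0):
    """Combine discrete feedback events and listening time into one scalar reward."""
    # Unknown event names contribute nothing rather than raising an error.
    score = sum(EVENT_REWARD.get(event, 0.0) for event in events)
    # Assumed scale: every `seconds_per_point` seconds of listening is worth one point.
    return score + listening_seconds / seconds_per_point

print(session_reward(["thumbs_up", "skip"], listening_seconds=300))
```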

Prompt Sharing

  • Cues:

    • User share history, recent user interaction 
  • Actions:

    • To share or not to share
  • Rewards: 

    • Positive: User chose to share
    • Negative: Dismissed, App closed

Auto equalizer settings

  • Cues:

    • Acoustic profile, Time, Location, Accelerometer Reading
  • Actions:

    • Which equalizer preset to use
  • Rewards: 

    • Positive: Thumbs up, Fave, Share, Listening Time
    • Negative: Thumbs down, Skip, Volume Down

Multi-step user registration invitation

A salesperson strikes up a conversation before asking you the question:

 

Do you want one? 

Introducing Shifu

A production-ready Scala port of the original DQN algorithm from DeepMind

Shifu

  • Neural Network

  • Linear Algebra Lib

  • Agent

  • Akka Interface

  • DB

How do we start?

A/B/C testing!

  1. A for strategy A
  2. B for strategy B
  3. C for Shifu driven decision between A and B
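Splitting users across the three arms could look like the sketch below. It hashes the user id so that each user always lands in the same arm; the function name and experiment label are invented for illustration.

```python
import hashlib

ARMS = ["A", "B", "C"]   # A: strategy A, B: strategy B, C: Shifu decides between A and B

def assign_arm(user_id, experiment="shifu-abc"):
    """Deterministically map a user to one arm with roughly equal probability."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]

# The same user always gets the same arm across sessions.
print(assign_arm("user-42"), assign_arm("user-42"))
```

Including the experiment label in the hash means a later experiment can reshuffle users instead of reusing the same split.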

Get the Data in!

and then

Questions/Discussion

deck

By Kailuo Wang
