Building Smart Bots

Hi!

  • Hanneli - @hannelita
  • Computer Engineer
  • Programming
  • Electronics
  • Math <3 <3
  • Physics
  • Lego
  • Meetups
  • Animals
  • Coffee
  • GIFs
 

Agenda

  • My first (dumb) bots

  • Initial attempts

  • Cassandra helping to collect data

  • Spark helping to analyse it
  • Machine Learning making smart bots

  • Epic Fails

  • Resources

Disclaimer

Code may contain personal info, so I will omit sensitive content

Introductory content

Show the main ideas

Many GIFs

Why bots?

Several actions can be automated

And become way better

Examples

  • Chat responses
  • Reminders
  • Default answers
  • Discard unimportant emails
  • Social media feed
  • Collect GIFs
  • Announce meetups
  • etc.

My first (dumb) bots

  • Cron job on Linux
  • Periodically sent troll messages in Turkish to the undergrad students' mailing list (the admins thought the system had been hacked)

The troll-less case: Hangouts

People talking to me

  • Ask questions
  • Ask more questions!
  • Randomly talk to me, since I was a teacher at the time

Demanding too much effort

Options

  • Impolitely ignore
  • Spend too much time answering

Automatically answer.

How can I automate a chat?

Take #1

Automatically respond to everyone

Hey Hanneli, sup?

Hi

Can you help me with this algebraic equation?

Hi

Évariste

Nope.

Create contextual answers.

Take #2

Hey Hanneli, sup?

Sure, could it be later?

Can you help me with this algebraic equation?

Bot analysis: It's a greeting

Bot analysis: It's a question

Hi

Évariste

It's better.
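The "Bot analysis" step can be sketched in Scala as a tiny keyword classifier. This is a minimal sketch under stated assumptions: the greeting keywords, the "ends with ?" rule for questions and the canned replies are hypothetical, not the bot's real rules.

```scala
object NaiveClassifier {
  sealed trait Category
  case object Greeting extends Category
  case object Question extends Category
  case object Unknown extends Category

  // Hypothetical keyword list: a real bot would need far more entries.
  private val greetingWords = Set("hi", "hey", "hello", "sup")

  def classify(message: String): Category = {
    // Lower-case and split on anything that is not a letter (accented letters are kept).
    val words = message.toLowerCase.split("[^\\p{L}]+").filter(_.nonEmpty)
    if (words.exists(greetingWords.contains)) Greeting
    else if (message.trim.endsWith("?")) Question
    else Unknown
  }

  // Canned reply per category; None means "do not auto-reply".
  def reply(message: String): Option[String] = classify(message) match {
    case Greeting => Some("Hi")
    case Question => Some("Sure, could it be later?")
    case Unknown  => None
  }
}
```

Note that "Did you see the movie yesterday?" is also classified as a question and gets the same canned reply, and every new kind of message means yet another rule.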

Hey Hanneli, sup?

Sure, could it be later?

Did you see the movie yesterday?

Bot analysis: It's a greeting

Bot analysis: It's a question

Hi

Évariste

if {
} else {
  if {
  } else {
    if {
    } else {
      if {
...

Take #3

Refine the cases and answers

Need something smarter to handle the answers.

HI HONEY

Sure, could it be later?

DID YOU MAKE UP YOUR ROOM?

Bot analysis: It's a greeting

Bot analysis: It's a question

Hi

Mom

Take #4

Select people for whom the bot should *not* generate automatic answers (a "whitelist").

New Contacts

Take #5

It's a lot of information

Brute force may not be good.

You can lose a lot of info

Hey Hanneli, sup?

Sure, could it be later?

Can I send you cool GIFs?

Bot analysis: It's a greeting

Bot analysis: It's a question

Hi

Évariste

Analyse previous chats and copy behaviours

Take #6

Read all my chat records and collect data

chat_timestamp | sender | message
2015-08-27-14:11:00 | Ilma R. | Só agora... ("Only now...")
2015-08-27-14:11:00 | Hanneli T. | Certo ("Right")

Cassandra can store this information <3

chat_timestamp | sender | message | message_category
2015-08-27-14:11:00 | Ilma R. | Só agora... ("Only now...") | Informative
2015-08-27-14:11:00 | Hanneli T. | Certo ("Right") | Motivational
2015-10-05-14:19:00 | Leandro P. | https://i.imgur.com/sSOmKqw.gifv | GIF

chat_by_category

chat_timestamp | sender | message | language
2015-08-27-14:11:00 | Ilma R. | Só agora... ("Only now...") | Portuguese
2015-08-27-14:11:00 | Évariste G. | Je ne sais... ("I don't know...") | French
2015-10-05-14:19:00 | Anton T. | Привіт ("Hi") | Ukrainian

chat_by_language

chat_timestamp | sender | message | allow | message_category
2015-08-27-14:11:00 | Ilma R. | Só agora... ("Only now...") | Yes | Informative
2015-08-27-14:11:00 | Évariste G. | Je ne sais... ("I don't know...") | Yes | Casual
2015-10-05-14:19:00 | Anton T. | Привіт ("Hi") | Yes | Casual

chat_by_allow

contact_id | name | email | twitter | facebook
1 | Ilma R. | ilma@ilma.com | _ | ilma.k.r

contacts

contact_id | name | email | I_follow_tw | my_friend_fb
1 | Ilma R. | ilma@ilma.com | yes | yes

contact_tracking
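The tables above store the same messages once per query (by category, by language, by allow flag), which is the usual query-first way of modelling data in Cassandra. Below is an in-memory Scala sketch of that idea, assuming one merged `ChatMessage` record per message; the grouping functions are hypothetical stand-ins for writes to the real `chat_by_*` tables, and the sample rows come from the tables above.

```scala
object ChatTables {
  // One chat row; the fields mirror the columns of the tables above.
  final case class ChatMessage(
      timestamp: String,
      sender: String,
      message: String,
      category: String,
      language: String,
      allow: Boolean)

  // Sample rows from the tables above, with category and language merged into one record.
  val sample: Seq[ChatMessage] = Seq(
    ChatMessage("2015-08-27-14:11:00", "Ilma R.", "Só agora...", "Informative", "Portuguese", allow = true),
    ChatMessage("2015-08-27-14:11:00", "Évariste G.", "Je ne sais...", "Casual", "French", allow = true),
    ChatMessage("2015-10-05-14:19:00", "Anton T.", "Привіт", "Casual", "Ukrainian", allow = true))

  // In Cassandra each view would be its own table, partitioned by the grouping key,
  // with every message written once per table (denormalisation). Here they are just maps.
  def chatByCategory(rows: Seq[ChatMessage]): Map[String, Seq[ChatMessage]] = rows.groupBy(_.category)
  def chatByLanguage(rows: Seq[ChatMessage]): Map[String, Seq[ChatMessage]] = rows.groupBy(_.language)
  def chatByAllow(rows: Seq[ChatMessage]): Map[Boolean, Seq[ChatMessage]] = rows.groupBy(_.allow)
}
```

The trade-off is the one Cassandra makes on purpose: more writes and duplicated data in exchange for reads that hit a single partition.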

Cool! I have data! 

Quick Q/A

How did you fill in the "message_category" column?

Basically, an ugly parser with custom parameters tailored to me.

Example: a URL ending in .gif, .gifv or .webm is categorised as GIF.
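A minimal sketch of that GIF rule. The extensions come from the example above; stripping the query string and the `"Uncategorised"` fallback label are assumptions made for this sketch.

```scala
object MessageCategoriser {
  // Extensions taken from the example above.
  private val gifExtensions = Seq(".gif", ".gifv", ".webm")

  def isGifUrl(text: String): Boolean = {
    val t = text.trim.toLowerCase
    // Ignore any query string or fragment before checking the extension.
    val path = t.takeWhile(c => c != '?' && c != '#')
    (path.startsWith("http://") || path.startsWith("https://")) &&
      gifExtensions.exists(ext => path.endsWith(ext))
  }

  // Hypothetical fallback label for everything this single rule does not cover.
  def categorise(text: String): String =
    if (isGifUrl(text)) "GIF" else "Uncategorised"
}
```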

Quick Q/A

Where do you run this code?

Initially on my local machine. Then I moved it to Amazon.

Existing messages:

Hangouts → API reads chat → Parse and create objects → Write in Cassandra

Quick Q/A

New messages (after seeding Cassandra):

Hangouts → API reads chat → Parse and create objects → Write in Cassandra

Reads from Cassandra → Produces enhanced object → Response

Quick Q/A

New messages - Improving (same flow, with two queues added):

Hangouts → API reads chat → Parse and create objects → Write in Cassandra

Reads from Cassandra → Produces enhanced object → Response

Queue, Queue
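A rough, single-process sketch of the improved flow, assuming `java.util.concurrent.LinkedBlockingQueue` for the two queues and an in-memory map standing in for Cassandra. `BotPipeline`, `ingest`, `process` and the reply rule are hypothetical names, not the real bot's code.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

object BotPipeline {
  final case class Message(sender: String, text: String)
  // Message plus what was inferred from the store (here: how many earlier messages exist).
  final case class Enhanced(message: Message, previousMessages: Int)
}

final class BotPipeline {
  import BotPipeline._

  // In-memory stand-in for Cassandra: messages per sender, newest first.
  private val store = mutable.Map.empty[String, List[Message]]
  // The two queues: parsed messages in, responses out.
  private val incoming = new LinkedBlockingQueue[Message]()
  private val outgoing = new LinkedBlockingQueue[String]()

  // "API reads chat" + "Parse and create objects": enqueue the parsed message.
  def ingest(sender: String, text: String): Unit = incoming.put(Message(sender, text))

  // Worker: drain the incoming queue, read history, store the message, enqueue a response.
  def process(): Unit =
    Iterator.continually(incoming.poll()).takeWhile(_ != null).foreach { m =>
      val history = store.getOrElse(m.sender, Nil) // read earlier messages
      store(m.sender) = m :: history               // write the new one
      outgoing.put(respond(Enhanced(m, history.size)))
    }

  // Hypothetical reply rule: greet a sender the first time, defer afterwards.
  private def respond(e: Enhanced): String =
    if (e.previousMessages == 0) s"Hi ${e.message.sender}!" else "Sure, could it be later?"

  // Drain whatever responses are ready to be sent back to the chat.
  def responses(): List[String] =
    Iterator.continually(outgoing.poll()).takeWhile(_ != null).toList

  // Convenience helper: push a batch of (sender, text) pairs through the whole flow.
  def run(messages: Seq[(String, String)]): List[String] = {
    messages.foreach { case (sender, text) => ingest(sender, text) }
    process()
    responses()
  }
}
```

In a real deployment each stage would run on its own thread or service and block on `take()`; `poll()` is used here only to keep the example deterministic.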

Quick Q/A

What is this "Enhanced Object"?

An object with extra inferred parameters built from the information stored in Cassandra. Example: do I follow the person on Twitter? Yes: do not generate an auto-answer. No: eligible for an auto-reply.

No formalism applied. 
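A sketch of the "enhanced object" for the Twitter example, assuming the `contact_tracking` rows are loaded into a map keyed by the sender's name. `Contact`, `Enhanced` and `eligibleForAutoReply` are illustrative names; the only rule encoded is the one from the slide.

```scala
object Enhancer {
  // Row of contact_tracking; booleans stand for the yes/no values.
  final case class Contact(name: String, email: String, iFollowOnTwitter: Boolean, myFriendOnFacebook: Boolean)
  final case class Message(sender: String, text: String)

  // The "enhanced object": the message plus facts inferred from the stored contacts.
  final case class Enhanced(message: Message, knownContact: Boolean, iFollowOnTwitter: Boolean) {
    // Rule from the slide: if I follow the person on Twitter, no auto-answer.
    def eligibleForAutoReply: Boolean = !iFollowOnTwitter
  }

  // Sample taken from the contact_tracking table.
  val sampleContacts: Map[String, Contact] =
    Map("Ilma R." -> Contact("Ilma R.", "ilma@ilma.com", iFollowOnTwitter = true, myFriendOnFacebook = true))

  def enhance(msg: Message, contacts: Map[String, Contact]): Enhanced =
    contacts.get(msg.sender) match {
      case Some(c) => Enhanced(msg, knownContact = true, iFollowOnTwitter = c.iFollowOnTwitter)
      case None    => Enhanced(msg, knownContact = false, iFollowOnTwitter = false)
    }
}
```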

Great. How can I automate this EVEN MOAR?

Hey Hanneli, sup?

Did you see my Pull Request?

Hi

Évariste

I want to reply "Yes I saw it!"

Need to collect information from GitHub.

Hey Hanneli, sup?

Where is the meetup today?

Hi

Évariste

I want to reply "Paulista Avenue, 2001"

Need to collect information from Meetup.

Hey Hanneli, sup?

Where are the links for your presentation slides?

Hi

Évariste

I want to reply "slides.com/smart-bots"

Need to collect information from Slides. And Calendar.

The more information I collect, the more precise the responses are.

Too much data for my "home-made" analyser

Spark comes to the rescue :)

Take #7

Analyse the information that I have in Cassandra without spending too much time on boilerplate stuff

Cassandra Connector for Spark - https://github.com/datastax/spark-cassandra-connector

From table to RDD (Resilient Distributed Dataset)

Instant benefits from Spark

  • Simplify my ugly parsers

Hangouts → API reads chat → *Raw* data to Spark → Write raw in Cassandra

// text: an RDD[String] of raw chat lines; counts how often each word appears
text.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
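To see what the Spark word count computes without a cluster, here is the same flatMap, map and reduce-by-key sequence on plain Scala collections. It is only an illustration: `groupBy` plus a sum plays the role of `reduceByKey`, which in Spark runs distributed across partitions.

```scala
object WordCount {
  // Same steps as the Spark snippet, on an in-memory Seq instead of an RDD.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)                                             // group pairs by word (Spark shuffles by key here)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // sum the 1s, like reduceByKey(_ + _)
}
```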

Significant decrease in writing time!

Instant benefits from Spark

  • Simplify my ugly code to categorise

Hangouts → API reads chat → *Raw* data to Spark → Write raw in Cassandra → Filter and process with Spark → Produces enhanced object → Response

Queue

Good.

Hey Hanneli, sup?

What time is the meetup?

Hi

Évariste - Monday, 8:45

7PM

Where is it?

Paulista Ave, 2001

Hey Hanneli, sup?

What time is the meetup?

Hi

Évariste - Wednesday, 7:45

7PM

Where is it?

Paulista Ave, 2001

Hey Hanneli, sup?

What time is the meetup?

Hi

Évariste - Wednesday, 7:45

7PM

Where is it?

Paulista Ave, 2001

He always asks where the meetup is.

Hey Hanneli, sup?

What time is the meetup?

Hi

Évariste - Wednesday, 7:45

7PM, at Paulista Ave, 2001

Great response.
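One simple way to learn that "time" questions are followed by "where" questions is to count it. A sketch assuming past conversations are already reduced to ordered topic labels (`"time"`, `"where"`); the labels, the 0.5 threshold and the `answer` helper are hypothetical.

```scala
object FollowUpPredictor {
  // Each conversation is the ordered list of topics one contact asked about.
  type Conversation = Seq[String]

  // Among past conversations that contain `first`, the fraction where `next` came right after it.
  def followUpRate(history: Seq[Conversation], first: String, next: String): Double = {
    val withFirst = history.filter(_.contains(first))
    if (withFirst.isEmpty) 0.0
    else withFirst.count(_.sliding(2).contains(Seq(first, next))).toDouble / withFirst.size
  }

  // Bundle the place into the answer when the follow-up is likely (hypothetical threshold).
  def answer(history: Seq[Conversation], time: String, place: String, threshold: Double = 0.5): String =
    if (followUpRate(history, "time", "where") > threshold) s"$time, at $place" else time
}
```

With the three conversations above (each "time" followed by "where"), the rate is 1.0 and the bot answers "7PM, at Paulista Ave, 2001" in one go.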

Learn to improve the answers.

Machine Learning

Take #8

Supervised

Unsupervised

Reinforcement

Supervised model

Predict: he asked "what time", so he will ask "where"

The person started the conversation complaining, so he/she might be angry
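For the "might be angry" case, a classic supervised baseline is a multinomial Naive Bayes classifier over words with add-one smoothing. This is a swapped-in textbook technique, not necessarily what the bot used; the training examples and the `"angry"`/`"neutral"` labels are hypothetical.

```scala
object NaiveBayes {
  // Lower-case word tokens (letters only, accents kept).
  private def tokens(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}]+").toSeq.filter(_.nonEmpty)

  final case class Model(
      logPriors: Map[String, Double],            // label -> log P(label)
      wordCounts: Map[String, Map[String, Int]], // label -> word -> count
      totals: Map[String, Int],                  // label -> number of word tokens
      vocabularySize: Int)

  // examples: (message, label) pairs labelled by hand.
  def train(examples: Seq[(String, String)]): Model = {
    val byLabel = examples.groupBy(_._2)
    val logPriors = byLabel.map { case (label, exs) => label -> math.log(exs.size.toDouble / examples.size) }
    val wordCounts = byLabel.map { case (label, exs) =>
      label -> exs.flatMap(e => tokens(e._1)).groupBy(identity).map { case (w, ws) => w -> ws.size }
    }
    val totals = wordCounts.map { case (label, counts) => label -> counts.values.sum }
    val vocabularySize = wordCounts.values.flatMap(_.keys).toSet.size
    Model(logPriors, wordCounts, totals, vocabularySize)
  }

  // Pick the label with the highest log-posterior, using add-one (Laplace) smoothing.
  def predict(model: Model, text: String): String =
    model.logPriors.keys.maxBy { label =>
      val counts = model.wordCounts(label)
      model.logPriors(label) + tokens(text).map { w =>
        math.log((counts.getOrElse(w, 0) + 1.0) / (model.totals(label) + model.vocabularySize))
      }.sum
    }

  // Hypothetical hand-labelled examples, just to make the sketch runnable.
  val sampleModel: Model = train(Seq(
    "This is terrible, nothing works" -> "angry",
    "I am so annoyed, the build broke again" -> "angry",
    "Thanks, see you at the meetup" -> "neutral",
    "Hey, what time is the meetup?" -> "neutral"))
}
```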

Reinforcement

The person started the conversation with a trivial question that could be answered by a quick Google search. Should I answer or ignore it?

Analyse the environment.
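The "answer or ignore" dilemma can be framed as a two-armed bandit, the simplest reinforcement-learning setting (no state, just actions and rewards). Below is a sketch with an epsilon-greedy agent; the reward probabilities are made up, and how to measure a "good outcome" in real chats is left open.

```scala
import scala.util.Random

object AnswerOrIgnore {
  // The two possible actions for a trivial, Google-able question.
  val actions: Vector[String] = Vector("answer", "ignore")

  // Epsilon-greedy agent: keeps a running mean reward per action.
  final class EpsilonGreedy(epsilon: Double, rng: Random) {
    private val counts = Array.fill(actions.size)(0)
    private val values = Array.fill(actions.size)(0.0)

    def choose(): Int =
      if (rng.nextDouble() < epsilon) rng.nextInt(actions.size) // explore
      else values.indices.maxBy(i => values(i))                 // exploit the best estimate so far

    def update(action: Int, reward: Double): Unit = {
      counts(action) += 1
      values(action) += (reward - values(action)) / counts(action) // incremental mean
    }

    def estimate(action: Int): Double = values(action)
  }

  // Made-up environment: probability that each action leads to a "good" outcome (reward 1).
  private val payoff = Vector(0.2, 0.6)

  def simulate(steps: Int, seed: Long, epsilon: Double = 0.1): EpsilonGreedy = {
    val rng = new Random(seed)
    val agent = new EpsilonGreedy(epsilon, rng)
    for (_ <- 1 to steps) {
      val action = agent.choose()
      val reward = if (rng.nextDouble() < payoff(action)) 1.0 else 0.0
      agent.update(action, reward)
    }
    agent
  }
}
```

The epsilon term is what "analysing the environment" looks like here: the agent keeps trying the currently worse action now and then, so it can notice if the environment was misjudged.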

Mathematics to make things better

  • Regression models
  • Generate equations to parametrise behaviours
  • Matrices
  • Analytical Geometry
  • Topology (!!)

Which topics are you considering?

  • Have I had any previous contact with the person on the internet?
  • Does this person have a Twitter account? (quick search before deciding; 123.people is a good start)
  • Does it look like Spam? 
  • Is he/she trying to contact me in other social media?
  • Did he/she try to send me emails?
  • Does the message have any blacklisted sentences? *

* Custom list based on my context.
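The questions above can be turned into a single number with a linear score, the same shape a regression model from the previous slide could learn. Here the weights and the threshold are hand-picked and purely hypothetical.

```scala
object ContactScore {
  // One flag per question in the list above.
  final case class Signals(
      previousContact: Boolean,
      hasTwitter: Boolean,
      looksLikeSpam: Boolean,
      contactsMeElsewhere: Boolean,
      sentEmails: Boolean,
      blacklistedSentence: Boolean)

  // Hypothetical hand-picked weights: positive signals add trust, negative ones remove it.
  def score(s: Signals): Double = {
    def w(flag: Boolean, weight: Double): Double = if (flag) weight else 0.0
    w(s.previousContact, 2.0) + w(s.hasTwitter, 1.0) + w(s.contactsMeElsewhere, 0.5) +
      w(s.sentEmails, 0.5) + w(s.looksLikeSpam, -3.0) + w(s.blacklistedSentence, -5.0)
  }

  // Hypothetical threshold above which the contact is treated as legitimate.
  def trusted(s: Signals, threshold: Double = 2.0): Boolean = score(s) >= threshold
}
```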

There is no standard cookbook for these algorithms, but the ideas can be applied in other areas.

MOAR bots

Take #9

Meetups

  • Answer messages
  • Tweet and announce
  • Find venues
  • Schedule new events

Motivational bots

  • Send positive messages!

Github activity bot

  • Monitor my commits, terminal actions
  • Monitor my new PRs, automatically comment on some tasks on Pivotal
  • Monitor build status
  • Comment on PRs

Email responder

  • Automatically answer some emails
  • Search for new Spam
  • Automatically create filters for ads

Epic Fails

Take #10

Because bots are not so perfect (yet)

Hangouts → API reads chat → *Raw* data to Spark → Write raw in Cassandra → Filter and process with Spark → Produces enhanced object → Response

Queue → Endpoint A

Queue → Endpoint B

Take #11

Swapping the endpoint addresses by mistake.

Hangouts → API reads chat → *Raw* data to Spark → Write raw in Cassandra → Filter and process with Spark → Produces enhanced object → Response

Queue → Endpoint B

Queue → Endpoint A

Take #12

Weird answers: a friend is called Paul Regus (PR).

Paul != Pull Request (PR)

Check if message sender is a developer before evaluating the 'PR'.
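A sketch of that check. How the bot knows someone is a developer is an assumption here (a flag stored with the contact), and so is the whole-word, case-sensitive match for "PR".

```scala
object PrDisambiguation {
  // Whether the sender is a developer is assumed to be known (e.g. stored with the contact).
  final case class Sender(name: String, isDeveloper: Boolean)

  // "PR" as a whole word, so words like "APRIL" do not match.
  private val prToken = "\\bPR\\b".r

  // Only read "PR" as "Pull Request" when the sender is a developer.
  def meansPullRequest(sender: Sender, text: String): Boolean =
    sender.isDeveloper && prToken.findFirstIn(text).isDefined
}
```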

Take #13

Trust Problem

Hey Hanneli

Hi

Is that you?

Yes

How can I be sure it's not the bot?

It is not

Take #14

Topics and references

Take #15

Discussion points

  • Security - it is sensitive information! OAuth, cryptography and 2FA are a must.
  • Bias - How can you be sure you are being impartial? Should we be impartial all the time? 
  • Remove all possible sensitive info and open source it; how can I be sure that all the algorithms specific to me have been removed?
  • Migrate legacy code to new languages
  • Functional Programming makes things much clearer
  • Human factors - people get super sad/angry when they find out it was a bot.

Special Thanks

 

  • @wheresLINA
  • @planetcassandra
  • @lafp, @romulostorel and @pedrofelipee (GIFs)
  • Bots <3

Thank you :)

Questions?

 

hannelita@gmail.com

@hannelita