Building Smart Bots
Hi!
- Hanneli - @hannelita
- Computer Engineer
- Programming
- Electronics
- Math <3 <3
- Physics
- Lego
- Meetups
- Animals
- Coffee
- GIFs
Agenda
- My first (dummy) bots
- Initial attempts
- Cassandra helping to collect data
- Spark helping to analyse it
- Machine Learning making smart bots
- Epic Fails
- Resources
Disclaimer
Code may have personal info, so I will omit sensitive content
Introductory content
Show the main ideas
Many GIFs
Disclaimer
This is not a talk about popular recent bot libraries (such as botkit)
Why bots?
Several actions can be automated
And become way better
Examples
- Chat responses
- Reminders
- Default answers
- Discard unimportant emails
- Social media feed
- Collect GIFs
- Announce meetups
- etc
My first (dumb) bots
- Cron job on Linux
- Periodically sent troll messages in Turkish to the undergrad students' mailing list (the admins thought the system had been hacked)
The troll-less case: Hangouts
People talking to me
- Ask questions
- Ask more questions!
- Randomly talk to me, since I was a teacher at the time
Demanding too much effort
Options
- Impolitely ignore
- Spend too much time answering
Automatically answer.
How can I automate a chat?
Take #1
Automatically respond to everyone
Évariste: Hey Hanneli, sup?
Bot: Hi
Évariste: Can you help me with this algebraic equation?
Bot: Hi
Nope.
Create contextual answers.
Take #2
Évariste: Hey Hanneli, sup? (Bot analysis: It's a greeting)
Bot: Hi
Évariste: Can you help me with this algebraic equation? (Bot analysis: It's a question)
Bot: Sure, could it be later?
It's better.
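A minimal sketch of what "contextual answers" could look like at this stage, assuming a purely rule-based check. The word lists and the names `classify` and `reply` are hypothetical, not the original bot's code.

```python
import re

# Hypothetical word lists; a real bot would need far more cases.
GREETING_WORDS = {"hi", "hello", "hey", "sup"}
QUESTION_STARTERS = {"can", "could", "did", "do", "does", "where", "what", "when", "who", "how"}

REPLIES = {
    "greeting": "Hi",
    "question": "Sure, could it be later?",
}

def classify(message: str) -> str:
    """Return 'question', 'greeting' or 'other' for one chat message."""
    text = message.strip().lower()
    words = re.findall(r"\w+", text)
    if text.endswith("?") and words and words[0] in QUESTION_STARTERS:
        return "question"
    if any(word in GREETING_WORDS for word in words):
        return "greeting"
    return "other"

def reply(message: str):
    """Pick the canned answer for the message category, or None if unknown."""
    return REPLIES.get(classify(message))
```

Every question gets the same canned answer here, which is exactly the limitation the next slides run into.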
Évariste: Hey Hanneli, sup? (Bot analysis: It's a greeting)
Bot: Hi
Évariste: Did you see the movie yesterday? (Bot analysis: It's a question)
Bot: Sure, could it be later?
if {
} else {
if {
} else {
if {
} else {
if {
...
Take #3
Refine the cases and answers
Need something smarter to handle the answers.
Mom: HI HONEY (Bot analysis: It's a greeting)
Bot: Hi
Mom: DID YOU MAKE UP YOUR ROOM? (Bot analysis: It's a question)
Bot: Sure, could it be later?
Take #4
Select people who should *not* get automatic answers (a "whitelist").
New Contacts
Take #5
It's a lot of information
Brute force may not be good.
You can lose a lot of info
Évariste: Hey Hanneli, sup? (Bot analysis: It's a greeting)
Bot: Hi
Évariste: Can I send you cool GIFs? (Bot analysis: It's a question)
Bot: Sure, could it be later?
Analyse previous chats and copy behaviours
Take #6
Read all my chat records and collect data
chat_timestamp | sender | message |
---|---|---|
2015-08-27-14:11:00 | Ilma R. | Só agora... |
2015-08-27-14:11:00 | Hanneli T. | Certo |
chat_timestamp | sender | message | message_category |
---|---|---|---|
2015-08-27-14:11:00 | Ilma R. | Só agora... | Informative |
2015-08-27-14:11:00 | Hanneli T. | Certo | Motivational |
Cassandra can store this information <3
chat_timestamp | sender | message | message_category |
---|---|---|---|
2015-08-27-14:11:00 | Ilma R. | Só agora... | Informative |
2015-08-27-14:11:00 | Hanneli T. | Certo | Motivational |
2015-10-05-14:19:00 | Leandro P. | https://i.imgur.com/sSOmKqw.gifv | GIF |
chat_by_category
chat_timestamp | sender | message | language |
---|---|---|---|
2015-08-27-14:11:00 | Ilma R. | Só agora... | Portuguese |
2015-08-27-14:11:00 | Évariste G. | Je ne sais... | French |
2015-10-05-14:19:00 | Anton T. | Привіт | Ukrainian |
chat_by_language
chat_timestamp | sender | message | allow | message_category |
---|---|---|---|---|
2015-08-27-14:11:00 | Ilma R. | Só agora... | Yes | Informative |
2015-08-27-14:11:00 | Évariste G. | Je ne sais... | Yes | Casual |
2015-10-05-14:19:00 | Anton T. | Привіт | Yes | Casual |
chat_by_allow
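The three tables above hold the same messages, organised for three different lookups. A rough sketch of that "one table per query" idea, using in-memory dictionaries as a stand-in for Cassandra. The choice of partition column per table is an assumption based on the table names, and `new_store` and `write_message` are hypothetical helpers.

```python
from collections import defaultdict

# Assumption: each table is partitioned by the column named in its title.
PARTITION_COLUMN = {
    "chat_by_category": "message_category",
    "chat_by_language": "language",
    "chat_by_allow": "allow",
}

def new_store():
    """One in-memory stand-in per Cassandra table: partition key -> rows."""
    return {table: defaultdict(list) for table in PARTITION_COLUMN}

def write_message(store, row):
    """Write the same message once per table, keyed by that table's column."""
    for table, column in PARTITION_COLUMN.items():
        store[table][row[column]].append(row)

store = new_store()
write_message(store, {
    "chat_timestamp": "2015-10-05-14:19:00",
    "sender": "Anton T.",
    "message": "Привіт",
    "language": "Ukrainian",
    "allow": "Yes",
    "message_category": "Casual",
})
```

Reading "all Ukrainian messages" is then a single lookup, `store["chat_by_language"]["Ukrainian"]`, at the cost of writing each message several times.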
contact_id | name | email | | |
---|---|---|---|---|
1 | Ilma R. | ilma@ilma.com | _ | ilma.k.r |
contacts
contact_id | name | email | I_follow_tw | my_friend_fb |
---|---|---|---|---|
1 | Ilma R. | ilma@ilma.com | yes | yes |
contact_tracking
Cool! I have data!
Quick Q/A
How did you fill up "message_category" column?
Basically an ugly parser with custom parameters that apply to myself; written in Ruby and Python
Example: URL ending with .gif, .gifv, .webm is categorised as GIF.
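The GIF rule can be sketched in a few lines. This is an illustration of that one rule only, not the original parser; the function name `categorise` and the fallback label `"Uncategorised"` are assumptions.

```python
from urllib.parse import urlparse

# Extensions taken from the rule above.
GIF_EXTENSIONS = (".gif", ".gifv", ".webm")

def categorise(message: str) -> str:
    """Return 'GIF' if the message contains a URL whose path ends in a GIF-like extension."""
    for token in message.split():
        if token.startswith(("http://", "https://")):
            path = urlparse(token).path.lower()  # ignores any ?query part
            if path.endswith(GIF_EXTENSIONS):
                return "GIF"
    return "Uncategorised"  # hypothetical fallback; the real parser had more rules
```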
Quick Q/A
Where do you run this code?
Initially locally. Then I moved to Amazon.
Existing messages: Hangouts -> API reads chat -> Parse and create objects -> Write in Cassandra
Quick Q/A
New messages (after seeding Cassandra): Hangouts -> API reads chat -> Parse and create objects -> Write in Cassandra -> Reads from Cassandra -> Produces enhanced object -> Response
Quick Q/A
New messages - Improving: Hangouts -> API reads chat -> Parse and create objects -> Write in Cassandra -> Reads from Cassandra -> Produces enhanced object -> Response (with two queues in the pipeline)
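The slides do not say which queue technology was used. As a rough illustration of why queues help here, this sketch decouples "read a message" from "produce a response" with Python's standard `queue` module; `run_pipeline` and the `respond` callback are hypothetical.

```python
import queue
import threading

def run_pipeline(messages, respond):
    """Push messages through an inbox queue; a worker fills an outbox queue."""
    inbox = queue.Queue()
    outbox = queue.Queue()

    def worker():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: no more messages
                break
            outbox.put(respond(item))

    thread = threading.Thread(target=worker)
    thread.start()
    for message in messages:
        inbox.put(message)
    inbox.put(None)
    thread.join()

    responses = []
    while not outbox.empty():
        responses.append(outbox.get())
    return responses
```

With a single worker the responses come out in arrival order; the reader side never waits for a slow response step.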
Quick Q/A
What is this "Enhanced Object?"
An object with extra inferred parameters built from Cassandra information. Example: Do I follow the person on Twitter? Yes: do not generate an auto-answer. No: eligible for an auto-reply.
No formalism applied.
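One possible shape for such an object, assuming the contact flags come from the contact_tracking table. The class and field names are illustrative; only the Twitter rule is taken from the slide.

```python
from dataclasses import dataclass

@dataclass
class Contact:
    name: str
    i_follow_tw: bool
    my_friend_fb: bool

@dataclass
class EnhancedMessage:
    sender: str
    message: str
    i_follow_tw: bool          # inferred from stored contact data
    auto_reply_allowed: bool   # derived rule, see enhance()

def enhance(sender: str, message: str, contacts: dict) -> EnhancedMessage:
    """Attach inferred parameters to a raw message."""
    contact = contacts.get(sender)
    follows = bool(contact and contact.i_follow_tw)
    # Rule from the slide: if I follow the person on Twitter, do not auto-answer.
    return EnhancedMessage(sender, message, follows, auto_reply_allowed=not follows)
```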
Great. How can I automate this EVEN MOAR?
Évariste: Hey Hanneli, sup?
Bot: Hi
Évariste: Did you see my Pull Request?
I want to reply "Yes I saw it!"
Need to collect information from Github.
Évariste: Hey Hanneli, sup?
Bot: Hi
Évariste: Where is the meetup today?
I want to reply "Paulista Avenue, 2001"
Need to collect information from Meetup.
Évariste: Hey Hanneli, sup?
Bot: Hi
Évariste: Where are the links for your presentation slides?
I want to reply "slides.com/smart-bots"
Need to collect information from Slides. And Calendar.
The more information I collect, the more precise the responses are
Too much data for my "home-made" analyser
Spark comes to the rescue :)
Take #7
Analyse the information that I have on Cassandra without spending too much time on boilerplate stuff
Cassandra Connector for Spark - https://github.com/datastax/spark-cassandra-connector
From table to RDD (Resilient Distributed Dataset)
Instant benefits from Spark
- Simplify my ugly parsers
Hangouts -> API reads chat -> *Raw* data to Spark -> Write raw in Cassandra
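// Word count over the raw chat lines; `text` is assumed to be an RDD[String]
text.flatMap(line => line.split(" "))  // one record per word
  .map(word => (word, 1))              // pair each word with a count of 1
  .reduceByKey(_ + _)                  // sum the counts per word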
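For readers without a Spark shell at hand, the same three steps can be mimicked in plain Python to see what they produce. This is a local emulation for illustration, not Spark code; `word_count` is a hypothetical name.

```python
from collections import Counter
from itertools import chain

def word_count(lines):
    """Emulate flatMap -> map -> reduceByKey on a small list of lines."""
    words = chain.from_iterable(line.split(" ") for line in lines)  # flatMap
    pairs = ((word, 1) for word in words)                           # map
    counts = Counter()
    for word, one in pairs:                                         # reduceByKey(_ + _)
        counts[word] += one
    return dict(counts)
```

The difference in Spark is that each step runs in parallel across partitions of the data instead of in one local loop.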
Instant benefits from Spark
- Simplify my ugly parsers
Significant decrease in writing time!
Instant benefits from Spark
- Simplify my ugly code to categorise
Hangouts -> API reads chat -> *Raw* data -> Write raw in Cassandra -> Filter and process with Spark -> Produces enhanced object -> Response (with two queues in the pipeline)
Good.
Évariste - Monday, 8:45
Évariste: Hey Hanneli, sup?
Bot: Hi
Évariste: What time is the meetup?
Bot: 7PM
Évariste: Where is it?
Bot: Paulista Ave, 2001
Évariste - Wednesday, 7:45
Évariste: Hey Hanneli, sup?
Bot: Hi
Évariste: What time is the meetup?
Bot: 7PM
Évariste: Where is it?
Bot: Paulista Ave, 2001
He always asks where the meetup is.
Évariste - Wednesday, 7:45
Évariste: Hey Hanneli, sup?
Bot: Hi
Évariste: What time is the meetup?
Bot: 7PM, at Paulista Ave, 2001
Great response.
Learn to improve the answers.
Machine Learning
Take #8
Supervised
Unsupervised
Reinforcement
Supervised model
Predict - He asked "what time", he will ask "where"
Person started conversation complaining, he/she might be angry
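The "what time" then "where" prediction can be illustrated with simple counting over past conversations. This is a frequency-based stand-in for a real supervised model; the intent labels, the 0.6 threshold and the function names are all assumptions.

```python
from collections import Counter, defaultdict

def train(conversations):
    """Count which intent follows which, over labelled past conversations."""
    transitions = defaultdict(Counter)
    for intents in conversations:
        for current, following in zip(intents, intents[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, intent, min_share=0.6):
    """Return the most likely next intent, or None if it is not frequent enough."""
    counts = transitions.get(intent)
    if not counts:
        return None
    following, hits = counts.most_common(1)[0]
    return following if hits / sum(counts.values()) >= min_share else None

# Hypothetical history: three chats where "where" follows "what_time", one where it does not.
history = [["greeting", "what_time", "where"]] * 3 + [["greeting", "what_time"]]
model = train(history)
```

If `predict_next(model, "what_time")` returns `"where"`, the bot can answer both questions in one message.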
Reinforcement
Person started conversation with trivial question that could be found on Google. Should I answer or ignore it?
Analyse the environment.
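A very small sketch of the reinforcement idea, assuming the bot keeps a running value for each action and receives some reward signal afterwards. The reward definition, the learning rate and the names `update` and `choose` are hypothetical, and there is no exploration step here.

```python
ACTIONS = ("answer", "ignore")

def update(values, action, reward, learning_rate=0.5):
    """Move the stored value of an action towards the observed reward."""
    values[action] += learning_rate * (reward - values[action])

def choose(values):
    """Greedy choice: pick the action with the highest learned value."""
    return max(ACTIONS, key=lambda action: values[action])

# Hypothetical feedback for "trivial question" messages.
values = {"answer": 0.0, "ignore": 0.0}
update(values, "ignore", 1.0)    # ignoring worked out fine
update(values, "answer", -1.0)   # answering cost time for no benefit
```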
Mathematics to make things better
- Regression models
- Generate equations to parametrise behaviours
- Matrices
- Analytical Geometry
- Topology (!!)
Which topics are you considering?
- Have I had any previous contact with the person on the internet?
- Does this person have Twitter? (quick search before deciding; 123.people is a good start)
- Does it look like Spam?
- Is he/she trying to contact me in other social media?
- Did he/she try to send me emails?
- Does the message have any blacklisted sentences? *
* Custom list based on my context.
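One way to combine the questions above is to treat each as a yes/no feature and feed them to a logistic (regression-style) score. The weights and bias below are invented for illustration; in practice they would be fitted from past decisions.

```python
import math

# Hypothetical hand-set weights, one per question in the list above.
WEIGHTS = {
    "previous_contact": 1.5,
    "has_twitter": 0.5,
    "looks_like_spam": -3.0,
    "tried_other_channels": 1.0,
    "sent_email": 0.5,
    "has_blacklisted_sentence": -2.0,
}
BIAS = -0.5

def reply_probability(features: dict) -> float:
    """Logistic score in (0, 1): how strongly the features suggest replying."""
    z = BIAS + sum(
        weight * float(features.get(name, False))
        for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `reply_probability({"previous_contact": True, "tried_other_channels": True})` lands well above 0.5, while a message flagged as spam lands well below it.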
There is no default cookbook for the algorithms, but the idea may be applied in other areas
With Machine Learning
Hangouts -> API reads chat -> *Raw* data -> Write raw in Cassandra -> Filter and process with Spark -> Produces enhanced object -> Response (with two queues in the pipeline)
Machine Learning Processor: reads from Cassandra, writes in Cassandra
MOAR bots
Take #9
Meetups
- Answer messages
- Tweet and announce
- Find venues
- Schedule new events
Motivational bots
- Send positive messages!
Github activity bot
- Monitor my commits, terminal actions
- Monitor my new PRs, automatically comments on some tasks on Pivotal
- Monitor build status
- Comments on PRs
Email responder
- Automatically answer some emails
- Search for new Spam
- Automatically create filters for ads
Epic Fails
Take #10
Because bots are not so perfect (yet)
Hangouts -> API reads chat -> *Raw* data to Spark -> Write raw in Cassandra -> Filter and process with Spark -> Produces enhanced object -> Response (with two queues in the pipeline; endpoints in order: Endpoint A, Endpoint B)
Take #11
Swapping the endpoint addresses by mistake.
Hangouts -> API reads chat -> *Raw* data to Spark -> Write raw in Cassandra -> Filter and process with Spark -> Produces enhanced object -> Response (with two queues in the pipeline; endpoints in order: Endpoint B, Endpoint A)
Take #12
Weird answers. Friend called Paul Regus (PR)
Paul != Pull Request (PR)
Check if message sender is a developer before evaluating the 'PR'.
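A minimal sketch of that guard, assuming the "is a developer" flag is already known for the sender (for example from stored contact data). The function name and the regular expression are illustrative.

```python
import re

PR_PATTERN = re.compile(r"\b(PR|pull request)\b", flags=re.IGNORECASE)

def mentions_pull_request(message: str, sender_is_developer: bool) -> bool:
    """Only read 'PR' as 'Pull Request' when the sender is a developer."""
    if not sender_is_developer:
        return False
    return PR_PATTERN.search(message) is not None
```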
Take #13
Trust Problem
Hey Hanneli
Hi
Is that you?
Yes
How can I be sure it's not the bot?
It is not
Take #14
Topics and references
- XMPP
- Apps Script
- Parser and Regex
- Spark + Cassandra course
- Presentation
- ML free content
- Quora Thread for ML
- Event driven, notifications.
- Math and computer theory may be helpful
Take #15
Discussion points
- Security - it is sensitive information! OAuth, cryptography and 2FA are a must.
- Bias - How can you be sure you are being impartial? Should we be impartial all the time?
- Remove all possible sensitive info and open source it; how can I determine if all algorithms that are specific to myself were removed?
- Migrate legacy code to new languages
- Functional Programming makes things much clearer
- Human factors - people get super sad/angry when they find out it was a bot.
Special Thanks
- @planetcassandra
- @lafp, @romulostorel and @pedrofelipee (GIFs)
- Bots <3
Thank you :)
Questions?
hannelita@gmail.com
@hannelita
Cassandra Brussels meetup - Smart Bots
By Hanneli Tavante (hannelita)