Teaching the browser how to chat

@dbozhinovski

About

Does JavaScript for fun and profit
Works at Virtask
Organizes beer.js Skopje
Likes long walks on the beach (not)

What's this about

NLP - the boring stuff
1000 words
Did I mention this is offline?
A small example
Takeaway

First, the boring stuff

NLP

According to Wikipedia (as is the custom):

Natural-language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to fruitfully process large amounts of natural language data.

https://en.wikipedia.org/wiki/Natural-language_processing

NLP basics in practice

Largely statistical
Lemmatisation
Part of speech tagging

Example

I am learning about chatbots

Tokenizing

[I, am, learning, about, chatbots]

Lemmatizing

[I, be, learn, about, chatbots]

PoS tagging

[pronoun, verb, verb, preposition, noun (plural)]
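The three steps above can be sketched in a few lines of plain JavaScript. This is a toy, not how compromise or any real NLP library does it: the `LEMMAS` and `TAGS` tables are hypothetical and cover only the example sentence, and words missing from a table simply pass through unchanged (or get tagged `unknown`).

```javascript
// Toy lookup tables: just enough for the example sentence, not a real lexicon.
const LEMMAS = new Map([
  ['am', 'be'],
  ['learning', 'learn'],
]);
const TAGS = new Map([
  ['i', 'pronoun'],
  ['am', 'verb'],
  ['learning', 'verb'],
  ['about', 'preposition'],
  ['chatbots', 'noun (plural)'],
]);

// Step 1: split the sentence into word tokens.
const tokenize = (text) => text.match(/[A-Za-z']+/g) || [];

// Step 2: map each token to its dictionary form; unknown words pass through.
const lemmatize = (tokens) =>
  tokens.map((t) => LEMMAS.get(t.toLowerCase()) || t);

// Step 3: look up a part-of-speech tag for each token.
const tagPartsOfSpeech = (tokens) =>
  tokens.map((t) => TAGS.get(t.toLowerCase()) || 'unknown');

const tokens = tokenize('I am learning about chatbots');
console.log(tokens);
console.log(lemmatize(tokens));
console.log(tagPartsOfSpeech(tokens));
```

Real libraries replace the lookup tables with a curated lexicon plus pattern rules, which is where the "largely statistical" part comes in.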

What can be done with the most common 1000 words

English language facts

If the 80-20 rule applies for most things, the "94-6 rule" applies when working with language, by Zipf's law:

The top 10 words account for 25% of used language.
The top 100 words account for 50% of used language.
The top 1,000 words account for 80% of used language. (sweet spot)
The top 50,000 words account for 95% of used language.

https://github.com/spencermountain/compromise/wiki/Justification
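To get a feel for those numbers, here is a small sketch that measures how much of a text its top N words cover. The function names and the sample sentence are made up for illustration; feeding it a larger corpus should show the same flattening curve, though the exact percentages depend on the text.

```javascript
// Count how often each word occurs, most frequent first.
const wordFrequencies = (text) => {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
};

// Fraction of all word occurrences covered by the topN most frequent words.
const coverage = (text, topN) => {
  const freqs = wordFrequencies(text);
  const total = freqs.reduce((sum, [, n]) => sum + n, 0);
  if (total === 0) return 0; // empty text: nothing to cover
  const covered = freqs.slice(0, topN).reduce((sum, [, n]) => sum + n, 0);
  return covered / total;
};

// Hypothetical sample text; swap in any larger corpus.
const corpus = 'the cat and the dog and the bird saw the cat';
console.log(coverage(corpus, 1)); // share covered by the single most common word
console.log(coverage(corpus, 3)); // share covered by the top three words
```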

Enter compromise.js

https://github.com/spencermountain/compromise/wiki/Justification

The process is to get some curated data, find the patterns, and list the exceptions. Bada bing, bada boom. In this way a satisfactory NLP library can be built with breathtaking lightness. Namely, it can be run right on the user's computer instead of a server.

Online and offline NLP

But first, a war story

Compromise.js

The 1000 most used words make up ~80% of used English
With those words plus a bit of "magic", we create a good-enough NLP that is able to work offline
Considering it can do plugins, custom lexicons and all sorts of config, we can cover a lot
That said, there are a ton of edge cases best left to linguistics experts and CompSci people
Best of all - it's ~200 kB (dictionary included)

const sample = `The journey took us through Rome, Madrid and finally, Paris...`;
// The only part that actually matters :)
const places = nlp(sample).places().out('array');
// Join the array explicitly instead of relying on implicit string coercion
document.querySelector('.out').innerText = places.join(', ');

// No dice - compromise has no idea what Bitola and Ohrid even mean :)
// (separate names, so both snippets can live in the same script)
const localSample = 'The journey took us through Skopje, Bitola and finally, Ohrid...';
const localPlaces = nlp(localSample).places().out('array');
document.querySelector('.out').innerText = localPlaces.join(', ');

Stuff compromise.js can do

Verb analysis (tense)
Noun analysis (singular / plural, place, name, organization, unit...)
Dates, Numbers, Values
Tags
Transformations

http://compromise.cool/

Stuff compromise.js sucks at

L A N G U A G E (s)

Now, for something (hopefully) cool

BeerBot

a short demo:

https://beerbot.darko.io

"The Brain"

import nlp from 'compromise';
import skills from './skills/'; // a skill - something that the bot knows how to do
import { get, set } from 'lodash';

// Assumption: the slide never defines `context`; here it is restored from the
// same localStorage key the skills write to ('bjs-bot-context')
const context = JSON.parse(localStorage.getItem('bjs-bot-context') || '{}');

const getReply = async (input) => {
  const doc = nlp(input).normalize();
  // each skill comes with match rules; we pick the first skill that has
  // at least one rule that works with the given input
  const skillMatch = skills.find(
    (s) => s.matchRules.some((r) => doc.match(r).found)
  );

  console.log(nlp(input).debug()); // VERY useful debugging info
  if (skillMatch) {
    // keep some history, for more fancy stuff
    const topicHistory = get(context, 'topics') || [];
    topicHistory.push(skillMatch.ID);
    set(context, 'topics', topicHistory);
    // reply = a function that gets executed on a skill match
    const reply = await skillMatch.reply(input, context);
    return reply;
  }

  // Otherwise, fall back to something really basic
  return { mode: 'text', value: 'Hi there!' };
};

Skills

import { set, get, random } from 'lodash';

const ID = 'greet'; // The name of the skill

const lexicon = {}; // Custom lexicon / tagging if we happen to need it

// Rule(s) to match input against
// Simple lookup, tagging, logic + full support for regex
const matchRules = [
  '(hi|hello|ahoy|greetings|#Expression) bot?'
];

// Some replies to return
const replies = [
  () => ({ mode: 'text', value: 'Hi there.' }),
  () => ({ mode: 'text', value: 'What\'s up?' }),
];

const reply = (input, context) => {
  // Store some metadata to localStorage
  const timesMatched = get(context, 'greet.matched', 0);
  set(context, 'greet.matched', timesMatched + 1);
  localStorage.setItem('bjs-bot-context', JSON.stringify(context));

  // get a random-ish reply, to avoid being very repetitive
  const replyRoll = random(0, replies.length - 1);

  return replies[replyRoll](input, context); // Return said reply
};

// Export stuff
export default { ID, lexicon, matchRules, reply };

That said, let's make the BeerBot smarter!

or, how to make a weather skill
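One way a weather skill could look, following the same shape as the greet skill (ID, lexicon, matchRules, reply). Treat it as a sketch under assumptions rather than the demo's actual code: the match rules are guessed phrasings, `lookupWeather` is a hypothetical async function you would back with whatever weather API you like, and `extractPlace` is a plain regex stand-in so the example runs without any library (in the bot itself, compromise's places() is the more natural choice).

```javascript
const ID = 'weather'; // The name of the skill

const lexicon = {}; // Custom places could be tagged here if needed

// Match rules in the same style as the greet skill; the phrasings are a guess.
const matchRules = ['weather in #Place', 'weather (for|at) #Place'];

// Stand-in for place extraction: grabs capitalized words after in/for/at.
const extractPlace = (input) => {
  const m = input.match(/\b(?:in|for|at)\s+([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)/);
  return m ? m[1] : null;
};

// lookupWeather is hypothetical: async (place) => ({ tempC, summary }).
// It is injected so the skill can be tried out without a network call.
const makeReply = (lookupWeather) => async (input, context) => {
  const place = extractPlace(input);
  if (!place) {
    return { mode: 'text', value: 'Which city do you mean?' };
  }
  try {
    const { tempC, summary } = await lookupWeather(place);
    return { mode: 'text', value: `${place}: ${summary}, ${tempC} °C` };
  } catch (err) {
    // Network or API failure: answer politely instead of crashing the bot
    return { mode: 'text', value: `Sorry, I could not get the weather for ${place}.` };
  }
};

// Stubbed lookup so the sketch runs offline.
const fakeLookup = async () => ({ tempC: 21, summary: 'sunny' });
const weatherSkill = { ID, lexicon, matchRules, reply: makeReply(fakeLookup) };

weatherSkill
  .reply("What's the weather in Skopje?", {})
  .then((r) => console.log(r.value));
```

Injecting the lookup keeps the skill itself offline-friendly: only the one function that talks to a weather service needs a connection, and it can be swapped or stubbed freely.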

The takeaway

Chatbots aren't rocket science
Text-based experiences can be made better
Offline can be good enough
Browsers are really damn capable these days

Thanks!

github: @dbozhinovski
twitter: @d_bozhinovski