Teaching the browser how to chat

@dbozhinovski

About

  • Does JavaScript for fun and profit
  • Works at Virtask
  • Organizes beer.js Skopje
  • Likes long walks on the beach (not)

What's this about

  1. NLP - the boring stuff
  2. 1000 words
  3. Did I mention this is offline?
  4. A small example
  5. Takeaway 

First, the boring stuff

NLP

According to Wikipedia (as is the custom):

Natural-language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to fruitfully process large amounts of natural language data.

https://en.wikipedia.org/wiki/Natural-language_processing

NLP basics in practice

  • Largely statistical
  • Lemmatisation
  • Part of speech tagging

Example

I am learning about chatbots

Tokenizing

[I, am, learning, about, chatbots]

Lemmatizing

[I, be, learn, about, chatbot]

PoS tagging

[pronoun, verb, verb, preposition, noun (plural)]
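
The three steps above can be sketched in a few lines of plain JavaScript. This is a toy illustration only: the `LEMMAS` and `TAGS` tables are hypothetical and cover just this one sentence, whereas a real library ships much larger curated data plus statistical rules. Note that tagging the lemma loses the plural marker shown on the slide.

```javascript
// Toy lookup tables (hypothetical, hand-written for this one sentence).
const LEMMAS = new Map([
  ['am', 'be'],
  ['learning', 'learn'],
  ['chatbots', 'chatbot'],
]);
const TAGS = new Map([
  ['i', 'pronoun'],
  ['be', 'verb'],
  ['learn', 'verb'],
  ['about', 'preposition'],
  ['chatbot', 'noun'],
]);

// 1. Split the sentence into word tokens.
const tokenize = (text) => text.match(/[A-Za-z']+/g) || [];

// 2. Reduce a token to its dictionary form; unknown words pass through unchanged.
const lemmatize = (token) => LEMMAS.get(token.toLowerCase()) || token;

// 3. Look up the part of speech of a lemma.
const tag = (lemma) => TAGS.get(lemma.toLowerCase()) || 'unknown';

const tokens = tokenize('I am learning about chatbots');
const lemmas = tokens.map(lemmatize);
const tags = lemmas.map(tag);

console.log(tokens);
console.log(lemmas);
console.log(tags);
```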

What can be done with the most common 1000 words

English language facts

If the 80-20 rule applies to most things, the "94-6 rule" applies when working with language. By Zipf's law:

  • The top 10 words account for 25% of used language.
  • The top 100 words account for 50% of used language.
  • The top 1,000 words account for 80% of used language. (sweet spot)
  • The top 50,000 words account for 95% of used language.

https://github.com/spencermountain/compromise/wiki/Justification
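
To check numbers like these against your own text, a small sketch along the following lines could measure how much of a text the n most frequent words cover. The `coverage` helper and the sample sentence are made up for illustration; a meaningful measurement needs a large corpus.

```javascript
// Share of all word occurrences covered by the n most frequent distinct words.
const coverage = (text, n) => {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return 0;

  // Count how often each distinct word occurs.
  const counts = new Map();
  for (const w of words) counts.set(w, (counts.get(w) || 0) + 1);

  // Sum the counts of the n most frequent words.
  const sorted = [...counts.values()].sort((a, b) => b - a);
  const covered = sorted.slice(0, n).reduce((sum, c) => sum + c, 0);
  return covered / words.length;
};

const text = 'the cat and the dog and the bird saw the fish';
console.log(coverage(text, 1)); // share covered by the single most frequent word
console.log(coverage(text, 2)); // share covered by the two most frequent words
```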

Enter compromise.js

https://github.com/spencermountain/compromise/wiki/Justification

The process is to get some curated data, find the patterns, and list the exceptions. Bada bing, bada boom. In this way a satisfactory NLP library can be built with breathtaking lightness. Namely, it can be run right on the user's computer instead of a server.
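
As a rough sketch of that "patterns plus exceptions" idea, here is a tiny singularizer in plain JavaScript. The `IRREGULAR` table and the `RULES` list are hypothetical and far from complete (words like "movies" or "houses" would come out wrong), and this is not how compromise itself is implemented. It only shows the shape of the approach: check the exception list first, then fall back to ordered suffix rules.

```javascript
// Exceptions: irregular plurals that no suffix pattern gets right.
const IRREGULAR = new Map([
  ['children', 'child'],
  ['people', 'person'],
  ['mice', 'mouse'],
  ['series', 'series'],
]);

// Patterns: ordered suffix rules, the first match wins.
const RULES = [
  [/ies$/, 'y'],               // cities   -> city
  [/(s|x|z|ch|sh)es$/, '$1'],  // boxes    -> box, dishes -> dish
  [/([^s])s$/, '$1'],          // chatbots -> chatbot ("glass" is left alone)
];

const singularize = (word) => {
  const lower = word.toLowerCase();
  if (IRREGULAR.has(lower)) return IRREGULAR.get(lower);
  for (const [pattern, replacement] of RULES) {
    if (pattern.test(lower)) return lower.replace(pattern, replacement);
  }
  return lower; // no rule applies: assume the word is already singular
};

console.log(['chatbots', 'cities', 'boxes', 'Children', 'glass'].map(singularize));
```

Every word the rules get wrong becomes one more entry in the exception list, which is what keeps the data small.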

Online and offline NLP

But first, a war story

Compromise.js

  • The 1000 most used words make up ~80% of used English
  • With those words plus a bit of "magic", we get a good-enough NLP library that works offline
  • Considering it can do plugins, custom lexicons and all sorts of config, we can cover a lot
  • That said, there are a ton of edge cases best left to linguistics experts and CompSci people
  • Best of all - it's ~200 kB (dictionary included)

// Assumes compromise is already loaded and available as `nlp`
const sample = `The journey took us through Rome, Madrid and finally, Paris...`;
// The only part that actually matters :)
const places = nlp(sample).places().out('array');
document.querySelector('.out').innerText = places.join(', ');

// No dice - compromise has no idea what Bitola and Ohrid even mean :)
const localSample = 'The journey took us through Skopje, Bitola and finally, Ohrid...';
const localPlaces = nlp(localSample).places().out('array');
document.querySelector('.out').innerText = localPlaces.join(', ');
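
Why does the second sample come back empty, and why do custom lexicons help? The following library-free sketch imitates the idea with a plain word list. `BASE_PLACES` and `findPlaces` are hypothetical stand-ins, not compromise's actual data or API: a word is only recognized once it is in the list, and a user-supplied list extends what is known.

```javascript
// A deliberately tiny built-in list, standing in for a library's bundled lexicon.
const BASE_PLACES = ['rome', 'madrid', 'paris'];

// Returns the words of `text` that appear in the built-in list or in `customPlaces`.
const findPlaces = (text, customPlaces = []) => {
  const known = new Set(
    [...BASE_PLACES, ...customPlaces].map((p) => p.toLowerCase())
  );
  const words = text.match(/\p{L}+/gu) || []; // runs of letters, any alphabet
  return words.filter((w) => known.has(w.toLowerCase()));
};

const trip = 'The journey took us through Skopje, Bitola and finally, Ohrid...';

console.log(findPlaces(trip));                                // nothing known yet
console.log(findPlaces(trip, ['Skopje', 'Bitola', 'Ohrid'])); // custom list fills the gap
```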

Stuff compromise.js can do

  • Verb analysis (tense)
  • Noun analysis (singular / plural, place, name, organization, unit...)
  • Dates, Numbers, Values
  • Tags
  • Transformations

http://compromise.cool/

Stuff compromise.js sucks at

L A N G U A G E (s)

Now, for something (hopefully) cool

BeerBot

a short demo:

https://beerbot.darko.io

"The Brain"

import nlp from 'compromise';
import { get, set } from 'lodash';
import skills from './skills/'; // a skill - something that the bot knows how to do

// Assumption: the shared context is restored from the same localStorage key
// that the skills write to
const context = JSON.parse(localStorage.getItem('bjs-bot-context') || '{}');

const getReply = async (input) => {
  const doc = nlp(input).normalize(); // parse once, reuse for every rule

  // each skill comes with match rules; we pick the first skill that has
  // a rule that works with the given input
  const skillMatch = skills.find(
    (s) => s.matchRules.some((r) => doc.match(r).found)
  );

  nlp(input).debug(); // VERY useful debugging info

  if (skillMatch) {
    // keep some history, for more fancy stuff
    const topicHistory = get(context, 'topics') || [];
    topicHistory.push(skillMatch.ID);
    set(context, 'topics', topicHistory);
    // reply = a function that gets executed on a skill match
    return skillMatch.reply(input, context);
  }

  // Otherwise, fall back to something really basic
  return { mode: 'text', value: 'Hi there!' };
};

Skills

import { set, get, random } from 'lodash';

const ID = 'greet'; // The name of the skill

const lexicon = {}; // Custom lexicon / tagging if we happen to need it

// Rule(s) to match input against
// Simple lookup, tagging, logic + full support for regex
const matchRules = [
  '(hi|hello|ahoy|greetings|#Expression) bot?'
];

// Some replies to return
const replies = [
  () => ({ mode: 'text', value: 'Hi there.' }),
  () => ({ mode: 'text', value: 'What\'s up?' }),
];

const reply = (input, context) => {
  // Store some metadata to localStorage
  const timesMatched = get(context, 'greet.matched', 0);
  set(context, 'greet.matched', timesMatched + 1);
  localStorage.setItem('bjs-bot-context', JSON.stringify(context));

  // get a random-ish reply, to avoid being very repetitive
  const replyRoll = random(0, replies.length - 1);

  return replies[replyRoll](input, context); // Return said reply
};

// Export stuff
export default { ID, lexicon, matchRules, reply };

That said, let's make the beerbot smarter!

or, how to make a weather skill
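
A weather skill could follow the same shape as the greet skill (`ID`, `lexicon`, `matchRules`, a reply function). The sketch below is an assumption, not the code from the demo. The match rule, the lexicon entries, `formatForecast`, and the injected `findPlace` / `fetchForecast` helpers are all hypothetical, and no particular weather API is implied. Injecting the helpers keeps the skill testable without compromise or a network connection.

```javascript
// Hypothetical weather skill, shaped like the greet skill.
const ID = 'weather';

// Illustrative lexicon entries so local cities can be tagged as places.
const lexicon = { skopje: 'City', bitola: 'City', ohrid: 'City' };

// Assumed rule in compromise's match syntax: "weather in <place>".
const matchRules = ['weather (in|at|for) #Place+'];

// Pure formatting helper, easy to test on its own.
const formatForecast = (place, { tempC, summary }) =>
  `${place}: ${tempC}°C, ${summary}.`;

// findPlace(input) and fetchForecast(place) are injected by the caller.
const makeReply = ({ findPlace, fetchForecast }) => async (input, context) => {
  const place = findPlace(input);
  if (!place) {
    return { mode: 'text', value: 'Which city are you asking about?' };
  }
  const forecast = await fetchForecast(place); // e.g. a call to some weather API
  return { mode: 'text', value: formatForecast(place, forecast) };
};

// Usage with stubbed helpers instead of a real NLP lookup and a real API.
const reply = makeReply({
  findPlace: (input) => (/ohrid/i.test(input) ? 'Ohrid' : null),
  fetchForecast: async () => ({ tempC: 24, summary: 'sunny' }),
});

const weatherSkill = { ID, lexicon, matchRules, reply };

reply('weather in Ohrid', {}).then((r) => console.log(r.value));
```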

The takeaway

  1. Chatbots aren't rocket science
  2. Text-based experiences can be made better
  3. Offline can be good enough
  4. Browsers are really damn capable these days

Thanks!

  • github: @dbozhinovski
  • twitter: @d_bozhinovski

Questions?