Why are chatbots so bad?

Demystifying Natural Language Processing Used in Chatbots

How does a chatbot understand you?



Chatbots categorize a user's message (utterance) into intents to decide what action to take to handle that message.

Entities: information that a chatbot extracts from your message

For example:

Message: "Hi I am Cheuk"
Intent: "Greeting"
Entity: {"Name": "Cheuk"}

Message: "Book a flight to Hong Kong on 8th Dec 2020"
Intent: "Flight booking"
Entities: {"Destination": "Hong Kong", "Date": "2020-12-08"}
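To make the intent/entity split concrete, here is a minimal Python sketch of the structure an NLU component might hand to the rest of the bot. `ParsedMessage` and `handle` are hypothetical names, not part of any chatbot framework; the intent chooses the action and the entities fill in its details.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ParsedMessage:
    """What an NLU component might return for one user message (hypothetical structure)."""
    text: str
    intent: str
    entities: dict = field(default_factory=dict)


def handle(message: ParsedMessage) -> str:
    """Pick an action from the intent, then fill in its details from the entities."""
    if message.intent == "Greeting":
        return f"Hello, {message.entities.get('Name', 'there')}!"
    if message.intent == "Flight booking":
        when = message.entities["Date"]
        return f"Searching flights to {message.entities['Destination']} on {when:%d %b %Y}."
    return "Sorry, I did not understand that."


greeting = ParsedMessage("Hi I am Cheuk", "Greeting", {"Name": "Cheuk"})
booking = ParsedMessage(
    "Book a flight to Hong Kong on 8th Dec 2020",
    "Flight booking",
    # The date entity is normalized to a real date object rather than kept as text.
    {"Destination": "Hong Kong", "Date": date(2020, 12, 8)},
)
print(handle(greeting))  # Hello, Cheuk!
print(handle(booking))   # Searching flights to Hong Kong on 08 Dec 2020.
```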

Popular NLP models used in NLU

Understanding Intents

categorizing sentences
➡️ vectorizing words and sentences
➡️ finding similarities in vector spaces
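The two steps above (vectorize, then compare in vector space) can be sketched as a nearest-example classifier. This assumes tiny hand-made word vectors in `WORD_VECTORS` and a handful of labelled examples in `EXAMPLES`; both are hypothetical, and averaging word vectors is only one simple way to build a sentence vector.

```python
import math

# Hypothetical 3-d word vectors, hand-picked for illustration only.
# Real systems use learned embeddings with hundreds of dimensions.
WORD_VECTORS = {
    "hi": [0.9, 0.1, 0.0],
    "hello": [0.85, 0.15, 0.0],
    "book": [0.0, 0.9, 0.2],
    "flight": [0.1, 0.8, 0.3],
    "ticket": [0.05, 0.85, 0.25],
    "weather": [0.0, 0.1, 0.9],
    "rain": [0.0, 0.2, 0.85],
}

# A few labelled example utterances per intent.
EXAMPLES = {
    "Greeting": ["hi", "hello"],
    "Flight booking": ["book flight", "book ticket"],
    "Weather": ["weather", "rain"],
}


def sentence_vector(sentence):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify(sentence):
    """Return the intent of the most similar labelled example, or None."""
    query = sentence_vector(sentence)
    if query is None:
        return None
    scored = (
        (cosine(query, sentence_vector(example)), intent)
        for intent, examples in EXAMPLES.items()
        for example in examples
    )
    return max(scored)[1]


print(classify("Flight ticket please"))  # Flight booking
```

Words outside the toy vocabulary (like "please") are simply ignored, which is one reason real systems need large vocabularies or subword models.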

Extracting Entities

Named entity recognition (NER):

- Hand-written rules

- Statistical models + supervised ML

- Unsupervised ML

- Semi-supervised ML
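The first NER approach, hand-written rules, can be as simple as one regular expression per entity type. The patterns below are illustrative assumptions tuned to the two example messages, which also exposes their weakness: a lower-case "hong kong" is not recognized.

```python
import re

# Hand-written rules: one regular expression per entity type.
# These patterns are illustrative assumptions, not a complete grammar.
RULES = {
    "Name": re.compile(r"\b(?:I am|I'm|[Mm]y name is)\s+([A-Z][a-z]+)"),
    "Destination": re.compile(r"\bto\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"),
    "Date": re.compile(
        r"\b(\d{1,2}(?:st|nd|rd|th)?\s+"
        r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s+\d{4})\b"
    ),
}


def extract_entities(message):
    """Return {entity type: matched text} for every rule that fires."""
    entities = {}
    for entity_type, pattern in RULES.items():
        match = pattern.search(message)
        if match:
            entities[entity_type] = match.group(1)
    return entities


print(extract_entities("Hi I am Cheuk"))  # {'Name': 'Cheuk'}
print(extract_entities("Book a flight to Hong Kong on 8th Dec 2020"))
# {'Destination': 'Hong Kong', 'Date': '8th Dec 2020'}
```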

Vectorizing (Word Embedding)

  • Words or phrases from the vocabulary are represented as vectors in a vector space.
  • In this vector space, words with similar meanings are close to each other, and calculations like
    “king” − “man” + “woman” ≈ “queen”
    approximately hold.
  • Models are trained to learn the mapping into this vector space.
  • The vectors preserve the context in which a word is used.
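The “king” − “man” + “woman” ≈ “queen” arithmetic can be tried out with toy vectors. The three dimensions here (roughly royalty, maleness, femaleness) and their values are invented for illustration; learned embeddings only satisfy the analogy approximately, which is why the answer is taken as the nearest word by cosine similarity rather than an exact match.

```python
import math

# Hypothetical 3-d embeddings: [royalty, maleness, femaleness].
# Values are invented for illustration, not learned from data.
EMBEDDINGS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "prince": [0.9, 1.0, 0.1],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c):
    """Word closest to vec(a) - vec(b) + vec(c), excluding the three query words."""
    target = [x - y + z for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = [w for w in EMBEDDINGS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


print(analogy("king", "man", "woman"))  # queen
```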

Bag-of-words (n-grams):

counts the number of occurrences of each word (or n-gram)

disregarding grammar and even word order
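A minimal bag-of-words / n-gram counter using only the standard library. It shows why unigram counts disregard word order (two sentences with opposite meanings get identical bags) and how bigrams recover some of it. The helper names are hypothetical, not from a particular library.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n=1):
    """Count n-gram occurrences; positions are thrown away."""
    return Counter(ngrams(text.lower().split(), n))


def to_vector(bag, vocabulary):
    """Fixed-length count vector over a chosen vocabulary (Counter gives 0 for missing terms)."""
    return [bag[term] for term in vocabulary]


a = bag_of_ngrams("The dog bit the man")
b = bag_of_ngrams("The man bit the dog")
print(a == b)  # True: unigrams cannot tell who bit whom
print(bag_of_ngrams("The dog bit the man", 2) == bag_of_ngrams("The man bit the dog", 2))  # False
print(to_vector(a, ["the", "dog", "cat"]))  # [2, 1, 0]
```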


Word2Vec:

pre-trained two-layer neural networks

trained on a large corpus, producing high-dimensional word vectors
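The lines above appear to describe Word2Vec. Its skip-gram variant trains the two-layer network to predict the words around each center word, so the training data is a list of (center, context) pairs taken from a sliding window. The sketch below only builds those pairs; the network and its training loop are not shown, and the window size is an assumed parameter.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) training pairs for the skip-gram objective."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context = up to `window` tokens on each side of the center word.
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["book", "a", "flight"], window=1))
# [('book', 'a'), ('a', 'book'), ('a', 'flight'), ('flight', 'a')]
```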


GloVe:

unsupervised learning algorithm

trained on aggregated global word-word co-occurrence statistics
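The lines above match the description of GloVe, which learns vectors from a word-word co-occurrence matrix. A minimal sketch of building that matrix follows, assuming whitespace tokenization and a symmetric window where a pair d words apart adds 1/d (the weighting used in the GloVe paper). `glove_weight` is the paper's weighting function f(x) with x_max = 100 and α = 0.75; the actual fitting of the vectors is not shown.

```python
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Symmetric co-occurrence counts; a pair d words apart adds 1/d."""
    counts = defaultdict(float)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Look back only, then record the pair in both directions.
            for j in range(max(0, i - window), i):
                weight = 1.0 / (i - j)
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return counts


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe's f(x): down-weights rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


counts = cooccurrence_counts(["ice is cold", "steam is hot"])
print(counts[("ice", "cold")])  # 0.5 (two words apart)
print(counts[("is", "cold")])   # 1.0 (adjacent)
```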

Supervised learning approaches

  • Patterns with Part-of-Speech (POS) tags
  • (e.g. NLTK and spaCy)
  • spaCy can build a Knowledge Base for Entity Linking
  • Large corpora of tagged data are used to train machine learning models
  • Examples: Hidden Markov Models (HMMs), decision trees, support vector machines (SVMs), and conditional random fields (CRFs)
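A pattern over part-of-speech tags can be sketched without any library by assuming the tokens are already tagged. In practice a tagger from NLTK or spaCy would supply the tags; the ones below are written by hand using Penn Treebank names. The rule groups consecutive proper nouns (NNP) into candidate entities.

```python
def proper_noun_chunks(tagged_tokens):
    """Group runs of consecutive proper nouns (Penn Treebank tag NNP) into candidate entities."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag == "NNP":
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:  # a run that reaches the end of the sentence
        chunks.append(" ".join(current))
    return chunks


# Tags written by hand for illustration; a real tagger might assign different ones.
tagged = [("Book", "VB"), ("a", "DT"), ("flight", "NN"), ("to", "TO"),
          ("Hong", "NNP"), ("Kong", "NNP"), ("on", "IN"), ("8th", "JJ"),
          ("Dec", "NNP"), ("2020", "CD")]
print(proper_noun_chunks(tagged))  # ['Hong Kong', 'Dec']
```

The pattern cannot tell a city from a month, which is the kind of ambiguity that statistical models trained on tagged corpora are meant to resolve.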

Unsupervised learning approaches

  • based on the context or on entities' simultaneous occurrences (co-occurrence)
  • generally less accurate than supervised approaches

Semi-supervised learning approaches

Machine Learning in NLP

Conditional random field

statistical model: a discriminative, undirected probabilistic graphical model

Linear-chain CRF: overcomes the label bias problem of maximum-entropy Markov models (MEMMs) and performs well on sequence data
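A linear-chain CRF scores whole label sequences: per-token (emission) scores plus label-to-label (transition) scores, which is what lets it model dependencies between neighbouring labels. A sketch of Viterbi decoding, which finds the best-scoring sequence, follows. The scoring functions `emission` and `transition` are hand-written assumptions standing in for learned weights, and training is not shown.

```python
LABELS = ["O", "B-LOC", "I-LOC"]


def emission(token, label):
    """Hypothetical per-token score: capitalized words look like locations."""
    capitalized = token[:1].isupper()
    if label == "O":
        return 0.0 if capitalized else 1.0
    return 1.0 if capitalized else -2.0


def transition(prev, label):
    """Hypothetical label-to-label score: I-LOC must continue an entity."""
    if label == "I-LOC":
        return -5.0 if prev == "O" else 1.0
    return 0.0


def viterbi(tokens, labels, emission, transition):
    """Highest-scoring label sequence, where score = sum of emission and transition scores."""
    # best[i][y] = best score of any labelling of tokens[:i+1] that ends in label y
    best = [{y: emission(tokens[0], y) for y in labels}]
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + transition(p, y)) for p in labels),
                key=lambda item: item[1],
            )
            best[i][y] = score + emission(tokens[i], y)
            back[i][y] = prev
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi(["fly", "to", "Hong", "Kong", "today"], LABELS, emission, transition))
# ['O', 'O', 'B-LOC', 'I-LOC', 'O']
```

The transition penalty is what stops "Kong" from being labelled I-LOC unless an entity has already started.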


Canonical correlation analysis (CCA)

investigates the relationship between two sets of variables

finds linear combinations of the variables in each set that are maximally correlated with each other
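A compact CCA sketch using NumPy, assuming each row of `X` and `Y` is one observation. Whitening each set with the inverse square root of its covariance turns the problem into an SVD whose singular values are the canonical correlations. The small `eps` ridge term is an added assumption for numerical stability.

```python
import numpy as np


def canonical_correlations(X, Y, eps=1e-8):
    """Canonical correlations between column sets X (n x p) and Y (n x q), largest first."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])  # within-set covariances
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)                             # between-set covariance

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    # After whitening both sets, the singular values are the canonical correlations.
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = X @ np.array([[2.0, 0.5], [-1.0, 1.0]]) + 0.01 * rng.normal(size=(500, 2))
print(canonical_correlations(X, Y))                          # both close to 1
print(canonical_correlations(X, rng.normal(size=(500, 2))))  # both close to 0
```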


Recurrent Neural Networks (RNNs)

designed for sequential data
feed the output (hidden state) of the previous step back in as an input

(Bi-directional) GRU, LSTM
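The feedback loop can be written in a few lines of NumPy. This is a plain (Elman-style) RNN with randomly initialized, untrained weights, intended only to show how each hidden state depends on the current input and the previous hidden state; the weight names are illustrative.

```python
import numpy as np


def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence of input vectors; return every hidden state."""
    h = np.zeros(W_h.shape[0])  # initial hidden state
    states = []
    for x in inputs:
        # The previous hidden state is fed back in alongside the current input.
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))  # 5 time steps, 3 features each
W_x = 0.5 * rng.normal(size=(4, 3))
W_h = 0.5 * rng.normal(size=(4, 4))
b = np.zeros(4)
states = rnn_forward(inputs, W_x, W_h, b)
print(states.shape)  # (5, 4)
```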

Long Short-Term Memory (LSTM)

  • A Recurrent Neural Network: earlier inputs in a sequence affect how later inputs are processed
  • Useful for sequential data
  • Has “gates” that control which previous content to keep or forget
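A single LSTM step, sketched in NumPy with the four gate blocks stacked in one weight matrix (an assumed layout; libraries order the gates differently). The forget gate scales the old cell state and the input gate decides how much new content is written, which is how the “gates” control what previous content to keep.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step.

    Shapes (H = hidden size, D = input size): W is (4H, D), U is (4H, H), b is (4H,).
    Gate blocks are stacked as [input, forget, output, candidate] (an assumed layout).
    """
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g  # forget gate keeps old memory, input gate writes new content
    h = o * np.tanh(c)      # output gate decides how much memory to expose
    return h, c


rng = np.random.default_rng(0)
H, D = 2, 3
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H),
                 rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H))
print(h.shape, c.shape)  # (2,) (2,)
```

With the forget gate saturated open and the input gate closed, the cell state passes through a step unchanged, which is how an LSTM can carry information across long sequences.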


ELMo

  • Context matters: the same word gets a different vector in different sentences
  • Uses a bi-directional LSTM trained on a specific task (language modelling) to create those embeddings
  • Can learn from vast amounts of text data without needing labels


BERT

  • Adopts a “masked language model” objective
  • BERT can be fine-tuned for tasks such as NER, or used for feature extraction
  • Understand the meaning of each word (Morphology & Lexicology Layer)
  • Understand the sentence structure (Syntax Layer)
  • Understand the meaning of the sentence (Semantics Layer)
  • Understand the meaning of the sentence in context (Pragmatics Layer)
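The masked language model objective can be illustrated by the corruption step used to create BERT's training inputs: roughly 15% of tokens are selected, and of those 80% become [MASK], 10% become a random token and 10% are left unchanged. This sketch selects each token independently with probability `mask_prob`, which approximates rather than reproduces BERT's exact sampling, and works on whitespace tokens rather than WordPieces.

```python
import random


def mask_tokens(tokens, vocabulary, mask_prob=0.15, seed=None):
    """BERT-style masking: returns (corrupted tokens, {position: original token})."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                  # not selected: no prediction needed here
        targets[i] = token            # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = "[MASK]"   # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocabulary)  # 10%: random token
        # remaining 10%: keep the original token
    return corrupted, targets


sentence = "book a flight to hong kong on friday".split()
print(mask_tokens(sentence, vocabulary=sentence, seed=1))
```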

To Conclude

Have you ever jumped into the middle of a conversation?


Conversation is difficult to understand

Context is hidden

Meanings must be inferred

Users are unpredictable

Design choices to make things better

Design with intents in mind:

Does the user have a limited set of intents to choose from?

Guide and hint the users; provide fallback mechanisms

Collect user data and retrain
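A fallback mechanism can be as simple as refusing to act when the intent classifier's confidence is below a threshold, replying with a hint instead, and keeping the message for later labelling and retraining. The keyword classifier, threshold value and reply text below are hypothetical placeholders for a real NLU model.

```python
FALLBACK = "Sorry, I didn't get that. You can say things like 'book a flight' or 'what's the weather'."

# Hypothetical keyword lists standing in for a trained intent model.
KEYWORDS = {
    "Flight booking": {"book", "flight", "fly"},
    "Weather": {"weather", "rain", "sunny"},
}


def keyword_classifier(message):
    """Return (intent, confidence): the share of an intent's keywords found in the message."""
    words = set(message.lower().split())
    scores = {intent: len(words & kws) / len(kws) for intent, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def respond(message, classify=keyword_classifier, threshold=0.3, review_queue=None):
    """Act on the intent only when confident; otherwise give a hint and log for retraining."""
    intent, confidence = classify(message)
    if confidence < threshold:
        if review_queue is not None:
            review_queue.append(message)  # collected for labelling and retraining
        return FALLBACK
    return f"OK, handling '{intent}'."


queue = []
print(respond("Book a flight please", review_queue=queue))  # OK, handling 'Flight booking'.
print(respond("Tell me a joke", review_queue=queue))        # falls back with a hint
print(queue)                                                 # ['Tell me a joke']
```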