IBM Watson Natural Language Processing

Upkar Lidder

Developer Advocate, IBM 

 

> ulidder@us.ibm.com

> @lidderupk

> upkarlidder.com

 

IBM Developer

Workshop Material - http://bit.ly/hacker-nlp

Wifi - HD-Events / oz5l8x

Prerequisites

1. Create an IBM Cloud account using this URL: http://bit.ly/hacker-ibm

2. Check your email and activate your account. Once activated, log back into your IBM Cloud account using the link above.

3. If you already have an account, use the above URL to sign in to your IBM Cloud account.

Call for Code 2019

Call for Code 2019 - Get started quickly

https://developer.ibm.com/callforcode/

  • Background
  • Research
  • Data
  • Videos
  • Solutions
  • Blogs/Articles
  • Code patterns

IBM Code Patterns

NL on IBM Cloud

Watson Assistant

Watson Assistant lets you build conversational interfaces into any application, device, or channel.

Watson Studio

A project-based development environment for everything AI and ML at IBM, with access to assets such as Cloud Object Storage, Machine Learning services, AutoAI, SPSS, and Jupyter Notebooks.

Discovery

Cognitive search and content analysis in your unstructured documents.

Natural Language Classifier

Create a custom classifier for your domain data. You create a classifier instance by providing a set of representative strings and one or more correct classes for each training string.

Watson Studio & Watson Machine Learning

Watson Studio

IBM Watson Studio 

Watson Custom Speech to Text Model

Workshop - Goals

Create, store, and deploy a Natural Language Classifier model on IBM Cloud using Watson Studio and Watson Machine Learning services.
The classifier will categorize an email as ham or spam.

Question - ham or spam

Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! C Suprman V, Matrix3, StarWars3, etc all 4 FREE! bx420-ip4-5we. 150pm. Dont miss out!

Aft i finish my lunch then i go str down lor. Ard 3 smth lor. U finish ur lunch already?

Will purchase d stuff today and mail to you. Do you have a po box number?

I won a coupon for $400. I am happy.

Supervised Learning - or Unsupervised Learning

Our model by the end of the workshop - ham or spam

Preprocessing for NLP  - remove "noise"

Tokenization

Input: Friends, Romans, Countrymen, lend me your ears;

Output: Friends | Romans | Countrymen | lend | me | your | ears
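A minimal sketch of the tokenization step above, assuming a simple rule that keeps runs of letters, digits, and apostrophes and discards everything else. The `tokenize` helper is illustrative only and is not part of any Watson API; production tokenizers handle many more cases.

```python
import re

def tokenize(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    # Keep runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[A-Za-z0-9']+", text)

tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
print(tokens)
```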

Remove Stop Words

Noise

Input: Friends, Romans, Countrymen, lend me your ears;
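Stop-word removal can be sketched as a filter over the token list. The `STOP_WORDS` set below is a tiny hypothetical list chosen for this example; real stop-word lists usually contain a few hundred entries and depend on the language and the task.

```python
# A tiny, hypothetical stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "me", "my", "your", "is", "to", "of"}

def remove_stop_words(tokens):
    """Drop tokens whose lowercase form appears in STOP_WORDS."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["Friends", "Romans", "Countrymen", "lend", "me", "your", "ears"]
filtered = remove_stop_words(tokens)
print(filtered)
```

With this list, "me" and "your" are treated as noise and the remaining tokens carry the content.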

Stemming & Lemmatization

car, cars, car's, cars' → car

am, are, is → be
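The two mappings above can be sketched with a toy normalizer: a dictionary lookup stands in for lemmatization and a naive suffix rule stands in for stemming. Both the `LEMMAS` table and the suffix rule are assumptions made for this example; the rule over-stems words such as "class", which is why real projects use an established stemmer or lemmatizer.

```python
# Hypothetical lookup table for irregular forms (a stand-in for lemmatization).
LEMMAS = {"am": "be", "are": "be", "is": "be"}

def normalize(word):
    """Map a word to a base form: dictionary lookup first, then a naive stem."""
    word = word.lower()
    if word in LEMMAS:
        return LEMMAS[word]
    # Naive stemming: drop apostrophes, then a single trailing plural "s".
    word = word.replace("'", "")
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word

print([normalize(w) for w in ["car", "cars", "car's", "cars'"]])
print([normalize(w) for w in ["am", "are", "is"]])
```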

Vectorizing - models do not understand text, so text is converted to vectors

Bag of Words
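A minimal bag-of-words sketch, assuming documents are already tokenized. Each document becomes a vector of term counts over a shared vocabulary; word order is discarded. The sample documents and the `bag_of_words` helper are made up for illustration.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Return a count vector for tokens, ordered by vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

docs = [
    ["free", "cinema", "pass", "free"],
    ["lunch", "pass"],
]
# Build a sorted vocabulary over all documents so vector positions are stable.
vocabulary = sorted({term for doc in docs for term in doc})
vectors = [bag_of_words(doc, vocabulary) for doc in docs]
print(vocabulary)
print(vectors)
```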

Term Frequency - the raw count of a term in a document, i.e., the number of times that term t occurs in document d.

tf-idf - Term Frequency, Inverse Document Frequency

The inverse document frequency is a measure of how much information the word provides, i.e., whether it is common or rare across all documents.
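A small sketch of tf-idf under one common definition: tf(t, d) is the raw count of t in d, and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. This variant is an assumption for the example; libraries often apply smoothing or normalization on top of it.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [
    ["free", "cinema", "pass", "free"],
    ["lunch", "pass"],
]
scores = tf_idf(docs)
for doc_scores in scores:
    print(doc_scores)
```

Here "pass" appears in every document, so its weight is zero, while "free" is both frequent in the first document and rare across the collection, so it scores highest.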

But First - prepare the data

import pandas as pd

# create a dataframe from the data file
df = pd.read_csv('SpamHam-Train-new.csv')

# extract the spam and ham data into separate dataframes
df_spam = df[df['Ham-Spam'] == 'spam']
df_ham = df[df['Ham-Spam'] == 'ham']

# take out 5 examples to test for spam
df_test_spam = df_spam.sample(5)
df_spam = df_spam.drop(index=df_test_spam.index)

# take out 5 examples to test for ham
df_test_ham = df_ham.sample(5)
df_ham = df_ham.drop(index=df_test_ham.index)

# print out the number of rows in each frame
print('#rows in df_spam: ' + str(df_spam.shape[0]))
print('#rows in df_test_spam: ' + str(df_test_spam.shape[0]))
print('#rows in df_ham: ' + str(df_ham.shape[0]))
print('#rows in df_test_ham: ' + str(df_test_ham.shape[0]))

# save each data frame to a file of its own
df_spam.to_csv('./spam_train.csv', index=False)
df_test_spam.to_csv('./spam_test.csv', index=False)
df_ham.to_csv('./ham_train.csv', index=False)
df_test_ham.to_csv('./ham_test.csv', index=False)

#rows in df_spam: 668
#rows in df_test_spam: 5
#rows in df_ham: 4320
#rows in df_test_ham: 5

Steps

  1. Sign up / Log into IBM Cloud - http://bit.ly/hacker-ibm
  2. Create Watson Studio Service.
  3. Sign into Watson Studio and create a new Data Science project. This also creates a Cloud Object Storage instance for you.
  4. Upload the CSV data to your project - spam and ham.
  5. Add a new Natural Language Classifier to your project.
  6. Create an NLC model.
  7. Add labelled classes and training data to the model.
  8. Test your model in a Jupyter Notebook!

Step 1 - sign up/ log into IBM Cloud

http://bit.ly/hacker-ibm

Step 2 - locate Watson Studio in Catalog

Step 3 - create Watson Studio instance

Resources List - If you want to look at all your services

Step 4 - launch Watson Studio

Step 5 - create a new project

Step 6 - pick Standard project

Step 6a - pick US South as region if asked

Step 7 - give the project a name and assign COS

Step 7a - you may have to add a new storage from this page

Step 8 - open asset tab

Step 9 - drag and drop data file into Load Assets

You will need the ham_train.csv and spam_train.csv files.

Step 10 - add Natural Language Classifier model to the project

Step 10a - add Natural Language Classifier service

Step 11 - create ham and spam classes

Step 12 - add training data

Step 13 - train the model

Step 14 - service credentials for the Notebook from NLC service
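A hedged sketch of how the service credentials might be used from the Notebook. It assumes the NLC v1 REST endpoint `classify_collection` and HTTP basic auth with the literal username `apikey`; the `build_classify_request` helper and the placeholder credential values are hypothetical. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

def build_classify_request(url, apikey, classifier_id, texts):
    """Build (but do not send) a classify_collection request."""
    endpoint = f"{url}/v1/classifiers/{classifier_id}/classify_collection"
    body = json.dumps({"collection": [{"text": t} for t in texts]}).encode("utf-8")
    # Assumption: the API key is sent as basic auth with username "apikey".
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )

# Placeholder values: copy the real ones from the NLC service credentials page.
request = build_classify_request(
    url="https://gateway.watsonplatform.net/natural-language-classifier/api",
    apikey="YOUR_API_KEY",
    classifier_id="YOUR_CLASSIFIER_ID",
    texts=["Let's go to Alaska this weekend !", "You have won a trip to Alaska !!"],
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)` followed by `json.load` on the response.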

Output from Notebook

{
  "classifier_id": "2ef34ex558-nlc-1192",
  "url": "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/2ef34ex558-nlc-1192",
  "collection": [
    {
      "classes": [
        {
          "class_name": "ham",
          "confidence": 0.9951709163892192
        },
        {
          "class_name": "spam",
          "confidence": 0.004829083610780819
        }
      ],
      "text": "Let's go to Alaska this weekend !",
      "top_class": "ham"
    },
    {
      "classes": [
        {
          "class_name": "spam",
          "confidence": 0.97534853311385
        },
        {
          "class_name": "ham",
          "confidence": 0.02465146688614998
        }
      ],
      "text": "You have won a trip to Alaska !!",
      "top_class": "spam"
    }
  ]
}
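A short sketch of reading the Notebook output with the standard `json` module. The response text is a trimmed copy of the output shown in this deck (the `url` field is omitted); the `summarize` helper is hypothetical.

```python
import json

# Trimmed copy of the classifier response shown in this deck.
response_text = """
{
  "classifier_id": "2ef34ex558-nlc-1192",
  "collection": [
    {"classes": [{"class_name": "ham", "confidence": 0.9951709163892192},
                 {"class_name": "spam", "confidence": 0.004829083610780819}],
     "text": "Let's go to Alaska this weekend !",
     "top_class": "ham"},
    {"classes": [{"class_name": "spam", "confidence": 0.97534853311385},
                 {"class_name": "ham", "confidence": 0.02465146688614998}],
     "text": "You have won a trip to Alaska !!",
     "top_class": "spam"}
  ]
}
"""

def summarize(response):
    """Return (text, top_class, confidence of top_class) per classified text."""
    rows = []
    for item in response["collection"]:
        confidence = {c["class_name"]: c["confidence"] for c in item["classes"]}
        rows.append((item["text"], item["top_class"], confidence[item["top_class"]]))
    return rows

for text, label, score in summarize(json.loads(response_text)):
    print(f"{label:>4}  {score:.3f}  {text}")
```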

Optional - web application using classifier

Workshop

Thank you

 

Let's chat !

Upkar Lidder, IBM

@lidderupk

https://github.com/lidderupk/

ulidder@us.ibm.com
