Upkar Lidder
Upkar Lidder is a Full Stack Developer and Data Wrangler with a decade of development experience in a variety of roles. He can be seen speaking at various conferences and participating in local tech groups and meetups.
Developer Advocate, IBM
> ulidder@us.ibm.com
> @lidderupk
> upkarlidder.com
IBM Signup - http://bit.ly/hacker-ibm
Workshop Material - http://bit.ly/hacker-nlp
Wifi - HD-Events / oz5l8x
@lidderupk
IBM Developer
1. Create an IBM Cloud account using the URL below.
2. Check your email and activate your account. Once activated, log back into your IBM Cloud account using the same link.
3. If you already have an account, use the same URL to sign in to your IBM Cloud account.
http://bit.ly/hacker-ibm
https://developer.ibm.com/callforcode/
Background
Research
Data
Videos
Solutions
Blogs/Articles
Code patterns
Watson Assistant lets you build conversational interfaces into any application, device, or channel.
Everything AI and ML with IBM: a project-based development environment with access to assets such as Cloud Object Storage, Machine Learning services, AutoAI, SPSS, Jupyter Notebooks, and more.
Cognitive search and content analysis in your unstructured documents.
Create a custom classifier for your domain data. You create a classifier instance by providing a set of representative strings and, for each training string, a set of one or more correct classes.
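As a rough picture of that training input, the sketch below serializes labeled examples as header-less CSV, one representative string per row followed by its class or classes. It assumes this text-then-classes CSV layout and uses made-up example strings; check the service documentation for the exact format and limits before uploading.

```python
import csv
import io

# Hypothetical training examples: (representative text, [one or more classes]).
examples = [
    ("Congrats! You have won a free cinema pass, call now!", ["spam"]),
    ("Will purchase the stuff today and mail it to you.", ["ham"]),
]

def to_training_csv(rows):
    """Serialize examples as header-less CSV: the text first, then each class."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields that contain commas
    for text, classes in rows:
        writer.writerow([text, *classes])
    return buf.getvalue()

print(to_training_csv(examples), end="")
```

The csv module takes care of quoting, so a message containing a comma (like the first example) still parses as a single text field.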
Create, store, and deploy a Natural Language Classifier model on IBM Cloud using the Watson Studio and Watson Machine Learning services.
The classifier will categorize an email as ham or spam.
Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! C Suprman V, Matrix3, StarWars3, etc all 4 FREE! bx420-ip4-5we. 150pm. Dont miss out!
Aft i finish my lunch then i go str down lor. Ard 3 smth lor. U finish ur lunch already?
Will purchase d stuff today and mail to you. Do you have a po box number?
I won a coupon for $400. I am happy.
Tokenization
Input: Friends, Romans, Countrymen, lend me your ears;
Output:
Friends
Romans
Countrymen
lend
me
your
ears
Remove Stop Words
Noise
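A minimal Python sketch of the tokenization and stop-word steps above, assuming a simple regular-expression tokenizer and a tiny, hand-picked stop-word list. STOP_WORDS is illustrative only; NLP libraries ship far longer lists, and which words count as stop words depends on the task.

```python
import re

# Illustrative stop-word list (an assumption; real toolkits ship much longer lists).
STOP_WORDS = {"me", "your", "a", "an", "the", "and", "of", "to"}

def tokenize(text):
    """Split text into word tokens, dropping punctuation such as ',' and ';'."""
    return re.findall(r"[A-Za-z']+", text)

def remove_stop_words(tokens):
    """Drop tokens that appear (case-insensitively) in STOP_WORDS."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
print(tokens)
print(remove_stop_words(tokens))
```

With this list, "me" and "your" are filtered out as noise, leaving Friends, Romans, Countrymen, lend, and ears.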
Stemming & Lemmatization
car, cars, car's, cars' → car
am, are, is → be
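The mappings above can be sketched in Python as two toy functions: a suffix-stripping stemmer and a dictionary lookup for irregular forms. Both are illustrative assumptions, far cruder than real stemmers (such as Porter's) or dictionary-based lemmatizers; for instance, naive_stem would wrongly shorten "class" to "clas".

```python
# Hypothetical lookup table for irregular forms (lemmatization needs vocabulary knowledge).
LEMMAS = {"am": "be", "are": "be", "is": "be"}

def naive_stem(word):
    """Crude stemming: strip a possessive or plural ending from a word.

    The length check keeps very short words (e.g. "is") from being cut down.
    """
    w = word.lower()
    for suffix in ("s'", "'s", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

def normalize(word):
    """Use the lemma table when the word is known, otherwise fall back to stemming."""
    w = word.lower()
    if w in LEMMAS:
        return LEMMAS[w]
    return naive_stem(w)

for group in (["car", "cars", "car's", "cars'"], ["am", "are", "is"]):
    print(group, "->", {normalize(w) for w in group})
```

Stemming alone cannot map "am" or "is" to "be", since they share no suffix pattern; that is why the lemma table is consulted first.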
Bag of Words
Term Frequency
tf-idf - Term Frequency, Inverse Document Frequency
The term frequency is the raw count of a term in a document, i.e., the number of times that term t occurs in document d.
The inverse document frequency measures how much information the word provides, i.e., whether it is common or rare across all documents.
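One common way to combine the two quantities is tf-idf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The sketch below computes that variant on a made-up three-document corpus; libraries such as scikit-learn use smoothed variants by default, so their numbers will differ.

```python
import math
from collections import Counter

def term_frequency(doc_tokens):
    """Bag of words: raw count of each term t in one document d."""
    return Counter(doc_tokens)

def inverse_document_frequency(corpus):
    """idf(t) = log(N / df(t)), where df(t) is the number of documents containing t."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(corpus):
    """Return one {term: tf * idf} dict per document."""
    idf = inverse_document_frequency(corpus)
    return [{t: c * idf[t] for t, c in term_frequency(doc).items()} for doc in corpus]

# Hypothetical, pre-tokenized documents.
corpus = [
    "you have won a free trip".split(),
    "have you finished your lunch".split(),
    "you won a free coupon".split(),
]
scores = tf_idf(corpus)
print(round(scores[1]["lunch"], 4))  # 1.0986
```

Because "you" appears in every document, its idf is log(3/3) = 0, so it contributes nothing; "lunch" appears in only one document and gets the largest weight, log 3 ≈ 1.10.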
import pandas as pd
# create a dataframe from the data file
df = pd.read_csv('SpamHam-Train-new.csv')
# extract the spam and ham data into separate dataframes
df_spam = df[df['Ham-Spam'] == 'spam']
df_ham = df[df['Ham-Spam'] == 'ham']
# hold out 5 random spam examples for testing and drop them from the training set
df_test_spam = df_spam.sample(5)
df_spam = df_spam.drop(index=df_test_spam.index)
# hold out 5 random ham examples for testing and drop them from the training set
df_test_ham = df_ham.sample(5)
df_ham = df_ham.drop(index=df_test_ham.index)
# print out the number of rows in each frame
print('#rows in df_spam: ' + str(df_spam.shape[0]))
print('#rows in df_test_spam: ' + str(df_test_spam.shape[0]))
print('#rows in df_ham: ' + str(df_ham.shape[0]))
print('#rows in df_test_ham: ' + str(df_test_ham.shape[0]))
# save each data frame to a file of its own
df_spam.to_csv('./spam_train.csv', index=False)
df_test_spam.to_csv('./spam_test.csv', index=False)
df_ham.to_csv('./ham_train.csv', index=False)
df_test_ham.to_csv('./ham_test.csv', index=False)
#rows in df_spam: 668
#rows in df_test_spam: 5
#rows in df_ham: 4320
#rows in df_test_ham: 5
http://bit.ly/hacker-ibm
You will need the ham_train.csv and spam_train.csv files.
{
"classifier_id": "2ef34ex558-nlc-1192",
"url": "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/2ef34ex558-nlc-1192",
"collection": [
{
"classes": [
{
"class_name": "ham",
"confidence": 0.9951709163892192
},
{
"class_name": "spam",
"confidence": 0.004829083610780819
}
],
"text": "Let's go to Alaska this weekend !",
"top_class": "ham"
},
{
"classes": [
{
"class_name": "spam",
"confidence": 0.97534853311385
},
{
"class_name": "ham",
"confidence": 0.02465146688614998
}
],
"text": "You have won a trip to Alaska !!",
"top_class": "spam"
}
]
}
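To act on a response like the one above, a client can pick out each text's top_class and the confidence reported for it. The sketch below parses a trimmed copy of that JSON with Python's json module; the summarize helper is hypothetical and not part of any IBM SDK.

```python
import json

# A trimmed copy of the classifier response shown above.
RESPONSE = """
{
  "collection": [
    {"text": "Let's go to Alaska this weekend !", "top_class": "ham",
     "classes": [{"class_name": "ham", "confidence": 0.9951709163892192},
                 {"class_name": "spam", "confidence": 0.004829083610780819}]},
    {"text": "You have won a trip to Alaska !!", "top_class": "spam",
     "classes": [{"class_name": "spam", "confidence": 0.97534853311385},
                 {"class_name": "ham", "confidence": 0.02465146688614998}]}
  ]
}
"""

def summarize(response):
    """Return (text, top_class, confidence of top_class) for each classified text."""
    rows = []
    for item in response["collection"]:
        # Find the confidence entry that matches the reported top_class.
        confidence = next(c["confidence"] for c in item["classes"]
                          if c["class_name"] == item["top_class"])
        rows.append((item["text"], item["top_class"], confidence))
    return rows

for text, label, confidence in summarize(json.loads(RESPONSE)):
    print(f"{label:>4}  {confidence:.3f}  {text}")
```

Looking the confidence up by class_name, rather than assuming the first entry is the top class, keeps the helper correct even if the classes list were returned in a different order.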
Workshop
Upkar Lidder, IBM
@lidderupk
https://github.com/lidderupk/
ulidder@us.ibm.com
IBM Developer SF