You get to lead a project. Yay!
Your role is to try to figure out the problem.
It uses the same dataset and is similar to the first exercise, but some details are different.
The first part is to talk to the Client.
After getting some context from them,
continue to the next slide...
The Client has started the timer.
You're going to start doing annotations following Annotator's lead to see what information is in the data.
Focus on which fields are missing.
The Client will time-box it.
When the time runs out, go to the next slide.
As you look at the data, two fields are missing:
"number of employees" and "business owner".
Try to convince the client that both of them should be dropped. Keep in mind that they're paying your bills and what they say goes.
The Game Master will tell you when to move on.
Can you come up with clear, consistent annotation rules that also make the client happy?
Click ahead
"You hired me because of my ruthless efficiency. Let's quickly come up with some rules for coffee shops. If we can't do it in 5 minutes, we will have to descope it from this sprint. Let's discuss them, one by one, to distinguish whether a business is a coffee shop or not."
Ask the client first.
If they agree, ask the Annotator what they think.
That didn't work. Let's try something else.
It's a coffee shop if it has a picture of coffee.
Ask the client first.
If they agree, ask the Annotator what they think.
Well this one should definitely work!
If they sell cakes, it's a cake shop, not a coffee shop.
Ask the client first.
If they agree, ask the Annotator what they think.
Uh, oh! This is difficult!
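To see why the last two rules are hard to combine, here is a minimal Python sketch. It assumes each business is described by two hypothetical yes/no fields, `has_coffee_picture` and `sells_cakes`; these field names, the function names, and the example business are made up for illustration and are not part of the dataset or the tool.

```python
def picture_rule(business):
    """Slide rule: a picture of coffee means it IS a coffee shop."""
    # None means the rule is silent about this business.
    return True if business["has_coffee_picture"] else None


def cake_rule(business):
    """Slide rule: selling cakes means it is NOT a coffee shop (it is a cake shop)."""
    return False if business["sells_cakes"] else None


RULES = (picture_rule, cake_rule)


def coffee_shop_verdicts(business):
    """Collect the verdict of every rule that applies to this business."""
    verdicts = {}
    for rule in RULES:
        verdict = rule(business)
        if verdict is not None:
            verdicts[rule.__name__] = verdict
    return verdicts


def is_contradictory(business):
    """True when the applicable rules disagree on 'coffee shop or not'."""
    return len(set(coffee_shop_verdicts(business).values())) > 1


# Made-up example: a place with a coffee picture that also sells cakes.
cafe = {"has_coffee_picture": True, "sells_cakes": True}
print(coffee_shop_verdicts(cafe))  # {'picture_rule': True, 'cake_rule': False}
print(is_contradictory(cafe))      # True
```

Any business that triggers both rules gets two opposite answers, so the rules need a priority order or a rewrite before annotators can apply them consistently.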
It turns out that the rules for the type of business are contradictory, so let's solve that in the "next sprint". Focus on information extraction now.
How are we going to do that?
Click ahead
{"label":"POSTCODE","pattern":[{"IS_DIGIT":true,"LENGTH":{"==":5}}]}
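In the tool provided, we will use this pattern to find postcodes. It is a token-matching rule rather than a true regex: it matches any single token made of exactly five digits.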
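As a rough illustration of what the POSTCODE pattern checks, here is a plain-Python sketch. It does not use the tool's matcher: the whitespace-and-punctuation tokenizer is a crude stand-in, the function name `find_postcodes` is hypothetical, and the example sentences are made up.

```python
import string


def find_postcodes(sentence):
    """Return tokens that are all digits and exactly five characters long."""
    # Crude stand-in for a real tokenizer: split on whitespace,
    # then trim punctuation from both ends of each piece.
    tokens = [piece.strip(string.punctuation) for piece in sentence.split()]
    # Mirrors the two conditions in the pattern: IS_DIGIT and LENGTH == 5.
    return [token for token in tokens if token.isdigit() and len(token) == 5]


print(find_postcodes("Bean There, 12 Main St, Springfield 62704."))  # ['62704']
print(find_postcodes("Call 555123 or visit us at SW1A 1AA"))         # []
```

The second call shows the limits of the rule: six-digit numbers and alphanumeric postcodes are skipped, while any other five-digit number would be flagged as a postcode, so the suggestions still need a human check.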
Business names are very similar to the ORG entity type in spaCy's pre-trained Named Entity Recognition model. Let's use it!
A business owner is a subclass of person, so we can reuse the PERSON entity type for it.
{"label":"BUSINESS_NAME","pattern":[{"ENT_TYPE":"ORG"}]}
{"label":"BUSINESS_NAME","pattern":[{"ENT_TYPE":"GPE"}]}
{"label":"BUSINESS_OWNER_PERSON","pattern":[{"ENT_TYPE":"PERSON"}]}
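The three patterns above amount to a lookup from the NER model's entity type to our own labels. A minimal Python sketch of that mapping follows; it assumes the model's output is available as (token, entity type) pairs, and the hand-written tags below stand in for real predictions. The function name `suggest_labels` and the example sentence are hypothetical.

```python
# Mapping taken directly from the three patterns above.
ENT_TYPE_TO_LABEL = {
    "ORG": "BUSINESS_NAME",
    "GPE": "BUSINESS_NAME",
    "PERSON": "BUSINESS_OWNER_PERSON",
}


def suggest_labels(tagged_tokens):
    """Turn (token, ent_type) pairs into (token, label) suggestions.

    Tokens whose entity type has no pattern (including '') are skipped.
    """
    return [
        (token, ENT_TYPE_TO_LABEL[ent_type])
        for token, ent_type in tagged_tokens
        if ent_type in ENT_TYPE_TO_LABEL
    ]


# Hand-written tags for a made-up sentence; a real model may tag it differently.
tagged = [
    ("Maria", "PERSON"), ("Lopez", "PERSON"), ("opened", ""),
    ("Bean", "ORG"), ("There", "ORG"), ("in", ""), ("Leeds", "GPE"),
]
for token, label in suggest_labels(tagged):
    print(token, label)
```

Note that every PERSON becomes a suggested owner and every GPE a suggested business name ("Leeds" here), whether or not that is correct, which is why the Annotator reviews each suggestion.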
The data will already be loaded into the tool
You will only see one sentence at a time so that it's easier to filter information
Let's have another look at the tool we are going to use. Ask the Annotator to open the link to it.
Unnecessary today:
In the future, we can test our rules/models on them