The ML
Happy (Half) Hour
¯\_(ツ)_/¯
---
1) explain what we think is now possible
2) look under the hood to give some intuition of how this is possible
3) what are the ways this isn’t magical
(i.e. where it doesn’t work)
4) brainstorm about implications
Aim:
"The team to come away with
2-3 'things we’ve learned'
that help them ask better questions and evaluate ML companies in a way they couldn't before"
Aim:
a) OpenAI paper released today
b) Predictions for AI
c) Open discussion =]
Loose plan
Exploring the intersections of:
Unsupervised language modeling
+
Web crawling
+
Easier to use "programming" via deep learning
For my work
LMs as core primitives
Language models (LMs) are a supercharged version of your phone's predictive text
Prediction = LMs leverage every piece of info they can to answer "what's next given past history"
Storage = LMs act as a form of compression
Communication = LMs can change their "language" depending on context
Step 1: Tokenization
Tokenization is fast and learns predictable repeated surface structure (~compression)
by-product of which was increased tourism to the town.
by█-█produc█t █of█ which█ was█ increase█d to█uri█s█m █to█ the█ to█wn█. █
Step 1: Tokenization
{"title": "█I█ █fo█un█d █a█ s█ec█re█t █spot█!█", "subreddit": "█FortNiteBR", █"is_self": true, █"url": "https://www█
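The token boundaries above come from a learned subword vocabulary. A toy sketch of the byte-pair-encoding idea (the family such tokenizers belong to — `merge_once` and the tiny corpus are illustrative, not OpenAI's actual code):

```python
from collections import Counter

def merge_once(words):
    """Find the most frequent adjacent symbol pair and merge it everywhere."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged, best

# Toy corpus: frequent character pairs get fused into subword tokens
corpus = Counter(["town", "tourism", "to", "to", "to"])
words = {tuple(w): f for w, f in corpus.items()}
for _ in range(3):
    words, pair = merge_once(words)
    print(pair, words)
```

The very first merge fuses `t` + `o` into a single `to` token, since it is the most common pair — predictable repeated surface structure, exactly as in the `to█uri█s█m` example above.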
Step 2: Predict next
Where the complex and large language model comes in - given history, guess the next token
{"title": "█I█ █fo█un█d █a█ s█ec█re█t █spot█!█", "subreddit": "█FortNiteBR", █"is_self": true, █"url": "https://www█
epic█games█.com█/█f█ornite█/█
epicgames.com█/█f█ornite█/█
ep█ic█ga█me█s█.com
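Step 2 can be caricatured with a lookup table of counts: given the history, pick the token that most often followed it. Real LMs replace this table with a large neural network that generalizes to unseen histories. (A toy bigram sketch, not OpenAI's model:)

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, what came next: 'what's next given past history'."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(model, token):
    """Guess the most frequently seen continuation."""
    return model[token].most_common(1)[0][0]

tokens = "the town saw increased tourism to the town".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))  # 'town' followed 'the' both times
```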
OpenAI's new work
Released at 9am this morning ^_^
---
What OpenAI did was...
"OpenAI has basically shown that if the predictive text in your mobile had a supercomputer behind it, you could tab complete real work after it read enough of the web."
- Smerity's tl;dr
An extension of ...
The team at OpenAI performed character-level language modeling on Amazon reviews.
A single neuron learned to track sentiment with no "supervision".
What OpenAI did was...
Crawled the outgoing links from Reddit and then extracted the text => 20GB
Trained a very large language model (on 256 of Google's TPUs) over the dataset
Tested the language model with zero tuning on:
- reading comprehension
- translation
- summarization
- question answering
OpenAI's results
All of this is from picking up
naturally occurring patterns in language
Translation (ish)
Question answering
Summarization
"To induce summarization behavior we add the text TL;DR: after the article"
(though results aren't great ...)
My reaction to OpenAI
Similar to much of my own stack (reassuring)
Puts more weight behind my core thesis:
mining knowledge from the web using LMs
Humans can teach language models by writing normally as they would on the web
Reworded: You can likely bring value to the long tail of online communities by simply reading their shared data
Limits of OpenAI's work
Mostly what we've seen in the past but bigger
Models are too large for sane production
Requires huge resources (256 TPUs) for training
Not able to precisely control output
(IMHO) hyped via a danger narrative
LMs as core primitives
Language models (LMs) are a supercharged version of your phone's predictive text
Prediction = LMs leverage every piece of info they can to answer "what's next given past history"
Storage = LMs act as a form of compression
Communication = LMs can optimize their "language" depending on task and shared info
Predictions on ML + LM
> Language models extract + compress knowledge
> File formats will be deprecated
> Declarative programming for non-programmers
A new ML first programming language
(A Turing complete LM could serve as the building block of a programming language)
Predictions on ML + LM
LMs aren't limited to text - "guess the next X" works in {vision, audio, physics, ...}
"So what knowledge is left
unextracted
from the data we already have..?"
"How many proprietary datasets are actually worthless, as you can recreate that knowledge from an unsupervised method and an open dataset?"
Predictions on ML + LM
As an example I went to use Magic Pony (previously acquired by Twitter) but instead found WaveOne:
Predictions on ML + LM
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa = 379 bytes
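The run of 379 'a's above is the classic compression demo: predictable data shrinks. A quick check with zlib, standing in for the stronger, learned compression a language model provides:

```python
import zlib

data = b"a" * 379          # 379 bytes of highly predictable input
packed = zlib.compress(data)
print(len(data), "->", len(packed), "bytes")
```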
Predictions on ML + LM
File formats will be deprecated:
If we can translate between Fr<>En,
why bother with JSON<>CSV?
by█-█produc█t █of█ which█ was█ increase█d to█uri█s█m █to█ the█ to█wn█. █
{"title": "█I█ █fo█un█d █a█ s█ec█re█t █spot█!█", "subreddit": "█FortNiteBR", █"is_self": true, █"url": "https://www█
Predictions on ML + LM
File formats will be deprecated:
If we can translate between Fr<>En,
why bother with JSON<>CSV?
Humans have internalized many menial format-wrangling chores, which is a huge opportunity (= reduce pain)
(keep your data in the best human format!)
Bonus: language models are a form of compression so your data is smaller :)
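The chore in question, done by hand today with format-specific code; the prediction is that an LM learns mappings like this from examples rather than from a hand-written converter. (A hypothetical sketch of the JSON->CSV step itself, with made-up records:)

```python
import csv
import io

# The same data in JSON-ish form (a list of dicts)...
records = [
    {"title": "I found a secret spot!", "subreddit": "FortNiteBR", "is_self": True},
    {"title": "Patch notes discussion", "subreddit": "FortNiteBR", "is_self": False},
]

# ...translated to CSV: same content, different surface format
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "subreddit", "is_self"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```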
Predictions on ML + LM
Declarative programming for non-programmers
Predictions on ML + LM
Declarative programming for non-programmers
SQL and databases allowed complex queries over structured data in a declarative way ...
What about complex queries on unstructured data? Do we (as humans) need to read the original data to understand or organize it?
"Find all articles on the Hacker News homepage about machine learning"
Predictions on ML + LM
Declarative programming for non-programmers
Bonus A: The representation of the data improves according to how it's queried and used
Bonus B: As the underlying model is a language model, we can use it to suggest to the user what to do next or indicate potential errors
"Find all articles on the Hacker News homepage about machine learning<TAB> and save to CSV?"
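The declarative query above can be caricatured today with keyword matching standing in for the LM's actual understanding; the article titles and keyword list here are made up for illustration:

```python
def query(articles, topic_keywords):
    """'Find all articles about X' — keyword match standing in for an LM classifier."""
    return [a for a in articles
            if any(k in a.lower() for k in topic_keywords)]

homepage = [
    "Show HN: A machine learning library in 100 lines",
    "The history of the vacuum tube",
    "OpenAI releases a new language model",
]
ml_keywords = ["machine learning", "language model", "neural"]
print(query(homepage, ml_keywords))
```

The point of the prediction is that the LM removes the brittle keyword list: the user states intent, and the model decides what counts as "about machine learning".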
Predictions on ML + LM
> Language models extract + compress knowledge
> File formats will be deprecated
> Declarative programming for non-programmers
A new ML first programming language
(A Turing complete LM could serve as the building block of a programming language)
Minimums
Do they have a way of making and testing hypotheses about the company?
Even analytics (an SQL query run and turned into a graph) provides a heartbeat on the business
Extreme example: Freelancer.com's graphs
(inspired by Harrah's Casino)
Even enough data to ask:
"Is today the same as yesterday?"
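"Is today the same as yesterday?" needs only a trivial comparison once any metric is logged at all; the numbers and the 20% tolerance below are made up:

```python
def same_as_yesterday(today, yesterday, tolerance=0.2):
    """A one-line heartbeat: did the metric move more than `tolerance` (20%)?"""
    if yesterday == 0:
        return today == 0
    return abs(today - yesterday) / yesterday <= tolerance

signups = {"yesterday": 120, "today": 45}
print(same_as_yesterday(signups["today"], signups["yesterday"]))  # big drop -> False
```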
Key take-aways
If ML is core:
How are they architecting for the long term?
If ML isn't core:
Do they have someone who can keep up with the off-the-shelf open source components?
{Google, NVIDIA, Facebook, Microsoft} are giving away their latest work
Startup Potential
The existing incumbents have optimized for the last war and are just as confused as anyone
The location and depth of moats is shifting
Low hanging fruit is everywhere
(many times more valuable than PageRank)
To existing startups
The open source ecosystem can be leveraged to help and is essentially R&D for you sponsored by {Google, NVIDIA, Facebook, ...}
AI is really just a tool for listening to your customers. If they're not already listening (e.g. via basic analytics) this won't magically help them.
Facebook's LASER
Sentence encoding (=understanding how sentences are similar) for 93 languages
"[N]o need to specify the input language ... According to our experience, the sentence encoder also supports code-switching, i.e. the same sentences can contain words in several different languages."
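Once an encoder like LASER maps sentences to vectors, "how similar are these sentences" reduces to cosine similarity between those vectors. The toy 3-d vectors below are placeholders for real LASER embeddings; only the similarity arithmetic is shown:

```python
import math

def cosine(u, v):
    """Similarity of two sentence vectors: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy stand-ins: a cross-lingual encoder maps paraphrases, in any language,
# to nearby vectors
en = [0.9, 0.1, 0.2]    # "The cat sat on the mat"
fr = [0.88, 0.12, 0.2]  # "Le chat s'est assis sur le tapis"
other = [0.1, 0.9, 0.3] # an unrelated sentence
print(cosine(en, fr), cosine(en, other))
```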
ML Happy Half Hour
By smerity