Intro to Artificial Intelligence

October 2016, Stavros Vassos Helvia.io

(a very basic)
Intro to
(some part of)
Artificial Intelligence 

Artificial Intelligence through some applications

Self-driving cars

Games: Go

Games: Jeopardy

Movie recommendations

Predictive text

Artificial Intelligence through some applications

  • Some cases are clear in our head that this is AI because we relate them with "thinking" abilities.
  • Some cases are less exciting and we tend to not acknowledge AI behind them.
  • E.g, in principle spam filtering uses similar AI techniques as some of the previous examples.

Artificial Intelligence through some applications

  • Typically the exciting cases require a complex mix of methods from many subfields of AI and Data Science.
  • In any case, many exciting results have been driven mostly by the recent advances Machine Learning.
  • So let's go through some basic concepts and terminology!

Overview

  • Machine Learning
  • Classification
  • Deep Learning
  • Live experiment!
  • More

Machine Learning

Machine Learning (ML)

  • Arthur Samuel way back in 1959: “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.

Programming vs ML

  • What does it mean to program something explicitly?
  • ML systems are programmed of course, but the way to achieve their functionality is not given by the programmer.
     
  • For instance, an ML system detects cats in images; the programmer has not given "cat detection" rules.
  • In contrast, the program that handles an ATM transaction has a very explicit and fixed rules.

Learning by example

  • Ok, so what is learning and how it works?
  • The programmer needs to give something to the ML system in order to program itself
  • And this is data, in particular lots of data!
     
  • For instance, in an ML system detects cats in images; the programmer has not given "cat detection" rules.
  • But the programmer gives a huge number of cat images so that the system can learn on its own!

Machine Learning (ML)

  • We prepare a big dataset of instances of the problem we want to solve, e.g., lots of cat images.
  • The ML system uses the dataset to train itself and create a model of problem we want to solve.
  • Then the model can be used to predict the answer to new problem instances, e.g., is this new image a cat?

Un/Supervised ML

  • There are two big categories depending on the dataset we provide (and different results we can get).
  • Supervised learning: we go over the dataset and mark ourselves the right answer.
  • Unsupervised learning: we give the dataset without hints and let the system figure out patterns.

Un/Supervised ML

  • There are two big categories depending on the dataset we provide (and different results we can get).
  • Supervised learning: we go over the dataset and mark ourselves the right answer.
  • Unsupervised learning: we give the dataset without hints and let the system figure out patterns.

ML input/output?

  • How do we actually give something as input to an ML system, e.g., a picture?
  • What is the output we received?
  • Let's see a specific class of problems and an example to make things more concrete.

Classification

Ice-cream example

  • Here's a simple (silly but useful) scenario.
  • We have data about how people eat their ice cream.
  • In particular we know per person:
    • how much time it took them to eat it;
    • how much noise they were making while eating it;
    • whether they are kids or not.
  • We want to predict whether a new person is a kid based on their ice-cream eating behavior.

Dataset

  • Let's make a training dataset for supervised ML:
    • x1: time to eat the ice cream;
    • x2: noise while eating the ice cream;
    • we put an O if they are kids and X otherwise.

Model

  • Let's assume we can separate the two classes of ice cream eaters using a simple linear function (model):
f(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}
f(x1,x2)={1,if w1x1+w2x2+b>00,otherwisef(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}

Training

  • In the training phase, the ML system looks into the labelled data and learns the parameters of the function so that when f=1 then the person is a kid.
f(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}
f(x1,x2)={1,if w1x1+w2x2+b>00,otherwisef(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}

Training

  • There is no single solution and the way to find one is numerical in the sense that we start with a random setting and update until it fits the data.
f(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}
f(x1,x2)={1,if w1x1+w2x2+b>00,otherwisef(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}

Classification

  • Now when we have information about a new person in terms of time and noise, the ML system can predict whether they are a kid.
f(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}
f(x1,x2)={1,if w1x1+w2x2+b>00,otherwisef(x_1,x_2) = \begin{cases} 1, & \text{if}\ w_1 x_1 + w_2 x_2 + b > 0 \\ 0, & \text{otherwise} \end{cases}

Classification

  • We want an ML system to learn to identify kids based on their ice cream eating behavior (time, noise).
  • We prepare a dataset with positive and negative examples (the Xs and the Os) and train an ML system to create a model for kid-ness.
  • Then the model can be used to predict kid-ness.

Perceptron

  • This simple method is a linear classifier called perceptron.
  • The perceptron is trained using backpropagation.
  • This essentially goes like this: we try out the labelled data, check the error and then go backwards tweaking the parameters to make the error smaller.

That was pretty easy

  • Why did we have to do this with ML?
  • Well, it's easy because we have just two inputs and it happened that the data is very well separated.

That was pretty easy

  • What if we have 100s of inputs, e.g., the user rankings for all the movies they have seen?
  • What if the data was like the this?

Deep Learning

Ice-cream example

  • Consider a more tricky dataset like this:                         
  • We cannot separate the two classes using a linear function like before.
  • We could try different functions, e.g, a circle.

Neural Networks

  • But the point is that we want the ML system to be flexible to "devise" it's own function (i.e., the model) in order to classify the data correctly.
  • Neural Networks is a way to achieve this. 
  • Instead of writing down a bigger function, we can link together many simple functions.
  • This is inspired by the neurons that are interconnected and trigger each other.

Neural Networks

  • Let's link together many classifiers like the linear one and form a Neural Network

Input-Hidden-Output

  • Input nodes on the left (one per input). 
  • Linear classifiers on the middle (as many as we want).
  • Output nodes on the right (one per output).

Ice-cream example

  • In the example before: 2 input nodes (noise,time), 1 hidden node (the linear classifier), 1 output (kid-ness) . 
  • The hidden node learned to identify kids.

Many inputs

  • When  there are many inputs (e.g., 100s of them),
    each hidden node will learn a different aspect of the problem we try to solve.

High-level classifiers

  • The output nodes will combine the information they get from the classifiers (hidden nodes) and work as more high-level classifiers to generate the answer.

Going deeper

  • We can do this for many hidden layers.
  • Each layer will work as if the previous hidden layer is the input it is trying to learn.

Deep Learning

  • This is a Neural Network that is Deep.
  • And we use the Machine Learning paradigm.
  • This is where the name Deep Learning comes from.

Deep Learning

  • The main idea is the same as in the case of f(x1,x2).
  • Essentially, the whole Deep NN is a just one complicated function that can fit itself to the data.

Deep Learning

  • Let's focus on the case of two inputs (x1, x2) and let's play with a visual tool to experiment with layers and nodes.

Live Experiment

NNs with Tensorflow

Overfitting

  • You need to find the right balance in defining the function  that fits in the training dataset

More

  • Vision
  • Games
  • Text

Convolutional NNs

  • Vision: exploit "local" relation of features in images.

Convolutional NNs

Reinforcement Learning

  • Games: exploit "rules" to do learn by playing thousands of games alone or to itself as an opponent.

Recurrent NNs

  • Text: exploit "temporal" relation between words.
  • Generative models for many scenarios.

Recurrent NNs

  • Text: exploit "temporal" relation between words.
  • Generative models for many scenarios.

NLP + AI = Chatbots!

  • Messaging apps as a uniform and familiar UI.
  • Natural language as a new UI element.

AI is not only ML

  • Knowledge representation
  • Action languages
  • Automated logical reasoning
  • Verification
  • ...

Takeaways

Applications?

  • The range can be confusing.
  • From analyzing item transactions that we mentioned in the beginning..
  • ..to AI that we fear that it may one day rule the world!

Applications?

  • ML research and tools progress rapidly.
  • Computing resources progress rapidly.

Applications?

  • On the one side you can use "black box" methods for implementing "users who liked X also liked Y".
  • On the other side you can expect that it is feasible to use lots of data to do problem solving of the form:
    • "evaluate this situation based on inputs";
    • "decide the best action based on inputs";
    • "classify data into categories";
    • combinations of all of the above.

It starts here

Questions?