Eat one marshmallow now - or wait for two later?
Humans can learn to wait for two marshmallows instead of eating one on the spot
Protocol says: EAT IMMEDIATELY...unless...I can learn a better rule?
if marshmallow_seen: eat()
Old way: explicit rules - we hand-write the code
Machine Learning way: give the machine data & let it find the rule
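A tiny sketch of the difference, assuming a made-up two-feature dataset and scikit-learn (neither appears above): instead of hand-writing the rule, we hand over labeled examples and let the model find it.

from sklearn.tree import DecisionTreeClassifier

# Made-up examples: [hungry, adult_watching] -> 1 = eat now, 0 = wait
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)   # the machine finds the rule from data
print(model.predict([[1, 1]]))               # learned rule: wait when watched -> [0]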
Training data: images (cat, marshmallow, dog) + labels. During training we track loss and accuracy.
Model tries to map data to labels
Each connection has a weight. Training is finding good weights.
Example from Fashion-MNIST: 28×28 pixels → Dense(128) → Dense(10)
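A minimal Keras sketch of that architecture; the layer sizes come from the slide, while the library choice and the activation functions are assumptions.

import tensorflow as tf

# 28x28 pixels -> flatten -> Dense(128) -> Dense(10) class scores
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),              # one 28x28 grayscale image
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()                                  # about 100k weights to train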
We compute the gradient of the loss and move the weights a tiny step downhill.
Optimizer: e.g., ‘adam’ or SGD. Learning rate controls step size.
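What "a tiny step downhill" means, as a bare-bones NumPy sketch on a made-up least-squares problem (the data and the loss are assumptions; optimizers like 'adam' or SGD automate this same loop with extra tricks):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # made-up inputs
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                   # targets generated by the "true" weights

w = np.zeros(3)                                  # start with bad weights
learning_rate = 0.1                              # step size
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)        # gradient of the mean squared error
    w -= learning_rate * grad                    # tiny step downhill
print(w)                                         # close to w_true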
Unsupervised: find patterns in unlabeled data.
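A minimal unsupervised sketch, assuming scikit-learn's KMeans and made-up data (neither is named above): no labels go in, the algorithm groups the points on its own.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),   # two blobs of unlabeled points
               rng.normal(5, 1, size=(50, 2))])

clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters[:5], clusters[-5:])               # the two blobs get different cluster ids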
Reinforcement: learn by trial and error with rewards.
A state = a square. An action moves you to a new state. Each action is chosen with some probability.
Policy(s) = probabilities of each action in that state.
Value(s) = expected discounted reward if we start in s and follow the policy
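Spelled out (the discount factor γ is standard notation, not from the slide): Value(s) = E[r0 + γ·r1 + γ²·r2 + …], where the rewards come from starting in s and following the policy, and 0 ≤ γ < 1.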
Eat now = small reward now. Wait = bigger reward later
Let the agent explore and collect experience.
Use rewards to update its policy or value function.
Repeat until behavior is (mostly) good enough.
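A toy sketch of that loop, assuming tabular Q-learning and a made-up "wait for the marshmallow" environment (neither is specified above): the agent explores, collects rewards, and updates its value table until waiting wins.

import numpy as np

# Made-up MDP: state = steps already waited (0..2)
# action 0 = eat now -> reward 1, episode ends
# action 1 = wait    -> reward 0; after 3 waits, reward 2 and episode ends
N_WAIT, EAT, WAIT = 3, 0, 1

def step(state, action):
    if action == EAT:
        return state, 1.0, True
    if state + 1 == N_WAIT:
        return state, 2.0, True                  # two marshmallows for waiting it out
    return state + 1, 0.0, False

rng = np.random.default_rng(0)
Q = np.zeros((N_WAIT, 2))
alpha, gamma, epsilon = 0.1, 0.9, 0.2            # learning rate, discount, exploration

for _ in range(5000):                            # explore and collect experience
    s, done = 0, False
    while not done:
        a = rng.integers(2) if rng.random() < epsilon else int(Q[s].argmax())
        s2, r, done = step(s, a)
        target = r if done else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])    # update the value estimates
        s = s2

print(Q[0])                                      # Q[0, WAIT] > Q[0, EAT]: it learns to wait

With a discount of 0.9, two marshmallows after three waits are still worth about 1.62 now, more than the single marshmallow worth 1.0, so the learned policy waits.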
?
Machines optimize what we tell them to optimize.
Bad or biased data ⇒ bad or biased behavior
Neural nets can be powerful but also black boxes.