Questions
Are final project presentations a part of the grade?
Yes, it's 10% of the final project grade.
Do we have to make all our assets (sound, images) ourselves for the project?
No. Please take assets from places online (unless you want artistic control over them) and credit them appropriately.
Can we work alone for the final project?
Yes.
When implementing physics, is it better to put the physics in a separate class or in the object being simulated?
Usually in the object being simulated.
How long does it take to make a simple physics engine?
A simple physics engine is almost always wrong. The question then becomes how wrong you want to let it be.
Are hurtboxes perfectly shaped to characters in newer games?
No, because perfectly conforming hurtboxes make it hard to calculate collisions quickly. Most games still use variants on ellipsoids, cylinders, and boxes.
Golly gee willikers, check out what machine learning can do!
Artificial Intelligence is dangerous and should be countered with nuclear airstrikes.
Not going to teach you how to use <API> to accomplish <machine learning task> in your application for <startup company>, which is valued at <number> <money units>.
Not going to teach you how to actually do any machine learning.
from tensorflow.keras.datasets import fashion_mnist

# Load the Fashion-MNIST images (we don't need the labels here).
(x_train, _), (x_test, _) = fashion_mnist.load_data()
# Scale pixel values from [0, 255] down to [0, 1].
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print(x_train.shape)   # (60000, 28, 28)
print(x_test.shape)    # (10000, 28, 28)
Even in well-designed libraries for machine learning, the code itself looks nothing like the concepts behind it. It's annoying.
A very high level overview of some of the concepts used in machine learning, specifically in the image generation arena.
How people think about the problems and their respective solutions.
Don't be afraid to ask me to slow down or re-explain something!
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks
Wikipedia
Completely specify what the computer should do.
Write code that tells the computer what operations to carry out.
def my_function(x):
return 2 * x + 1
Partially specify what the computer should do.
Write code that tells the computer mostly what it should do, but leave a few values ambiguous.
def my_function(x):
    # a and b are left unspecified: these are the parameters.
    return a * x + b
Use examples to teach the computer what the parameters should be.
The computer is still executing a function! Instead of specifying 100% of what the function is, we're going to feed the computer examples and adjust the function based on those examples!
What type of data is a grayscale (black & white) image?
A 2D grid of numbers, one brightness value per pixel.
What would a function look like to determine if a given pixel is part of the stroke or part of the background?
def is_background(image, i, j):
    # Treat bright pixels as background, dark pixels as stroke
    # (a simple brightness threshold).
    return image[i][j] > 5
def semantic_category(image, i, j):
    # Do some analysis work to get analysis_result...
    # (this is exactly the part nobody knows how to write by hand!)
    if is_car(analysis_result):
        return 1
    elif is_bus(analysis_result):
        return 2
    # etc. etc.
def semantic_segments(image):
    # Label every pixel with its semantic category.
    labels = [[None] * len(image[0]) for _ in range(len(image))]
    for i in range(len(image)):
        for j in range(len(image[0])):
            labels[i][j] = semantic_category(image, i, j)
    return labels
def my_function(x):
return a * x + b
First, we need a list of examples:
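Concretely, here are the example pairs used in the tables that follow (they happen to satisfy y = 2x + 1):

# Example inputs and the outputs we want the function to produce.
inputs = [2, 5, 100, -3]
outputs = [5, 11, 201, -5]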
Let's start by guessing values for a, b. Maybe we'll say they start at a = 0 and b = 5.
Example Input | Example Output | Our Output |
---|---|---|
2 | 5 | 5 |
5 | 11 | 5 |
100 | 201 | 5 |
-3 | -5 | 5 |
Well... that's not very good.
def my_function(x):
return a * x + b
What if we tried a = 0, b = 100?
Example Input | Example Output | Our Output |
---|---|---|
2 | 5 | 100 |
5 | 11 | 100 |
100 | 201 | 100 |
-3 | -5 | 100 |
Well... that's not very good either.
Comparing our two guesses side by side:

a = 0, b = 5:

Example Input | Example Output | Our Output |
---|---|---|
2 | 5 | 5 |
5 | 11 | 5 |
100 | 201 | 5 |
-3 | -5 | 5 |

a = 0, b = 100:

Example Input | Example Output | Our Output |
---|---|---|
2 | 5 | 100 |
5 | 11 | 100 |
100 | 201 | 100 |
-3 | -5 | 100 |

Which guess is less wrong? And by how much?
We need some way to assign a number to the intuitive idea of "how bad the error is."
One common idea: take the square of the difference between the output and the target value (least squares)
def mse(true_out, our_out):
    # Sum of squared differences between targets and our outputs.
    # (Dividing by the number of examples gives the *mean* squared error;
    # the constant factor doesn't change which parameters are best.)
    loss = 0
    for i in range(len(true_out)):
        loss += (true_out[i] - our_out[i]) ** 2
    return loss
These are known as loss functions. If you go into ML, you'll hear about a lot of these (mean squared error, cross-entropy, hinge loss, and many more).
def my_function(x):
return a * x + b
We tried a = 0, b = 5. These were our results.
Example Input | Example Output | Our Output |
---|---|---|
2 | 5 | 5 |
5 | 11 | 5 |
100 | 201 | 5 |
-3 | -5 | 5 |
If we increased a, would the results be better or worse?
MSE Loss = 38552
def my_function(x):
return a * x + b
a = 0.1, b = 5
Example Input | Example Output | Our Output |
---|---|---|
2 | 5 | 5.2 |
5 | 11 | 5.5 |
100 | 201 | 15 |
-3 | -5 | 4.7 |
MSE Loss = 34720.38
We could just keep trying this!
But real models don't have 2 parameters; they have 3 million.
Actually trying this for 3 million parameters (and then having to decide how to change each one) is far too slow.
Instead, rely on another observation: for our given examples, we can compute the loss exactly based on a and b.
For given examples, the loss is a function of a and b!
def compute_loss(inputs, outputs, a, b):
loss = 0
for i in range(len(inputs)):
our_output = a * inputs[i] + b
true_output = outputs[i]
loss += (our_output - true_output) ** 2
return loss
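To make this concrete, here's the loss for our first guess, matching the table above:

print(compute_loss(inputs, outputs, a=0, b=5))   # 38552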
Example: We tried a = 0, b = 5.
The (negative) gradient of the loss at these values is (39320, 384), so we would increase both a and b.
Keep doing this, and eventually we'll arrive at the correct values for a and b.
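A minimal gradient-descent sketch for our two-parameter model. The hand-derived gradient below matches the (39320, 384) figure above; the learning rate and step count are illustrative assumptions:

def gradient(inputs, outputs, a, b):
    # Derivatives of compute_loss with respect to a and b.
    grad_a, grad_b = 0.0, 0.0
    for x, y in zip(inputs, outputs):
        residual = (a * x + b) - y
        grad_a += 2 * residual * x
        grad_b += 2 * residual
    return grad_a, grad_b

a, b = 0.0, 5.0               # our first guess
learning_rate = 5e-5          # assumed: small enough to keep the updates stable
for step in range(200_000):
    grad_a, grad_b = gradient(inputs, outputs, a, b)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
# a and b drift toward 2 and 1, the function behind our examples.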
def my_function(x):
return a * x + b
There's a million reasons why this shouldn't work.
But it does.
Use it when you have a function you want to compute, you don't know how to write code for it, but you can get lots of examples.
Early examples:
If you don't know what your output should be, machine learning will not help you figure that out.
If you know what your output should be, but you can't generate more than a few hundred examples, most ML algorithms will not help you.
def my_function(x):
return a * x + b
This is a very simple function!
No matter how we fiddle with the parameters, it will only ever output a line.
from math import cos, sin, exp, sqrt

def my_function(x):
    return (a * cos(x) + b * sin(x) +
            c * exp(x) + d * x**2 +
            e * x**3 + f * sqrt(x))
            # and so on and so forth
Need a function that's capable of capturing more behaviors, but can still be adjusted by changing the parameters (so that we can still train the machine learning model).
Each layer will apply a simple transformation to the input data, with learned parameters. For example:
[Diagram: a stack of layers: Linear -> ReLU -> MaxPool -> Linear -> ReLU -> MaxPool]

[Diagram: a single unit with inputs A, B, C, D, E, weights \(w_1 \dots w_5\), and output Q,]
where \(w_1 \dots w_5\) can be adjusted by gradient descent.
In early layers of an image processing network, we want to gather local information (e.g. colors, edges, brightness) but we don't want to overwhelm the network with too many parameters.
Solution: create neural network layers that operate on small 3x3 windows with learned weights, but the same weights for all areas in the image.
This, combined with clever usage of nonlinear layers and pooling, leads to the convolutional neural network.
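A minimal sketch of such a network in Keras (the layer counts and sizes here are illustrative assumptions, not a canonical architecture):

from tensorflow.keras import layers, models

model = models.Sequential([
    # 3x3 windows with learned weights, shared across the whole image.
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),   # pooling: keep the strongest local responses
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10),              # e.g. scores for 10 categories
])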
[Diagram: a classifier network mapping images to text labels such as DOG or CAT]
For every million photons released by the light source, about one will enter the camera.
In practice, we work backwards: find the photons that enter the camera and compute what color they would have had.
If we want to use raytracing to generate views of an existing scene, we need to fully digitize the scene before doing so, with full colors and textures---impractical!
The geometry stays hidden: given an input ray, tell me its color.
What is \(f\)?
Note: \(f\) is tied to the specific scene. If we want to raytrace a different scene, we will need a different \(f\) (i.e. different weights in our neural net).
Function to learn | Input ray -> ray color |
Loss Function | Image similarity |
Examples | ??? |
These photos are the result of raytracing the scene, so they form a set of correct examples.
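A heavily simplified sketch of what learning \(f\) could look like (the network shape is an assumption for illustration; real systems encode rays and render much more carefully):

from tensorflow.keras import layers, models

# f: ray -> color. Describe a ray by its origin (3 numbers) and direction (3 numbers).
ray_to_color = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(6,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(3, activation='sigmoid'),   # an RGB color in [0, 1]
])
ray_to_color.compile(optimizer='adam', loss='mse')   # loss: per-pixel image similarity
# Each pixel of each photo gives one training pair: (camera ray, observed color).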
The space of all possible images: every 1024 x 1024 x 3 array with values in [0, 255].
The space we actually care about: things that genuinely look like images.
The space of images is probably not actually a nice small ball within the space of all arrays.
An autoencoder: a neural network which just attempts to reproduce its input at its output.
The center layer (the bottleneck) is smaller than the input (and also the output).
In a perfect world, the bottleneck layer forms a perfect representation of all points in the space of images: every image corresponds to some value of the bottleneck, and every value of the bottleneck corresponds to an image.
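A minimal Keras sketch of the idea (layer sizes are illustrative assumptions; the middle Dense layer is the bottleneck):

from tensorflow.keras import layers, models

autoencoder = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),         # e.g. the Fashion-MNIST images from earlier
    layers.Dense(128, activation='relu'),
    layers.Dense(32, activation='relu'),          # the bottleneck: much smaller than the input
    layers.Dense(128, activation='relu'),
    layers.Dense(28 * 28, activation='sigmoid'),  # reconstruct all 784 pixel values
    layers.Reshape((28, 28)),
])
autoencoder.compile(optimizer='adam', loss='mse')
# Train it to reproduce its own input:
# autoencoder.fit(x_train, x_train, epochs=10, validation_data=(x_test, x_test))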
But we can already do some cool stuff!
If we're trying to generate human faces, our latent space is the space of all human faces.
If we're trying to generate images, our latent space is the space of all images.
If we're trying to do voice generation with text-to-speech, the target voice (and all the sounds it can make) are our latent space.
If the bottleneck layer has condensed information about the domain, we can use another network to extract this information: a decoder network, a classifier network, a stylization network, and so on (one is sketched below).
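For example, a sketch that reuses the autoencoder from above: cut the network at the bottleneck and attach a small classifier head (the layer index and class count here are illustrative):

from tensorflow.keras import layers, models

# Everything up to (and including) the bottleneck of the autoencoder above.
encoder = models.Model(inputs=autoencoder.input,
                       outputs=autoencoder.layers[2].output)
classifier = models.Sequential([
    encoder,
    layers.Dense(10, activation='softmax'),   # e.g. 10 clothing categories
])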
http://taskonomy.stanford.edu/
But we can generalize this to an encoder-decoder architecture which attempts to capture a similar idea.
A U-Net (encoder-decoder architecture)
General idea: turn random noise into meaningful data.
In principle, we can approximate this noise-to-data function with a neural network.
But we don't have examples of this function, so we can't train a neural network on examples directly.
Add noise step-by-step, so that there's a little less information in the image at each step (rather than going from the full image to random noise in a single step)
*this isn't what actually happens: see references for more information
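A toy version of that step-by-step noising (per the footnote above, real diffusion models use a carefully chosen noise schedule; noise_level here is an illustrative assumption):

import numpy as np

def add_noise_step(image, noise_level=0.02):
    # Blend the image slightly toward pure Gaussian noise:
    # each step destroys a little more of the original information.
    noise = np.random.randn(*image.shape)
    return np.sqrt(1 - noise_level) * image + np.sqrt(noise_level) * noise

noisy = x_train[0]          # start from one of the Fashion-MNIST images loaded earlier
for _ in range(1000):       # after many small steps, almost pure noise remains
    noisy = add_noise_step(noisy)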
[Figure: the step-by-step noising process. Source: https://www.calibraint.com/blog/beginners-guide-to-diffusion-models]
Modern machine learning in a nutshell
monkey eating banana split sundae
(I think GPT might have been sassing me...)
[Figure: many (caption, image) training pairs, e.g. "monkey eating banana split sundae", repeated x1000]
This means the text representations the model learns should somehow be really well suited for visual tasks.
You need slightly more cleverness to deal with the fact that training a neural net over the full image space is too computationally expensive.
See reference Diffusion2 for more details (look for cascade and latent space diffusion)