The goal of this post is to provide a summary of machine learning essentials for beginners, enthusiasts, and professionals, with implementation examples from the CG industry.
Supervised learning algorithms
Where data is labeled and models can be scored by comparing predictions against known values, we speak of supervised learning algorithms.
Where a target value is a linear combination of feature values, we speak of linear regression models.
This can be used, for example, to predict shot bid days, render times, or sample settings on renders.
I.a Simple linear regression
Before we look into the practice, taking a closer look at the math gives us a better understanding of what the model is trying to achieve.
A linear function is defined as:
$$ f(x) = ax + b $$
Here a controls the slope and b controls the position on the y-axis (the intercept). A regression model tries to find these two parameters from the given features X and the known target values y.
We can verify the model by plotting a prediction line and writing out the formula using the predicted coefficient and intercept.
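As a minimal sketch of what the model solves for, the closed-form least-squares solution for a and b can be written in plain Python. The function name `fit_line` and the sample data are made up for illustration, and the data is synthetic so the expected parameters are known in advance.

```python
def fit_line(xs, ys):
    """Least-squares fit of f(x) = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: samples per pixel vs. render time in minutes,
# generated from 0.0625 * spp + 1 so the true parameters are known.
spp = [32, 64, 128, 256, 512]
minutes = [3.0, 5.0, 9.0, 17.0, 33.0]

a, b = fit_line(spp, minutes)
print(f"f(x) = {a:.4f}x + {b:.4f}")
```

Writing out the recovered formula this way is the same check described above: if the printed slope and intercept match the trend seen in a plot, the fit is behaving as expected.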
I.b Multiple linear regression
In reality, datasets rarely have a single feature; they have dozens if not hundreds. In this case we speak of multiple linear regression:
$$ f(x) = a_0 x_0 + a_1 x_1 + a_2 x_2 + \dots + a_i x_i + b $$
Visualising two feature trends relative to a target can still easily be done with a 3D plot, but as mentioned above, we will most likely have to deal with datasets with far more features.
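The multi-feature case can be sketched with NumPy's least-squares solver. This is only an illustration under stated assumptions: the feature columns (samples per pixel and megapixels) and the target values are synthetic, chosen so the coefficients that should be recovered are known.

```python
import numpy as np

# Hypothetical features per shot: [samples per pixel, megapixels].
X = np.array([
    [32.0, 2.0],
    [64.0, 2.0],
    [128.0, 8.0],
    [256.0, 2.0],
    [512.0, 8.0],
])
# Synthetic target: 0.05 * spp + 1.5 * megapixels + 2.
y = 0.05 * X[:, 0] + 1.5 * X[:, 1] + 2.0

# Append a column of ones so the intercept b is solved for
# together with the coefficients a_0 ... a_i.
X_design = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X_design, y, rcond=None)

coefficients, intercept = solution[:-1], solution[-1]
print("coefficients:", coefficients)
print("intercept:", intercept)
```

The same code works unchanged for any number of feature columns, which is why the model scales to datasets that can no longer be plotted.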
At the same time, trends often won’t be linear. In this case we speak of polynomial regression:
$$ f(x) = a_0 x_0^n + a_1 x_1^n + a_2 x_2^n + \dots + a_i x_i^n + b $$
Whether trends are linear or not, we can continue using the linear regression model by adding the nonlinear trends as new polynomial features, either with a library function or by creating them manually.
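Creating the polynomial features manually could look like the sketch below. The helper `add_polynomial_features` is a hypothetical name, and the data is a synthetic quadratic. The point is that the solver is still plain linear least squares; only the feature matrix changes.

```python
import numpy as np


def add_polynomial_features(x, degree):
    """Return the columns [x, x^2, ..., x^degree] for a single feature x."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x ** 2 + 3.0 * x + 1.0  # synthetic nonlinear trend

# Expand the single feature into [x, x^2], then add the intercept column.
X_poly = add_polynomial_features(x, degree=2)
X_design = np.column_stack([X_poly, np.ones(len(x))])

# The model is still linear in its coefficients, so ordinary
# least squares recovers them.
solution, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("a_1 (x), a_2 (x^2), b:", solution)
```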
A benefit of supervised learning is that we can validate the models, and several metrics are available to do so.
MAE (mean absolute error): the mean of the absolute differences between predicted and actual values
MSE (mean squared error): the mean of the squared differences between predicted and actual values
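Both metrics are short enough to write out by hand, which makes the definitions concrete. The render-time values below are invented for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical render times in minutes.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 40.0]

mae = mean_absolute_error(actual, predicted)  # (2 + 2 + 3 + 0) / 4 = 1.75
mse = mean_squared_error(actual, predicted)   # (4 + 4 + 9 + 0) / 4 = 4.25
print(mae, mse)
```

Because the errors are squared, MSE penalises a few large misses more heavily than MAE does, which is worth keeping in mind when a single badly predicted shot matters more than many small deviations.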
Preparing data is as important as setting up our models, so before we can run any models we need to go through a couple of steps:
Rescaling data between 0 and 1 makes the training less sensitive to the scale of features so we can solve for better coefficients.
For example, sample settings usually lie between 32 spp and 1024 spp, while pixel counts can go as high as 8 million pixels.
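A minimal sketch of this rescaling, assuming simple min-max scaling per feature. The function name and the sample values are illustrative only.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no scale information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical features on very different scales.
spp = [32, 64, 256, 1024]
pixels = [2_073_600, 8_294_400, 2_073_600, 8_294_400]  # HD and 4K UHD frames

scaled_spp = min_max_scale(spp)
scaled_pixels = min_max_scale(pixels)
print(scaled_spp)
print(scaled_pixels)
```

After scaling, both features live in the same 0 to 1 range, so neither dominates the fit purely because of its units.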
Shifting and scaling the distribution to have a mean of 0 and a standard deviation of 1 (unit variance) is useful for attributes that rely on the distribution of their values.
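This standardisation can be sketched with Python's `statistics` module. The example assumes the population standard deviation is used, and the render times are made up.

```python
import statistics


def standardize(values):
    """Shift and scale values to a mean of 0 and a standard deviation of 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Hypothetical render times in minutes.
render_minutes = [12.0, 15.0, 9.0, 30.0, 14.0]
z_scores = standardize(render_minutes)
print(z_scores)
```

Each resulting value expresses how many standard deviations the original sits from the mean, so an outlier such as the 30 minute render stays visible instead of being squashed against an upper bound.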
The connection between features and target isn’t always linear; it can be of a higher order. We can add polynomial features up to a specific degree.
Some features hide in plain sight. Understanding your data can highlight connections between features, or show the necessity of transforming your data into new features.
For example, the total number of primary rays can be derived from the pixel count and the camera sample count, and a color can be transformed into HSV features.
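Both derived features can be sketched in a few lines. This assumes the camera sample value is the number of samples per pixel (some renderers expose a setting that is squared internally, which would need converting first) and that RGB values are floats in the 0 to 1 range. The function names are hypothetical.

```python
import colorsys


def primary_rays(width, height, camera_samples):
    """Total primary rays, assuming camera_samples is the per-pixel count."""
    return width * height * camera_samples


def rgb_to_hsv_features(r, g, b):
    """Turn an RGB color (floats in 0..1) into hue, saturation, value features."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return {"hue": h, "saturation": s, "value": v}


rays = primary_rays(1920, 1080, 64)
red = rgb_to_hsv_features(1.0, 0.0, 0.0)
print(rays)
print(red)
```

A single ray-count feature can describe render cost more directly than resolution and samples do separately, since their effect on the workload is multiplicative rather than additive.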