Bias and Variance in Machine Learning
An intuitive guide.
Bias (in statistics) is anything that leads to a systematic difference between the true parameters of a population and the sampling statistics used to estimate those parameters.
Let’s make this a bit more intuitive.
Sampling bias arises from poor or inadequate sampling, where the sample is not a good proxy for the population.
For instance, if the population mean (computed over the whole, large population) is far off from the sample mean (computed over a small subset of the population), that indicates statistical bias. Sampling bias is one of the many statistical biases to avoid.
Let’s say we have a population of 10,000 users, and two random samples are taken from it: Sample A of size 100 and Sample B of size 1,000. Sample A, with only 100 users, is more likely not to represent the population of 10,000. Its sample statistics may differ from the population statistics (at a given confidence level), and hence Sample A would be called a biased sample.
Similarly, if Sample B is a good proxy for the population, then statistics calculated on Sample B will be the same as, or close enough to, the statistics calculated on the population. In this case, Sample B is an unbiased representation of the population at some level of statistical significance.
So, bias occurs when the sample is not a representative proxy of the population.
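A quick simulation makes the Sample A vs. Sample B comparison concrete. This is a minimal sketch under assumed data: the 10,000 users carry a made-up metric drawn from a normal distribution (mean 30, standard deviation 10), and the helper name `typical_deviation` is hypothetical. It repeatedly draws random samples of size 100 and 1,000 and reports how far, on average, each sample mean lands from the population mean.

```python
import random
import statistics

# Assumed population: 10,000 users with a made-up metric (e.g. minutes per session).
rng = random.Random(42)
population = [rng.gauss(30.0, 10.0) for _ in range(10_000)]
pop_mean = statistics.fmean(population)


def typical_deviation(sample_size: int, trials: int = 500) -> float:
    """Average |sample mean - population mean| over repeated random samples."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        total += abs(statistics.fmean(sample) - pop_mean)
    return total / trials


dev_a = typical_deviation(100)    # size of Sample A
dev_b = typical_deviation(1_000)  # size of Sample B
print(f"population mean:            {pop_mean:.2f}")
print(f"typical deviation, n=100:   {dev_a:.3f}")
print(f"typical deviation, n=1,000: {dev_b:.3f}")
```

The smaller sample's mean typically lands noticeably further from the population mean, since the spread of a sample mean shrinks roughly with the square root of the sample size. Each random draw is unbiased on average; the risk with a small sample is that any single draw is more likely to be unrepresentative.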
Now, in machine learning, a model trained on such a ‘biased’ sample (Sample A) won’t be a good model, because it won’t be able to generalize to other samples taken from the population. If an ML model cannot generalize because of problems with how its data was sampled, its error (or accuracy) will differ every time it is run on another sample.
In short, an ML model trained on Sample A would be called highly biased, because its predictions are likely to be skewed in a certain direction away from the actual values.
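The effect of training on an unrepresentative sample can be sketched in a few lines of Python. Everything here is an assumed toy setup: a population of 10,000 points following y = x² plus noise with x between 0 and 3, a "skewed" training sample that only covers x < 1, and a random sample of the same size drawn from the whole population; the helper names `true_f`, `fit_line` and `mse` are hypothetical. Both samples train the same straight-line model, and each model is then scored on the entire population.

```python
import random
import statistics

rng = random.Random(11)


def true_f(x: float) -> float:
    return x * x  # assumed true relationship in the population


def fit_line(xs, ys):
    """Least-squares straight line; returns a predict function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: slope * x + (my - slope * mx)


def mse(model, xs, ys):
    """Mean squared error of a model over the given points."""
    return statistics.fmean([(y - model(x)) ** 2 for x, y in zip(xs, ys)])


# Assumed population: 10,000 points with x spread over [0, 3].
pop_x = [rng.uniform(0.0, 3.0) for _ in range(10_000)]
pop_y = [true_f(x) + rng.gauss(0.0, 0.5) for x in pop_x]
population = list(zip(pop_x, pop_y))

# Unrepresentative sample: only points with x < 1 (one corner of the population).
skewed = [p for p in population if p[0] < 1.0][:100]
# Representative sample: 100 points drawn uniformly at random from the whole population.
representative = rng.sample(population, 100)

model_skewed = fit_line(*zip(*skewed))
model_repr = fit_line(*zip(*representative))
print(f"population MSE, skewed sample:         {mse(model_skewed, pop_x, pop_y):.2f}")
print(f"population MSE, representative sample: {mse(model_repr, pop_x, pop_y):.2f}")
```

The model trained on the skewed sample is systematically wrong wherever that sample did not reach (for large x its predictions fall far below the true values), which is the one-directional skew described above.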
What does Bias mean in ML model?
Bias measures how far off, on average, an ML model’s predictions are from the correct values.
It can happen because the training dataset (sample) is not a proxy of the population, or because the model is unable to capture the patterns in the data.
A highly biased ML model doesn’t learn enough, or doesn’t capture the trends in the chosen training data, and hence produces higher error and lower prediction accuracy.
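To make "how far off, on average" concrete, here is a minimal Python sketch under assumed conditions: the true relationship is taken to be y = x² plus Gaussian noise, and the model is deliberately too simple (a straight line fitted by least squares). The helper names `true_f` and `fit_line` are hypothetical. Training the line on many independent noisy training sets and averaging its prediction at x = 0 exposes the systematic error, i.e. the bias.

```python
import random
import statistics

rng = random.Random(0)
XS = [-2 + 0.2 * i for i in range(21)]  # fixed input grid from -2 to 2
NOISE_SD = 0.5                           # assumed noise level


def true_f(x: float) -> float:
    return x * x  # assumed true relationship


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


x0 = 0.0
preds = []
for _ in range(500):  # 500 independent training sets
    ys = [true_f(x) + rng.gauss(0.0, NOISE_SD) for x in XS]
    slope, intercept = fit_line(XS, ys)
    preds.append(slope * x0 + intercept)

bias = statistics.fmean(preds) - true_f(x0)
print(f"average prediction at x0=0: {statistics.fmean(preds):.3f} (truth 0.0)")
print(f"bias: {bias:.3f}")
```

More training data does not fix this: the line cannot bend, so it keeps missing the curve in the same direction. That is the "doesn’t capture the trends" failure described above.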
What does Variance mean in ML model?
If the same model gives noticeably different results when it is trained on different training data, the model is said to have high variance.
Variance is a measure of how sensitive the model is to the specific data used during training. The higher the sensitivity, the higher the variance of the model.
High variance in an ML model is bad because it can cause the model to fit the random noise in the training data rather than the underlying pattern it is meant to learn.
When would a model capture random noise from the training data? If the model is overly complex, it will try to fit the training data as closely as possible. It will then fit the random noise as well, and therefore it will also be very sensitive to minor fluctuations in the data.
In summary, we want a model with low sensitivity to the particular training set it is given.
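The sensitivity described above can be measured directly. A minimal sketch, assuming a made-up setup (true relationship y = x², Gaussian noise with standard deviation 0.5, a fixed grid of 21 inputs): it retrains two models on many fresh training sets and looks at the spread of their predictions at a single point. The 1-nearest-neighbor regressor is a stand-in chosen here for an "overly complex" model, because it reproduces every training point, noise included; the helper names `fit_line` and `fit_1nn` are hypothetical.

```python
import random
import statistics

rng = random.Random(1)
XS = [-2 + 0.2 * i for i in range(21)]  # fixed input grid
NOISE_SD = 0.5                           # assumed noise level


def true_f(x: float) -> float:
    return x * x  # assumed true relationship


def fit_line(xs, ys):
    """Simple model: least-squares straight line; returns a predict function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: slope * x + (my - slope * mx)


def fit_1nn(xs, ys):
    """Overly complex model: 1-nearest-neighbor memorizes every training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


x0 = 1.0
line_preds, nn_preds = [], []
for _ in range(500):  # retrain both models on 500 fresh training sets
    ys = [true_f(x) + rng.gauss(0.0, NOISE_SD) for x in XS]
    line_preds.append(fit_line(XS, ys)(x0))
    nn_preds.append(fit_1nn(XS, ys)(x0))

for name, preds in [("straight line", line_preds), ("1-NN", nn_preds)]:
    bias = statistics.fmean(preds) - true_f(x0)
    var = statistics.pvariance(preds)
    print(f"{name:13s}  bias={bias:+.3f}  variance={var:.3f}")
```

The straight line barely moves from one training set to the next (low variance) but is consistently off (high bias); the 1-NN model is right on average (low bias) but jumps around with the noise in each training set (high variance).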
Visualisation of Bias and Variance
Bias and variance can be best understood by analyzing the following visual representation.
Imagine that the center of the target, the bull’s-eye (in red), represents a model that perfectly predicts the correct value. Each blue dot marked on the target then represents an individual prediction (or realization) of the model, trained on a particular set of training data.
In some cases, the dots will be densely positioned close to the bull’s-eye, meaning the predictions made by the model are close to the actual data. In other cases, the predictions will be scattered across the target. The further the dots deviate from the bull’s-eye (the red center) on average, the higher the bias and the less accurate the model will be in its overall predictive ability; the more they are scattered from one another, the higher the variance.
High bias is equivalent to aiming in the wrong place.
High variance is equivalent to having an unsteady aim or scattered predictions.
Let’s understand these four scenarios in more detail:
- Low bias, low variance: Aiming at the target and hitting it with good precision. In the first target, we can see an example of low bias and low variance. Bias is low because the hits are closely aligned to the center, and variance is low because the hits are densely positioned in one location.
- Low bias, high variance: Aiming at the target, but not hitting it consistently. The second target (on the right of the first row) shows a case of low bias and high variance. Although the hits are not as close to the bull’s-eye, they are still centered around it, so bias is relatively low. However, variance is high this time because the hits are spread out from each other.
- High bias, low variance: Aiming off the target, but being consistent. The third target (on the left of the second row) represents high bias and low variance, because the individual predictions are away from the center (high bias) but densely placed (low variance).
- High bias, high variance: Aiming off the target and being inconsistent. The fourth target (on the right of the second row) shows high bias and high variance, because the individual predictions are away from the center (high bias) and also scattered (high variance).
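The four targets can be reproduced with a small simulation. This is a sketch with made-up settings: each hit is a point drawn from a 2-D Gaussian whose center is the aim point (offset from the bull’s-eye by `aim_offset`) and whose standard deviation is `spread`; these names and values are illustrative assumptions, not taken from the figure. Bias is measured as the distance from the average hit to the bull’s-eye, and variance as the average squared distance of the hits from their own average.

```python
import math
import random
import statistics

rng = random.Random(7)


def throw_darts(aim_offset: float, spread: float, n: int = 2000):
    """Simulate n hits aimed at (aim_offset, 0) with Gaussian scatter of size `spread`."""
    return [(rng.gauss(aim_offset, spread), rng.gauss(0.0, spread)) for _ in range(n)]


def bias_and_variance(hits):
    """Bias: distance from the average hit to the bull's-eye at (0, 0).
    Variance (2-D, assumed definition): mean squared distance of hits from their average."""
    cx = statistics.fmean([x for x, _ in hits])
    cy = statistics.fmean([y for _, y in hits])
    bias = math.hypot(cx, cy)
    variance = statistics.fmean([(x - cx) ** 2 + (y - cy) ** 2 for x, y in hits])
    return bias, variance


# Assumed settings for the four targets: (aim offset, spread)
scenarios = {
    "low bias, low variance": (0.0, 0.2),
    "low bias, high variance": (0.0, 1.0),
    "high bias, low variance": (2.0, 0.2),
    "high bias, high variance": (2.0, 1.0),
}
results = {name: bias_and_variance(throw_darts(*params)) for name, params in scenarios.items()}
for name, (b, v) in results.items():
    print(f"{name:25s} bias={b:.2f} variance={v:.2f}")
```

Aim offset controls only the bias and spread controls only the variance, which is why the two can be low or high independently of each other.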
Errors in Machine Learning Models
When building machine learning models, we want to keep error as low as possible.
In machine learning, the expected error of a model (measured with squared error) can be decomposed into the sum of three kinds of error:
- Error due to bias in your model
- Error due to model variance
- Error that is irreducible
Putting these together (the derivation is omitted):
Total Error = Bias² + Variance + Irreducible Error
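The decomposition can be checked numerically. A minimal sketch under an assumed toy setup (true function y = x², Gaussian noise with standard deviation 0.5, a straight-line model fitted by least squares on a fixed grid of 21 inputs; the names `true_f` and `fit_line` are hypothetical): for a single test input it estimates the expected squared error over many training sets and fresh test labels, and compares it with Bias² + Variance + Irreducible Error.

```python
import random
import statistics

rng = random.Random(3)
XS = [-2 + 0.2 * i for i in range(21)]  # fixed training inputs
NOISE_SD = 0.5                           # assumed noise level
x0 = 1.0                                 # test input


def true_f(x: float) -> float:
    return x * x  # assumed true relationship


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


preds, sq_errors = [], []
for _ in range(5000):
    ys = [true_f(x) + rng.gauss(0.0, NOISE_SD) for x in XS]  # new training set
    slope, intercept = fit_line(XS, ys)
    y_hat = slope * x0 + intercept
    y_test = true_f(x0) + rng.gauss(0.0, NOISE_SD)           # new noisy test label
    preds.append(y_hat)
    sq_errors.append((y_test - y_hat) ** 2)

total_error = statistics.fmean(sq_errors)
bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
variance = statistics.pvariance(preds)
irreducible = NOISE_SD ** 2
print(f"total error:                 {total_error:.3f}")
print(f"bias^2 + var + irreducible:  {bias_sq + variance + irreducible:.3f}")
```

Under these assumptions the two printed numbers should agree up to simulation noise, and the squared bias of the too-simple line is the dominant reducible term.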
Even with a perfect model, we could not remove the error completely, because the data itself may contain noise. This error is called the irreducible error, also known as the Bayes error rate or the optimum error rate.
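A short sketch shows why this floor cannot be removed. It assumes, for illustration only, that the data follows y = x² plus Gaussian noise with standard deviation 0.5 (the name `true_f` is hypothetical). A "perfect model" that predicts exactly `true_f(x)` still has a mean squared error equal to the noise variance.

```python
import random
import statistics

rng = random.Random(5)
NOISE_SD = 0.5  # assumed noise level in the data


def true_f(x: float) -> float:
    return x * x  # assumed true relationship; a "perfect model" predicts exactly this


xs = [rng.uniform(-2.0, 2.0) for _ in range(10_000)]
ys = [true_f(x) + rng.gauss(0.0, NOISE_SD) for x in xs]  # noisy observations

# Mean squared error of the perfect model: only the noise remains.
perfect_mse = statistics.fmean([(y - true_f(x)) ** 2 for x, y in zip(xs, ys)])
print(f"MSE of the perfect model: {perfect_mse:.3f} (noise variance {NOISE_SD ** 2})")
```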
While we cannot do anything about the Optimum Error Rate, we can reduce the errors due to bias and variance.
Bias and variance are the two reducible sources of prediction error.
If we manage to reduce them, we can build more accurate models.
Ishan is an experienced data scientist with expertise in building data science and analytics capabilities from scratch including analysing unstructured/structured data, building end-to-end ML-based solutions, and deploying ML/DL models at scale on public cloud in production.
You may find him on LinkedIn.