What is the Complexity of a Machine Learning Model?
How complex should a machine learning model be?
Chances are high that you have come across the classic under-fitting vs. overfitting figure, which shows that as model complexity increases, overfitting becomes more likely, and the reverse is true for under-fitting.
In time series forecasting, checking stationarity is one of the important steps, and to do that we plot the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF). From the PACF we take the smallest lag that shows a significant (or the highest) partial correlation, to avoid making the model more complex than necessary.
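As a small illustration (a synthetic AR(1) series is generated here purely for demonstration; statsmodels and matplotlib assumed), the two plots might be produced like this:

```python
# A minimal sketch: plot ACF and PACF for a synthetic AR(1) series.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):              # AR(1): y_t = 0.7 * y_{t-1} + noise
    y[t] = 0.7 * y[t - 1] + rng.normal()

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=24, ax=axes[0])     # slowly decaying autocorrelations
plot_pacf(y, lags=24, ax=axes[1])    # a single clear spike at lag 1
plt.tight_layout()
plt.show()
# Picking the smallest lag with a clear partial correlation (here, lag 1)
# keeps the model as simple as the data allows.
```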
Model complexity is so important to evaluate and understand that we use the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to measure the trade-off between model performance and model complexity.
We use the Adjusted Coefficient of Determination (Adjusted R²) to measure the effect of adding features to a regression model.
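For example, with statsmodels (synthetic data; in this made-up setup only the first feature actually matters), AIC, BIC and Adjusted R² are available directly on a fitted OLS result:

```python
# A minimal sketch: AIC, BIC and adjusted R² for two OLS fits of different complexity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] + rng.normal(size=n)   # only the first feature carries signal

X_small = sm.add_constant(X[:, :1])      # 1 feature  -> simpler model
X_large = sm.add_constant(X)             # 5 features -> more complex model

for name, design in (("small", X_small), ("large", X_large)):
    res = sm.OLS(y, design).fit()
    print(f"{name}: AIC={res.aic:.1f}  BIC={res.bic:.1f}  adj R2={res.rsquared_adj:.3f}")
# The extra, irrelevant features typically worsen AIC/BIC and do not lift adjusted R²,
# signalling that the added complexity buys no real performance.
```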
But what exactly is complexity of model?
How do we measure it?
And how complex should a machine learning model be?
Agenda
To address the questions above, this article will cover the following:
- Let’s define Model Complexity
- Why do we care about Model Complexity?
- Intuitive examples
- Model Complexity in Neural Nets
- Can we measure Model Complexity in the absolute sense?
- Summary
Let’s define Model Complexity
Complexity is relative.
The same algorithm can produce models of different complexity. For instance, Random Forest is an inherently complicated algorithm, but a Random Forest model can have different complexity depending on the number of trees used: the more trees, the higher the complexity. Hence, model complexity is relative.
On the other hand, linear regression is a simpler method; however, its complexity also increases as the feature space of the data grows.
In general, if we compare a Random Forest Regressor with a Linear Regressor (OLS), we can say the Random Forest model has higher model complexity.
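As a rough sketch (synthetic data; scikit-learn assumed), the “complexity knob” is different for each algorithm: the number of trees for a Random Forest, the number of features (and hence weights) for OLS:

```python
# A minimal sketch: each algorithm has its own complexity knob.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

rf_simple = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
rf_complex = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
print(len(rf_simple.estimators_), len(rf_complex.estimators_))   # 10 vs 300 trees

ols_few = LinearRegression().fit(X[:, :2], y)    # 2 weights + intercept
ols_many = LinearRegression().fit(X, y)          # 5 weights + intercept
print(ols_few.coef_.size, ols_many.coef_.size)
```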
Does Model Complexity depend on the data?
Theoretically speaking, an algorithm’s complexity doesn’t depend on the data. However, the data (or data size) at hand can determine whether the resulting model ends up complex or simple.
For instance, linear regression is a simpler method; however, the complexity of a model built with linear regression increases as the feature space grows, because more weight parameters need to be estimated.
The same algorithm could produce a model that is too complex, too simple, or just right, depending on the data and the model structure.
Let’s consider two training datasets to understand this better:
- Sample 1 has a linear relationship between target Y and feature X.
- Sample 2 has a cubic (third-order polynomial) relationship between target Y and feature X.
In this case, simple linear regression (OLS) will be just the right fit for Sample 1. On the other hand, it will not be complex enough for Sample 2.
Now let’s consider polynomial regression of order 2 (quadratic). It would be too complex for Sample 1 and still not complex enough for Sample 2.
Finally, polynomial regression of order 3 (cubic) will be too complex for Sample 1 and just the right fit for Sample 2.
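A rough numerical sketch of this (synthetic samples; the coefficients and noise level are made up for illustration):

```python
# A minimal sketch: fit polynomials of order 1, 2 and 3 to both samples.
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 100)
sample1 = 2.0 * x + rng.normal(scale=0.5, size=x.size)          # linear trend
sample2 = 0.5 * x**3 - x + rng.normal(scale=0.5, size=x.size)   # cubic trend

for degree in (1, 2, 3):
    for name, y in (("Sample 1", sample1), ("Sample 2", sample2)):
        coefs = np.polyfit(x, y, deg=degree)
        rss = np.sum((y - np.polyval(coefs, x)) ** 2)
        print(f"degree={degree}  {name}: RSS={rss:.1f}")
# Higher orders barely improve the fit on Sample 1 (extra complexity buys nothing),
# while Sample 2 needs degree 3 before the residuals collapse.
```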
In general, model complexity often depends on the number of features or terms included, as well as on whether the chosen model is linear, nonlinear, and so on.
Is Model Complexity same as Computational Complexity or Big O?
Not really. Not exactly.
Model complexity is not the time complexity or the memory complexity of the corresponding algorithm.
Let’s consider the linear regression (OLS) algorithm as an example:
Big O for Linear Regression (OLS) with a dataset of size n x p (n = observations, p = predictors or features):
- Train-time complexity = O(p^2 * (n + p))
- Run-time complexity = O(p)
- Space complexity (at run time) = O(p)
Now take two datasets:
- Sample 1 — size 100 x 2, with a linear relationship.
- Sample 2 — size 100 x 2, with a cubic relationship.
In this case, a linear regression model has exactly the same time and space complexity on both, yet it would be just the right fit for Sample 1 and not the right fit for Sample 2 (i.e., its model complexity would need to increase).
Also, theoretically speaking, two different models might have the same time/space complexity, yet their model complexity could be very different.
Model Complexity = The richness of the model space.
ML model complexity is often measured by the number of parameters the model possesses. A model is said to become more complex as parameters are added; conversely, if some parameters are removed, the model becomes simpler (less complex).
This does not imply a one-to-one relationship between model complexity and the number of parameters, since not every parameter is equally important.
It is not necessarily correct to say that of two models the one with more parameters is more complex (although this is very often the case).
Does it mean two models with same number of parameters have equal model complexity?
Well, not so fast…
Not all parameters contribute to model complexity equally, and some parameters can be tightly coupled or dependent on one another. Dependence among parameters typically makes a model more complex.
Let’s take an example:
- Model 1: 5 independent parameters with no coupling among them.
- Model 2: 5 coupled or dependent parameters.
Theoretically speaking, in this case Model 2, with tight coupling among its parameters, will have higher model complexity.
Nevertheless, the number of parameters is commonly used as a tentative measure of model complexity.
Model Complexity may be considered synonymous with model flexibility.
A flexible model has the capacity to capture a complex underlying process.
For instance, Linear Regressors are very restrictive, and hence inflexible, since they impose a linear structure on what the true model looks like. Spline regression models, on the other hand, allow non-linearity and hence are more flexible. In this sense, model complexity can be considered in line (or synonymous) with model flexibility.
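As a small sketch of that flexibility gap (synthetic, deliberately non-linear data; scikit-learn’s SplineTransformer is used here as one convenient way to build a spline regressor):

```python
# A minimal sketch: a spline regressor is more flexible than plain OLS.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # clearly non-linear signal

linear = LinearRegression().fit(X, y)
spline = make_pipeline(SplineTransformer(n_knots=8, degree=3),
                       LinearRegression()).fit(X, y)

print("linear R^2:", linear.score(X, y))   # stuck with a straight line
print("spline R^2:", spline.score(X, y))   # flexible enough to bend with the data
```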
In the case of clustering with the k-means algorithm, a lower value of k means less flexibility and lower model complexity; the higher the value of k, the more flexible and complex the model.
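A quick sketch of the same idea for k-means (synthetic, well-separated blobs; scikit-learn assumed):

```python
# A minimal sketch: higher k means a more flexible (more complex) clustering model.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0, 3, 6)])

for k in (2, 3, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k:2d}  inertia={km.inertia_:.1f}")
# Inertia keeps dropping as k grows: a more flexible model always fits the
# training data better, which is exactly where the overfitting risk comes from.
```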
Why do we care about Model Complexity?
A model should not be too simple, because then it would not be capable of capturing the process behavior with a reasonable degree of accuracy.
On the other hand, a model should not be too complex because then it would possess too many parameters to be estimated with the available finite data set.
Thus, it is clear that somewhere in between there must exist something like an optimal model complexity.
Of course the optimal model complexity depends on the available data, the specific model, etc.
Intuitive Examples
Let’s take this intuition further —
Case 1: The sample size is large and the number of predictors is small.
In general, a model with relatively higher complexity may perform better. Because of the large sample size, we’re less likely to overfit even when using a more flexible model. For instance, a Random Forest could give better results than Linear Regression.
Case 2: The number of predictors is large and the sample size is small.
In general, a model with less complexity may perform better. A flexible model (high model complexity) may cause overfitting because of the small sample size.
Case 3: The relationship between the predictors and response is highly non-linear.
A model with relatively higher complexity (for instance, an SVM with an RBF kernel) will generally perform better, because such a flexible model is needed to capture the non-linear effect.
Case 4: The variance of the errors is large.
An inflexible model will perform better in general, because a flexible model would capture too much of the noise in the data given the large variance of the errors.
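One way to see Case 4 numerically is a small simulation (the data, polynomial degree and noise level below are made up purely for illustration):

```python
# A minimal sketch of Case 4: with very noisy targets, a flexible model
# chases the noise while an inflexible one tends to generalise better.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(80, 1))
y = 1.5 * X[:, 0] + rng.normal(scale=3.0, size=80)   # large error variance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

inflexible = LinearRegression().fit(X_tr, y_tr)
flexible = make_pipeline(PolynomialFeatures(degree=12),
                         LinearRegression()).fit(X_tr, y_tr)

for name, model in (("linear", inflexible), ("degree-12 poly", flexible)):
    print(name, mean_squared_error(y_te, model.predict(X_te)))
# The high-degree model typically shows a noticeably larger test error here.
```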
Model Complexity in Neural Nets
Model complexity in a neural network is usually measured in terms of the number of hidden units and layers and their connectivity, the form of the activation functions, and the parameters of the learning algorithm itself.
Increasing the number and size of the layers used in a neural network increases model complexity. Effective complexity is also higher if the number of training epochs exceeds its optimum value.
Overly complex models are harder to interpret, at greater risk of overfitting, and likely to be more computationally expensive.
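As a rough sketch (PyTorch assumed; the layer sizes are arbitrary), the parameter count balloons as layers and units are added:

```python
# A minimal sketch: deeper and wider networks carry many more trainable parameters.
import torch.nn as nn

def make_mlp(hidden_layers: int, units: int, in_dim: int = 10) -> nn.Sequential:
    layers = []
    for _ in range(hidden_layers):
        layers += [nn.Linear(in_dim, units), nn.ReLU()]
        in_dim = units
    layers.append(nn.Linear(in_dim, 1))
    return nn.Sequential(*layers)

def n_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

small = make_mlp(hidden_layers=1, units=16)    # shallow and narrow
large = make_mlp(hidden_layers=4, units=256)   # deeper and wider
print(n_params(small), n_params(large))        # the gap is roughly three orders of magnitude
```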
So what methods can be used to reduce a model’s complexity?
These methods include the following (a brief sketch of shrinkage and PCA follows the list):
- Linear model and subset selection
- Shrinkage methods (including regularisation)
- Dimensionality reduction (for example, PCA)
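Here is that brief sketch (synthetic data; Ridge stands in for shrinkage/regularisation and PCA for dimensionality reduction):

```python
# A minimal sketch: two common ways to rein in model complexity.
# Ridge shrinks coefficients towards zero; PCA reduces the feature space first.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))                 # many, mostly irrelevant features
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

plain = LinearRegression().fit(X, y)
shrunk = Ridge(alpha=10.0).fit(X, y)           # shrinkage / regularisation
reduced = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)

print("OLS   sum|coef|:", np.abs(plain.coef_).sum())
print("Ridge sum|coef|:", np.abs(shrunk.coef_).sum())   # typically noticeably smaller
print("PCA keeps", reduced.named_steps["pca"].n_components_, "derived features out of 30")
```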
Can we measure Model Complexity in the absolute sense?
There is no such thing as an absolute measure of complexity of a model.
However, there are many theories (and measures built around them) focused on explaining or measuring model complexity, or the richness of the model space.
I will discuss some of these theories and measures in future articles.
Summary
In a practical sense, we can only compare the complexity of two models when both have been constructed using the same modelling paradigm and at the same level of abstraction, and judged against some suitable complexity metric.
About Author
Ishan is an experienced data scientist with expertise in building data science and analytics capabilities from scratch including analysing unstructured/structured data, building end-to-end ML-based solutions, and deploying ML/DL models at scale on public cloud in production.
You may find him on LinkedIn.
Follow📱 to stay updated on the upcoming articles! 🔔
If anything is unclear, feel free to leave a comment. Thank you for reading this article. 🙌 🙌