Residual Analysis | Lean Six Sigma Black Belt

Residual Analysis

Residuals are estimates of experimental error obtained by subtracting the predicted response from the observed response. The predicted response is calculated from the chosen model, after all the unknown model parameters have been estimated from the experimental data. Residuals can be thought of as elements of variation unexplained by the fitted model. Since this is a form of error, the same general assumptions apply to the group of residuals that one typically uses for errors in general:

One expects them to be normally and independently distributed with a mean of 0 and some constant variance.

These are the assumptions behind ANOVA and classical regression analysis. This means that an analyst should expect a regression model to err in predicting a response in a random fashion; the model should predict values higher and lower than actual, with equal probability.
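As a minimal sketch of the definition above, the following Python snippet fits a straight line to a small data set and computes the residuals as observed minus predicted. The data values and variable names are hypothetical, chosen only for illustration, and a straight-line model fitted with NumPy's `polyfit` is an assumption, not a method prescribed by this text.

```python
import numpy as np

# Hypothetical data: x is a factor setting, y is the observed response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Estimate the model parameters (slope and intercept) by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Predicted response from the fitted model.
predicted = intercept + slope * x

# Residual = observed response - predicted response.
residuals = y - predicted

# With an intercept in the model, least-squares residuals average to zero.
print("mean residual:", residuals.mean())

# Count how often the model predicts lower and higher than the actual value.
n_above = int(np.sum(residuals > 0))  # actual is above the prediction
n_below = int(np.sum(residuals < 0))  # actual is below the prediction
print("actual above prediction:", n_above, "| actual below prediction:", n_below)
```

With so few points the split between positive and negative residuals will rarely be exactly even; the expectation of equal probability applies over many observations.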

In addition, the level of the error should be independent of when the observation occurred in the study, or the size of the observation being predicted, or even the factor settings involved in making the prediction. The overall pattern of the residuals should be similar to the bell-shaped pattern observed when plotting a histogram of normally distributed data. Departures from assumptions usually mean that the residuals contain structure that is not accounted for in the model. Identifying that structure, and adding a term representing it to the original model, leads to a better model. Any graph suitable for displaying the distribution of a set of data is suitable for judging the normality of the distribution of a group of residuals.
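The idea that leftover structure in the residuals points to a missing model term can be sketched as follows. This is an illustrative example under stated assumptions: the data are simulated with a curved (quadratic) trend, the first model is a straight line, and the checks used (lag-1 correlation of residuals in run order, residual sum of squares, and a Shapiro-Wilk normality test from SciPy) are one reasonable choice among many, not the only way to examine residuals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Simulated study (hypothetical): the true process has curvature.
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 1.0, size=x.size)

def fit_residuals(degree):
    """Fit a polynomial of the given degree and return observed - predicted."""
    coeffs = np.polyfit(x, y, deg=degree)
    return y - np.polyval(coeffs, x)

def lag1_correlation(res):
    """Correlation between each residual and the next one in run order."""
    return np.corrcoef(res[:-1], res[1:])[0, 1]

res_line = fit_residuals(1)  # straight-line model, curvature term missing
res_quad = fit_residuals(2)  # model with the quadratic term added

# Structure left in the residuals shows up as strong lag-1 correlation.
lag1_line = lag1_correlation(res_line)
lag1_quad = lag1_correlation(res_quad)

# Unexplained variation: residual sum of squares for each model.
sse_line = float(np.sum(res_line**2))
sse_quad = float(np.sum(res_quad**2))

# Normality check on the improved model's residuals.
# A small p-value would suggest the residuals are not normal.
shapiro_stat, shapiro_p = stats.shapiro(res_quad)

print(f"lag-1 correlation: line={lag1_line:.2f}, quadratic={lag1_quad:.2f}")
print(f"residual sum of squares: line={sse_line:.1f}, quadratic={sse_quad:.1f}")
print(f"Shapiro-Wilk p-value (quadratic model): {shapiro_p:.3f}")
```

A histogram or normal probability plot of `res_quad` would serve the same purpose as the Shapiro-Wilk test here; the numeric test is used only to keep the sketch free of plotting code.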

DATA TRANSFORMATION and BOX-COX

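The Box-Cox transformation is a widely used tool for transforming non-normal data so that it more closely follows a normal distribution. It is a family of power transformations controlled by a single parameter, lambda, and it applies only to strictly positive data; it cannot make every distribution normal. It functions on a trial and error basis: candidate values of lambda are tried, and the one that brings the transformed data closest to normality is kept.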

For example, if a given set of data does not follow a normal distribution, the data can be transformed using the Box-Cox transformation and then checked to see whether it now follows a normal distribution. If it does not, the data can be transformed again with a different value of the transformation parameter (lambda) and checked again for normality.
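The transform-then-check workflow described above can be sketched in Python using `scipy.stats.boxcox`, which searches for the lambda that maximizes the log-likelihood when no lambda is supplied. The data here are simulated from a right-skewed (lognormal) distribution purely for illustration, and the use of the Shapiro-Wilk test and sample skewness as the normality checks is an assumption, not a requirement of the method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical, strictly positive, right-skewed data (e.g. cycle times).
data = rng.lognormal(mean=1.0, sigma=0.8, size=200)

# Box-Cox requires strictly positive values.
assert np.all(data > 0)

# Step 1: check the raw data for normality.
_, p_before = stats.shapiro(data)

# Step 2: transform. With no lambda given, SciPy estimates it from the data
# and returns both the transformed values and the chosen lambda.
transformed, lam = stats.boxcox(data)

# Step 3: check the transformed data for normality.
_, p_after = stats.shapiro(transformed)

print(f"estimated lambda: {lam:.3f}")
print(f"Shapiro-Wilk p-value before: {p_before:.4f}, after: {p_after:.4f}")
print(f"skewness before: {stats.skew(data):.2f}, after: {stats.skew(transformed):.2f}")
```

If the check after the transformation still indicates non-normal data, a specific value can be tried by passing it explicitly, for example `stats.boxcox(data, lmbda=0.5)`, and the check repeated.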