Assumptions behind Regression
The five assumptions of linear regression are:
1. A Line Describes the Data: the relationship really is linear (or, for practical purposes, approximately linear over the range of the population being studied).
2. Homoscedasticity: the standard deviation of the residuals does not vary with the values of the explanatory variables. In other words, the dispersion of the data around the regression line must be the same along the entire line.
3. Normally Distributed Residuals at a Given X: often difficult to ascertain because there usually aren't enough observations in MMA datasets with the same value of the explanatory variable to get a good look at the distribution of the residuals. The assumption is typically approximately true, by the Central Limit Theorem, because the residual term is the sum of a myriad of other, unidentified explanatory variables. In practice, it is assessed by examining a histogram of all of the residuals pooled together. It must be remembered, though, that this is not an assessment of the actual assumption, which concerns the residuals at each given X.
4. Independent Residuals at a Given X: assuming independent residuals is essentially the same as assuming independent observations. Independent observations are usually a byproduct of a carefully designed, randomly selected sample. However, data such as ours that are measured over time or over space often violate this assumption and must be checked to see whether different techniques (time series or spatial analysis) are more appropriate.
5. The Explanatory Variables are Measured Without Error: the value of the explanatory variable for each individual is known without error, whether from measurement, natural variability, or otherwise. This is often a difficult thing to ascertain and is usually, for better or worse, just assumed to be true.
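
Several of the assumptions above are checked through the residuals, so a rough sketch of how that might look is given here. It is not taken from these notes: the function names (fit_line, residuals, spread_ratio, durbin_watson) are hypothetical, the data are simulated rather than real MMA data, and the Durbin-Watson statistic is a technique swapped in here as one common check for serial dependence in data ordered over time. The spread ratio is only a crude stand-in for inspecting a residual plot for homoscedasticity.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed y minus the fitted value at each x."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, resid):
    """Residual SD in the upper half of x divided by the SD in the lower half.

    A ratio far from 1 hints that the spread changes along the line
    (assumption 2). This is a crude check, not a formal test.
    """
    ordered = [r for _, r in sorted(zip(x, resid), key=lambda pair: pair[0])]
    half = len(ordered) // 2
    return statistics.stdev(ordered[half:]) / statistics.stdev(ordered[:half])


def durbin_watson(resid):
    """Durbin-Watson statistic for residuals taken in their observed order.

    Values near 2 suggest little first-order autocorrelation (assumption 4);
    values near 0 or 4 suggest positive or negative autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(resid, resid[1:]))
    den = sum(r ** 2 for r in resid)
    return num / den


if __name__ == "__main__":
    # Simulated data (an assumption for illustration only).
    random.seed(0)
    x = [i / 10 for i in range(100)]
    y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

    b0, b1 = fit_line(x, y)
    resid = residuals(x, y, b0, b1)
    print("intercept, slope:", b0, b1)
    print("spread ratio (upper/lower half of x):", spread_ratio(x, resid))
    print("Durbin-Watson:", durbin_watson(resid))
```

The list of residuals returned by residuals() is also what would be passed to a histogram for the pooled normality check described in assumption 3. None of these numbers settles whether an assumption holds; they only indicate where a closer look at the residual plots is warranted.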