
Simple Linear Regression

CORRELATION

The population linear correlation coefficient is ρ (rho). The sample linear correlation coefficient, r, measures the strength of the linear relationship between the paired x and y values in a sample; r is a sample statistic.

Few Important Points

A positive value for r implies that the line slopes upward to the right. A negative value for r implies that the line slopes downward to the right.

Note that r = 0 implies no linear correlation, not simply “no correlation.” A pronounced curvilinear pattern may exist.

  • When r = 1 or r = -1, all points fall on a straight line;

  • When r = 0, they are scattered and give no evidence of a linear relationship.

  • Any other value of r suggests the degree to which the points tend to be linearly related.
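The sketch below shows how r can be computed for a small sample. It is a minimal illustration, assuming NumPy is available; the x and y values are made-up data, not from the text.

```python
# A minimal sketch of computing the sample correlation coefficient r with NumPy;
# the x/y data below are made-up illustrative values.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly linear, sloping upward

# r = sum((x - x_bar)(y - y_bar)) / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)
print(r)                        # close to +1: points lie nearly on an upward line
print(np.corrcoef(x, y)[0, 1])  # same value from NumPy's built-in routine
```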

Coefficient of Determination (R²)

The coefficient of determination is R². The square of the sample linear correlation coefficient is r². For simple linear regression it can be shown that R² = r².
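A minimal numeric check of the identity R² = r², assuming NumPy and the same kind of made-up data as above:

```python
# Verify that R^2 from a simple linear least-squares fit equals r^2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]

# Fit y = b0 + b1*x by least squares, then compute R^2 = 1 - SSE/SST.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)        # residual (error) sum of squares
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - sse / sst

print(round(r ** 2, 6), round(r_squared, 6))  # the two values agree
```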

Correlation versus Causation

There could be a logical correlation, such as between car weight and gas mileage. The student should be aware that a number of other factors (carburetor type, car design, air conditioning, passenger weights, speed, etc.) could also be important. The most important cause may be a different or a collinear variable.

For example, car and passenger weight may be collinear. There can also be such a thing as a nonsensical correlation, e.g., it rains after my car is washed.

REGRESSION EQUATIONS

Regression analysis is used to construct relationships between a dependent or response variable (Y) and one or more independent or predictor variables (Xs). The goal is to determine the values of parameters for a function that cause that function to best fit a set of data observations.

Regression Approach

The dependent variable Y that needs to be predicted is identified. The regression analysis then focuses on the independent variables to be used as predictors, the Xs. Through the analysis, the relationship between Y and the Xs is identified as a mathematical formula, a model.
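A minimal sketch of this approach, assuming NumPy and made-up data: Y is the response and x1, x2 are the chosen predictors.

```python
# Fit Y = b0 + b1*x1 + b2*x2 by least squares; all data values are illustrative.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 4.0, 7.2, 7.9, 11.1, 11.8])

# Design matrix with an intercept column; solve y = X @ b by least squares.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)  # [intercept, coefficient on x1, coefficient on x2] -- the fitted model
```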

Non-Linear Regression – Cluster Analysis

Cluster analysis is used to determine groupings or classifications for a set of data. A variety of rules or algorithms have been developed to assist in forming groups. The natural groupings should have observations classified so that similar types are placed together. For example, a file on attributes of high-achieving students could be grouped or classified by IQ, parental support, school system, study habits, and available resources. Cluster analysis is used as a data reduction method in an attempt to make sense of large amounts of data (a minimal clustering sketch follows the list below) from:

  • Surveys

  • Questionnaires

  • Polls

  • Test questions

  • Scores
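A minimal cluster-analysis sketch using k-means, assuming scikit-learn is available; the attributes (columns) and their values are made up for illustration.

```python
# Group observations so that similar rows receive the same cluster label.
import numpy as np
from sklearn.cluster import KMeans

# Rows are observations (e.g., students); columns are attributes (e.g., IQ, study hours).
data = np.array([
    [100,  5], [102,  6], [ 98,  4],    # one natural grouping
    [130, 15], [128, 14], [132, 16],    # another natural grouping
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(labels)  # similar observations share a cluster label
```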

Canonical Correlation Analysis and MANOVA

Canonical analysis tests the hypothesis that effects can have multiple causes and causes can have multiple effects. This technique was developed by Hotelling in 1935, but was not widely used for over 50 years. The emergence of personal computers and statistical software has led to its fairly recent adoption. Canonical correlation analysis is a form of multiple regression used to find the correlation between two sets of linear combinations.

Each set may contain several related variables. Relating one set of independent variables to one set of dependent variables forms the linear combinations. The largest correlation values for the sets are used in the analysis. The pairings of linear combinations are called canonical variates, and the correlations are called canonical correlations (also called characteristic roots).
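A sketch of a canonical correlation between two small sets of variables, assuming scikit-learn's CCA class; all data values are simulated for illustration.

```python
# Find the paired linear combinations (canonical variates) of two variable sets
# and report the first canonical correlation.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                      # first set: three related variables
Y = X[:, :2] @ np.array([[0.8], [0.5]]) + rng.normal(scale=0.5, size=(50, 1))
Y = np.column_stack([Y, rng.normal(size=50)])     # second set: two variables

cca = CCA(n_components=1).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)                    # scores of the paired combinations

# Correlation between the paired variates = first canonical correlation.
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])
```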

There may be more than one pair of linear combinations that could be applicable for an investigation. The maximum number of linear combinations is limited by the number of variables in the smaller set. Most research involves only two sets. An analysis of variance is used when many independent X variables explain one dependent Y variable. This method tests whether the mean differences among groups on a single dependent Y variable are significant. For multiple independent X variables and multiple dependent Y variables (that is, two or more Ys and one or more Xs), multivariate analysis of variance (MANOVA) is used.

MANOVA tests whether mean differences among groups on a combination of Ys are significant or not. The concept of various treatment levels and associated factors is still valid. The data should be normally distributed, have homogeneity of the covariance matrices, and have independence of observations.
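A minimal MANOVA sketch, assuming statsmodels and pandas are available; the group labels and the two response columns (y1, y2) are simulated, not from the text.

```python
# Test whether group means differ on the combination of y1 and y2.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["A"] * 20 + ["B"] * 20,
    "y1": np.concatenate([rng.normal(0.0, 1, 20), rng.normal(1.0, 1, 20)]),
    "y2": np.concatenate([rng.normal(0.0, 1, 20), rng.normal(0.5, 1, 20)]),
})

fit = MANOVA.from_formula("y1 + y2 ~ group", data=df)
print(fit.mv_test())  # Wilks' lambda, Pillai's trace, etc., for the group effect
```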