# Key Points in Simple Linear Regression

Coefficient of Determination (R²)

The coefficient of determination is R². The square of the linear correlation coefficient is r². For simple linear regression, it can be shown that R² = r².
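The equality can be checked numerically. The sketch below is illustrative only: the data are made up, and the helper names `pearson_r` and `r_squared` are hypothetical, not taken from any particular package. It computes r from the sums of squares and, separately, R² as 1 − SSE/SST for the least-squares line.

```python
import math

def pearson_r(x, y):
    """Linear correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - my) ** 2 for b in y)                        # total variation
    return 1 - sse / sst

# Made-up data for illustration: car weight (1000 lb) and gas mileage (mpg).
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [34.0, 30.0, 27.0, 25.0, 21.0, 19.0]

r = pearson_r(weight, mpg)
R2 = r_squared(weight, mpg)
print(r ** 2, R2)  # the two values agree up to floating-point rounding
```

Note that r keeps the sign of the relationship (negative here, since mileage falls as weight rises), while R² does not.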

Correlation versus Causation

There could be a logical correlation, such as between car weight and gas mileage. The student should be aware that a number of other factors (carburetor type, car design, air conditioning, passenger weights, speed, etc.) could also be important. The most important cause may be a different variable or a collinear one.

For example, car weight and passenger weight may be collinear.

There can also be such a thing as a nonsensical correlation, e.g., it rains after my car is washed.

Regression Equation

Regression analysis is used to construct relationships between a dependent or response variable (Y) and one or more independent or predictor variables (Xs). The goal is to determine the parameter values that make a function best fit a set of data observations.

Regression Approach

The dependent variable Y that needs to be predicted is identified first. The multiple regression analysis is then carried out on the independent variables chosen as predictors (the Xs). Through the analysis, the relationship between Y and the Xs is expressed as a mathematical formula, a model. Linear regression fits a straight line to the data points so that they are distributed evenly about the line. A curvilinear relationship is one that is described by a curve, not a straight line.
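For a single predictor, the straight-line fit can be sketched with the usual least-squares formulas. This is a minimal illustration on made-up data; the function name `fit_line` is hypothetical. The residuals of a least-squares line sum to zero, which is one way to read "distributed evenly" about the line.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for the model y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Made-up observations for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("intercept:", b0, "slope:", b1)
print("sum of residuals:", sum(residuals))  # close to zero for a least-squares line
print("prediction at x = 6:", b0 + b1 * 6.0)
```

If a plot of the residuals against x shows a systematic bend rather than random scatter, that suggests a curvilinear relationship that a straight line cannot capture.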

Non-Linear Regression – Cluster Analysis

Cluster analysis is used to determine groupings or classifications for a set of data. A variety of rules or algorithms have been developed to assist in group formations. The natural groupings should have observations classified so that similar types are placed together. For example, a file on attributes of high-achieving students could be grouped or classified by IQ, parental support, school system, study habits and available resources.
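As one example of such an algorithm, the sketch below uses k-means, which is a choice made here for illustration and is not named in the text. The data, the starting centroids, and the function names are all hypothetical. Each observation is assigned to its nearest centroid, and each centroid is then moved to the mean of its members until nothing changes.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def kmeans(points, initial_centroids, max_iterations=50):
    """Basic k-means: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [nearest(p, centroids) for p in points]  # labels for the final centroids
    return labels, centroids

# Made-up student records: (study hours per week, test score).
students = [(2, 55), (3, 60), (4, 58), (10, 85), (11, 90), (12, 88)]

labels, centroids = kmeans(students, initial_centroids=[students[0], students[-1]])
print(labels)
print(centroids)
```

In practice the variables would usually be standardized first, since a variable measured on a larger scale (here, the test score) otherwise dominates the distance.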

Cluster analysis is used as a data reduction method in an attempt to make sense of large amounts of data from:

• Surveys

• Questionnaires

• Polls

• Test questions

• Scores

Canonical Correlation Analysis and MANOVA

Canonical analysis tests the hypothesis that effects can have multiple causes and causes can have multiple effects. This technique was developed by Hotelling in 1935, but was not widely used for over 50 years. The emergence of personal computers and statistical software has led to its fairly recent adoption. Canonical correlation analysis is a form of multiple regression used to find the correlation between two sets of linear combinations. Each set may contain several related variables. The relating of one set of independent variables to one set of dependent variables will form linear combinations. The largest correlation values for sets are used in the analysis.

The pairings of linear combinations are called canonical variates and the correlations are called canonical correlations (also called characteristic roots). There may be more than one pair of linear combinations that could be applicable for an investigation. The maximum number of linear combinations would be limited by the number of variables in the smaller set. Most researchers involve only two sets.
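A small sketch can make the calculation concrete. It assumes two sets of exactly two variables each, so that only 2×2 matrices are needed; the data and all function names are made up for illustration. The eigenvalues (characteristic roots) of Sxx⁻¹ Sxy Syy⁻¹ Syx are the squared canonical correlations, so their square roots are taken at the end.

```python
import math

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)

def cov_matrix(A, B):
    """Matrix of covariances between the variables in A and the variables in B."""
    return [[cov(a, b) for b in B] for a in A]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def canonical_correlations(X, Y):
    """Canonical correlations for two sets of two variables each (largest first)."""
    sxx, syy = cov_matrix(X, X), cov_matrix(Y, Y)
    sxy, syx = cov_matrix(X, Y), cov_matrix(Y, X)
    m = mul2(mul2(inv2(sxx), sxy), mul2(inv2(syy), syx))
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]
    return [math.sqrt(max(ev, 0.0)) for ev in eigenvalues]

# Made-up data: each set is a list of two variables measured on six subjects.
X = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]]
Y = [[1.1, 2.3, 2.9, 4.2, 5.1, 5.8], [3, 1, 2, 5, 4, 6]]

correlations = canonical_correlations(X, Y)
print(correlations)  # two values, because the smaller set has two variables
```

With larger sets the same idea applies, but a general eigenvalue or singular value routine from a numerical library would replace the 2×2 shortcuts used here.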

An analysis of variance (ANOVA) is used when many independent X variables act on one dependent Y variable. This method tests whether the mean differences among groups on a single dependent Y variable are significant. For multiple dependent Y variables (that is, two or more Ys and one or more Xs), the multivariate analysis of variance (MANOVA) is used. MANOVA tests whether mean differences among groups on a combination of Ys are significant or not. The concept of various treatment levels and associated factors is still valid. The data should be normally distributed, have homogeneity of the covariance matrices and have independence of observations.
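One common MANOVA test statistic is Wilks' lambda, the ratio det(W) / det(T) of the within-group to the total sums-of-squares-and-cross-products matrices. The sketch below assumes exactly two dependent variables and made-up data for three treatment levels; the function names are hypothetical. It computes only the statistic; judging significance would additionally require an F approximation or tables, which are not shown.

```python
def mean(values):
    return sum(values) / len(values)

def sscp(rows, center):
    """2x2 sums of squares and cross-products of the rows about a center."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        d = [row[0] - center[0], row[1] - center[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(groups):
    """Wilks' lambda for groups of (Y1, Y2) observations."""
    all_rows = [row for g in groups for row in g]
    grand_mean = [mean([row[k] for row in all_rows]) for k in range(2)]
    total = sscp(all_rows, grand_mean)  # T: variation about the grand mean
    within = [[0.0, 0.0], [0.0, 0.0]]   # W: variation about each group's own mean
    for g in groups:
        group_mean = [mean([row[k] for row in g]) for k in range(2)]
        s = sscp(g, group_mean)
        for i in range(2):
            for j in range(2):
                within[i][j] += s[i][j]
    return det2(within) / det2(total)

# Made-up (Y1, Y2) observations for three treatment levels.
level_a = [(4.0, 10.0), (5.0, 12.0), (6.0, 11.0)]
level_b = [(7.0, 14.0), (8.0, 13.0), (9.0, 16.0)]
level_c = [(10.0, 18.0), (11.0, 17.0), (12.0, 20.0)]

lam = wilks_lambda([level_a, level_b, level_c])
print(lam)  # values near 0 indicate large group differences; near 1, little difference
```

When every group has the same mean vector, W equals T and the statistic is 1, which is the no-difference extreme.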

Multiple-Linear Regression

Multivariate analysis is concerned with two or more dependent variables, Y1, Y2, etc., being simultaneously considered for multiple independent variables, X1, X2, etc. Recent advances in computer software and hardware have made it possible to solve more problems using multivariate analysis. Some of the software programs available to solve multivariate problems include SPSS, S-Plus, SAS and Minitab. Multivariate analysis has found wide usage in the social sciences, psychology and educational fields. Applications for multivariate analysis can also be found in the engineering, technology and scientific disciplines.

The following multivariate concepts or techniques are highlighted:

• Principal components analysis

• Factor analysis

• Discriminant function analysis

• Cluster analysis

• Canonical correlation analysis

• Multivariate analysis of variance