# Hypothesis Testing

A process can have a centering problem: when it is not centered on its target, it may be precise (low variation) but not accurate. Conversely, a process may be accurate (centered on its target) but not precise.

Null Hypothesis

This is the hypothesis to be tested. The null hypothesis directly stems from the problem statement and is denoted as H0.

Examples:

If one is investigating whether a modified seed will result in a different yield per acre, the null hypothesis (two-tail) assumes the yields are the same, H0: Ya = Yb. If a strong claim is made that the average of process A is greater than the average of process B, the null hypothesis (one-tail) states that process A is less than or equal to process B. This is written as H0: A ≤ B.

Significance: Practical vs Statistical

P-value

The p-value, also known as the probability value, is obtained with statistical software and is used to make statistical inferences. Its general use is to judge the significance of a statistical test.

Here are a few characteristics of p-value:

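The p-value is a statistical measure: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. Rejecting H0 only when the p-value falls below α keeps the probability of making an α error at α. The value ranges between 0 and 1. We normally work with a 5% alpha risk. Alpha should be specified before the hypothesis test is conducted.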

• If the p-value is ≥ 0.05, there is not enough evidence of a difference between the groups (fail to reject H0).

• If the p-value is < 0.05, there is a statistically significant difference between the groups (reject H0).

Essentially, when we compare two data sets, we want to see whether they have the same characteristics. If we are comparing two samples (evidence), the null hypothesis is that they belong to the same population (reality), i.e. there is no difference between the two sample characteristics. If we are comparing a sample (evidence) with a given population (reality), the null hypothesis is that the sample belongs to that population. If we do find a difference, we are saying with at least 100(1 − α)% confidence that the difference is genuine and not due to chance. The general rule is that the null hypothesis advocates equality, whereas the alternate hypothesis is the opposite of the null hypothesis.
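The idea that two samples "belong to the same population" can be illustrated with a small sketch. It uses an exact permutation test, which is a swapped-in technique chosen because it needs only the Python standard library; it is not one of the tests named in this text. The function name `permutation_p_value` and the yield figures are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(sample_a, sample_b):
    """Exact two-sided permutation p-value for a difference in means.

    Under H0 both samples come from the same population, so every way of
    splitting the pooled data into two groups is equally likely.
    """
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(sample_a) - mean(sample_b))

    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


ALPHA = 0.05  # specified before the test is run

# Hypothetical yields per acre for the original and the modified seed.
yield_a = [50.1, 51.3, 49.8, 50.6, 50.9]
yield_b = [53.2, 54.0, 52.7, 53.5, 54.4]

p = permutation_p_value(yield_a, yield_b)
decision = "reject H0" if p < ALPHA else "fail to reject H0"
print(f"p-value = {p:.4f} -> {decision}")
```

The exact enumeration is only practical for small samples, since the number of splits grows quickly with sample size.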

Acceptance/Rejection Criteria for Hypothesis

A null hypothesis can be rejected, or not rejected, using three methods. In the critical value method, we reject the null hypothesis when the calculated test statistic is greater than the tabulated critical value for the corresponding distribution, and we do not reject it otherwise. In the probability method, we reject the null hypothesis when the p-value is less than alpha, and we fail to reject it when the p-value is not less than alpha. In the confidence interval method, we reject the null hypothesis when the hypothesized parameter value is not in the calculated confidence interval, and we do not reject it when the value lies within the interval.
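The three criteria lead to the same decision when applied to the same test. A minimal sketch, assuming a two-sided one-sample z-test with a known population standard deviation (an assumption made here so that the standard library suffices); the function name `z_test_three_ways` and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_three_ways(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return the reject/not-reject decision from each of the three methods."""
    std_normal = NormalDist()          # standard normal, mean 0 and sd 1
    se = sigma / sqrt(n)               # standard error of the mean
    z = (sample_mean - mu0) / se       # calculated test statistic

    # 1. Critical value method: compare |z| with the tabulated value.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_critical_value = abs(z) > z_crit

    # 2. Probability method: compare the two-sided p-value with alpha.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p_value = p_value < alpha

    # 3. Confidence interval method: is mu0 outside the interval?
    lower = sample_mean - z_crit * se
    upper = sample_mean + z_crit * se
    reject_by_interval = not (lower <= mu0 <= upper)

    return {
        "critical_value": reject_by_critical_value,
        "p_value": reject_by_p_value,
        "confidence_interval": reject_by_interval,
    }


# Hypothetical data: H0 says the mean is 10.0, sigma is known to be 1.0.
print(z_test_three_ways(sample_mean=10.4, mu0=10.0, sigma=1.0, n=36))
print(z_test_three_ways(sample_mean=10.2, mu0=10.0, sigma=1.0, n=36))
```

For a two-sided test the comparison uses the absolute value of the statistic; a one-sided test would use `1 - alpha` in place of `1 - alpha / 2` and a one-sided interval.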

Types of Errors

When formulating a conclusion regarding a population based on observations from a small sample, two types of errors are possible:

Alpha error or Type I error:

This error occurs when the null hypothesis is rejected when it is, in fact, true. The probability of making a Type I error is called α (alpha) and is commonly referred to as the producer’s risk (in sampling).

Examples are: incoming products are good but called bad; a process change is thought to be different when, in fact, there is no difference.

Beta error or Type II error:

This error occurs when the null hypothesis is not rejected when it should be rejected. This error is called the consumer’s risk (in sampling) and is denoted by the symbol β (beta).

Examples are: incoming products are bad, but called good; an adverse process change has occurred but is thought to be no different. The degree of risk (α) is normally chosen by the concerned parties (α is normally taken as 5%) in arriving at the critical value of the test statistic. The assumption is that a small value for α is desirable. Unfortunately, a small α risk increases the β risk.

For a fixed sample size, α (alpha) and β are inversely related. Increasing the sample size can reduce both the α (alpha) and β risks.

• Type I Error – P (Reject H0 when H0 is true) = α

• Type II Error – P (Fail to reject H0 when H0 is false) = β
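The trade-off between the two risks can be made concrete with a short sketch. It assumes a one-sided z-test (H0: mean = μ0 against H1: mean > μ0) with known standard deviation, for which β = Φ(z₁₋α − shift·√n/σ). The function name `beta_risk` and the shift, sigma, and sample sizes are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def beta_risk(alpha, shift, sigma, n):
    """Type II error probability for a one-sided z-test.

    alpha : chosen Type I risk
    shift : true increase of the mean above the hypothesized value
    sigma : known population standard deviation
    n     : sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold for z
    # Probability that z stays below the threshold although H0 is false.
    return std_normal.cdf(z_crit - shift * sqrt(n) / sigma)


# Fixed sample size: lowering alpha raises beta.
for alpha in (0.10, 0.05, 0.01):
    print(f"n=25  alpha={alpha:.2f}  beta={beta_risk(alpha, 0.5, 1.0, 25):.3f}")

# Fixed alpha: a larger sample lowers beta.
for n in (25, 50, 100):
    print(f"alpha=0.05  n={n:<3}  beta={beta_risk(0.05, 0.5, 1.0, n):.3f}")
```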

P-value – the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. The value ranges between 0 and 1. We normally work with a 5% alpha risk, so a p-value lower than 0.05 means that we reject the null hypothesis in favor of the alternate hypothesis.

Types of Hypothesis Test

The choice of hypothesis test depends on the data types of the output (dependent variable), Y, and the input (independent variable), X. If both the Y and X variables are continuous, we perform a simple linear regression and/or a correlation analysis. If the Y variable is continuous and the X variable is discrete, then for normal data we perform a 1-sample t-test, 2-sample t-test, paired t-test, one-way ANOVA, F-test, or Homogeneity of Variance (HOV) test, among others. Likewise, for non-normal data, we perform a Mann-Whitney test, Kruskal-Wallis test, Mood’s Median test, Friedman test, 1-Sample sign test, or 1-Sample Wilcoxon test, among others. If the Y variable is discrete and the X variable is also discrete, we perform a Chi-Square test.
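The selection rules above can be written down as a simple lookup. This is only a sketch of the mapping as stated in this text; the function name `candidate_tests` and its string arguments are made up for illustration, and pairings the text does not mention return an empty list.

```python
def candidate_tests(y_type, x_type, normal=True):
    """Return the tests this text lists for a given Y/X data-type pairing.

    y_type, x_type : "continuous" or "discrete"
    normal         : whether the continuous Y data is normally distributed
    """
    if y_type == "continuous" and x_type == "continuous":
        return ["Simple linear regression", "Correlation analysis"]

    if y_type == "continuous" and x_type == "discrete":
        if normal:
            return [
                "1-sample t-test",
                "2-sample t-test",
                "Paired t-test",
                "One-way ANOVA",
                "F-test",
                "Homogeneity of Variance (HOV)",
            ]
        return [
            "Mann-Whitney test",
            "Kruskal-Wallis test",
            "Mood's Median test",
            "Friedman test",
            "1-Sample sign test",
            "1-Sample Wilcoxon test",
        ]

    if y_type == "discrete" and x_type == "discrete":
        return ["Chi-Square test"]

    # Pairing not covered by the text (e.g. discrete Y, continuous X).
    return []


print(candidate_tests("continuous", "discrete", normal=False))
```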

Hypothesis Testing with Normal Data – Sample Size Calculation

How many samples do I need?

The answer to this question is determined by the following factors:

• Type of Data – Discrete or Continuous

• SD or PD Value – the expected standard deviation (SD) or proportion defective (PD)

• Confidence Level – How confident you want to be

Sample Size formula for Continuous data

The sample size for continuous data can be calculated as:

$$ n = \left( \frac{1.96 \, \sigma}{\Delta} \right)^2 $$

Here, σ is the estimated standard deviation of the population and Δ is the precision, or the level of uncertainty in the estimate that you are willing to accept. The factor 1.96 corresponds to a 95% confidence level.
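The continuous-data formula can be sketched in a few lines. The function name `sample_size_continuous` is made up for illustration, and two choices here are assumptions rather than part of the formula: the z-value is derived from a `confidence` argument (giving about 1.96 at 95%), and the result is rounded up to the next whole unit.

```python
from math import ceil
from statistics import NormalDist


def sample_size_continuous(sigma, delta, confidence=0.95):
    """Sample size n = (z * sigma / delta)^2, rounded up.

    sigma      : estimated population standard deviation
    delta      : precision (acceptable uncertainty), same units as sigma
    confidence : two-sided confidence level; 0.95 gives z of about 1.96
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / delta) ** 2)


# Hypothetical case: sigma estimated at 10 units, precision of +/- 2 units.
print(sample_size_continuous(sigma=10, delta=2))
```

Halving `delta` roughly quadruples the required sample size, because the precision enters the formula squared.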

Sample Size Calculation (Discrete Data)

Previously we saw the sample size formula for continuous data. For discrete data, the sample size can be calculated as:

$$ n = \left( \frac{1.96}{\Delta} \right)^2 p \, (1 - p) $$

• Where p is the proportion defective that we are estimating

• And Δ is the precision, or the level of uncertainty in the estimate that you are willing to accept.
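The discrete-data formula can be sketched the same way. The function name `sample_size_discrete` is made up for illustration; deriving the z-value from a `confidence` argument (about 1.96 at 95%) and rounding the result up are assumptions added here.

```python
from math import ceil
from statistics import NormalDist


def sample_size_discrete(p, delta, confidence=0.95):
    """Sample size n = (z / delta)^2 * p * (1 - p), rounded up.

    p          : estimated proportion defective, between 0 and 1
    delta      : precision (acceptable uncertainty) as a proportion
    confidence : two-sided confidence level; 0.95 gives z of about 1.96
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / delta) ** 2 * p * (1 - p))


# Hypothetical case: about 10% defective, precision of +/- 2 percentage points.
print(sample_size_discrete(p=0.10, delta=0.02))

# p = 0.5 maximizes p * (1 - p), so it gives the most conservative size.
print(sample_size_discrete(p=0.50, delta=0.05))
```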