
Introduction to Six Sigma Statistics

Basic Terms

Measurements of process inputs and outputs can be used to optimize the process being measured. Process inputs may be raw materials, human resources, or services. All inputs have some quantifiable measurement, including human effort and skill level. Process input requirements should be stated so that key measures of input quality can be controlled. Measurements within the process can also be used as effective controls. Once process capabilities are known, output measures can be used to monitor whether the process has remained in control. Feedback from downstream process measurements can be used to improve an upstream process.

For example, electrical testing for solder shorts can be used to optimize a circuit board soldering operation, even if the soldering operation is several processes upstream from the testing operation. When considering the entire organizational feedback system, complex interrelationships are likely to exist. This is where planned experimentation and designing for Six Sigma come into play.

Planned experimentation deals with isolating the effects of several different, independent variables on a process. Designing for Six Sigma includes eliminating potential sources of error.

Continuous Distribution: A distribution of variable (measured) data that can take any of an infinite number of values on a continuous measurement scale.

Examples: normal, uniform, exponential, and Weibull distributions.

Discrete Distribution: A distribution resulting from countable (attribute) data, where the possible values can be listed individually, such as the number of defects.

Examples: binomial, Poisson, and hypergeometric distributions.
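The difference between the two kinds of data can be sketched in a few lines of Python. This is only an illustration: the part diameters, lot size, and defect probability below are made-up values, and the binomial count is simulated by counting simple pass/fail trials.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Continuous (variable) data: hypothetical shaft diameters in millimetres,
# drawn from a normal distribution. Any value on the scale is possible.
diameters = [random.gauss(10.0, 0.1) for _ in range(5)]

# Discrete (attribute) data: number of defective units in a lot of 20,
# assuming each unit is defective with probability 0.05 (a binomial count).
# Only whole numbers from 0 to 20 are possible.
defectives = [sum(random.random() < 0.05 for _ in range(20)) for _ in range(5)]

print("continuous measurements:", [round(d, 3) for d in diameters])
print("discrete counts:        ", defectives)
```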

Parameter: The true numeric population value, often unknown, estimated by a statistic.

Population: All possible observations of similar items from which a sample is drawn.

Statistic: A numerical data value taken from a sample that may be used to make an inference about a population.
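The relationship between a parameter and a statistic can be illustrated with a small simulation. The sketch below assumes a hypothetical population of 10,000 part lengths; in practice the population is rarely available in full, which is why the parameter is usually unknown and has to be estimated.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated part lengths in millimetres.
population = [random.gauss(50.0, 2.0) for _ in range(10_000)]

# Parameter: the true population mean (known here only because the
# population is simulated).
population_mean = statistics.fmean(population)

# Statistic: the mean of a random sample of 30 parts, used to make an
# inference about the population mean.
sample = random.sample(population, 30)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.3f}")
print(f"statistic (sample mean):     {sample_mean:.3f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean, and a different sample would give a slightly different estimate.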


Descriptive statistics include measures of central tendency, measures of dispersion, probability density functions, frequency distributions, and cumulative distribution functions.

Measures of Central Tendency:

Measures of central tendency represent different ways of characterizing the central value of a collection of data. Three of these measures will be addressed here: mean, mode, and median.

Measures of Dispersion: The other important property used to describe a set of data is its spread, or dispersion.

  • Mean: The mean is the arithmetic average of all data points in the data set.

  • Median: The median is the middle data point in the data set when the data is sorted.

  • Mode: The mode is the most frequently occurring data point in the data set.

Mean: The mean is the total of all data values divided by the number of data points. The arithmetic mean is the most widely used measure of central tendency.

Advantages of using the mean:

  1. It is the center of gravity of the data

  2. It uses all data

  3. No sorting is needed.

Disadvantages of using the mean:

  • Extreme data values may distort the picture.

  • It can be time-consuming to calculate by hand for large data sets.

  • The mean may not be the actual value of any data points.
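As a minimal sketch of the points above, the following Python snippet computes a mean with the standard `statistics` module and shows how a single extreme value shifts it. The cycle times are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical cycle times in minutes (illustrative values only).
cycle_times = [12, 14, 13, 15, 14, 13]

# Mean = total of all values / number of values = 81 / 6 = 13.5
mean_without_outlier = statistics.fmean(cycle_times)

# The mean (13.5) is not one of the actual data points.
mean_is_actual_value = mean_without_outlier in cycle_times

# One extreme value pulls the mean upward: 141 / 7 is roughly 20.14
with_outlier = cycle_times + [60]
mean_with_outlier = statistics.fmean(with_outlier)

print(mean_without_outlier, mean_is_actual_value, round(mean_with_outlier, 2))
```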

Mode: The mode is the most frequently occurring number in a data set.

Advantages of using the mode:

  • No calculations or sorting are necessary.

  • It is not influenced by extreme values.

  • It is an actual value.

  • It can be detected visually in distribution plots.

Disadvantage of using the mode:

The data may not have a mode, or may have more than one mode.
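The cases described above (one mode, several modes, and no meaningful mode) can be sketched with Python's `statistics` module. The data sets are hypothetical, and `multimode` (available in Python 3.8 and later) is used here because it returns every value that ties for the highest count.

```python
import statistics

# Hypothetical defect counts per lot: 3 occurs three times, more than any other value.
defect_counts = [2, 3, 3, 5, 3, 2, 7]
single_mode = statistics.mode(defect_counts)

# Two values tie for most frequent, so the data has more than one mode.
bimodal = [1, 1, 2, 2, 3]
two_modes = statistics.multimode(bimodal)

# No value repeats: every value ties, so there is no meaningful mode.
no_repeats = [4, 5, 6]
all_tied = statistics.multimode(no_repeats)

print(single_mode)  # 3
print(two_modes)    # [1, 2]
print(all_tied)     # [4, 5, 6]
```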

Median: The median is the middle value when the data is arranged in ascending or descending order.

For an even set of data, the median is the average of the middle two values.

Advantages of using the median:

  • Provides an idea of where most data is located.

  • Little calculation is required.

  • Insensitivity to extreme values.

Disadvantages of using the median:

  • The data must be sorted and arranged.

  • Extreme values may be important, and the median ignores their size.

  • Two medians cannot be averaged to obtain a combined distribution median.

  • The median will have more variation (between samples) than the average.
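The median rules above can be checked with a short Python sketch using the standard `statistics` module. All data values are hypothetical and chosen so the arithmetic is easy to follow.

```python
import statistics

# Odd number of values: sorted is 1, 3, 5, 7, 9, so the median is the middle value.
median_odd = statistics.median([7, 3, 9, 1, 5])       # 5

# Even number of values: sorted is 1, 3, 7, 9, so the median is (3 + 7) / 2.
median_even = statistics.median([7, 3, 9, 1])         # 5.0

# Replacing the largest value with an extreme one leaves the median unchanged.
median_extreme = statistics.median([1, 3, 5, 7, 900])  # 5

# Averaging two medians does not give the median of the combined data.
group_a = [1, 2, 3]               # median 2
group_b = [10, 11, 12, 13, 100]   # median 12
average_of_medians = (statistics.median(group_a) + statistics.median(group_b)) / 2
combined_median = statistics.median(group_a + group_b)  # (10 + 11) / 2

print(median_odd, median_even, median_extreme)
print(average_of_medians, combined_median)  # 7.0 10.5
```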

For a normal distribution, mean = median = mode.

Additionally, the normal distribution curve is bell-shaped and symmetrical, and the area under the curve is equal to 1. For a skewed distribution, the mean, median, and mode are generally not equal.
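A small numeric sketch makes the contrast concrete. The two data sets below are hypothetical: the first is a small symmetric, bell-like sample (standing in for normally distributed data), and the second has a long right tail.

```python
import statistics

# Hypothetical symmetric data set (illustrative values only).
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
sym_mean = statistics.fmean(symmetric)      # 27 / 9 = 3.0
sym_median = statistics.median(symmetric)   # middle of 9 sorted values = 3
sym_mode = statistics.mode(symmetric)       # 3 occurs most often

# Hypothetical right-skewed data set: a few large values form a long tail.
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
skew_mean = statistics.fmean(skewed)        # 45 / 10 = 4.5
skew_median = statistics.median(skewed)     # (3 + 3) / 2 = 3.0
skew_mode = statistics.mode(skewed)         # 2 occurs most often

print(sym_mean, sym_median, sym_mode)
print(skew_mean, skew_median, skew_mode)
```

In the skewed sample the long right tail pulls the mean above the median, while the mode stays at the peak of the data.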

Measures of Dispersion

Besides central tendency, the other important property used to describe a set of data is its spread, or dispersion. Three main measures of dispersion will be reviewed: range, variance, and standard deviation.

Range: The range of a set of data is the difference between the largest and smallest values.

Variance: The variance is the sum of the squared deviations from the mean, divided by the population size N for the population variance (σ²), or by n − 1 (one less than the sample size) for the sample variance (s²).

Standard Deviation: The standard deviation is the square root of the variance. It is expressed in the same units as the data and indicates the typical distance of the data points from the mean of the data set.
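The three measures can be computed with Python's standard `statistics` module, as in the sketch below. The data values are hypothetical. Note that the module distinguishes population measures (`pvariance`, `pstdev`, which divide by N) from sample measures (`variance`, `stdev`, which divide by n − 1).

```python
import statistics

# Hypothetical measurements (illustrative values only); the mean is 40 / 8 = 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value = 9 - 2 = 7
data_range = max(data) - min(data)

# Sum of squared deviations from the mean = 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
# Population variance divides by N = 8; sample variance divides by n - 1 = 7.
population_variance = statistics.pvariance(data)   # 32 / 8 = 4
sample_variance = statistics.variance(data)        # 32 / 7, roughly 4.571

# Standard deviation is the square root of the corresponding variance.
population_std = statistics.pstdev(data)           # sqrt(4) = 2.0
sample_std = statistics.stdev(data)                # sqrt(32 / 7), roughly 2.138

print(data_range, population_variance, population_std)
print(round(sample_variance, 3), round(sample_std, 3))
```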