Hypothesis Testing and Types of Errors
 Summary

Suppose we want to study the income of a population. We study a sample from the population and draw conclusions. The sample should represent the population for our study to be reliable.
The null hypothesis \((H_0)\) is that the sample represents the population. Hypothesis testing provides a framework to decide whether we have sufficient evidence to reject the null hypothesis or must fail to reject it.
Population characteristics are either assumed, drawn from third-party sources, or based on the judgement of subject matter experts. Statistically, population data and sample data are characterised by the moments of their distributions (mean, variance, skewness and kurtosis). Where a population characteristic is available, we test the null hypothesis that the corresponding moments are equal and conclude whether the sample represents the population.
For example, given only the mean income of the population, we check whether the mean income of the sample is close to the population mean to conclude whether the sample represents the population.
Discussion
What are the math representations of population and sample parameters? Population mean and population variance are denoted by the Greek letters \(\mu\) and \(\sigma^2\) respectively, while sample mean and sample variance are denoted by the Latin letters \(\bar x\) and \(s^2\) respectively.
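As a small illustration of this notation, the sketch below computes \(\mu\) and \(\sigma^2\) for a tiny population and \(\bar x\) and \(s^2\) for a sample taken from it. The income figures are hypothetical and chosen only for illustration; the convention that sample variance divides by \(n - 1\) while population variance divides by \(N\) is the standard one, assumed here rather than stated in the article.

```python
import statistics

# Hypothetical incomes (in thousands); these values are illustrative only.
population = [31.0, 42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 44.1, 39.9, 58.2]
sample = population[:5]  # pretend only the first five people were surveyed

# Population parameters (Greek letters): mu and sigma^2.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population, mu)   # divides by N

# Sample statistics (Latin letters): x_bar and s^2.
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample, x_bar)         # divides by n - 1

print(f"mu = {mu:.2f}, sigma^2 = {sigma2:.2f}")
print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}")
```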
Could you explain sampling error? Suppose we obtain a sample mean of \(\bar x\) from a population of mean \(\mu\). The two are related by the difference \(|\bar x - \mu| \ge 0\):
 If the difference is not significant, we conclude that it is due to sampling. This is called sampling error, and it happens due to chance.
 If the difference is significant, we conclude the sample does not represent the population. Chance alone is not enough to explain the difference.
Hypothesis testing helps us to conclude if the difference is due to sampling error or due to reasons beyond sampling error.
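To make sampling error concrete, here is a minimal simulation sketch. It assumes a normally distributed population with a hypothetical mean of 50 and standard deviation of 10, draws many random samples, and reports the average size of \(|\bar x - \mu|\). The function name and all parameter values are illustrative assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

MU, SIGMA = 50.0, 10.0  # assumed population mean and standard deviation

def mean_abs_sampling_error(n, trials=200):
    """Average |x_bar - mu| over many random samples of size n."""
    errors = []
    for _ in range(trials):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - MU))
    return statistics.fmean(errors)

small = mean_abs_sampling_error(10)
large = mean_abs_sampling_error(1000)
print(f"typical sampling error, n=10:   {small:.3f}")
print(f"typical sampling error, n=1000: {large:.3f}")
```

Every sample here really does come from the population, so any gap between \(\bar x\) and \(\mu\) is pure chance. The gap shrinks as the sample grows.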
What are some assumptions behind hypothesis testing? An important assumption is that the population follows a normal distribution. The sample distribution may or may not be normal. We obtain the mean of the sample and conclude that the sample is part of the population when the sample mean is in the vicinity of the population mean.
What are the types of errors with regard to hypothesis testing? In concluding whether the sample represents the population, there is scope for committing errors on the following counts:
 Not accepting that the sample represents the population when in reality it does. This is called type-I or \(\alpha\) error.
 Accepting that the sample represents the population when in reality it does not. This is called type-II or \(\beta\) error.
For instance, if the null hypothesis is that an applicant is not creditworthy, granting a loan to an applicant with a low credit score is an \(\alpha\) error, while not granting a loan to an applicant with a high credit score is a \(\beta\) error.
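How do we measure type-I or \(\alpha\) error? The p-value is the measure we compare against the tolerated probability of type-I error: it is the probability, assuming the null hypothesis is true, of observing a sample mean at least as far from \(\mu\) as the one obtained.
The observed sample mean \(\bar x\) is overlaid on the distribution that sample means would follow under the null hypothesis, which is centred at \(\mu\) and has variance \(\sigma^2/n\) for a sample of size \(n\). The proportion of values beyond \(\bar x\) and away from \(\mu\) (in the left tail, the right tail, or both tails) is the p-value. \(\alpha\) is the limit at or below which we reject the null hypothesis; that is, if p-value <= \(\alpha\) we reject the null hypothesis.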
The interpretation of the p-value is as follows:
 Whenever p-value > 5%, the sample is consistent with having been drawn from a population with mean \(\mu\) and variance \(\sigma^2\). We do not reject the null hypothesis \((H_0)\).
 Whenever p-value <= 5%, we conclude that the sample does not show enough evidence of being part of the population; that is, if the sample were drawn from a population with mean \(\mu\) and variance \(\sigma^2\), a sample mean this extreme would occur no more than 5% of the time. We reject the null hypothesis \((H_0)\).
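The p-value calculation described above can be sketched as follows. This is a minimal example and not a general-purpose test: it assumes a two-tailed z-test with a known population standard deviation, so that the sample mean has standard deviation \(\sigma/\sqrt{n}\). The function name and the income figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p_value(x_bar, mu, sigma, n):
    """p-value of a two-tailed z-test for a sample mean (sigma assumed known)."""
    standard_error = sigma / sqrt(n)    # standard deviation of the sample mean
    z = (x_bar - mu) / standard_error   # distance from mu in standard errors
    # Probability of a sample mean at least this far from mu, in either tail.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical figures: population mean 50, sigma 10, sample of 36 with mean 53.
alpha = 0.05
p = two_tailed_p_value(x_bar=53.0, mu=50.0, sigma=10.0, n=36)
print(f"p-value = {p:.4f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```

With these numbers the sample mean lies 1.8 standard errors from \(\mu\), giving a p-value of roughly 0.07, so the null hypothesis is not rejected at the 5% level.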
What are one-tailed and two-tailed tests? When acceptance of \(H_0\) involves boundaries on both sides, we invoke the two-tailed test. For example, if we define \(H_0\) as the sample being drawn from a population with ages in the range of 25 to 35, then testing \(H_0\) involves limits on both sides.
Suppose we define the population as those older than 50. We are interested in rejecting a sample if the age is less than or equal to 50; we are not concerned about any upper limit. Here we invoke the one-tailed test. A one-tailed test could be left-tailed or right-tailed.
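The difference between the three kinds of test shows up in which tail area is counted. The sketch below assumes the test statistic is a standard normal z-score; the function name and the `tail` parameter are hypothetical conveniences, not part of any library.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """Tail area for a z-score; tail is 'left', 'right' or 'two'."""
    std_normal = NormalDist()
    if tail == "left":    # only unusually low values count against H0
        return std_normal.cdf(z)
    if tail == "right":   # only unusually high values count against H0
        return 1 - std_normal.cdf(z)
    if tail == "two":     # deviations in either direction count against H0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

z = -1.8  # hypothetical test statistic
for tail in ("left", "right", "two"):
    print(f"{tail:>5}-tailed p-value: {p_value(z, tail):.4f}")
```

The same z-score can be significant in a left-tailed test and yet not significant in a two-tailed test, which is why the choice of tail should be made before looking at the data.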
What's the relation between level of significance \(\alpha\) and p-value? The predefined limit for \(\alpha\) error is referred to as the level of significance. A common choice for the level of significance is 5%, but in some studies it may be set at 1% or 10%. In the case of two-tailed tests, it's \(\alpha/2\) on either side.
The level of significance (also referred to as \(\alpha\) because it defines the type-I error limit) is the threshold for the p-value, at or below which we reject the null hypothesis. Equivalently, \(\alpha\) is the probability of rejecting the null hypothesis when it is true.
For instance, if the p-value is below the 5% level of significance, we reject the null hypothesis. If the p-value is above 5%, the conclusion is that we don't have sufficient evidence to reject the null hypothesis.
In general, if the p-value is less than \(\alpha\), the results are said to be statistically significant, that is, unlikely to be due to chance alone.
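One way to see how \(\alpha\) acts as a limit is to convert it into a cutoff on the test statistic. The sketch below assumes a standard normal test statistic and shows the cutoffs for the three levels of significance mentioned above, with \(\alpha/2\) in each tail for the two-tailed case. The function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_cutoff(alpha):
    """|z| beyond which a two-tailed test rejects H0 (alpha/2 in each tail)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def one_tailed_cutoff(alpha):
    """z beyond which a right-tailed test rejects H0 (all of alpha in one tail)."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: "
          f"two-tailed |z| > {two_tailed_cutoff(alpha):.3f}, "
          f"one-tailed z > {one_tailed_cutoff(alpha):.3f}")
```

Comparing the p-value with \(\alpha\) and comparing the test statistic with its cutoff are two views of the same decision: a smaller \(\alpha\) pushes the cutoff further from the centre.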
How do we determine sample size and confidence interval for sample estimate? The Law of Large Numbers suggests that the larger the sample size, the more accurate the estimate. Accuracy here means that the variance of the estimate tends towards zero as the sample size increases. The sample size can be chosen to suit the accepted level of tolerance for deviation.
The confidence interval of the sample mean is determined from the sample mean offset by a margin of error, a multiple of the standard error, on either side of the sample mean.
The formulae for determining sample size and confidence interval depend on what we want to estimate (mean, variance or others), the sampling distribution of the estimate and the standard deviation of that sampling distribution.
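For the specific case of estimating a mean, the formulae can be sketched as below. This assumes a normal sampling distribution and a known population standard deviation, so the margin of error is \(z \cdot \sigma/\sqrt{n}\); other estimates (variances, proportions) or unknown \(\sigma\) call for different formulae. The function names and figures are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- z * sigma / sqrt(n) for a mean (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)   # margin of error on either side
    return x_bar - margin, x_bar + margin

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error does not exceed the tolerated margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Hypothetical figures: sample mean 53, sigma 10, n = 36.
low, high = confidence_interval(x_bar=53.0, sigma=10.0, n=36)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")

# Sample size needed to estimate the mean to within +/- 2 at 95% confidence.
n_needed = sample_size_for_margin(sigma=10.0, margin=2.0)
print(f"sample size needed: {n_needed}")
```

Because the margin of error shrinks with \(\sqrt{n}\), halving the tolerated margin roughly quadruples the required sample size.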
How do we measure type-II or \(\beta\) error? We overlay the distribution of the sample mean on the population distribution. The proportion of the sample estimate's distribution that overlaps the acceptance region of the population distribution is the \(\beta\) error.
The larger the overlap, the larger the chance that the sample does belong to the population with mean \(\mu\) and variance \(\sigma^2\). Incidentally, despite the overlap, the p-value may be less than 5%. This happens when the sample mean is far from the population mean, but the variance of the sample mean is such that the overlap is significant.
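The overlap can be computed once we commit to a specific alternative. The sketch below assumes a two-tailed z-test with known \(\sigma\) and a hypothetical true mean `mu1` that differs from the hypothesised mean `mu0`; \(\beta\) is then the probability that the sample mean still lands inside the acceptance region around `mu0`. The function and parameter names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def beta_error(mu0, mu1, sigma, n, alpha=0.05):
    """Probability of failing to reject H0 (mean mu0) when the true mean is mu1."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Acceptance region for the sample mean under H0.
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    # Distribution of the sample mean if the true mean is mu1.
    actual = NormalDist(mu1, se)
    # Share of that distribution overlapping the acceptance region.
    return actual.cdf(upper) - actual.cdf(lower)

# Hypothetical figures: H0 mean 50, true mean 53, sigma 10, n = 36.
beta = beta_error(mu0=50.0, mu1=53.0, sigma=10.0, n=36)
print(f"beta = {beta:.3f}")
```

With these figures more than half of the alternative distribution falls inside the acceptance region, so a real shift of this size would be missed more often than it is detected.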
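How do we control \(\alpha\) and \(\beta\) errors? Errors \(\alpha\) and \(\beta\) are not complements of each other, so \(\beta\) is not simply \(1 - \alpha\). They are linked, however: for a fixed sample size, lowering \(\alpha\) generally raises \(\beta\), and both are typically lowered together by taking a larger sample.
Similar to the p-value that manifests \(\alpha\), the power of a test manifests \(\beta\). The power of a test is the probability of rejecting the null hypothesis when it is in fact false.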
$$ Power\ of\ test = 1 - \beta $$
This can be interpreted as follows:
 A low p-value and high power of test help us decisively conclude that the sample does not belong to the population.
 A high p-value and high power of test help us conclude that the sample does belong to the population, since a powerful test would probably have detected a real difference.
 When we cannot conclude decisively, it is advisable to go for larger samples and multiple samples to ensure we take the right decision.
 The error tolerances we set for \(\alpha\) and \(\beta\) should reflect the cost of committing each error.
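The advice to use larger samples can be illustrated by computing the power of a test at several sample sizes. This is a sketch under stated assumptions: a right-tailed z-test with known \(\sigma\), a fixed \(\alpha\) of 5%, and a hypothetical true mean `mu1` above the hypothesised mean `mu0`. The function name and figures are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # beta: chance the test statistic stays below the cutoff
    # even though the true mean is mu1.
    beta = NormalDist().cdf(z_crit - (mu1 - mu0) / se)
    return 1 - beta

# Hypothetical figures: H0 mean 50, true mean 53, sigma 10.
powers = {n: power_right_tailed(50.0, 53.0, 10.0, n) for n in (10, 36, 100)}
for n, power in powers.items():
    print(f"n = {n:>3}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Holding \(\alpha\) fixed, power rises and \(\beta\) falls as the sample size grows, which is why an inconclusive result is a reason to collect more data.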
Further Reading
 Foley, Hugh. 2018. "Introduction to Hypothesis Testing." Skidmore College. Accessed 2018-05-18.
 Buskirk, Trent. 2015. "Sampling Error in Surveys." Accessed 2018-05-18.
 Zaiontz, Charles. 2014. "Assumptions for Statistical Tests." Real Statistics Using Excel. Accessed 2018-05-18.
 DeCook, Rhonda. 2018. "Section 9.2: Types of Errors in Hypothesis testing." Stat1010 Notes, Department of Statistics and Actuarial Science, University of Iowa. Accessed 2018-05-18.