Hypothesis Testing and Types of Errors
Suppose we want to study the income of a population. We study a sample from the population and draw conclusions. The sample should represent the population for our study to be reliable.
The null hypothesis \((H_0)\) is that the sample represents the population. Hypothesis testing provides us with a framework to conclude whether we have sufficient evidence to reject the null hypothesis or must fail to reject it.
Population characteristics are either assumed or drawn from third-party sources or judgements by subject matter experts. Statistically, population data and sample data are characterised by the moments of their distributions (mean, variance, skewness and kurtosis). We test the null hypothesis for equality of moments where the population characteristic is available and conclude whether the sample represents the population.
For example, given only the mean income of the population, we check whether the mean income of the sample is close to the population mean to conclude whether the sample represents the population.
What are the math representations of population and sample parameters?
Population mean and population variance are denoted by the Greek letters \(\mu\) and \(\sigma^2\) respectively, while sample mean and sample variance are denoted by the Latin letters \(\bar x\) and \(s^2\) respectively.
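As a minimal sketch, these quantities can be computed with Python's standard `statistics` module. The income figures are made-up values used only for illustration.

```python
import statistics

# Hypothetical sample of incomes (illustrative values, not real data)
sample = [42_000, 38_500, 51_000, 47_250, 39_900, 44_100]

x_bar = statistics.mean(sample)        # sample mean, x-bar
s2 = statistics.variance(sample)       # sample variance s^2 (divides by n - 1)

# If the same list were the entire population, the population formula applies
sigma2 = statistics.pvariance(sample)  # population variance sigma^2 (divides by n)

print(x_bar, s2, sigma2)
```

The sample variance divides by \(n-1\) rather than \(n\), so it is always slightly larger than the population formula applied to the same values.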
Could you explain sampling error?
Suppose we obtain a sample mean of \(\bar x\) from a population of mean \(\mu\). The two are related by \(|\bar x - \mu| \ge 0\):
- If the difference is insignificant, we conclude the difference is due to sampling. This is called sampling error and it happens due to chance.
- If the difference is significant, we conclude the sample does not represent the population. Chance alone is not enough to explain the difference.
Hypothesis testing helps us to conclude if the difference is due to sampling error or due to reasons beyond sampling error.
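Sampling error can be seen in a small simulation. The sketch below assumes a hypothetical population of incomes generated from a Normal model with mean 50,000 and standard deviation 8,000; these numbers are illustrative and not from any real study.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 incomes (assumed Normal model)
population = [random.gauss(50_000, 8_000) for _ in range(100_000)]
mu = statistics.mean(population)

# Draw several random samples and record |x_bar - mu| for each
errors = []
for _ in range(5):
    sample = random.sample(population, 100)
    x_bar = statistics.mean(sample)
    errors.append(abs(x_bar - mu))

print([round(e, 1) for e in errors])
```

Every sample is drawn from the same population, yet each sample mean differs from \(\mu\) by a different small amount. These differences arise purely by chance.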
What are some assumptions behind hypothesis testing?
The important assumption is that the population follows a Normal distribution. The sample distribution may or may not be normal. We obtain the mean of the sample and conclude the sample is part of the population when the sample mean is in the vicinity of the population mean.
What are the types of errors with regard to hypothesis testing?
In concluding whether the sample represents the population, there is scope for committing errors on the following counts:
- Not accepting that sample represents population when in reality it does. This is called type-I or \(\alpha\) error.
- Accepting that sample represents population when in reality it does not. This is called type-II or \(\beta\) error.
For instance, if the null hypothesis is that an applicant is not creditworthy, granting a loan to an applicant with a low credit score is \(\alpha\) error. Not granting a loan to an applicant with a high credit score is \(\beta\) error.
How do we measure type-I or \(\alpha\) error?
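The probability of committing type-I error is capped by the level of significance \(\alpha\), which we fix before the test. The p-value is what we compare against it: it is the probability, assuming the null hypothesis is true, of observing a sample mean at least as far from \(\mu\) as the one we obtained.
The observed sample mean \(\bar x\) is overlaid on the sampling distribution of the mean, which is centred on \(\mu\) with variance \(\sigma^2/n\) for a sample of size \(n\). The proportion of values beyond \(\bar x\) and away from \(\mu\) (either in the left tail or in the right tail or in both tails) is the p-value. \(\alpha\) is the limit beyond which we reject the null hypothesis; that is, if p-value <= \(\alpha\) we reject the null hypothesis.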
The interpretation of p-value is as follows:
- Whenever p-value > 5%, the sample is consistent with having been drawn from a population with mean \(\mu\) and variance \(\sigma^2\). We do not have sufficient evidence to reject the null hypothesis \((H_0)\).
- Whenever p-value <= 5%, a sample mean this far from \(\mu\) would occur in at most 5% of samples drawn from a population with mean \(\mu\) and variance \(\sigma^2\). We reject the null hypothesis \((H_0)\).
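The p-value for a sample mean can be sketched with a two-tailed z-test, assuming the population standard deviation is known. The function name `two_tailed_p_value` and all the numbers are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p_value(x_bar, mu, sigma, n):
    """p-value of a two-tailed z-test for a sample mean, with sigma known."""
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu) / standard_error
    # Probability of a z at least this far from 0 in either direction
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: population mean 50,000, sigma 8,000, sample of 64
p = two_tailed_p_value(x_bar=52_000, mu=50_000, sigma=8_000, n=64)
alpha = 0.05
print(f"p-value = {p:.4f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```

Here the standard error is 1,000, so the sample mean lies two standard errors from \(\mu\) and the p-value is about 0.0455, just under the 5% limit.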
What are one-tailed and two-tailed tests?
When acceptance of \(H_0\) involves boundaries on both sides, we invoke the two-tailed test. For example, if we define \(H_0\) as sample drawn from population with age limits in the range of 25 to 35, then testing of \(H_0\) involves limits on both sides.
Suppose we define the population as those older than age 50. We are interested in rejecting a sample if the age is less than or equal to 50; we are not concerned about any upper limit. Here we invoke the one-tailed test. A one-tailed test could be left-tailed or right-tailed.
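The choice of tail changes which area under the curve counts as the p-value. The sketch below assumes a z statistic has already been computed; the helper `p_value` and the value of `z` are hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p-value for a z statistic; `tail` is 'left', 'right' or 'two'."""
    cdf = NormalDist().cdf(z)
    if tail == "left":
        return cdf                      # area to the left of z
    if tail == "right":
        return 1 - cdf                  # area to the right of z
    if tail == "two":
        return 2 * min(cdf, 1 - cdf)    # both tails, symmetric about 0
    raise ValueError("tail must be 'left', 'right' or 'two'")

z = -1.8  # hypothetical test statistic
for tail in ("left", "right", "two"):
    print(tail, round(p_value(z, tail), 4))
```

For the same statistic, the left-tailed p-value is half the two-tailed one, so a result can be significant in a one-tailed test and not in a two-tailed test.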
What's the relation between level of significance \(\alpha\) and p-value?
The predefined limit for \(\alpha\) error is referred to as the level of significance. The standard level of significance is 5% but in some studies it may be set at 1% or 10%. In the case of two-tailed tests, it's \(\alpha/2\) on either side.
The level of significance (also referred to as \(\alpha\) because we are defining type-I error limits) is the threshold for the p-value, below which we reject the null hypothesis. Put differently, \(\alpha\) is the probability of rejecting the null hypothesis when it is true, while the p-value measures how extreme the observed sample is if the null hypothesis is true.
For instance, if the p-value is below the 5% level of significance, we reject the null hypothesis. If the p-value is above 5%, the conclusion is that we don't have sufficient evidence to reject the null hypothesis.
In general, if the p-value is less than \(\alpha\), the results are said to be statistically significant, meaning they are unlikely to be due to chance alone.
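Comparing the p-value with \(\alpha\) is equivalent to comparing the test statistic with a critical value. As a rough sketch, assuming a z-test, the critical values for the common levels of significance can be obtained from the standard Normal distribution. The helper `critical_z` is a hypothetical name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha, two_tailed=True):
    """Cut-off |z| beyond which H0 is rejected at significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

for alpha in (0.01, 0.05, 0.10):
    two = round(critical_z(alpha), 3)
    one = round(critical_z(alpha, two_tailed=False), 3)
    print(alpha, two, one)
```

At the 5% level this gives the familiar cut-offs of about 1.96 for a two-tailed test and about 1.645 for a one-tailed test.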
How do we determine sample size and confidence-interval for sample estimate?
The Law of Large Numbers suggests that the larger the sample size, the more accurate the estimate. Accuracy here means the variance of the estimate tends towards zero as the sample size increases. The sample size can be chosen to suit the accepted level of tolerance for deviation.
The confidence interval of the sample mean is the sample mean offset on either side by a margin of error, which is a multiple of the standard error of the mean.
The formulae for determining sample size and confidence interval depend on what we want to estimate (mean/variance/others), the sampling distribution of the estimate and the standard deviation of that sampling distribution.
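For the specific case of estimating a mean with a known population standard deviation, a common choice is the interval \(\bar x \pm z\,\sigma/\sqrt{n}\) and the sample size \(n = (z\,\sigma/E)^2\) for a tolerated margin \(E\). The sketch below assumes that case; the function names and numbers are hypothetical, and other estimates need other formulae.

```python
from math import ceil, sqrt
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- z * sigma / sqrt(n) for a mean, sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error does not exceed `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

low, high = confidence_interval(x_bar=52_000, sigma=8_000, n=64)
n_needed = required_sample_size(sigma=8_000, margin=1_000)
print(round(low, 2), round(high, 2), n_needed)
```

With these numbers the 95% interval is roughly 50,040 to 53,960, and 246 observations would be needed to bring the margin of error down to 1,000.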
How do we measure type-II or \(\beta\) error?
We overlay the sample mean's distribution on the population distribution. The proportion of the sample mean's distribution that overlaps the region where we would not reject the null hypothesis is \(\beta\) error.
The larger the overlap, the larger the chance the sample does belong to the population with mean \(\mu\) and variance \(\sigma^2\). Incidentally, despite the overlap, the p-value may be less than 5%. This happens when the sample mean is far from the population mean, but the variance of the sample mean is such that the overlap is significant.
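The overlap can be computed directly for a two-tailed z-test. This is a sketch under the assumption that the true mean of the sampled group is known, which in practice is a value we postulate. The function `beta_error` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Type-II error of a two-tailed z-test when the true mean is mu_true."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Region of sample means for which H0 (mean = mu0) is not rejected
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    # Probability that a sample mean from the true distribution lands in it
    true_dist = NormalDist(mu_true, se)
    return true_dist.cdf(upper) - true_dist.cdf(lower)

beta = beta_error(mu0=50_000, mu_true=52_000, sigma=8_000, n=64)
print(f"beta = {beta:.3f}")
```

With a true mean two standard errors away from \(\mu\), \(\beta\) is about 0.484, so the test would miss the difference almost half the time.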
How do we control \(\alpha\) and \(\beta\) errors?
Errors \(\alpha\) and \(\beta\) are not independent of each other. For a fixed sample size, decreasing one increases the other. Both can be reduced together only by increasing the sample size.
Just as the p-value is judged against \(\alpha\), the power of a test reflects \(\beta\). The power of a test is the probability of correctly rejecting the null hypothesis when it is false.
$$ \text{Power of test} = 1 - \beta $$
This can be interpreted as follows:
- A low p-value and high power of test help us decisively conclude the sample does not belong to the population.
- A high p-value and high power of test help us decisively conclude the sample does belong to the population. A high p-value with low power is inconclusive, since the test had little chance of detecting a difference.
- When we cannot conclude decisively, it is advisable to go for larger samples and multiple samples to ensure we take the right decision.
- The error tolerances we set for \(\alpha\) and \(\beta\) should reflect the cost of committing each error.
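A minimal sketch of how power responds to the level of significance and to sample size, assuming a two-tailed z-test with known standard deviation. The function `power` and the scenario (null mean 50,000, true mean 52,000, standard deviation 8,000) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power(mu0, mu_true, sigma, n, alpha):
    """Power (1 - beta) of a two-tailed z-test for a mean, sigma known."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    true_dist = NormalDist(mu_true, se)
    # beta: chance the sample mean falls where H0 is not rejected
    beta = true_dist.cdf(mu0 + z_crit * se) - true_dist.cdf(mu0 - z_crit * se)
    return 1 - beta

for alpha in (0.10, 0.05, 0.01):
    for n in (64, 256):
        pw = power(50_000, 52_000, 8_000, n, alpha)
        print(f"alpha={alpha:.2f} n={n:3d} power={pw:.3f}")
```

For a given sample size, tightening \(\alpha\) lowers the power, which means \(\beta\) goes up. Increasing the sample size raises the power at every level of significance.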