Hypothesis Testing

Hypothesis testing is one of the basic inference methods used in statistics. This post covers what hypothesis testing is at a high level, the terms used and their definitions, and can serve as a brief refresher. It is by no means a deep dive; the References section has some good links if you want a detailed study.

What is hypothesis testing?

It is a part of statistical analysis in which we test an assumption made about a population parameter. It is generally used when we want to compare a single group with an external standard, or two or more groups with each other.

Terms

Statistical Significance - It is a measure of how unlikely the observed outcome would have been if it had happened by random chance alone.

Null Hypothesis - It is the statement that there is no difference between the populations, that is, no relationship or no effect. It is denoted by H0 and read as H-naught.

Alternative Hypothesis - It states that there is a difference between the population parameters. The difference could be in either direction (greater or smaller). It is the opposite of the null hypothesis and is denoted by Ha.

Level of Significance (alpha) - It is the probability of rejecting the null hypothesis when it is true.

P-Value - It is the probability of observing data at least as extreme as the data we actually observed, assuming the null hypothesis is true. A small p-value means the observed data would be unlikely if the null hypothesis were true.
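To make the p-value definition concrete, here is a minimal sketch using a coin-flip example that is not part of the original post: the null hypothesis is that the coin is fair, and we observe 60 heads in 100 flips. The function name and the numbers are hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_two_sided_p_value(heads: int, flips: int) -> float:
    """Two-sided p-value for H0: the coin is fair (probability of heads = 0.5)."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Under H0 every sequence of flips is equally likely, so the probability of
    # a head count k is comb(flips, k) / 2**flips.
    # Sum over every head count at least as far from the expected count as ours.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme_outcomes / 2**flips


p_value = binomial_two_sided_p_value(60, 100)
print(f"p-value = {p_value:.4f}")
```

The result is a little above 0.05, so at alpha = 0.05 this data alone would not be enough to reject the claim that the coin is fair.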

Steps

  • Figure out the null hypothesis.
  • State your null hypothesis and alternative hypothesis.
  • Choose what kind of test you need to perform.
    • The first step in testing a hypothesis is to assume that the null hypothesis is true.
    • If p-value < alpha, we reject the null hypothesis in favor of the alternative hypothesis.
    • If p value >= alpha, we fail to reject the null hypothesis.
  • Either reject or fail to reject the null hypothesis.
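The steps above can be sketched in code. This is only one possible illustration: it assumes a one-sample z-test (the population standard deviation is taken as known), and the sample values, `mu0`, `sigma` and the function name are all made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test with known population standard deviation.

    H0: the population mean equals mu0.
    Ha: the population mean differs from mu0.
    """
    n = len(sample)
    # Test statistic, computed under the assumption that H0 is true.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least this extreme in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision rule: reject H0 only when p-value < alpha.
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Hypothetical measurements; H0 says the population mean is 50.
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.7, 51.5, 50.3, 52.9, 51.8]
z, p_value, reject_null = one_sample_z_test(sample, mu0=50, sigma=2)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

In practice the population standard deviation is rarely known, and a t-test is the more common choice; the z-test is used here only because it needs nothing beyond the Python standard library.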

Types of Hypothesis Testing

  • Parametric and Non-Parametric
    • Parametric tests assume the data follow a specific distribution, most commonly a normal distribution.
    • Non-parametric tests don’t make any assumption about the distribution of the data.
  • One-Tailed and Two-Tailed
    • A one-tailed test is where you are only interested in one direction. If a mean is x, you might want to know only whether a set of results is more than x, or only whether it is less than x. We have to be very careful when choosing a one-tailed test, because it cannot detect an effect in the other direction. For example, suppose you have a new drug and you want to test whether it is more effective than an existing drug. A one-tailed test for more effective would miss the case where the new drug is less effective than the existing drug, and missing that has real consequences, so it is not the right test here. Now suppose instead that you only want to test whether the new drug is less effective than the existing drug, and you don’t care whether it is more effective. A one-tailed test is appropriate in that case.
    • In a two-tailed test, you are testing for the possibility of the relationship in both directions. If a mean is x, you might want to compare the mean of a sample to the given value x. Our null hypothesis is that the mean is equal to x, and the alternative is that it differs from x in either direction.
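The difference between one-tailed and two-tailed tests shows up directly in the p-value. Here is a small sketch, assuming a test statistic z that follows a standard normal distribution under the null hypothesis; the function name and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist


def z_p_values(z: float) -> dict:
    """P-values for the same z statistic under three alternative hypotheses."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return {
        # One-tailed, Ha: mean is greater than x (upper tail only).
        "greater": 1 - std_normal.cdf(z),
        # One-tailed, Ha: mean is less than x (lower tail only).
        "less": std_normal.cdf(z),
        # Two-tailed, Ha: mean is different from x (both tails).
        "two_sided": 2 * (1 - std_normal.cdf(abs(z))),
    }


p = z_p_values(1.8)
for alternative, value in p.items():
    print(f"{alternative:>9}: p-value = {value:.4f}")
```

With z = 1.8 and alpha = 0.05, the one-tailed "greater" test rejects the null hypothesis while the two-tailed test does not. This is why the direction has to be chosen before looking at the data, not after.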

Cheat Sheet

A cheat sheet covering the different kinds of hypothesis tests is below.

References

https://www.analyticsvidhya.com/blog/2021/07/hypothesis-testing-made-easy-for-the-data-science-beginners/
https://ravedata.in/statistics/hypothesis-testing/
https://leanmanufacturing.online/introduction-to-hypothesis-testing/
https://blog.minitab.com/en/adventures-in-statistics-2/understanding-hypothesis-tests-significance-levels-alpha-and-p-values-in-statistics
https://www.scribbr.com/statistics/p-value/
https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-the-differences-between-one-tailed-and-two-tailed-tests/