Introduction
Wondering what hypothesis testing is, how it works, and what a hypothesis test means in statistics? Hypothesis testing is an important part of data science, and understanding it matters even more if you are brushing up on your data science knowledge or basic statistics.
So if you want to understand its key concepts, read on. Today we'll break down the basics to guide you through everything you need to know about hypothesis testing in statistics.
What Is Hypothesis Testing?
So what is hypothesis testing? A hypothesis test lets you check whether an idea about something (people, events, or even objects) is supported by data. It is primarily used to find out whether two things are connected.
Let's take a look at an example: a doctor claims that following a "3D" approach is 90% effective for diabetic patients. A hypothesis test can check that claim against patient data.
The formula for the hypothesis testing statistic:
$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
Where:
- $\mu_0$ is the population mean (the average value you expect to find),
- $\sigma$ is the standard deviation (how much the values vary),
- $\bar{X}$ is the sample mean (the average value from your sample),
- $n$ is the sample size (the number of items in your sample).
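The Z formula above can be worked through in a few lines of Python. This is a minimal sketch: the function name `z_statistic` and the numbers passed to it are made up for illustration and do not come from a real dataset.

```python
import math

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) / (population sd / sqrt(n))."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical numbers, for illustration only
z = z_statistic(sample_mean=52.0, pop_mean=50.0, pop_sd=6.0, n=36)
print(round(z, 2))  # 2.0
```

Here the standard error is 6 / 6 = 1, so a sample mean of 52 sits two standard errors above the expected mean of 50.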
Now that you understand what hypothesis testing is, we'll look at how it works and at its two types of hypotheses.
How Does Hypothesis Testing Work?
Hypothesis testing is generally done by an analyst who starts by analyzing a sample of data. They do this to check how plausible the null hypothesis is. First they take a small sample from the larger population they are studying, and then they use that sample to test two competing hypotheses, i.e. the null hypothesis and the alternative hypothesis.
Defining Hypotheses: What is a Hypothesis Test in Statistics
In statistics, a hypothesis test involves two hypotheses: the null hypothesis and the alternative hypothesis. These can be defined as follows:
Null hypothesis (H0): The null hypothesis is a statement that there is no relationship between the two groups or measured things. In other words, it says that nothing is happening between the two things.
An example of a null hypothesis: a company believes its average daily production is 50 units. So the null hypothesis is H0: μ = 50
Alternative hypothesis (H1): The alternative hypothesis is the opposite of the null hypothesis. It states that there is a relationship or difference between the two components.
An example of an alternative hypothesis: the company suspects its average daily production is not 50 units, which makes the alternative hypothesis H1: μ ≠ 50
Steps of Hypothesis Testing
Hypothesis testing has five main steps. Although the specific details vary from case to case, the procedure is always the same:
Step 1: State Your Alternative and Null Hypotheses
When you have a research question (a prediction you want to investigate), you need to clearly state it as two types of hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha). This helps you test your idea with data.
- Alternative Hypothesis (Ha): This is your original idea that predicts there is a relationship between the variables.
- Null Hypothesis (H0): This says there is no relationship between the variables.
For example:
You want to find out if there's a difference in height between men and women. Based on what you know, you think men are generally taller than women. To test this, you write your hypotheses like this:
- H0: Men are not taller than women.
- Ha: Men are taller than women.
Step 2: Collect data
To make sure your statistical test is valid, you need to collect data in a way that accurately tests your hypothesis. If your data doesn't represent the whole population, you can't draw reliable conclusions. You can also check our blog What Is Data Collection to learn more about data collection.
Hypothesis testing examples
If you want to test the difference in average height between men and women, your sample should include an equal number of men and women and represent different socio-economic classes and other factors that might affect height.
You should also decide on your scope: Are you looking at heights worldwide or just in one country? A good data source could be census data, as it includes information from various regions and social classes and is available for many countries.
Step 3: Perform a Statistical Test
To check whether your data support your hypothesis, you need to perform a statistical test. This test compares the differences within each group to the differences between the groups (check our blog on How Statistical Inference Like Terms Helps In Analysis? for more information):
- If the groups are quite different from each other with little overlap, you'll get a low p-value, meaning the differences are likely real and not due to chance.
- If there's a lot of variation within the groups and the groups aren't that different, you'll get a high p-value, meaning the differences are probably due to chance.
The test you choose depends on the type of data you have collected.
Hypothesis testing examples
To find out if men are taller than women, you use a one-tailed t-test. This test will:
- Estimate the average height difference between men and women.
- Provide a p-value to show how likely it is that the observed difference happened by chance if there is actually no difference.
Your t-test results show an average height of 175.4 cm for men and 161.7 cm for women. The estimated true difference is at least 10.2 cm, and the p-value is 0.002. This low p-value means it's unlikely the height difference is due to chance.
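To make the comparison between group differences and within-group variation concrete, here is a minimal sketch of a t statistic for two groups. It assumes Welch's version of the t-test (unequal variances), which the article does not specify, and the two height lists are invented for illustration; they are not the data behind the figures quoted above.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's t-test on mean(sample_a) - mean(sample_b)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variance divided by sample size: the within-group noise in each mean
    v_a = statistics.variance(sample_a) / len(sample_a)
    v_b = statistics.variance(sample_b) / len(sample_b)
    # Between-group difference divided by the combined within-group noise
    t = (mean_a - mean_b) / math.sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v_a + v_b) ** 2 / (
        v_a ** 2 / (len(sample_a) - 1) + v_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical heights in cm, for illustration only
men = [178.0, 172.5, 181.0, 169.5, 176.0, 174.5]
women = [160.0, 165.5, 158.0, 163.0, 166.5, 161.0]

t, df = welch_t(men, women)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large positive t means the gap between the group means is big relative to the spread inside each group. Turning t into a p-value requires the t distribution, which is not in Python's standard library; if SciPy is installed, `scipy.stats.ttest_ind(men, women, equal_var=False, alternative="greater")` should give the same statistic together with a one-tailed p-value.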
Step 4: Decide whether to reject or fail to reject your null hypothesis
You need to decide if you should reject or keep your null hypothesis after running your statistical test.
To make this decision, you'll need to use the p-value from your test. Usually, you reject the null hypothesis if the p-value is less than 0.05 (5%). This means there's less than a 5% chance of seeing results at least this extreme if the null hypothesis is true.
Sometimes researchers use a stricter cutoff like 0.01 (1%) to be extra sure they're not mistakenly rejecting the null hypothesis (this is called a Type I error).
Hypothesis testing examples
In your study on the height difference between men and women, you get a p-value of 0.002. Since this is less than 0.05, you reject the null hypothesis and conclude that men are, on average, taller than women.
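The decision rule in this step can be sketched as a tiny function. The name `decide` and the default cutoff of 0.05 are illustrative choices, not a standard API.

```python
def decide(p_value, alpha=0.05):
    """Apply the usual rule: reject H0 when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.002))             # reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
```

The second call shows how a stricter cutoff changes the outcome: a p-value of 0.03 is significant at the 5% level but not at the 1% level.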
Step 5: Present your findings
When you're done with hypothesis testing, you'll share your results in the results and discussion sections of your dissertation, research paper, or thesis.
- Results Section: Give a brief summary of your data and the results of your statistical test, like the difference between group averages and the p-value.
- Discussion Section: Discuss whether your results support your initial hypothesis.
You'll likely do this in your statistics assignments, where you'll talk about "rejecting" or "not rejecting" the null hypothesis in the hypothesis test.
Hypothesis testing examples
We found an average difference of 13.7 cm and a p-value of 0.002 in our study comparing the average height of women and men. This means we can reject the null hypothesis that men are not taller than women and conclude that there is likely a difference in height between men and women.
How are One-Tailed and Two-Tailed Tests different?
One-tailed and two-tailed tests are methods to find out if there's a relationship between statistical variables.
One-Tailed Test
A one-tailed test looks for a relationship in one specific direction (either left or right). It's like asking, "Is this number bigger (or smaller) than the expected value?" The rejection area, where we decide if something is statistically significant, is only on one side of the graph. This test uses one critical value to make decisions.
Two-Tailed Test
A two-tailed test checks for relationships in both directions. It asks, "Is this number different from the expected value, either higher or lower?" The rejection areas are on both sides of the graph. This type of test is used when you want to see whether the sample is significantly higher or lower than the expected value. It's the usual choice when the null hypothesis states that there is no difference in either direction.
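The difference between the two kinds of test shows up in how the p-value is computed from the same test statistic. The sketch below assumes a Z statistic that follows the standard normal distribution; the function name `tail_p_values` and the value z = 1.8 are illustrative.

```python
from statistics import NormalDist

def tail_p_values(z):
    """Return right-tailed, left-tailed and two-tailed p-values for a Z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    left = std_normal.cdf(z)   # area to the left of z
    right = 1.0 - left         # area to the right of z
    two_tailed = 2.0 * min(left, right)  # both rejection areas combined
    return {"left": left, "right": right, "two_tailed": two_tailed}

p = tail_p_values(1.8)
print(round(p["right"], 3))       # 0.036
print(round(p["two_tailed"], 3))  # 0.072
```

With z = 1.8, the right-tailed test is significant at the 5% level while the two-tailed test is not, because the two-tailed test splits its rejection area across both sides of the graph.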
What are Type 1 and Type 2 errors in Hypothesis Testing?
A hypothesis test can result in two types of errors:
- Type 1 Error: A Type 1 error happens when you think there's an effect or difference when there isn't one. It's like a false alarm.
- Type 2 Error: A Type 2 error happens when you miss an effect or difference that really exists. It's like not hearing an alarm when you should.
Hypothesis testing examples:
Imagine a teacher grading exams to decide if a student passes or fails.
- H0: The student has passed.
- H1: The student has failed.
- A Type 1 error is when the teacher fails the student (rejects H0) even though the student actually passed (H0 was true).
- A Type 2 error is when the teacher passes the student (does not reject H0) even though the student actually failed (H1 was true).
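The two error rates can also be put into numbers. The sketch below assumes a right-tailed Z test with a known population standard deviation, which is a simplification chosen for illustration; `error_rates` and its inputs are hypothetical.

```python
from statistics import NormalDist

def error_rates(true_shift, pop_sd, n, alpha=0.05):
    """Type I and Type II error rates for a right-tailed Z test.

    true_shift is the real difference between the true mean and the
    mean stated in H0 (only meaningful when H1 is actually true).
    """
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1.0 - alpha)     # rejection cutoff
    shift_in_se = true_shift / (pop_sd / n ** 0.5)   # shift in standard errors
    # Type II error: the statistic lands below the cutoff although H1 is true
    type_2 = std_normal.cdf(z_critical - shift_in_se)
    # Type I error rate equals alpha by construction
    return alpha, type_2

type_1, type_2 = error_rates(true_shift=2.0, pop_sd=6.0, n=36)
print(f"Type I rate = {type_1:.2f}, Type II rate = {type_2:.2f}")
```

Under these assumptions, collecting a larger sample lowers the Type II rate for the same true shift, while the Type I rate stays fixed at the chosen cutoff.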
Different types of Hypothesis Testing
There are three main types of hypothesis tests:
Chi-Square Test
A Chi-Square test is used to check if your data matches what you expected. It looks at the differences between observed and expected results to see if they fit well. The basic idea is to compare what you actually see in your data with what you would expect to see if the null hypothesis were true.
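The comparison of observed and expected counts can be sketched as follows. The counts are invented, and the critical value 5.991 is the standard table value for 2 degrees of freedom at the 5% level (three categories minus one).

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories, for illustration only
observed = [50, 30, 20]
expected = [40, 40, 20]

chi2 = chi_square_statistic(observed, expected)
CRITICAL_DF2_ALPHA_05 = 5.991  # chi-square table value, df = 2, alpha = 0.05

print(chi2)  # 5.0
print("reject H0" if chi2 > CRITICAL_DF2_ALPHA_05 else "fail to reject H0")  # fail to reject H0
```

The statistic grows as the observed counts drift away from the expected ones; here it stays just under the cutoff, so the data are still consistent with the null hypothesis.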
Z Test
A Z test is used in hypothesis testing to see if a finding or relationship is statistically significant. It typically checks whether two averages (means) are the same (the null hypothesis). You can use a Z test when you know the population standard deviation and have a sample size of 30 or more.
T Test
A T test is a statistical test used to compare the averages of two groups. It's commonly used in hypothesis testing to see if there's a difference between the two groups or if a treatment has an effect.
When Did Hypothesis Testing Begin?
Some statistics experts credit John Arbuthnot, a physician and satirist, with conducting the first hypothesis test in 1710. His analysis of birth records from London showed that male births exceeded female births in every year he examined. Arbuthnot computed the probability of such an outcome occurring by chance and found that it was extremely low, so he argued that the pattern was the result of divine providence rather than chance.
Why Is Hypothesis Testing Important in Research Methodology?
Hypothesis testing statistics is really important in research for a few key reasons:
- Gives You Solid Evidence: It helps researchers draw objective conclusions from data, showing whether their ideas hold up or not.
- Helps With Decisions: It’s useful for making informed choices, like whether to go with a new treatment, change a policy, or try new methods.
- Adds Credibility: It makes research more reliable by using statistical methods to back up conclusions with strong evidence.
- Drives Knowledge Forward: Testing hypotheses helps expand what we know by confirming old theories or discovering new trends and connections.
Some Real Life Applications of Hypothesis Testing
In everyday life, hypothesis testing is valuable because it allows individuals to make judgments based on available information. It is used in medical research to test whether new treatments are effective, in market research to assess the potential success of proposed products, and in environmental research to determine the impact of pollution on wildlife, among other things.
In Medicine:
Hypothesis testing helps in medical research to see if new treatments are better than old ones. For example, when a company creates a new drug, they test it to find out if it's more effective and safe compared to current treatments.
In Market Research:
Companies use hypothesis testing to check if new products or ads will be successful. For instance, they might ask people if they prefer a new product and use hypothesis testing to analyze the results.
In Environmental Studies:
Scientists use hypothesis testing to see if pollutants are harming the environment. For example, they might test whether a chemical spill is causing fish populations to drop and use a hypothesis test to see if the data supports this.
Do Data Analysts do Hypothesis Testing?
Yes. Data analysts use basic statistics to make sense of data, find key numbers, and explore trends. They also use methods like hypothesis testing and regression analysis for straightforward models.
Limitations of Hypothesis Testing
Hypothesis testing has some limitations that can impact the quality of the results:
- P-Value Issues: Interpreting a p-value can be tricky because it depends on how you decide when to stop testing and how you handle multiple comparisons. Different rules and interpretations can make it hard to calculate and understand p-values accurately.
- Conceptual Confusion: Problems can come up if a researcher mixes different methods, like Fisher’s and Neyman-Pearson’s approaches, which are based on different concepts.
- Focus on Significance: Sometimes, researchers might focus too much on whether results are statistically significant and ignore other important factors like estimating the size of an effect and repeating experiments.
- Publication Bias: Hypothesis testing can lead to publication bias if only studies with significant results get published, which can skew the overall view of the research.
- Reliability Issues: When trying to find differences between groups, hypothesis testing might lead to unreasonable assumptions that can affect the reliability of the results.
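The multiple-comparisons issue mentioned in the list above can be illustrated with a short calculation. This sketch assumes the tests are independent and that every null hypothesis is true, which is a simplification; `familywise_error_rate` is a name made up for this example.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests

for num_tests in (1, 5, 20):
    rate = familywise_error_rate(0.05, num_tests)
    print(f"{num_tests} tests: {rate:.3f}")
```

Under these assumptions, running many tests at the 5% level makes at least one false alarm far more likely than 5%, which is why a single p-value below 0.05 means less when it is one of many.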
Conclusion
Statistical hypothesis testing, an integral component of statistics, is a process for evaluating and interpreting data. The approach consists of stating two competing hypotheses and running a test to see which one the data support. This guide also described Type I and Type II errors with a practical example.
By now, you should have a clearer grasp of hypothesis testing, a key concept in Data Science. Hypotheses often stem from speculations about observed behaviors, natural events, or established theories.
For those interested in the statistical aspects of Data Science and the skills required for this field, consider exploring Data Science online Certification course now.
If you have any questions regarding this topic, please leave them in the comments section. Our experts are ready to help. Enjoy your learning journey!
FAQs
Q1: What is the significance level?
A: The significance level (usually set at 0.05) is the threshold for deciding whether to reject the null hypothesis. It represents the probability of making a Type I error.
Q2: What is a Type II error?
A: A Type II error happens when you fail to reject the null hypothesis when it is actually false. It’s also known as a "false negative."
Q3: What is a p-value?
A: A p-value measures the probability of observing your results, or something more extreme, if the null hypothesis were true. A low p-value indicates strong evidence against the null hypothesis.
Q4: How does sample size affect hypothesis testing?
A: A larger sample size generally provides more reliable results and can help detect smaller effects, while a smaller sample size may lead to less precise estimates and higher chances of errors.
Q5: What is a confidence interval?
A: A confidence interval is a range of values that is likely to contain the true parameter of interest. It provides an estimate of the uncertainty around your sample results.
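As a rough illustration of the answer above, a confidence interval for a mean can be computed like this. The sketch assumes the population standard deviation is known, so the normal distribution applies, and the inputs are hypothetical.

```python
from statistics import NormalDist

def mean_confidence_interval(sample_mean, pop_sd, n, confidence=0.95):
    """Confidence interval for a mean when the population sd is known."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * pop_sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin

# Hypothetical numbers, for illustration only
low, high = mean_confidence_interval(sample_mean=52.0, pop_sd=6.0, n=36)
print(f"{low:.2f} to {high:.2f}")  # 50.04 to 53.96
```

The interval narrows as the sample size grows, which ties in with the previous answer on sample size.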