STAT1013: Practical Assignment Part 2: Exploring and Analyzing Data (80 Points)
Part 2 of the practical assignment will involve writing a brief introduction in which you talk about your idea and how you went about gathering your data. You will also take some time to explore your data by graphing it and obtaining summary statistics using Python. You will use the output from your statistical Python package to write about and describe the data you obtained. You will then proceed to test necessary assumptions and conduct the appropriate type of hypothesis test. This will be followed by a conclusion to your project.
Please include all relevant statistical output you generate WITHIN your project document (just as you have done for other assignments in the course). We want you to simply turn in ONE document that has all of the elements we have requested below, rather than submit any statistical output you generate in a separate file. Also, if you work with a group, all members of your team need to submit a completed assignment with all names and sections listed. If you encounter problems copying and pasting your output into your project document, please consult with the TAs as soon as possible!
Your practical assignment document should be formatted in Markdown (you can use HackMD), and you should carefully label each section of your project. There is no minimum or maximum length, but you should be sure to include all the required elements listed below to ensure that you earn as much credit as possible on this assignment. The document does not have to be in APA format.
This part of your project has five main sections, and each is described below. You will need to address all of them in your final write-up, and your grade will be based on including all of these sections:
Introduction (16 points)
In your introduction, you should talk about your idea. How did you come up with the idea?
You should also put forward your hypotheses (e.g., how you think the two samples will compare) and the reason for your hypotheses. Finally, you should explain how you gathered your data. This part of the practical assignment will be graded as follows:
- How did you come up with the idea? (4 points)
- What are your hypotheses (e.g., how you think the two samples will compare)? Please write these hypotheses out in words (1 point) AND using appropriate statistical symbols (1 point). You should include both a null and alternative hypothesis. (2 points)
- What is the reason for your hypotheses (e.g., why do you think the samples will differ in the way you predict)? (4 points)
- How did you gather your data? (4 points)
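As an illustration of what "appropriate statistical symbols" can look like, here is a minimal sketch for a two-sided test. The symbols are assumptions, not course-mandated notation: $\mu_1$ and $\mu_2$ stand for the two population means, and $\mu_d$ stands for the population mean of the paired differences. For a one-sided prediction, replace $\neq$ with $<$ or $>$ in the alternative.

```latex
% Two independent samples (difference of means)
H_0 : \mu_1 = \mu_2
\qquad
H_a : \mu_1 \neq \mu_2

% Paired samples (mean of the differences)
H_0 : \mu_d = 0
\qquad
H_a : \mu_d \neq 0
```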
Graphs and Descriptive Statistics (20 points)
Regardless of whether you are working with independent samples or paired samples, you should take some time to graph your data and generate summary statistics that will allow you to comment on characteristics of the data such as SHAPE, CENTER/LOCATION, and VARIABILITY.
If you have two independent samples, it will be important to graph each sample separately and generate separate tables of descriptive statistics for each sample (e.g. if you are comparing males and females on GPA, one graph should portray the male GPA distribution and one should portray the female GPA distribution, and you should obtain separate summary statistics for each of these groups).
If you are conducting a paired t-test, you will have what we call paired samples. You have two sets of data values, but each case in one set is matched or paired with one particular case in the other set (i.e., the “samples” are not independent in this case). Here, too, we can create graphs and summary statistics for each sample (e.g., if you are comparing prices at Fusion and Taste, one graph should display all the prices from Fusion and the other should display prices from Taste; you will also end up with a set of summary statistics for the prices at Fusion and another set for the prices at Taste).
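One possible way to obtain a separate set of summary statistics for each sample is sketched below using pandas. The prices are made-up values for illustration only, and the column names simply reuse the Fusion and Taste example above; substitute your own data.

```python
import pandas as pd

# Made-up prices (hypothetical values) for the same eight items at two shops.
prices = pd.DataFrame({
    "Fusion": [12.5, 8.0, 15.9, 22.0, 6.5, 30.0, 18.9, 9.9],
    "Taste":  [13.0, 8.5, 15.0, 23.5, 7.0, 31.5, 19.9, 9.5],
})

# One column of summary statistics per sample:
# count, mean, std, min, quartiles (25%, 50%, 75%), and max.
summary = prices.describe()

# describe() does not report the interquartile range, so add it as an extra row.
summary.loc["IQR"] = summary.loc["75%"] - summary.loc["25%"]

print(summary.round(2))
```

If independent samples are stored in long format (one column for the group label, one for the measurement), `df.groupby("group")["value"].describe()` produces the same kind of table with one row per group; the column names `group` and `value` are placeholders.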
This part of the practical assignment (where you graph and describe the data) will be graded as follows:
- At least two appropriate graphs (e.g., boxplot, violin plot) have been created and interpreted for each sample (6 points)
- Appropriate summary statistics (measurements of center and spread) are provided for each sample and the summary statistics are used to describe the data in words (10 points)
- Similarities and differences between the samples are discussed (4 points)
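The graphs mentioned in the rubric above could be produced along the following lines with matplotlib. This is only a sketch: the data are made-up values, the output file name `samples.png` is arbitrary, and other plot types (such as histograms) may suit your data better.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; this line is not needed in a notebook
import matplotlib.pyplot as plt

# Made-up prices (hypothetical values) for illustration only.
fusion = [12.5, 8.0, 15.9, 22.0, 6.5, 30.0, 18.9, 9.9]
taste = [13.0, 8.5, 15.0, 23.5, 7.0, 31.5, 19.9, 9.5]

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(9, 4))

# Boxplot: shows median, quartiles, and potential outliers for each sample.
ax_box.boxplot([fusion, taste])
ax_box.set_title("Boxplot of prices")

# Violin plot: shows the overall shape of each distribution.
ax_violin.violinplot([fusion, taste], showmedians=True)
ax_violin.set_title("Violin plot of prices")

# Both plot types place the samples at x = 1 and x = 2 by default.
for ax in (ax_box, ax_violin):
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["Fusion", "Taste"])
    ax.set_ylabel("Price")

fig.tight_layout()
fig.savefig("samples.png", dpi=150)
```

The saved image can then be embedded in the Markdown document next to your written interpretation of each graph.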
Verifying Necessary Data Conditions (8 points)
Discuss the necessary data conditions you would need to verify when conducting a two-sample t-test for a difference of means or a paired t-test for the mean of the differences, and explain whether or not you have satisfied these conditions. Keep in mind that if you are using paired samples, you should plan to compute and graph the differences variable in order to test one of your necessary data conditions. To receive full credit for this part of the project, you should talk about all necessary data conditions and whether or not you feel you are violating any of them, and you should share any output from your chosen software package that you relied on to verify these conditions. Please note that, for the purpose of this project, it is okay if you violate any of the data conditions. You should just explain why you feel you are violating these conditions and how you think it will affect any conclusions you might draw from your analysis.
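For paired samples, computing the differences variable and examining its distribution might look like the sketch below. It assumes that approximate normality of the differences is one of the conditions you need to check, and it uses the Shapiro-Wilk test from SciPy as one possible numerical check alongside a graph; the data are made-up values, and your course materials remain the reference for the full list of conditions.

```python
import numpy as np
from scipy import stats

# Made-up paired prices (hypothetical values): same items at both shops.
fusion = np.array([12.5, 8.0, 15.9, 22.0, 6.5, 30.0, 18.9, 9.9])
taste = np.array([13.0, 8.5, 15.0, 23.5, 7.0, 31.5, 19.9, 9.5])

# The differences variable: one value per matched pair.
diffs = fusion - taste
print("differences:", np.round(diffs, 2))

# Shapiro-Wilk test: the null hypothesis is that the differences
# come from a normal distribution. A small p-value suggests they do not.
w_stat, p_normal = stats.shapiro(diffs)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_normal:.3f}")
```

With small samples this test has little power, so it is worth pairing the output with a graph of `diffs` (for example a boxplot or histogram) rather than relying on the p-value alone.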
Conducting a Hypothesis Test (20 points)
Conduct the appropriate hypothesis test (e.g., either a two-sample t-test or a paired t-test) and discuss the results of your hypothesis test. You will be graded based on the following rubric:
- You have conducted the appropriate hypothesis test (e.g., either a two-sample t-test or a paired t-test) given the kind of data you are working with. (4 points)
- Include the Python command and output for the hypothesis test. (2 points)
- Discuss the results of your hypothesis test. What was the p-value? Interpret the p-value in your own words. This does not mean drawing the conclusion "reject/fail to reject"; explain what the p-value tells you in the context of your data. (4 points)
- Based on the results of the hypothesis test, do you reject or fail to reject H0? Why? Are the results statistically significant? Are the results practically significant (i.e., are the observed differences meaningful)? Explain. (4 points)
- Based on the results of your hypothesis test, what kind of error could you have made? Please explain. (2 points)
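A hedged sketch of the Python commands for the two kinds of test named above, using SciPy, is shown here. The data are made-up values; you would run only the test that matches your design. The Welch version (`equal_var=False`) is used for the independent-samples case as an assumption, since it does not require equal variances; use the pooled version if that is what your course expects.

```python
import numpy as np
from scipy import stats

# Made-up data (hypothetical values) for illustration only.
fusion = np.array([12.5, 8.0, 15.9, 22.0, 6.5, 30.0, 18.9, 9.9])
taste = np.array([13.0, 8.5, 15.0, 23.5, 7.0, 31.5, 19.9, 9.5])

# Paired t-test: each Fusion price is matched with the Taste price
# for the same item. Tests H0: mean of the differences = 0 (two-sided).
paired = stats.ttest_rel(fusion, taste)
print(f"paired t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")

# Two-sample (Welch) t-test: for comparison only, treating the two
# arrays as if they were independent samples. Tests H0: mu1 = mu2.
welch = stats.ttest_ind(fusion, taste, equal_var=False)
print(f"Welch t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")

# The observed mean difference helps when discussing practical significance.
mean_diff = np.mean(fusion - taste)
print(f"mean difference (Fusion - Taste) = {mean_diff:.3f}")
```

Both functions run a two-sided test by default. Copy the command and its printed output into your document, then interpret the p-value in the context of your own data.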
Conclusion and Summary (16 points)
Summarize what you did for this project and what you found. Your summary should include some mention of how you came up with your idea, how you collected your data, and what you found when you explored and analyzed your data. Discuss any shortcomings of the methods you used to gather data. Did you discover anything that surprised you when you analyzed the data? Do you think the results would have been different if you had bigger sample sizes? If you had to do the project again, how would you do it differently? This part of the project will be graded according to the following rubric:
- You summarize your project and include some mention of how you came up with your idea, how you collected your data, and what you found when you explored and analyzed your data. (6 points)
- You discuss any shortcomings of the methods you used to gather data and why you feel these are shortcomings. (6 points)
- You talk about how you would do the project differently if you were to do it over again. (4 points)