Math 140 Exam 3
COC Spring 2022
150 Points

Question 1 (30 Points)

Match the following vocabulary words in the table below with the corresponding definitions.

Slope
Histogram of the residuals
Correlation Coefficient (r)
Contingency Table
Conditional Percentage (Conditional Proportion)
Explanatory Variable
Scatterplot
R-squared
Response Variable
Sampling Variability
Significance Level
Type II Error
Standard Deviation of the Residual Errors
Quantitative Data
y-intercept
Categorical Data
Critical Value
Regression
Regression Line
Census
Type I Error
Residual
Correlation
Beta Level
Marginal Percentage (Marginal Proportion)
Residual Plot
P-value
Joint Percentage (Joint Proportion)

a. A number we compare our test statistic to in order to determine significance. In a sampling distribution or a theoretical distribution approximating the sampling distribution, the critical value shows us where the tail or tails are. The test statistic must fall in the tail to be significant.

b. Also called the Alpha Level. If the P-value is lower than this number, then the sample data significantly disagrees with the null hypothesis and is unlikely to have happened by random chance. This is also the probability of making a Type I error.

c. A percentage or proportion that involves two variables being true about the person or object but does not have a condition. There are generally two types (AND, OR).

d. The vertical distance between the regression line and a point in the scatterplot.

e. Statistical analysis that determines if there is a relationship between two different quantitative variables.

f. When biased sample data leads you to support the alternative hypothesis when the alternative hypothesis is actually wrong in the population.

g. A graph for visualizing the relationship between two quantitative ordered pair variables. The ordered pairs (x, y) are plotted on the rectangular coordinate system.

h. Data in the form of numbers that measure or count something. They usually have units and taking an average makes sense.

i. Also called the line of best fit or the line of least squares. This line minimizes the vertical distances between it and all the points in the scatterplot.

j. Collecting data from everyone in a population.

k. Statistical analysis that involves finding the line or model that best fits a quantitative relationship, using the model to make predictions, and analyzing error in those predictions.

l. The probability of getting the sample data or more extreme because of sampling variability (by random chance) if the null hypothesis is true.

m. The predicted y-value when the x-value is zero.

n. A statistic between −1 and +1 that measures the strength and direction of linear relationships between two quantitative variables.

o. Data in the form of labels that tell us something about the people or objects in the data set.

p. Another name for the y-variable or dependent variable in a correlation study.

q. A single percentage or proportion without any conditions. In a contingency table, this can be found with numbers in the margins.

r. Also called the coefficient of determination. This statistic measures the percent of variability in the y-variable that can be explained by the linear relationship with the x-variable.

s. When biased sample data leads you to fail to reject the null hypothesis when the null hypothesis is actually wrong in the population.

t. Another name for the x-variable or independent variable in a correlation study.

u. Also called a two-way table. This table summarizes the counts when comparing two different categorical data sets each with two or more variables.

v. The probability of making a Type II error.

w. The amount of increase or decrease in the y-variable for every one-unit increase in the x-variable.

x. Random sample values and sample statistics are usually different from each other and usually different from the population parameter.

y. A statistic that measures how far points in a scatterplot are from the regression line on average and measures the average amount of prediction error.

z. The percentage or proportion calculated from a particular group or if a particular condition was true. These are very important when studying categorical relationships.

aa. A graph that pairs the residuals with the x values. This graph should be evenly spread out and not fan shaped.

bb. A graph showing the shape of the residuals. This graph should be nearly normal and centered close to zero.

Question 2 (40 Points) ANOVA Mean Hypothesis Test

Directions: Use the printouts to answer the following questions.

a) Give the null and alternative hypothesis.
b) Check the assumptions for a One-Way ANOVA test.
c) Write a sentence to explain the F test statistic.
d) Use the F test statistic and Critical Value to determine if the sample data significantly disagrees with the null hypothesis. Explain your answer.
e) Use the P-value and Significance Level to answer the following:
   - Write the P-value sentence.
   - Could the sample data or more extreme have occurred because of sampling variability, or is it unlikely that the sample data occurred because of sampling variability? Explain your answer.
f) Should we reject the null hypothesis or fail to reject the null hypothesis? Explain your answer.
g) Write a conclusion for the hypothesis test addressing evidence and the claim.
h) What is the variance between the groups? What is the variance within the groups? Was the variance between significantly higher than the variance within? Explain how you know.
i) Were the categorical and quantitative variables related or not? Explain your answer.

The Scenario: A census of Math 075 pre-stat students was taken in the fall 2015 semester. The students were separated into three sleep groups: low amount of sleep, moderate amount of sleep, high amount of sleep. They were also asked how many total units they have completed at the college.
Though the data was not random, you can assume it was representative of Math 075 students at COC. Use a 10% significance level and the following statistics, graphs and ANOVA printout to test the claim that sleep is not related to the total number of units completed.

ANOVA Information:

Source of Variation          Degrees of Freedom  Sum of Squares  Mean Sum of Squares  F Test Statistic  F Critical Value  p-Value
Treatment (Between Groups)   2                   2822.35625      1411.17813           1.83387           2.3133            0.16087
Error (Within Groups)        497                 382446.38503    769.50983
Total                        499                 385268.74128

Descriptive Statistics:

Variable             Mean    Standard Deviation  N total
Low Sleep Group      32.952  28.586              42
Medium Sleep Group   32.990  27.585              398
High Sleep Group     25.675  28.178              60

Question 3 (40 Points) Chi-Squared Goodness of Fit Hypothesis Test

Directions: Use the printouts to answer the following questions.

a) Write the null and alternative hypothesis. Include relationship implications. Assume the same proportions for the null.
b) Check the assumptions for a Goodness of Fit test. See the notes below.
   Notes:
   1. Assume that we have a census.
   2. Assume we have a StatKey Chi-Square Goodness-of-Fit randomization dotplot.
   3. Consider whether independence is met or not given our census in this situation.
c) What is the Chi-squared test statistic? Write a sentence to explain the test statistic.
d) Did the Chi-squared test statistic fall in the tail determined by the critical value?
e) Does the sample data significantly disagree with the null hypothesis? Explain your answer.
f) What was the P-value? Write a sentence to explain the P-value. Is there significant evidence?
g) Use the P-value and significance level to determine if the sample data could have occurred by random chance (sampling variability) or if it is unlikely to be due to random chance. Explain your answer.
h) Should we reject the null hypothesis or fail to reject the null hypothesis? Explain your answer.
i) Write a conclusion for the hypothesis test. Explain your conclusion in plain language.
j) Is the population proportion related to the categorical variable or not? Explain your answer.

The Scenario: It is a big job to write and grade the AP Statistics exam for high school students each year. It is a difficult multiple-choice exam. All questions have five possible answers A-E. Use a 5% significance level to test the claim that the percent of A answers is the same as the percent of B answers, which is the same as the percent of C, D and E answers. This would indicate that the letter of the answer is not related to the percentage of times it happens.

Generated Samples = 6000
Sample Size = 400
Chi-Squared Statistic = 3.426
Critical Value = 10.125
P-Value = 0.495

Question 4 (40 Points) Chi-Squared Categorical Association Test

Directions: Use the information provided to answer the following questions.

a) Write the null and alternative hypothesis. Make sure to label which one is the claim. Define your populations.
b) The Chi-squared test statistic is 357.362. Write a sentence to explain the test statistic.
c) Does the test statistic fall in the tail determined by the critical value (13.881)?
d) Does the sample data significantly disagree with the null hypothesis? Explain your answer.
e) The P-value is 0%. Write a sentence to explain the P-value.
f) Compare the P-value to the significance level. Should we reject the null hypothesis or fail to reject the null hypothesis? Explain your answer.
g) If the null hypothesis was true, could the sample data or more extreme have occurred by sampling variability, or is it unlikely to be sampling variability? Explain your answer.
h) Write a conclusion for the test addressing evidence and the claim. Explain your conclusion in nontechnical language.
i) Are the categories related or not? Explain your answer. (Hint: This is the second type of goodness of fit test. Is it designed to explore relationships?)
j) Describe the implications of a Type I error for this scenario.
k) Describe the implications of a Type II error for this scenario.

The Scenario: Juries are required to meet the racial demographic of the county they represent. Here is the racial demographic for Alameda County: 54% Caucasian, 18% African American, 12% Hispanic American, 15% Asian American, and 1% other. We are worried that the juries in Alameda County may not be representing these percentages. Using a 1% significance level, test the claim that the juries do not represent the demographic of the county. Assume the assumptions are met.
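
Study note: the Chi-squared goodness-of-fit statistic for the Question 4 scenario compares observed counts to the counts expected under the null proportions. The block below is a minimal sketch in Python, assuming juror counts that are entirely HYPOTHETICAL (the exam's sample data is not reproduced here) and a helper name, `chi_squared_gof`, chosen only for illustration. It will not reproduce the 357.362 given in part b.

```python
# Null-hypothesis proportions taken from the scenario (Alameda County).
null_props = {
    "Caucasian": 0.54,
    "African American": 0.18,
    "Hispanic American": 0.12,
    "Asian American": 0.15,
    "Other": 0.01,
}

# HYPOTHETICAL observed juror counts, invented for illustration only.
observed = {
    "Caucasian": 60,
    "African American": 15,
    "Hispanic American": 10,
    "Asian American": 14,
    "Other": 1,
}


def chi_squared_gof(observed, null_props):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed.values())
    statistic = 0.0
    for group, proportion in null_props.items():
        expected = n * proportion  # expected count if the null is true
        statistic += (observed[group] - expected) ** 2 / expected
    return statistic


statistic = chi_squared_gof(observed, null_props)
print(f"Chi-squared statistic (hypothetical data): {statistic:.3f}")
```

A statistic near zero means the observed counts sit close to the expected counts. A large statistic that falls in the tail past the critical value means the sample data significantly disagrees with the null hypothesis.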