Instructor: Danny Tran

Math 10 – Statistics Term Project (160 pts)

You will conduct statistical surveys in which you attempt to draw conclusions about multiple population parameters. Here are the project guidelines:

· You will use the De Anza College student body as your population

· You must create 2 questions, and we will be sampling our class.

o Question #1: This will be a proportion / 1-population question

o Question #2: This will be an average / 2-population question

· You must work in groups of 1 to 3 people (You only have to turn in 1 copy / group). Please contact me if you would like me to help you find a group.

· The project must be typed. (Show all calculations. Calculations may be hand-written) Please complete the project by filling in the blanks in this file.

· The project is due Mar 27 at 11:59pm.

Goals:

- To prepare for the final exam by reviewing many of the key concepts in this course.

o Confidence intervals, hypothesis tests, histograms, box plots, sampling

- To gain experience conducting, analyzing, & interpreting your very own statistical project.

Part I: Data Collection 

1.)  Submit your 2 questions on Canvas by Mar 2.

2.) Danny will then compile everyone’s questions into a survey and then ask you to complete it.

3.) (10 points extra credit) If interested, you can supplement your data set by taking a sample on campus (at least 30 students). Please detail how you took your sample and how you used at least 1 random sampling technique. Also, include your raw data from your in-person sample.
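
For the extra-credit sample, one way to document a random sampling technique is to record how the random selection was generated. The Python sketch below illustrates systematic sampling with a random start. The function name, the figure of 150 students, and the seed are all hypothetical and used only for illustration; any other random sampling technique from the course would work just as well.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick every k-th position after a random start (systematic sampling)."""
    if sample_size > population_size:
        raise ValueError("sample_size cannot exceed population_size")
    rng = random.Random(seed)
    k = population_size // sample_size   # sampling interval
    start = rng.randrange(k)             # random start in 0..k-1
    # Return 1-based positions: start+1, start+1+k, start+1+2k, ...
    return [start + i * k + 1 for i in range(sample_size)]

# Hypothetical setup: survey 30 of the first 150 students who pass a table.
positions = systematic_sample(150, 30, seed=1)
print(positions)
```

Recording the seed and the interval k makes it possible to describe exactly how the sample was taken.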

Part II: One-Population, Proportion Question (80 total pts)

1.) Question:

a) (1pt) What is the question you asked the students?

b) (1pt) Why are you interested in this question? For example, is it related to a particular hobby, job, or general interest of yours, and how? (There is no right or wrong answer)

2.) Confidence Interval:

a)  (4pt) What are the values of the following statistics / parameters (n, X, p’)? How did you calculate each value (explain with formulas or specific calculator commands used)?

b) (1pt) What is the value of the point estimate of your population parameter?

c) (6pt) Describe your sample statistic in words. What is the distribution of the sample statistic? Explain why it has this distribution.

d) (6pt) For c = 0.9, 0.95, & 0.99, calculate the error bound. Show all work, using the error bound formula.

e) (2pt) Use the error bounds to construct a 90%, 95%, and 99% confidence interval for the population parameter.

f) (3pt) What calculator command did you use to check your work for the confidence interval? Explain why.

g) (2pt) Interpret the meaning of each confidence interval in the context of the problem.

h) (2pt) If you are interested in decreasing the error bound while keeping the confidence level constant, what will happen to the sample size? Explain.

i) (2pt) If you are interested in increasing the confidence level while keeping the sample size constant, what will happen to the error bound? Explain.

j) (2pt) If you are interested in increasing the sample size while keeping the error bound constant, what will happen to the confidence level? Explain.

k) (7pt) How many additional students must you sample for the error bound to be half its size? Do this for c = 0.95. Keep t-score or z-score the same as part d. Show all steps.
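
The confidence-interval arithmetic in parts d, e, and k can be sketched in Python as a way to double-check hand calculations. This is only an illustration, assuming the usual normal approximation for a sample proportion (error bound = z times the square root of p'(1 - p')/n). The function names and the sample values (28 "yes" answers out of 40) are hypothetical, not real class data.

```python
from math import sqrt, ceil
from statistics import NormalDist

def proportion_ci(x, n, c):
    """Return (p_prime, error_bound, lower, upper) for confidence level c."""
    p_prime = x / n
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)        # critical value z_(alpha/2)
    ebp = z * sqrt(p_prime * (1 - p_prime) / n)      # error bound for a proportion
    return p_prime, ebp, p_prime - ebp, p_prime + ebp

def n_for_error_bound(p_prime, c, target_ebp):
    """Smallest n whose error bound is at most target_ebp (same z, same p')."""
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)
    return ceil(z**2 * p_prime * (1 - p_prime) / target_ebp**2)

# Hypothetical sample: 28 of 40 students answered "yes".
x, n = 28, 40
for c in (0.90, 0.95, 0.99):
    p, ebp, lower, upper = proportion_ci(x, n, c)
    print(f"c={c}: p'={p:.3f}, EBP={ebp:.4f}, CI=({lower:.4f}, {upper:.4f})")

# Part k: sample size needed to halve the 95% error bound.
p, ebp, _, _ = proportion_ci(x, n, 0.95)
n_new = n_for_error_bound(p, 0.95, ebp / 2)
print("additional students needed:", n_new - n)
```

Because the error bound shrinks with the square root of n, halving it requires roughly four times the original sample size.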

3.) Hypothesis Testing:

a) (10pt) Decide on a claim to test about your 1-population proportion question, meaning construct Ho and HA using the population parameter symbols. (Find data online about the U.S. or California parameter to construct Ho) Note that even though you are finding data about another population, you are just using this value to compare to your De Anza population, so this is still a 1-population test. Cite your source by providing the website citing the parameter. Also, explain in words the meaning of your population parameter. This part may be tough. Ask Danny if you need any help.

For example, if your 1-population proportion question is, "What's the proportion of De Anza students that have access to Netflix?" go online and find some statistic that mentions the proportion of people that have access to Netflix. It could be for students or working professionals. It could be for Californians or all Americans. Any statistic would be helpful to use as a benchmark to compare your data, such as:

https://lendedu.com/blog/netflix-millennials-viewers-not-subscribers/

From here, you can choose a left-tail, right-tail, or 2-tail test (up to you). If you choose a 2-tail test in this example, your null hypothesis would be that the De Anza proportion is 0.92, and the alternative hypothesis would be that the De Anza proportion is not 0.92.

If you are having difficulty finding a statistic online, please contact me. Please at least do a preliminary search of at least 20 minutes. 

b) (1pt) Is your test a left tail, right tail, or 2-tail test? How can you tell? Why did you decide on this? (Note: Many times, this will be up to you as a statistician & what you believe are the appropriate inequalities.)

c) (1pt) What is your significance level? (You can choose your own)

d) (4pt) What are the type I & type II errors? Explain them in the context of the problem.

e) (4pt) What is the distribution of the sample statistic? Explain why it has this distribution.

f) (4pt) What is the value of the test statistic? Show formula & calculations.

g) (8pt) Calculate the p-value. Show work. Sketch a graph of the region whose area the p-value represents. Label the values of the horizontal axis with the appropriate values.

h) (3pt) Define the p-value in the context of the problem.

i) (2pt) What calculator command did you use to check your work for the test statistic & p-value? Explain why.

j) (2pt) Do you reject or fail to reject the null hypothesis? Why or why not?

k) (2pt) Interpret your conclusion in the context of the problem.
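
As a cross-check for parts f and g, the test statistic and p-value for a one-proportion z-test can be sketched in Python. This assumes the normal approximation is appropriate (both n·p0 and n·(1 - p0) reasonably large). The function name, the benchmark p0 = 0.60, and the sample counts are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, tail="two"):
    """Return (z, p_value) for H0: p = p0 using the normal approximation."""
    p_prime = x / n
    se = sqrt(p0 * (1 - p0) / n)       # standard error computed under H0
    z = (p_prime - p0) / se
    cdf = NormalDist().cdf(z)          # area to the left of z
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value

# Hypothetical: 28 of 40 students said "yes"; benchmark proportion is 0.60.
z, p_value = one_prop_z_test(28, 40, 0.60, tail="two")
alpha = 0.05                           # significance level (your choice)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Note that the standard error uses the hypothesized p0, not p', which is the main difference from the confidence-interval calculation.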

Part III: Two-Population, Average Question (50 total pts)

1.) Question & Data:

a) (1pt) What is the question you asked the students?

b) (3pt) Why are you interested in this question? For example, is it related to a particular hobby, job, or general interest of yours, and how? (There is no right or wrong answer)

c) (4pt) What are the values of your statistics (n1, X1-bar, s1, n2, X2-bar, s2)? In words, what does each statistic represent in the context of the problem? 

2.) Hypothesis Testing:

a) (7pt) Decide on a claim to test, meaning construct Ho and HA using appropriate parameter symbols. Also, explain in words the meaning of your population parameters.

b) (2pt) Is your test a left tail, right tail, or 2-tail test? Why did you decide on this? (Note: Many times, this will be up to you as a statistician & what you believe are the appropriate inequalities to use.)

c) (1pt) What is the value of your significance level? You can choose your own.

d) (4pt) What are the type I & type II errors? Explain them in the context of the problem.

e) (6pt) Describe the sample statistic in words. What is the distribution of the sample statistic? Explain why it has this distribution.

f) (8pt) Use your graphing calculator to calculate the test statistic & p-value. State which calculator command you used and explain why.

g) (6pt) Sketch a graph of the region whose area the p-value represents. Label the values of the horizontal axis with the appropriate values.

h) (3pt) Define the p-value in the context of the problem.

i) (3pt) Do you reject or fail to reject the null hypothesis? Why or why not?

j) (2pt) Interpret your conclusion in the context of the problem.
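
To sanity-check the calculator output in part f, the two-sample test statistic can be computed from the summary statistics in Part III, 1c. The sketch below assumes an unpooled (Welch) two-sample t-test with independent samples, which may or may not be the version used in class. The function name and all six summary values are hypothetical. The p-value itself requires the Student's t distribution, which Python's standard library does not provide, so it is left to the calculator.

```python
from math import sqrt

def welch_t(n1, xbar1, s1, n2, xbar2, s2):
    """Return (t, df) for H0: mu1 = mu2 using the unpooled (Welch) formula."""
    v1 = s1**2 / n1                    # variance of the first sample mean
    v2 = s2**2 / n2                    # variance of the second sample mean
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two groups of students.
t, df = welch_t(n1=20, xbar1=6.5, s1=1.2, n2=25, xbar2=7.1, s2=1.5)
print(f"t = {t:.4f}, df = {df:.2f}")
```

If the calculator reports a noticeably different t or df, check whether its pooled-variance option is turned on.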

Part IV: Histograms & Plots (Only for Average Question from Part III) (30 total pts)

1.) Histogram

a) (7pt) Construct a histogram for each sample (There should be 2). You can do this by hand (please use a ruler) or on a computer.

b) (1pt) What does the horizontal axis represent? What does the vertical axis represent?

c) (1pt) How many bars are there? (Note: It is completely up to you)

d) (1pt) What is the width of each bar?
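
For parts c and d, the bar width follows from the data range and the number of bars chosen. The short Python sketch below illustrates this with equal-width bars starting at the minimum value. The function name and the ten data values are hypothetical, and other choices of starting point or width are equally acceptable.

```python
def histogram_counts(data, num_bars):
    """Return (width, edges, counts) for equal-width bars from min to max."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are identical; bar width would be zero")
    width = (hi - lo) / num_bars
    edges = [lo + i * width for i in range(num_bars + 1)]
    counts = [0] * num_bars
    for x in data:
        # The maximum value is placed in the last bar rather than a new one.
        i = min(int((x - lo) / width), num_bars - 1)
        counts[i] += 1
    return width, edges, counts

# Hypothetical sample (e.g., hours of sleep reported by 10 students).
sample = [4, 5, 6, 6, 7, 7, 8, 8, 9, 15]
width, edges, counts = histogram_counts(sample, 4)
print("bar width:", width)
print("bar edges:", edges)
print("frequencies:", counts)
```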

2.) Box Plot

a) (6pt) Construct a box plot for each sample (There should be 2). You can do this by hand (please use a ruler) or on a computer.

b) (4pt) What are the 5 key values AND what do these values represent?

c) (2pt) What is the IQR?

d) (2pt) What is the range?

e) (6pt) Are there any outliers? If so, how many? Explain why or why not using the appropriate calculations.
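
The box-plot questions can be checked with a short Python sketch. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data (excluding the middle value when n is odd), and that outliers are values more than 1.5 × IQR beyond the quartiles. Other quartile conventions exist and can give slightly different values. The function names and the data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + n % 2:]    # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def find_outliers(data):
    """Return values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical sample (e.g., hours of sleep reported by 10 students).
sample = [4, 5, 6, 6, 7, 7, 8, 8, 9, 15]
mn, q1, med, q3, mx = five_number_summary(sample)
print("five-number summary:", mn, q1, med, q3, mx)
print("IQR:", q3 - q1, "range:", mx - mn)
print("outliers:", find_outliers(sample))
```

For this hypothetical sample the fences are 3 and 11, so the value 15 is flagged as the only outlier.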