PSY2116 Assignment
1 Handing in the assignment
Please submit your assignments on BrightSpace before the deadline.
2 Instructions
This assignment lets you practice conducting statistical analyses as a research psychologist. As you recall from class, before conducting a study you need to ask a question that interests you. I provide an example below, but you must create your own interesting questions and test them using the supplied data.
You have been provided unique data sets so your statistics and answers must be done on the data set that is assigned to you (i.e., the filename will have your student number). A unique answer key will be made for each data set and available to the TA to help them mark the assignments. If your answers do not match the answer key, you will not receive a full mark.
As discussed in class, hypothesis testing involves 4 steps.
1. Ask a question about the population and state the hypothesis
2. Use hypothesis to predict sample characteristics
3. Obtain sample, collect data, and perform statistical analyses
4. Compare result to prediction and make a decision
3 Statistical tests you need to conduct
You are required to come up with an interesting question/hypothesis for each of the tests below and to perform the corresponding statistical test to address it. Please make sure that you create a unique question for each test. Get creative and make this activity fun for yourself. You can come up with silly and even unrealistic scenarios.
4 Questions
1. Perform a 1-sample t-test, 2-tailed, α = 0.05, and summarize your results. Make sure to use the V1 column in your dataset to answer this question. For this test, pick a meaningful population mean to which you will compare your scores. (5 points)
2. Perform a repeated samples t-test, 2-tailed, α = 0.05, and summarize your results; calculate Cohen’s d and report it in the conclusion. Make sure to use the V1 and V2 columns in your dataset to answer this question. (10 points)
3. Perform a 2-sample independent samples t-test, 2-tailed, α = 0.05, and summarize your results; calculate the 95% Confidence Interval around the mean difference and report it in the conclusion. Make sure to use the V1 and V2 columns in your dataset to answer this question. (10 points)
4. Perform a 1-way ANOVA, α = 0.05, on the 3 variables and summarize your results; show results for pairwise comparisons (if they are needed). Make sure to use the V1, V2 and V3 columns in your dataset to answer this question. (10 points)
For each question, show plot(s) of your data and write your conclusion/summary in proper APA format. For examples of proper APA format, please refer to the lectures. If you are asked to report Cohen’s d or CIs, you may need to calculate these manually (or in R or any other statistical package).
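As a rough illustration of Question 1, the sketch below runs a 1-sample t-test on the V1 column of the demonstration file. The comparison value `mu0 = 50` is a made-up assumption used only for illustration; pick your own meaningful population mean. The data are typed in by hand only so that the block runs on its own.

```r
# V1 from the demonstration file (typed in so the sketch is self-contained)
V1 = c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
       39, 43, 50, 43, 55, 53, 46, 45, 51, 51)

mu0 = 50       # ASSUMPTION: hypothetical population mean, for illustration only
alpha = 0.05

# 1-sample t-test, 2-tailed
one.sample = t.test(V1, mu = mu0, alternative = 'two.sided')
one.sample

# The same t statistic computed by hand: (M - mu0) / (s / sqrt(n))
n = length(V1)
t.observed = (mean(V1) - mu0) / (sd(V1) / sqrt(n))
t.critical = qt(1 - alpha/2, df = n - 1)
c(t.observed = t.observed, t.critical = t.critical)
```

Comparing the absolute value of `t.observed` with `t.critical` should lead to the same decision as reading the p-value from the `t.test` output.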
5 Example – 2-sample independent t-test, 2-tailed
5.1 Question/hypothesis
We would like to determine whether the resting heart rate (RHR) of students in PSY2116 differs from that of the general population of University of Ottawa students. We are asking this question because we want to know whether Dr. Konar is causing PSY2116 students to have heart conditions that may result in a different RHR compared to the population of all University of Ottawa students.
To conduct the study and test our hypothesis, we take a sample of 20 students from each population: n1 = 20 from PSY2116, and n2 = 20 obtained by stopping students on campus at various locations and at different times to get a representative random sample. We measure and record the RHR of each participant in both samples (this is equivalent to the data stored under ‘V1’ for sample 1 and ‘V2’ for sample 2).
Having measured the RHRs for all participants, we want to test our hypothesis that PSY2116 students have a different mean RHR compared to the general population because we suspect that Dr. Konar is causing students stress, which may present as a different RHR.
When writing up my results, it helps to organize my thoughts using the sequence of steps we learned when conducting a hypothesis test.
5.2 Hypothesis test
Hence, my first step is to state my null and alternative hypotheses. The null hypothesis in this study is that Dr. Konar is not causing heart issues in his students and that the average RHR of PSY2116 students is not different from the general population of students at the University of Ottawa. The alternative hypothesis is that Dr. Konar is causing heart issues in PSY2116 students and thus their RHRs are different from the mean of the general population. Given that we are conducting a 2-tailed test, note the way I structured the formulaic version of the hypotheses:
● H0 : µ1 - µ2 = 0 or µ1 = µ2
● H1 : µ1 - µ2 ≠ 0 or µ1 ≠ µ2
You have the option of stating your hypotheses in words, as in the previous paragraph, or formulaically, as is seen right above this sentence.
For the 2nd step, I want to set the criteria for rejecting or failing to reject my null hypothesis. For this, I will determine my t-critical using a table. Alternatively, I can rely on R’s built-in function, which I will show below.
For the 3rd step, we collect data (this was done for us already) and run the appropriate statistical analyses. Here, we test whether there is a significant difference between our two sample means, assess whether this difference is meaningful (e.g., Cohen’s d), and use an interval estimate of the population mean difference (e.g., Confidence Intervals) instead of a point estimate (i.e., the difference between the sample means).
For the 4th and final step, we make a decision after comparing t-observed to t-critical, and write our results in APA format. Further, we report our effect size to supplement our t-test in order to determine whether the difference (if significant) is meaningful. Finally, the Confidence Interval gives a range of plausible values for the population mean difference.
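The decision rule in the final step can be sketched in a few lines of R. The function name `decide` is a hypothetical helper invented for this illustration, not something from the lectures; the values t = 10.766 and df = 38 are the ones reported in the worked example later in this handout.

```r
# Hypothetical helper (not part of the course material):
# 2-tailed decision rule that compares t-observed with t-critical
decide = function(t.observed, df, alpha = 0.05) {
  t.critical = qt(1 - alpha/2, df)   # 2-tailed critical value
  if (abs(t.observed) > t.critical) 'reject H0' else 'fail to reject H0'
}

# Values from the worked example in this handout
decide(10.766, df = 38)

# A made-up smaller t value, for contrast
decide(1.2, df = 38)
```

The absolute value is used because the test is 2-tailed: a large negative t counts as evidence against the null just as much as a large positive one.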
6 Your data set
Once you download the file ‘PSY2116-ClassData.zip’ from BrightSpace, you have to unzip it.
● On Windows, note the directory where you are saving the file. In that directory, you can right-click on the file and choose Extract All; then pick a folder where you want the data to be placed. Pressing ‘Enter’ will just unzip in the same folder.
● On OSX, note the directory where you are saving the file. If you double-click on the ‘PSY2116-ClassData.zip’ file, it should extract in the same directory in the Finder.
● I haven’t done this on Linux in a long time, so if you cannot figure this out, please come see me and we’ll sort it out. I’m guessing if you are using Linux, this step is a non-issue.
Find your student number among the files. This PSY2116_*.csv file is your data set (your student number is in place of the *).
Once you open your data file, it will include 3 columns of randomly generated numbers. The column names are ‘V1’, ‘V2’, and ‘V3’. There are 20 rows of data in each column. If you are asked to conduct an analysis on 1 sample, then use column 1, titled ‘V1’. If you are asked to conduct an analysis on 2 samples, then use columns 1 and 2, titled ‘V1’ and ‘V2’. Finally, when you are asked to do a 1-way ANOVA, you will use all 3 columns for this analysis.
The ‘PSY2116-ClassData.zip’ file also includes the file ‘PSY2116_1234567.csv’. I will be using the data from this file to demonstrate analyses in R. You can mimic how I do the analyses on this file and adapt them to your data. Do NOT use the data from ‘PSY2116_1234567.csv’ as your own data. This file is for demonstrative purposes only.
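Since the handout only demonstrates the 2-sample t-test, here is a minimal sketch of how the 3 columns could feed a 1-way ANOVA, using the demonstration data typed in by hand. It assumes the 3 columns are treated as 3 independent groups, and it uses Tukey's HSD for the pairwise comparisons only as one possible choice; use whichever method was taught in class.

```r
# Demonstration data (typed in so the sketch is self-contained)
mydata = data.frame(
  V1 = c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
         39, 43, 50, 43, 55, 53, 46, 45, 51, 51),
  V2 = c(25, 28, 19, 31, 27, 27, 28, 31, 28, 29,
         28, 18, 31, 34, 28, 31, 38, 28, 29, 26),
  V3 = c(20, 31, 38, 38, 35, 34, 30, 23, 35, 40,
         27, 31, 21, 24, 31, 29, 30, 30, 21, 28)
)

# aov() expects one column of scores and one column of group labels,
# so stack the 3 columns into 'long' format
long = stack(mydata)   # columns: 'values' (scores) and 'ind' (V1/V2/V3)

# 1-way ANOVA (ASSUMPTION: the columns are 3 independent groups)
fit = aov(values ~ ind, data = long)
summary(fit)

# Pairwise comparisons, only needed if the overall F test is significant
TukeyHSD(fit)
```

The `stack()` step is the main design choice here: the file stores one group per column, whereas `aov()` works with a formula of the form scores ~ group.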
7 Getting data into R
To import your data set into R, you have to follow a few steps that I will outline below.
Assume that I saved the data file (‘PSY2116_1234567.csv’) on my Desktop. To import it into RStudio, open RStudio first. Then choose File, New File, New R Script. All your work will be entered and stored here. Make sure to save this R script as something you’ll recognize, e.g., ‘PSY2116_Assignment.R’. Now enter the following commands into RStudio. Remember to execute each line of code in RStudio. Windows: use Ctrl-Enter on the line of code that you would like to execute; OSX: use Command-Enter. Linux: hopefully you can figure this out; if not, talk to me and we’ll figure it out together.
# Create a variable 'file.location' where you will specify the file's
# location:
# on Windows, uncomment the following line (but make sure
# to comment out the next line, which only works on OSX)
#
# file.location = 'c:/Users/yaro/Desktop/PSY2116_1234567.csv'
file.location = '~/Desktop/PSY2116_1234567DEMO.csv' # <- on OSX
# Please note that the user 'yaro' is specific to my computer ONLY.
# Use your own user ID that you created in Windows in place of 'yaro'
# Now import the data from your *.CSV file into R.
# Create an object 'mydata' where the data will be stored:
# This function is telling RStudio that the file has a header and
# data are separated by commas; all data will be stored in 'mydata'
mydata = read.table(file.location, header = TRUE, sep = ',')
# View the data:
mydata
## V1 V2 V3
## 1 41 25 20
## 2 43 28 31
## 3 57 19 38
## 4 43 31 38
## 5 51 27 35
## 6 33 27 34
## 7 62 28 30
## 8 52 31 23
## 9 55 28 35
## 10 47 29 40
## 11 39 28 27
## 12 43 18 31
## 13 50 31 21
## 14 43 34 24
## 15 55 28 31
## 16 53 31 29
## 17 46 38 30
## 18 45 28 30
## 19 51 29 21
## 20 51 26 28
# Check your data, it should be a data.frame:
class(mydata)
## [1] "data.frame"
Now spend some time familiarizing yourself with the data. You can do quick descriptive stats and some plots to visualize the data.
# Calculate mean of V1, the first column of data:
mean(mydata$V1)
## [1] 48
# same for V2:
mean(mydata$V2)
## [1] 28.2
# OR you can use a one-liner
with(mydata, mean(V1)); with(mydata, mean(V2))
## [1] 48
## [1] 28.2
# Calculate the variance, standard deviation, etc
with(mydata, var(V1)); with(mydata, var(V2))
## [1] 48.21053
## [1] 19.43158
with(mydata, sd(V1)); with(mydata, sd(V2))
## [1] 6.94338
## [1] 4.408126
# Plot boxplots of the data:
# Look for whether the data are normally distributed, whether they
# are skewed
with(mydata, boxplot(V1,V2))
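If you would rather see the descriptive statistics for all 3 columns at once, something along these lines may help. It is only a sketch: the data frame is typed in by hand so the block is self-contained, and the object name `descriptives` is an arbitrary choice.

```r
# Demonstration data (typed in so the sketch is self-contained)
mydata = data.frame(
  V1 = c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
         39, 43, 50, 43, 55, 53, 46, 45, 51, 51),
  V2 = c(25, 28, 19, 31, 27, 27, 28, 31, 28, 29,
         28, 18, 31, 34, 28, 31, 38, 28, 29, 26),
  V3 = c(20, 31, 38, 38, 35, 34, 30, 23, 35, 40,
         27, 31, 21, 24, 31, 29, 30, 30, 21, 28)
)

# sapply() applies a function to every column of the data frame
descriptives = data.frame(
  n    = sapply(mydata, length),
  mean = sapply(mydata, mean),
  sd   = sapply(mydata, sd)
)
round(descriptives, 2)

# Quartiles, minimum and maximum for every column
summary(mydata)
```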
Next, we want to do a 2-sample t-test on the imaginary data. For that I will use the first 2 columns, where V1 will be the data for PSY2116 students and V2 will be data for the sample from the general University of Ottawa population.
So, for my dataset, this is how I would do a 2-sample t-test, 2-tailed:
# The following lines of code show how to do a 2-sample t-test.
# Make sure to select the following 4 lines of code,
# then run the Cmd-Enter command:
with(mydata,                           # this is your data
     t.test(V1, V2,                    # these are the 2 samples that you are comparing
            alternative = 'two.sided', # specify that test is 2-tailed
            var.equal = TRUE))         # assume that variances are equal
##
## Two Sample t-test
##
## data: V1 and V2
## t = 10.766, df = 38, p-value = 4.215e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 16.07704 23.52296
## sample estimates:
## mean of x mean of y
## 48.0 28.2
# Pay attention to the t.test output because it tells you
# whether to reject or fail to reject null. In this case,
# the result is 'reject null of no difference'. This is based on
# the p-value (4.215e-13), which is smaller than alpha = 0.05.
# Note that the line
# ``alternative hypothesis: true difference in means is not equal to 0''
# only states the hypothesis being tested; it is printed regardless
# of the result.
#
# The 95% CI [16.08, 23.52] does NOT contain 0, which indicates that the
# mean difference between V1 and V2 is significant.
#
# Here is a visualization of the scale between -10 and +30:
# -10..........-1...0...+1.........{+16.......+23}.....+30
# Notice that in the above scale, 0 is not inside the brackets
# which include 16 and 23.
# You can confirm that the CIs are done correctly.
# Below I show how to manually calculate the CI around the
# difference between the sample means:
# Recall that the CIs for 2-sample independent samples t-test are:
# mu1 - mu2 = (M1 - M2) +/- t.critical*SE
# the 't.critical*SE' is an 'error' term that we can easily
# calculate in R:
n1 = with(mydata, length(V1))
n2 = with(mydata, length(V2))
df = n1 + n2 - 2
# Means of the data:
M1 = with(mydata, mean(V1))
M2 = with(mydata, mean(V2))
# Since we have equal 'n', I will use the simple formula:
pooled.variance = with(mydata, (var(V1) + var(V2)) / 2)
t.critical = qt(0.975, df) # or you can look this up in a table
SE = sqrt(pooled.variance/n1 + pooled.variance/n2)
error = t.critical * SE
# Finally, the lower CI is
(M1-M2) - error
## [1] 16.07704
# Upper CI is
(M1-M2) + error
## [1] 23.52296
# To avoid this long process, you can simply use the result from
# 't.test' function to save time.
Let’s recap what happened. In the above R code, ‘mydata’ is your data from the .CSV file after you load it into R and save it into the variable ‘mydata’. The next lines of code tell R to conduct a t-test on 2 variables that are inside ‘mydata’, V1 and V2. A quick way to check what names your columns have is to run ‘names(mydata)’ in R. By default, the t-test function calculates a 2-sided test, so you can remove the ‘alternative’ line (or keep it for completeness). You do have to tell the t-test function that the variances of your 2 data vectors are assumed to be equal, because its default (var.equal = FALSE) does not assume equal variances. As we have done in class, you can calculate a confidence interval manually, using a table in the book to look up t-critical. Alternatively, we can tell the t-test function to do this for us. You just have to supply the level of confidence that you want it to calculate (the ‘conf.level’ argument, which defaults to 0.95).
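To illustrate the last point, the sketch below asks `t.test` for a 95% and a 99% Confidence Interval on the demonstration data. The 99% level is an arbitrary choice made only to show the `conf.level` argument; the assignment itself asks for 95%.

```r
# V1 and V2 from the demonstration file (typed in so the sketch is self-contained)
V1 = c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
       39, 43, 50, 43, 55, 53, 46, 45, 51, 51)
V2 = c(25, 28, 19, 31, 27, 27, 28, 31, 28, 29,
       28, 18, 31, 34, 28, 31, 38, 28, 29, 26)

# The confidence interval is stored in the 'conf.int' element of the result
ci95 = t.test(V1, V2, var.equal = TRUE)$conf.int                     # default: 95%
ci99 = t.test(V1, V2, var.equal = TRUE, conf.level = 0.99)$conf.int  # 99%

ci95
ci99
```

The 99% interval should come out wider than the 95% interval, because a higher level of confidence requires a larger t-critical.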
For your report, I want you to report the t-critical. Instead of doing this the hard way, by getting a textbook with t-tables, we can do this easily and efficiently in R. However, it is useful to make sure your R value corresponds to the one in the t-tables from a textbook. For example, for α = 0.05 and df = 10, the t-critical can be calculated with the following R code:
# Calculating t-critical for alpha=0.05, 2-tailed, df=10:
alpha=0.05
df=10
qt(1-alpha/2, df)
## [1] 2.228139
# What will the t-critical be for alpha=0.05, 2-tailed test, for
# your data?
# HINT: substitute df=10 with the correct value for degrees of
# freedom. What do you get?
For my data set, using α = 0.05 and df = n1 + n2 - 2 = 20 + 20 - 2 = 38, t-critical = 2.024. Notice that the table in the book does NOT have df = 38, but t-critical for df = 40 is 2.021, which is very close to my t-critical for the slightly smaller df. Although it is standard procedure to use the nearest smaller df in the table and not the larger one (like here), for the sake of this assignment we’ll all use the value for the larger df. Using R, you are able to calculate the exact t-critical for any df, even the ones that do not show up in the textbook tables.
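To see how little the choice of table row matters here, the following sketch computes t-critical for the exact df and for two neighbouring table rows. The row df = 30 is an assumption about which smaller df a typical printed table offers; check your own textbook.

```r
alpha = 0.05

# df = 38 is the exact value for the demonstration data;
# df = 30 (ASSUMED nearest smaller table row) and df = 40 are for comparison
table.df = c(30, 38, 40)

# qt() is vectorized, so it returns one critical value per df
t.crit = qt(1 - alpha/2, table.df)
data.frame(df = table.df, t.critical = round(t.crit, 3))
```

A smaller df gives a slightly larger t-critical, which is why using the smaller table row is the more conservative choice.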
Next, we want to calculate Cohen’s d.
# To calculate Cohen's d, we need to know the sample means
# and pooled variance.
# Sample means:
M1 = with(mydata, mean(V1))
M2 = with(mydata, mean(V2))
# Degrees of freedom:
df1 = with(mydata, length(V1)-1)
df2 = with(mydata, length(V2)-1)
# Variance:
variance1 = with(mydata, var(V1))
variance2 = with(mydata, var(V2))
# Pooled Variance:
pooled.variance = (df1 * variance1 + df2 * variance2) / (df1 + df2)
# Alternative way, using Sums of Squares:
# This requires creating new columns where we calculate the
# squared deviations from the mean, and then summing each column:
mydata$SquaredDeviation1 = with(mydata, (V1-mean(V1))^2)
mydata$SquaredDeviation2 = with(mydata, (V2-mean(V2))^2)
head(mydata) # shows 1st 6 rows of your data
##   V1 V2 V3 SquaredDeviation1 SquaredDeviation2
## 1 41 25 20                49             10.24
## 2 43 28 31                25              0.04
## 3 57 19 38                81             84.64
## 4 43 31 38                25              7.84
## 5 51 27 35                 9              1.44
## 6 33 27 34               225              1.44
SS1 = with(mydata, sum(SquaredDeviation1))
SS2 = with(mydata, sum(SquaredDeviation2))
pooled.variance2 = (SS1 + SS2) / (df1 + df2)
# Compare the 2 methods of calculating pooled variance:
pooled.variance
## [1] 33.82105
pooled.variance2
## [1] 33.82105
# Cohen's d:
(cohen.d = (M1-M2)/sqrt(pooled.variance))
## [1] 3.404643
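The Cohen's d above is for 2 independent samples. For the repeated samples question, a rough sketch is given below. It treats V1 and V2 as 2 measurements of the same 20 participants, which is an assumption made only for illustration, and it computes d from the difference scores, which is one common convention; please confirm that it matches the formula used in lecture before relying on it.

```r
# V1 and V2 from the demonstration file (typed in so the sketch is self-contained)
V1 = c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
       39, 43, 50, 43, 55, 53, 46, 45, 51, 51)
V2 = c(25, 28, 19, 31, 27, 27, 28, 31, 28, 29,
       28, 18, 31, 34, 28, 31, 38, 28, 29, 26)

# Repeated samples (paired) t-test, 2-tailed
paired = t.test(V1, V2, paired = TRUE, alternative = 'two.sided')
paired

# Cohen's d from the difference scores: mean(D) / sd(D)
# (ASSUMPTION: one common convention; check it against the lecture formula)
D = V1 - V2
cohen.d.paired = mean(D) / sd(D)
cohen.d.paired
```

With this convention, the paired t statistic equals `cohen.d.paired * sqrt(n)`, which gives a quick way to check the two numbers against each other.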
7.1 Summary/Conclusion
The final step of the process requires us to report our result in an APA format.
We found that the resting heart rate (RHR) of PSY2116 students (M = 48) was significantly different from that of the general population of University of Ottawa students (M = 28.2), t(38) = 10.77, p < 0.001, Cohen’s d = 3.4, 95% CI [16.08, 23.52]. In other words, we found evidence that Dr. Konar is causing PSY2116 students to have different heart rates than those found in the general population of University of Ottawa students. In fact, he is causing students to have a higher heart rate (M = 48) compared to the general student population (M = 28.2).
We also found a large effect size which indicates that the class RHR is different enough from the general population of students at the University of Ottawa to indicate that taking PSY2116 with Dr. Konar is affecting the health of PSY2116 students.
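If you want to avoid copying numbers by hand, the statistics in the summary can be assembled directly from the `t.test` result. The sketch below is one possible way to do this; the object names and the exact layout of the string are my own choices, so adjust them to the APA format shown in the lectures.

```r
# V1 and V2 from the demonstration file (typed in so the sketch is self-contained)
V1 = c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
       39, 43, 50, 43, 55, 53, 46, 45, 51, 51)
V2 = c(25, 28, 19, 31, 27, 27, 28, 31, 28, 29,
       28, 18, 31, 34, 28, 31, 38, 28, 29, 26)

res = t.test(V1, V2, var.equal = TRUE)

# Cohen's d using the pooled standard deviation
df1 = length(V1) - 1
df2 = length(V2) - 1
pooled.sd = sqrt((df1 * var(V1) + df2 * var(V2)) / (df1 + df2))
cohen.d = (mean(V1) - mean(V2)) / pooled.sd

# Report very small p-values as 'p < 0.001' rather than in full
p.text = if (res$p.value < 0.001) 'p < 0.001' else sprintf('p = %.3f', res$p.value)

# Assemble the statistics part of the summary sentence
apa = sprintf("t(%.0f) = %.2f, %s, Cohen's d = %.2f, 95%% CI [%.2f, %.2f]",
              res$parameter, res$statistic, p.text, cohen.d,
              res$conf.int[1], res$conf.int[2])
apa
```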
2022-07-18