
Project Proposal (Video Presentation)

Literature Review and EDA

Due: March 15, 2024 by 11:59PM ET

Goal of the Assessment:

This assignment consists of two parts. The goal of the first part is to give you a head start on your final project. This will be accomplished by finding an area of interest to study and real-world data to work with. The second part of this assignment will provide you with an opportunity to conduct research in an area you’re interested in. Conducting research will help you determine what has been accomplished regarding your question and highlight the importance of your proposal.

The steps involved in completing this assignment encompass the general process of proposing a research question and will help to form the basis for a strong introduction section in your final project report. Your task for this assignment is to prepare a video presentation that describes your data and research topic of interest. Completing this assignment will also give you the chance to think about the appropriateness of generalized linear models (GLM) as a tool for answering your proposed research question using your chosen data. Lastly, this assignment provides an opportunity to get some feedback on your research question through peer reviews, which can be used to improve your final report.

Assignment Instructions:

1.   Decide on one (or a few possible) areas of interest that you may want to explore. These areas of interest can be anything that matters or is of interest to you. Some examples could be (but are not limited to) sports, medicine, public health, economics, video games, literature, etc. Pick something that you truly care about.

2.   Next, think about possible research questions you may want to study regarding these areas of interest. What do you want to know about these areas of interest? In particular, you want to make sure that your question can be answered/studied using generalized linear models. GLMs are not applicable to all datasets, so you’ll want to frame your question as something related to modelling a relationship or classifying a categorical outcome based on this relationship. You’ll also want to consider whether the variable of interest would allow the assumptions of a GLM to hold.

3.   After producing a research question, you will need to find some open-source data that you may use in your data analysis. You want to make sure that the data you find has both: 1) your response variable of interest (or has variables that could be used to create that variable), and 2) any other variables you may want to use as predictors. By looking for data online, you may realize you need to modify your research question slightly or pick another one if you can’t quite find the data you’re looking for. Alternatively, if you are having trouble finding data online but want to stick with this research question, be sure to mention that you expect there to be many limitations to the dataset because it doesn’t quite meet your needs. Step 4 can also help you decide what predictors might be needed for you to answer your question.

To help you identify data for your research question, some examples of open data sources are listed below:

o https://open.toronto.ca for freely accessible data from Toronto

o https://data.ontario.ca for freely accessible data from Ontario

o https://www150.statcan.gc.ca/n1/en/type/data?MM=1 for data collected by Statistics Canada

o https://sports-statistics.com/sports-data/ for various sports-related datasets

o https://data.oecd.org for data on various country-level variables

o https://mdl.library.utoronto.ca for links to many other data portals through the University of Toronto library

4.   Once you’ve found your dataset and have decided on your research question, perform a literature search to learn more about your research question. A literature search can be done by performing a search on the University of Toronto library website (https://onesearch.library.utoronto.ca/) or other databases that feature scholarly articles to learn about anything related to your area of interest and research question. Look for academic papers or published reports (i.e., preferably peer-reviewed work that has been published in reputable scholarly journals, not websites, blogs, or news articles, etc.) that studied the same research question or something related and that tell you more about what you may need to consider in your analysis. In your literature review, include academic papers or reports to justify why your research question is important. Some other suggestions on performing a literature review include:

o Focus on giving your reader a rough idea of how many academic papers have studied your research topic (or closely related concepts to your topic). This process of looking at the number of academic papers which describe a specific topic tells your audience how popular the area of research is and how much research has been done.

o Give examples from a few important papers about what was found or discovered to be important in relation to your question. This can be important variables, important results, surprising results, etc. The process of identifying and describing important papers tells your audience that you are aware of prior results and that you will be using these to plan your analysis.

o Think about how your research question fits into the general area of research about your topic. For example, is your research question different from research questions in other studies? If so, how? A novel research question is one that either: 1) nobody has studied before, 2) is studied using a different methodology/study design than in prior work, or 3) is studied in a different population. The process of examining if your research question is novel tells your audience that you see the importance of what you are researching and can frame it against what has already been done.

Attached here are some additional library resources which may be helpful for performing a literature review:

o https://guides.library.utoronto.ca/librarysearchtips/gettingstarted for more details about searching for articles related to your question

o https://guides.library.utoronto.ca/citing for details about why and how to cite your references

o https://guides.library.utoronto.ca/c.php?g=251103&p=1673071 for help getting the correct citation format

5.   After completing a literature review, perform a short exploratory data analysis of your chosen dataset. You will want to focus on identifying anything that you may need to consider moving forward. This includes identifying in your dataset:

a.   potential confounders,

b.   statistical outliers of the exposure and the confounders of interest (if continuous),

c.   variables with high spread or observations that don’t make sense, and

d.   missing data

For step 5, you want to make sure you specifically mention the presence of any of the characteristics in 5a-d (or lack thereof) and what this means for the analysis you will eventually perform. For example, this may include describing how any of the characteristics in 5a-d might cause problems (or not) with the results of a GLM or with generalizability. You will need to present univariate or bivariate numerical and/or graphical summaries describing the variables. Choose the options that highlight the features of the data that you want to point out but will also let your reader clearly understand the data that you will be working with.
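The checks in 5a-d can be sketched in base R roughly as follows. This is only an illustration on simulated data: the data frame `dat`, its columns (`outcome`, `exposure`, `age`), and the helper `flag_outliers()` are all hypothetical names, and the 1.5 x IQR rule is just one common convention for flagging outliers, not a course requirement.

```r
# Hypothetical toy data standing in for your own dataset (all names are made up).
set.seed(1)
dat <- data.frame(
  outcome  = rbinom(200, size = 1, prob = 0.3),             # binary response
  exposure = c(rnorm(198, mean = 50, sd = 10), 150, -20),   # last two values are implausible
  age      = round(runif(200, min = 18, max = 80))          # candidate confounder
)
dat$age[c(5, 17, 42)] <- NA   # inject a few missing values for illustration

# (d) Missing data: number of NAs in each variable
n_missing <- colSums(is.na(dat))

# (b) Outliers: flag values more than 1.5 * IQR beyond the quartiles
flag_outliers <- function(x) {
  q   <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  !is.na(x) & (x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
}
exposure_outliers <- which(flag_outliers(dat$exposure))

# (c) Spread and values that don't make sense: numerical summaries
exposure_summary <- summary(dat$exposure)
exposure_sd      <- sd(dat$exposure, na.rm = TRUE)

# (a) Potential confounder: is age related to both the exposure and the outcome?
age_exposure_cor    <- cor(dat$age, dat$exposure, use = "complete.obs")
age_mean_by_outcome <- tapply(dat$age, dat$outcome, mean, na.rm = TRUE)
```

Graphical counterparts such as `hist(dat$exposure)` or `boxplot(exposure ~ outcome, data = dat)` would typically accompany these numerical summaries; whichever you choose, say what each finding means for the model you plan to fit.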

Guidelines for Picking a Dataset

o Government data portals often contain many datasets about diverse topics – if one dataset doesn’t have all the variables you might want to consider, feel free to combine different datasets together

o When combining datasets, make sure that each unit being measured is the same in both datasets (i.e., it’s reasonable that both measurements are on the same unit)

o There are many data repositories online – if you find a dataset there that is of interest to you, you MUST ensure that your question is different than what the dataset was originally used for.

o YOU MAY NOT use any dataset that is part of any R package or library, or that is contained in a textbook. If you’re not sure, please ask the instructor.

o You will need to make sure you have enough variables to be able to showcase the statistical methods that you will learn later in the course. Some topics the teaching team will require include model validation and model refinement, so please ensure your dataset has at least 5 predictor variables.

o You will also need to make sure you have enough observations to be able to validate your model, which will involve splitting your dataset into two roughly equal parts.
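Two of the guidelines above (matching units when combining datasets, and splitting the data into two roughly equal parts) can be sketched in base R as below. The tables `income` and `health`, their columns, and the seed are hypothetical, and a simple random half split is only one possible way to set aside validation data.

```r
# Hypothetical tables, both measured on the same unit (one row per neighbourhood).
income <- data.frame(
  neighbourhood = c("A", "B", "C", "D", "E", "F"),
  median_income = c(52, 61, 48, 75, 66, 58)
)
health <- data.frame(
  neighbourhood = c("A", "B", "C", "D", "E", "G"),
  er_visits     = c(120, 95, 143, 80, 101, 77)
)

# Each unit should appear once per table; duplicated keys would
# silently multiply rows when the tables are merged.
stopifnot(
  anyDuplicated(income$neighbourhood) == 0,
  anyDuplicated(health$neighbourhood) == 0
)

# Units present in only one table are dropped by the merge below,
# so record them as a limitation of the combined data.
unmatched <- union(
  setdiff(income$neighbourhood, health$neighbourhood),
  setdiff(health$neighbourhood, income$neighbourhood)
)

# merge() keeps only units found in both tables by default (inner join)
combined <- merge(income, health, by = "neighbourhood")

# Random split into two roughly equal parts for later model validation
set.seed(2024)
n          <- nrow(combined)
train_idx  <- sample(seq_len(n), size = floor(n / 2))
training   <- combined[train_idx, ]
validation <- combined[-train_idx, ]
```

With a real dataset, compare `nrow(combined)` against the original tables to see how many observations the merge costs you before deciding whether enough remain to split.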


Presentation Content Requirements:

Your presentation should satisfy the following requirements:

o The presentation should be organized clearly (consider using headings or sections) and include the following information:

a.   Your research question, why you chose it (i.e., why it’s of interest to you), and why it may be of interest to others.

b.   Summaries of academic papers related to your question or topic, highlighting similarities/differences to what you propose, and how you will incorporate this knowledge into your model/project.

c.   Details and summaries on your chosen dataset including the variables collected, the number of observations and anything that stands out in the data that would need to be addressed/investigated further in your analysis.

d.   A discussion of how and why a GLM fits your chosen data and will allow you to answer your proposed research question, as well as whether you anticipate any problems in your analysis based on your EDA.

e.   References for where you located the data, and your background research on your topic

o The presentation should be presented for an audience that has some statistics background but is not necessarily familiar with the area of your research question or GLMs.

o The presentation should contain figures and/or tables with proper labels/titles as appropriate in your Data Description - Exploratory Data Analysis section

o The presentation should have references listed in proper APA format, and

o The presentation itself should not contain R code
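For item d above, it may help to check in your own R Markdown file (not on the slides) which GLM family matches the type of your response variable. The sketch below uses simulated data with made-up names (`x`, `binary_y`, `count_y`) and shows only two common cases; it is an illustration of the idea, not a prescription for your project.

```r
# Hypothetical simulated data; variable names are illustrative only.
set.seed(42)
n <- 300
x <- rnorm(n)
binary_y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))  # yes/no outcome
count_y  <- rpois(n, lambda = exp(0.3 + 0.5 * x))               # count outcome

# Binary response: binomial family with logit link (logistic regression)
fit_logit <- glm(binary_y ~ x, family = binomial(link = "logit"))

# Count response: Poisson family with log link
fit_pois <- glm(count_y ~ x, family = poisson(link = "log"))

# Rough overdispersion check for the Poisson fit:
# Pearson chi-square statistic divided by the residual degrees of freedom.
# Values well above 1 suggest the Poisson variance assumption may not hold.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

coef(fit_logit)
coef(fit_pois)
```

If the response in your data is neither binary nor a count, that is worth raising in your discussion of whether a GLM is appropriate.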

Technical Requirements:

Your submission to Quercus should include the following:

1.   A video that presents your proposed research area and question, the dataset you have chosen, and the exploration of your dataset.

o The video should be no more than 5 minutes in length

o You must display your U of T Student ID card (or other valid government-issued photo ID) at the beginning of your video.

o The presenter’s face must be visible throughout the video.

o The presentation should include an appropriate visual medium (e.g., slides) to display important information in an easily readable way.

o The video should be hosted on a video-sharing service (e.g., MS Stream, MyMedia are supported by the university)

2.   The proposed dataset you will use in your Final Project, as a csv or xlsx file, or if too large, as a link to cloud storage where the dataset is saved in csv or xlsx.

3.   A copy of the slides/visual aids used in your presentation saved as a PDF document.

4.   The R Markdown file containing the code used to produce your exploratory data analysis and tables/figures.

How to upload different components of this assignment:

o A link to your video should be added as a comment to your submission. This can be done via MS Stream or MyMedia.

o Instructions for uploading to MS Stream: https://learn.microsoft.com/en-us/stream/portal-upload-video

o Instructions for uploading to MyMedia: https://ito-engineering.screenstepslive.com/s/ito_fase/a/1291600-how-do-i-upload-a-video-or-audio-file-to-mymedia

o Both require you to log in with your UofT credentials.

o The R Markdown File should be added as a file upload on the assignment page on Quercus

o The slides used in your presentation should be added as a file upload on the assignment page on Quercus

o The dataset you chose to work with should be uploaded either as a file upload to the assignment page on Quercus OR as an attachment to a comment on your assignment submission. Attaching the file as a comment is best if the dataset is large (>3 MB in size)