Answering Social Research Questions with Statistical Models (SOST20131&30031), 2023-24


Assessment, Part 2 2023

Submission deadline: 2 PM, Tuesday 16th January 2024, via Blackboard/Turnitin
Word limit: up to 2,000 words

There are two questions. Create a report that provides the required answers to both.

Q1) Coursework question 1 is based on the following data analysis scenario:

Employee Performance Study in a Pharmaceutical Company

The human resources department of a pharmaceutical company has been appraising the performance of people in the Business Analysis area. As part of the process, a random sample (n=48) of employees was selected, and various test results and other relevant information were recorded. The data collected were as follows:

Variable Name    Variable Description

Sales_Perf       A score* of the employee’s overall business effectiveness
Creativity       A score* based on a creativity test
Mechanical       A score* based on a logical reasoning test
Abstract         A score* based on an English test
Maths            A score* based on a numeracy test
Type             A variable coded:
                   0 if the employee has a Science background
                   1 if the employee has a Business background
Level            A variable recording the level of education as:
                   1: Degree if the employee has a first degree
                   2: Masters if the employee has a Masters degree
                   3: Doctorate if the employee has a PhD
Expertise        A variable recording the employee’s main functional area as:
                   1: Logistics if employed in operations management
                   2: Promotion if employed in product promotion
                   3: Sales if employed in product sales
                   4: Strategy if employed in business strategy

The sample data is available for download as the csv file `EmployeePerformance.csv` from this link
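Level and Expertise are stored as numeric codes but represent categories, so a regression would normally treat them as sets of 0/1 indicator (dummy) variables rather than as numbers. The following is a minimal sketch of that encoding, not a prescribed method. The helper name `dummy_code`, the column naming pattern and the choice of the lowest code as the reference level are all assumptions made for illustration.

```python
# Category labels are taken from the variable table above.
# Assumption: the lowest code of each variable is the reference level.
LEVEL = {1: "Degree", 2: "Masters", 3: "Doctorate"}
EXPERTISE = {1: "Logistics", 2: "Promotion", 3: "Sales", 4: "Strategy"}


def dummy_code(value, labels, prefix):
    """Return 0/1 indicators for every level except the reference level."""
    if value not in labels:
        raise ValueError(f"unknown code {value!r} for {prefix}")
    codes = sorted(labels)
    # Skip codes[0]: the reference level is represented by all zeros.
    return {f"{prefix}_{labels[c]}": int(value == c) for c in codes[1:]}


# Example: an employee with a Masters degree who works in product sales.
row = {**dummy_code(2, LEVEL, "Level"), **dummy_code(3, EXPERTISE, "Expertise")}
print(row)
```

With this coding, each indicator's coefficient is read as a difference from the reference level (here Degree and Logistics). Type is already coded 0/1 and needs no further encoding.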


1. Using an appropriate mix of tests, develop a multiple regression model that explains the Sales Performance variable (Sales_Perf). Give a full explanation of your model fitting procedure:

- define research questions

- justify your model fitting procedure and findings by appropriate graphs and methods      (40)

2. Use your best-fit model to predict Sales_Perf for the following information:

an employee with a Creativity test score of 14, a Mechanical test score of 18, an Abstract test score of 15 and a Maths test score of 60; the employee has a Science background and a PhD, and is employed in product promotion.

Comment on how good you think your prediction is.         (10)
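A point prediction from a fitted multiple regression is the intercept plus the sum of each coefficient multiplied by the corresponding predictor value. The sketch below encodes the employee described in part 2 and shows that calculation. The indicator names (`Level_Doctorate`, `Expertise_Promotion`, and so on) and the reference levels they imply are assumptions, and no coefficient values are supplied because they depend on the model you fit.

```python
def predict(coefs, row, intercept_name="Intercept"):
    """Intercept plus the sum of coefficient * value over the model's terms.

    Only terms present in `coefs` are used, so a reduced final model
    (with some predictors dropped) works with the same row.
    """
    return coefs[intercept_name] + sum(
        b * row[name] for name, b in coefs.items() if name != intercept_name
    )


# The employee from part 2, with categorical variables as 0/1 indicators.
# Assumed reference levels: Degree (Level) and Logistics (Expertise).
new_employee = {
    "Creativity": 14,
    "Mechanical": 18,
    "Abstract": 15,
    "Maths": 60,
    "Type": 0,                 # Science background
    "Level_Masters": 0,
    "Level_Doctorate": 1,      # PhD
    "Expertise_Promotion": 1,  # product promotion
    "Expertise_Sales": 0,
    "Expertise_Strategy": 0,
}
```

Calling `predict(fitted_coefs, new_employee)`, where `fitted_coefs` is a hypothetical dictionary of your own estimated coefficients, returns the predicted Sales_Perf. When judging the prediction, it is also worth checking whether these predictor values lie within the range observed in the sample.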

Q2) Diabetes is among the most prevalent chronic diseases in the world, impacting millions of people each year and exerting a significant financial burden on the economy. Diabetes is a serious chronic disease in which individuals lose the ability to effectively regulate levels of glucose in the blood, which can lead to reduced quality of life and life expectancy. While there is no cure for diabetes, strategies like losing weight, eating healthily, being active, and receiving medical treatments can mitigate the harms of this disease in many patients. While there are different types of diabetes, type II diabetes is the most common form, and its prevalence varies by age, education, income, location, race, and other social determinants of health.

The Behavioural Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is conducted annually by the Centres for Disease Control and Prevention (CDC). Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviours, chronic health conditions, and the use of preventative services.

A sample of 150 patients from the data collected in 2015 has been made available for your analysis of the factors linked to the risk of developing the disease.

Diabetes health indicators:

Explanatory variables:

HighBP:      0 - no high blood pressure
             1 - high blood pressure
HighChol:    0 - no high cholesterol
             1 - high cholesterol
BMI:         Body Mass Index
Smoker:      0 - non-smoker
             1 - smoker
Fruits:      Consumes fruit 1 or more times per day:
             0 - no
             1 - yes
Age:         Age of the patient

Binary response variable:

Diabetes:    0 - no diabetes
             1 - prediabetes or diabetes

The sample data is available for download as the csv file `Diabetes.csv` from this link 
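For the 0/1 indicators above, one simple way to examine the association with Diabetes is to cross-tabulate the two variables and compute an odds ratio. The sketch below illustrates the calculation only. The function names are hypothetical and the eight toy observations are invented for the example; they are not taken from `Diabetes.csv`.

```python
def cross_tab(x, y):
    """Count each (x, y) combination for two 0/1 variables."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    return counts


def odds_ratio(counts):
    """Odds of y = 1 when x = 1, divided by the odds of y = 1 when x = 0."""
    numerator = counts[(1, 1)] * counts[(0, 0)]
    denominator = counts[(1, 0)] * counts[(0, 1)]
    if denominator == 0:
        raise ZeroDivisionError("an empty cell makes the odds ratio undefined")
    return numerator / denominator


# Toy data, invented for illustration only.
high_bp = [1, 1, 1, 0, 0, 0, 1, 0]
diabetes = [1, 1, 0, 0, 0, 1, 1, 0]

table = cross_tab(high_bp, diabetes)
print(table, odds_ratio(table))
```

An odds ratio above 1 indicates that the outcome is more common when the indicator equals 1. In a small sample the estimate can be very imprecise, so it should be read alongside a test or confidence interval.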


1. Using appropriate data analysis, discuss the importance of the 6 explanatory variables.  (15)

2. Using logistic regression, fit an appropriate model to predict the binary response variable Diabetes.

Give a full explanation of your model fitting procedure:

- define research questions

- justify your findings by appropriate graphs and methods    (30)

3. Discuss the accuracy of the final fitted model    (5)
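A fitted logistic regression produces a linear predictor (log-odds) for each patient, which the logistic function converts to a probability. Accuracy can then be assessed by classifying each probability at a cut-off and comparing it with the observed outcome. The sketch below shows one common way of doing this. The 0.5 threshold, the function names and the toy values are assumptions for illustration, not results from the coursework data.

```python
import math


def logistic(z):
    """Map a linear predictor (log-odds) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def confusion_counts(actual, probs, threshold=0.5):
    """Return (tp, fp, tn, fn) after classifying each probability at the threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_summary(tp, fp, tn, fn):
    """Overall accuracy, sensitivity and specificity.

    Assumes both outcome classes occur at least once in the data.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy observed outcomes and fitted probabilities, invented for illustration.
actual = [1, 0, 1, 0, 1, 0, 0, 1]
probs = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1, 0.3, 0.8]

print(accuracy_summary(*confusion_counts(actual, probs)))
```

Reporting sensitivity and specificity alongside overall accuracy matters when the two outcome classes are unbalanced, because a model can score a high accuracy simply by predicting the more common class.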

Guidance notes

· Your submission should answer each of the 2 questions above.

· Good answers are those that clearly address all parts of the question, supporting and justifying your answers with reference to the lecture notes, which you do not have to cite.

· The word limit for your submission is up to 2,000 words. Note that this is a limit, not a target; you are permitted to use fewer than 2,000 words, but not more.

· Both questions are equally weighted.