Hello, dear friend, you can consult us at any timeif you have any questions, add WeChat: daixieit


MCD2080 Business Statistics

Trimester 1, 2026: Group Assignment
Problem background: Convenience Store

This data is about a Convenience Store, with 14059 customers surveyed. Each row represents a customer, capturing demographics such as gender, marital status, home ownership, number of dependent children, number of units sold mapped against the amount spent by the customer at the convenience store.

Data definition:

In the file “MCD2080 T126 Group Assignment Data.xlsx”, you are provided with both numeric and categorical data. Note that this data has already been cleaned for you, and any missing records are removed. The following table contains the data definitions.

Column
Column Name
Data Definition
A
Customer ID
The unique identifier for the customer
B
Gender
The gender of the customer: Male (M) or Female (F)
C
Marital Status
The marital status of the customer: Single (S) or Married (M)
D
Homeowner
Does the customer own their home? Yes (Y) or No (N)
E
Children
The number of dependent children the customer has
F
Units Sold
The number of units sold in their most recent transaction
G
Amount Spent
The amount spent by the customer in their transaction at the convenience store

Purpose:
We wish to explore the relationships between the revenue earnt by the transaction and other independent variables. This is done by utilising the following statistical tools:
1. Pivot Tables and Charts
2. Summary Statistics
3. Confidence Intervals
4. Hypothesis Testing
5. Regression Analysis

These statistical tools are learned weekly as we progress from Week 1 to Week 9

Assignment questions:
Answer all questions.
1. a)i). Objective: To analyse the provided dataset, your initial task is to create several pivot tables using Excel's PivotTable feature.
Instructions:
Pivot Table 1: Generate a pivot table that displays the percentage distribution of customers categorized by their ownership status.

Pivot Table 2: Generate a pivot table illustrating the percentage distribution of customers based on their gender.

Pivot Table 3: Generate a pivot table showing the percentage distribution of customers according to their number of dependent children.

Ensure that each pivot table accurately reflects the proportion of customers within each category.

Presentation Requirement: Ensure that each pivot table is formatted clearly and professionally to enhance readability and effectively communicate the data insights. (3 marks)

a)ii) Choose an appropriate chart and visualize the information from each pivot table.

Presentation Requirement: Ensure that each chart is formatted clearly and professionally to enhance readability and effectively communicate the data insights. (3 marks)

a)iii) Briefly explain the insights gained from each percentage distribution. (7 marks)

b)i). Now expand your analysis by including amount spent. For this task create three new pivot tables:

- i. Average amount spent by ownership status. The table should show the average amount spent for each group (Yes, No)
- ii. Average amount spent by gender. The table should show the average amount spent for each group (Male, Female)
- iii. Average amount spent by the number of dependent children. The table should show the average amount spent for each group (0, 1, 2, 3, 4, 5)

Leave your answers to two decimal places. (6 marks)

b)ii) Compare the distributions and averages to uncover any meaningful patterns, trends, or anomalies. Focus on how ownership, gender, and number of dependent children relate to amount spent. (7 marks)

2. We wish to compare the distribution of the amount spent by gender.

• Generate Summary statistics and histograms and use them to compare the distributions. For the histograms, use an interval width of 5 and start at 0 for both female and male customers. Use percentage frequency.
• In your summary statistics include Q1 ,Q3 and IQR.
• Discuss your findings. In your discussion, include measures of central tendency, variability and shape.
• When discussing, include contextual interpretations of each of the measures used. Also, include a conclusion.
(2+2+10 = 14 marks)

3. Explore the relationship between the amount spent for females and males.

a). Calculate the 95% Confidence Interval estimate of the true average amount spent for females and for males.

Report your results using the table below and give values to two decimal places.

Confidence Interval Estimate of Average amount spent for females and males
Gender
Lower Boundary / Limit
Upper Boundary / Limit
F (Female)


M (Male)


Give interpretations of the confidence intervals for each gender and use the results to comment on any differences in population average amount spent between the two genders.

(2+6=8 marks)

b) The convenience store is concerned that the rising cost of living has decreased the average amount spent by its customers. Use the knowledge of Hypothesis testing you have learned in class to answer the following question.

Using data for all customers, test whether the average amount spent for a customer is significantly less than $13.20?

Hint: For the test, state the hypotheses, test statistic, p-value and conclusion in the context of the question.

Use a 1% significance level. (7 marks)

4. We wish to analyse proportions of customers that own and do not their homes based on their marital status. To begin your analysis, report your results using the following format:

Count of Homeowner Marital Status
Homeowner
N (No)
Y (Yes)
Grand Total
M (Married)


100.00%
S (Single)


100.00%
Grand Total


100.00%

Calculate the 95% Confidence Interval estimate of the true proportion of Married customers who do not own their home and the true proportion of Married customers who own their home. Report your answers to four decimal places.

Confidence Interval Estimate of Proportion of Married customers

Lower Boundary / Limit
Upper Boundary / Limit
N (No)


Y (Yes)


Calculate the 95% Confidence Interval estimate of the true proportion of Single customers who do not own their home and the true proportion of Single customers who own their home. Report your answers to four decimal places.

Confidence Interval Estimate of Proportion of Single customers

Lower Boundary / Limit
Upper Boundary / Limit
N (No)


Y (Yes)


(2+4 = 6 marks)

Discuss the results from your confidence interval estimations. Your discussion should include an interpretation of each confidence interval and a comparison across the marital statuses and homeownership types. (14 marks)

5. (a) (i). Obtain the correlation matrix between amount spent and all other variables. (Hint: create dummy variables for the Gender, Marital Status and Homeowner categorical variables, use Male, Married and Homeowner=Yes responses to take a 1 when using the IF function and the Female, Single and Homeowner=No as the respective base categories taking a 0). Please round your values to four decimal places. (2 marks)

(ii). Give a brief discussion on the variable that has most impact on the amount spent. Provide your answer to 4 decimal places. (2 marks)

(b). Estimate a multiple regression model to analyse the relationship between:

Amount spent and all other variables listed below.

Female

Single

Owns home

Children

Units Sold

(i) You are required to produce one multiple regression output. Please report values to four decimal places. (3 marks)

(ii) In this section, analyse the impact of the following variables on amount spent. Support your findings with appropriate hypothesis tests:

• Examine whether “Children” has a positive effect on amount spent.
• Examine whether males spend less than females.
• Examine whether being single influences the amount spent.

Conduct your tests using a 10% significance level. Additionally, discuss potential improvements to the model. (10.5+2=12.5 marks)

(iii) Interpret the estimated slope coefficients for “Units Sold” and “Female” in the context of their impact on amount spent. (3+3=6 marks)

(iv) Use R² and standard error to evaluate how effectively the model explains variation in amount spent based on the independent variables. Hint: You need to find the average amount spent. (5 marks) 

6. Based on your responses to Questions 1 through 5, answer the following:

a). Write a short overview of what you discovered. (6 marks)
b). Discuss the factors that seem to influence the amount spent, based on your analysis. (6 marks)

Assignment marks: The maximum total mark for the assignment is 252.5. Your total score will be composed of the following parts:

(i). Week 5 check point – 30 marks (staff: 26 marks & peer to peer evaluation: 4 marks)
(ii). Week 8 check point – 35 marks (staff: 29 marks & peer to peer evaluation: 6 marks)
(iii). Week 11 check point – 70 (staff: 62.5 marks & peer to peer evaluation: 7.5 marks)
(iv). Final Report submission – 117.5 marks (marked by staff) Each group to submit one report.

Please note the following


  1. Weeks 5, 8 and 11, the teacher will provide a brief verbal feedback to each group after the presentations. Students can seek additional feedback by attending Math Hub. In week 11, students will be given until Sunday 5.00 pm to finalise their report including the feedback given by the teacher and upload. Students must bring their draft report to the Week 11 class.
  2. Any group member who will not give feedback to other group members will be awarded zero marks. No extensions will be granted.
  3. You will be required to fill in the peer evaluation on Teammates to be eligible for this component.
  4. In addition, the Unit Leader reserves the right to adjust individual report marks based on the peer evaluation. Should the feedback indicate that an individual did not contribute to the group assignment, the report mark for that student will be adjusted to zero or any other marks accordingly.
  5. The final report should contain the answers to all questions, with screenshots of Excel work. It should include a cover page including names of all group members. The final report should exclude the names of any students who did not contribute. Their marks will be zero. The uploaded final report file can be either a Word document or a PDF file.
  6. Presentations must be submitted via email to the tutorial teacher a day in advance.


Report requirements:

  • All answers should be in font size 12pt and 1.5 spacing.
  • Plots and tables must be legible, with appropriate labels to aid readers.
  • Statistical results need to be summarised in succinct table formats.
  • You will lose marks for poor presentation.

Presentation:

Use PowerPoint or other cloud-based apps eg Google slide, Prezi or Visme, etc. All calculations and justifications must be included in presentation slides.
Week 11 Final Assignment submission guidelines


  • The link is set up using an Assignment Tool on Moodle. Please submit the group Report/Answers in Word document or PDF.
  • If the question has sub-parts, for example, (a), (b)…, please indicate the labels for each part clearly.
  • DO NOT click on "submit all and finish" before you finish all questions.


ONLY 1 attempt is allowed for the Assignment. Group members should appoint one member to submit on behalf of the group.