
ALY6000: Data Analysis

Overview and Rationale

Being able to ask appropriate questions of data is an important part of the work of data analytics. It is also critical to be able to interpret the results of the analysis. This assignment is intended to familiarize you with the data sets and to get you thinking about key business questions you can answer from this data.

Module Outcomes

This assignment is directly linked to the following learning:

•   Investigate impacts of big data on industry

•   Describe the evolution of big data

•   Analyze data to complete a data-rich and visually appealing report

Assignment Instructions

Find one dataset that is of interest to you. Some places to find datasets include:

•  The R Project for Statistical Computing

•  Kaggle

•  U.S. Government's Open Data

•  Your own data

Your dataset should have at least 700, but fewer than 6000, records and eight (8) attributes, and the data should not be “clean.” Part of this assignment will require you to clean the data yourself.

If a Data Dictionary accompanies your chosen dataset, consult it to understand the fields and values.

The assignment has three parts.

Part I

If a Data Dictionary is provided, please review it as you review the dataset.

In order to understand the data, we first need to run some descriptive statistics on the data set.

Start by providing the following for each appropriate variable in the dataset:

1.   Summarize the data in a table.

2.   Provide graphs that help visualize the data. These can be bar charts, histograms, pie charts, etc. Be sure the chosen graph best represents the information you want to highlight.

3.   Explain the story the data is telling you.

•    What business question do your descriptive analyses answer? Provide a brief discussion of the findings.

•     If there are any unusual values, discuss them. If data values are “out of range,” clean the data as needed. Delete the out of range values and run the analysis again.

•     If you remove out of range values for any of the variables, present both the analysis with the out of range values and the analysis without the out of range value(s).

•     Identify additional questions that the data is leading you to ask. What new attributes are needed to answer those questions?
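The before-and-after comparison requested above can be sketched as follows. This is only an illustration: the `ages` values, the `LOW`/`HIGH` bounds, and the `summarize` helper are hypothetical and are not part of the assignment. What counts as "out of range" depends on your own dataset and its Data Dictionary.

```python
from statistics import mean, median

# Hypothetical sample: ages with two out-of-range entries (-5 and 240).
ages = [34, 29, 41, -5, 38, 240, 27, 33]

# Assumed valid range for this illustration; choose bounds that suit your data.
LOW, HIGH = 0, 120


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 2),
        "median": median(values),
    }


# Split the records into those inside and outside the assumed range.
cleaned = [a for a in ages if LOW <= a <= HIGH]
removed = [a for a in ages if not (LOW <= a <= HIGH)]

# Run the same analysis on both versions so they can be presented side by side.
before = summarize(ages)
after = summarize(cleaned)

print("With out-of-range values:   ", before)
print("Without out-of-range values:", after)
print("Removed values:", removed)
```

Keeping the list of removed values makes it easier to discuss them in the presentation rather than silently dropping them.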

Part II

Create new attributes based on the data and the questions you identified in Part I.

For your dataset, compute differences between appropriate variable values and create a new variable. For example, if the data shows monthly sales across several years, calculate the increase or decrease in sales from month to month.

Then, compute the mean and median for each of the variables you have computed.
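As a minimal sketch of the month-to-month example, the snippet below derives a "change" attribute and then takes its mean and median. The sales figures and variable names are made up for illustration; substitute the variables that make sense for your own dataset.

```python
from statistics import mean, median

# Hypothetical monthly sales figures, in chronological order.
monthly_sales = [
    ("2023-01", 120),
    ("2023-02", 135),
    ("2023-03", 128),
    ("2023-04", 150),
    ("2023-05", 150),
]

# New attribute: change in sales relative to the previous month.
# The first month has no predecessor, so it gets no difference.
changes = [
    (curr_month, curr_value - prev_value)
    for (_, prev_value), (curr_month, curr_value)
    in zip(monthly_sales, monthly_sales[1:])
]

# Mean and median of the computed variable.
change_values = [delta for _, delta in changes]
mean_change = mean(change_values)
median_change = median(change_values)

for month, delta in changes:
    print(f"{month}: {delta:+d}")
print("Mean change:", mean_change)
print("Median change:", median_change)
```

Note that a series of n months yields n - 1 differences, so the computed variable has one fewer record than the original.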

Part III

Now that you have worked with the data, what is the data saying to you? What have you learned about the attributes? What are some follow-up questions you would like to have answered? Identify 3-5 observations or follow-up questions that you have.

What to Submit

A presentation slide deck (5-8 slides, not including the title and reference list slides) with your findings.

Submit a single file with the following filename: <LastName>_FinalProject.pptx or <LastName>_FinalProject.pdf

Format

Your presentation must:

•    Tell the story of your data through the use of descriptive statistics and visualizations.

o  Remember your visualizations are the primary vehicle you'll use to convey information in an analytics presentation.

o  Include very concise written information, closely connected to the points made in the visualizations, as a Notes section on each slide.

•    Properly cite all sources using APA citation rules.

Appendix

Assignment Part I Section Example

Business Question:

What is the distribution of the status of the 2017 GxP Audits?

Analysis:

Descriptives Table

         Audit Status    Frequency    Percent    Valid Percent
Valid    Closed                 19       19.8             19.8
         Completed               4        4.2              4.2
         In Progress            18       18.8             18.8
         Scheduled              11       11.5             11.5
         Pending                14       14.6             14.6
         Not In Scope           26       27.1             27.1
         Cancelled               4        4.2              4.2
         Total                  96      100.0            100.0
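The Percent column in the Descriptives Table can be reproduced from the frequencies alone, as sketched below. The frequencies are taken from the table; the variable names are illustrative. In this example Valid Percent matches Percent, which is what one would expect if no audit has a missing status, although the source does not state this explicitly.

```python
# Frequencies as reported in the example Descriptives Table.
frequencies = {
    "Closed": 19,
    "Completed": 4,
    "In Progress": 18,
    "Scheduled": 11,
    "Pending": 14,
    "Not In Scope": 26,
    "Cancelled": 4,
}

# Total number of audits in the data file.
total = sum(frequencies.values())

# Percent of all audits in each status, rounded to one decimal place.
percents = {
    status: round(100 * count / total, 1)
    for status, count in frequencies.items()
}

for status, count in frequencies.items():
    print(f"{status:<13}{count:>4}{percents[status]:>7.1f}")
print(f"{'Total':<13}{total:>4}")
```

Because each percentage is rounded individually, the rounded values need not add up to exactly 100.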

[Figure: Audit Status Count]

[Figure: Audit Status Percentages]

Discussion:

The data file includes information on 96 audits in 2017 for GxP areas. It is unclear if the data file includes all the known GxP audits in 2017 or if it only includes a subset. A large percentage of all GxP Audits (27.1%) are not in scope.

19.8% of audits are closed and 4.2% of audits are completed. It is unclear what the difference between “closed” and “completed” audits is. We should perhaps ask the client. Do we really need two distinct values?

18.8% of the audits are in progress, 11.5% are scheduled and 14.6% are pending. For the pending audits, the dates of the audit process have not been established.

4.2% of the audits were canceled. It may be interesting to have a notes field where the reasons for cancelation are noted.