
CP2403/CP3413: Assignment – Part 1 – 15%

Data Exploration, Management & Visualization

Due: End of Week 6 (Friday, 31 March 2023)

In Assignment Part 1, you will be required to apply appropriate data management and data visualization techniques to a given scenario to create charts. The techniques required to complete this assignment are covered in Modules 1 to 4 of the subject. You will have to explain what conclusions you draw from the charts.

Scenario

The California Cooperative Oceanic Fisheries Investigations (CalCOFI) was formed in 1949 to study the ecological aspects of the sardine population collapse off California. CalCOFI conducts quarterly cruises off southern & central California, collecting a suite of hydrographic and biological data on station and underway. The CalCOFI data set represents the longest (1949 to present) and most complete (more than 50,000 sampling stations) time series of oceanographic data in the world.

The physical, chemical, and biological data collected at regular time and space intervals quickly became valuable for documenting climatic cycles in the California Current and a range of biological responses to them. Data collected at depths down to 500 m include: temperature, salinity, oxygen, phosphate, silicate, nitrate and nitrite, chlorophyll, transmissometer, PAR and C14 primary productivity.

You are provided with the following:-

1.    bottle.csv

2.    CalCOFI Database Tables Description - Bottle Table.pdf

The following website provides relevant information and documents about the CalCOFI project and data source.

https://calcofi.org/data/oceanographic-data/bottle-database/

Using the dataset and codebook provided, apply appropriate data management techniques. For this assessment, complete the following tasks.

1.    Select one quantitative variable from the dataset to draw a histogram. What conclusion can you draw from the histogram?

2.    Select one categorical variable (as the explanatory variable) and one quantitative variable (as the response variable) from the dataset to draw a box plot. What conclusion can you draw from the box plot?

Note: Instead of selecting an existing categorical variable, you can generate the categorical variable by transforming (grouping) the original quantitative variable into a categorical variable.

3.    Select one quantitative variable from the dataset to draw a line chart. For this, you need to select one other variable as the corresponding explanatory variable. What conclusion can you draw from the line chart?

4.    Select three quantitative variables from the dataset to draw a bubble chart. What conclusion can you draw from the bubble chart?

5.    Go to the link below. Go through the different charts and the corresponding code provided there.

Top 50 matplotlib Visualizations – The Master Plots (with full python code)

https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/

Then select one chart/plot from the 50 available and appropriate variable(s) from the dataset provided (bottle.csv). Then create the selected chart by using or modifying the corresponding code provided by the website for the variable data you selected from the dataset (bottle.csv). What conclusion can you draw from the chart you have created?

(Note: you are required to select variable(s) which are different from those you selected for the previous tasks (1-4). If you still want to select a variable which you have already selected for a previous task, you may do so, but you are required to manipulate the original data in a different way, e.g. different subsampling (filtering) or grouping.)
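The grouping described in the note under task 2, together with missing-data handling and subsampling, might be sketched as follows. This is only an illustration under stated assumptions: it builds a small synthetic table instead of reading bottle.csv, the column names Depthm and T_degC are assumed and should be checked against the provided codebook, and the bin edges, labels and depth cut-off are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for bottle.csv. The column names are assumptions
# modelled on the codebook; verify them against the provided PDF.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Depthm": rng.uniform(0, 500, size=200),   # depth in metres
    "T_degC": rng.normal(12, 3, size=200),     # temperature in deg C
})
# Simulate missing readings in every 25th row.
df.loc[df.index[::25], "T_degC"] = np.nan

# Handle missing data: drop rows that lack the response variable.
clean = df.dropna(subset=["T_degC"]).copy()

# Transform the quantitative depth into a categorical explanatory variable.
# Bin edges and labels are illustrative, not prescribed by the assignment.
edges = [0, 50, 200, 500]
labels = ["0-50 m", "50-200 m", "200-500 m"]
clean["DepthGroup"] = pd.cut(
    clean["Depthm"], bins=edges, labels=labels, include_lowest=True
)

# Subsample (filter): keep only the samples taken at 200 m or shallower.
shallow = clean[clean["Depthm"] <= 200]

# Frequency table to confirm that the grouping worked as intended.
counts = clean["DepthGroup"].value_counts(sort=False)
print(counts)
```

With the real data, the synthetic DataFrame would be replaced by something like `pd.read_csv("bottle.csv")`, and the new categorical column could then serve as the explanatory variable of a box plot.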

Hint: Refer to Modules 1, 2, 3, and 4 and Practicals 1, 2, 3, and 4 for help on data management and data visualization
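As a rough illustration of the kind of labelled chart the tasks ask for, the sketch below draws a histogram and a bubble chart with matplotlib. It is not a model answer: the data are synthetic rather than taken from bottle.csv, the column names (Depthm, T_degC, O2ml_L) are assumed and should be checked against the codebook, and the bubble scaling factor is chosen only for legibility.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for bottle.csv; the column names are assumptions.
rng = np.random.default_rng(1)
n = 150
df = pd.DataFrame({
    "Depthm": rng.uniform(0, 500, size=n),    # depth in metres
    "O2ml_L": rng.uniform(0.5, 6.5, size=n),  # dissolved oxygen in ml/L
})
# Made-up relationship: temperature falls with depth, plus noise.
df["T_degC"] = 18 - 0.02 * df["Depthm"] + rng.normal(0, 1, size=n)

fig, (ax_hist, ax_bubble) = plt.subplots(1, 2, figsize=(11, 4))

# Histogram of one quantitative variable.
ax_hist.hist(df["T_degC"], bins=20, edgecolor="black")
ax_hist.set_title("Distribution of water temperature")
ax_hist.set_xlabel("Temperature (deg C)")
ax_hist.set_ylabel("Number of samples")

# Bubble chart: x position, y position and marker area encode
# three quantitative variables.
sizes = 20 * df["O2ml_L"]  # arbitrary scaling so the bubbles are visible
ax_bubble.scatter(df["Depthm"], df["T_degC"], s=sizes, alpha=0.5)
ax_bubble.set_title("Temperature vs depth (bubble size = oxygen)")
ax_bubble.set_xlabel("Depth (m)")
ax_bubble.set_ylabel("Temperature (deg C)")

fig.tight_layout()
```

Every axis carries a title and axis labels, which is what the visualization criterion in the rubric checks for. In a notebook the figure is displayed inline once the backend line is removed.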

Ensure you complete, zip and submit both the ‘Assignment- Part 1 -FirstNameLastName.docx’ and ‘Assignment- Part1-FirstNameLastName.ipynb’ files to LearnJCU. Ensure you add your FirstName and LastName inside the files and to the file names.

Assignment – Part 1 (15%) Marking Criteria (Rubric) – Total Raw Marks: 100 – 20 full marks for each task.

For each task (20 full marks), the following marking criteria are applied.

Performance levels: Exemplary (10-9), Good (8-7), Satisfactory (6-5), Limited (4-3), Very Limited (2-0)

For every criterion, the two intermediate levels are defined relative to their neighbours:

Good (8-7): Exhibits aspects of both Exemplary and Satisfactory.

Limited (4-3): Exhibits aspects of both Satisfactory and Very Limited.

Criterion: Formulate Investigative Question. 10% (2 marks)

Exemplary (10-9): The investigative question is well-formed so that the purpose of the investigation is clearly and reasonably implied.

Satisfactory (6-5): The investigative question is formed but is not fully clear or reasonable.

Very Limited (2-0): The investigative question is meaningless.

Criterion: Data Selection and Management (Data Preprocessing). 40% (8 marks)

Exemplary (10-9): Select appropriate and valid variables from the dataset and apply appropriate data management techniques that connect well to the investigative question you set and make the dataset fully ready for further processing (visualisation). The work required for this includes:

-      Determine categorical or quantitative variables appropriately.

-      Determine explanatory and response variables appropriately.

-      Apply appropriate subsampling.

-      Apply appropriate operations to transform the original data type into a different type (e.g. quantitative to categorical) if needed.

-      Apply appropriate operations for recoding labels or handling missing or invalid data.

Satisfactory (6-5): Appropriate/valid variables are selected and appropriate data management techniques are applied, but the result is not completely desirable or some necessary operations are missing.

Very Limited (2-0): Limited or no data management techniques are applied to the dataset provided.

Criterion: Data Visualizations using a chart. 20% (4 marks)

Exemplary (10-9): Use appropriate techniques to create the chart as required.

-      All the charts are easy to read.

-      The title, legends and labels are provided.

-      Python commands and arguments are used correctly.

Satisfactory (6-5): Techniques are applied to generate a relevant chart, but the result is not fully desirable and contains some errors. Minor omissions in the title or axis labels.

Very Limited (2-0): Visualization techniques are applied wrongly or poorly, OR the necessary title and axis labels are mostly missing.

Criterion: Interpretation of charts and conclusions. 20% (4 marks)

Exemplary (10-9): Provide appropriate and logical interpretation of the generated chart to elicit useful/correct conclusions so that the investigative question can be answered.

Satisfactory (6-5): An interpretation/conclusion is elicited but is not fully correct or does not fully answer the investigative question.

Very Limited (2-0): Limited or no interpretation of the generated chart.

Criterion: Notebook Presentation. 10% (2 marks)

Exemplary (10-9): All contents included in the Jupyter Notebook are readable and understandable, with appropriate section titles (using Markdown sections) and useful inline comments.

Satisfactory (6-5): Contents included in the Notebook are arranged properly but not ideally.

Very Limited (2-0): Notebook contents are poorly arranged or not readable.