CP2403/CP3413: Assignment – Part 1 – 15% Data Exploration, Management & Visualization
Due: End of Week 6 (Friday, 31 March 2023)
In Assignment Part 1, you will be required to apply appropriate data management and data visualization techniques to a given scenario in order to create charts. The techniques required to complete this assignment are covered in Modules 1 to 4 of the subject. You will have to explain what conclusions you draw from the charts.
Scenario
The California Cooperative Oceanic Fisheries Investigations (CalCOFI) was formed in 1949 to study the ecological aspects of the sardine population collapse off California. CalCOFI conducts quarterly cruises off southern and central California, collecting a suite of hydrographic and biological data on station and underway. The CalCOFI data set represents the longest (1949 to present) and most complete (more than 50,000 sampling stations) time series of oceanographic data in the world.
The physical, chemical, and biological data collected at regular time and space intervals quickly became valuable for documenting climatic cycles in the California Current and a range of biological responses to them. Data collected at depths down to 500 m include: temperature, salinity, oxygen, phosphate, silicate, nitrate and nitrite, chlorophyll, transmissometer, PAR and C14 primary productivity.
You are provided with the following:
1. bottle.csv
2. CalCOFI Database Tables Description - Bottle Table.pdf
The following website provides relevant information and documents about the CalCOFI project and data source.
https://calcofi.org/data/oceanographic-data/bottle-database/
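As a starting point for the data management step, the following is a minimal sketch of reading only the variables of interest and dropping rows with missing or non-numeric values. The helper name `load_bottle` is hypothetical, and the column names `Depthm`, `T_degC` and `Salnty` are assumptions that should be checked against the codebook. A tiny in-memory CSV with made-up values stands in for bottle.csv so the sketch runs on its own.

```python
import io

import pandas as pd


def load_bottle(source, columns):
    """Read the chosen columns and drop rows with missing or non-numeric values.

    `source` may be a file path (e.g. "bottle.csv") or a file-like object.
    """
    df = pd.read_csv(source, usecols=columns, low_memory=False)
    # Coerce to numeric so stray text entries become NaN instead of raising.
    df = df.apply(pd.to_numeric, errors="coerce")
    return df.dropna().reset_index(drop=True)


# Made-up stand-in for bottle.csv; the column names are assumed, not confirmed.
sample_csv = io.StringIO(
    "Depthm,T_degC,Salnty,Other\n"
    "0,10.5,33.44,x\n"
    "8,10.46,,y\n"
    "10,bad,33.42,z\n"
    "19,10.45,33.42,w\n"
)

cols = ["Depthm", "T_degC", "Salnty"]
clean = load_bottle(sample_csv, cols)
print(clean)
```

With the real file, the call would be `load_bottle("bottle.csv", cols)`. Dropping incomplete rows is only one option; whether it suits a given investigative question is a judgment call.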
Using the dataset and codebook provided, apply appropriate data management techniques. For this assessment, complete the following tasks.
1. Select one quantitative variable from the dataset to draw a histogram. What conclusion can you draw from the histogram?
2. Select one categorical variable (as the explanatory variable) and one quantitative variable (as the response variable) from the dataset to draw a box plot. What conclusion can you draw from the box plot?
Note: Instead of selecting an existing categorical variable, you can generate the categorical variable by transforming (grouping) the original quantitative variable into a categorical variable.
3. Select one quantitative variable from the dataset to draw a line chart. For this, you need to select one other variable as the corresponding explanatory variable. What conclusion can you draw from the line chart?
4. Select three quantitative variables from the dataset to draw a bubble chart. What conclusion can you draw from the bubble chart?
5. Go to the link below. Go through the different charts and the corresponding code provided there.
Top 50 matplotlib Visualizations – The Master Plots (with full python code)
https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/
Then select one chart/plot from the 50 available and appropriate variable(s) from the dataset provided (bottle.csv). Create the selected chart by using or modifying the corresponding code provided by the website for the variable data you selected from the dataset. What conclusion can you draw from the chart you have created?
(Note: you are required to select variable(s) that are different from those you selected for the previous tasks (1-4). If you still want to use a variable you have already selected for a previous task, you may do so, but you are required to manipulate the original data in a different way, e.g. different subsampling (filtering) or grouping.)
Hint: Refer to Modules 1, 2, 3, and 4 and Practicals 1, 2, 3, and 4 for help on data management and data visualization.
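The chart types named in tasks 1 to 4 can be sketched with pandas and matplotlib roughly as follows. This is an illustration only, not a model answer: the data are synthetic (not CalCOFI measurements), and the column names, bin edges and variable choices are all assumptions. It also shows one way to group a quantitative variable into a categorical one with `pd.cut`, as the note under task 2 allows.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (NOT real measurements); column names are assumed.
rng = np.random.default_rng(0)
n = 300
depth = rng.uniform(0, 500, n)
df = pd.DataFrame({
    "Depthm": depth,
    "T_degC": 18 - 0.02 * depth + rng.normal(0, 1, n),
    "Salnty": 33.2 + 0.002 * depth + rng.normal(0, 0.1, n),
})

# Group the quantitative depth into a categorical explanatory variable.
bins = [0, 100, 200, 300, 400, 500]
labels = ["0-100", "100-200", "200-300", "300-400", "400-500"]
df["DepthGroup"] = pd.cut(df["Depthm"], bins=bins, labels=labels,
                          include_lowest=True)

fig, axes = plt.subplots(2, 2, figsize=(11, 8))

# Histogram of one quantitative variable.
axes[0, 0].hist(df["T_degC"], bins=20, edgecolor="black")
axes[0, 0].set_title("Distribution of temperature")
axes[0, 0].set_xlabel("Temperature (deg C)")
axes[0, 0].set_ylabel("Number of samples")

# Box plot: categorical explanatory variable vs quantitative response.
groups = [df.loc[df["DepthGroup"] == lab, "T_degC"] for lab in labels]
axes[0, 1].boxplot(groups)
axes[0, 1].set_xticklabels(labels)
axes[0, 1].set_title("Temperature by depth group")
axes[0, 1].set_xlabel("Depth group (m)")
axes[0, 1].set_ylabel("Temperature (deg C)")

# Line chart: mean response for each level of the explanatory variable.
means = df.groupby("DepthGroup", observed=True)["T_degC"].mean()
axes[1, 0].plot(means.index.astype(str), means.values, marker="o")
axes[1, 0].set_title("Mean temperature by depth group")
axes[1, 0].set_xlabel("Depth group (m)")
axes[1, 0].set_ylabel("Mean temperature (deg C)")

# Bubble chart: x and y positions plus marker size encode three variables.
sizes = 10 + 190 * df["Depthm"] / df["Depthm"].max()
axes[1, 1].scatter(df["Salnty"], df["T_degC"], s=sizes, alpha=0.4)
axes[1, 1].set_title("Temperature vs salinity (bubble size = depth)")
axes[1, 1].set_xlabel("Salinity")
axes[1, 1].set_ylabel("Temperature (deg C)")

fig.tight_layout()
```

In a Jupyter Notebook the figure renders when the cell finishes; in a plain script, a call to `plt.show()` would be needed. Each panel carries a title and axis labels, which the marking criteria ask for.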
Ensure you complete, zip and submit both the ‘Assignment- Part 1 -FirstNameLastName.docx’ and ‘Assignment- Part1-FirstNameLastName.ipynb’ files to LearnJCU. Ensure you add your FirstName and LastName inside the files and to the file names.
Assignment – Part 1 (15%) Marking Criteria (Rubric) – Total Raw Marks: 100 – 20 full marks for each task.
For each task (20 full marks), the following marking criteria are applied.
| Criteria | Exemplary (10-9) | Good (8-7) | Satisfactory (6-5) | Limited (4-3) | Very Limited (2-0) |
| --- | --- | --- | --- | --- | --- |
| Formulate Investigative Question. 10% (2 marks) | The investigative question is well-formed so that the purpose of the investigation can be implied clearly and reasonably. | Exhibits aspects of exemplary (left) and satisfactory (right) | The investigative question is formed but not fully clear or reasonable. | Exhibits aspects of satisfactory (left) and very limited (right) | Investigative question is meaningless. |
| Data Selection and Management (Data Preprocessing). 40% (8 marks) | Select appropriate and valid variables from the dataset and apply appropriate data management techniques to connect well to the investigative question you set and to make the dataset fully ready for further processing (visualisation). Jobs required for this include: determine categorical or quantitative variables appropriately; determine explanatory and response variables appropriately; apply appropriate subsampling; apply appropriate operations to transform the original data type into a different type (e.g. quantitative to categorical) if needed; apply appropriate operations for recoding labels or handling missing or invalid data. | Exhibits aspects of exemplary (left) and satisfactory (right) | Select appropriate/valid variables and apply appropriate data management techniques, but the result is not fully satisfactory or some necessary operations are missing. | Exhibits aspects of satisfactory (left) and very limited (right) | Applied limited or no data management techniques to the dataset provided. |
| Data Visualizations using a chart. 20% (4 marks) | Use appropriate techniques to create the chart as required. All the charts are easy to read. The title, legends and labels are provided. Python commands and arguments are used correctly. | Exhibits aspects of exemplary (left) and satisfactory (right) | Techniques generate a relevant chart, but the result is not fully satisfactory and contains some errors. Minor omissions in title or axis labels. | Exhibits aspects of satisfactory (left) and very limited (right) | Visualization techniques are applied wrongly or poorly, OR necessary title and axis labels are mostly missing. |
| Interpretation of charts and conclusions. 20% (4 marks) | Provide appropriate and logical interpretation of the generated chart to elicit useful/correct conclusions so that the investigative question can be answered. | Exhibits aspects of exemplary (left) and satisfactory (right) | An interpretation/conclusion is given but is not fully correct or does not fully answer the investigative question. | Exhibits aspects of satisfactory (left) and very limited (right) | Limited or no interpretation of the generated chart. |
| Notebook Presentation. 10% (2 marks) | All contents included in the Jupyter Notebook are readable and understandable, with appropriate section titles (using Markdown sections) and useful inline comments. | | Contents included in the Notebook are arranged properly but not ideally. | | Notebook contents are poorly arranged or not readable. |
2023-07-19