PROJECT
STAGE 2: Development of Data-Driven (Artificial Neural Network) Salinity Model
Group Size: 1
1. Background
Please refer to the separate “Overview” document for the entire project.
2. Introduction
The stretch of the Williams River under consideration is subject to large inflows of saline groundwater at the start of Reach 12 (Figure 1). The resulting increased levels of salinity at the point where water is extracted for Williamsburg and the irrigation district adjacent to Williamsburg at the end of Reach 12 are the cause of economic losses for domestic, industrial and agricultural water users.
Your task is to develop an artificial neural network (ANN) model that enables the impact of different levels of the saline groundwater inflows on salinity levels at the point where water is extracted for Williamsburg and the surrounding irrigation district to be simulated. This will require an appropriate model structure (i.e. number of hidden nodes) and the unknown model parameters (i.e. ANN connection weights) to be determined. In addition, the calibrated model needs to be validated to ensure it can be used with confidence in practice.
Figure 1: Schematic plan view of stretch of Williams River of interest
3. Design Information
3.1 ANN Model Development
The ANN model architecture to be used is a multi-layer perceptron (MLP). Previous studies have found that the following variables have the strongest relationship with salinity at the end of Reach 12, which should therefore be used as inputs to the MLP ANN model:
• Salinity at the beginning of Reach 12 (just downstream of the saline groundwater inflows)
• Flow at the beginning of Reach 12
Consequently, the model has 2 inputs and 1 output (i.e. the salinity at the end of Reach 12). However, the optimal model structure (i.e. number of hidden nodes) has to be determined by trial-and-error. An Excel spreadsheet for developing the ANN model is provided on MyUni.
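For orientation, the calculations the template spreadsheet performs can be written compactly in code. The following Python sketch is illustrative only: it assumes sigmoid activations on the hidden and output layers and linear scaling of inputs and outputs to [0, 1], both of which should be confirmed against the MyUni spreadsheet.

```python
import numpy as np

def scale(x, x_min, x_max):
    """Linearly scale raw values to [0, 1] (assumed scaling; confirm against the spreadsheet)."""
    return (x - x_min) / (x_max - x_min)

def unscale(y, y_min, y_max):
    """Map a scaled model output back to the original salinity units."""
    return y_min + y * (y_max - y_min)

def mlp_forward(x_scaled, W_h, b_h, w_o, b_o):
    """Forward pass of a single-hidden-layer MLP with 2 inputs and 1 output.

    x_scaled : (2,) scaled inputs (salinity and flow at the start of Reach 12)
    W_h      : (H, 2) input-to-hidden weights;  b_h : (H,) hidden biases
    w_o      : (H,) hidden-to-output weights;   b_o : scalar output bias
    Sigmoid activations are an assumption; substitute whatever the template uses.
    """
    h = 1.0 / (1.0 + np.exp(-(W_h @ x_scaled + b_h)))   # hidden-node outputs
    return 1.0 / (1.0 + np.exp(-(w_o @ h + b_o)))       # scaled salinity prediction
```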
3.2 Available Data
The calibration and validation data are available on MyUni.
4. Assessment Tasks
TASK 1: CHECK NUMERICAL SPECIFICATION OF MLP ANN MODEL [10 Marks]
Task 1.1: Enter Calibration Data into MLPANN Model Spreadsheet [2 Marks]
Enter the relevant calibration data into the appropriate cells in the ANN Model Spreadsheet from the data spreadsheet provided on MyUni. Please note an ANN model with two hidden nodes is provided for this Task.
Task 1.2: Check Numerical Specification of MLP ANN Model (Equations) [8 Marks]
In order to ensure that the correct equations are implemented, independently check that all calculations for the MLP ANN model with two hidden nodes are correct for one of the calibration input-output data samples (please refer to the information provided on MyUni for which model inputs you should use for this purpose), including those for the scaling of the input and output data and the implementation of the MLP model itself. You will need to describe the process you have adopted for achieving this using the .docx form provided and show annotated check calculations in the submitted supporting material / spreadsheets.
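A scripted version of this check might look like the sketch below, which builds on the forward-pass sketch in Section 3.1. Every number here is a placeholder to be replaced with the values read from your own workbook (the designated sample, the scaling bounds and the two-hidden-node weights).

```python
# Placeholder values only -- read the real ones from the ANN Model Spreadsheet.
x_raw = np.array([850.0, 120.0])              # salinity and flow at start of Reach 12
x_min, x_max = np.array([400.0, 50.0]), np.array([1500.0, 400.0])
y_min, y_max = 300.0, 1200.0                  # scaling bounds for the output salinity

W_h = np.array([[0.8, -1.2],                  # 2 hidden nodes, as in the Task 1 template
                [0.3,  0.9]])
b_h = np.array([0.1, -0.4])
w_o = np.array([1.5, -0.7])
b_o = 0.2

y_scaled = mlp_forward(scale(x_raw, x_min, x_max), W_h, b_h, w_o, b_o)
y_check = unscale(y_scaled, y_min, y_max)

spreadsheet_value = 710.0                     # placeholder: the value the spreadsheet shows
print(f"independent check: {y_check:.4f}  vs  spreadsheet: {spreadsheet_value:.4f}")
# With the real inputs and weights, the two values should agree to rounding precision.
```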
TASK 2: CALIBRATE / DETERMINE OPTIMAL STRUCTURE OF MLP ANN MODEL [50 Marks]
Using the calibration data provided, determine appropriate values of the unknown parameters of the MLP ANN model by means of model calibration for models with different structures (i.e. different numbers of hidden nodes), thereby also determining which model structure is most appropriate using an iterative approach. MLPs with a maximum of 1 hidden layer should be used and the number of hidden nodes considered should include 0, 1 and 3. You have been provided with the model templates that you will need to use for this purpose (available on MyUni). Please ignore the templates for ANNs with 2 and 4 hidden nodes. Unless stated otherwise, please use the following settings in MS Solver for all calibration runs (using both the GRG and EA options):
[Figure: MS Solver settings dialogue boxes (GRG Nonlinear and Evolutionary options); see MyUni]
Task 2.1: Calibrate ANN Models with Different Numbers of Hidden Nodes (Gradient Method) [25 Marks]
Calibrate ANN models with 0, 1 and 3 hidden nodes using the gradient-based method in Solver (i.e. GRG Nonlinear) in order to determine the “best” model parameters (i.e. bias values and connection weights). Use the Root Mean Square Error (RMSE) as the objective function and ensure all decision variables are constrained to be >= -5 and <= 5. Repeat the calibration trials from 5 different starting positions in parameter space, using the starting positions provided in the Excel spreadsheet on MyUni. Consequently, the total number of calibration trials is 15 [5 starting positions x 3 different model structures (i.e. 0, 1, 3 hidden nodes)]. (An illustrative scripted analogue of this procedure is sketched after the dot points below.)
• Record the parameter and objective function values at each of the five starting positions, as well as at the end of each optimisation run.
• What is the shape of the error surface (e.g. smooth or many local optima, etc.) for each model? What is the impact of the number of hidden nodes on this? Please note that the shape of the error surface is the same for a model with the same structure for a particular data set and error function, but is likely to be different for models with different structures (i.e. different numbers of hidden nodes).
• How does the shape of the error surface relate to the ease of the calibration process? Why?
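Although the task itself is to be carried out in Excel, the multi-start gradient procedure can be sketched outside Solver. Below, scipy's bounded quasi-Newton optimiser (L-BFGS-B) stands in for Solver's GRG Nonlinear engine, reusing mlp_forward from the Section 3.1 sketch. The RMSE objective and the [-5, 5] bounds follow the task specification, while the data arrays and starting positions are placeholders for those in the MyUni spreadsheet; the parameter layout assumes at least one hidden node (the 0-hidden-node template reduces to a simpler parameterisation).

```python
import numpy as np
from scipy.optimize import minimize

def unpack(params, n_hidden):
    """Split a flat parameter vector into W_h, b_h, w_o, b_o (2 inputs, 1 output, H >= 1)."""
    H = n_hidden
    W_h = params[:2 * H].reshape(H, 2)
    b_h = params[2 * H:3 * H]
    w_o = params[3 * H:4 * H]
    b_o = params[4 * H]
    return W_h, b_h, w_o, b_o

def rmse(params, X_scaled, y_scaled, n_hidden):
    """Objective: RMSE = sqrt(mean((observed - predicted)^2)) over the calibration set."""
    preds = np.array([mlp_forward(x, *unpack(params, n_hidden)) for x in X_scaled])
    return float(np.sqrt(np.mean((y_scaled - preds) ** 2)))

n_hidden = 3
n_params = 4 * n_hidden + 1                  # connection weights plus biases
bounds = [(-5.0, 5.0)] * n_params            # decision-variable constraints per the brief

rng = np.random.default_rng(0)               # placeholders for the MyUni calibration data
X_scaled = rng.uniform(0, 1, size=(30, 2))
y_scaled = rng.uniform(0, 1, size=30)

results = []
for start in rng.uniform(-5, 5, size=(5, n_params)):   # placeholder starting positions
    res = minimize(rmse, start, args=(X_scaled, y_scaled, n_hidden),
                   method="L-BFGS-B", bounds=bounds)
    results.append((start, res.x, res.fun))  # record start, final parameters, final RMSE
```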
Task 2.2: Calibrate ANN Models with Different Numbers of Hidden Nodes (Evolutionary Method) [20 Marks]
Calibrate ANN models with 0, 1 and 3 hidden nodes using the evolutionary-based method in Solver in order to determine the “best” model parameters (i.e. bias values and connection weights). Use the Root Mean Square Error (RMSE) as the objective function and ensure all decision variables are constrained to be >= -5 and <= 5. Repeat the calibration trials from 5 different starting positions in parameter space, using the starting positions provided in the Excel spreadsheet on MyUni. Consequently, the total number of calibration trials is 15 [5 starting positions x 3 different model structures (i.e. 0, 1, 3 hidden nodes)]. (A corresponding sketch using an evolutionary optimiser is given after the dot points below.)
• Record the parameter and objective function values at each of the five starting positions, as well as at the end of each optimisation run.
• How do the two calibration methods considered (i.e. gradient (Task 2.1) and evolutionary (Task 2.2)) respond to search spaces with different properties? Please note that the error surface for a model with a given structure (i.e. number of hidden nodes) is the SAME, regardless of which optimisation method (i.e. GRG or EA) is used for calibration. Please also note that it is only possible to determine reliable information on the shape of the error surface using the GRG method. Consequently, you can assess how well the GRG and EA methods perform on the same error surface, the characteristics of which were determined in Task 2.1.
• What are the relative advantages and disadvantages of the two calibration methods considered?
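An analogous sketch for this task uses scipy's differential_evolution as a stand-in for Solver's Evolutionary engine, reusing the rmse objective, bounds, placeholder data and results list from the Task 2.1 sketch. Solver's EA is seeded by the current cell values; here each placeholder start is injected into the initial population via the init argument.

```python
from scipy.optimize import differential_evolution

for start in rng.uniform(-5, 5, size=(5, n_params)):   # placeholder starting positions
    pop = rng.uniform(-5, 5, size=(20, n_params))      # random initial population ...
    pop[0] = start                                     # ... seeded with this start
    res = differential_evolution(rmse, bounds,
                                 args=(X_scaled, y_scaled, n_hidden),
                                 init=pop, maxiter=500, tol=1e-8, seed=1)
    results.append((start, res.x, res.fun))            # record start, end, final RMSE
# Unlike the gradient runs, repeated EA runs are stochastic; fixing `seed=1`
# makes this sketch reproducible.
```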
Task 2.3: Select Model Structure and Calibrated Model Parameters [5 Marks]
• Based on the results from Tasks 2.1 and 2.2, which model structure and calibrated model parameters are you selecting?
• What criteria did you use to make these decisions? Please note that as the purpose of calibration is to obtain the set of model parameters that result in the best overall model performance, the method that was used to obtain these parameters (e.g. GRG or EA) is irrelevant.
TASK 3: VALIDATE THE CALIBRATED MLP ANN MODEL [30 Marks]
Task 3.1: Check Replicative Validity of Calibrated Model
Use the calibration data to replicatively validate the calibrated MLP ANN model from Task 2.
• Select the ANN model template for the model structure (i.e. number of hidden nodes) selected in Task 2.3 and enter your calibrated model parameters.
• Do the model residuals approximate white noise? Does the model behaviour make sense? (A simple residual check is sketched after this list.)
• Is the model replicatively valid? If the model is not replicatively valid, you might need to try other calibrated models from Task 2 or you might like to perform additional calibration trials in order to improve model performance, potentially considering:
o Changing some of the optimisation settings given in the dialogue boxes above (e.g. running the EA for longer, changing the mutation rate to achieve a greater degree of exploration, etc.).
o Starting the search from different starting positions than the 5 starting positions provided.
o Using a hybrid approach, as part of which the EA could be used to find a “good” region in the error surface and the GRG method could be used to find local optima in this region (e.g. use the parameter values at the end of the EA run as starting values of a GRG run). This cycle could potentially be repeated.
If applicable, discuss which alternative calibrated models you have used and why.
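As a minimal sketch of the white-noise check, continuing from the earlier sketches and assuming arrays y_obs and y_pred of observed and model-predicted salinities for the calibration set (assumed names), the residual mean and lag-1 autocorrelation can be computed as below; the 2/sqrt(n) band is the usual approximate 95% significance threshold for autocorrelations.

```python
residuals = y_obs - y_pred                    # calibration residuals (observed - modelled)

n = residuals.size
r = residuals - residuals.mean()
lag1 = (r[:-1] @ r[1:]) / (r @ r)             # lag-1 autocorrelation; ~0 for white noise

print(f"mean residual = {residuals.mean():.3f}, lag-1 autocorrelation = {lag1:.3f}")
print(f"approximate 95% band for white noise: +/- {2 / np.sqrt(n):.3f}")
# Residuals roughly approximate white noise if their mean is near zero, the lag-1
# autocorrelation falls inside the band, and a residual-vs-predicted plot shows no trend.
```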
Task 3.2: Check Predictive Validity of Calibrated Model
Use the validation data to predictively validate the calibrated MLP ANN model from Task 3.1.
• Enter the validation data into the spreadsheet provided (in the Task 1.1 tab).
• Select the ANN model template for the model structure (i.e. number of hidden nodes) selected in Task 3.1 and enter your calibrated model parameters. Please note that you only have to do this for the ANN with the number of hidden nodes you think performs best and not all ANNs.
• Is there any indication of the model being over-calibrated or that the way the data were split into calibration and validation subsets is inappropriate based on the similarities / differences in the error metrics for the validation and calibration data? Is this a reasonable result, given:
o The ratio of the number of model parameters to the number of calibration data points.
o The statistics of the calibration and validation data. (A comparison sketch is given after this list.)
• Is the model predictively valid? If the model is not predictively valid, you might need to try other calibrated models from Task 2 or you might like to perform additional calibration trials in order to improve model performance, potentially considering the same options listed under Task 3.1 (adjusted optimisation settings, different starting positions, or a hybrid EA/GRG approach).
If applicable, discuss which alternative calibrated models you have used and why.
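A simple way to put numbers to these questions is sketched below, assuming observed and predicted salinity arrays for both subsets (y_cal_obs, y_cal_pred, y_val_obs and y_val_pred are assumed names).

```python
def rmse_of(obs, pred):
    """Root mean square error between observed and predicted salinities."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def summarise(name, y):
    print(f"{name}: n={y.size}, mean={y.mean():.1f}, std={y.std(ddof=1):.1f}, "
          f"min={y.min():.1f}, max={y.max():.1f}")

print(f"RMSE calibration = {rmse_of(y_cal_obs, y_cal_pred):.1f}, "
      f"validation = {rmse_of(y_val_obs, y_val_pred):.1f}")
# A validation RMSE well above the calibration RMSE can indicate over-calibration,
# but only if the two subsets have similar statistics, hence the comparison below.
summarise("calibration", y_cal_obs)
summarise("validation ", y_val_obs)
```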
Task 3.3: Check Structural Validity of Calibrated Model
• Select the ANN model template for the model structure (i.e. number of hidden nodes) selected in Task 3.2 and enter your calibrated model parameters. Please note that you only have to do this for the ANN with the number of hidden nodes you think performs best and not all ANNs.
• Do the overall and relative connection weights make physical sense? (One way to quantify relative input importance is sketched after this list.)
• Is the model structurally valid? To what degree can you determine this? If the model is not structurally valid, you might need to try other calibrated models from Task 2 or you might like to perform additional calibration trials in order to improve model performance, potentially considering the same options listed under Task 3.1.
If applicable, discuss which alternative calibrated models you have used and why.
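One widely cited way to interrogate whether the connection weights make physical sense is Garson's algorithm, which apportions relative importance between the two inputs from the absolute magnitudes of the weights. The sketch below assumes the W_h and w_o arrays from the earlier forward-pass sketch; note that it ignores weight signs, so interpret it alongside the signs themselves.

```python
def garson_importance(W_h, w_o):
    """Garson's algorithm: relative importance of each input to the model output,
    based on absolute input-hidden (W_h, shape (H, 2)) and hidden-output (w_o,
    shape (H,)) connection weight magnitudes. Returns fractions summing to 1."""
    c = np.abs(W_h) * np.abs(w_o)[:, None]   # contribution of each input via each hidden node
    c /= c.sum(axis=1, keepdims=True)        # normalise within each hidden node
    imp = c.sum(axis=0)                      # total contribution per input
    return imp / imp.sum()

imp = garson_importance(W_h, w_o)
print(f"relative importance: salinity = {imp[0]:.2f}, flow = {imp[1]:.2f}")
# For this problem one would expect upstream salinity to carry substantial weight;
# a model that attributes almost everything to flow may warrant closer scrutiny.
```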
TASK 4: COMPARE STAGE 1 AND 2 MODEL DEVELOPMENT PROCESSES [10 Marks]
Reflect on the similarities and differences between the calibration and validation processes of the process-driven DO model (Stage 1) and the data-driven salinity model (Stage 2).
5. Submission
You are required to submit three files.
1. Written responses for each task are required to be submitted using the .docx form provided. The responses should include critical discussion of, and reflection on, the results presented in the supporting material. This form can be downloaded from MyUni. Each response is character limited, so you need to think carefully about which information to include and how to present your arguments clearly and succinctly. Include your student number in the filename: axxxxxxx_WrittenResponses.docx. The following character limits apply:
- Task 1: 2,000 characters (WITH SPACES) (10%)
- Task 2: 8,000 characters (WITH SPACES) (50%)
- Task 3: 4,500 characters (WITH SPACES) (30%)
- Task 4: 2,000 characters (WITH SPACES) (10%)
2. Supporting material, including any relevant figures, tables and references that provide evidence to back up the points made in the written responses, must be included in a single, clearly-labelled document saved as a pdf. Include your student number in the file name: axxxxxxx_SupportingMaterial.pdf
3. Supporting calculations/models in a Microsoft Excel file, with all links working so that the markers can run your spreadsheet. Include your student number in the filename: axxxxxxx_calculations.xls. If you require more than one Microsoft Excel file then you may place them in a zipped folder: axxxxxxx_calculations.zip.
The above files should be submitted electronically using the submission portal in MyUni (no hardcopies should be submitted). Please follow the submission instructions on MyUni carefully and ensure that you leave sufficient time to have all documents in the required format and to submit them electronically. This is your responsibility.
6. Assessment Rubric
The following rubric will be used to assess the written submission, supporting material and calculations / models for each task:
Written Responses and Supporting Material
• FAIL (<50%): Appropriate figures and tables in supporting information, but lack of description; OR incomplete / inappropriate supporting information.
• PASS (50-64%): Comprehensive, coherent presentation and description of all required findings, clearly linked to and backed up by supporting information.
• CREDIT (65-74%): As Pass PLUS: comprehensive, coherent explanation and critical discussion of findings, clearly linked to and backed up by supporting information.
• DISTINCTION (75-84%): As Credit PLUS: use of additional / higher order analysis to support critical discussion of appropriate results to demonstrate deeper understanding of and greater insight into the topic.
• HIGH DISTINCTION (85-100%): As Distinction PLUS: use of external sources to support critical discussion of appropriate results and to demonstrate the ability to consider results in the broader context of the discipline, beyond the scope of the current project.

Calculations
• FAIL (<50%): Major errors in or incomplete calculations.
• PASS (50-64%): Complete and correct calculations / models, but lacking formatting, comments and explanation.
• CREDIT (65-74%): Complete and correct calculations / models that are formatted well, but lack comments and explanation.
• DISTINCTION (75-84%): Complete and correct calculations / models that are formatted and commented well, but lack explanation.
• HIGH DISTINCTION (85-100%): Complete and correct calculations / models that are formatted, commented and explained well.
Exemplars of written responses, supporting information and calculations / models that correspond to HD and P grades are provided on MyUni for reference.
2023-10-16