
PROJECT

STAGE 2: Development of Data-Driven (Artificial Neural Network) Salinity Model

Group Size: 1

1. Background

Please refer to the separate “Overview” document for the entire project.

2. Introduction

The stretch of the Williams River under consideration is subject to large inflows of saline groundwater at the start of Reach 12 (Figure 1). The resulting increased levels of salinity at the point where water is extracted for Williamsburg and the irrigation district adjacent to Williamsburg at the end of Reach 12 are the cause of economic losses for domestic, industrial and agricultural water users.

Your task is to develop an artificial neural network (ANN) model that enables the impact of different levels of the saline groundwater inflows on salinity levels at the point where water is extracted for Williamsburg and the surrounding irrigation district to be simulated.  This will require an appropriate model structure (i.e. number of hidden nodes) and the unknown model parameters (i.e. ANN connection weights) to be determined. In addition, the calibrated model needs to be validated to ensure it can be used with confidence in practice.

 

Figure 1: Schematic plan view of stretch of Williams River of interest

3. Design Information

3.1 ANN Model Development

The ANN model architecture to be used is a multi-layer perceptron (MLP). Previous studies have found that the following variables have the strongest relationship with salinity at the end of Reach 12, which should therefore be used as inputs to the MLP ANN model:

•    Salinity at the beginning of Reach 12 (just downstream of the saline groundwater inflows)

•   Flow at the beginning of Reach 12

Consequently, the model has 2 inputs and 1 output (i.e. the salinity at the end of Reach 12). However, the optimal model structure (i.e. number of hidden nodes) has to be determined by trial-and-error. An Excel spreadsheet for developing the ANN model is provided on MyUni.

3.2 Available Data

The calibration and validation data are available on MyUni.

4. Assessment Tasks

TASK 1: CHECK NUMERICAL SPECIFICATION OF MLP ANN MODEL [10 Marks]

Task 1.1: Enter Calibration Data into MLP ANN Model Spreadsheet [2 Marks]

Enter the relevant calibration data from the data spreadsheet provided on MyUni into the appropriate cells in the ANN Model Spreadsheet. Please note that an ANN model with two hidden nodes is provided for this Task.

Task 1.2: Check Numerical Specification of MLP ANN Model (Equations) [8 Marks]

In order to ensure that the correct equations are implemented, independently check that all calculations for the MLP ANN model with two hidden nodes are correct for one of the calibration input-output data samples (please refer to the information provided on MyUni for which model inputs you should use for this purpose), including those for the scaling of the input and output data and the implementation of the MLP model itself. You will need to describe the process you have adopted for achieving this using the .docx form provided and show annotated check calculations in the submitted supporting material / spreadsheets.
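An independent check of this kind can also be cross-checked with a few lines of code. The sketch below is illustrative only: it assumes min-max scaling to [0, 1], a tanh activation in the hidden layer and a linear output node, none of which is confirmed by this brief. The spreadsheet on MyUni may use a different scaling range or activation function, so its equations should be substituted before comparing numbers. All function names, weights and data ranges are hypothetical.

```python
import math

def scale(x, x_min, x_max, lo=0.0, hi=1.0):
    """Linearly map x from [x_min, x_max] to [lo, hi] (assumed scaling)."""
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

def unscale(y, y_min, y_max, lo=0.0, hi=1.0):
    """Inverse of scale(): map y from [lo, hi] back to [y_min, y_max]."""
    return y_min + (y - lo) * (y_max - y_min) / (hi - lo)

def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-hidden-layer MLP: tanh hidden nodes, linear output (assumed)."""
    hidden = [
        math.tanh(bias + sum(w * x for w, x in zip(weights, inputs)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))

# Hypothetical sample and data ranges (not taken from the project data).
salinity_in = scale(600.0, 200.0, 1000.0)   # 0.5
flow_in = scale(3000.0, 1000.0, 5000.0)     # 0.5

# Hypothetical parameters for a model with two hidden nodes.
y_scaled = mlp_forward(
    [salinity_in, flow_in],
    hidden_weights=[[1.0, -1.0], [0.5, 0.5]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, 1.0],
    output_bias=0.0,
)
salinity_out = unscale(y_scaled, 200.0, 1000.0)
print(salinity_in, flow_in, y_scaled, salinity_out)
```

Comparing each intermediate value (scaled inputs, hidden node outputs, scaled output, unscaled output) with the corresponding spreadsheet cell localises any discrepancy to a single equation.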

TASK 2: CALIBRATE / DETERMINE OPTIMAL STRUCTURE OF MLP ANN MODEL [50 Marks]

Using the calibration data provided, determine appropriate values of the unknown parameters of the MLP ANN model by means of model calibration for models with different structures (i.e. different numbers of hidden nodes), thereby also determining which model structure is most appropriate using an iterative approach. MLPs with a maximum of 1 hidden layer should be used and the number of hidden nodes considered should include 0, 1 and 3. You have been provided with the model templates that you will need to use for this purpose (available on MyUni).

Please ignore the templates for ANNs with 2 and 4 hidden nodes. Unless stated otherwise, please use the following settings in MS Solver for all calibration runs (using both the GRG and EA options):

 


 

Task 2.1: Calibrate ANN Models with Different Numbers of Hidden Nodes (Gradient Method) [25 Marks]

Calibrate ANN models with 0, 1 and 3 hidden nodes using the gradient-based method in Solver (i.e. GRG Nonlinear) in order to determine the “best” model parameters (i.e. bias values and connection weights). Use the Root Mean Square Error (RMSE) as the objective function and ensure all decision variables are constrained to be >= -5 and <= 5. Repeat the calibration trials from 5 different starting positions in parameter space, using the starting positions provided in the Excel spreadsheet on MyUni. Consequently, the total number of calibration trials is 15 [5 starting positions x 3 different model structures (i.e. 0, 1, 3 hidden nodes)].

•    Record the parameter and objective function values at each of the five starting positions, as well as at the end of each optimisation run.

•    What is the shape of the error surface (e.g. smooth or many local optima, etc.) for each model? What is the impact of the number of hidden nodes on this? Please note that the shape of the error surface is the same for a model with the same structure for a particular data set and error function, but is likely to be different for models with different structures (i.e. different numbers of hidden nodes).

•    How does the shape of the error surface relate to the ease of the calibration process? Why?
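One way to organise the recorded results is sketched below. It computes the RMSE objective and counts how many clearly different final RMSE values the five runs of one model structure reached. If all starting positions end at the same value, that is consistent with (though not proof of) a smooth surface, whereas several distinct end points suggest multiple local optima. The function names, the tolerance and the numbers are made up for illustration and are not project results.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def distinct_optima(final_rmse_values, tol=1e-3):
    """Count groups of final RMSE values separated by more than tol."""
    groups = []
    for value in sorted(final_rmse_values):
        if not groups or value - groups[-1][-1] > tol:
            groups.append([value])      # start a new group
        else:
            groups[-1].append(value)    # same end point as previous value
    return len(groups)

# Made-up final RMSE values from five starting positions (illustration only).
runs_agree = [12.40, 12.40, 12.40, 12.40, 12.40]
runs_differ = [9.8, 11.3, 9.8, 14.1, 10.6]

print(distinct_optima(runs_agree))   # 1 group
print(distinct_optima(runs_differ))  # 4 groups
```

Equal RMSE values can still correspond to different parameter sets, so the final parameter values should be compared as well as the objective function values.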

Task 2.2: Calibrate ANN Models with Different Numbers of Hidden Nodes (Evolutionary Method) [20 Marks]

Calibrate ANN models with 0, 1 and 3 hidden nodes using the evolutionary-based method in Solver in order to determine the “best” model parameters (i.e. bias values and connection weights). Use the Root Mean Square Error (RMSE) as the objective function and ensure all decision variables are constrained to be >= -5 and <= 5. Repeat the calibration trials from 5 different starting positions in parameter space, using the starting positions provided in the Excel spreadsheet on MyUni. Consequently, the total number of calibration trials is 15 [5 starting positions x 3 different model structures (i.e. 0, 1, 3 hidden nodes)].

•    Record the parameter and objective function values at each of the five starting positions, as well as at the end of each optimisation run.

•    How do the two calibration methods considered (i.e. gradient (Task 2.1) and evolutionary (Task 2.2)) respond to search spaces with different properties? Please note that the error surface for a model with a given structure (i.e. number of hidden nodes) is the SAME, regardless of which optimisation method (i.e. GRG or EA) is used for calibration. Please also note that it is only possible to determine reliable information on the shape of the error surface using the GRG method. Consequently, you can assess how well the GRG and EA methods perform on the same error surface, the characteristics of which were determined in Task 2.1.

•    What are the relative advantages and disadvantages of the two calibration methods considered?
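To make the idea of an evolutionary search concrete, the sketch below implements a very simple elitist (1+λ) evolution strategy with the same bounds of -5 and 5 on every parameter. It is not the algorithm used by Solver, whose internals are not described in this brief. The toy error function, the function names and all settings are assumptions chosen only to show random mutation, selection of the best candidate and enforcement of the bounds.

```python
import math
import random

LOWER, UPPER = -5.0, 5.0  # bounds on every decision variable

def toy_error(params):
    """Made-up multimodal surface standing in for an ANN's RMSE."""
    return sum(p * p - 3.0 * math.cos(3.0 * p) + 3.0 for p in params)

def clip(value):
    """Force a parameter back inside [LOWER, UPPER]."""
    return max(LOWER, min(UPPER, value))

def evolutionary_search(objective, start, generations=200, offspring=20,
                        mutation_sd=0.5, seed=0):
    """Elitist (1+lambda) search: keep the parent unless a child is better."""
    rng = random.Random(seed)
    best = [clip(p) for p in start]
    best_value = objective(best)
    for _ in range(generations):
        children = [
            [clip(p + rng.gauss(0.0, mutation_sd)) for p in best]
            for _ in range(offspring)
        ]
        candidate = min(children, key=objective)
        candidate_value = objective(candidate)
        if candidate_value < best_value:
            best, best_value = candidate, candidate_value
    return best, best_value

start = [4.0, -4.0]
best_params, best_error = evolutionary_search(toy_error, start)
print(toy_error(start), best_params, best_error)
```

Because the mutations are random, such a search can jump out of a local optimum that would trap a gradient-based method, but it needs many more function evaluations and does not home in on the exact optimum as efficiently.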

Task 2.3: Select Model Structure and Calibrated Model Parameters                                                [5 Marks]

•    Based on the results from Tasks 2.1 and 2.2, which model structure and calibrated model parameters are you selecting?

•    What criteria did you use to make these decisions? Please note that, as the purpose of calibration is to obtain the set of model parameters that results in the best overall model performance, the method that was used to obtain these parameters (e.g. GRG or EA) is irrelevant.

TASK 3: VALIDATE THE CALIBRATED MLP ANN MODEL [30 Marks]

Task 3.1: Check Replicative Validity of Calibrated Model

Use the calibration data to replicatively validate the calibrated MLP ANN model from Task 2.

•    Select the ANN model template for the model structure (i.e. number of hidden nodes) selected in Task 2.3 and enter your calibrated model parameters.

•    Do the model residuals approximate white noise? Does the model behaviour make sense?

•    Is the model replicatively valid? If the model is not replicatively valid, you might need to try other calibrated models from Task 2 or you might like to perform additional calibration trials in order to improve model performance, potentially considering:

o Changing some of the optimisation settings given in the dialogue boxes above (e.g. running the EA for longer, changing the mutation rate to achieve a greater degree of exploration etc.).

o Starting the search from different starting positions than the 5 starting positions provided.

o Using a hybrid approach, as part of which the EA could be used to find a “good” region in the error surface and the GRG method could be used to find local optima in this region (e.g. use the parameter values at the end of the EA run as starting values of a GRG run). This cycle could potentially be repeated.

If applicable, discuss which alternative calibrated models you have used and why.
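One common way to judge whether residuals approximate white noise is to check that their sample autocorrelations at low lags stay within the approximate 95% bounds of ±1.96/√n. The sketch below assumes the residuals are ordered in time, which this brief does not state, and the function names and the choice of five lags are illustrative. A residual plot and a check that the mean residual is close to zero are useful complements.

```python
import math

def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    if denom == 0.0:
        raise ValueError("residuals are constant; autocorrelation undefined")
    num = sum(
        (residuals[t] - mean) * (residuals[t + lag] - mean)
        for t in range(n - lag)
    )
    return num / denom

def looks_like_white_noise(residuals, max_lag=5):
    """True if all autocorrelations up to max_lag lie within +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    return all(
        abs(autocorrelation(residuals, lag)) <= bound
        for lag in range(1, max_lag + 1)
    )

# Made-up residuals with an obvious trend: the model error grows steadily,
# so neighbouring residuals are strongly correlated.
trending = [float(i) for i in range(20)]
print(autocorrelation(trending, 1), looks_like_white_noise(trending))
```

A strong pattern such as the trend above indicates that the model is missing some systematic behaviour in the data, which would count against replicative validity.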

Task 3.2: Check Predictive Validity of Calibrated Model

Use the validation data to predictively validate the calibrated MLP ANN model from Task 3.1.

•    Enter the validation data into the spreadsheet provided (in the Task 1.1 tab).

•    Select the ANN model template for the model structure (i.e. number of hidden nodes) selected in Task 3.1 and enter your calibrated model parameters. Please note that you only have to do this for the ANN with the number of hidden nodes you think performs best and not all ANNs.

•    Based on the similarities / differences in the error metrics for the validation and calibration data, is there any indication that the model is over-calibrated or that the way the data were split into calibration and validation subsets is inappropriate? Is this a reasonable result, given:

o The ratio of the number of model parameters to the number of calibration data points.

o The statistics of the calibration and validation data.

•    Is the model predictively valid? If the model is not predictively valid, you might need to try other calibrated models from Task 2 or you might like to perform additional calibration trials in order to improve model performance, potentially considering:

o Changing some of the optimisation settings given in the dialogue boxes above (e.g. running the EA for longer, changing the mutation rate to achieve a greater degree of exploration etc.).

o Starting the search from different starting positions than the 5 starting positions provided.

o Using a hybrid approach, as part of which the EA could be used to find a “good” region in the error surface and the GRG method could be used to find local optima in this region (e.g. use the parameters values at the end of the EA run as starting values of a GRG run). This cycle could potentially be repeated.

If applicable, discuss which alternative calibrated models you have used and why.
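The ratio of model parameters to calibration data points mentioned above can be worked out directly from the model structure. The sketch below assumes every hidden node and the output node has one bias, and that a model with 0 hidden nodes connects the inputs directly to the output. The function names and the example number of data points are hypothetical.

```python
def parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Number of weights and biases in an MLP with at most one hidden layer."""
    if n_hidden == 0:
        # Inputs feed the output node directly: weights plus one bias.
        return (n_inputs + 1) * n_outputs
    hidden_params = (n_inputs + 1) * n_hidden    # weights + bias per hidden node
    output_params = (n_hidden + 1) * n_outputs   # weights + bias per output node
    return hidden_params + output_params

def parameter_ratio(n_inputs, n_hidden, n_calibration_points):
    """Parameters per calibration data point (smaller is safer)."""
    return parameter_count(n_inputs, n_hidden) / n_calibration_points

def relative_rmse_gap(rmse_calibration, rmse_validation):
    """Relative increase in RMSE from calibration to validation data."""
    return (rmse_validation - rmse_calibration) / rmse_calibration

for hidden in (0, 1, 3):
    # 100 calibration points is a placeholder, not the actual data set size.
    print(hidden, parameter_count(2, hidden), parameter_ratio(2, hidden, 100))
```

A large ratio combined with a validation RMSE that is much higher than the calibration RMSE is the typical signature of over-calibration, while a gap with a small ratio points more towards differences between the calibration and validation subsets.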

Task 3.3: Check Structural Validity of Calibrated Model

•    Select the ANN model template for the model structure (i.e. number of hidden nodes) selected in Task 3.2 and enter your calibrated model parameters. Please note that you only have to do this for the ANN with the number of hidden nodes you think performs best and not all ANNs.

•    Do the overall and relative connection weights make physical sense?

•    Is the model structurally valid? To what degree can you determine this? If the model is not structurally valid, you might need to try other calibrated models from Task 2 or you might like to perform additional calibration trials in order to improve model performance, potentially considering:

o Changing some of the optimisation settings given in the dialogue boxes above (e.g. running the EA for longer, changing the mutation rate to achieve a greater degree of exploration etc.).

o Starting the search from different starting positions than the 5 starting positions provided.

o Using a hybrid approach, as part of which the EA could be used to find a “good” region in the error surface and the GRG method could be used to find local optima in this region (e.g. use the parameter values at the end of the EA run as starting values of a GRG run). This cycle could potentially be repeated.

If applicable, discuss which alternative calibrated models you have used and why.
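The sketch below shows one possible reading of the connection weight check. It assumes that the overall connection weight of an input is the sum, over all hidden nodes, of the input-to-hidden weight multiplied by the hidden-to-output weight, and that the relative weight is that value divided by the sum of absolute overall weights. The template on MyUni may define these quantities differently, and the weights used here are hypothetical.

```python
def overall_connection_weights(input_hidden, hidden_output):
    """Overall weight per input.

    input_hidden[j][i] is the weight from input i to hidden node j;
    hidden_output[j] is the weight from hidden node j to the output.
    """
    n_inputs = len(input_hidden[0])
    n_hidden = len(hidden_output)
    return [
        sum(input_hidden[j][i] * hidden_output[j] for j in range(n_hidden))
        for i in range(n_inputs)
    ]

def relative_contributions(overall):
    """Signed share of each input in the sum of absolute overall weights."""
    total = sum(abs(w) for w in overall)
    if total == 0.0:
        raise ValueError("all overall connection weights are zero")
    return [w / total for w in overall]

# Hypothetical weights for a model with 2 inputs and 3 hidden nodes.
# Input order: [salinity at start of Reach 12, flow at start of Reach 12].
input_hidden = [[2.0, -1.0], [1.0, -0.5], [0.5, 0.0]]
hidden_output = [1.0, 2.0, -1.0]

overall = overall_connection_weights(input_hidden, hidden_output)
print(overall)                          # [3.5, -2.0]
print(relative_contributions(overall))
```

With a monotonically increasing activation function, the sign of an overall weight indicates whether the output tends to rise or fall as that input increases, which can then be compared with the expected physical behaviour of the river reach.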

TASK 4: COMPARE STAGE 1 AND 2 MODEL DEVELOPMENT PROCESSES                                  [10 Marks]

Reflect on the similarities and differences between the calibration and validation processes of the process-driven DO model (Stage 1) and the data-driven salinity model (Stage 2).

5. Submission

You are required to submit three files.

1.   Written responses for each task are required to be submitted using the .docx form provided. The responses should include critical discussion of and reflection on the results presented in the supporting material. This form can be downloaded from MyUni. Each response is character limited, so you need to think carefully about which information to include in your response and to present your arguments clearly and succinctly. Include your student number in the filename: axxxxxxx_WrittenResponses.docx. The following character limits apply:

-     Task 1: 2,000 characters (WITH SPACES) (10%)

-     Task 2: 8,000 characters (WITH SPACES) (50%)

-     Task 3: 4,500 characters (WITH SPACES) (30%)

-     Task 4: 2,000 characters (WITH SPACES) (10%)

2.   Supporting material, including any relevant figures, tables and references that provide evidence to back up the points made in the written responses, must be included in a single, clearly-labelled document saved as a PDF. Include your student number in the filename: axxxxxxx_SupportingMaterial.pdf

3.   Supporting calculations/models in a Microsoft Excel file, with all links working so that the markers can run your spreadsheet. Include your student number in the filename: axxxxxxx_calculations.xls. If you require more than one Microsoft Excel file then you may place them in a zipped folder: axxxxxxx_calculations.zip.

The above files should be submitted electronically using the submission portal in MyUni (no hardcopies should be submitted). Please follow the submission instructions on MyUni carefully and ensure that you leave sufficient time to enable you to have all documents in the required format for submission and to submit them electronically. Ensuring this is the case is your responsibility.

6. Assessment Rubric

The following rubric will be used to assess the written submission, supporting material and calculations / models for each task:

 

Written Responses and Supporting Material

-     FAIL (< 50%): Appropriate figures and tables in supporting information, but lack of description OR incomplete / inappropriate supporting information

-     PASS (50-64%): Comprehensive, coherent presentation and description of all required findings, clearly linked to and backed up by supporting information

-     CREDIT (65-74%): As Pass PLUS: Comprehensive, coherent explanation and critical discussion of findings, clearly linked to and backed up by supporting information

-     DISTINCTION (75-84%): As Credit PLUS: Use of additional / higher order analysis to support critical discussion of appropriate results to demonstrate deeper understanding of and greater insight into the topic

-     HIGH DISTINCTION (85-100%): As Distinction PLUS: Use of external sources to support critical discussion of appropriate results and to demonstrate ability to consider results in the broader context of the discipline, beyond the scope of the current project

Calculations

-     FAIL (< 50%): Major errors in or incomplete calculations

-     PASS (50-64%): Complete and correct calculations / models, but lack formatting, comments and explanation

-     CREDIT (65-74%): Complete and correct calculations / models that are formatted well, but lack comments and explanation

-     DISTINCTION (75-84%): Complete and correct calculations / models that are formatted and commented well, but lack explanation

-     HIGH DISTINCTION (85-100%): Complete and correct calculations / models that are formatted, commented and explained well

Exemplars of written responses, supporting information and calculations / models that correspond to HD and P grades are provided on MyUni for reference.

Exemplars of written responses, supporting information and calculations / models that correspond to HD and P grades are provided on MyUni for reference.