*3.2. The EDSS Algorithm*

For demonstration purposes, the implemented example algorithm was a simple recursive grid search, which used two types of inputs, as shown in Figure A2 in Appendix A: (1) *model\_run*, which specified the parameters needed for the algorithm to set the different run permutations; (2) *model\_analysis*, which specified the parameters for finding the best run. The JSON format was chosen for the input file over other formats, such as CSV or TXT, due to its flexibility.

Under the *model\_run* section, any number of model input files that need to be changed can be defined in the *input\_files* section. For each input file, the user needs to define the following: (1) *name*: the name of the input file; (2) *col\_name*: the name of the CSV column that needs to be changed (the parameter that is changed); (3) *min\_val*: the minimum value of the parameter that is being changed; (4) *max\_val*: the maximum value of the parameter that is being changed; (5) *steps*: the increment in the parameter value between the minimum and maximum definitions. The *steps* parameter can hold a list of values. If more than one value is defined, a recursive run is conducted, in which further rounds of model simulations use smaller intervals around the previous result that best matched the target. Although the user can define the smallest interval in the first run and obtain the same results faster, the recursive option was added to reduce cloud computing costs by reducing the overall number of permutations without significantly impacting the accuracy of the result.

In the example shown in Figure A2, the first pass of model execution runs nine different permutations, as there are (2 − 1) / 0.5 + 1 = (34 − 30) / 2 + 1 = 3 different parameter values to set in each of the 2 input files (3 × 3 = 9). In the second pass of model execution, the minimum and maximum of the range are set to the best-run parameter value of the previous round minus and plus half of the previous step value, respectively. In this case, there are another 0.5 / 0.05 + 1 = 2 / 0.2 + 1 = 11 different parameter values to set in each of the 2 input files.
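The permutation counts above can be reproduced with a minimal Python sketch. It is not the EDSS implementation: the helper names `grid_values` and `refined_range` are hypothetical, and the best-run values used for the second pass (1.5 and 32) are assumed purely for illustration.

```python
from itertools import product


def grid_values(min_val, max_val, step):
    """Return evenly spaced values from min_val to max_val, inclusive."""
    count = round((max_val - min_val) / step) + 1
    return [round(min_val + k * step, 10) for k in range(count)]


def refined_range(best_val, prev_step):
    """Next-pass range: best value minus/plus half of the previous step."""
    half = prev_step / 2
    return best_val - half, best_val + half


# First pass, using the ranges and steps quoted in the text.
first_a = grid_values(1, 2, 0.5)    # 3 values
first_b = grid_values(30, 34, 2)    # 3 values
first_pass = list(product(first_a, first_b))  # 3 x 3 = 9 permutations

# Second pass, ASSUMING the best first-pass run used 1.5 and 32.
lo_a, hi_a = refined_range(1.5, 0.5)
lo_b, hi_b = refined_range(32, 2)
second_a = grid_values(lo_a, hi_a, 0.05)  # 11 values
second_b = grid_values(lo_b, hi_b, 0.2)   # 11 values

print(len(first_pass), len(second_a), len(second_b))
```

Under these assumptions, the two passes together cover 9 + 11 × 11 = 130 permutations, whereas a single pass at the finest steps would cover ((2 − 1) / 0.05 + 1) × ((34 − 30) / 0.2 + 1) = 21 × 21 = 441.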

Under the *model\_analysis* section, the model output file that needs to be analyzed is defined in the *output\_file* field. Under the *parameters* section, any number of parameters can be defined. Each parameter has the following definitions: (1) *name*: the column name in the output CSV file; (2) *target*: the target value of the parameter that we want to reach; (3) *weight*: a relative value that sets the priority between the different defined parameters; (4) *score\_step*: the deviation from the target per unit score, defined in order to unify the units of different parameters. For example, in Figure A2, a deviation of 0.1 g/m<sup>3</sup> in total nitrogen (TN) is considered 1 score, while a deviation of 0.5 g/m<sup>3</sup> in dissolved oxygen (DO) is considered 1 score. As such, dividing the deviations by the corresponding *score\_step* makes it possible to sum the deviations of different parameters in score units, as shown in Equation (1). Equation (1) defines the score that each model simulation receives according to its distance from the target.

$$Score = \sum_{i=1}^{n} \left( \frac{|\text{target}_i - \text{actual value}_i| \,/\, \text{score step}_i}{\text{weight}_i} \right) \tag{1}$$

Since the score represents deviations from targets, we seek the model simulation that minimizes the score. For example, for the run in Figure A2, we seek the model simulation that minimizes the score defined in Equation (2).

$$Score = \left(\frac{|0.6 - \text{Simulated TN}| \,/\, 0.1}{4}\right) + \left(\frac{|11 - \text{Simulated DO}| \,/\, 0.6}{2}\right) \tag{2}$$
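As an illustration of how Equations (1) and (2) could be evaluated, the following Python sketch scores two runs and selects the one with the lowest score. It rests on several assumptions: the key names follow the field names described in the text, but the exact JSON nesting, the file name `output.csv`, the column names `TN` and `DO`, and the simulated values are hypothetical placeholders. The targets, weights, and score steps are the constants that appear in Equation (2).

```python
import json

# Hypothetical config fragment; the numbers are those of Equation (2),
# while the file name, column names, and nesting are assumptions.
CONFIG = json.loads("""
{
  "model_analysis": {
    "output_file": "output.csv",
    "parameters": [
      {"name": "TN", "target": 0.6, "weight": 4, "score_step": 0.1},
      {"name": "DO", "target": 11, "weight": 2, "score_step": 0.6}
    ]
  }
}
""")


def score(simulated, parameters):
    """Equation (1): sum of |target - actual| / score_step / weight."""
    return sum(
        abs(p["target"] - simulated[p["name"]]) / p["score_step"] / p["weight"]
        for p in parameters
    )


def best_run(runs, parameters):
    """Return the run (dict of simulated values) with the lowest score."""
    return min(runs, key=lambda run: score(run, parameters))


params = CONFIG["model_analysis"]["parameters"]

# Made-up simulated outputs for two runs, for illustration only.
runs = [{"TN": 0.8, "DO": 11.0}, {"TN": 0.6, "DO": 9.8}]

for run in runs:
    print(run, round(score(run, params), 3))
print("best:", best_run(runs, params))
```

With these made-up values, the first run deviates only in TN (0.2 / 0.1 / 4 = 0.5) and the second only in DO (1.2 / 0.6 / 2 = 1.0), so the first run is selected.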
