*2.3. Cloud Computing Resources Scaling*

When using cloud resources with this EDSS, scaling is performed at two levels: (1) Horizontal pod autoscaling (HPA), which determines how many Kubernetes pods (each running a Docker image) are brought up to receive requests for model execution. Each Docker image is configured to run a single model. The Helm "values.yaml" file has an "hpa" section with options to configure the minimum and maximum number of pods, as well as the CPU target utilization percentage. Once CPU usage crosses this threshold, a new pod is created; (2) The cloud provider's K8s cluster autoscaling. When the pods request more CPU than the currently provisioned nodes can supply, the cluster automatically brings up additional compute nodes to run the pods, within the limits defined in the cluster settings.
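The "hpa" section described above might look like the following sketch. The key names (`minReplicas`, `maxReplicas`, `targetCPUUtilizationPercentage`) follow common Helm/HPA chart conventions and are assumptions for illustration, not copied from the EDSS chart itself.

```yaml
# Hypothetical sketch of the "hpa" section in the Helm values.yaml.
# Key names follow common HPA chart conventions and are assumptions,
# not taken verbatim from the EDSS chart.
hpa:
  enabled: true
  minReplicas: 1                       # minimum number of model pods kept running
  maxReplicas: 10                      # upper bound on pods the HPA may create
  targetCPUUtilizationPercentage: 80   # a new pod is created once CPU crosses this
```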

Multiple factors affect scaling efficiency, starting with the chosen node type, which determines the CPU generation. The run time of a single model can differ by a factor of two between the newest and older CPU generations. As mentioned above, the configuration of the K8s cluster and the HPA also has a significant effect on optimizing the scaling, where optimized scaling is defined as running all the model permutations in parallel. However, optimized scaling requires keeping more cloud computing nodes ready for execution, which increases the idle-time cost, since the user pays for running nodes even when almost no CPU is utilized. Thus, optimized scaling is not necessarily a primary goal.
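The trade-off above can be put in rough numbers. The sketch below compares total billed node-minutes for a serial run on a single always-busy node versus a parallel run that keeps several nodes provisioned; the node counts, run times, and idle minutes are illustrative assumptions, not measurements from the EDSS deployment.

```python
# Rough cost sketch (all numbers hypothetical): billed node-minutes
# for serial vs. parallel execution, including idle provisioning time.

def billed_node_minutes(nodes: int, run_min: float, idle_min: float) -> float:
    """Total node-minutes billed: every node is paid for while it runs
    models and while it sits idle waiting for work."""
    return nodes * (run_min + idle_min)

# One node running ten models back-to-back, with essentially no idle time.
serial = billed_node_minutes(nodes=1, run_min=230, idle_min=0)

# Five nodes running ten models in parallel, each node also kept
# provisioned (idle) for 30 minutes around the run.
parallel = billed_node_minutes(nodes=5, run_min=26, idle_min=30)

print(serial, parallel)  # parallel finishes far sooner but bills more idle capacity
```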

Nonetheless, we want to benchmark optimal scaling performance for reference. The "Spokane river example", shipped with the CE-QUAL-W2 version 4.1 package, was used to demonstrate the scaling. We changed the simulation days from 200-205 to 200-300 to create a longer run and highlight the benefit of the scaling. The Google Cloud provider was chosen to deploy the EDSS, with e2-standard-2 machines (2 vCPUs and 8 GB memory) used for the computing nodes. To perform ten parallel simulations, both the HPA and the K8s cluster were configured so that all the needed computing nodes and pods were already up and running at the beginning of the execution: ten pods on five computing nodes. This run of ten model permutations took 26 min. With the same configuration but the HPA limited to a single pod (i.e., ten serial runs without parallelization), the run took 3 h and 50 min (23 min × 10 = 230 min).
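As a quick arithmetic check of the reported benchmark, the speedup from the serial run (230 min) to the ten-pod parallel run (26 min) works out to just under a factor of nine:

```python
# Speedup of the ten-pod parallel benchmark over the serial run,
# using the run times reported above.
serial_min = 23 * 10   # ten serial runs of ~23 min each = 230 min
parallel_min = 26      # ten simultaneous runs

speedup = serial_min / parallel_min
print(f"speedup: {speedup:.1f}x")  # ~8.8x, below the ideal 10x
```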
