Figure 1.
Structure of box plots: (A) A box plot depicting the distribution using five statistical measures: minimum, first quartile (Q1, 25th percentile), median (Q2, 50th percentile), third quartile (Q3, 75th percentile), and maximum. The whiskers extend to the smallest and largest data points within the lower and upper fences (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR). Data points beyond these limits are considered outliers and are marked with asterisks (*). The notch around the median represents the 95% confidence interval for the median; (B) A violin plot, a variant of the box plot that provides additional insights into the data distribution. It visualizes the probability density distribution along the y-axis. The central box denotes the interquartile range (IQR), and the white dot marks the median.
Note: Red-colored text highlights key statistical measures (minimum, maximum, quartiles, and median); adapted from [45] (p. 3).
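The quantities in panel (A) are straightforward to compute. The sketch below (NumPy, assuming the conventional 1.5 × IQR fence multiplier described in the caption) derives the five-number summary, whisker endpoints, and outliers for a sample:

```python
import numpy as np

def box_plot_stats(x):
    """Five-number summary plus Tukey fences, as drawn in a standard box plot."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # conventional 1.5 * IQR multiplier
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme data points still inside the fences.
    whiskers = (x[x >= lower_fence].min(), x[x <= upper_fence].max())
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {"min": x.min(), "q1": q1, "median": median, "q3": q3,
            "max": x.max(), "whiskers": whiskers, "outliers": outliers}

sample = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 30])
stats = box_plot_stats(sample)
```

With this sample the upper fence is 4 + 1.5 × 1.75 = 6.625, so 30 is flagged as an outlier while the upper whisker stops at 5.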
Figure 2.
Box plot of the minimum values and execution times obtained by various algorithms for the Rastrigin function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 3.
Box plot of the minimum values and execution times obtained by various algorithms for the Ackley function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 4.
Box plot of the minimum values and execution times obtained by various algorithms for the Goldstein–Price function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 5.
Box plot of the minimum values and execution times obtained by various algorithms for the Levy–n13 function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 6.
Box plot of the minimum values and execution times obtained by various algorithms for the Himmelblau function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 7.
Box plot of the minimum values and execution times obtained by various algorithms for the three-hump camel function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 8.
Box plot of the minimum values and execution times obtained by various algorithms for the Griewank function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 9.
Box plot of the minimum values and execution times obtained by various algorithms for the cross-in-tray function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 10.
Box plot of the minimum values and execution times obtained by various algorithms for the EggHolder function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 11.
Box plot of the minimum values and execution times obtained by various algorithms for the Michalewicz function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 12.
Box plot of the minimum values and execution times obtained by various algorithms for the Alpine function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 13.
Box plot of the minimum values and execution times obtained by various algorithms for the Schaffer N.2 function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 14.
Box plot of the minimum values and execution times obtained by various algorithms for the Schaffer N.4 function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 15.
Box plot of the minimum values and execution times obtained by various algorithms for the Easom function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 16.
Box plot of the minimum values and execution times obtained by various algorithms for the Shubert function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 17.
Box plot of the minimum values and execution times obtained by various algorithms for the Schwefel function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 18.
Box plot of the minimum values and execution times obtained by various algorithms for the Zettl function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 19.
Box plot of the minimum values and execution times obtained by various algorithms for the Leon function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 20.
Box plot of the minimum values and execution times obtained by various algorithms for the Drop-Wave function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 21.
Box plot of the minimum values and execution times obtained by various algorithms for the Langermann function. The left plot shows the best cost, while the right plot represents execution time. Different colors correspond to different algorithms, and diamond markers indicate outliers.
Figure 22.
Convergence patterns of mathematical functions: Group 1.
Figure 23.
Convergence patterns of mathematical functions: Group 2.
Figure 24.
Convergence patterns of mathematical functions: Group 3.
Figure 25.
Convergence patterns of mathematical functions: Group 4.
Figure 26.
Convergence patterns of mathematical functions: Group 5.
Figure 27.
Trend of testing error variance as a function of training error variance in neural network training tasks.
Table 1.
Benchmark functions for optimization: formulas, ranges, and minimum values.
Function Name | Formula | Domain Range | Minimum Value
---|---|---|---
Rastrigin Function | | | 0 |
Ackley Function | | | 0 |
Goldstein–Price Function | | | 3 |
Levy Function N.13 | | | 0 |
Himmelblau Function | | | 0 |
Three-Hump Camel Function | | | 0 |
Griewank Function | | | 0 |
Cross-in-Tray Function | | | −2.06261 |
Eggholder Function | | | −959.6407 |
Michalewicz Function | | | −1.8013 (for n = 10) |
Alpine Function | | | 0 |
Schaffer Function N.2 | | | 0 |
Schaffer Function N.4 | | | 0 |
Easom Function | | | −1 |
Shubert Function | | | −186.7309 |
Schwefel Function | | | 0 |
Zettl Function | | | −0.003791 |
Leon Function | | | 0 |
Drop-Wave Function | | | −1 |
Langermann Function | | | −4.155 |
Table 2.
Hyperparameter settings for all algorithms in the mathematical benchmark function experiments.
Algorithm | | | | | | |
---|---|---|---|---|---|---
GD | 1 | 0.01 | - | - | | - |
Adam | 1 | 0.01 | 0.9 | 0.999 | | |
FMA | 100 | 0.01 | 0.9 | 0.999 | | |
MAGD | 100 | 0.01 | 0.9 | 0.999 | | |
SBGD | 100 | 3.00 | - | - | | - |
Table 3.
Settings of the hyperparameters exclusive to MAGD in the mathematical benchmark function experiments.
Parameter | Value |
---|---
| 1 |
| 3 |
Table 4.
Settings of the hyperparameters exclusive to SBGD in the mathematical benchmark function experiments.
Parameter | Value |
---|---
| |
| |
| 0.2 |
| 0.9 |
Table 5.
Median of the absolute difference between the true global minima and the found minima across various optimization algorithms for standard mathematical benchmark functions over 10 different random seeds. Bold values indicate the best-performing algorithms.
Function Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Rastrigin | 1.3930 | 0.4975 | 0.4975 | 22.9834 | 22.9877 |
Ackley | 7.5376 | 7.5376 | 0.0001 | 19.3172 | 19.3171 |
Goldstein–Price | 0.0000 | 0.0000 | 0.0000 | 186.3059 | 113.5452 |
Levy-n13 | 0.0000 | 0.0000 | 0.0995 | 4.7850 | 4.8066 |
Himmelblau | 0.0001 | 0.0000 | 0.0000 | 0.0023 | 0.0000 |
Three-Hump Camel | 0.0597 | 0.0000 | 0.0017 | 0.2700 | 0.2091 |
Griewank | 1.6096 | 1.6096 | 1.6219 | 75.7318 | 75.7329 |
Cross-in-Tray | 2.0604 | 2.0604 | 2.0604 | 2.0608 | 2.0611 |
EggHolder | 87.6805 | 79.2623 | 322.1294 | 844.0750 | 873.2270 |
Michalewicz | 0.0001 | 0.0000 | 0.0001 | 0.8996 | 1.3997 |
Alpine | 0.0046 | 0.0001 | 0.0012 | 0.0127 | 0.0559 |
Schaffer 2 | 0.1183 | 0.1182 | 0.1169 | 0.4865 | 0.4989 |
Schaffer 4 | 0.1742 | 0.1743 | 0.1756 | 0.1939 | 0.2063 |
Easom | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Shubert | 6.3178 | 0.0015 | 0.0000 | 157.7766 | 157.7990 |
Schwefel | 81.5770 | 77.6650 | 193.4469 | 666.3456 | 692.4238 |
Zettl | 0.0000 | 0.0008 | 0.0470 | 0.0912 | 0.8460 |
Leon | 0.0004 | 0.0012 | 0.0000 | 0.0003 | 0.0353 |
Drop-Wave | 0.1026 | 0.1026 | 0.0065 | 0.8057 | 0.8054 |
Langermann | 0.2211 | 0.0258 | 0.0003 | 4.7501 | 4.7487 |
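The metric behind Table 5 is the median, taken over the 10 random seeds, of the absolute gap between the found minimum and the true one. A minimal sketch (the run values below are illustrative, not the paper's raw data):

```python
import numpy as np

def median_abs_gap(found_minima, true_minimum):
    """Median over random seeds of |f_found - f_true|."""
    found = np.asarray(found_minima, dtype=float)
    return float(np.median(np.abs(found - true_minimum)))

# Illustrative Rastrigin-style runs: some seeds reach 0, others stall in local minima
runs = [0.0, 0.0, 0.4975, 0.4975, 0.9950, 1.3930, 1.9899, 22.9834, 0.0, 0.4975]
gap = median_abs_gap(runs, 0.0)
```

Because the median is robust, one seed stranded at 22.98 barely moves the reported value.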
Table 6.
Hit rate (in %) of various optimization algorithms across mathematical benchmark functions. The hit rate indicates the percentage of seeds for which each algorithm successfully located the global minimum. Bold values represent the best performance.
Function Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Rastrigin | 10 | 50 | 50 | 0 | 0 |
Ackley | 0 | 0 | 100 | 0 | 0 |
Goldstein–Price | 100 | 100 | 100 | 30 | 0 |
Levy-n13 | 100 | 100 | 70 | 10 | 0 |
Himmelblau | 100 | 100 | 100 | 80 | 100 |
Three-Hump Camel | 80 | 100 | 50 | 10 | 30 |
Griewank | 0 | 0 | 0 | 0 | 0 |
Cross-in-Tray | 0 | 0 | 0 | 0 | 0 |
EggHolder | 0 | 0 | 0 | 0 | 0 |
Michalewicz | 100 | 100 | 100 | 10 | 0 |
Alpine | 30 | 100 | 70 | 0 | 0 |
Schaffer 2 | 0 | 0 | 0 | 0 | 0 |
Schaffer 4 | 0 | 0 | 0 | 0 | 0 |
Easom | 0 | 0 | 0 | 0 | 0 |
Shubert | 10 | 60 | 100 | 0 | 0 |
Schwefel | 10 | 10 | 0 | 0 | 0 |
Zettl | 100 | 80 | 60 | 20 | 0 |
Leon | 80 | 70 | 80 | 90 | 0 |
Drop-Wave | 10 | 10 | 90 | 0 | 0 |
Langermann | 90 | 90 | 100 | 0 | 0 |
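The hit rate in Table 6 is the share of seeds whose best value lands close enough to the known global minimum. The tolerance below is an assumption, since this excerpt does not state the paper's exact threshold:

```python
import numpy as np

def hit_rate(found_minima, true_minimum, tol=1e-3):
    """Percentage of seeds within `tol` of the true global minimum (tol is assumed)."""
    found = np.asarray(found_minima, dtype=float)
    return 100.0 * float(np.mean(np.abs(found - true_minimum) <= tol))

# 10 hypothetical seeds, 7 of which reach the Rastrigin optimum of 0
runs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9950, 1.9899, 0.9950]
rate = hit_rate(runs, 0.0)   # -> 70.0
```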
Table 7.
p-values comparing MAGD hit rates with other algorithms for mathematical benchmark functions. Symbols (↑/↓) denote whether the difference signifies outperforming (↑) or underperforming (↓) of MAGD, while the absence of a symbol indicates no significant difference.
Comparison | p-Value | Significant Difference | MAGD Performance |
---|---|---|---
MAGD vs. FMA | 0.602 | - | - |
MAGD vs. SBGD | 0.377 | - | - |
MAGD vs. Adam | 0.019 | Yes | ↑ |
MAGD vs. GD | 0.004 | Yes | ↑ |
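This excerpt does not name the statistical test behind these p-values. As an illustrative stand-in, the sketch below runs an exact two-sided sign test on the paired per-function hit rates from Table 6; it reproduces the qualitative conclusion (MAGD significantly outperforms GD) without claiming to match the paper's exact procedure:

```python
from math import comb

def sign_test_p(a, b):
    """Exact two-sided sign test on paired scores; ties are dropped."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Per-function hit rates for MAGD and GD, read from Table 6
magd = [10, 0, 100, 100, 100, 80, 0, 0, 0, 100, 30, 0, 0, 0, 10, 10, 100, 80, 10, 90]
gd   = [0, 0, 0, 0, 100, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p = sign_test_p(magd, gd)   # MAGD wins 12 pairs, loses 0 -> p ≈ 0.0005
```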
Table 8.
Median execution time (in seconds) across 10 random seeds for different algorithms on various mathematical benchmark functions. Bold values indicate the best performance.
Function Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Rastrigin | 0.3666 | 5.9666 | 0.9698 | 0.0614 | 0.5393 |
Ackley | 0.2912 | 4.7500 | 0.5302 | 0.0529 | 0.0253 |
Goldstein–Price | 1.2312 | 12.4433 | 1.4645 | 0.3007 | 1.1632 |
Levy-N13 | 0.8149 | 15.9472 | 1.1628 | 0.1855 | 0.1802 |
Himmelblau | 0.3184 | 4.0358 | 0.6075 | 0.1071 | 0.1223 |
Three-Hump Camel | 0.5151 | 5.7206 | 0.3478 | 0.1715 | 0.3120 |
Griewank | 0.7627 | 17.4497 | 0.3462 | 0.2798 | 0.7080 |
Cross-in-Tray | 0.1701 | 5.8178 | 0.8469 | 0.2595 | 0.0011 |
Egg Holder | 1.1273 | 49.7233 | 0.5644 | 0.5661 | 0.5376 |
Michalewicz | 0.1725 | 2.9696 | 0.2883 | 0.0313 | 0.0021 |
Alpine | 0.4941 | 5.2489 | 0.6447 | 0.2768 | 0.2611 |
Schaffer N2 | 0.8606 | 50.4603 | 0.3433 | 0.0251 | 0.5355 |
Schaffer N4 | 0.8669 | 50.1864 | 0.3567 | 0.0259 | 0.5462 |
Easom | 0.0405 | 0.0404 | 0.0627 | 0.0008 | 0.0012 |
Shubert | 0.9665 | 23.1907 | 1.9983 | 0.1549 | 1.2867 |
Schwefel | 0.4966 | 22.5983 | 0.2993 | 0.2717 | 0.2537 |
Zettl | 0.5216 | 6.2368 | 0.8068 | 0.4134 | 0.4000 |
Leon | 0.4899 | 7.5925 | 0.5555 | 0.3356 | 0.3564 |
Drop-Wave | 0.2668 | 3.2161 | 0.3625 | 0.0337 | 0.0283 |
Langermann | 1.7498 | 33.9237 | 3.0435 | 0.2352 | 0.2415 |
Table 9.
Statistical comparison of execution duration between algorithms for mathematical benchmark functions.
Statistic | MAGD | FMA | SBGD | Adam | GD
---|---|---|---|---|---
mean ± std | 0.63 ± 0.43 | 16.5 ± 16.74 | 0.76 ± 0.71 | 0.2 ± 0.19 | 0.39 ± 0.38 |
median | 0.516 | 7.516 | 0.534 | 0.204 | 0.293 |
95% HDI | | | | | |
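The 95% HDI (highest density interval) row can be estimated from the per-run timing samples as the narrowest interval containing 95% of them. A minimal sample-based sketch (this assumes the HDI is computed from empirical samples rather than a fitted posterior):

```python
import numpy as np

def hdi(samples, cred=0.95):
    """Narrowest interval containing a fraction `cred` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    m = int(np.ceil(cred * n))           # number of points the interval must cover
    widths = x[m - 1:] - x[: n - m + 1]  # width of every candidate interval
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + m - 1])

times = np.arange(100.0)          # uniform toy "execution times"
lo, hi = hdi(times, cred=0.90)    # -> (0.0, 89.0)
```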
Table 10.
p-values comparing MAGD execution time with other algorithms for mathematical benchmark functions. Symbols (↑/↓) denote whether the difference signifies outperforming (↑) or underperforming (↓) of MAGD, while the absence of a symbol indicates no significant difference.
Comparison | p-Value | Significant Difference | MAGD Performance |
---|---|---|---
MAGD vs. FMA | 0.0000 | Yes | ↑ |
MAGD vs. SBGD | 0.0245 | Yes | ↑ |
MAGD vs. Adam | 0.0000 | Yes | ↓ |
MAGD vs. GD | 0.0000 | Yes | ↓ |
Table 11.
Experimental datasets.
No. | Dataset Name | Features | Samples | Task |
---|---|---|---|---
1 | Iris | 4 | 150 | Classification |
2 | Wine | 13 | 178 | Classification |
3 | Breast-Cancer-Wisconsin | 30 | 569 | Classification |
4 | Digits | 64 | 1797 | Classification |
5 | Pima-Indians-Diabetes | 8 | 768 | Classification |
6 | Connectionist Bench | 8 | 208 | Classification |
7 | Glass Identification | 9 | 214 | Classification |
8 | Balance Scale | 4 | 625 | Classification |
9 | Banknote Authentication | 4 | 1372 | Classification |
10 | Adult | 14 | 48,842 | Classification |
11 | Diabetes | 10 | 442 | Regression |
12 | California Housing | 8 | 20,640 | Regression |
13 | Boston Housing | 13 | 506 | Regression |
14 | Auto MPG | 7 | 398 | Regression |
15 | Concrete Compressive Strength | 8 | 1030 | Regression |
16 | Energy Efficiency | 8 | 768 | Regression |
17 | Yacht Hydrodynamics | 7 | 308 | Regression |
18 | Forest Fires | 12 | 517 | Regression |
19 | Airfoil Self-Noise | 5 | 1503 | Regression |
20 | Concrete Slump Test | 7 | 103 | Regression |
Table 12.
Hyperparameter settings for neural network training across different algorithms. The symbol “-” indicates that the parameter is absent in the respective algorithm.
Algorithm | | | | | | | |
---|---|---|---|---|---|---|---
MAGD | 10 | 0.03 | 500 | | 0.9 | 0.999 | |
FMA | 10 | 0.03 | 500 | | 0.9 | 0.999 | |
SBGD | 10 | 3 | 500 | | - | - | - |
Adam | 1 | 0.03 | 500 | | 0.9 | 0.999 | |
GD | 1 | 0.03 | 500 | | - | - | - |
Table 13.
MAGD-exclusive hyperparameter settings for neural network training.
Table 14.
SBGD-exclusive hyperparameter settings for neural network training.
| | | |
---|---|---|---
0.9 | 0.2 | | |
Table 15.
Training error values obtained by MAGD, FMA, SBGD, Adam, and GD for each dataset. Bolded values indicate the best result. ‘*’ denotes the best result among the algorithms excluding FMA, as FMA is not always the best.
Dataset Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Iris | 0.2114 | 0.2084 | 0.5458 | 0.2084 * | 0.6506 |
Wine | 0.0872 | 0.0869 | 0.3065 | 0.0869 * | 0.3303 |
Breast Cancer Wisconsin | 0.0688 * | 0.0688 | 0.3058 | 0.0689 | 0.3588 |
Digits | 0.2574 * | 0.2574 | 1.5106 | 0.2599 | 1.7374 |
Pima Indians Diabetes | 0.4480 * | 0.4470 | 0.5293 | 0.4510 | 0.5441 |
Connectionist Bench | 0.0855 * | 0.0852 | 1.3007 | 0.0863 | 1.0639 |
Glass Identification | 0.6890 * | 0.6890 | 1.2473 | 0.7306 | 1.2543 |
Balance Scale | 0.3320 * | 0.3320 | 0.5087 | 0.3521 | 0.7724 |
Banknote Authentication | 0.1027 | 0.1021 | 0.3473 | 0.1023 * | 0.5598 |
Adult | 0.3333 * | 0.3333 | 2.5611 | 0.3333 * | 2.4446 |
Diabetes | 0.4137 * | 0.4022 | 0.6237 | 0.4194 | 0.5368 |
California Housing | 0.3351 | 0.3328 | 0.5682 | 0.3335 * | 0.4378 |
Boston Housing | 0.1267 | 0.1252 | 0.5145 | 0.1256 * | 0.3396 |
Auto MPG | 0.1332 * | 0.1324 | 0.2308 | 0.1337 | 0.2324 |
Concrete Compressive Strength | 0.1678 * | 0.1660 | 0.4276 | 0.1682 | 0.3844 |
Energy Efficiency | 0.0667 * | 0.0651 | 0.1787 | 0.0753 | 0.1316 |
Yacht Hydrodynamics | 0.0681 | 0.0596 | 0.4158 | 0.0672 * | 0.3596 |
Forest Fires | 0.4871 * | 0.4801 | 1.5411 | 0.5106 | 1.4490 |
Airfoil Self-Noise | 0.2983 * | 0.2983 | 0.6478 | 0.3406 | 0.4724 |
Concrete Slump Test | 0.0391 * | 0.0361 | 0.1948 | 0.0405 | 0.1354 |
Table 16.
Testing error values obtained by MAGD, FMA, SBGD, Adam, and GD for each dataset. Bolded values indicate the best result. ‘*’ denotes the best result among the algorithms excluding FMA, as FMA is not always the best.
Dataset Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Iris | 0.1756 | 0.1723 | 0.5853 | 0.1723 * | 0.6774 |
Wine | 0.0914 * | 0.0914 | 0.3137 | 0.0914 * | 0.281 |
Breast Cancer Wisconsin | 0.1443 | 0.1443 | 0.3590 | 0.1434 * | 0.3865 |
Digits | 0.3250 * | 0.3250 | 1.5970 | 0.3269 | 1.8109 |
Pima Indians Diabetes | 0.5381 | 0.5378 | 0.5264 * | 0.5340 | 0.5449 |
Connectionist Bench | 0.8415 | 0.8075 | 1.3797 | 0.7793 * | 1.4082 |
Glass Identification | 1.2559 | 1.1650 | 1.3674 | 1.1613 * | 1.4225 |
Balance Scale | 0.3393 * | 0.3393 | 0.4879 | 0.3603 | 0.7383 |
Banknote Authentication | 0.1013 | 0.1007 | 0.3460 | 0.1009 * | 0.5542 |
Adult | 0.3430 * | 0.3430 | 2.5628 | 0.3430 * | 2.4413 |
Diabetes | 0.5584 | 0.6359 | 0.5827 | 0.5825 | 0.4455 * |
California Housing | 0.3338 | 0.3303 | 0.5903 | 0.3314 * | 0.4498 |
Boston Housing | 0.1522 * | 0.1533 | 0.4468 | 0.1644 | 0.2982 |
Auto MPG | 0.1173 | 0.1133 | 0.1941 | 0.1165 * | 0.1926 |
Concrete Compressive Strength | 0.2063 * | 0.2070 | 0.4983 | 0.2086 | 0.4252 |
Energy Efficiency | 0.0670 * | 0.0661 | 0.1706 | 0.0728 | 0.1229 |
Yacht Hydrodynamics | 0.0705 | 0.0613 | 0.3533 | 0.0695 * | 0.3237 |
Forest Fires | 3.8441 | 3.2475 | 0.5428 | 2.9355 | 0.4613 * |
Airfoil Self-Noise | 0.2906 * | 0.2906 | 0.6447 | 0.3325 | 0.4701 |
Concrete Slump Test | 0.0973 | 0.0756 | 0.3574 | 0.0829 * | 0.2542 |
Table 17.
Accuracy scores of different algorithms across various classification datasets. Bold values highlight the highest accuracy achieved on each dataset. (*) indicates the maximum accuracy among all algorithms except FMA.
Dataset Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Iris | 100% * | 100% | 80% | 100% * | 78% |
Wine | 98% * | 98% | 94% | 98% * | 98% * |
Breast-Cancer-Wisconsin | 96% * | 96% | 94% | 96% * | 96% * |
Digits | 97% * | 97% | 82% | 97% * | 77% |
Pima-Indians-Diabetes | 79% * | 79% | 77% | 76% | 77% |
Connectionist Bench | 78% * | 76% | 75% | 76% | 76% |
Glass Identification | 62% | 57% | 48% | 55% | 45% |
Balance Scale | 90% * | 90% | 90% * | 87% | 80% |
Banknote Authentication | 99% * | 98% | 90% | 98% | 85% |
Adult | 85% * | 85% | 79% | 85% * | 80% |
Table 18.
R² scores of different algorithms across various regression datasets. Bold values highlight the highest score achieved on each dataset. (*) indicates the maximum score among all algorithms except FMA.
Dataset Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Diabetes | 0.41 | 0.32 | 0.33 | 0.38 | 0.50 * |
California Housing | 0.71 * | 0.71 | 0.45 | 0.71 * | 0.58 |
Boston Housing | 0.89 * | 0.89 | 0.56 | 0.88 | 0.72 |
Auto MPG | 0.89 * | 0.90 | 0.79 | 0.89 * | 0.79 |
Concrete Strength | 0.85 * | 0.85 | 0.56 | 0.85 * | 0.63 |
Energy Efficiency | 0.96 * | 0.97 | 0.84 | 0.95 | 0.89 |
Yacht Hydrodynamics | 0.98 * | 0.98 | 0.58 | 0.98 * | 0.62 |
Forest Fires | −23.66 | −19.46 | −1.58 | −17.24 | −0.98 * |
Airfoil Self-Noise | 0.79 * | 0.79 | 0.37 | 0.72 | 0.55 |
Concrete Slump Test | 0.95 | 0.96 | 0.74 | 0.96 * | 0.82 |
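The scores in Table 18 are most plausibly coefficients of determination (R²), given that values can fall below zero, as for Forest Fires, when a model predicts worse than a constant equal to the target mean. A minimal sketch:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Negative when predictions are worse than always predicting the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = float(np.sum((y_true - y_pred) ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return 1.0 - ss_res / ss_tot

perfect = r2_score([1, 2, 3, 4], [1, 2, 3, 4])    # 1.0
baseline = r2_score([1, 2, 3, 4], [2.5] * 4)      # 0.0 (mean predictor)
inverted = r2_score([1, 2, 3, 4], [4, 3, 2, 1])   # -3.0
```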
Table 19.
Comparison of MAGD and Adam in terms of the superiority ratio in classification tasks.
Comparison | |
---|---
MAGD ≻ Adam | |
Table 20.
Comparison of MAGD, Adam, and GD in terms of the superiority ratio in regression tasks.
Comparison | |
---|---
MAGD ≻ Adam | |
MAGD ≻ GD | |
Table 21.
Performance comparison of execution time (in seconds) between MAGD and other methods for neural network training. The shortest execution time for each dataset is bolded.
Dataset Name | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---
Iris | 2.0073 | 7.4819 | 0.1353 | 0.7797 | 0.7647 |
Wine | 1.8196 | 7.8309 | 0.1648 | 0.8613 | 0.8372 |
Breast Cancer-Wisconsin | 2.2510 | 10.2646 | 0.4303 | 1.1015 | 1.0794 |
Digits | 11.5828 | 53.5168 | 3.0308 | 6.4266 | 6.6754 |
Pima-Indians-Diabetes | 1.9711 | 8.9067 | 0.4297 | 1.0061 | 1.0352 |
Connectionist Bench | 2.6584 | 16.9505 | 0.4844 | 2.5504 | 2.4633 |
Glass Identification | 1.3941 | 7.9909 | 0.2585 | 0.9120 | 0.9146 |
Balance Scale | 1.7598 | 8.3114 | 0.1932 | 0.9340 | 0.8763 |
Banknote Authentication | 1.8752 | 9.4750 | 0.5279 | 1.0181 | 0.9934 |
Adult | 160.4189 | 802.5116 | 40.1365 | 89.4274 | 88.4908 |
Diabetes | 8.0211 | 7.0622 | 0.1925 | 0.8138 | 0.7878 |
California Housing | 1.9058 | 57.6776 | 0.24739 | 7.1823 | 8.1433 |
Boston Housing | 1.7830 | 7.3301 | 0.2278 | 0.8431 | 0.8244 |
Auto MPG | 1.2297 | 6.7451 | 0.1115 | 0.7248 | 0.7024 |
Concrete Compressive Strength | 1.7650 | 7.5510 | 0.1668 | 0.8032 | 0.7982 |
Energy Efficiency | 2.2301 | 7.2413 | 0.1361 | 0.7633 | 0.7519 |
Yacht Hydrodynamics | 1.0880 | 6.6225 | 0.1232 | 0.7054 | 0.6873 |
Forest Fires | 2.1748 | 8.4036 | 0.2031 | 0.9233 | 0.8603 |
Airfoil Self-Noise | 2.0718 | 7.6019 | 0.1645 | 0.8126 | 0.8074 |
Concrete Slump Test | 1.6497 | 6.4361 | 0.1177 | 0.6745 | 0.6580 |
Table 22.
Statistical comparison of iterations required for convergence for different algorithms in neural network training.
Algorithm | Mean ± Std. Dev. |
---|---|
MAGD | |
FMA | |
SBGD | |
Adam | |
GD | |
Table 23.
Statistical comparison of the execution durations of different algorithms for neural network training.
Comparison | MAGD | FMA | SBGD | Adam | GD |
---|---|---|---|---|---|
mean ± std | 10.6 ± 34.5 | 52.8 ± 172.6 | 2.5 ± 8.7 | 6.0 ± 19.2 | 6.0 ± 19.0 |
median | 1.9 | 7.9 | 0.2 | 0.9 | 0.8 |
90% HDI | | | | | |
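The summary statistics in Table 23 can be reproduced from the per-dataset times in Table 21. The sketch below, for the MAGD column, assumes the reported standard deviation is the population (ddof = 0) variant; the 90% HDI is normally derived from a posterior distribution, which is not given here, so a crude empirical shortest-interval over the observed times stands in for it.

```python
import math
import statistics

# MAGD per-dataset execution times (seconds) from Table 21.
times = [2.0073, 1.8196, 2.2510, 11.5828, 1.9711, 2.6584, 1.3941,
         1.7598, 1.8752, 160.4189, 8.0211, 1.9058, 1.7830, 1.2297,
         1.7650, 2.2301, 1.0880, 2.1748, 2.0718, 1.6497]

mean_t = statistics.mean(times)    # ~10.6, matching Table 23
std_t = statistics.pstdev(times)   # ~34.5 (population std, ddof = 0)
med_t = statistics.median(times)   # ~1.9

def shortest_interval(data, mass=0.9):
    """Narrowest window of consecutive sorted points covering `mass` of
    the sample -- a crude empirical stand-in for a 90% HDI."""
    xs = sorted(data)
    k = math.ceil(mass * len(xs))
    return min(((xs[i], xs[i + k - 1]) for i in range(len(xs) - k + 1)),
               key=lambda w: w[1] - w[0])

lo, hi = shortest_interval(times)
```

The large gap between the mean (~10.6 s) and the median (~1.9 s) reflects the single outlier dataset (Adult), which also inflates the standard deviation.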
Table 24.
p-values comparing MAGD execution time with other algorithms for neural network training before removing the outlier dataset. Arrows indicate whether MAGD significantly outperforms (↑) or underperforms (↓) the other algorithm; the absence of a symbol indicates no significant difference.
Comparison | p-Value | Significant Difference | MAGD Performance |
---|---|---|---|
MAGD vs. FMA | 0.31 | No | - |
MAGD vs. SBGD | 0.33 | No | - |
MAGD vs. Adam | 0.6 | No | - |
MAGD vs. GD | 0.6 | No | - |
Table 25.
p-values comparing MAGD execution time with other algorithms for neural network training after removing the outlier dataset. Arrows indicate whether MAGD significantly outperforms (↑) or underperforms (↓) the other algorithm; the absence of a symbol indicates no significant difference.
Comparison | p-Value | Significant Difference | MAGD Performance |
---|---|---|---|
MAGD vs. FMA | 0.007 | Yes | ↑ |
MAGD vs. SBGD | 0.002 | Yes | ↓ |
MAGD vs. Adam | 0.141 | No | - |
MAGD vs. GD | 0.172 | No | - |
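The excerpt does not name the significance test behind Tables 24 and 25. A common choice for paired timing comparisons is the Wilcoxon signed-rank test; the sketch below implements its normal approximation in pure Python and applies it to the MAGD and FMA times from Table 21 with the outlier dataset (Adult) removed. The resulting p-value need not match the 0.007 reported in Table 25, since the paper's exact test and any corrections are unknown.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.
    Assumes distinct, nonzero |differences| (no tie correction)."""
    diffs = [y - x for x, y in zip(a, b) if y != x]
    ranked = sorted(diffs, key=abs)            # rank i+1 for i-th smallest |d|
    w_plus = sum(i + 1 for i, d in enumerate(ranked) if d > 0)
    w_minus = sum(i + 1 for i, d in enumerate(ranked) if d < 0)
    n = len(diffs)
    w = min(w_plus, w_minus)                   # test statistic
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma                       # z <= 0 by construction
    return 1 + math.erf(z / math.sqrt(2))      # = 2 * Phi(z), two-sided p

# Execution times from Table 21 with the Adult row removed.
magd = [2.0073, 1.8196, 2.2510, 11.5828, 1.9711, 2.6584, 1.3941, 1.7598,
        1.8752, 8.0211, 1.9058, 1.7830, 1.2297, 1.7650, 2.2301, 1.0880,
        2.1748, 2.0718, 1.6497]
fma = [7.4819, 7.8309, 10.2646, 53.5168, 8.9067, 16.9505, 7.9909, 8.3114,
       9.4750, 7.0622, 57.6776, 7.3301, 6.7451, 7.5510, 7.2413, 6.6225,
       8.4036, 7.6019, 6.4361]

p = wilcoxon_signed_rank(magd, fma)
```

MAGD is faster than FMA on every dataset except Diabetes, so the negative-rank sum is 1 and the approximation yields a very small p, consistent with the significant difference reported in Table 25.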
Table 26.
Execution time comparison among MAGD and Adam variants for neural network training across datasets. Bold values indicate the shortest time for each dataset.
Dataset Name | MAGD-500 | Adam-500 | Adam-750 | Adam-1000 |
---|---|---|---|---|
Iris | 2.01 | 0.78 | 1.19 | 2.05 |
Wine | 1.82 | 0.86 | 1.24 | 2.16 |
Breast-Cancer-Wisconsin | 2.25 | 1.10 | 1.71 | 3.00 |
Digits | 11.58 | 6.43 | 11.31 | 21.73 |
Pima-Indians-Diabetes | 1.97 | 1.01 | 2.03 | 3.00 |
Connectionist Bench | 2.66 | 2.55 | 3.86 | 9.38 |
Glass Identification | 1.39 | 0.91 | 1.34 | 3.41 |
Balance Scale | 1.76 | 0.93 | 1.38 | 2.96 |
Banknote Authentication | 1.88 | 1.02 | 1.53 | 3.37 |
Adult | 160.42 | 89.43 | 166.80 | 206.56 |
Diabetes | 1.95 | 0.81 | 1.22 | 1.64 |
California Housing | 8.02 | 7.18 | 12.17 | 15.99 |
Boston Housing | 1.91 | 0.84 | 1.27 | 1.78 |
Auto MPG | 1.78 | 0.72 | 1.11 | 1.53 |
Concrete Compressive Strength | 1.23 | 0.80 | 1.27 | 1.71 |
Energy Efficiency | 1.77 | 0.76 | 1.25 | 1.64 |
Yacht Hydrodynamics | 1.09 | 0.70 | 1.10 | 1.49 |
Forest Fires | 2.17 | 0.92 | 1.51 | 1.93 |
Airfoil Self-Noise | 2.07 | 0.81 | 1.27 | 1.66 |
Concrete Slump Test | 1.65 | 0.67 | 1.05 | 1.49 |
Table 27.
Mean and standard deviation of execution times for MAGD and Adam with varying maximum iterations for neural network training.
Algorithm | Convergence Epochs | Execution Time (Mean ± std) |
---|---|---|
MAGD-500 | | |
Adam-500 | | |
Adam-750 | | |
Adam-1000 | | |
Table 28.
Accuracy scores of MAGD with 500 maximum iterations compared to Adam with 500, 750, and 1000 maximum iterations across various classification datasets. Bold values indicate the highest accuracy achieved for each dataset.
Dataset Name | MAGD-500 | Adam-500 | Adam-750 | Adam-1000 |
---|---|---|---|---|
Iris | 100% | 100% | 100% | 100% |
Wine | 98% | 98% | 98% | 98% |
Breast-Cancer-Wisconsin | 96% | 96% | 96% | 96% |
Digits | 97% | 97% | 97% | 97% |
Pima-Indians-Diabetes | 79% | 76% | 78% | 76% |
Connectionist Bench | 78% | 76% | 76% | 76% |
Glass Identification | 62% | 55% | 54% | 58% |
Balance Scale | 90% | 87% | 88% | 88% |
Banknote Authentication | 99% | 98% | 98% | 98% |
Adult | 85% | 85% | 85% | 85% |
Table 29.
R² scores of MAGD with 500 maximum iterations compared to Adam with 500, 750, and 1000 maximum iterations across various regression datasets. Bold values indicate the highest R² score achieved for each dataset.
Dataset Name | MAGD-500 | Adam-500 | Adam-750 | Adam-1000 |
---|---|---|---|---|
Diabetes | 0.41 | 0.38 | 0.36 | 0.36 |
California Housing | 0.71 | 0.71 | 0.71 | 0.71 |
Boston Housing | 0.89 | 0.88 | 0.88 | 0.88 |
Auto MPG | 0.89 | 0.89 | 0.90 | 0.90 |
Concrete Strength | 0.85 | 0.85 | 0.85 | 0.85 |
Energy Efficiency | 0.96 | 0.95 | 0.95 | 0.96 |
Yacht Hydrodynamics | 0.98 | 0.98 | 0.98 | 0.98 |
Forest Fires | −23.66 | −17.24 | −19.25 | −19.81 |
Airfoil Self-Noise | 0.79 | 0.72 | 0.72 | 0.72 |
Concrete Slump Test | 0.95 | 0.96 | 0.97 | 0.97 |
Table 30.
Comparison of MAGD with fixed maximum iterations and Adam with varying maximum iterations, using the average time penalty and the performance superiority metrics, in classification tasks.
Comparison | | |
---|---|---|
MAGD-500 vs. Adam-500 | | |
MAGD-500 vs. Adam-750 | | |
MAGD-500 vs. Adam-1000 | | |
Table 31.
Comparison of MAGD with fixed maximum iterations and Adam with varying maximum iterations, using the average time penalty and the performance superiority metrics, in regression tasks.
Comparison | | |
---|---|---|
MAGD-500 vs. Adam-500 | | |
MAGD-500 vs. Adam-750 | | |
MAGD-500 vs. Adam-1000 | | |