#### *2.4. Target Preparation*

Different quantities of DNA with a fixed length of 300 bp were tested in the experiment [5]. The output of our model is the DNA amount per bead for the tested beads, ranging from the bare bead to beads bonded with the least- through most-concentrated DNA; these amounts are exponentially distributed. There are a total of 7 outputs, shown in Table 1. When data are distributed exponentially, applying a log function is a common way to normalize them [42]. We therefore normalized the output features with a logarithmic transformation, followed by the standard-score and min–max normalization described in Section 2.3.
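The normalization pipeline above can be sketched as follows. This is a minimal illustration, not the paper's code; the DNA amounts used here are hypothetical, exponentially spaced values:

```python
import numpy as np

def normalize_targets(dna_per_bead):
    """Log-transform exponentially distributed targets, then apply
    the standard score (z-score) followed by min-max scaling to [0, 1]."""
    x = np.log10(np.asarray(dna_per_bead, dtype=float))
    z = (x - x.mean()) / x.std()                 # standard score
    return (z - z.min()) / (z.max() - z.min())   # min-max normalization

# Hypothetical exponentially spaced DNA amounts (arbitrary units)
scaled = normalize_targets([1e2, 1e3, 1e4, 1e5, 1e6, 1e7])
```

After the log transform the targets are evenly spaced, and the two subsequent normalizations map them monotonically onto [0, 1]. (A bare bead with zero DNA would need special handling before the log step; how the paper encodes it is not specified here.)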

#### *2.5. Model Training*

After feature extraction and data preprocessing, 10 deep learning models with different architectures were implemented to evaluate the performance of the approach. We employ classification, regression, and a hybrid model. The models were built with the scikit-learn library in Google Colab using Python [43]. The data were shuffled randomly, with 30% used for testing and the rest for training. Model training was stopped after 5000 epochs (iterations) for feature selection, described in Section 3.1, and after 10,000 epochs for the deep learning models in both classification and regression.
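The shuffled 70/30 split described above can be reproduced with scikit-learn's standard utility; the feature matrix and labels below are hypothetical stand-ins for the measured data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: samples with the 8 input features and 7 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 8))
y = rng.integers(0, 7, size=700)

# Shuffle randomly and hold out 30% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=0)
```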

Before training the model, the most important task is to determine the best combination of features for DNA-amount-per-bead prediction [40,44]. The best number of features is also chosen for the regression and classification analyses. In each part of the analysis, 10 models with different numbers of hidden layers and neurons were implemented to examine the performance of different architectures. The architecture giving the highest train and test accuracy and the lowest error was selected as the best model for the classification part.

R\_Squared and mean square error (MSE) are statistical parameters used to evaluate the performance of regression models [45]. The deep learning architecture giving the highest R\_Squared and lowest MSE was selected as the best candidate model. The hybrid model (Figure 3) combines the best classification and regression architectures and is used to enhance the performance of the regression model. The prediction results of the classification model, together with the original features, serve as the input to the regression model: the 7 outputs from the classification method combined with the 8 original input features give a total of 15 features that feed the candidate regression model. The output of the regression model is the DNA amount per bead.
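The two-stage wiring of the hybrid model can be sketched as below. This is an assumption-laden illustration: the hidden-layer sizes are placeholders (not the paper's selected architectures), the data are synthetic, and the seven classification outputs are taken as predicted class probabilities, which may differ from the paper's exact encoding:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Hypothetical data: 8 original features, 7 bead classes,
# and a normalized DNA amount per bead as the regression target
rng = np.random.default_rng(1)
X = rng.normal(size=(350, 8))
y_class = rng.integers(0, 7, size=350)
y_reg = rng.normal(size=350)

# Stage 1: the classifier produces 7 outputs (here, class probabilities)
clf = MLPClassifier(hidden_layer_sizes=(30, 20, 10),
                    max_iter=200).fit(X, y_class)
class_out = clf.predict_proba(X)          # shape (n, 7)

# Stage 2: concatenate with the 8 original features -> 15 inputs
X_hybrid = np.hstack([X, class_out])      # shape (n, 15)
reg = MLPRegressor(hidden_layer_sizes=(30, 20, 10),
                   max_iter=200).fit(X_hybrid, y_reg)
```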

**Figure 3.** Hybrid model (combining the best architecture of the classification and regression models).

#### **3. Results**

#### *3.1. Feature Selection*

We first studied the effect of the number of features on the performance of the classification and regression models, benchmarking them on four figures of merit (FOMs): accuracy and error for classification, and R\_Squared and MSE for regression. Model training stopped after 5000 epochs. The deep learning model consists of 5 hidden layers with 70, 60, 30, 20, and 10 neurons, respectively. For the input features, we evaluated two datasets, with five and eight features. In the first dataset for DNA-amount-per-bead prediction, five features are used: frequency; the real part, imaginary part, and absolute value of the peak intensity; and the phase change of the peak intensity. For the second dataset, in addition to the frequency and the phase change of the peak intensity, we divided each exponential input feature (the real, imaginary, and absolute value of the peak intensity) into two parts: base and power. Figure 4 compares the performance of the classification model trained on the five-feature and eight-feature datasets. The results show that the second dataset, with eight features, leads to a more than 16% improvement in both training and testing accuracy. Furthermore, the train and test errors markedly decreased.
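One plausible reading of the base/power split is a mantissa-exponent decomposition in base 10; the text does not specify the exact scheme, so the sketch below is an assumption, with hypothetical values:

```python
import numpy as np

def split_base_power(x):
    """One plausible decomposition of an exponentially scaled feature
    x = base * 10**power, with base kept in [1, 10)."""
    power = np.floor(np.log10(x))
    base = x / 10.0 ** power
    return base, power

base, power = split_base_power(np.array([3.2e-5, 7.1e3]))
# base -> [3.2, 7.1], power -> [-5.0, 3.0]
```

Feeding base and power separately keeps both inputs on a modest numeric range, which is one reason such a split can help a network learn from features spanning many orders of magnitude.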

**Figure 4.** Effect of feature selection on FOMs (%).

The effect of the number of features on the performance of the regression model was evaluated by the R\_Squared and mean square error (MSE) values. The results, shown in Figure 5, indicate that the dataset with eight features yields better results. Specifically, the regression model showed around a 33% increase in R\_Squared and around a 7% decrease in MSE. Overall, both the classification and regression models performed better on the representative FOMs; therefore, the eight-feature dataset is chosen as the input for the following analysis.

**Figure 5.** Effect of feature selection on (**a**) R\_Squared with respect to number of epochs; (**b**) mean square error (MSE) with respect to number of epochs.

#### *3.2. Classification*

To achieve robust network training, reduce the risk of overfitting, and increase the network's generalization capability, we constructed 10 deep learning architectures, from simple to complex, summarized in Table 2. Determining the optimal number of neurons and hidden layers is a crucial step in choosing a deep learning architecture [46]: too many neurons and hidden layers can cause the model to overfit, while too few may result in underfitting [46].

There are several approaches to tuning hyperparameters such as the number of neurons, activation function, number of layers, batch size, and number of epochs: manual (hand) tuning, grid search, random search, Bayesian search, and AutoML. Grid search and random search are the most widely used strategies for hyperparameter optimization. In grid search, the domain of the hyperparameters is divided into a discrete grid, and the performance of every combination of values is calculated; the grid point that maximizes the average value in cross-validation is the optimal combination of hyperparameters [47]. While grid search evaluates every possible combination to find the best model, random search selects and tests only random combinations of hyperparameters. Bergstra et al. [47] demonstrated that random search is more efficient for hyperparameter optimization than trials on a grid. The Bayesian method, in contrast to random and grid search, builds a probability model to find the next set of hyperparameters that performs best on a probability function [48]; in other words, Bayesian optimization takes past evaluations into account when choosing the next hyperparameter set to evaluate [48].
Each of these techniques suits particular cases; for example, grid search is reliable only for low-dimensional input spaces [47], whereas random search has been shown to sample high-dimensional search spaces more efficiently than grid search [49]. Bayesian optimization can potentially trap the model in a local optimum. To avoid these difficulties, manual tuning was employed in this analysis to determine the hyperparameters of the deep learning model. In addition, manual tuning gives insight into the behavior of the hyperparameters and reduces the runtime of the process. We therefore employed 10 architectures and analyzed the effect of the number of neurons and hidden layers on the FOMs.
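The manual-tuning loop over candidate architectures can be sketched as follows. The architecture list and data here are placeholders, not the actual models of Table 2 or the measured dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical subset of architectures, simple to complex (cf. Table 2)
ARCHITECTURES = [(10,), (20, 10), (70, 60, 30, 20, 10)]

rng = np.random.default_rng(2)
X = rng.normal(size=(350, 8))
y = rng.integers(0, 7, size=350)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# Manual tuning: train each candidate and record train/test accuracy
scores = {}
for layers in ARCHITECTURES:
    clf = MLPClassifier(hidden_layer_sizes=layers, activation='relu',
                        max_iter=200, random_state=0).fit(X_tr, y_tr)
    scores[layers] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))

# Pick the architecture with the highest test accuracy
best = max(scores, key=lambda k: scores[k][1])
```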


**Table 2.** Deep learning models.

The test and train accuracy of each architecture were evaluated, with training stopped after 10,000 iterations for all models. The ReLU (rectified linear unit) activation function was used, which is the most common activation function in deep learning models: it outputs the input directly if it is positive and outputs zero otherwise. Figure 6 presents the train and test accuracy of each architecture.
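The ReLU definition is a one-liner:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: identity for positive inputs, zero otherwise."""
    return np.maximum(0, x)

relu(np.array([-2.0, -0.5, 0.0, 1.5]))   # -> array([0. , 0. , 0. , 1.5])
```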

**Figure 6.** Effect of model complexity on train and test accuracy.

Among all architectures, model number 9 achieved the highest accuracy: around 75% on the training data and around 74% on the test data. The selected model performed comparably on the train and test data, meaning that it generalizes well from observed (train) data to unseen (test) data and that no overfitting occurs [50]. We therefore selected model number 9 as the representative model for classification. Table 3 shows the confusion matrix of the representative model. To evaluate its performance, the following metrics are used: accuracy (ACC), true positive rate (TPR), true negative rate (TNR), false negative rate (FNR), and false positive rate (FPR). These measures are computed using the following equations:

$$\text{Accuracy} \,(\text{ACC}) = \frac{\text{TP} + \text{TN}}{\text{TN} + \text{TP} + \text{FN} + \text{FP}} \tag{3}$$

$$\text{Sensitivity} \,(\text{TPR}) = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{4}$$

$$\text{Specificity} \,(\text{TNR}) = \frac{\text{TN}}{\text{TN} + \text{FP}} \tag{5}$$

$$\text{Fallout} \,(\text{FPR}) = \frac{\text{FP}}{\text{TN} + \text{FP}} \tag{6}$$

$$\text{False Negative Rate (FNR)} = \frac{\text{FN}}{\text{TP} + \text{FN}} \tag{7}$$

where TPs (FPs) refer to the number of correct (incorrect) predictions of outcomes in the considered output class, whereas TNs (FNs) refer to the number of correct (incorrect) predictions of outcomes in any other output class [14]. The table below lists the accuracy (ACC), true positive rate (TPR), true negative rate (TNR), false positive rate (FPR), and false negative rate (FNR) for each individual output (class). For each class, we achieved above 88% accuracy.
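Equations (3)–(7) translate directly into code; the counts below are made-up values for illustration only:

```python
def per_class_metrics(tp, tn, fp, fn):
    """One-vs-rest metrics for a single output class, per Eqs. (3)-(7)."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),   # sensitivity, Eq. (4)
        "TNR": tn / (tn + fp),   # specificity, Eq. (5)
        "FPR": fp / (tn + fp),   # fallout, Eq. (6)
        "FNR": fn / (tp + fn),   # false negative rate, Eq. (7)
    }

# Hypothetical counts for one class out of 200 test samples
m = per_class_metrics(tp=40, tn=140, fp=10, fn=10)
# m["ACC"] -> 0.9, m["TPR"] -> 0.8, m["FNR"] -> 0.2
```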

**Table 3.** Confusion matrix of representative model.


#### *3.3. Regression*

Similar to the classification analysis, we constructed 10 neural network architectures using the same models shown in Table 2. Table 4 lists the R\_Squared and MSE of each model; among all architectures, model number 8 achieved the highest R\_Squared on both the train and test data and was therefore selected as the representative architecture for the regression model. Figure 7 shows the results obtained by the representative regression model on the train and test data. In this figure, the average DNA-amount-per-bead prediction for each of the seven outputs is plotted against its corresponding ground truth. The first point in Figure 7 represents the bare-bead prediction, and the next six points represent the beads coupled with DNA concentrations from lowest to highest. This figure shows that there is a relationship between the electrical measurements and the DNA concentrations coupled to the paramagnetic beads.

**Table 4.** Effect of model complexity on R\_Squared and MSE.



**Figure 7.** Results of representative regression model on (**a**) train and (**b**) test data.

A linear fit was applied to these results, and an R\_Squared of around 96% was achieved for both the train and test data, with a maximum standard error of 0.008. For an ideal model, the slope of the trend line should equal one, since the prediction should match the ground truth. Here, the slope of the trend line is around 0.47, indicating a systematic error between the predicted and ground-truth values. This motivated us to design a hybrid model to improve the performance of the regression model, whose architecture is discussed in the next section.
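To see why a high R\_Squared can coexist with a slope far from one, consider a sketch in which the predictions lie exactly on a line of slope 0.47 (the values are hypothetical, not the paper's data):

```python
import numpy as np

# Hypothetical predictions lying exactly on a line with slope 0.47
truth = np.array([0.0, 0.1, 0.25, 0.4, 0.6, 0.8, 1.0])
pred = 0.47 * truth + 0.25

slope, intercept = np.polyfit(truth, pred, 1)        # least-squares fit
r_squared = np.corrcoef(truth, pred)[0, 1] ** 2
# slope ~ 0.47 yet r_squared ~ 1.0: R_Squared measures linearity,
# not agreement with the ideal slope-one line, so a separate
# correction stage (the hybrid model) is warranted
```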
