Article
Peer-Review Record

Research on Device Modeling Technique Based on MLP Neural Network for Model Parameter Extraction

Appl. Sci. 2022, 12(3), 1357; https://doi.org/10.3390/app12031357
by Haixia Kang 1,2, Yuping Wu 1,2,3,4,*, Lan Chen 1,3 and Xuelian Zhang 1,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 October 2021 / Revised: 21 January 2022 / Accepted: 24 January 2022 / Published: 27 January 2022
(This article belongs to the Special Issue High Performance Computing, Modeling and Simulation)

Round 1

Reviewer 1 Report

27% plagiarism; the plagiarism report is attached herewith.

This kind of work has been done previously.

MLP is a basic technique; related research is ongoing, e.g., https://doi.org/10.1016/j.microrel.2017.08.016

If the authors can give a comparison study with SVM, KNN, XGBoost, AdaBoost, etc., then the work will be justified.

Comments for author File: Comments.pdf

Author Response

Dear expert,

  Thanks very much for your patience with our manuscript, and we are very grateful for your valuable opinions. The responses and modifications are as follows:

 

Point 1: 27% plagiarism; the plagiarism report is attached herewith.

Response 1:

We list the marked positions in the order of the primary sources in the report, placing their contents in parentheses. We have divided the marked content into the following categories:

(1) Long sentences: marked in blue. All of this content has been modified (3-[line 101, line 134-137, line 183], 4-[line 51, line 167-168, line 211-212], 19-[line 176], 66-[line 169], in the Word version).

(2) Content required by the journal: Acknowledgments and Conflicts of Interest, authors' names and affiliations, Author Contributions, Applied Sciences journal template content, and reference information (title, conference, journal, etc.). This part is marked in green. We apologize that this part was not modified, as it is unavoidable.

(3) Professional terms and experimental data: for example, drain-source voltage (Vds), the description of the model parameters, and device sizes (56 nm, 100 nm, etc.). This part is marked in orange. We consider this part reasonable and have not modified it.

(4) Brief introductions to references: to introduce the references accurately, their descriptions are quoted. This part is marked in purple and has not been modified.

(5) Universal words, phrases, and short sentences: for example, model, the number of, the input layer, the output layer, in recent years, neural network, neural network model, Experimental results show that, results show that, the testing error, Results and Discussion, Figures 7 show the, In this paper, we propose, is shown in Figure 2, is defined as follows. This part is marked in gray. For simplicity, we replaced some instances of 'neural network' with 'NN'. The rest of the content is kept unchanged.

1: line 53 (As the complexity of the, model, the number of), line 106 (drain-source voltage (Vds), gate-source voltage (Vgs)), line 276-278 (Acknowledgments and Conflicts of Interest.), References [1], [3], [6], [8], [9], [13], [19], [22], [23].

2: line 43(Applied Sciences journal template content).

3: line 98 (we take an compact model, as the source for, training, and), line 129-132 (is dominated by the mismatch in the high-value region, and the errors in small values are not weighted enough, a logarithmic scaling to the training data, range of values is more equally weighted.), line 176 (to see: In the lower-value range, both curves match quite well. However), References [4].

4: line 49 (capable of approximating a nonlinear function of multi-dimensional variables), line 162-163 (Among the available, activation functions, those of sigmoid types with smooth derivatives are preferred due to the requirements on), line 204-205 (More complex mathematics is involved around the minimum gate length due to short-channel effects [10], so), References [25], [30].

5: References [5], [6].

6: Author Contributions.

7: References [14].

8: References [7].

9: References [15].

10: References [21].

11: References [17].

12: References [12].

13: line 84-85 (rest of this paper is organized as follows. Section 2 introduces the data, and , methods. Section 3, the, and), line 142-143 (from the input layer, the output layer).

14: Author's affiliation.

15: line 47 (In recent years, neural networks have been, used, to), line 79 (are used as the inputs, neural network), line 233 (comparison between the neural network, prediction and).

16: Author's name and author's affiliation.

17: References [11].

18: Table 1, the description of the model parameters in the BSIM manual.

19: line 59 (between early device measurements and a later occurring compact model), line 169 (Looking at the transfer curves in Figure 3, we can observe that).

20: Applied Sciences journal template content.

21: References [24].

22: References [25].

23: Applied Sciences journal template content.

24: References [17].

25: References [23].

26: References [16].

27: References [2].

28: line 55 (neural network model), line 63 (in the neural network model, the, accuracy).

29: line 23 (Experimental results show that, neural network model), line 235 (results show that the, neural network model is).

30: line 113 (nm to 1000 nm, from, nm, 1200 nm), line 192 (nm to, nm, from, nm to, nm, from).

31: References [5].

32: Author's affiliation.

33: References [26].

34: Author Contributions.

35: Applied Sciences journal template content.

36: References [28].

37: References [9].

38: References [11].

39: References [16].

40: line 109 (network consists of an input layer, hidden layers, and an output layer).

41: References [18].

42: References [14].

43: References [13].

44: line 239 (56 nm, 100 nm, 200 nm, 300 nm, 400 nm, 500 nm, and the).

45: line 156 (voltage, Voff the offset voltage, n is the subthreshold swing parameter).

47: References [15].

48: line 62 (A novel physics-inspired neural network (Pi-NN) approach for compact modeling).

49: line 22 (neural network. The, error, the neural network, the testing error).

50: line 74 (We use the MLP neural network to, a, model for. The).

51: Applied Sciences journal template content.

52: References [10].

53: line 242 (between the, result and simulation result, the extracted).

54: line 199 (Results and Discussion, Figures 7 show the).

55: line 64 (In this paper, we propose, MLP neural network to).

56: line 81 (is shown in Figure 2).

57: line 240 (nm, 700 nm, 800 nm, 900 nm, 1000 nm).

58: Applied Sciences journal template content.

59: References [7].

60: line 154 (is the, channel width, Leff the effective, length).

61: line 161 (the substrate current induced body effect (SCBE)).

62: line 256 (In this paper, a novel, technique based on).

63: line 145 (initial learning rate of the, set to 0).

64: line 147 (is defined as follows).

65: line 82 (based on neural network and the).

66: line 163 (higher-order derivatives from circuit simulations).

67: References [29].

 

Point 2: This kind of work has been done previously. MLP is a basic technique; related research is ongoing, e.g., https://doi.org/10.1016/j.microrel.2017.08.016.

Response 2:

The comparative literature, https://doi.org/10.1016/j.microrel.2017.08.016, uses an ANN model to extract small-signal model parameters. The small-signal model has only 15 model parameters, and one of its more complex physical equations is shown as follows:

[Equation (1) of the small-signal model, not reproduced in this record]

The equation contains both fractional and squared relations. The small-signal model is simple, so the structure of the ANN model built for parameter extraction is also simple: the ANN model used contains 11 input neurons, 15 output neurons, and one hidden layer with 85 neurons.

We use an NN to model the BSIM-SOI model, and one equation of the BSIM-SOI model is as follows:

[Equation (2) of the BSIM-SOI model, not reproduced in this record]

The equation contains not only fractional and square relationships but also exponential and other relationships. Our device model is much more complex than the small-signal model, so the NN model used for device modeling is also more complicated. Our NN model (134-160-160-160-1) has 134 input neurons, 1 output neuron, and three hidden layers with 160 neurons each.
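As an illustration only (not our training code), the 134-160-160-160-1 topology can be sketched with scikit-learn; the layer sizes are the only values taken from this response, and the sigmoid-type activation and iteration count are assumptions:

```python
from sklearn.neural_network import MLPRegressor

# Sketch of the 134-160-160-160-1 topology: the 134 inputs and single output
# are implied by the shapes of X (n_samples, 134) and y (n_samples,).
mlp = MLPRegressor(
    hidden_layer_sizes=(160, 160, 160),  # three hidden layers, 160 neurons each
    activation="logistic",               # sigmoid-type activation (assumed)
    solver="adam",
    max_iter=500,                        # illustrative value, not from the paper
)
# mlp.fit(X_train, y_train)  # X_train/y_train are hypothetical placeholders
```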

 

Point 3: If the authors can give a comparison study with SVM, KNN, XGBoost, AdaBoost, etc., then the work will be justified.

Response 3:

Device modeling is a regression problem rather than a classification problem; SVM, KNN, and AdaBoost, for example, are mainly used to solve classification problems. Some machine learning algorithms commonly used to solve regression problems are as follows:

(1) Linear and Polynomial Regression

In Linear Regression, the output is a linear combination of the input feature vectors, and in Polynomial Regression, the output is a nonlinear combination of the input feature vectors. The advantage of Linear and Polynomial Regression is that they are fast in modeling and are suitable for cases where the model structure is not complex and the amount of data is small. However, for complex models or cases with large amounts of data, the performance of the model is less satisfactory.

(2) Decision Tree and Random Forest

Decision Tree is a tree structure in which each internal node represents a judgment on an attribute and each leaf node represents a classification result; it is a tree composed of multiple judgment nodes. Random Forest is a simple ensemble of Decision Trees; for regression problems, the average of the outputs of all trees is taken as the final output. Random Forest is very practical for complex, highly nonlinear problems, but since a fully grown Random Forest model is very complex, it is prone to overfitting.

(3) Neural network

Neural networks have multiple hidden layers. By introducing nonlinear activation functions, they can efficiently model complex nonlinear relationships. At the same time, neural networks can obtain high performance on large datasets.

First, the device model is highly nonlinear; second, the device model contains hundreds of model parameters, plus factors such as voltages, device size, and temperature, so there are many input feature variables and a large dataset. Therefore, for device modeling we chose the MLP neural network, which is suited to nonlinear problems and performs well on large datasets.

Based on your valuable comments, we have also applied another machine learning algorithm, Support Vector Regression (SVR), to device modeling. SVR is the application of SVM to regression problems. By adjusting the most important parameters in SVR (the kernel function, the penalty factor 'C', and the kernel coefficient 'gamma'), we obtained a parameter combination with good performance: the 'rbf' kernel, C = 1000, and gamma = 0.01, with the other parameters at their default values. Since SVR training is time-consuming and our time was limited, we reduced the training set from one million samples to one hundred thousand. The minimum training error obtained so far is 30.48% (calculated according to Equation (3) in the manuscript), and a single training run takes about 30 hours.
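For reference, a minimal sketch of this SVR configuration in scikit-learn; only the kernel, C, and gamma values come from the experiment above, and the data variables are placeholders:

```python
from sklearn.svm import SVR

# Best-performing combination from the tuning described above:
# 'rbf' kernel, C = 1000, gamma = 0.01; other parameters left at defaults.
svr = SVR(kernel="rbf", C=1000, gamma=0.01)

# X_sub, y_sub (placeholders): the training set reduced from one million
# samples to one hundred thousand, since a single fit took about 30 hours.
# svr.fit(X_sub, y_sub)
```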

 

Thanks again for your valuable comments.

 

Best regards

Haixia Kang

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have addressed the comments raised by the editor and reviewer well, and I have one minor comment before the publication of this manuscript. I'm not fully convinced by the fact that the neural network has a much faster extraction speed than the SPICE model, even though the authors mentioned that this is because the NN-based approach doesn't need to understand device physics. In general, it is believed that the NN algorithm is inefficiently computed by a conventional CPU since it requires massive calculations. I believe that an understanding of device physics is required when we need to develop a new SPICE model, but not once the SPICE model is developed, and it is expected that using a developed SPICE model is less burdensome than the massive NN calculations for vector-matrix multiplication. Please discuss this more thoroughly in your manuscript.

Author Response

Dear expert,

  Thanks very much for your patience with our manuscript, and we are very grateful for your valuable opinions. The responses and modifications are as follows:

 

Point 1: I'm not fully convinced by the fact that the neural network has a much faster extraction speed than the SPICE model, even though the authors mentioned that this is because the NN-based approach doesn't need to understand device physics.

Response 1:

The NN model's extraction speed is not compared with the SPICE model but with the SPICE simulator; every mention of SPICE in the manuscript refers to the SPICE simulator. The SPICE model is the device model nested inside the SPICE simulator. When the SPICE simulator is called, it must parse the netlist, establish the matrix equations, and iteratively solve them, and simulation operations such as Measure are very time-consuming. Evaluating the NN model is much faster than model evaluation through the SPICE simulator. We have added this content to the manuscript (lines 42-44 in the Word version).

In form, SPICE models can be divided into the following two categories:

(1) Verilog-A support: the model is written in the hardware description language (HDL) Verilog-A. For model evaluation, the Verilog-A statements must first be parsed into the C language and the equations then solved, so the calculation speed of this form of model is slow.

(2) C language support: the model is implemented directly in the C programming language, so it can be evaluated directly without a parsing step.

NN model evaluation is faster than evaluation of a Verilog-A-supported model.

From a development perspective, the NN-based approach does not require an understanding of device physics. When extracting model parameters, the comparison is between the running speed of the trained NN model and that of the SPICE simulator.

 

Point 2: In general, it is believed that the NN algorithm is inefficiently computed by a conventional CPU since it requires massive calculations.

Response 2:

Training the NN requires multiple iterations and therefore a large amount of computation. When the trained NN model is evaluated for parameter extraction, the computation is greatly reduced compared to the training phase, and a high running speed can be obtained on a CPU.

The training of the NN-based device model is time-consuming, but the training procedure lies outside the device model parameter extraction procedure. Only the NN-based device model evaluation is inside the parameter extraction procedure, and this evaluation is very fast. Hence, the time spent training the NN-based device model has no negative effect on the time cost of parameter extraction, and the training is executed only once for each device type.

 

Point 3: I believe that an understanding of device physics is required when we need to develop a new SPICE model, but not once the SPICE model is developed, and it is expected that using a developed SPICE model is less burdensome than the massive NN calculations for vector-matrix multiplication.

Response 3:

Developing a new SPICE model requires device physics. Our work is to develop device model parameter extraction software. For a SPICE model with Verilog-A support, it takes time to convert it to C language support, and developing such a conversion module requires an understanding of device physics. We use the training of an NN-based model instead, and automatic parameter sensitivity analysis is implemented to replace much of the human understanding of device physics that would otherwise be necessary.

 

Thanks again for your comments.

 

Best regards

Haixia Kang

Author Response File: Author Response.pdf

Reviewer 3 Report

This revision has been carefully considered. The authors answered the questions raised carefully. All the issues have been corrected and reasonably explained. I think the paper can be accepted for publication in Applied Sciences.

Author Response

Dear expert,

Thanks very much for your patience with our manuscript, and we are honored to receive your approval of our manuscript. We hope our manuscript can be accepted for publication in Applied Sciences.

 

Thanks again for your comments.

 

Best regards

Haixia Kang

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

There are serious flaws in the review justification provided by the authors.

The modelling approach is very conventional; tuning of the model is not done, and the error and performance analysis of the model is not done properly.

Standard parameters like AICc and BICc are not calculated for each of the regression-based models applied in the manuscript.

Author Response

Dear expert,

  Thanks very much for your patience with our manuscript, and we are very grateful for your valuable opinions. The responses and modifications are as follows:

 

Point 1: The modelling approach is very conventional; tuning of the model is not done, and the error and performance analysis of the model is not done properly.

Response 1:

According to your valuable comments, we have revised the error and performance analysis sections of the manuscript (line 253 to line 300, Figures 4-6). The new content is as follows:

The mean value μ of RE for the training data is 2.81%, and the standard deviation σ of RE is 3.04%. The training data with RE greater than μ+3σ accounted for 0.89% of all training data. We performed analysis on the training data with RE greater than μ+3σ and counted the distributions of temperature, gate width, and gate length. The statistical results are shown in Figures 4-6, respectively.

According to Figure 4, for the training data with RE greater than μ+3σ, the proportions of the different bars are not very different. The proportion of the first bar in Figure 5 is lower than that of the other bars because the gate width range of this bar is smaller; when the gate width range is taken into account, the overall distribution of gate width is uniform. From Figure 6, it can be observed that the proportion of data with small gate length is very large, and the proportion almost decreases as the gate length increases. This indicates that the NN model fits the data with small gate length less well. Due to the short-channel effect, more complex device physics is involved around the minimum gate length, so it is reasonable that the error is slightly higher at smaller gate lengths.

Figure 4. Temperature distribution of the training data with RE greater than μ+3σ. The training data with RE greater than μ+3σ accounted for 0.89% of all training data.

Figure 5. Gate width distribution of the training data with RE greater than μ+3σ. The training data with RE greater than μ+3σ accounted for 0.89% of all training data.

Figure 6. Gate length distribution of the training data with RE greater than μ+3σ. The training data with RE greater than μ+3σ accounted for 0.89% of all training data.

The mean value μ of RE for the test data is 3.89%, and the standard deviation σ of RE is 3.71%. The test data with RE greater than μ+3σ accounted for 1.50% of all test data. The distributions of temperature, gate width, and gate length for the test data with RE greater than μ+3σ were analyzed and were generally similar to those of the training data. For the test data with RE greater than μ+3σ, the distributions of temperature and gate width are uniform overall, and the percentage of data with gate length smaller than 200 nm is 33.23%. The NN model fits the training data of small gate length less well and therefore also fits the test data of small gate length less well.

Statistical analysis of the I-V curves for the training data was performed. It was found that for the Ids-Vgs curves, the error for Vgs from 0 to 0.2 V was slightly higher than in the other regions of the curves. The device is essentially in the subthreshold region when Vgs is between 0 and 0.2 V. According to Equation (4), the relationship between current and voltage in the subthreshold region is exponential. The exponential relation has derivatives of all orders, and its Taylor series expansion consists of infinitely many polynomial terms. According to Equation (5), the relationship between current and voltage in the non-subthreshold region is polynomial. When the same NN model is used for the subthreshold and non-subthreshold regions, the finite polynomial of the non-subthreshold region is easier to fit, so its modeling accuracy is higher. Using Equation (3), the errors were computed separately for the subthreshold and non-subthreshold data: in the training data, 4.77% for the subthreshold region and 3.83% for the non-subthreshold region; in the test data, 6.89% for the subthreshold region and 4.79% for the non-subthreshold region.
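The μ+3σ screening used above reduces to a few lines of array arithmetic. As a sketch (NumPy; `rel_err` is a hypothetical array of per-sample relative errors, not a variable from our code):

```python
import numpy as np

def re_outliers(rel_err):
    """Flag samples whose relative error exceeds mu + 3*sigma."""
    mu, sigma = rel_err.mean(), rel_err.std()  # e.g., 2.81% and 3.04% for training data
    mask = rel_err > mu + 3 * sigma
    return mask, mask.mean()                   # boolean mask and outlier fraction

# The flagged samples can then be binned by temperature, gate width, and gate
# length (e.g., with np.histogram) to obtain distributions like Figures 4-6.
```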

 

Point 2: Standard parameters like AICc and BICc are not calculated for each of the regression-based models applied in the manuscript.

Response 2:

According to your valuable comments, we have added content on optimal model selection (lines 182 to 215, Table 2). The new content is as follows:

Some criteria were used in this study to better evaluate the trained NN models and avoid overfitting. These criteria include the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc), and the Bayesian Information Criterion (BIC).

The AIC method attempts to find the model that explains the data best with the smallest number of parameters. The preferred model is the one with the lowest AIC value. Under the assumption that the model errors are normally distributed, the value of AIC can be obtained from the following equation:

 

AIC = 2k + n ln(RSS/n)        (8)

where k is the number of parameters of the model to be estimated, n is the sample size, and RSS is the residual sum of squares of the estimated model. AICc is a correction of AIC that performs better when the sample size is small. The equation of AICc is as follows:

 

AICc = AIC + 2k(k+1)/(n-k-1)        (9)

The BIC penalizes parameters more heavily than does the AIC. For any two estimated models, the model with the lower value of BIC is preferred. The equation of BIC is as follows:

 

BIC = k ln(n) + n ln(RSS/n)        (10)

The value of RR reflects the fitting quality of the I-V curves, so when the RR values of the different NN models differ widely, we prefer the NN model with the smallest RR value. When the RR values of different NN models are close to each other, we select the optimal model by referring to the AIC, AICc, and BIC values.
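For concreteness, a sketch of how Equations (8)-(10) can be evaluated (Python; the NN1 example values in the comments are those detailed in our Round 3 response below):

```python
import math

def aic(k, n, rss):
    return 2 * k + n * math.log(rss / n)                   # Equation (8)

def aicc(k, n, rss):
    return aic(k, n, rss) + 2 * k * (k + 1) / (n - k - 1)  # Equation (9)

def bic(k, n, rss):
    return k * math.log(n) + n * math.log(rss / n)         # Equation (10)

# NN1: k = 73121 parameters, n = 172781 samples, RSS = 3.791e-05
# aic(...) ~ -3.70e+06, aicc(...) ~ -3.59e+06, bic(...) ~ -2.96e+06,
# matching Table 2; the values are negative because ln(RSS/n) is a large
# negative number when RSS/n is tiny.
```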

The AIC, AICc, and BIC values were calculated for each NN model, and the results are shown in Table 2. RE-10% in Table 2 represents the proportion of data with relative error (RE) greater than 10% in the fitting data of the NN model. NN1, NN2, and NN3 are the optimal models under the AIC, AICc, and BIC criteria, respectively. NN1, NN4, and NN5 are the NN models with the smallest RR values.

Although NN2 and NN3 achieved optimality under the AICc and BIC criteria, respectively, their RR values were significantly higher than those of the remaining three NN models, so they were not selected as the optimal model. The RR values for the NN1, NN4, and NN5 models were close, so we further evaluated them using the AIC, AICc, and BIC criteria. NN1 outperforms NN4 and NN5 under the AIC and AICc criteria. NN4 outperforms NN1 and NN5 under the BIC criterion, while the difference in BIC value between NN1 and NN4 is small. According to the RE-10% column in Table 2, when the NN1 model fits the data, the RE of nearly 98% of the data is within 10%, so the probability of RE greater than 10% is very low. We eventually chose the NN1 model for further work. It took 1.7 hours to train the NN1 model, with a training error of 4.14% and a testing error of 5.38%.

Table 2. Estimation results of different NN models.

No.    NN    AIC    AICc    BIC    RR    RE-10%
1    134-160-160-160-1    -3.70E+06    -3.59E+06    -2.96E+06    5.38%    2.02%
2    134-120-120-120-1    -3.69E+06    -3.66E+06    -3.21E+06    6.18%    3.81%
3    134-100-100-100-1    -3.66E+06    -3.64E+06    -3.32E+06    6.53%    5.83%
4    134-150-150-150-1    -3.68E+06    -3.55E+06    -3.02E+06    5.48%    2.67%
5    134-190-190-190-1    -3.54E+06    -3.28E+06    -2.55E+06    5.66%    2.95%

The modeling approach is very conventional but appropriate for our work.

 

Thanks again for your valuable comments.

Best regards

Haixia Kang

Author Response File: Author Response.pdf

Round 3

Reviewer 1 Report

The performance analysis should also be done for the testing and validation data, and a cumulative error between the last output and the model output should be given, with justification for why there is an error between the model and the resultant output.

From the AICc and BICc part, the authors do not justify why the results come out negative; the values look like random values. The significance of the AICc and BICc values should be justified.

Author Response

Response to Reviewer 1 Comments
Dear expert,
  Thanks very much for your patience with our manuscript, and we are very grateful for your valuable opinions. The responses and modifications are as follows:

Point 1: The performance analysis should also be done for the testing and validation data.
Response 1: 
According to your valuable comments, we have revised the error and performance analysis sections for the testing data of the revised manuscript (line 263 to line 273, Figures 7-9). The new content is as follows:
The mean value μ of RE for the test data is 3.89%, and the standard deviation σ of RE is 3.71%. The test data with RE greater than μ+3σ account for 1.50% of all test data. The distributions of temperature, gate width, and gate length for the test data with RE greater than μ+3σ are shown in Figures 7-9, respectively. According to Figures 7 and 8, for temperature and gate width the proportions of the different bars are not very different. According to Figure 9, the proportion of data with small gate length is higher. The NN model fits the training data of small gate length less well and therefore also fits the test data of small gate length less well.
 
Figure 7. Temperature distribution of the test data with RE greater than μ+3σ. The test data with RE greater than μ+3σ accounted for 1.50% of all test data.
 
Figure 8. Gate width distribution of the test data with RE greater than μ+3σ. The test data with RE greater than μ+3σ accounted for 1.50% of all test data.
 
Figure 9. Gate length distribution of the test data with RE greater than μ+3σ. The test data with RE greater than μ+3σ accounted for 1.50% of all test data.

Point 2: A cumulative error between the last output and the model output should be given, with justification for why there is an error between the model and the resultant output.
Response 2:
The residual sum of squares (RSS) between the last output and the model output has been given and added in the revised manuscript. The equation of RSS and the RSS values of the NN models are given in Response 3.

Point 3: From the AICc and BICc part, the authors do not justify why the results come out negative; the values look like random values. The significance of the AICc and BICc values should be justified.
Response 3:
According to your valuable comments, we have revised the content of the optimal model selection section.
The AIC method attempts to find the model that explains the data best with the smallest number of parameters. The preferred model is the one with the lowest AIC value. Under the assumption that the model errors are normally distributed, the value of AIC can be obtained from the following equation:
AIC = 2k + n ln(RSS/n)
where k is the number of parameters of the model to be estimated, n is the sample size, and RSS is the residual sum of squares of the estimated model. The equation of RSS is as follows:
RSS = Σ_{i=1}^{N} (I_nn^i - I_sim^i)^2
AICc is a correction of the AIC, which performs better when the sample size is smaller. The equation of AICc is as follows:
AICc = AIC + 2k(k+1)/(n-k-1)
The BIC penalizes parameters more heavily than does the AIC. For any two estimated models, the model with the lower value of BIC is preferred. The equation of BIC is as follows:
BIC = k ln(n) + n ln(RSS/n)
The value of RR is obtained from the relative error. Since the currents in different regions differ by many orders of magnitude, using the relative error shows the fit of the NN model more intuitively. The value of RR reflects the fitting quality of the I-V curves, so when the RR values of the different NN models differ widely, we prefer the NN model with the smallest RR value. When the RR values of different NN models are close to each other, we select the optimal model by referring to the AIC, AICc, and BIC values.
In the previous manuscript, we expressed the values of AIC, AICc, and BIC using scientific notation with two decimal places for ease of reading. In the revised manuscript, we express the values of AIC, AICc, and BIC using scientific notation with four decimal places. The AIC, AICc, and BIC values were calculated for each NN model, and the results are shown in Table 2. RE-10% in Table 2 represents the proportion of data with relative error (RE) greater than 10% in the fitting data of the NN model. NN1, NN2, and NN3 are the optimal models under the AIC, AICc, and BIC criteria, respectively. NN1, NN4, and NN5 are the NN models with the smallest RR values.
Table 2. Estimation results of different NN models.
No.    NN    AIC    AICc    BIC    RR    RE-10%
1    134-160-160-160-1    -3.6964e+06    -3.5891e+06    -2.9608e+06    5.38%    2.02%
2    134-120-120-120-1    -3.6945e+06    -3.6624e+06    -3.2394e+06    6.18%    3.81%
3    134-100-100-100-1    -3.6564e+06    -3.6401e+06    -3.3174e+06    6.53%    5.83%
4    134-150-150-150-1    -3.6832e+06    -3.6031e+06    -3.0238e+06    5.48%    2.67%
5    134-190-190-190-1    -3.5421e+06    -3.2832e+06    -2.5539e+06    5.66%    2.95%
(1) NN1
For NN1, k = 73121, n = 172781, RSS = 3.7910053306828354e-05.
AIC = 2k + n ln(RSS/n)
    ≈ 2*73121 + 172781*ln(3.7910053306828354e-05 / 172781)
    ≈ 146242 + 172781*ln(2.1941100761558478e-10)
    ≈ 146242 + 172781*(-22.24007439789845)
    ≈ 146242 - 3842662.294543292
    ≈ -3696420.294543292 ≈ -3.6964e+06
AICc = AIC + 2k(k+1)/(n-k-1)
    ≈ -3696420.294543292 + (2*73121*(73121+1)) / (172781-73121-1)
    ≈ -3696420.294543292 + 107300.97155299572
    ≈ -3589119.3229902964 ≈ -3.5891e+06
BIC = k ln(n) + n ln(RSS/n)
    ≈ 73121*ln(172781) + 172781*ln(3.7910053306828354e-05 / 172781)
    ≈ 73121*12.059780175603038 + 172781*ln(2.1941100761558478e-10)
    ≈ 881823.1862202698 + 172781*(-22.24007439789845)
    ≈ 881823.1862202698 - 3842662.294543292
    ≈ -2960839.1083230223 ≈ -2.9608e+06

(2) NN2
For NN2, k = 45241, n = 172781, RSS = 5.2933543433901086e-05.
AIC = 2k + n ln(RSS/n)
    ≈ 2*45241 + 172781*ln(5.2933543433901086e-05 / 172781)
    ≈ 90482 + 172781*ln(3.063620619969851e-10)
    ≈ 90482 + 172781*(-21.906253504275965)
    ≈ 90482 - 3784984.3867223044
    ≈ -3694502.3867223044 ≈ -3.6945e+06
AICc = AIC + 2k(k+1)/(n-k-1)
    ≈ -3694502.3867223044 + (2*45241*(45241+1)) / (172781-45241-1)
    ≈ -3694502.3867223044 + 32096.744086122675
    ≈ -3662405.642636182 ≈ -3.6624e+06
BIC = k ln(n) + n ln(RSS/n)
    ≈ 45241*ln(172781) + 172781*ln(5.2933543433901086e-05 / 172781)
    ≈ 45241*12.059780175603038 + 172781*ln(3.063620619969851e-10)
    ≈ 545596.5149244571 + 172781*(-21.906253504275965)
    ≈ 545596.5149244571 - 3784984.3867223044
    ≈ -3239387.8717978476 ≈ -3.2394e+06

(3) NN3
For NN3, k = 33701, n = 172781, RSS = 7.54255043621728e-05.
AIC = 2k + n ln(RSS/n)
    ≈ 2*33701 + 172781*ln(7.54255043621728e-05 / 172781)
    ≈ 67402 + 172781*ln(4.3653818627148124e-10)
    ≈ 67402 + 172781*(-21.552145261608214)
    ≈ 67402 - 3723801.210445929
    ≈ -3656399.210445929 ≈ -3.6564e+06
AICc = AIC + 2k(k+1)/(n-k-1)
    ≈ -3656399.210445929 + (2*33701*(33701+1)) / (172781-33701-1)
    ≈ -3656399.210445929 + 16333.035210204273
    ≈ -3640066.1752357245 ≈ -3.6401e+06
BIC = k ln(n) + n ln(RSS/n)
    ≈ 33701*ln(172781) + 172781*ln(7.54255043621728e-05 / 172781)
    ≈ 33701*12.059780175603038 + 172781*ln(4.3653818627148124e-10)
    ≈ 406426.651697998 + 172781*(-21.552145261608214)
    ≈ 406426.651697998 - 3723801.210445929
    ≈ -3317374.558747931 ≈ -3.3174e+06

(4) NN4
For NN4, k = 65551, n = 172781, RSS = 4.466574395434743e-05.
AIC = 2k + n ln(RSS/n)
    ≈ 2*65551 + 172781*ln(4.466574395434743e-05 / 172781)
    ≈ 131102 + 172781*ln(2.58510738763796e-10)
    ≈ 131102 + 172781*(-22.076083880236375)
    ≈ 131102 - 3814327.848911121
    ≈ -3683225.848911121 ≈ -3.6832e+06
AICc = AIC + 2k(k+1)/(n-k-1)
    ≈ -3683225.848911121 + (2*65551*(65551+1)) / (172781-65551-1)
    ≈ -3683225.848911121 + 80146.21328185473
    ≈ -3603079.6356292665 ≈ -3.6031e+06
BIC = k ln(n) + n ln(RSS/n)
    ≈ 65551*ln(172781) + 172781*ln(4.466574395434743e-05 / 172781)
    ≈ 65551*12.059780175603038 + 172781*ln(2.58510738763796e-10)
    ≈ 790530.6502909547 + 172781*(-22.076083880236375)
    ≈ 790530.6502909547 - 3814327.848911121
    ≈ -3023797.198620166 ≈ -3.0238e+06

(5) NN5
For NN5, k = 98231, n = 172781, RSS = 6.92521006780584e-05.
AIC = 2k + n ln(RSS/n)
    ≈ 2*98231 + 172781*ln(6.92521006780584e-05 / 172781)
    ≈ 196462 + 172781*ln(4.0080854190019966e-10)
    ≈ 196462 + 172781*(-21.637537254258746)
    ≈ 196462 - 3738555.3243280803
    ≈ -3542093.3243280803 ≈ -3.5421e+06
AICc = AIC + 2k(k+1)/(n-k-1)
    ≈ -3542093.3243280803 + (2*98231*(98231+1)) / (172781-98231-1)
    ≈ -3542093.3243280803 + 258874.76939999196
    ≈ -3283218.5549280886 ≈ -3.2832e+06
BIC = k ln(n) + n ln(RSS/n)
    ≈ 98231*ln(172781) + 172781*ln(6.92521006780584e-05 / 172781)
    ≈ 98231*12.059780175603038 + 172781*ln(4.0080854190019966e-10)
    ≈ 1184644.266429662 + 172781*(-21.637537254258746)
    ≈ 1184644.266429662 - 3738555.3243280803
    ≈ -2553911.057898418 ≈ -2.5539e+06

From the above calculation process, we can see that the value of RSS is very small. Since the current is a small value, typically in the range of 1e-12 A to 1e-3 A, the magnitude of RSS is very small. Taking the logarithm of RSS/n therefore yields a large negative value, which is why the values of AIC, AICc, and BIC are negative.
Although NN2 and NN3 achieved optimality under the AICc and BIC criteria, respectively, their RR values were significantly higher than that of the NN1 model. The equations for AIC and BIC can be divided into two terms: the first is the complexity penalty of the model (2k for AIC, k ln(n) for BIC), and the second is the accuracy penalty of the model (n ln(RSS/n)). The first term mainly serves to avoid overfitting, while the second considers the fit of the model itself. The first term penalizes the second, so the model with the fewest parameters is selected under the condition that model validity and reliability are satisfied; complexity is therefore considered only after high accuracy is achieved. The difference in the accuracy penalty between the NN models comes down to the difference in RSS. The RSS values are 3.7910e-05, 5.2934e-05, and 7.5426e-05 for the NN1, NN2, and NN3 models, respectively. There is a significant difference in the accuracy of the three NN models, and the accuracy of the NN1 model is higher than that of the other two. Therefore the NN2 and NN3 models were not selected as the optimal model.
The RR values for the NN1, NN4, and NN5 models were close, so we further evaluated them using the AIC, AICc, and BIC criteria. The AIC, AICc, and BIC values for NN5 are larger than those of NN1 and NN4, while the values for NN1 and NN4 are close to each other. The RSS values are 3.7910e-05 and 4.4666e-05 for the NN1 and NN4 models, respectively. Therefore we eventually chose the NN1 model for further work. According to the RE-10% column in Table 2, when the NN1 model fits the data, the RE of nearly 98% of the data is within 10%. It took 1.7 hours to train the NN1 model, with a training error of 4.14% and a testing error of 5.38%.


Thanks again for your valuable comments.
Best regards
Haixia Kang

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The authors have presented the estimation of drain current based on an artificial neural network (ANN), a perceptron, with a couple of device parameters. I feel this manuscript should be extensively revised before publication, and the detailed comments are as follows.

Basically, the training of an ANN should be done with experimental results; however, the authors did this with SPICE simulation results. They set Isim as the desired value and train the network with supervised learning. Why do we need to train an ANN instead of using a simple compact SPICE model? A compact model itself will generate device characteristics clearly and precisely with just simple equations. The purpose of utilizing an ANN should be to avoid the procedure of building a device model, but in this work, they trained the ANN on results generated by the device SPICE model. The results generated by SPICE simulation are already based on exponential or polynomial equations; therefore, an ANN should learn them very easily. I very much doubt the motivation and method of this work, and it is very hard to see its novelty.

Author Response

Dear expert,

  Thanks very much for your patience with our manuscript, and we are very grateful for your valuable opinions. The responses and modifications are as follows:

 

Point 1: Basically, the training of an ANN should be done with experimental results; however, the authors did this with SPICE simulation results. They set Isim as the desired value and train the network with supervised learning.

Response 1:

For device model parameter extraction software, device model evaluation is implemented either directly as physics-based analytical expressions or indirectly through an embedded SPICE simulator. Implementing physics-based analytical expressions for device model evaluation requires the developer to be skilled in device physics and to have a deep understanding of the models, so the programming is tedious and error-prone, and the software development is time-intensive. Such development is not generic: each device model must be programmed individually. Running device model evaluation by calling the SPICE simulator is time-consuming, which makes parameter extraction less efficient.

Our neural network model is developed from the SPICE model, i.e., the neural network model is equivalent to the SPICE model. It improves the extraction speed compared with device model evaluation by calling SPICE, and it approximates the accuracy of the SPICE model. Our technique does not require understanding the device model and enables faster and less error-prone development than implementing physics-based analytical model evaluation. The development of the device model parameter extraction software becomes generic: it is programmed once for any device model rather than individually for each device model.

 

Point 2: Why do we need to train an ANN instead of using a simple compact SPICE model? A compact model itself will generate device characteristics clearly and precisely with just simple equations.

Response 2:

It makes the implementation of device model evaluation easier, less error-prone, and time-saving for the device model parameter extraction software, even for a programmer without device model knowledge. Also, the neural network model can accelerate parameter extraction: according to Table 4 in the manuscript, the neural network computation speed is thousands of times faster than the SPICE simulation.

 

Point 3: The purpose of utilizing an ANN should be to avoid the procedure of building a device model, but in this work, they trained the ANN on results generated by the device SPICE model.

Response 3:

Yes, our work is intended to avoid the procedure of building a device model. Our neural network model is developed from the SPICE model, i.e., the neural network model is equivalent to the SPICE model. It improves the extraction speed compared with device model evaluation by calling SPICE, and it approximates the accuracy of the SPICE model.

 

Point 4: The results generated by SPICE simulation are already based on exponential or polynomial equations; therefore, an ANN should learn them very easily.

Response 4:

Our technique enables faster and less error-prone development than implementing physics-based analytical model evaluation, and it makes the implementation of device model evaluation easier.

 

Point 5: I very much doubt the motivation and method of this work, and it is very hard to see its novelty.

Response 5:

For device model parameter extraction software, device models are either manually implemented physics-based analytical models or an embedded SPICE simulator. Implementing physics-based analytical models requires the developer to be skilled in device physics and to have a deep understanding of the models, so the programming is tedious and error-prone, and the software development is time-intensive. The development technique is not generic: each device model must be programmed individually. Running SPICE simulation for parameter extraction is time-consuming, making extraction less efficient.

Our neural network model improves the extraction speed compared with device model evaluation by calling SPICE. Our technique does not require understanding the device model and enables faster, less error-prone development than implementing physics-based analytical model evaluation. The development of the device model parameter extraction software is generic: it is programmed once for any device model rather than individually for each device model.

 

Thanks again for your comments.

 

Best regards

Haixia Kang

Author Response File: Author Response.pdf

Reviewer 2 Report

In this manuscript, the authors propose a modeling technique based on neural networks for optimal extraction of device model parameters. In my opinion, this manuscript is interesting to the readers of Applied Sciences. The topic is very important in this field. This work is novel and original. The authors have a solid background in this field. Therefore, the referee recommends it to be published after the following revisions:
1. The English should be polished by a native speaker.
2. It is suggested that the authors check their figures carefully and thoroughly to avoid some typical mistakes (e.g. 1200 nm, not 1200nm).
3. The quality of the figures in the manuscript is poor, especially Figure 5 (inset).
4. How do the performances here compare with state-of-the-art reports? The readers would like to see a paragraph near the end of the manuscript, before the conclusion, dedicated to such a comparison. More recent literature is suggested to be included for comparison.


In general, this work seems to be very interesting. The referee would like to see the revision if possible.

Comments for author File: Comments.docx

Author Response

Dear expert,

  Thanks very much for your patience with our manuscript, and we are very grateful for your valuable opinions. The responses and modifications are as follows:

 

Point 1: The English should be polished by a native speaker.

Response 1: The manuscript has been carefully revised according to your valuable comments.

 

Point 2: It is suggested that the authors check their figures carefully and thoroughly to avoid some typical mistakes (e.g. 1200 nm, not 1200nm).

Response 2: There are indeed some typical mistakes in the manuscript due to our negligence; they have been carefully checked and revised (e.g., 1e-12 A, not 1e-12A; -40 °C, not -40°C).

 

Point 3: The quality of the figures in the manuscript is poor, especially Figure 5 (inset).

Response 3: All figures in the manuscript are replaced with high-quality figures, and the original figures are packaged and uploaded.

 

Point 4: How do the performances here compare with state-of-the-art reports? The readers would like to see a paragraph near the end of the manuscript, before the conclusion, dedicated to such a comparison. More recent literature is suggested to be included for comparison.

Response 4: In the revision, a recent reference (Reference 31) has been added to the manuscript, and we have added a performance comparison between our work and the state-of-the-art study (lines 236 to 238). According to the comparison, the extraction speed of the neural network model is much faster than that of the compact model.

 

Thanks again for your comments.

 

Best regards

Haixia Kang

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have addressed the questions raised by the reviewers properly. I have minor comments before publication.

 

1) If the authors would like to claim that the NN-based method has less error, please provide the error rate generated by the SPICE model as well and compare it with the proposed method.

2) It would be better if the authors could compare the SPICE and NN-based methods in terms of not only 'time' but also 'power consumption'.

Author Response

Dear expert,

Thanks very much for your patience and encouragement. Based on your comments, the responses and modifications are as follows.

 

Point 1: If the authors would like to claim that the NN-based method has less error, please provide the error rate generated by the SPICE model as well and compare it with the proposed method.

Response 1:

Our technique does not require understanding the device model and enables faster and less error-prone development than implementing physics-based analytical model evaluation. The neural network model is equivalent to the SPICE model: it improves the extraction speed over the SPICE simulator and approximates the accuracy of the SPICE model.

A comparison of the extraction results of the SPICE model and the neural network model has been added to the manuscript (lines 241 to 244). For the measured data of a single device, the extraction error with the SPICE simulator was 3.27% and extraction took 4.4 hours; the extraction error with the neural network model was 4.28% and extraction took 8.6 seconds. The extraction accuracy of the neural network model is thus very close to that of the SPICE simulator, while the neural network is much faster.

 

Point 2: It would be better if the authors could compare the SPICE and NN-based methods in terms of not only 'time' but also 'power consumption'.

Response 2:

Both the SPICE simulator and the neural network model run on a CPU, so the comparison of time can be taken as an approximate comparison of power consumption. We have added an explanation of this issue to the manuscript (lines 244 to 245).

 

Thanks again for your comments.

 

Best regards

Haixia Kang

Author Response File: Author Response.pdf
