Article
Peer-Review Record

SOMN_IA: Portable and Universal Device for Real-Time Detection of Driver’s Drowsiness and Distraction Levels

Electronics 2022, 11(16), 2558; https://doi.org/10.3390/electronics11162558
by Jonathan Flores-Monroy, Mariko Nakano-Miyatake, Enrique Escamilla-Hernandez, Gabriel Sanchez-Perez and Hector Perez-Meana *
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 14 July 2022 / Revised: 3 August 2022 / Accepted: 12 August 2022 / Published: 16 August 2022
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

The proposed paper is quite well written and is of very much interest to the reader. A portable system for real-time detection of drowsiness and distraction in drivers has been developed. The device SOMN_IA used for the purpose seems to be a useful aid, and both the hardware and software aspects of the implementation have been clearly described. 

However, there are some concerns that may need to be addressed:

- The dataset used lacks a validation set for tuning the hyperparameters. A portion of the training set may be used as a validation set, and the model must be trained with various values of hyperparameters while validating on the validation set. The best model obtained during training must be saved and then evaluated on the test set, which may give better classification accuracy.

- A bar graph corresponding to the comparison Tables 3 and 5 would be useful.

- In lines 248 and 262, it says the process passes to Stage-4. Isn’t it Stage-5?

- Some grammatical and spelling corrections are needed:

    line 295: “into account” instead of “by account”

    line 299: must loss more than two landmark points?

    line 333: in environmental conditions

    line 560: adapted to

    line 590: “dangerous” instead of “dengerous”

    line 601: “requirements” instead of “requierments”

    line 602: adjustment

    line 606: laboratory

    line 608: immediate

Author Response

Dear Reviewer 1.

Thank you for reviewing our paper and for the helpful recommendations and suggestions about it. We have modified and improved our paper, addressing your recommendations as follows.

  1. The dataset used lacks a validation set for tuning the hyperparameters. A portion of the training set may be used as a validation set, and the model must be trained with various values of hyperparameters while validating on the validation set. The best model obtained during training must be saved and then evaluated on the test set, which may give better classification accuracy.

Thank you for your helpful observation. Addressing it, we have modified our paper to include the paragraph in lines 319-324, together with Table 3, which shows the three best CNN models, whose hyperparameters were determined during validation.
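The selection procedure the reviewer describes (hold out part of the training set, train with several hyperparameter settings, keep the best-validating model, then score it once on the test set) can be sketched as follows. This is only an illustration: `train_and_validate` and the search grid are hypothetical stand-ins, not the paper's actual S-CNN training code.

```python
# Hypothetical stand-in for training the S-CNN with one hyperparameter
# configuration and returning its accuracy on the held-out validation split.
def train_and_validate(learning_rate, batch_size):
    # toy surrogate score: pretend mid-range settings validate best
    return 1.0 - abs(learning_rate - 1e-3) * 100 - abs(batch_size - 32) / 1000

# Grid of candidate hyperparameter settings to explore.
search_space = [(lr, bs) for lr in (1e-2, 1e-3, 1e-4) for bs in (16, 32, 64)]

# Keep the configuration that scores best on the validation split.
best_score, best_config = max(
    (train_and_validate(lr, bs), (lr, bs)) for lr, bs in search_space
)

# Only the model saved for best_config would then be evaluated once
# on the untouched test set.
```

The key point of this loop is that the test set is never consulted while choosing hyperparameters; it is used exactly once, after `best_config` is fixed.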

  2. A bar graph corresponding to the comparison Tables 3 and 5 would be useful.

Thank you for your observations. Following your recommendation, we have included a bar graph (Figure 22) related to Table 3 of the original version (Table 4 in the revised version), and Figure 29 corresponding to Table 5 of the original version (Table 6 in the revised one). We also include some explanation of Figure 29 in lines 605-614. It is important to mention that, despite including Figures 22 and 29, we have kept Tables 4 and 6 in the revised version because they contain information that cannot be shown in the bar graphs.

  3. In lines 248 and 262, it says the process passes to Stage-4. Isn’t it Stage-5?

Thank you for your observation. Stage-4 has been replaced by Stage-5 according to your observation.

  4. Some grammatical and spelling corrections are needed:

Thank you for your observations. The grammatical and spelling errors have been corrected as follows.

In line 295 of the original version, line 297 of the revised one, the phrase “by account” was replaced by “into account”.

In line 299 of the original version, lines 300-301 of the revised one, the sentence “must loss more than two landmark points” was replaced by “the analyzed face region must contain at most three landmark points”.

In line 333 of the original version, line 350 of the revised one, the term “environmental conditions” was replaced by “in environmental conditions”.

In line 560 of the original version, line 587 of the revised one, “adapted” was changed to “adapted to”.

In line 590 of the original version, line 648 of the revised one, “dengerous” was changed to “dangerous”.

In line 601 of the original version, line 658 of the revised one, the term “requierments” was replaced by “requirements”.

In lines 602 and 606 of the original version, lines 659 and 663 of the revised one, the misspelled terms were corrected to “adjustment” and “laboratory”.

    In line 608 of the original version, the term “immediate” was deleted.

Reviewer 2 Report

The article is well written and is accepted for publication in its current form.

Author Response

Dear Reviewer 2.

Thank you for reviewing our paper and for your positive evaluation of it.

Reviewer 3 Report

 

My major concern is the testing and validation of the proposed methodology and the device. The dataset used for training and validation contains 1.6k images of each class. This dataset is very small for a CNN-based deep model. This will surely introduce over-fitting, which is my second concern. There are no graphs which can give us an idea whether the proposed model is overfitting. My third concern is images vs. image sequences. The proposed model is trained on still images whereas the sensing device produces image sequences, i.e., videos. I am not much convinced that a model trained over still images will perform the same over video streams. The sensing device is also streaming infrared images; however, no results related to infrared images are discussed. The authors should test the proposed approach with a very large-scale dataset as well as over different vehicles and at different device mounting positions, and the related qualitative and quantitative results should be included in the manuscript. Why S-CNN is selected should also be rationalized. The comparison given in Table 5 should only be included if all of the algorithms are tested on the same dataset; otherwise, it does not make sense.

Author Response

Dear Reviewer 3.

Thank you for reviewing our paper and for the helpful recommendations and suggestions about it. We have modified and improved our paper, addressing your recommendations as follows.

  1. My major concern is testing and validation of the proposed methodology and the device. The dataset used for training and validation contains 1.6k images of each class. This dataset is very small for a CNN based deep model.

Thank you for your observation. Accordingly, we have modified our paper, including in the revised version lines 330-334 and 341-342, which explain the use of data augmentation to artificially increase the size of the training and validation datasets, as well as lines 319-324, which describe how the dataset was divided for the training and validation processes.

  2. This will surely introduce over-fitting which is my second concern. There are no graphs which can give us an idea whether the proposed model is overfitting.

Attending your comment, we have modified our paper to include the paragraph in lines 341-344, together with Figure 12 of the revised version. There we explain the techniques used during the training and validation process to alleviate overfitting, and Figure 12 shows that the overfitting is minimal.
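As an illustration of how augmentation multiplies a small dataset, the sketch below applies two label-preserving transforms to a toy nested-list “image”. The actual transforms and parameters used in the paper are not specified here, so these particular operations are assumptions.

```python
# Horizontal flip: mirror each pixel row; a face flipped left-to-right
# keeps the same drowsy/distracted/alert label.
def horizontal_flip(image):
    return [row[::-1] for row in image]

# Brightness shift: add a constant to every pixel, clamped to [0, 255],
# simulating different lighting conditions.
def brightness_shift(image, delta):
    return [[min(255, max(0, px + delta)) for px in row] for row in image]

img = [[10, 20, 30],
       [40, 50, 60]]

# Each augmented copy keeps the original label, multiplying the effective
# dataset size without collecting new samples.
augmented = [img, horizontal_flip(img), brightness_shift(img, 15)]
```

Because the label is preserved, each transform turns one annotated image into several training examples at no labeling cost.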

  3. My third concern is the images vs image sequences. The proposed model is trained on still images whereas the sensing device produces image sequences, i.e., videos. I am not much convinced that a model trained over still images will perform the same over video streams.

Attending your observation, we have included an explanation in lines 151-153 and 194-196, where we mention that the video sequence is first decoded and segmented into frames before carrying out driver drowsiness and distraction detection. Thus, the detection process is carried out frame by frame, as during the training process. We have also modified Figure 2 to clarify the above-mentioned operation.
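The decode-then-classify-per-frame flow can be sketched as below; `classify_frame` is a hypothetical placeholder for S-CNN inference on a single frame, not the actual detector, and the dict-based “frames” stand in for decoded images.

```python
# Hypothetical per-frame classifier: stands in for the CNN inference call
# that was trained on still images.
def classify_frame(frame):
    return "drowsy" if frame["eyes_closed"] else "alert"

# The video stream is decoded into individual frames, and each frame is
# classified exactly as a still training image would be.
def process_stream(frames):
    return [classify_frame(f) for f in frames]

stream = [{"eyes_closed": False}, {"eyes_closed": True}, {"eyes_closed": False}]
states = process_stream(stream)
```

Because inference sees one decoded frame at a time, the input distribution at run time matches the per-image setup used during training.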

  4. The sensing device is also streaming infrared images, however, no results related to infrared images are discussed.

We apologize for the lack of clarity in the first version. The revised version explains clearly, in lines 530-533, the results obtained when the proposed system is required to operate under infrared light conditions. Also, in Table 5 and Figure 23 of the revised version, we show the detection accuracy and confusion matrices obtained when the proposed system is required to determine the alert, distracted, and drowsy states of the driver, with and without glasses, under both visible light and infrared light conditions.

  5. The authors should test the proposed approach with a very large scale dataset as well as over different vehicles and at different device mounting positions and the related qualitative and quantitative results should be included in the manuscript.

Attending your observation, we added lines 615-627, as well as Table 7, in the revised version. Here we present evaluation results under different conditions using the Real-Life Drowsiness Dataset (UTA-RLDD), which is the only database that includes real driving conditions using different vehicles and different optical sensors. Unfortunately, as explained in lines 664-667 of the revised version, it is quite difficult to obtain the ground truth of a larger-scale database for quantitative evaluation, although the results obtained with the public-domain UTA-RLDD database approximate the more realistic conditions reasonably well.

  6. Why S-CNN is selected should also be rationalized.

Attending your observation, we introduced Table 3 in the revised version, which shows the best three configurations of the S-CNN obtained after several validation processes.

  7. The comparison given in Table 5 should only be included if all of the algorithms are tested on the same dataset, otherwise, it does not make sense.

Attending your observation, we include in the revised version Figure 29, which compares the performance of the proposed algorithms when they are required to detect only driver drowsiness, together with their processing rates. We also kept Table 6 (Table 5 in the original version) because it contains other important data about the hardware implementation and real-time processing, which is the principal contribution of the paper. Also, we added a new Section 4.4, “Evaluation of proposed system in the real-world condition” (lines 615-627), in which we compare detection accuracy with other systems using the same UTA-RLDD database.

Round 2

Reviewer 3 Report

Many of  my concerns raised in previous iteration are addressed. I think the paper is in a good form.
