#### *4.2. Metrics*

To investigate the effectiveness of the method, we evaluated fall detection at the classifier-output level by measuring error rates, computed from correctly and incorrectly classified images. However, to fully evaluate the algorithm, we also needed to measure the precision and recall metrics [23]. Precision gives the proportion of positive fall identifications that are actually falls, and recall gives the proportion of actual falls that were identified correctly. Unfortunately, these metrics pull in opposite directions, meaning that improving precision typically reduces recall and vice versa:

$$Pr = \frac{TP}{TP + FP} \tag{6}$$

$$Re = \frac{TP}{TP + FN} \tag{7}$$

where *TP*, *FP*, and *FN* denote the numbers of true positives, false positives, and false negatives, respectively.
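Equations (6) and (7) reduce to simple ratios of these counts; a minimal sketch, using the counts later reported for Experiment 1 as an illustration:

```python
# Precision (Equation 6) and recall (Equation 7) from raw detection counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Counts reported for Experiment 1: TP = 390, FP = 0, FN = 1
p = precision(390, 0)   # -> 1.0   (100%)
r = recall(390, 1)      # -> 0.9974... (99.74%)
```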


#### *4.3. Experiment 1: Fall Classification*

In this first experiment, we evaluated the performance of the learning-based fall/nonfall classification algorithm by itself, without considering the person-detection part. We used the ground-truth hand-labeled bounding boxes from our dataset as inputs. Cross-validation was performed on the training set to find the optimal *C* and *γ* values for the RBF SVM classifier. Figure 11 shows the accuracy-level curves for both parameters. We selected *γ* = 2 and *C* = 128, with an accuracy of 99.55%. These values were also used in Experiments 2 and 3.

**Figure 11.** Accuracy-level curves during cross-validation for *γ* and *C* parameters in Experiment 1.
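The parameter selection above amounts to an exhaustive search over exponential grids of *C* and *γ*. A minimal sketch of that procedure follows; `score_fn` is a hypothetical stand-in for the k-fold cross-validation accuracy on the training set, and the grid ranges are assumptions, not the paper's exact values:

```python
from itertools import product

def grid_search(score_fn, Cs, gammas):
    """Evaluate every (C, gamma) pair and return the highest-scoring one."""
    return max(product(Cs, gammas), key=lambda p: score_fn(*p))

# Typical exponential grids for an RBF SVM (assumed ranges)
Cs = [2.0 ** k for k in range(-1, 11)]      # 0.5 ... 1024
gammas = [2.0 ** k for k in range(-7, 4)]   # ~0.0078 ... 8

# Dummy score function peaking at the values selected in the paper,
# standing in for cross-validated accuracy
score = lambda C, g: -((C - 128.0) ** 2 + (g - 2.0) ** 2)
best_C, best_gamma = grid_search(score, Cs, gammas)  # -> (128.0, 2.0)
```

In practice this is exactly what libraries such as scikit-learn's `GridSearchCV` automate, with the score function replaced by k-fold accuracy.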

We summarize the experiment results in a single table to ease comparisons (Table 3). The first row of this table gives the results of this experiment. The fall classifier detected 390 true positives, 1 false negative, and 0 false positives, which means precision and recall of 100% and 99.74%, respectively. These results confirm a good selection of the input parameters to the SVM classifier.

**Table 3.** Performance over testing set *T* in the FPDS dataset.


Several approaches have been proposed to detect falls, with good results. However, only a few of them consider realistic datasets covering different normal daily situations. One of the most complicated situations to resolve is not distinguishing falls from standing but distinguishing falls from resting situations in which the person has a similar pose orientation. Our fall classifier correctly handled both situations in all tested cases. Figure 12 shows two examples.

**Figure 12.** Example images from the testing set where the algorithm differentiates between a fallen person and a resting person.

#### *4.4. Experiment 2: Fall Detection Algorithm*

The next experiment evaluated the performance of the overall end-to-end fall detection algorithm (person detection followed by fall classification). In this case, the person-detection part was performed with the deep-learning-based YOLOv3 method.

To maximize performance, the confidence score conf*bi* of each bounding box provided by YOLOv3 must be above a certain threshold Conf*i*. We selected the threshold value Conf*i* = 0.2 as a good trade-off between recall and precision. Figure 13a marks this point with '\*'.

**Figure 13.** Recall and precision metrics for different thresholds. (**a**) Experiment 2, conf*i*. (**b**) Experiment 3, conf*<sup>r</sup>*.

Note the terminology: subindex "*i*" is used for parameters assigned to the image taken directly from the camera, to differentiate them from the parameters assigned to the rotated images, which carry subindex "*r*" and are explained in the next subsection.

We used Intersection over Union (IoU) as an evaluation metric to compare the bounding boxes provided by the fall detection algorithm against the ground-truth hand-labeled images from our dataset. To set a threshold value for the IoU, called IoU*i*, we analyzed how this value affects the precision and recall metrics. We observed that these metrics were almost independent of the selected threshold; in that case, we set IoU*i* = 0.2. The values Conf*i* = 0.2 and IoU*i* = 0.2 were also used in Experiment 3.
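The IoU criterion used here is the standard ratio of the intersection area to the union area of two boxes. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

A detection matching the ground truth with `iou(...) >= 0.2` would then count as a true positive under the threshold chosen above.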

As in the preceding subsection, the second row of Table 3 shows the results on testing set *T* for this experiment. The algorithm detected 304 true positives, 87 false negatives, and 9 false positives, giving precision and recall of 97.12% and 77.74%, respectively. The false alarms were mainly caused by errors in the person-detection step of the overall algorithm. Therefore, compared with the performance of the SVM classifier alone, overall performance is worse. YOLOv3 was trained on the Common Objects in Context (COCO) dataset [49], which does not contain enough persons in lying positions to train the CNN to recognize persons in that position with high accuracy.

#### *4.5. Experiment 3: Fall Detection with Pose Correction*

YOLOv3's performance in detecting persons in lying positions would improve with customized training on a dataset containing a large number of persons in that position. However, building such a dataset is costly and time-consuming, and given the current lack of public datasets with this characteristic, such training is not possible. The FPDS dataset proposed in this paper is useful for evaluating the robustness of the algorithm in different situations but does not contain enough images for customized training of YOLOv3.

The small size of the training set represents a significant problem for the overall algorithm, as we analyzed in the previous experiment. Many approaches have therefore tried to reduce the need for large training sets. In this article, we investigated how the person's pose orientation affects the efficiency of the approach. The experiments show that adding a simple pose correction to YOLOv3 improves performance without the need for new customized training. The pose-correction algorithm is illustrated in Figure 14: we run three separate YOLOv3 networks, one on the initial image and two more on the images rotated by 90 and 270 degrees.

**Figure 14.** Flowchart of fall detection with pose correction.
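Detections found in a rotated image must be mapped back to original-image coordinates before they can be compared or merged. A sketch of this mapping, assuming NumPy's `rot90` (counter-clockwise) convention; the helper name is ours, and the 270° case follows analogously by swapping rows for columns with the original height:

```python
import numpy as np

def unrotate_box_ccw90(box, orig_w):
    """Map a box (x1, y1, x2, y2) detected in the 90-degree-CCW-rotated
    image (np.rot90 convention) back to original-image pixel coordinates.

    Under np.rot90, original pixel (row r, col c) lands at rotated
    (row orig_w - 1 - c, col r); this function applies the inverse
    to the two box corners."""
    x1, y1, x2, y2 = box
    return (orig_w - 1 - y2, x1, orig_w - 1 - y1, x2)
```

Boxes from the 90°- and 270°-rotated passes, once mapped back like this, live in the same coordinate frame as the detections on the initial image and can be compared by IoU.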

For better optimization, we analyzed whether the correct confidence-score threshold for the rotated images, called conf*<sup>r</sup>*, was the same as the one used for the image taken directly from the camera, conf*<sup>i</sup>*. Figure 13b shows the precision and recall metrics for different thresholds. In this case, we obtained the best trade-off for a value of conf*r* = 0.15, keeping the value of conf*i* = 0.2 for the image taken directly from the camera.

This modified person-detection algorithm can detect the same fall more than once. To identify whether two bounding boxes belong to the same fall, we established a new IoU threshold, called IoU*<sup>r</sup>*. If the boxes overlap above this threshold, they are considered the same detection and the algorithm keeps only one; otherwise, it keeps both. A threshold of IoU*r* = 0.1 provides a good trade-off between the precision and recall metrics.
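The duplicate-removal step can be sketched as a greedy, NMS-style pass over the pooled detections. Keeping the highest-confidence box of each overlapping group is our assumption here; the text only states that a single box is retained:

```python
def dedup_detections(dets, iou_thr=0.1):
    """Greedily drop duplicate detections of the same fall.
    `dets` is a list of ((x1, y1, x2, y2), confidence) pairs pooled from
    the initial and rotated images (already in a common coordinate frame).
    Which duplicate to keep is an assumption: we keep the most confident."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    for box, conf in sorted(dets, key=lambda d: -d[1]):
        # Keep this box only if it does not overlap any already-kept box
        # above the IoU_r threshold
        if all(iou(box, kb) <= iou_thr for kb, _ in kept):
            kept.append((box, conf))
    return kept
```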

Table 4 shows three examples from the testing set of the FPDS dataset with their detections in the initial and rotated images. In the first row, we can observe that the lying person was detected in the two rotated images but not in the initial one. In the second example, the person was detected only in the 270°-rotated image. In the last example, with two fallen persons, one fall was detected in all three images, while the other was detected only in the 270°-rotated image.


**Table 4.** Fall detection examples with pose correction.

Thanks to the pose correction, the overall method improved considerably in recall while keeping almost the same precision. Results are shown in the third row of Table 3: the method detected 360 true positives, 31 false negatives, and 17 false positives, giving precision and recall of 95.49% and 92.07%, respectively.

#### *4.6. Evaluation 1: Relocation for Doubtful Cases*

One of the main points of the proposed approach is the ability of the robot to relocate itself when fall detection is doubtful. The relocation algorithm moves the robot depending on the size and position of the detected bounding box. In all cases, the robot moves to center the possible fall detection with proper dimensions. Figure 15 shows three different cases where the robot needs to move to get a better picture of the person.

**Figure 15.** Robot relocation in three different testing examples of the FPDS dataset.

#### *4.7. Evaluation 2: Other Datasets*

To evaluate the detection effectiveness of our algorithm against alternative approaches in the literature, we tested it on the public Intelligent Autonomous Systems Laboratory Fallen Person Dataset (IASLAB-RGBD) [52], which is close enough to our own. This dataset was generated using a Kinect One V2 camera mounted on a mobile robot 1.16 m above the floor. We used the static portion of the dataset, with 374 images, 363 falls, and 133 nonfalls. Despite our camera being mounted 76 cm above the floor, and the training set having been built using the same splits of the FPDS dataset as in the other test experiments, results were quite satisfactory in Experiment 1, with precision and recall of 99.45% and 100%, respectively. However, detection was not as good in Experiments 2 and 3, as can be observed in Table 5. These results indicate a good selection of the input feature vector to the SVM, which makes the classifier almost independent of the camera setup. Although a direct comparison is impossible, these results demonstrate the good performance of our method on other datasets containing images that differ considerably from the examples in the training set.

**Table 5.** Performance over the Intelligent Autonomous Systems Laboratory Fallen Person Dataset (IASLAB-RGBD).


#### **5. Conclusions and Future Work**

In this paper, we presented a low-cost system for detecting falls in elderly people and people with functional disabilities who live alone. The system is based on an assistive patrol robot that can serve one or more people. Our objective was to implement a vision-based fall detection system on our robot that acquires image data, detects falls, and alerts emergency services. To detect falls with an easy, fast, and flexible end-to-end solution, we proposed a two-step algorithm that combines a CNN for person detection with an SVM for fall classification. One of the main contributions of this paper is finding the combination of features for the SVM-based classifier that provides the best performance for the design requirements. Results obtained from the different experiments indicate that the system has a high success rate in fall detection and can correct the position of the robot in case of doubt.

It is important to remark that, compared with existing fall detection approaches that show weakness in distinguishing between a resting position and a real fall scene, our fall classification algorithm could correctly detect both situations in all tested cases. Another important result to highlight is the ability to work correctly and detect fall situations with persons of different heights.

Since one of the goals of this work was to run the fall detection algorithm in real time, we needed to evaluate its execution time. In our case, the only time-consuming task was the YOLOv3 person detection, and the resulting runtime is more than acceptable for a real-time fall detection system.

We evaluated the robustness of the method using a realistic dataset called FPDS, which is publicly available and a contribution of this paper. The main features of this dataset are eight different scenarios, various person sizes, more than one person in an image, several lying-position perspectives and resting persons.

Additionally, we tested our algorithm on other datasets (training was done using the FPDS dataset). The results were quite satisfactory in fall classification, which shows the near-independence of the algorithm from the camera setup.

Future work includes improving occlusion detection and exploring the possibility of merging person detection and fall classification into a single CNN using one or two different classes.

**Author Contributions:** Conceptualization, S.M.-B.; methodology, S.M.-B., S.L.-A., and C.I.-I.; software, C.I.-I.; validation, C.I.-I., S.M.-B., and P.M.-M.; writing—original-draft preparation, P.M.-M.; supervision, S.M.-B.

**Funding:** This work was partially supported by the Alcalá University research program through projects CCG2018/EXP-061 and TEC2016-80326-R from the "Ministerio de Economía, Industria y Competitividad".

**Conflicts of Interest:** The authors declare no conflict of interest.
