1. Introduction
Benign paroxysmal positional vertigo (BPPV) is the most common peripheral vestibular vertigo. It is a transient vertigo induced when the head moves into a specific position. The pathogenesis of BPPV is widely explained by the theories of canalithiasis and cupulolithiasis. At present, induction tests and the corresponding manual reduction therapy are the primary means of diagnosing and treating BPPV in hospitals, and these methods have achieved clear effects [1, 2]. For example, from August 2012 to August 2014, 175 patients with BPPV were diagnosed in the vestibular function examination room of the ENT Head and Neck Surgery Department of Xiangya Hospital of Central South University [3]. These patients, comprising 53 males and 122 females, were successfully treated by manual reduction. All patients were questioned in detail about their medical history, including vertigo attacks, past history, and family history, and a routine otological examination was performed. All patients were examined with the American VisualEyes infrared nystagmograph, wearing goggles to complete all positional tests in a dark room. Vertical nystagmus was recorded in all 175 patients.
Vertical nystagmus is a common neuro-ophthalmic sign in vestibular medicine. It reflects not only the functional state of the vertical semicircular canals but also the role of the otoliths. According to the literature [4, 5, 6, 7], medical experts can take nystagmus as the key factor in determining the cause of dizziness. At present, the clinical diagnosis of BPPV depends mainly on a preliminary judgment of the involved semicircular canal from the patient’s history, together with a specific positional test to induce nystagmus. The appropriate type of otolith reduction is then chosen so that the patient obtains a better therapeutic effect. Therefore, accurate nystagmus detection and analysis are the premise of a correct BPPV diagnosis and the key to ensuring efficacy.
Traditionally, nystagmus is observed visually by medical experts, which may introduce subjective bias. Visual examination also requires sufficient experience to reach an accurate diagnosis. In addition, people with dizziness may feel uncomfortable keeping their eyes fully open, so their eyes may remain only partially open. Therefore, the observation of nystagmus needs to be strengthened in support of clinical decisions so as to enhance the diagnostic accuracy of medical experts [8]. Meanwhile, a practical method is needed to accurately detect nystagmus and provide results to medical experts.
Electronystagmography (ENG) records the changes in the electric field around the eyeball as it moves. The eyeball is a bipolar sphere: the cornea is at a positive potential relative to the retina, so the two form an axis of potential difference. When the eye is in the straight-ahead position, the potential difference between the cornea and the retina is about 1 mV, and an electric field is formed over the head and face. This electric field changes its spatial phase when the eyeball moves. With electrodes placed on the skin on both sides of the eyeball, a voltage can be measured between the two electrodes. This voltage is amplified by bioelectrical amplification and recorded as a trace that reflects the change in eye position. Visual observation of nystagmus is greatly limited and difficult to analyze quantitatively. Accordingly, Henriksson [9] designed a dedicated electronystagmography machine and applied it to clinical practice. At present, electronystagmography is one of the important means for the localization diagnosis of the nervous system.
ENG has been applied in otology, mainly for the diagnosis of lesions of the vestibular system, and is now widely used in various clinical departments. Recording devices and technology have improved greatly, especially through the use of computers. The analysis of ENG parameters has developed from naked-eye and manual analysis to automatic sampling and quantitative analysis, which has promoted ENG research and improved its application value.
Another method to measure eye movement is video measurement [10, 11, 12]. This method uses cameras to capture eye movement videos and software to track pupil movements. With the development of computer vision technology, video-oculography has become a frequently used method [13]. Syahbana [14] proposed a method to obtain the nystagmus waveform by visual measurement, which estimates eye movement by tracking the position of the patient’s pupil. Accurately estimating the pupil position requires a model of the pupil shape. Existing research generally approximates the pupil as a circle [15], for example with the Hough transform [16]. However, the actual pupil is not a perfect circle but closer to an ellipse, so the circular approximation reduces the accuracy of pupil estimation. To solve this problem, Syahbana [14] proposed a pupil detection and tracking method based on the Mexican hat elliptical pattern, which can improve the accuracy of pupil position estimation.
It is very difficult to detect vertical nystagmus with ENG. Most quantitative observations of human and animal optokinetic nystagmus (OKN) are made in the horizontal plane, and it is generally agreed that ENG recordings of vertical eye movement are contaminated by blinking artifacts. Iijima [17] suggests that high-speed video-oculography (VOG) can replace traditional ENG. If the detection device can be miniaturized and the recording time extended, the system can be widely used in high-speed eye movement image detection. VOG has been widely used in the diagnosis of vertigo. However, the clinical manifestations of vertigo change with time, so VOG is also of use in emergency and telemedicine diagnoses [18, 19]. In such different clinical environments, the challenges of VOG interpretation are considerable. Most emergency doctors have not been trained to use VOG equipment, least of all on patients who are experiencing dizziness. Partly because of these problems, telemedicine solutions have emerged that allow neuroscientists to interpret VOG data quickly and remotely [20]. However, there are not enough neuro-otologists, which makes telemedicine solutions difficult to implement in practice. VOG analysis with automatic nystagmus detection is therefore becoming a potential key solution for the future.
Charoenpong [21] proposed a method to detect involuntary eye movements from eye movement velocity. This method includes three main steps: pupil extraction, eye movement velocity calculation, and nystagmus detection. The accuracy of involuntary eye movement detection was 87.21%, with the errors attributed to inaccurate extraction of the pupil center. In practice, it is difficult to evaluate patients with videonystagmography (VNG) when their pupils are covered by drooping eyelids or eyelashes, and interference from infrared light makes the situation worse [22]. Therefore, a more robust nystagmus detection model is urgently needed.
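The three steps of a velocity-based detector can be illustrated with a minimal sketch. It assumes that pupil extraction has already produced one vertical pupil coordinate (in pixels) per frame. The function names, the 60 fps default, and the velocity threshold are hypothetical choices for illustration and are not taken from [21].

```python
def vertical_velocity(y_positions, fps=60.0):
    """Frame-to-frame vertical pupil velocity in pixels per second."""
    return [(b - a) * fps for a, b in zip(y_positions, y_positions[1:])]


def count_fast_phases(velocities, threshold):
    """Count runs of consecutive samples whose speed exceeds the threshold."""
    count = 0
    in_fast_phase = False
    for v in velocities:
        fast = abs(v) > threshold
        if fast and not in_fast_phase:
            count += 1  # a new fast phase starts here
        in_fast_phase = fast
    return count


# Synthetic sawtooth trace: slow drift of 1 px/frame, then a fast reset.
trace = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1]
vel = vertical_velocity(trace)
beats = count_fast_phases(vel, threshold=120.0)  # hypothetical threshold
```

In this sketch any error in the per-frame pupil coordinate propagates directly into the velocity, which is consistent with the observation above that inaccurate pupil-center extraction is the main source of error.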
With the development of technology, nystagmus detection systems can be built with artificial intelligence (AI). AI is an interdisciplinary approach committed to data-driven experiential learning [23] and is considered a potential solution to some challenges in medical diagnosis. Zhang et al. [24] proposed a nystagmus detection model based on optical flow, which can avoid interference caused by eyelash occlusion and pupil deformation. However, this model only provides a basic framework for nystagmus detection and cannot be directly applied to disease diagnosis. Lim et al. [25] developed a decision support system for BPPV diagnosis using a two-dimensional convolutional neural network (2D-CNN) model. Their results show that the system can detect nystagmus when a large amount of training data is available, but its predictive ability is limited when data annotated by otological experts are insufficient. Lu et al. [26] developed a new method for pupil location and iris distortion detection. This model has been verified in BPPV patients and has high sensitivity and accuracy in nystagmus detection and disease diagnosis. Its first step is to locate the pupil center in each frame with a pupil location algorithm.
Previous research has tried to use deep learning models to predict the pupil position [27, 28]. With the continuous improvement of deep learning, pupil detection has become mainly data-driven. Tonsen et al. [29] designed a deep learning model based on an open-source dataset which contains 66 high-quality, high-speed videos [30] and then used the pre-trained model to mark the original video.
On the basis of previous research, this paper proposes a new deep learning method for recognizing vertical nystagmus so as to further improve detection accuracy. The method consists of vertical nystagmus feature extraction followed by temporal feature recognition. Dilated convolution is used to obtain a larger receptive field and more abstract features of vertical nystagmus. To reduce computational complexity, an improved depthwise-separable convolution structure is proposed that reduces the number of parameters needed for vertical nystagmus feature extraction. An L2 regularization strategy is added to the depthwise-separable convolution structure to mitigate over-fitting. Meanwhile, a convolutional attention mechanism is added to each depthwise-separable convolution operation to better capture the channel features and planar spatial features of vertical nystagmus images. To improve recognition accuracy, an improved GRU recognition model is proposed to capture the vertical nystagmus information at critical moments. This paper is divided into five parts. The first part introduces the background of this research. The second part introduces the basic principle of the vertical nystagmus detection method. The third part introduces the experimental process and results. The fourth part compares the proposed method with other methods. The last part is the conclusion.
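The effect of the building blocks named above on model size and receptive field can be checked with simple arithmetic. The sketch below uses the textbook forms of dilated convolution, depthwise-separable convolution, and L2 regularization, not the improved structure proposed in this paper. The kernel size, channel counts, and regularization weight are hypothetical values chosen only for illustration.

```python
def effective_kernel(k, dilation):
    """Side length covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


k, c_in, c_out = 3, 64, 128  # hypothetical layer sizes
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
span = effective_kernel(k, dilation=2)
penalty = l2_penalty([1.0, -2.0], lam=0.5)  # hypothetical weights and lambda
```

For these example sizes the separable layer needs 8768 weights instead of 73,728, and a 3 × 3 kernel with dilation 2 covers a 5 × 5 window without adding any weights.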
3. Experimental Verification of the Designed Method
The dataset used in this paper is from the Eye & ENT Hospital of Fudan University. The vertical nystagmus training data and test data were annotated by ophthalmologists of the Affiliated Hospital of Fudan University in Shanghai, China. The nystagmus videos were captured with the eye movement recorder of Shanghai Zhiting Medical Technology Co., Ltd. at 640 × 480 pixels and 60 fps. The data came from 1090 patients and comprise 21,743 segments of vertical nystagmus video. The collected data were labeled by the doctors of the hospital to form the training and test datasets; 80% of the samples were used for model training, and 20% were used for test verification. The training and verification results of the proposed model are shown in Figure 10.
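An 80/20 split of the 21,743 segments can be sketched as follows. The source does not state whether the split was made per segment or per patient, nor how it was randomized, so the per-segment shuffle, the function name, and the seed below are assumptions for illustration only.

```python
import random


def split_segments(segment_ids, train_fraction=0.8, seed=0):
    """Shuffle segment ids reproducibly and split them into train and test lists."""
    ids = list(segment_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split repeatable
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# 21,743 video segments, identified here simply by their index.
train_ids, test_ids = split_segments(range(21743))
```

With these numbers the split yields 17,394 training segments and 4349 test segments. If several segments come from the same patient, splitting by patient rather than by segment would avoid the same patient appearing on both sides.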
From Figure 10, it can be seen that the proposed model performs well during training and verification. As the number of training iterations increases, the classification accuracy of the model continues to improve, and the model tends to be stable after 24 iterations. The recognition accuracy of vertical nystagmus once the model was stable during training and verification is shown in Table 1.
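The average accuracy after stabilization can be computed as in the sketch below. The accuracy values are synthetic placeholders, not results from this paper. Counting iterations from 1 and treating iteration 24 as the first stable one are assumptions.

```python
def mean_after(values, start_iteration):
    """Average of per-iteration values from start_iteration (1-based) onward."""
    tail = values[start_iteration - 1:]
    if not tail:
        raise ValueError("start_iteration is beyond the recorded iterations")
    return sum(tail) / len(tail)


# Synthetic curve: 23 early iterations followed by 7 stable ones.
accuracy = [0.60] * 23 + [0.90, 0.92, 0.91, 0.93, 0.92, 0.91, 0.91]
stable_mean = mean_after(accuracy, start_iteration=24)
```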
The LOSS of the model during training and verification is shown in Figure 11.
It can be seen from Figure 11 that the LOSS of the model gradually drops to a stable state during training and verification as the training iterations increase, and that it stabilizes in a low numerical range. To further evaluate the algorithm, Figure 12 shows the confusion matrix, PR curve, and ROC curve.
In Figure 12a, 0 indicates no nystagmus and 1 indicates nystagmus. It can be seen from Figure 12 that the proposed method can identify vertical nystagmus accurately.
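The three evaluation views are all derived from the same four counts. The sketch below builds a binary confusion matrix and computes one point of the PR curve (precision, recall) and one point of the ROC curve (recall, false positive rate) at a given threshold. The labels and scores are synthetic, and the function names are hypothetical; sweeping the threshold over all scores would trace the full curves.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels: 1 = nystagmus, 0 = none."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1  # true label 1, predicted 0
    return tp, fp, tn, fn


def pr_roc_point(y_true, scores, threshold):
    """Precision, recall (true positive rate), and false positive rate."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


# Synthetic labels and classifier scores for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
precision, recall, fpr = pr_roc_point(y_true, scores, threshold=0.5)
```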
To examine the effect of each module of the proposed algorithm on the overall performance of the model, an ablation experiment was carried out. The experimental results are shown in Table 2.
As can be seen from Table 2, introducing the convolutional attention module significantly improved the classification accuracy. This shows that an attention mechanism in the network helps extract nystagmus motion characteristics and spatiotemporal information. The other modules also improve the classification accuracy.
4. Comparison with an Alternative Feature Extraction Method
In the process of model design, we designed another feature extraction method to compare its recognition of vertical nystagmus. The structure of this feature extraction network module is shown in Figure 13.
The feature extraction model mainly includes a convolution layer, residual blocks, and an average pooling layer. The structure of each residual block is shown in Figure 14.
The vertical nystagmus recognition method built on this feature extraction method is named Method 2. For Method 2, we used the same training dataset for training and the same verification set for testing. The training and verification process is shown in Figure 15.
It can be seen from Figure 15 that the recognition accuracy improves steadily during training and verification and tends to be stable as the iterations increase. The model started to be stable after 21 iterations, which shows that this method is feasible. The LOSS during training and verification is shown in Figure 16.
It can be seen from Figure 16 that the LOSS of Method 2 gradually drops to a stable state during training and verification and remains in a small numerical range as the iterations increase. To further evaluate Method 2, Figure 17 shows the confusion matrix, PR curve, and ROC curve.
In Figure 17a, 0 indicates no nystagmus and 1 indicates nystagmus. It can be seen from Figure 17 that Method 2 can also identify vertical nystagmus effectively. The recognition accuracy of Method 2 was then compared with that of the proposed method. The comparison result is shown in Figure 18.
It can be seen from Figure 18 that the recognition accuracy of vertical nystagmus improves steadily and tends to be stable as the training iterations increase, becoming stable after 24 iterations. The average recognition accuracy of the two methods after stabilization is shown in Table 3.
It can be seen from Table 3 that the proposed method has a high recognition accuracy of vertical nystagmus. The vertical nystagmus recognition accuracy of the two methods on the test set is shown in Figure 19.
As can be seen from Figure 19, the recognition accuracy of vertical nystagmus continues to improve and becomes stable as the iterations increase, with the process becoming stable after 24 iterations. The average recognition accuracy of the two methods after stabilization is shown in Table 4.
It can be seen from Table 4 that the proposed method has a high recognition accuracy on the test set after the model is stable.
5. Comparison with Other Methods
The proposed method was compared with Lim’s method [25], Lu’s method [26], and Zhang’s method [24]. All methods were trained on the same training set and tested on the same verification set. The recognition accuracy during training and testing is shown in Figure 20 and Figure 21, respectively.
It can be seen from Figure 20 and Figure 21 that the recognition accuracy of these methods tends to be stable as the iterations increase during training and testing, which indicates that these methods are feasible for vertical nystagmus recognition. After the models are stable, the average recognition accuracy on the training set and verification set is shown in Table 5 and Table 6, respectively.
It can be seen from Table 5 and Table 6 that the proposed method has a relatively high recognition accuracy for vertical nystagmus, which indicates that it is effective for vertical nystagmus recognition. Further, we extracted sample images from the original videos together with the intermediate results of the main processing stages, as shown in Figure 22.
According to the statistics computed by the program, the intermediate feature map of the proposed method has the most activation values.
Compared with other methods, the proposed method does not need to locate the pupil. Zhang’s method needs to calibrate the pupil and combines the Hough transform with trajectory tracking based on template matching. Lu’s method also needs to mark the position of the pupil center and uses a pre-trained model to label the original video. Lim’s method used a center-of-gravity algorithm to track the pupil center. A circular Hough transform was used to detect the elliptical pupil. If the pupil was found, an edge detection and ellipse fitting algorithm was used to locate the pupil center. The proposed method therefore simplifies the processing pipeline. In data processing, Lu’s method used data augmentation and Zhang’s method compressed the video data, whereas the proposed method and Lim’s method used the original video clips directly, which reduces the number of calculation steps. The experimental results indicate that the proposed method can further improve the accuracy of vertical nystagmus recognition. The recognition accuracy may be improved further in future work, which will require the efforts of more researchers.