Article

Automatic Detection of Camera Rotation Moments in Trans-Nasal Minimally Invasive Surgery Using Machine Learning Algorithm

1 Surgical Simulation Research Lab, University of Alberta, Edmonton, AB T6G 2E2, Canada
2 Department of Surgery, University of Alberta, Edmonton, AB T6G 2B7, Canada
* Author to whom correspondence should be addressed.
Information 2025, 16(4), 303; https://doi.org/10.3390/info16040303
Submission received: 3 March 2025 / Revised: 28 March 2025 / Accepted: 9 April 2025 / Published: 11 April 2025
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)

Abstract

Background: Minimally invasive surgery (MIS) is an advanced surgical technique that relies on a camera to provide the surgeon with a visual field. When the camera rotates along its longitudinal axis, the horizon of the surgical view tilts, increasing the difficulty of the procedure and the cognitive load on the surgeon. To address this, we proposed training a convolutional neural network (CNN) to detect camera rotation, laying the groundwork for the automatic correction of this issue during MIS procedures. Methods: We collected trans-nasal MIS procedure videos from YouTube and labeled each frame as either “tilted” or “non-tilted”. The dataset consisted of 2116 video frames, with 497 frames labeled as “tilted” and 1619 frames as “non-tilted”. This dataset was randomly divided into three subsets: training (70%), validation (20%), and testing (10%). Results: The ResNet50 model was trained on the dataset for 10 epochs, achieving an accuracy of 96.9% at epoch 6 with a validation loss of 0.0242 before the validation accuracy began to decrease. On the test set, the model achieved an accuracy of 96% with an average loss of 0.0256. The final F1 score was 0.94, and the Matthews Correlation Coefficient was 0.9168, with no significant bias toward either class. The trained ResNet50 model demonstrated a high success rate in predicting significant camera rotation without favoring the more frequent class in the dataset. Conclusions: The trained CNN accurately detected camera rotation with high precision, establishing a foundation for developing an automatic correction system for camera rotation in MIS procedures.

1. Introduction

Minimally invasive surgery (MIS) is a surgical technique that uses a camera to visualize the surgical field instead of opening up the patient [1]. Paralleling rapid technological advances, minimally invasive techniques are now part of almost every surgical sub-specialty [2]. Compared with open surgery, minimally invasive techniques offer significant benefits, including reductions in bleeding, post-operative pain, recovery time, and length of hospital stay [3]. However, these advantages to the patient come at the cost of a high cognitive load on the surgeon [4,5]. The high workload in MIS can significantly impair surgical performance and jeopardize patient safety. While substantial efforts have been made to assess and evaluate surgeons’ workload, there remains limited evidence on how to effectively provide cognitive support [6,7,8,9]. Recent advances in machine learning techniques have made it increasingly possible to bridge this gap.
As previously mentioned, the surgeon has to rely on a camera to visualize the surgical site; any disruption to the camera affects the visual orientation of the operators, creating conditions for increased workload and operating errors [10]. Since MIS is predominantly a team-based task, the operator manipulates the surgical tools while an assistant holds the camera. It is common for the assistant to fall out of sync with the operator, leading to visual distortion [11,12]. One common type of distortion is horizontal line disruption caused by camera rotation; horizontal line disruptions cause confusion due to the mismatch between the orientation of the actual and the perceived operative field [13,14]. Horizontal line disruptions are often difficult to catch; unfortunately, they are typically recognized too late, once the direction of the surgeon’s intended motion already differs from that of the actual motion, causing damage to internal organ structures [15].
Machine learning refers to a family of computational algorithms that take raw input data and produce the desired output by mimicking the human cognitive process [16]. Convolutional neural networks (CNNs) are a type of machine learning algorithm that excels at image recognition. CNNs were designed to process visual inputs such as images and videos through a sequence of layers, including convolutional layers, pooling layers, and fully connected layers. These layers give CNNs their excellent ability to detect and learn spatial hierarchies and edges [17]. In performance analyses, CNNs have significantly outperformed other deep learning models in image classification [17]. Other neural networks, such as Long Short-Term Memory (LSTM) networks, also have applications in image processing and healthcare. LSTM integrates time as a variable, giving it a “memory” and thereby allowing the sequential analysis of past data to predict the future [18]. However, because of the added variables, LSTM struggles to control the impact of noise in the data and may increase the number of false predictions in complex environments [18]. Generative adversarial networks (GANs) have also been used in image classification and data synthesis; a GAN consists of two neural networks running in concert, a generator that produces synthetic data and a discriminator that classifies data [19]. Despite their diverse applications, GANs suffer from the mode collapse problem, in which the network fails to model the data distribution as a whole and instead focuses on a small segment of it. Additionally, their complexity slows down processing, making them difficult to integrate into a fast-paced surgical environment [20].
There is already some evidence of machine learning implementations in MIS. Al Hajj et al. used a CNN for instrument classification in cholecystectomy and achieved 97.9% accuracy in differentiating between surgical tools [21]. Similarly, Laina et al. used a type of CNN for the binary segmentation of surgical data from colorectal surgery to classify surgical instruments, achieving an accuracy of 88.9% [22]. Beyond tool recognition, machine learning algorithms can also recognize the phase of a surgery. By feeding laparoscopic cholecystectomy data to a CNN, the machine could accurately predict the phase of the surgery 93% of the time, which may help guide novice surgeons during training [23]. In vitro experiments with a CNN model predicting tool trajectory using both image and kinematics data achieved 70.6% accuracy, which could serve as a precautionary warning before an intra-operative mistake [24]. Outside of image recognition, positional trackers placed on the surgical camera can be used to determine its orientation [15]. Machine learning algorithms can also predict flow status, which has been applied to blood flow rate during transfusions in the operating room [25]. Generalized additive models have been implemented to calculate post-operative risks such as sepsis, acute kidney injury, and blood clots for at least 12 surgical specialties [26]. Linear regression, regression trees, support vector regression, and bagged regression trees have all been used to estimate surgical time [27]. LSTM has also been used to predict the correlation between length and resistance in strain sensors [28]. However, there is currently a lack of comprehensive studies analyzing the ability of CNNs to recognize camera orientation during MIS.
We plan to construct a dataset comprising images from MIS with distinguishable features to train a CNN model capable of accurately identifying when the surgical camera is tilted. Based on previous literature on the application of CNNs in MIS, we anticipate that a trained CNN model will detect camera distortions with high accuracy in specific surgical contexts.

2. Materials and Methods

2.1. Environment

This study was conducted at the Surgical Simulation Research Lab at the University of Alberta. Ethics approval was obtained from the Health Research Ethics Board of the University of Alberta prior to the commencement of this study.

2.2. Data Selection and Image Processing

Twenty full-length, publicly available, minimally invasive nasal surgical videos, totaling up to 1000 min, were exported from YouTube (San Bruno, CA, USA). Nasal surgical videos were selected because of their consistent background and reference objects within the nasal cavity. All selected videos were above 480p in quality, had a frame rate above 30 frames per second, and were free of major visual obstructions during the procedure.
The frames within the videos were then normalized to the same dimensions of 1080 px by 1080 px, with a focus on the endoscope content, using DaVinci Resolve (Coral Springs, FL, USA). To determine the degree of camera rotation that is both noticeable and disruptive to the operation, an expert surgeon (YW) was shown multiple frames randomly sampled from the videos and determined that an angle deviation from the vertical line above 30° was noticeable and caused disorientation. Additionally, according to Shepard et al., increasing the angle of rotation in increments of 20° increases reaction time because of the additional workload of mental rotation; a rotation of 30° yields a nearly 2 s reaction time, significantly affecting the surgical workflow [29]. Thus, rotation angles above 30° were labeled “tilted”, frames below this threshold were labeled “non-tilted”, and edge cases were removed (Figure 1).
Subsequently, the videos were manually analyzed using binary classification, categorizing the camera angle in each frame as either “tilted” or “non-tilted”. Irrelevant segments of the videos, such as when the endoscope was temporarily removed from the nostril or when the camera was obstructed, were excluded. Each video was divided into segments containing only “tilted” or “non-tilted” frames, and frames of the same type were merged into two separate videos: one consisting entirely of “tilted” frames and the other of “non-tilted” frames (Figure 2).
To reduce redundancy from frames captured in close temporal proximity, the videos were processed at a frame rate of 25 frames per second, and images were sampled and labeled once per second. Each image was converted to the RGB format and augmented through random horizontal, vertical, or combined flips. The images were then resized to 224 × 224 pixels for standardization.
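For illustration, below is a minimal sketch of this sampling and preprocessing step, assuming OpenCV and torchvision are used; the function name, file name, and augmentation probabilities are placeholders rather than the authors’ actual pipeline.

```python
import cv2
from PIL import Image
from torchvision import transforms

# Augmentation and standardization as described in the text: random horizontal/vertical
# flips, then resizing to 224 x 224 pixels (flip probabilities are assumptions).
frame_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def sample_frames(video_path, fps=25, sample_every_s=1.0):
    """Read a video processed at ~25 fps and yield one RGB frame per second as a PIL image."""
    cap = cv2.VideoCapture(video_path)
    step = int(fps * sample_every_s)  # keep one frame out of every `step`
    idx = 0
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
            yield Image.fromarray(frame_rgb)
        idx += 1
    cap.release()

# Example usage (hypothetical file name):
# tensors = [frame_transform(f) for f in sample_frames("tilted_segments.mp4")]
```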
In total, 2116 samples were collected, with 497 frames labeled as “tilted” and 1619 frames labeled as “non-tilted”. This dataset size supports robust training of the machine learning model and reduces the risk of overfitting.

2.3. Model Selection and Training

Because this study is primarily an image classification task, we used Residual Network 50 (ResNet50) as the CNN model for training. ResNet50 contains residual layers that mitigate the vanishing gradient problem: skip connections allow layers to be bypassed when they do not improve performance and provide a more direct path for backpropagation, allowing the network weights to be adjusted effectively. Additionally, ResNet50 is available as a pre-trained model, meaning it already encodes basic image recognition features, providing a foundation for fine-tuning. For these reasons, we expected ResNet50 to accelerate training and improve image classification accuracy compared with training from scratch.
We split the dataset into three sets, each maintaining a roughly 1:4 ratio between tilted and non-tilted frames: training, validation, and test. A total of 1482 samples (70%) formed the training set, 424 (20%) the validation set, and 210 (10%) the test set (Table 1). Samples were randomly assigned to each set while maintaining the desired ratio between the two classes. The model was trained with the Adam optimizer at a static learning rate of 0.001 and cross-entropy loss (CrossEntropyLoss) as the loss function.
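As an illustration of this training setup, here is a minimal PyTorch sketch (pre-trained ResNet50 with a two-class head, Adam at a fixed learning rate of 0.001, and cross-entropy loss); the dataset objects, batch size, and helper names are assumptions, not the authors’ exact code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ResNet50 with pre-trained weights; replace the final layer with a 2-class head
# ("non-tilted" = 0, "tilted" = 1).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # static learning rate

def run_epoch(loader, train=True):
    """One pass over a DataLoader; returns (mean loss, accuracy)."""
    model.train(train)
    total_loss, correct, n = 0.0, 0, 0
    with torch.set_grad_enabled(train):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            n += labels.size(0)
    return total_loss / n, correct / n

# train_set / val_set are assumed to be torch Datasets built from the labeled frames.
# train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
# val_loader = DataLoader(val_set, batch_size=32)
# for epoch in range(10):
#     train_loss, train_acc = run_epoch(train_loader, train=True)
#     val_loss, val_acc = run_epoch(val_loader, train=False)
```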

3. Results

3.1. Validation Accuracy, Validation Loss, and Statistical Analysis

Validation accuracy and loss were monitored and recorded after each epoch. After the first epoch of training, the training accuracy was around 84.0%, the validation accuracy was around 88.7%, and the validation loss was around 0.0704. Training continued for 10 epochs while training accuracy, validation accuracy, and validation loss were monitored. Training accuracy increased after every epoch, as expected; however, validation accuracy began to plateau around epochs 5 and 6, reaching 96.9% at epoch 6 with a validation loss of 0.0242, before decreasing in the following epochs along with an increase in validation loss (Figure 3). The increase in validation loss signifies that the model was overfitting, and further training should be avoided. Overall, a net 8.25% improvement in validation accuracy was observed before the decline in performance. A one-sample t-test using the validation accuracies of the 10 epochs showed that the validation accuracy is significantly different from random chance (50%) (p < 0.0001). The 95% confidence interval for the difference between the prediction accuracy and random chance is [0.364969, 0.447271].
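The significance test reported above can be reproduced along these lines with SciPy; the per-epoch accuracy values in the sketch are placeholders (only the first-epoch and epoch-6 values come from the text).

```python
import numpy as np
from scipy import stats

# Placeholder per-epoch validation accuracies for 10 epochs; substitute the recorded values.
val_acc = [0.887, 0.91, 0.93, 0.95, 0.96, 0.969, 0.965, 0.96, 0.955, 0.95]

# One-sample t-test of validation accuracy against the 50% chance level.
t_stat, p_value = stats.ttest_1samp(val_acc, popmean=0.5)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")

# 95% confidence interval for the difference between accuracy and chance.
diff = np.array(val_acc) - 0.5
ci = stats.t.interval(0.95, len(diff) - 1, loc=diff.mean(), scale=stats.sem(diff))
print(f"95% CI for (accuracy - 0.5): [{ci[0]:.4f}, {ci[1]:.4f}]")
```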

3.2. Accuracy Parameters and Confusion Matrix

After training, the model’s accuracy on the test set was 96%, with an average loss of 0.0256. The positive predictive value (PPV) was 0.92, the true negative rate (TNR) was 0.97, and the true positive rate (TPR) was 0.96. The final F1 score was 0.94. There was no notable imbalance between the false positive rate (2.4%) and the false negative rate (1.0%) (Figure 4). To account for the effects of the imbalanced dataset, the Matthews Correlation Coefficient (MCC) was calculated to be 0.9168.
$$\mathrm{PPV} = \frac{TP}{TP + FP}$$

$$\mathrm{TNR} = \frac{TN}{FP + TN}$$

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

$$F1\ \mathrm{Score} = \frac{2 \times TP}{2 \times TP + FP + FN}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP = true positive, TN = true negative, FP = false positive, FN = false negative.
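For reference, these metrics can be computed directly from the confusion-matrix counts; the sketch below uses hypothetical counts chosen to be consistent with the reported test-set rates (Figure 4), not the exact values.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute PPV, TNR, TPR, F1, and MCC from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # precision
    tnr = tn / (fp + tn)  # specificity
    tpr = tp / (tp + fn)  # recall / sensitivity
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"PPV": ppv, "TNR": tnr, "TPR": tpr, "F1": f1, "MCC": mcc}

# Example with hypothetical counts consistent with the 210-frame test set:
print(classification_metrics(tp=54, tn=149, fp=5, fn=2))
```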

4. Discussion

The ability of a trained algorithm to analyze and classify intraoperative camera rotation provides an additional avenue for CNN applications within surgery and adds a layer of safety during the procedure. In this study, we explored the reliability of ResNet50 in capturing high-level features in trans-nasal MIS and accurately distinguishing frames with camera tilt. The accuracy of the trained algorithm validated our hypothesis that a CNN can recognize camera tilt in specific surgical settings.

4.1. Dataset

The dataset used in the experiment was generated to minimize several dataset imbalance problems, with factors including class distribution, data density, and dataset shift taken into account [30]. In binary classification, experimental data have shown that a 1:10 ratio between the two classes in the training dataset is alarming and significantly detrimental to feature detection, and a ratio of 1:35 is inadequate for building a model [30]. When the samples between classes are imbalanced, the algorithm learns to predict only the high-frequency class, as this yields the highest accuracy, thereby defeating the purpose of a classification algorithm. In this study, we collected enough samples for both classes to maintain a 1:4 ratio, supporting the validity of the training data. Additionally, to minimize the effect of class imbalance, we generated a dataset large enough (N > 1000) to allow rare features to be detected during training; experimentally, as the size of the training set increases, the error rate due to sample imbalance decreases [31]. Dataset shift occurs when the proportion of samples in each class differs between the training and test sets, which causes minor features not to be learned [32]. To minimize the shift problem, we kept the same class proportions when dividing the data into the training, validation, and test sets.
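One way to keep the class proportions consistent across the three sets is a stratified split; below is a sketch using scikit-learn, where the two-step split and the random seed are illustrative assumptions rather than the authors’ exact procedure.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins: indices of all 2116 frames and their labels (0 = non-tilted, 1 = tilted).
frame_ids = list(range(2116))
labels = [1] * 497 + [0] * 1619

# First carve off the 10% test set, then split the remainder into 70% train / 20% validation,
# stratifying both times so each subset keeps roughly the same 1:4 tilted : non-tilted ratio.
ids_tmp, ids_test, y_tmp, y_test = train_test_split(
    frame_ids, labels, test_size=0.10, stratify=labels, random_state=42
)
ids_train, ids_val, y_train, y_val = train_test_split(
    ids_tmp, y_tmp, test_size=0.20 / 0.90, stratify=y_tmp, random_state=42
)
print(len(ids_train), len(ids_val), len(ids_test))  # sizes close to the 70/20/10 split in Table 1
```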

4.2. Training, Validation, and Test

During training, we monitored both training and validation accuracy, as well as validation loss. The validation metrics were analyzed to track the model’s training status, primarily to avoid overfitting. Overfitting occurs when the algorithm picks up information that is irrelevant to the classification task and negatively influences its performance on unseen data [33]. In our study, we considered the model overfitted after the sixth epoch, where the validation loss reached its minimum.
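As a small illustration of this stopping criterion, the sketch below selects the epoch with the lowest validation loss as the point at which to stop training; all intermediate loss values are placeholders (only the epoch-1 and epoch-6 values come from the Results).

```python
def select_best_epoch(val_losses):
    """Return the index of the epoch with the minimum validation loss;
    training beyond this point is treated as overfitting."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

# Placeholder per-epoch validation losses; epoch 6 is the minimum, as in Figure 3.
val_losses = [0.0704, 0.055, 0.043, 0.035, 0.028, 0.0242, 0.026, 0.031, 0.038, 0.045]
best = select_best_epoch(val_losses)
print(f"Best epoch: {best + 1}, validation loss: {val_losses[best]:.4f}")
```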
The accuracy during the classification of the test set was 96% overall. To analyze the model’s ability to predict each class, we examined the true positive rate and true negative rate and found no significant difference between the two values, indicating that there was no bias in the predictions. The PPV, or precision, was 92%, indicating that, of all the “tilted” guesses, 92% were true positives and 8% were false positives. The TPR, or recall, was 96%, meaning that, out of the 56 tilted frames in the test set, the algorithm correctly identified 54. Combining precision and recall, each “tilted” guess the algorithm provides has a 92% chance of being correct, and the algorithm caught 96% of all the “tilted” moments. The TNR, or specificity, was 97%, meaning that the algorithm correctly identified 97% of all the non-tilted moments. To prevent the algorithm from “cheating” by only guessing the high-frequency class, the F1 score and MCC were calculated to summarize the performance of the algorithm and assess for bias. The F1 score was 0.94 and the MCC was 0.9168, indicating that the model’s performance is well balanced despite the dataset imbalance. When combining AI with medicine, the primary focus has always been to avoid false negatives, as misdiagnosis can lead to delayed treatment [34]. However, in the surgical environment, any sound or alert can also deflect the surgeon’s attention, leading to poorer outcomes [35]. Thus, we believe that, in the case of MIS, a high false positive rate is also detrimental, highlighting the importance of balancing the false positive and false negative rates. Trans-nasal procedures usually have a relatively stable surgical environment under the camera; however, there is some preoperative and intraoperative variability. For example, depending on the type of procedure, the location of the surgical site, and the surgeon’s preference, the operation could be performed through either nostril. In trans-sphenoidal pituitary tumor surgery, the infra-nasal portion of the procedure will depend on the location of the tumor [36]. However, we expect our model to perform stably in either nostril, as the training set contains normalized images flipped during augmentation to ensure that both the left and the right nostrils are represented, eliminating this source of bias.

4.3. Limitations

This study has a few limitations. In the data collection process, we used open-source videos from YouTube, which carry certain biases because they are posted for promotion or demonstration purposes; such videos tend to show less camera tilt, contributing to dataset imbalance. Secondly, the algorithm was trained for only 10 epochs. Increasing the number of epochs could yield a better model, although it would risk overfitting. Lastly, due to the limitations of existing pre-trained algorithms, a stable, recognizable background has to be maintained for learning and classification to occur; thus, the algorithm can only recognize camera tilt in the trans-nasal setting, reducing its generalizability to other types of surgery. Additionally, this model lacks training for minor rotations and would therefore be less sensitive to them. However, while camera rotations are dangerous in MIS, smaller rotations are generally less disruptive to the surgical workflow. Our model aims to identify critical moments during the procedure when unintended camera rotations cause disorientation, making it most sensitive to tilts above 30°. Increasing its sensitivity to minor tilts could introduce a level of distraction that might negatively affect the surgical flow more than the rotations themselves.

4.4. Future Directions

This study provides a solid foundation for a more comprehensive algorithm that can be generalized to all types of MIS. Future algorithms could be integrated into the operating room to alert surgeons when the horizontal line is disrupted, preventing mistakes and leading to better patient outcomes. Additionally, building on the automatic detection of camera tilt, automatic correction algorithms could be developed to allow a seamless surgical flow without alerting the surgeon.

5. Conclusions

Machine learning offers a valuable approach to supporting surgeons. In trans-nasal surgery, camera positioning plays a critical role in proprioception and orientation within the surgical field. In this study, we utilized open-source surgical videos to create a dataset and trained a ResNet50 model to classify moments of camera rotation. Our results demonstrate that the ResNet50 model is effective for the automated detection of camera rotation in trans-nasal procedures. Further refinement of this method will be essential to minimize false positives and prevent unnecessary distractions during surgery, ultimately enhancing its clinical utility.

Author Contributions

Conceptualization: Z.S.Z. and B.Z.; Methodology: Z.S.Z., Y.W. and B.Z.; Formal analysis and investigation: Z.S.Z., Y.W. and B.Z.; Writing—original draft preparation: Z.S.Z.; Writing—review and editing: Z.S.Z., Y.W. and B.Z.; Funding acquisition: Z.S.Z. and B.Z.; Resources: Z.S.Z., Y.W. and B.Z.; Supervision: Y.W. and B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Yun Wu acknowledges the financial support from the China Scholarship Council (Grant No. 202106370009) and the Alberta Innovates Graduate Student Scholarship. Zhong Shi Zhang acknowledges the financial support from the NSERC Undergraduate Student Research Award (USRA) from the Faculty of Medicine and Dentistry of the University of Alberta.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available upon reviewer request.

Acknowledgments

The authors would like to acknowledge the members of the Surgical Simulation Research Lab for their continued support in the data collection, analysis, and drafting of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Fuchs, K.H. Minimally Invasive Surgery. Endoscopy 2002, 34, 154–159. [Google Scholar] [CrossRef] [PubMed]
  2. Shah, P.C.; De Groot, A.; Cerfolio, R.; Huang, W.C.; Huang, K.; Song, C.; Li, Y.; Kreaden, U.; Oh, D.S. Impact of type of minimally invasive approach on open conversions across ten common procedures in different specialties. Surg. Endosc. 2022, 36, 6067–6075. [Google Scholar] [CrossRef]
  3. Darzi, A.; Munz, Y. The Impact of Minimally Invasive Surgical Techniques. Annu. Rev. Med. 2004, 55, 223–237. [Google Scholar] [CrossRef]
  4. Berguer, R.; Smith, W.D.; Chung, Y.H. Performing laparoscopic surgery is significantly more stressful for the surgeon than open surgery. Surg. Endosc. 2001, 15, 1204–1207. [Google Scholar] [CrossRef] [PubMed]
  5. Zheng, B.; Cassera, M.A.; Martinec, D.V.; Spaun, G.O.; Swanström, L.L. Measuring mental workload during the performance of advanced laparoscopic tasks. Surg. Endosc. 2010, 24, 45–50. [Google Scholar] [CrossRef] [PubMed]
  6. Zhang, Z.S.; Wu, Y.; Zheng, B. A Review of Cognitive Support Systems in the Operating Room. Surg. Innov. 2024, 31, 111–122. [Google Scholar] [CrossRef]
  7. Wu, Y.; Zhang, Y.; Zheng, B. Workload Assessment of Operators: Correlation Between NASA-TLX and Pupillary Responses. Appl. Sci. 2024, 14, 11975. [Google Scholar] [CrossRef]
  8. Wu, Y.; Zhang, Z.; Aghazadeh, F.; Zheng, B. Early Eye Disengagement Is Regulated by Task Complexity and Task Repetition in Visual Tracking Task. Sensors 2024, 24, 2984. [Google Scholar] [CrossRef]
  9. Wu, Y.; Zhang, Z.; Zhang, Y.; Zheng, B.; Aghazadeh, F. Pupil Response in Visual Tracking Tasks: The Impacts of Task Load, Familiarity, and Gaze Position. Sensors 2024, 24, 2545. [Google Scholar] [CrossRef] [PubMed]
  10. Nema, S.; Mathur, A.; Vachhani, L. Plug-in for visualizing 3D tool tracking from videos of Minimally Invasive Surgeries. arXiv 2024, arXiv:2401.09472. [Google Scholar] [CrossRef]
  11. Zhang, Y.; Wu, Y.; Li, X.; Turner, S.R.; Zheng, B. Increased team familiarity for surgical time savings: Effective primarily in complex surgical cases. Surgeon 2024, 22, 80–87. [Google Scholar] [CrossRef] [PubMed]
  12. Hashemi, G.; Zhang, Y.; Wu, Y.; He, W.; Sun, L.; Lee, H.; Wilson-Keates, B.; Zheng, B. Perioperative inter-professional education training enhance team performance and readiness. Clin. Simul. Nurs. 2024, 97, 101655. [Google Scholar] [CrossRef]
  13. Wentink, M.; Breedveld, P.; Meijer, D.W.; Stassen, H.G. Endoscopic camera rotation: A conceptual solution to improve hand-eye coordination in minimally-invasive surgery. Minim. Invasive Ther. Allied Technol. 2000, 9, 125–131. [Google Scholar] [CrossRef]
  14. Swanstrom, L.; Zheng, B. Spatial Orientation and Off-Axis Challenges for NOTES. Gastrointest. Endosc. Clin. N. Am. 2008, 18, 315–324. [Google Scholar] [CrossRef] [PubMed]
  15. Abdelaal, A.E.; Hong, N.; Avinash, A.; Budihal, D.; Sakr, M.; Hager, G.D.; Salcudean, S.E. Orientation Matters: 6-DoF Autonomous Camera Movement for Minimally Invasive Surgery. arXiv 2020, arXiv:2012.02836. [Google Scholar] [CrossRef]
  16. El Naqa, I. Machine Learning in Radiation Oncology: Theory and Applications; Springer International Publishing AG: Cham, Switzerland, 2015. [Google Scholar]
  17. Sharma, S.; Guleria, K. Deep Learning Models for Image Classification: Comparison and Applications. In Proceedings of the 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 28–29 April 2022; pp. 1733–1738. [Google Scholar] [CrossRef]
  18. Ackerson, J.; Dave, R.; Seliya, N. Applications of Recurrent Neural Network for Biometric Authentication & Anomaly Detection. Information 2021, 12, 272. [Google Scholar] [CrossRef]
  19. Aggarwal, A.; Mittal, M.; Battineni, G. Generative adversarial network: An overview of theory and applications. Int. J. Inf. Manag. Data Insights 2021, 1, 100004. [Google Scholar] [CrossRef]
  20. Saxena, D.; Cao, J. Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions. ACM Comput. Surv. 2022, 54, 1–42. [Google Scholar] [CrossRef]
  21. Al Hajj, H.; Lamard, M.; Conze, P.-H.; Cochener, B.; Quellec, G. Monitoring tool usage in surgery videos using boosted convolutional and recurrent neural networks. Med. Image Anal. 2018, 47, 203–218. [Google Scholar] [CrossRef]
  22. Laina, I.; Rieke, N.; Rupprecht, C.; Vizcaíno, J.P.; Eslami, A.; Tombari, F.; Navab, N. Concurrent Segmentation and Localization for Tracking of Surgical Instruments. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, 11–13 September 2017. [Google Scholar] [CrossRef]
  23. Kurian, E.; Kizhakethottam, J.J.; Mathew, J. Deep learning based Surgical Workflow Recognition from Laparoscopic Videos. In Proceedings of the 2020 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 10–12 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 928–931. [Google Scholar] [CrossRef]
  24. Zhao, H.; Xie, J.; Shao, Z.; Qu, Y.; Guan, Y.; Tan, J. A Fast Unsupervised Approach for Multi-Modality Surgical Trajectory Segmentation. IEEE Access 2018, 6, 56411–56422. [Google Scholar] [CrossRef]
  25. Peng, Y.; Yang, X.; Li, D.; Ma, Z.; Liu, Z.; Bai, X.; Mao, Z. Predicting flow status of a flexible rectifier using cognitive computing. Expert Syst. Appl. 2025, 264, 125878. [Google Scholar] [CrossRef]
  26. Bihorac, A.; Ozrazgat-Baslanti, T.; Ebadi, A.; Motaei, A.; Madkour, M.; Pardalos, P.M.; Lipori, G.; Hogan, W.R.; Efron, P.A.; Moore, F.; et al. MySurgeryRisk: Development and Validation of a Machine-learning Risk Algorithm for Major Complications and Death After Surgery. Ann. Surg. 2019, 269, 652–662. [Google Scholar] [CrossRef]
  27. Martinez, O.; Martinez, C.; Parra, C.A.; Rugeles, S.; Suarez, D.R. Machine learning for surgical time prediction. Comput. Methods Programs Biomed. 2021, 208, 106220. [Google Scholar] [CrossRef]
  28. Mao, Z.; Kobayashi, R.; Nabae, H.; Suzumori, K. Multimodal Strain Sensing System for Shape Recognition of Tensegrity Structures by Combining Traditional Regression and Deep Learning Approaches. IEEE Robot. Autom. Lett. 2024, 9, 10050–10056. [Google Scholar] [CrossRef]
  29. Shepard, R.N.; Metzler, J. Mental Rotation of Three-Dimensional Objects. Science 1971, 171, 701–703. [Google Scholar] [CrossRef]
  30. Sun, Y.; Wong, A.K.C.; Kamel, M.S. Classification of imbalanced data: A review. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 687–719. [Google Scholar] [CrossRef]
  31. Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, 6, 429–449. [Google Scholar] [CrossRef]
  32. Alaiz-Rodríguez, R.; Japkowicz, N. Assessing the Impact of Changing Environments on Classifier Performance. In Advances in Artificial Intelligence; Bergler, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 13–24. [Google Scholar] [CrossRef]
  33. Salman, S.; Liu, X. Overfitting Mechanism and Avoidance in Deep Neural Networks. arXiv 2019, arXiv:1901.06566. [Google Scholar] [CrossRef]
  34. Olliaro, P.; Torreele, E. Managing the risks of making the wrong diagnosis: First, do no harm. Int. J. Infect. Dis. 2021, 106, 382–385. [Google Scholar] [CrossRef]
  35. Mentis, H.M.; Chellali, A.; Manser, K.; Cao, C.G.L.; Schwaitzberg, S.D. A systematic review of the effect of distraction on surgeon performance: Directions for operating room policy and surgical training. Surg. Endosc. 2016, 30, 1713–1724. [Google Scholar] [CrossRef]
  36. Kanavel, A.B. The removal of tumors of the pituitary body by an infranasal route: A proposed operation with a description of the technic. J. Am. Med. Assoc. 1909, 53, 1704. [Google Scholar] [CrossRef]
Figure 1. Examples of “tilted” versus “non-tilted” data frames. Green line is the normal axis, and red line is the degrees of tilt. Right is labeled “non-tilted”; left is labeled “tilted”.
Figure 2. Flowchart of the data collecting and processing phase.
Figure 3. Validation output assessment metric. Left is epoch versus validation accuracy. Right is epoch versus validation loss.
Figure 4. Confusion matrix generated from the test set assessing the potency of the trained model.
Table 1. Number of video frames used for train set (70%); validation set (20%); test set (10%).

                 Non-Tilt (0)   Tilt (1)
Train Set            1146         336
Validation Set        319         105
Test Set              154          56