Incremental Spatio-Temporal Augmented Sampling for Power Grid Operation Behavior Recognition
Abstract
1. Introduction
- We propose a novel Incremental Spatio-Temporal Augmented Sampling (ISAS) framework for robust power grid operation behavior recognition in dynamic environments.
- We design a Feature-Enhancement Fusion Module (FEFM) that couples multi-scale spatio-temporal augmentation with cross-scale aggregation to strengthen feature robustness (a sketch follows this list).
- We introduce a Selective Replay Mechanism (SRM) that uses a dual-criteria strategy for memory sample selection, mitigating catastrophic forgetting (a second sketch follows this list).
- We validate the proposed method on a real-world power grid behavior dataset, demonstrating superior performance across multiple meteorological scenarios.
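The two modules above can be pictured with small illustrative sketches. The first is a minimal PyTorch sketch of a feature-enhancement fusion module in the spirit of the FEFM bullet: parallel dilated temporal convolutions stand in for multi-scale spatio-temporal augmentation, and a 1x1 projection over their concatenation stands in for cross-scale aggregation. The class name `FEFMSketch`, the scales (1, 2, 4), the channel width, and the residual fusion are all assumptions made for the example, not the paper's design.

```python
import torch
import torch.nn as nn


class FEFMSketch(nn.Module):
    """Multi-scale temporal augmentation with cross-scale aggregation (sketch)."""

    def __init__(self, channels: int = 256, scales=(1, 2, 4)):
        super().__init__()
        # One temporal convolution branch per scale; the dilation widens the
        # receptive field so each branch covers a different temporal extent.
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=s, dilation=s)
            for s in scales
        )
        # Cross-scale aggregation: concatenate the branches along the channel
        # axis, then project back to the original width with a 1x1 convolution.
        self.fuse = nn.Conv1d(channels * len(scales), channels, kernel_size=1)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels) clip features from a video backbone.
        y = x.transpose(1, 2)                     # (batch, channels, time)
        multi = torch.cat([branch(y) for branch in self.branches], dim=1)
        fused = self.fuse(multi).transpose(1, 2)  # back to (batch, time, channels)
        return self.norm(fused + x)               # residual keeps the input path
```

The module is shape-preserving by construction, e.g. `FEFMSketch()(torch.randn(2, 32, 256))` returns a `(2, 32, 256)` tensor, so a block of this kind can sit between a backbone and a recognition head.

The second is a minimal sketch of dual-criteria replay selection in the spirit of the SRM bullet. The actual criteria and their weighting are not stated here, so the sketch assumes representativeness (closeness to the class-mean feature) and diversity (distance to exemplars already in memory); `select_memory` and `alpha` are illustrative names.

```python
import numpy as np


def select_memory(features: np.ndarray, budget: int, alpha: float = 0.5) -> list:
    """Greedily pick `budget` exemplar indices from an (n, d) feature matrix."""
    # Criterion 1: representativeness, the negative distance of each sample
    # to the class-mean feature (samples near the mean score higher).
    rep = -np.linalg.norm(features - features.mean(axis=0), axis=1)
    chosen = []
    for _ in range(budget):
        if chosen:
            # Criterion 2: diversity, the distance of each sample to its
            # nearest already-chosen exemplar (farther samples score higher).
            div = np.linalg.norm(
                features[:, None, :] - features[chosen][None, :, :], axis=2
            ).min(axis=1)
        else:
            div = np.zeros(len(features))
        score = alpha * rep + (1.0 - alpha) * div
        score[chosen] = -np.inf  # never re-select a stored sample
        chosen.append(int(score.argmax()))
    return chosen
```

A selection of this kind would be rerun after each incremental task, replaying the chosen exemplars alongside new data to counter catastrophic forgetting.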
2. Methods
2.1. Spatio-Temporal Feature Extraction Module
2.2. Spatio-Temporal Feature-Enhancement Fusion Module
2.3. Selective Replay Mechanism
2.4. Model Optimization
3. Experimental Results
3.1. Dataset and Evaluation Metrics
3.2. Implementation Details
3.3. Comparative Results and Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Meng, L.; Ban, G.; Liu, F.; Qiu, W.; He, D.; Zhang, L.; Wang, S. Classification of Violations in Power Grid Operations Based on Cross-domain Few-Shot Learning. Power Big Data 2024, 27, 69–76.
- Wang, J.; Sun, L.; Du, N.; Hua, C. The Functions and Applications of Live-line Operation Robots in Distribution Networks. Electr. Power Energy 2024, 45, 518–520.
- Sun, Y.; Liu, Y.; Han, Y. Research on Safety Supervision Technology for Power Grid Operators Based on Wearable Sensors and Video Surveillance. Electr. Age 2018, 45–46. Available online: https://kns.cnki.net/kcms2/article/abstract?v=IMWkopLkOPXW_5EjYSgpPWEUdRHZwPWVzhcuJ7FYvdI8SR78Ll29rMUC1ZHuUAPyyqc79swY2xZuvZpF2naVE8tXTab6WnTpx9MGrZefCCe5Vx3J4M4Q6z_DwjWa1V7ma2nBkq3qgm6f1D8ze77fKwz6qUWZsslANIJAoVLLWfI=&uniplatform=NZKPT&language=CHS (accessed on 1 August 2025).
- Wu, T. Research on On-Site Safety Management of State Grid Xiaogan Power Supply Company. Master’s Thesis, Huazhong University of Science and Technology, Wuhan, China, 2022.
- Cen, J.; Weng, Z.; Lin, T.; Li, G.; Yang, L. Detection Method of Violations in Power Grid Operation Sites Based on Machine Vision. Electr. Technol. Econ. 2025, 307–308+315. Available online: https://kns.cnki.net/kcms2/article/abstract?v=IMWkopLkOPUkJXTJIt04G0SE_o9sx1w_bHTk0Q2z8R4E4HP90PMFTFjWpJJfgH0Ij7X01EkIQcfNTT8E8nGwsg4MnJkKjeaB39OHpi-5TTJEMEQtgooYaMni3ecwZB5QiDdGlH4RNj4DOIrekrXzcCuPu0WPvMueen7_ES0L2Yooytgzee_wEPzTFI4Oiwiz&uniplatform=NZKPT&language=CHS (accessed on 1 August 2025).
- Xing, Z.; Dai, Q.; Hu, H.; Chen, J.; Wu, Z.; Jiang, Y.G. SVFormer: Semi-supervised video transformer for action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18816–18826.
- Xiao, C.; Lei, Y.; Liu, C.; Wu, J. Mean teacher-based cross-domain activity recognition using WiFi signals. IEEE Internet Things J. 2023, 10, 12787–12797.
- Ban, G.; Fu, L.; Jiang, L.; Du, H.; Li, A.; He, Y.; Zhou, J. Dynamic Risk Identification of Personnel Behavior in Two-stage Complex Operations Based on Image Screening. Power Big Data 2024, 27, 58–69.
- Song, X.; Yao, X. Human Behavior Recognition Based on Multi-Descriptor Feature Coding. Comput. Technol. Dev. 2018, 28, 17–21.
- Shi, A.; Cheng, Y.; Cao, X. Human Behavior Recognition Method Combining Codebook Optimization and Feature Fusion. Comput. Technol. Dev. 2018, 28, 107–111.
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231.
- Feichtenhofer, C. X3D: Expanding architectures for efficient video recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 203–213.
- Tian, Q.; Miao, W.; Zhang, L.; Yang, Z.; Yu, Y.; Zhao, Y.; Yao, L. STCA: An action recognition network with spatio-temporal convolution and attention. Int. J. Multimed. Inf. Retr. 2025, 14, 1.
- Lin, J.; Gan, C.; Han, S. TSM: Temporal shift module for efficient video understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7083–7093.
- Zhao, C.; Feng, X.; Cao, R. Video Behavior Recognition Based on Spatio-Temporal Dual-Stream Feature Enhancement Network. Comput. Eng. Des. 2025, 46, 871–878.
- Chen, D.; Chen, M.; Wu, P.; Wu, M.; Zhang, T.; Li, C. Two-stream spatio-temporal GCN-transformer networks for skeleton-based action recognition. Sci. Rep. 2025, 15, 4982.
- Lu, Y.; Zhou, X.; Zhang, S.; Liang, G.; Xing, Y.; Cheng, D.; Zhang, Y. Review of Continuous Learning Methods Based on Pre-Training. Comput. Eng. 2025, 1–17.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Dosovitskiy, A.; Fischer, P.; Ilg, E.; Hausser, P.; Hazirbas, C.; Golkov, V.; Van Der Smagt, P.; Cremers, D.; Brox, T. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2758–2766.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Carreira, J.; Zisserman, A. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6299–6308.
- Zhang, C.L.; Wu, J.; Li, Y. ActionFormer: Localizing moments of actions with transformers. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 492–510.
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1, Montreal, QC, Canada, 8–13 December 2014; pp. 568–576.
- Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
- Shi, D.; Zhong, Y.; Cao, Q.; Ma, L.; Li, J.; Tao, D. TriDet: Temporal action detection with relative boundary modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18857–18866.
- Jiang, Y.; Liu, J.; Roshan Zamir, A.; Toderici, G.; Laptev, I.; Shah, M.; Sukthankar, R. THUMOS’14: ECCV Workshop on Action Recognition with a Large Number of Classes. 2014. Available online: http://crcv.ucf.edu/THUMOS14/ (accessed on 1 August 2025).
- Shao, J.; Wang, X.; Quan, R.; Zheng, J.; Yang, J.; Yang, Y. Action sensitivity learning for temporal action localization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 4–6 October 2023; pp. 13457–13469.
- Cheng, F.; Bertasius, G. TALLFormer: Temporal action localization with a long-memory transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 503–521.
- Liu, X.; Wang, Q.; Hu, Y.; Tang, X.; Zhang, S.; Bai, S.; Bai, X. End-to-end temporal action detection with transformer. IEEE Trans. Image Process. 2022, 31, 5427–5441.
| Data | Number of Videos | Avg. Video Duration (s) | Avg. Action Duration (s) |
|---|---|---|---|
| Training Set | 3319 | 805.9 | 57.4 |
| Validation Set | 700 | 876.2 | 56.3 |
| Test Set | 697 | 822.5 | 50.5 |
| Method | Scenario | Climbing (%) | Verification (%) | Grounding (%) | mAP (%) | Inter-Scenario mAP Diff. (%) |
|---|---|---|---|---|---|---|
| TS-CNN [23] | Sunny | 85.66 | 81.63 | 80.06 | 82.45 | 8.52 |
| | Cloudy | 80.75 | 77.79 | 76.68 | 78.41 | |
| | Rainy | 76.84 | 73.86 | 71.08 | 73.93 | |
| EWC [24] | Sunny | 88.24 | 83.82 | 84.78 | 85.61 | 7.06 |
| | Cloudy | 84.87 | 79.68 | 81.25 | 81.93 | |
| | Rainy | 81.58 | 76.37 | 77.69 | 78.55 | |
| TriDet [25] | Sunny | 89.74 | 85.63 | 84.81 | 86.73 | 5.28 |
| | Cloudy | 87.15 | 83.98 | 81.63 | 84.25 | |
| | Rainy | 84.03 | 81.76 | 78.57 | 81.45 | |
| SVFormer [6] | Sunny | 90.12 | 86.75 | 85.26 | 87.38 | 4.92 |
| | Cloudy | 87.43 | 85.01 | 82.58 | 85.01 | |
| | Rainy | 84.65 | 83.26 | 79.47 | 82.46 | |
| Ours | Sunny | 93.67 | 89.59 | 86.13 | 89.80 | 2.74 |
| | Cloudy | 91.89 | 88.32 | 85.11 | 88.44 | |
| | Rainy | 90.68 | 86.87 | 83.64 | 87.06 | |
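As a consistency check on the table, each per-scenario mAP equals the mean of the three per-class scores, and the inter-scenario difference equals the gap between the Sunny and Rainy mAP; for our method, for example:

$$
\mathrm{mAP}_{\mathrm{Sunny}} = \tfrac{1}{3}(93.67 + 89.59 + 86.13) \approx 89.80,
\qquad 89.80 - 87.06 = 2.74.
$$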
| Method | Sunny mAP (%) | Cloudy mAP (%) | Rainy mAP (%) | Avg. mAP (%) |
|---|---|---|---|---|
| Baseline | 83.67 | 79.13 | 74.25 | 79.02 |
| +FEFM | 85.19 | 83.35 | 78.62 | 82.39 |
| +SRM | 86.76 | 85.27 | 83.79 | 85.27 |
| Ours | 89.80 | 88.44 | 87.06 | 88.43 |
| Ours * | 88.15 | 86.78 | 85.64 | 86.86 |
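The Avg. mAP column is likewise the mean over the three scenarios; for the full model,

$$
\tfrac{1}{3}(89.80 + 88.44 + 87.06) \approx 88.43,
$$

which places the combined contribution of FEFM and SRM at roughly 9.4 mAP points over the baseline (88.43 vs. 79.02).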
Meng, L.; He, D.; Ban, G.; Guo, S. Incremental Spatio-Temporal Augmented Sampling for Power Grid Operation Behavior Recognition. Electronics 2025, 14, 3579. https://doi.org/10.3390/electronics14183579