China’s distribution networks predominantly operate with an ungrounded neutral or with arc suppression coil grounding [1], and approximately 80% of their faults are single-phase-to-ground faults (SPGFs) [2]. If not cleared promptly and effectively, such faults may compromise the stability of the entire power system and pose serious safety hazards to personnel [3]. However, existing fault line selection methods often lack the accuracy and reliability required for practical deployment [4]. Developing accurate and reliable fault line selection methods for distribution networks has therefore become a critical research focus.
According to the characteristic quantities they employ, fault line selection methods can be divided into three main types [5]: (i) transient characteristic methods, (ii) steady-state characteristic methods, and (iii) injection-based methods. Steady-state characteristic methods typically identify the faulty line by analyzing variations in the steady-state zero-sequence voltage and admittance. However, the integration of distributed energy resources and changes in network topology make steady-state features harder to distinguish. In addition, arc suppression coils compensate the capacitive fault current, which can make the steady-state zero-sequence current of the faulty line similar in magnitude and direction to those of the healthy lines and thus significantly affect diagnostic results [6]. Transient characteristic methods exploit the abrupt changes in transient signals, such as zero-sequence current and residual voltage, that occur at fault inception [7]. Although these methods can be effective, their performance degrades under high-resistance grounding, where the transient components are strongly damped. Injection-based methods require additional equipment and also struggle under high-resistance conditions, which limits their practical applicability [8].
Reference [9] proposes a fault line selection method based on a one-dimensional convolutional neural network (1DCNN) and a bidirectional long short-term memory network (BiLSTM). It constructs a sequence fusion feature vector from the transient zero-sequence currents of multiple lines, normalizes the data, and extracts local features with the 1DCNN. The BiLSTM then learns contextual dependencies, and a final SoftMax layer performs classification. Despite its advantages, this method incurs high computational complexity on long time series. Reference [10] introduces a CNN-based model enhanced by an attention mechanism. The S-transform first converts time-series zero-sequence currents into two-dimensional matrices suitable as CNN input, and attention layers are added to improve classification accuracy and robustness. However, the method generalizes poorly on small datasets. Reference [11] proposes a traveling wave-based fault identification approach that acquires the zero-mode current traveling waves of all feeders in the distribution system. By determining reference lines and applying the cross-wavelet transform, it constructs several time–frequency sets that isolate fault-related information, but its applicability is limited in complex network topologies.
With the rapid advancement of artificial intelligence, deep learning-based methods have become a promising direction for fault diagnosis in distribution systems. Reference [12] uses a time–frequency matrix as the input to a ResNet model for fault line identification. Reference [13] presents a fault line selection method based on the Hausdorff distance between transient currents. It extracts the 5th and 7th harmonic components of the zero-sequence currents to distinguish faulty from healthy lines, enabling accurate identification, but it is highly sensitive to noise. References [14,15] feed feature images of zero-sequence current signals directly into deep learning models but neglect the global temporal characteristics of the signals, which limits their performance in complex scenarios. Following the success of Transformer architectures in time-series modeling, advanced structures such as Informer and the Temporal Fusion Transformer (TFT) have demonstrated superior long-range dependency modeling in power system data analysis [16]. Meanwhile, the Vision Transformer (ViT) and its variants have been applied to time–frequency image modeling, substantially improving global feature extraction from images. Furthermore, cross-modal attention mechanisms have shown promising robustness and generalization in multimodal data fusion tasks [17].
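The following minimal Python sketch makes the distance-based idea of [13] concrete. It is not the method of [13]. It treats each line's transient zero-sequence current as a set of (normalized time, normalized amplitude) points. It then applies a hypothetical decision rule: the line whose waveform has the largest total Hausdorff distance to all other lines is flagged as faulty. The harmonic extraction step of [13] is omitted, and the names `hausdorff` and `select_faulty_line` are illustrative.

```python
import numpy as np


def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between point sets a (n, d) and b (m, d)."""
    # Pairwise Euclidean distances between every point of a and every point of b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Directed distances: the worst nearest-neighbour distance in each direction.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


def select_faulty_line(currents: np.ndarray, t: np.ndarray) -> int:
    """Hypothetical rule: flag the line farthest (in total) from all other lines.

    currents: transient zero-sequence currents, shape (n_lines, n_samples).
    t: sample times, shape (n_samples,).
    """
    # Normalise time to [0, 1] and amplitudes by the global peak, so that the
    # relative magnitudes and polarities of the lines are preserved.
    tn = (t - t[0]) / (t[-1] - t[0])
    scale = np.abs(currents).max()
    sets = [np.column_stack((tn, c / scale)) for c in currents]
    scores = np.zeros(len(sets))
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            h = hausdorff(sets[i], sets[j])
            scores[i] += h
            scores[j] += h
    return int(np.argmax(scores))


# Usage with synthetic waveforms: three healthy lines share one polarity, while the
# faulty line (index 3) has opposite polarity and the summed amplitude.
t = np.linspace(0.0, 0.01, 200)
transient = np.exp(-300 * t) * np.sin(2 * np.pi * 1000 * t)
currents = np.vstack([0.3 * transient, 0.5 * transient, 0.4 * transient, -1.2 * transient])
print(select_faulty_line(currents, t))  # 3
```

The synthetic faulty waveform loosely mimics an ungrounded-neutral system, where the faulty line carries the sum of the healthy lines' capacitive currents with reversed polarity. Measured waveforms would additionally require noise handling, which is the weakness attributed to [13].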
To overcome these limitations, this paper introduces a fault line selection method based on a multimodal feature fusion strategy. First, a hybrid time–frequency analysis generates comprehensive time–frequency representations. It combines the Short-Time Fourier Transform (STFT), which is free of cross-term interference but has limited time–frequency resolution, with the Wigner–Ville Distribution (WVD), which offers high resolution but suffers from cross-terms. A dual-branch architecture then concurrently extracts temporal features from the zero-sequence current signals and spatial features from the derived time–frequency images. The image branch adopts RepLKNet, whose large convolutional kernels capture cross-regional structural features, while the time-series branch employs a BiGRU network enhanced with global attention to model temporal dependencies. Finally, a multimodal feature fusion network integrates both modalities to identify the faulty line. Simulation results demonstrate that the proposed method outperforms traditional single-modal models in accuracy, robustness, and generalization.
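The hybrid STFT–WVD step can be illustrated with a minimal NumPy sketch. The fusion rule is not specified here, so the sketch makes a hypothetical choice. Both maps are computed on a common grid: one column per sample and `n_freq` bins from 0 to fs/2. Each map is min–max normalized, and the two are combined by a weighted average with weight `alpha`. Negative WVD values are clipped to zero. All function names and parameter values are illustrative assumptions, not the settings of the proposed method.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def stft_power(x: np.ndarray, n_freq: int, win_len: int) -> np.ndarray:
    """|STFT|^2 with hop 1 and a Hann window; returns shape (n_freq, len(x)).

    An FFT of size 2 * n_freq is used and only bins 0..n_freq-1 (0 to fs/2) are kept,
    so bin k maps to k * fs / (2 * n_freq). Requires win_len <= 2 * n_freq.
    """
    half = win_len // 2
    padded = np.pad(x, (half, win_len - half - 1))  # one frame centred on each sample
    frames = np.lib.stride_tricks.sliding_window_view(padded, win_len)
    spec = np.fft.fft(frames * np.hanning(win_len), n=2 * n_freq, axis=1)
    return (np.abs(spec[:, :n_freq]) ** 2).T


def wvd(x: np.ndarray, n_freq: int) -> np.ndarray:
    """Discrete Wigner-Ville distribution of the analytic signal; shape (n_freq, len(x)).

    The lag product z[t+m] * conj(z[t-m]) doubles the frequency, so bin k maps to
    k * fs / (2 * n_freq), matching the grid of stft_power.
    """
    z = analytic_signal(x)
    n = len(z)
    out = np.zeros((n_freq, n))
    for t in range(n):
        # Largest lag that stays inside both the signal and the FFT buffer.
        tau_max = min(t, n - 1 - t, n_freq // 2 - 1)
        tau = np.arange(-tau_max, tau_max + 1)
        kernel = np.zeros(n_freq, dtype=complex)
        kernel[tau % n_freq] = z[t + tau] * np.conj(z[t - tau])
        # The kernel is conjugate-symmetric in the lag, so its FFT is real.
        out[:, t] = np.fft.fft(kernel).real
    return out


def minmax(m: np.ndarray) -> np.ndarray:
    """Scale an array to the range [0, 1]."""
    return (m - m.min()) / (m.max() - m.min() + 1e-12)


def hybrid_tf_image(x, n_freq=64, win_len=33, alpha=0.5):
    """Hypothetical fusion: weighted average of normalised STFT and WVD maps."""
    s = stft_power(x, n_freq, win_len)
    w = np.clip(wvd(x, n_freq), 0.0, None)  # clip negative WVD values (assumption)
    return alpha * minmax(s) + (1.0 - alpha) * minmax(w)


# Usage: an 800 Hz tone sampled at 6.4 kHz falls exactly on bin 800 * 128 / 6400 = 16.
fs = 6400
x = np.sin(2 * np.pi * 800 * np.arange(256) / fs)
img = hybrid_tf_image(x)
print(img.shape, int(np.argmax(img[:, 128])))  # (64, 256) 16
```

Instead of a weighted average, the maps could be combined multiplicatively, with the STFT map acting as a mask that suppresses WVD cross-terms. In either case, the resulting two-dimensional array can be resized into an image for the image branch.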