**1. Introduction**

A common biological phenomenon in nature, heterosis serves as an efficient agricultural approach for increasing crop yield. Utilization of heterosis in rice and corn could increase crop yield by 15% to 50% [1,2]. Research on utilization of soybean heterosis started late and progressed slowly. The discovery of soybean cytoplasmic male sterility (CMS) laid a foundation for the utilization of soybean heterosis [3,4]. At present, a number of research institutions in China had realized a three-line support system for hybrid soybean production [5,6].

**Citation:** Bai, Z.; Ding, X.; Zhang, R.; Yang, Y.; Wei, B.; Yang, S.; Gai, J. Transcriptome Analysis Reveals the Genes Related to Pollen Abortion in a Cytoplasmic Male-Sterile Soybean (*Glycine max* (L.) Merr.). *Int. J. Mol. Sci.* **2022**, *23*, 12227. https://doi.org/ 10.3390/ijms232012227

Academic Editors: Jian Zhang and Zhiyong Li

Received: 22 September 2022 Accepted: 11 October 2022 Published: 13 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Plant male sterility refers to the phenomenon that plants cannot produce functional pollen. Plant male sterility could be used not only as an important tool for heterosis utilization [7] but also as an ideal material for studying plant reproductive development [8,9]. Research indicated that male sterility was an extremely complex process, with diverse abortion forms and degrees [10,11]. Male sterility in plants was caused mainly by abnormal function of genes in nucleus or cytoplasm involving in pollen development, while toxic proteins, insufficient energy supply, abnormal programmed cell death (PCD), and other factors might lead to abnormal plant fertility [12–16]. In view of the complexity of male sterility in plants, it was very difficult to analyze the genetic mechanism from the perspective of individual genes by conventional methods. The transcriptome refers to the sum of all RNA transcribed by a specific cell under a certain functional state, and it thus can provide information on gene expression, gene regulation, and amino acid content [17]. The study of transcriptomics could screen and find the target genes regulating biological traits, infer the function of corresponding unknown genes, and reveal the action and molecular mechanism of genes in biological processes, which had been widely used in the study of plant male sterility [18–20]. However, there were still few reports on transcriptomics between soybean cytoplasmic male sterile lines and maintainer lines. Soybean cytoplasmic male sterile line NJCMS1A had been studied, but its sequencing library construction only involved nuclear genome [21]. The underlying molecular mechanism of CMS and the genes related to pollen abortion in soybean remains unclear.

The soybean CMS line SXCMS5A is a new male-sterile line successfully transferred from the variety JY20 with H3A cytoplasm. In the present study, we performed transcriptomic analyses of SXCMS5A vs. SXCMS5B, combined with quantitative real time PCR (qRT-PCR) analysis, cyto-morphological characteristic and enzyme activity assay, and substance content analysis in order to reveal the male sterility mechanism of SXCMS5A. We aimed to identify differences between CMS line SXCMS5A and its maintainer SXCMS5B at the transcriptional level and to find important differentially expressed genes (DEGs) and metabolic pathways related to pollen abortion. These findings might contribute to greater understanding of the molecular mechanism underlying CMS and provide useful information to facilitate progress in hybrid breeding in soybean.

#### **2. Results**

### *2.1. Comparison of the Cyto-Morphological Characteristics between Soybean CMS Line SXCMS5A and Its Maintainer SXCMS5B*

In order to describe the cyto-morphological characteristics of pollen abortion of soybean CMS line SXCMS5A, the flower buds of SXCMS5A and SXCMS5B were observed and compared by paraffin sections. As shown in Figure 1A, at the tetrad stage, the tapetum cells of SXCMS5A were closely arranged, vacuolated, expanded inward, and tended to squeeze microspores. Subsequently, the tapetum was gradually broken and disintegrated, and clear contours and disintegrated fragments could be observed (Figure 1B,C). After that, the diaphragm between the two chambers did not open. And there were signs of vacuolization (Figure 1D). The pollen grains were abnormally developed and could not be stained by I2-KI (Figure 1E). In contrast, the tapetum of SXCMS5B normally initiated PCD (Figure 1F). Subsequently, the tapetum continued to degrade (Figure 1G). After that, microspores gradually developed and matured, the diaphragm between the two chambers opened normally (Figure 1H,I). The pollen grains developed normally and could be stained by I2-KI (Figure 1J). It was speculated that the tapetum of soybean CMS line SXCMS5A was vacuolated and abnormally developed, which could not provide necessary nutrients for microspore development, resulting in abnormal pollen development.

**Figure 1.** Microscopic observations of anthers from the soybean cytoplasmic male sterility (CMS) line SXCMS5A and its maintainer SXCMS5B. (**A**–**D**) Transverse sections of sterile anthers; abnormal tapetum and abnormal anthers developed in SXCMS5A. (**E**) Mature pollen grains stained by I2-KI in SXCMS5A. (**F**–**I**) Transverse sections of fertile anthers; normal tapetum and normal anthers developed in SXCMS5B. (**J**) Mature pollen grains stained by I2-KI in SXCMS5B. MSP, microspore; T, tapetum; PG, pollen grain; Bars = 20 μm. **Figure 1.** Microscopic observations of anthers from the soybean cytoplasmic male sterility (CMS) line SXCMS5A and its maintainer SXCMS5B. (**A**–**D**) Transverse sections of sterile anthers; abnormal tapetum and abnormal anthers developed in SXCMS5A. (**E**) Mature pollen grains stained by I<sup>2</sup> -KI in SXCMS5A. (**F**–**I**) Transverse sections of fertile anthers; normal tapetum and normal anthers developed in SXCMS5B. (**J**) Mature pollen grains stained by I<sup>2</sup> -KI in SXCMS5B. MSP, microspore; T, tapetum; PG, pollen grain; Bars = 20 µm.

#### *2.2. Transcriptome Sequencing, Sequence Alignment and Quality Evaluation 2.2. Transcriptome Sequencing, Sequence Alignment and Quality Evaluation*

To further understand the molecular mechanism of CMS in soybean, RNA sequencing (RNA-Seq) analysis of flower buds of SXCMS5A and SXCMS5B was conducted using Illumina technology. As shown in Table S1, 58.97 Gb clean data and 393,126,454 clean reads were obtained from 6 samples. After strict filtering of the original data, 54.15 GB high quality clean data and 372,973,796 high quality clean reads were obtained. The average percentages of Q20 base, Q30 base, and GC content of all samples were 97.52%, 93.37% and 43.93%, respectively, indicating that the sequencing data was high quality to meet the standards for subsequent gene function analysis. Next, Tophat2 software was used to compare the filtered ribosomal reads to the reference genome. A total of 331,374,194 mapped reads were obtained, with an average matching rate of 89.98%. Pearson correlation coefficient analysis revealed that the correlation coefficients (R2) value between samples was greater than 0.97, showing that the expression mode of SXCMS5A was very close to SXCMS5B (Figure S1). To further understand the molecular mechanism of CMS in soybean, RNA sequencing (RNA-Seq) analysis of flower buds of SXCMS5A and SXCMS5B was conducted using Illumina technology. As shown in Table S1, 58.97 Gb clean data and 393,126,454 clean reads were obtained from 6 samples. After strict filtering of the original data, 54.15 GB high quality clean data and 372,973,796 high quality clean reads were obtained. The average percentages of Q20 base, Q30 base, and GC content of all samples were 97.52%, 93.37% and 43.93%, respectively, indicating that the sequencing data was high quality to meet the standards for subsequent gene function analysis. Next, Tophat2 software was used to compare the filtered ribosomal reads to the reference genome. A total of 331,374,194 mapped reads were obtained, with an average matching rate of 89.98%. Pearson correlation coefficient analysis revealed that the correlation coefficients (R<sup>2</sup> ) value between samples was greater than 0.97, showing that the expression mode of SXCMS5A was very close to SXCMS5B (Figure S1).

#### *2.3. Identification and Confirmation of DEGs*

*2.3. Identification and Confirmation of DEGs*  To identify putative DEGs between SXCMS5A and SXCMS5B, the thresholds of "False Discovery Rate (FDR) < 0.05 and |log2 Fold Change (FC)| > 1" was used to screen for DEGs. There were 840 DEGs between SXCMS5A and SXCMS5B, among which 658 downregulated and 182 upregulated in SXCMS5A compared to SXCMS5B (Figure 2A; To identify putative DEGs between SXCMS5A and SXCMS5B, the thresholds of "False Discovery Rate (FDR) < 0.05 and |log<sup>2</sup> Fold Change (FC)| > 1" was used to screen for DEGs. There were 840 DEGs between SXCMS5A and SXCMS5B, among which 658 downregulated and 182 upregulated in SXCMS5A compared to SXCMS5B (Figure 2A; Table S2). The expression of most DEGs in SXCMS5A was downregulated compared to SXCMS5B (Figure 2B).

Table S2). The expression of most DEGs in SXCMS5A was downregulated compared to SXCMS5B (Figure 2B). To validate the results of RNA-Seq, 13 DEGs (3 upregulated and 10 downregulated genes) were randomly selected and assayed by qRT-PCR. In Figure 2C, 12 DEGs showed the same trend in both RNA-Seq analysis and qRT-PCR; the coincidence rate between qRT-PCR and RNA-Seq data was 92.31%, suggesting that transcriptome analysis was accurate and reliable.

#### *2.4. Functional Classification of DEGs between SXCMS5A and SXCMS5B*

Through analysis of gene ontology (GO) function, with Q value ≤ 0.05 as the threshold, 324 DEGs were annotated to 479 GO terms in the biological process, 28 of which were significantly enriched, and the first 5 GO terms were external encapsulation structure organization, cell wall organization, cell wall organization or biogenesis, carbohydrate metabolic process and cell wall modification (Table S3). A total of 418 DEGs were annotated to 335 GO terms in molecular functions, 25 of which were significantly enriched. The first 5 GO terms were enzyme inhibitor activity, molecular function regulator, enzyme regulator

activity, catalytic activity and pectinesterase activity (Table S4). In addition, 131 DEGs were annotated to 111 GO terms in the cell components, 7 of which were significantly enriched, and the first 5 GO terms were cell wall, external encapsulation structure, membrane, cell peripheral, and intrinsic component of membrane (Table S5). *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 4 of 20

**Figure 2.** Analysis of differentially expressed genes (DEGs) between SXCMS5A and SXCMS5B. (**A**) Number of upregulated and downregulated DEGs. (**B**) Volcano plot comparing DEGs. Red dots, green dots, and black dots indicated DEGs that were significantly upregulated, significantly downregulated, or showed no significant difference in expression, respectively. (**C**) Relative expression level of selected DEGs. The *y*-axis indicated relative mRNA expression level, determined by RNA sequencing (RNA-seq) and quantitative real time PCR (qRT-PCR) analysis. The results were obtained from three biological replicates. FC, fold change; FDR, false discovery **Figure 2.** Analysis of differentially expressed genes (DEGs) between SXCMS5A and SXCMS5B. (**A**) Number of upregulated and downregulated DEGs. (**B**) Volcano plot comparing DEGs. Red dots, green dots, and black dots indicated DEGs that were significantly upregulated, significantly downregulated, or showed no significant difference in expression, respectively. (**C**) Relative expression level of selected DEGs. The *y*-axis indicated relative mRNA expression level, determined by RNA sequencing (RNA-seq) and quantitative real time PCR (qRT-PCR) analysis. The results were obtained from three biological replicates. FC, fold change; FDR, false discovery rate.

rate. To validate the results of RNA-Seq, 13 DEGs (3 upregulated and 10 downregulated genes) were randomly selected and assayed by qRT-PCR. In Figure 2C, 12 DEGs showed the same trend in both RNA-Seq analysis and qRT-PCR; the coincidence rate between qRT-PCR and RNA-Seq data was 92.31%, suggesting that transcriptome analysis was To identify the metabolic pathways in which the DEGs were involved and enriched, pathway analysis was performed using the Kyoto encyclopedia of genes and genomes (KEGG) pathway database. The results showed that starch and sucrose metabolism, pentose and glucuronate interconversions, thiamine metabolism, glycolysis/gluconeogenesis, biosynthesis of amino acids, and selenocompound metabolism were the main metabolic pathways (Figure 3).

Through analysis of gene ontology (GO) function, with Q value ≤ 0.05 as the threshold, 324 DEGs were annotated to 479 GO terms in the biological process, 28 of which were significantly enriched, and the first 5 GO terms were external encapsulation

carbohydrate metabolic process and cell wall modification (Table S3). A total of 418 DEGs were annotated to 335 GO terms in molecular functions, 25 of which were significantly enriched. The first 5 GO terms were enzyme inhibitor activity, molecular function regulator, enzyme regulator activity, catalytic activity and pectinesterase activity (Table S4). In addition, 131 DEGs were annotated to 111 GO terms in the cell components, 7 of which were significantly enriched, and the first 5 GO terms were cell wall, external encapsulation structure, membrane, cell peripheral, and intrinsic

*2.4. Functional Classification of DEGs between SXCMS5A and SXCMS5B* 

component of membrane (Table S5).

accurate and reliable.

metabolism were the main metabolic pathways (Figure 3).

To identify the metabolic pathways in which the DEGs were involved and enriched, pathway analysis was performed using the Kyoto encyclopedia of genes and genomes (KEGG) pathway database. The results showed that starch and sucrose metabolism, pentose and glucuronate interconversions, thiamine metabolism, glycolysis/gluconeogenesis, biosynthesis of amino acids, and selenocompound

**Figure 3.** Top 20 Kyoto encyclopedia of genes and genome (KEGG) pathway analysis of DEGs between SXCMS5A and SXCMS5B. The *x*-axis indicated the rich factor corresponding to each pathway and the *y*-axis indicated name of the KEGG pathway. The dot color represented the Q values of the enrichment analysis. The size and color of bubbles represented the number and degree of enrichment of DEGs, respectively. **Figure 3.** Top 20 Kyoto encyclopedia of genes and genome (KEGG) pathway analysis of DEGs between SXCMS5A and SXCMS5B. The *x*-axis indicated the rich factor corresponding to each pathway and the *y*-axis indicated name of the KEGG pathway. The dot color represented the Q values of the enrichment analysis. The size and color of bubbles represented the number and degree of enrichment of DEGs, respectively.

#### *2.5. Identification of DEGs Associated with Mitochondrial Genome*

*2.5. Identification of DEGs Associated with Mitochondrial Genome*  Numerous open reading frames (ORFs) in the mitochondrial genome were closely correlated with plant CMS [14]. In this study, 13 DEGs, i.e., 12 ORFs and 1 *COX2* in the soybean mitochondrial genome were differentially expressed between SXCMS5A and SXCMS5B (Table S2; Figure 4A), 10 out of the 13 genes were upregulated in SXCMS5A compared to SXCMS5B. Interestingly, 3 ORFs (*ORF151*, *ORF103c* and *ORF178*) were expressed at very low levels or not expressed in SXCMS5B. Especially, qRT-PCR analysis confirmed that these 3 ORFs genes were upregulated in SXCMS5A, SXCMS6A and SXCMS7A (the latter two CMS lines having a same cytoplasm source as SXCMS5A) compared to their respective maintainer SXCMS5B, SXCMS6B and SXCMS7B (Figure 4B–D). Furthermore, He et al. [22] found that *ORF178* was formed during the process of Numerous open reading frames (ORFs) in the mitochondrial genome were closely correlated with plant CMS [14]. In this study, 13 DEGs, i.e., 12 ORFs and 1 *COX2* in the soybean mitochondrial genome were differentially expressed between SXCMS5A and SXCMS5B (Table S2; Figure 4A), 10 out of the 13 genes were upregulated in SXCMS5A compared to SXCMS5B. Interestingly, 3 ORFs (*ORF151*, *ORF103c* and *ORF178*) were expressed at very low levels or not expressed in SXCMS5B. Especially, qRT-PCR analysis confirmed that these 3 ORFs genes were upregulated in SXCMS5A, SXCMS6A and SXCMS7A (the latter two CMS lines having a same cytoplasm source as SXCMS5A) compared to their respective maintainer SXCMS5B, SXCMS6B and SXCMS7B (Figure 4B–D). Furthermore, He et al. [22] found that *ORF178* was formed during the process of genome recombination in a soybean CMS line NJCMS1A. This indicated that *ORF178* might be a CMS gene of soybean. Since ORF103c also contains a transmembrane domain like ORF178 (Figure 4E,F and Figure S2) [22], *ORF103c* might also be a CMS gene of soybean.

genome recombination in a soybean CMS line NJCMS1A. This indicated that *ORF178* might be a CMS gene of soybean. Since ORF103c also contains a transmembrane domain like ORF178 (Figures 4E,F and S2) [22], *ORF103c* might also be a CMS gene of soybean.

**Figure 4.** Analysis of DEGs in mitochondrial genome between soybean CMS lines and their maintainer lines. (**A**) Heat map of DEGs in mitochondrial genome between SXCMS5A and SXCMS5B. The heat map was conducted using MeV 4.9 software. Log2(FC) values were obtained from the RNA-seq data. (**B**–**D**) Relative expression level of *ORF103c*, *ORF151* and *ORF178* between soybean CMS lines and their maintainer lines. by qRT-PCR analysis. Asterisk indicated statistical significance: \*, *p* < 0.05; \*\*, *p* < 0.01; \*\*\*, *p* < 0.001. (**E**,**F**) Transmembrane domain analysis of ORF103c and ORF151. The abscissa indicated the amino acid length of ORFs. The ordinate represented the probability of the predicted transmembrane domain. **Figure 4.** Analysis of DEGs in mitochondrial genome between soybean CMS lines and their maintainer lines. (**A**) Heat map of DEGs in mitochondrial genome between SXCMS5A and SXCMS5B. The heat map was conducted using MeV 4.9 software. Log<sup>2</sup> (FC) values were obtained from the RNA-seq data. (**B**–**D**) Relative expression level of *ORF103c*, *ORF151* and *ORF178* between soybean CMS lines and their maintainer lines. by qRT-PCR analysis. Asterisk indicated statistical significance: \*, *p* < 0.05; \*\*, *p* < 0.01; \*\*\*, *p* < 0.001. (**E**,**F**) Transmembrane domain analysis of ORF103c and ORF151. The abscissa indicated the amino acid length of ORFs. The ordinate represented the probability of the predicted transmembrane domain.

#### *2.6. Identification of DEGs Associated with Pollen Development 2.6. Identification of DEGs Associated with Pollen Development*

Pectin methylesterase (PME, also named pectinesterase) and pectate lyase (PL) were two key enzymes involved in the degradation of plant pectin, and played important roles in the regulation of pollen development [23,24]. As noted above, 11 DEGs and 3 DEGs were found associated with pectinesterase activity (GO:0030599) and pectate lyase activity (GO:0030570) GO terms, respectively (Table S4). As shown in Figure 5A, all the 14 DEGs were downregulated in SXCMS5A compared to SXCMS5B. Most importantly, RNA-seq data in Phytozome v12.0 indicated all these transcripts were enriched in soybean flowers (Figure 5B). *GmPME* (*Glyma.02G008300*) and *GmPL* (*Glyma.13G064700*) were selected for qRT-PCR analysis, which were all downregulated in SXCMS5A, SXCMS6A and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B and SXCMS7B (Figure 5C,D). These findings suggested that the two gene types might involve in pollen development processes and that their reduced expression in soybean Pectin methylesterase (PME, also named pectinesterase) and pectate lyase (PL) were two key enzymes involved in the degradation of plant pectin, and played important roles in the regulation of pollen development [23,24]. As noted above, 11 DEGs and 3 DEGs were found associated with pectinesterase activity (GO:0030599) and pectate lyase activity (GO:0030570) GO terms, respectively (Table S4). As shown in Figure 5A, all the 14 DEGs were downregulated in SXCMS5A compared to SXCMS5B. Most importantly, RNA-seq data in Phytozome v12.0 indicated all these transcripts were enriched in soybean flowers (Figure 5B). *GmPME* (*Glyma.02G008300*) and *GmPL* (*Glyma.13G064700*) were selected for qRT-PCR analysis, which were all downregulated in SXCMS5A, SXCMS6A and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B and SXCMS7B (Figure 5C,D). These findings suggested that the two gene types might involve in pollen development processes and that their reduced expression in soybean CMS might lead to pollen abortion.

CMS might lead to pollen abortion.

**Figure 5.** Analysis of DEGs related to pollen wall development between soybean CMS lines and their maintainer lines. (**A**) Heat map of the DEGs related to pollen wall development between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log2(FC) values were obtained from RNA-seq data in this study. (**B**) Heat map of the DEGs related to pollen wall development in four different tissues. The color scale represented the relative transcript abundance of the DEGs in four soybean tissues. The heat map was created using MeV 4.9 software. Fragments per kilobase of transcript per million mapped reads (FPKM) values were obtained from RNA-seq data in Phytozome v12.0. R, root; S, stem; L, leaf; F, flower. (**C**,**D**) Relative expression level of *GmPME* (*Glyma.02G008300*) and *GmPL* (*Glyma.13G064700*) between soybean CMS lines and their maintainer lines by qRT-PCR analysis. PME, Pectin methylesterase; PL, pectate lyase. Asterisk indicated statistical significance: \*\*\*, *p* < 0.001. *2.7. Identification of DEGs Associated with Carbohydrate Metabolism and Sugar Transport*  **Figure 5.** Analysis of DEGs related to pollen wall development between soybean CMS lines and their maintainer lines. (**A**) Heat map of the DEGs related to pollen wall development between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log<sup>2</sup> (FC) values were obtained from RNA-seq data in this study. (**B**) Heat map of the DEGs related to pollen wall development in four different tissues. The color scale represented the relative transcript abundance of the DEGs in four soybean tissues. The heat map was created using MeV 4.9 software. Fragments per kilobase of transcript per million mapped reads (FPKM) values were obtained from RNA-seq data in Phytozome v12.0. R, root; S, stem; L, leaf; F, flower. (**C**,**D**) Relative expression level of *GmPME* (*Glyma.02G008300*) and *GmPL* (*Glyma.13G064700*) between soybean CMS lines and their maintainer lines by qRT-PCR analysis. PME, Pectin methylesterase; PL, pectate lyase. Asterisk indicated statistical significance: \*\*\*, *p* < 0.001.

#### Many DEGs between SXCMS5A and SXCMS5B involved in carbohydrate *2.7. Identification of DEGs Associated with Carbohydrate Metabolism and Sugar Transport*

metabolism during flower bud development. Among these DEGs, 29, 17, and 12 were associated with starch and sucrose metabolism, pentose and glucuronate interconversions, and glycolysis/gluconeogenesis pathways, respectively (Figure 6A). Especially, most of these genes (55/58) were downregulated in SXCMS5A compared to SXCMS5B. *UDP-glucuronic acid decarboxylase 2-like* (*UDP-GAD2*, *Glyma.07G246600*), *exopolygalacturonase* (*exoPG*, *Glyma.07G245100*) were selected for qRT-PCR analysis, which were all downregulated in SXCMS5A, SXCMS6A and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B and SXCMS7B (Figure 6B,C). In addition, 16 DEGs were involved in sugar transport, and 14 of these were downregulated in SXCMS5A compared to SXCMS5B (Figure 6A). *Sugar transport protein 11* (*STP11*, *Glyma.20G103900*) was selected for qRT-PCR analysis, which was downregulated in SXCMS5A, SXCMS6A, and SXCMS7A, compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 6D). Furthermore, we measured soluble sugar, starch, and adenosine triphosphate (ATP) amounts in flower buds of SXCMS5A and SXCMS5B. The results showed that soluble sugar and ATP amounts decreased in SXCMS5A, while the starch amount decreased slightly in SXCMS5A, relative to SXCMS5B (Figure 6E–G). All these findings suggested that inhibition of carbohydrate metabolism and sugar transport might be two of the causes of soybean CMS. Many DEGs between SXCMS5A and SXCMS5B involved in carbohydrate metabolism during flower bud development. Among these DEGs, 29, 17, and 12 were associated with starch and sucrose metabolism, pentose and glucuronate interconversions, and glycolysis/gluconeogenesis pathways, respectively (Figure 6A). Especially, most of these genes (55/58) were downregulated in SXCMS5A compared to SXCMS5B. *UDP-glucuronic acid decarboxylase 2-like*(*UDP-GAD2*, *Glyma.07G246600*),*exopolygalacturonase*(*exoPG*, *Glyma.07G245100*) were selected for qRT-PCR analysis, which were all downregulated in SXCMS5A, SXCMS6A and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B and SXCMS7B (Figure 6B,C). In addition, 16 DEGs were involved in sugar transport, and 14 of these were downregulated in SXCMS5A compared to SXCMS5B (Figure 6A). *Sugar transport protein 11* (*STP11*, *Glyma.20G103900*) was selected for qRT-PCR analysis, which was downregulated in SXCMS5A, SXCMS6A, and SXCMS7A, compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 6D). Furthermore, we measured soluble sugar, starch, and adenosine triphosphate (ATP) amounts in flower buds of SXCMS5A and SXCMS5B. The results showed that soluble sugar and ATP amounts decreased in SXCMS5A, while the starch amount decreased slightly in SXCMS5A, relative to SXCMS5B (Figure 6E–G). All these findings suggested that inhibition of carbohydrate metabolism and sugar transport might be two of the causes of soybean CMS.

**Figure 6.** Analysis of DEGs related to carbohydrate metabolism and sugar transport between soybean CMS lines and their maintainers. (**A**) Heat map of the DEGs related to starch and sucrose metabolism, pentose and glucuronate interconversions, glycolysis/gluconeogenesis, and sugar transport between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log2(FC) values were obtained from RNA-seq data. (**B**–**D**) Relative expression level of *UDP-glucuronic acid decarboxylase 2-like* (*UDP-GAD2*, *Glyma.07G246600*), *exopolygalacturonase* (*exoPG*, *Glyma.07G245100*) and *sugar transport protein 11* (*STP11*, *Glyma.20G103900*) between soybean CMS lines and their maintainers by qRT-PCR analysis. (**E**–**G**) Soluble sugar, starch, and **Figure 6.** Analysis of DEGs related to carbohydrate metabolism and sugar transport between soybean CMS lines and their maintainers. (**A**) Heat map of the DEGs related to starch and sucrose metabolism, pentose and glucuronate interconversions, glycolysis/gluconeogenesis, and sugar transport between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log<sup>2</sup> (FC) values were obtained from RNA-seq data. (**B**–**D**) Relative expression level of *UDP-glucuronic acid decarboxylase 2-like* (*UDP-GAD2*, *Glyma.07G246600*), *exopolygalacturonase* (*exoPG*, *Glyma.07G245100*) and *sugar transport protein 11* (*STP11*, *Glyma.20G103900*) between soybean CMS lines and their maintainers by qRT-PCR analysis. (**E**–**G**) Soluble sugar, starch, and adenosine triphosphate (ATP) contents analysis between SXCMS5A and SXCMS5B. Asterisk indicated statistical significance: \*, *p* < 0.05; \*\*\*, *p* < 0.001.

#### *2.8. Identification of DEGs Associated with Reactive Oxygen Species (ROS) Metabolism 2.8. Identification of DEGs Associated with Reactive Oxygen Species (ROS) Metabolism*  Several DEGs were found involved in ROS metabolism, including glutathione

adenosine triphosphate (ATP) contents analysis between SXCMS5A and SXCMS5B. Asterisk

*Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 9 of 20

indicated statistical significance: \*, *p* < 0.05; \*\*\*, *p* < 0.001.

Several DEGs were found involved in ROS metabolism, including glutathione metabolism and ascorbate and aldarate metabolism (Figure 7A). Among these DEGs, 4 DEGs were associated with glutathione metabolism, and exactly half number of these genes were downregulated or upregulated in SXCMS5A compared to SXCMS5B. In addition, 2 DEGs were found associated with ascorbate and aldarate metabolism, and the RNA-seq showed that they were downregulated in SXCMS5A compared to SXCMS5B. *Glutathione S-transferase-like* (*GST*, *Glyma.02G154400*) and *L-ascorbate oxidase homolog* (*L-AO*, *Glyma.07G225400*) were selected for qRT-PCR analysis, which were downregulated in SXCMS5A, SXCMS6A, and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 7B,C). In addition, we measured catalase (CAT), ascorbic acid oxidase (AAO), and glutathione peroxidase (GPX) activities in flower buds of SXCMS5A and SXCMS5B. The results showed that CAT and AAO activities decreased in SXCMS5A, while the GPX activity was slightly decreased in SXCMS5A, relative to SXCMS5B (Figure 7D–F). All these findings suggested that downregulation of genes associated with ROS metabolism might be one of the causes of soybean CMS. metabolism and ascorbate and aldarate metabolism (Figure 7A). Among these DEGs, 4 DEGs were associated with glutathione metabolism, and exactly half number of these genes were downregulated or upregulated in SXCMS5A compared to SXCMS5B. In addition, 2 DEGs were found associated with ascorbate and aldarate metabolism, and the RNA-seq showed that they were downregulated in SXCMS5A compared to SXCMS5B. *Glutathione S-transferase-like* (*GST*, *Glyma.02G154400*) and *L-ascorbate oxidase homolog* (*L-AO*, *Glyma.07G225400*) were selected for qRT-PCR analysis, which were downregulated in SXCMS5A, SXCMS6A, and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 7B,C). In addition, we measured catalase (CAT), ascorbic acid oxidase (AAO), and glutathione peroxidase (GPX) activities in flower buds of SXCMS5A and SXCMS5B. The results showed that CAT and AAO activities decreased in SXCMS5A, while the GPX activity was slightly decreased in SXCMS5A, relative to SXCMS5B (Figure 7D–F). All these findings suggested that downregulation of genes associated with ROS metabolism might be one of the causes of soybean CMS.

**Figure 7.** Analysis of DEGs related to reactive oxygen species (ROS) metabolism between soybean CMS lines and their maintainers. (**A**) Heat map of the DEGs related to glutathione metabolism and ascorbate and aldarate metabolism between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log2(FC) values were obtained from RNA-seq data. (**B**,**C**) Relative expression level of *Glutathione S-transferase-like* (*GST*, *Glyma.02G154400*) and *L-ascorbate oxidase homolog* (*L-AO*, *Glyma.07G225400*) between soybean CMS lines and their maintainers by qRT-PCR analysis. (**D**–**F**) Activity assays of catalase (CAT), ascorbic acid oxidase (AAO) and glutathione peroxidase (GPX) between SXCMS5A and SXCMS5B. Asterisk indicated statistical significance: \*, *p* < 0.05; \*\*\*, *p* < 0.001. **Figure 7.** Analysis of DEGs related to reactive oxygen species (ROS) metabolism between soybean CMS lines and their maintainers. (**A**) Heat map of the DEGs related to glutathione metabolism and ascorbate and aldarate metabolism between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log<sup>2</sup> (FC) values were obtained from RNA-seq data. (**B**,**C**) Relative expression level of *Glutathione S-transferase-like* (*GST*, *Glyma.02G154400*) and *L-ascorbate oxidase homolog* (*L-AO*, *Glyma.07G225400*) between soybean CMS lines and their maintainers by qRT-PCR analysis. (**D**–**F**) Activity assays of catalase (CAT), ascorbic acid oxidase (AAO) and glutathione peroxidase (GPX) between SXCMS5A and SXCMS5B. Asterisk indicated statistical significance: \*, *p* < 0.05; \*\*\*, *p* < 0.001.

#### *2.9. Identification of DEGs Associated with Transcription Factor*

Many transcription factors (TFs) were differentially expressed between SXCMS5A and SXCMS5B. As shown in Figure 8A, a total of 20 differentially expressed transcription factors were found, including 15 downregulated and 5 upregulated TFs in SX-CMS5A compared to SXCMS5B. The 15 downregulated DEGs were 3 MYB family TFs,

3 NAC family TFs, 2 bHLH family TFs, 2 nuclear family transcription factor Y subunit, 1 heat stress transcription factor B-3-like isoform X2, and 4 other TFs. The 5 upregulated DEGs were 3 MYB family TFs, 1 WRKY TF and 1 ethylene-responsive TF. Furthermore, *GmMYB35* (*Glyma.06G188400*), *GmMYB* (*Glyma.16G218900*), *GmbHLH118* (*Glyma.09G150000*) and *GmWRKY43* (*Glyma.18G238600*) were selected for qRT-PCR analysis. Among them, the expression trends of *GmMYB35* (*Glyma.06G188400*), *GmbHLH118* (*Glyma.09G150000*) and *GmWRKY43* (*Glyma.18G238600*) were consistent in SXCMS5A, SXCMS6A and SXCMS7A, compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 8B,D,E). However, the expression trend of *GmMYB* (*Glyma.16G218900*) was not consistent in SXCMS5A, SXCMS6A, and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 8C). This inconsistency suggested that *GmMYB* (*Glyma.16G218900*) might play different roles in different CMS-maintainer pairs. All these findings suggested that these TFs might involve in regulation of pollen development in soybean CMS. family TFs, 2 bHLH family TFs, 2 nuclear family transcription factor Y subunit, 1 heat stress transcription factor B-3-like isoform X2, and 4 other TFs. The 5 upregulated DEGs were 3 MYB family TFs, 1 WRKY TF and 1 ethylene-responsive TF. Furthermore, *GmMYB35* (*Glyma.06G188400*), *GmMYB* (*Glyma.16G218900*), *GmbHLH118* (*Glyma.09G150000*) and *GmWRKY43* (*Glyma.18G238600*) were selected for qRT-PCR analysis. Among them, the expression trends of *GmMYB35* (*Glyma.06G188400*), *GmbHLH118* (*Glyma.09G150000*) and *GmWRKY43* (*Glyma.18G238600*) were consistent in SXCMS5A, SXCMS6A and SXCMS7A, compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 8B,D,E). However, the expression trend of *GmMYB* (*Glyma.16G218900*) was not consistent in SXCMS5A, SXCMS6A, and SXCMS7A compared to their respective maintainer SXCMS5B, SXCMS6B, and SXCMS7B (Figure 8C). This inconsistency suggested that *GmMYB* (*Glyma.16G218900*) might play different roles in different CMS-maintainer pairs. All these findings suggested that these TFs might involve in regulation of pollen development in soybean CMS.

Many transcription factors (TFs) were differentially expressed between SXCMS5A and SXCMS5B. As shown in Figure 8A, a total of 20 differentially expressed transcription factors were found, including 15 downregulated and 5 upregulated TFs in SXCMS5A compared to SXCMS5B. The 15 downregulated DEGs were 3 MYB family TFs, 3 NAC

*Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 10 of 20

*2.9. Identification of DEGs Associated with Transcription Factor* 

**Figure 8.** Analysis of DEGs related to transcription factors (TFs) between soybean CMS lines and their maintainers. (**A**) Heat map of the DEGs related to TFs between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log2(FC) values were obtained from RNA-seq data. (**B**–**E**) Relative expression level of *GmMYB35* (*Glyma.06G188400*), *GmMYB* (*Glyma.16G218900*), *GmbHLH118* (*Glyma.09G150000*) and *GmWRKY43* (*Glyma.18G238600*) between soybean CMS lines and their maintainers by qRT-PCR analysis. Asterisk indicated statistical significance: \*, *p* < 0.05; \*\*, *p* < 0.01; \*\*\*, *p* < 0.001. **Figure 8.** Analysis of DEGs related to transcription factors (TFs) between soybean CMS lines and their maintainers. (**A**) Heat map of the DEGs related to TFs between SXCMS5A and SXCMS5B. The heat map was created using MeV 4.9 software. Log<sup>2</sup> (FC) values were obtained from RNA-seq data. (**B**–**E**) Relative expression level of *GmMYB35* (*Glyma.06G188400*), *GmMYB* (*Glyma.16G218900*), *GmbHLH118* (*Glyma.09G150000*) and *GmWRKY43* (*Glyma.18G238600*) between soybean CMS lines and their maintainers by qRT-PCR analysis. Asterisk indicated statistical significance: \*, *p* < 0.05; \*\*, *p* < 0.01; \*\*\*, *p* < 0.001.

#### **3. Discussion**

In plants, CMS, i.e., cytoplasmic nuclear interaction male sterility, is controlled by cytoplasmic and nuclear male sterility genes in a coordinated manner. It is generally believed that CMS is caused by the coupling of mitochondrial genes and nuclear genes [14]. The mitochondrial genome and nuclear genome locate on different parts in a cell, but

they relate and influence each other in functions [25]. The nuclear genes encode various protein factors and enzymes needed for mitochondrial gene replication, transcription and translation, while mitochondrial gene mutation leads to the abnormality of its coding protein polypeptide, which can reverse-regulate the replication and expression of a series of nuclear genes through the signal pathway, leading to pollen abortion. In general, nuclear genes related to pollen wall development, carbohydrate metabolism and sugar transport, ROS metabolism and TFs play important roles in male fertility regulation. The relationship between DEGs and pollen abortion of SXCMS5A were discussed as follows.
