*Article* **Evolutionary and Predictive Functional Insights into the Aquaporin Gene Family in the Allotetraploid Plant** *Nicotiana tabacum*

## **Jahed Ahmed, Sébastien Mercx, Marc Boutry and François Chaumont \***

Louvain Institute of Biomolecular Science and Technology, UCLouvain, Croix du Sud 4-L7.07.14, B-1348 Louvain-la-Neuve, Belgium; jahed.ahmed@uclouvain.be (J.A.); mercxsebastien@gmail.com (S.M.); marc.boutry@uclouvain.be (M.B.)

**\*** Correspondence: francois.chaumont@uclouvain.be

Received: 8 June 2020; Accepted: 1 July 2020; Published: 3 July 2020

**Abstract:** Aquaporins (AQPs) are a class of integral membrane proteins that facilitate the membrane diffusion of water and other small solutes. *Nicotiana tabacum* is an important model plant, and its allotetraploid genome has recently been released, providing us with the opportunity to analyze the *AQP* gene family and its evolution. A total of 88 full-length *AQP* genes were identified in the *N. tabacum* genome, and the encoding proteins were assigned into five subfamilies: 34 plasma membrane intrinsic proteins (PIPs); 27 tonoplast intrinsic proteins (TIPs); 20 nodulin26-like intrinsic proteins (NIPs); 3 small basic intrinsic proteins (SIPs); 4 uncharacterized X intrinsic proteins (XIPs), including two splice variants. We also analyzed the genomes of two *N. tabacum* ancestors, *Nicotiana tomentosiformis* and *Nicotiana sylvestris,* and identified 49 *AQP* genes in each species. Functional prediction, based on the substrate specificity-determining positions (SDPs), revealed significant differences in substrate specificity among the AQP subfamilies. Analysis of the organ-specific *AQP* expression levels in the *N. tabacum* plant and RNA-seq data of *N. tabacum* bright yellow-2 suspension cells indicated that many AQPs are simultaneously expressed, but differentially, according to the organs or the cells. Altogether, these data constitute an important resource for future investigations of the molecular, evolutionary, and physiological functions of AQPs in *N. tabacum*.

**Keywords:** aquaporins; bright yellow-2 suspension cells; *Nicotiana tabacum*; substrate specificity; phylogeny

#### **1. Introduction**

Aquaporins (AQPs), also known as major intrinsic proteins (MIPs), are small integral membrane proteins present in almost all living organisms [1,2]. Plants maintain a large and diverse AQP family compared to mammals. For instance, the genomes of rice (*Oryza sativa)*, Arabidopsis (*Arabidopsis thaliana)*, maize (*Zea mays*), soybean (*Glycine max*), switchgrass (*Panicum virgatum*), foxtail millet (*Setaria italica*), sorghum (*Sorghum bicolor*), *Brachypodium distachyon,* tomato (*Solanum lycopersicum*), poplar (*Populus trichocarpa*), cotton (*Gossypium hirsutum*), and potato (*Solanum tuberosum*) encode 39, 35, 36, 66, 68, 42, 38, 28, 47, 55, 71, and 41 AQP homologs, respectively [3–12], compared to only 13 *AQP* genes in mammals [13]. Based on phylogenetic analysis and subcellular localization, vascular plant AQPs are categorized into five subfamilies: (1) plasma membrane intrinsic proteins (PIPs); (2) tonoplast intrinsic proteins (TIPs); (3) nodulin-26-like intrinsic proteins (NIPs); (4) small basic intrinsic proteins (SIPs); (5) uncharacterized X intrinsic proteins (XIPs). To date, the latter subfamily has not been found in Brassicaceae and in monocots [14,15].

While many plant AQPs primarily function as water channels, they can also transport a wide range of substrates, such as ammonia (NH3), antimony (Sb), arsenic (As), boron (B), glycerol, hydrogen peroxide (H2O2), silicon (Si), and urea (U) [2,16–19]. Furthermore, some AQPs facilitate gas diffusion, such as carbon dioxide (CO2) and oxygen (O2) [20–22]. Recently it was reported that AtPIP2;1 has cations (Na<sup>+</sup> and K+) channel activity [23]. AQPs from various plants are also involved in transmembrane water conductance in numerous physiological processes, such as cell water homeostasis, root water uptake from the soil, root and leaf hydraulic conductance, lateral root emergence, motor cell movement, rapid internode elongation, the diurnal regulation of leaf movements, and petal development and movement [1,2,24–29].

The AQP structure comprises six transmembrane (TM) α-helices (TM1-TM6), which are linked by five loops (loops A–E) and two highly conserved NPA (Asn-Pro-Ala) motifs. They form homoor hetero-tetrameric complexes in which each subunit acts as a functional water channel [2,30]. The channel pore contains two constriction regions that contribute to the transport selectivity. The first constriction is formed at the pore center by two highly conserved NPA motifs [31]. The second constriction is the aromatic/arginine (ar/R) filter, formed at the extracellular aperture of the pore by four residues from TM2, TM5, and loop E (LE1 and LE2), respectively [32,33]. Additionally, five amino acid residues known as Froger's positions (FPs) designated P1–P5, are also associated with substrate selectivity [34,35]. More recently, some substrate specificity determining positions (SDPs) have been proposed for B, H2O2, CO2, NH3, As, Sb, and Si [9,17].

*Nicotiana tabacum* (tobacco)*,* a perennial herbaceous plant of the Solanaceae family, is an allotetraploid (2n = 4x = 48), which evolved by the natural hybridization of the ancestors of *Nicotiana sylvestris* (2n = 24, maternal donor) and *Nicotiana tomentosiformis* (2n = 24, paternal donor) about 200,000 years ago [36,37]. *N. tabacum* is intensively studied as a versatile model organism for understanding genetics, functional genomics, cellular and molecular biology, biochemistry and physiology [38]. In this study, we identified *AQP* genes in the genomes of *N. tabacum* as well as its two ancestors, *N. tomentosiformis* and *N. sylvestris*, and analyzed the transcriptome data of *N. tabacum* plant and Bright Yellow-2 (BY-2) suspension cells [39]. We investigated the phylogenetic relationships, as well as the structural properties and subcellular localization of AQPs in *N. tabacum*. Comparing the primary selectivity motifs, we further predicted their probable substrate transport activities. Altogether, this study provides new insights into the expression patterns in different organs and suspension cells, as well as the transmembrane transport selectivity of AQPs in *N. tabacum*.

#### **2. Results**

#### *2.1. Genome-Wide Identification and Characterization of NtAQP Genes*

The whole genome shotgun sequence of *N. tabacum* and its two ancestors, *N. tomentosiformis* and *N. sylvestris*, were searched for *AQP* genes, using pBLAST and AQP sequences from *S. tuberosum* and *S. lycopersicum* as queries. NtAQP protein sequences were analyzed and compared with SlAQP and StAQP for domain identification and functional annotation. Of 101 initial unique hits for *NtAQPs*, 13 were considered *AQP* pseudogenes and discarded after a manual inspection of their nucleotide and amino acid sequences and their TM domains. We finally obtained 88 genes encoding 90 full-length AQP proteins, and *NtXIP1;1* and *NtXIP1;2* genes encoding two splice variants (α and β), as shown in Table 1.


#### **Table 1.** Aquaporin genes in the *N. tabacum* genome.


**Table 1.** *Cont.*


**Table 1.** *Cont.*

1 IP = Isoelectric point. <sup>2</sup> PM: plasma membrane, C: chloroplast, V: vacuole.

This represents the greatest *AQP* gene number in a Solanaceae plant genome. We identified 49 *AQP* genes encoding 51 and 50 full-length proteins in two *N. tabacum* ancestors, namely *N. tomentosiformis* and *N. sylvestris*, respectively, as shown in Table S1. The phylogenetic protein analysis showed that NtAQPs cluster into five subfamilies (PIPs, TIPs, NIPs, SIPs, and XIPs) similar to NtoAQPs, NsAQPs, and SlAQP and StAQP, as shown in Figures 1–3. NtAQPs nomenclature was done from protein sequence comparison with the known SlAQP and StAQP, as shown in Figure 1. Sequences belonging to hybrid intrinsic proteins (HIPs) and GlpF-like intrinsic proteins (GIPs) reported in the non-vascular moss *Physcomitrella patens* [14] were not found. In *N. tabacum*, we identified 34 PIPs, 27 TIPs, 20 NIPs, 3 SIPs, and 6 XIPs, including two splice variants. Figure 1 shows that the PIPs cluster either into the PIP1 or PIP2 groups, and the NtTIPs into five groups (TIP1 to TIP5), similar to the potato and tomato TIPs [3,7]. Eight NIP groups were found in *N. tabacum*, contrary to the seven groups in *Arabidopsis* and soybean [5,11], and three to four NIP groups in poplar, rice, and maize [6,10,12]. Similar to *Arabidopsis*, rice, maize, poplar, and soybean, *N. tabacum* had two SIP groups, namely SIP1 and SIP2s, with two and one isoforms, respectively. Two XIP subgroups were observed in *N. tabacum*, and four XIP subgroups in potato [3]. localizations are just predictions and need to be experimentally demonstrated. Part of the predictions are in agreement with the data reported in the literature, but many differences are also observed. For instance, plant PIP2s are not found in the chloroplasts, TIPs are mostly located in the vacuole (and not in the PM, as predicted for many NtTIPs), and NIPs were not identified in the vacuole. SIPs were localized in the PM and/or the ER in Arabidopsis and maize [40] (Lebrun and Chaumont, unpublished data), but never in the chloroplast. The amino acid number, calculated molecular weight (MW), and isoelectric point (pI) of NtAQP homologs are shown in Table 1. Like their counterparts in other plant species, all PIPs, TIPs, NIP1s, NIP2s, NIP3s, NIP4s, NIP7s, and NIP8s from *N. tabacum,* have two conserved NPA motifs in loops B and E, as shown in Figure 3 and Figures S1–S5. NIP5s and NIP6s have unusual NPA motifs, in which the alanine in loop E is substituted by a valine, and have a characteristic arginine-rich C-terminus, as shown in Figure 3 and Figure S3). In *N. tabacum* SIPs, the alanine in the first NPA motif is substitued by either a threonine (SIP1;1) or a leucine (SIP2s) residue, as shown in Figure 3 and Figure S4. All the SIPs have the conserved NPA motif in loop E with a unique characteristic lysine-rich C-terminus, as shown in Figure S4, which contains an ER retention signal [1,41] (Lebrun and Chaumont, unpublished). In the *N. tabacum* genome, there are four *XIP* genes, including *NtXIP1;1* and *NtXIP1;2*, which encode two splice variants (α and β) [15]. In *N. tabacum* XIPs in the first NPA motif (loop B), alanine is substituted

by a valine residue, as shown in Figure 3 and Figure S5.

*Int. J. Mol. Sci.* **2020**, *20*, x FOR PEER REVIEW 5 of 19

and PM; NIPs–PM and vacuole; SIPs–PM (SIP2;1 in both the PM and chloroplast); XIPs–PM. These

Subcellular localization prediction was conducted using WoLF PSORT software, and the results

**Figure 1.** Phylogenetic relationships among *Nicotiana tabacum*, *Solanum tuberosum*, and *Solanum lycopersium* AQPs. For thiq analysis, 35 selected subgroup representative StAQPs and SlAQPs were **Figure 1.** Phylogenetic relationships among *Nicotiana tabacum*, *Solanum tuberosum*, and *Solanum lycopersium* AQPs. For this analysis, 35 selected subgroup representative StAQPs and SlAQPs were aligned with all NtAQPs using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/ ClustalOmega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. AQPs clustered into five different subfamilies (PIPs, TIPs, NIPs, SIPs, and XIPs). Each AQP subfamily is shown with a specific background color. NtAQPs are indicated in black; StAQPs and SlAQPs are in red and blue, respectively.

aligned with all NtAQPs using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/Clustal Omega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. AQPs clustered into five different subfamilies (PIPs,

are indicated in black; StAQPs and SlAQPs are in red and blue, respectively.

**Figure 2.** Phylogenetic relationships among *N. tabacum (Nt)* AQPs and its two ancestors, *N. sylvestris (Ns)* and *N. tomentosiformis (Nto)* AQPs. The deduced amino acid sequences of NtAQPs, NtoAQPs, and NsAQPs were aligned using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/Clustal Omega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. The NtAQPs clustered into five different subfamilies (PIPs, TIPs, NIPs, SIPs and XIPs), with the corresponding NtoAQP and NsAQP subfamilies. Each AQP subfamily is shown with a specific background color. NtAQPs are indicated in black, NtoAQPs are in blue, and NsAQPs are in magenta. **Figure 2.** Phylogenetic relationships among *N. tabacum (Nt)* AQPs and its two ancestors, *N. sylvestris (Ns)* and *N. tomentosiformis (Nto)* AQPs. The deduced amino acid sequences of NtAQPs, NtoAQPs, and NsAQPs were aligned using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/ClustalOmega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. The NtAQPs clustered into five different subfamilies (PIPs, TIPs, NIPs, SIPs and XIPs), with the corresponding NtoAQP and NsAQP subfamilies. Each AQP subfamily is shown with a specific background color. NtAQPs are indicated in black, NtoAQPs are in blue, and NsAQPs are in magenta.

Subcellular localization prediction was conducted using WoLF PSORT software, and the results were as follows: NtPIPs–plasma membrane (PM) and chloroplast, as shown in Table 1; TIPs–vacuole and PM; NIPs–PM and vacuole; SIPs–PM (SIP2;1 in both the PM and chloroplast); XIPs–PM. These localizations are just predictions and need to be experimentally demonstrated. Part of the predictions are in agreement with the data reported in the literature, but many differences are also observed. For instance, plant PIP2s are not found in the chloroplasts, TIPs are mostly located in the vacuole (and not in the PM, as predicted for many NtTIPs), and NIPs were not identified in the vacuole. SIPs were localized in the PM and/or the ER in Arabidopsis and maize [40] (Lebrun and Chaumont, unpublished data), but never in the chloroplast. The amino acid number, calculated molecular weight (MW), and isoelectric point (pI) of NtAQP homologs are shown in Table 1.

Like their counterparts in other plant species, all PIPs, TIPs, NIP1s, NIP2s, NIP3s, NIP4s, NIP7s, and NIP8s from *N. tabacum,* have two conserved NPA motifs in loops B and E, as shown in Figure 3 and Figures S1–S5. NIP5s and NIP6s have unusual NPA motifs, in which the alanine in loop E is substituted by a valine, and have a characteristic arginine-rich C-terminus, as shown in Figure 3 and Figure S3. In *N. tabacum* SIPs, the alanine in the first NPA motif is substitued by either a threonine (SIP1;1) or a

leucine (SIP2s) residue, as shown in Figure 3 and Figure S4. All the SIPs have the conserved NPA motif in loop E with a unique characteristic lysine-rich C-terminus, as shown in Figure S4, which contains an ER retention signal [1,41] (Lebrun and Chaumont, unpublished). In the *N. tabacum* genome, there are four *XIP* genes, including *NtXIP1;1* and *NtXIP1;2*, which encode two splice variants (α and β) [15]. In *N. tabacum* XIPs in the first NPA motif (loop B), alanine is substituted by a valine residue, as shown in Figure 3 and Figure S5. *Int. J. Mol. Sci.* **2020**, *20*, x FOR PEER REVIEW 7 of 19

**Figure 3.** Grouping of *N. tabacum* PIPs, TIPs, NIPs, SIPs, and XIPs based on the ar/R and FPs. The phylogenetic tree was generated as described in Figure 1. The residues in the ar/R selectivity filter and the FPs were identified from the multiple sequence alignment, shown in Figure S1–S5. The ar/R and FP groupings within each subfamily were done based on the corresponding amino acid compositions, which are indicated on the right side of the phylogenetic tree. The solutes predicted, based on substrate specific signature sequences to be transported, are mentioned in square brackets. As, B, C, H, N, Si, Sb, and U indicate arsenic, boron, CO2, H2O2, ammonia, silicon, antimony, and urea, respectively. **Figure 3.** Grouping of *N. tabacum* PIPs, TIPs, NIPs, SIPs, and XIPs based on the ar/R and FPs. The phylogenetic tree was generated as described in Figure 1. The residues in the ar/R selectivity filter and the FPs were identified from the multiple sequence alignment, shown in Figures S1–S5. The ar/R and FP groupings within each subfamily were done based on the corresponding amino acid compositions, which are indicated on the right side of the phylogenetic tree. The solutes predicted, based on substrate specific signature sequences to be transported, are mentioned in square brackets. As, B, C, H, N, Si, Sb, and U indicate arsenic, boron, CO<sup>2</sup> , H2O<sup>2</sup> , ammonia, silicon, antimony, and urea, respectively.

#### *2.2. NtAQP Gene Structures 2.2. NtAQP Gene Structures*

The *N. tabacum AQP* genomic sequences were analyzed for introns and exons, as shown in Figure 4 and Figure S6. Apart from a few inconsistencies, the number and position of introns are conserved within each *AQP* subfamily. *NtPIP* genes have two or three introns, except for *NtPIP2;3*, which has a The *N. tabacum AQP* genomic sequences were analyzed for introns and exons, as shown in Figure 4 and Figure S6. Apart from a few inconsistencies, the number and position of introns are

single intron, and *NtPIP1;5*, *NtPIP1;7*, *NtPIP1;11*, and *NtPIP2;8*, which have no introns, as shown in Figure 4. Among them, *NtPIP2;2* has a very long intron (~15 kb), as shown in Figure S6. The *NtTIP* Figure 4.

conserved within each *AQP* subfamily. *NtPIP* genes have two or three introns, except for *NtPIP2;3*, which has a single intron, and *NtPIP1;5*, *NtPIP1;7*, *NtPIP1;11*, and *NtPIP2;8*, which have no introns, as shown in Figure 4. Among them, *NtPIP2;2* has a very long intron (~15 kb), as shown in Figure S6. The *NtTIP* subfamily exhibits relatively stable gene structure in comparison with other subfamilies. The majority of them have two introns except for *TIP1;2–4* and *TIP1;8–9*, which have a single intron and *NtTIP1;1* with no intron, as shown in Figure 4. The majority of *NtNIPs* have four introns with variable intron-exon organization, as shown in Figure 4 and Figure S6. *NtNIP5;1* has three introns, and *NtNIP3s* and *NtNIP6;1* have five introns, while *NtNIP8;2* possesses a unique gene structure with six introns (the greatest number of introns in an *AQP* gene), one of which is 10 kb long, as shown in Figure S6. The *NtSIP* genes have two introns, except for *NtSIP2;1,* which has no intron. The *NtXIPs* gene structure was very conserved with two introns, except for *NtXIP2;1,* which has a single intron, as shown in Figure 4. *Int. J. Mol. Sci.* **2020**, *20*, x FOR PEER REVIEW 8 of 19 subfamily exhibits relatively stable gene structure in comparison with other subfamilies. The majority of them have two introns except for *TIP1;2–4* and *TIP1;8–9*, which have a single intron and *NtTIP1;1* with no intron, as shown in Figure 4. The majority of *NtNIPs* have four introns with variable intronexon organization, as shown in Figure 4 and Figure S6. *NtNIP5;1* has three introns, and *NtNIP3s* and *NtNIP6;1* have five introns, while *NtNIP8;2* possesses a unique gene structure with six introns (the greatest number of introns in an *AQP* gene), one of which is 10 kb long, as shown in Figure S6. The *NtSIP* genes have two introns, except for *NtSIP2;1,* which has no intron. The *NtXIPs* gene structure was very conserved with two introns, except for *NtXIP2;1,* which has a single intron, as shown in


**Figure 4.** *NtAQP* genes and 2-D protein structure. Introns in the *NtAQP* genes are indicated by gray arrows. The six TM regions are shown in light green bars, and loops B and E are shown in blue hexagons. **Figure 4.** *NtAQP* genes and 2-D protein structure. Introns in the *NtAQP* genes are indicated by gray arrows. The six TM regions are shown in light green bars, and loops B and E are shown in blue hexagons.

#### *2.3. Analysis of NtAQPs Ar/R Selectivity Filter and Froger's Position 2.3. Analysis of NtAQPs Ar*/*R Selectivity Filter and Froger's Position*

We identified the four amino acid residues at the ar/R selectivity filter and the five residues in the FPs using sequence alignments, and used them to group the NtAQPs based on the amino acid residue properties and to compare these groups with those of other species, such as tomato and potato, as shown in Figure 3 [3,7,9]. In addition, all NtAQPs were subjected to the ScanProsite tool (http://prosite.expasy.org/scanprosite/), to identify the substrate specificity-determining positions (SDPs) based on the ar/R, FP, and NPA motifs, and thereby the predicted substrate(s) of each isoform, as shown in Table 2, Figure 3, and Table S2. Water is considered as the universal substrate for AQPs, even though some isoforms were shown not to facilitate its diffusion through the membrane [15]. We identified the four amino acid residues at the ar/R selectivity filter and the five residues in the FPs using sequence alignments, and used them to group the NtAQPs based on the amino acid residue properties and to compare these groups with those of other species, such as tomato and potato, as shown in Figure 3 [3,7,9]. In addition, all NtAQPs were subjected to the ScanProsite tool (http://prosite.expasy.org/scanprosite/), to identify the substrate specificity-determining positions (SDPs) based on the ar/R, FP, and NPA motifs, and thereby the predicted substrate(s) of each isoform, as shown in Table 2, Figure 3, and Table S2. Water is considered as the universal substrate for AQPs, even though some isoforms were shown not to facilitate its diffusion through the membrane [15].



The ar/R selectivity filter in all the NtPIPs is composed of F, H, T, and R residues in TM2, TM5, LE1, and LE2, respectively, and is identical to the ar/R filter found in all the plant PIPs, as shown in Figure 3. According to the residues located at the P1 of FPs, M or Q (G), NtPIPs cluster into two groups, I and II, as shown in Figure 3. Twelve PIPs (mainly PIP1s) are predicted CO<sup>2</sup> channels and thirteen PIPs (mainly PIP2s) are predicted H2O<sup>2</sup> channels, as shown in Figure 3. Based on the ar/R filter, the NtTIPs cluster into four groups (I, II, III, and IV), as shown in Figure 3. The P3–P5 positions in FPs of all NtTIPs are conserved and consist of A, Y, and W residues, respectively, as shown in Figure 3. Based on the disparities in P1 and P2 positions, all TIPs could be divided into two groups. TIP1s and TIP2s are predicted H2O<sup>2</sup> channels, and TIP1s and TIP4s are predicted urea channels, as shown in Figure 3. TIP2s and TIP4s are also predicted as NH<sup>3</sup> channels, which is in agreement with experimental evidence in other species [18,42]. Based on the ar/R selectivity filters, all NtNIPs are divided into four different groups, as shown in Figure 3. On the other hand, based on the FPs, NtNIPs cluster into three groups, as shown in Figure 3, such as potato and tomato, but unlike other plants (Arabidopsis, maize, etc.) [3,6,7,11]. Our analysis predicted that the As transporters are only distributed among the NtNIPs (10 NIPs belonging to Group I, based on the ar/R filter and FPs), as shown in Figure 3. NIP2;1, NIP5;1, and NIP3;2 are predicted as Si, B, and H2O<sup>2</sup> channels, respectively. The NtSIPs are grouped into two groups based on both the ar/R selectivity filter and FPs, as shown in Figure 3. Very few studies have examined the channel specificity of SIPs. Two SIPs from Arabidopsis showed some water channel activity when expressed in yeast [40]. The NtXIPs are clustered into two groups based on the ar/R selectivity filter. However, based on FPs, all NtXIPs were grouped in a single group, as shown in Figure 3. XIP1;1 and XIP1;2 are predicted as B, urea, and H2O<sup>2</sup> channels, as shown in Figure 3. The specificity and function of NtXIP1;1, including its splice variant, were studied in detail and were shown to facilitate the diffusion of B, H2O2, NH3, and urea, but not water [15,43].

#### *2.4. Expression of NtAQP Genes in Roots, Leaves, and Flowers as well as BY-2 Suspension Cells*

The heatmap based on FPKM values shows the *NtAQPs* transcript levels in roots, leaves, and flowers, as shown in Figure 5. Among the 88 *NtAQPs* genes, 73, 75, and 71 are expressed in mature flowers, leaves, and roots, respectively, and 68 genes are ubiquitously expressed in all analyzed organs. *PIPs* are expressed in flowers, leaves, and roots but differently according to the isoforms. A greater number of *NtPIP1* genes are expressed in flowers and leaves than in roots—*NtPIP1;1* and *NtPIP1;10* being the most expressed isoforms in flowers and leaves, respectively, and *NtPIP1;3–8* and *NtPIP1;11* not being expressed in roots. A decreased amount of *NtPIP2* transcripts is generally observed, but all *NtPIP2s* are expressed in the three organs with the exception of *NtPIP2;9* and *NtPIP2;18*, which are not expressed or are expressed very little, as shown in Figure 5*. NtTIP* gene expression levels are often greater in the leaves compared with the other organs, even if a greater number of *NtTIP* genes are expressed in roots, as shown in Figure 5. Among the 20 *NtNIP* genes, seven (*NtNIP3;2* and all the *NtNIP4s*) are not or very lowly expressed in the three organs in the tested conditions. The other *NtNIP* genes are relatively less expressed compared to the other *AQP* subfamily members, as shown in Figure 5. All *NtSIP* genes were ubiquitously expressed in flowers, leaves, and roots, *NtSIP1;2* being the most expressed *NtSIP* in the leaf, as shown in Figure 5. Finally, *NtXIP1;1* was the most expressed *NtXIP* in the three organs with the expression of the others being very decreased.

*N. tabacum* BY-2 suspension cells are widely used to study different physiological processes, the role of specific proteins, or as a heterologous expression system to produce high value pharmaceutical antigens or antibodies [44–48]. We determined which *AQP* genes are expressed in those cells that grow in suspension in an aqueous environment. RNA from wild-type BY-2 cells was extracted and RNA-seq data analyzed for the expression of the 88 *NtAQP* genes. The heatmap based on FPKM values is shown in Figure 6. mRNA of 53 *NtAQP* genes were detected in BY-2 cells growing in a standard MS medium. The most expressed *NtAQP* genes were 11 *PIP1s*, *TIP1;1*, the three *SIPs,* and *XIP1;1*.


*Int. J. Mol. Sci.* **2020**, *20*, x FOR PEER REVIEW 11 of 19

*Int. J. Mol. Sci.* **2020**, *20*, x FOR PEER REVIEW 11 of 19

**Figure 5.** Expression analyses of 88 *NtAQP* genes in root, leaf, and flower. Color scale represents logarithmic FPKM values, where green indicates high expression and red indicates no or very low expression. *Ns* and *Nto* in parentheses indicate that corresponding *NtAQP* gene evolved from *N. sylvestris (Ns)* or *N. tomentosiformis (Nto).* Question mark (?) indicates that *NtAQP* gene origin (*N. sylvestris* or *N. tomentosiformis*) was not identified. **Figure 5.** Expression analyses of 88 *NtAQP* genes in root, leaf, and flower. Color scale represents logarithmic FPKM values, where green indicates high expression and red indicates no or very low expression. *Ns* and *Nto* in parentheses indicate that corresponding *NtAQP* gene evolved from *N. sylvestris (Ns)* or *N. tomentosiformis (Nto).* Question mark (?) indicates that *NtAQP* gene origin (*N. sylvestris* or *N. tomentosiformis*) was not identified. **Figure 5.** Expression analyses of 88 *NtAQP* genes in root, leaf, and flower. Color scale represents logarithmic FPKM values, where green indicates high expression and red indicates no or very low expression. *Ns* and *Nto* in parentheses indicate that corresponding *NtAQP* gene evolved from *N. sylvestris (Ns)* or *N. tomentosiformis (Nto).* Question mark (?) indicates that *NtAQP* gene origin (*N. sylvestris* or *N. tomentosiformis*) was not identified.

**Figure 6.** Expression analyses of 88 *NtAQP* genes in *N. tabacum* BY-2 cells. Color scale represents logarithmic FPKM values, where green indicates high expression and red indicates no expression or **Figure 6.** Expression analyses of 88 *NtAQP* genes in *N. tabacum* BY-2 cells. Color scale represents logarithmic FPKM values, where green indicates high expression and red indicates no expression or **Figure 6.** Expression analyses of 88 *NtAQP* genes in *N. tabacum* BY-2 cells. Color scale represents logarithmic FPKM values, where green indicates high expression and red indicates no expression or very low expression. *Ns* and *Nto* in parentheses indicate that corresponding *NtAQP* gene evolved from *N. sylvestris (Ns)* or *N. tomentosiformis (Nto).* Question mark (?) indicates that *NtAQP* gene origin (*N. sylvestris* or *N. tomentosiformis*) was not identified.

#### **3. Discussion**

By screening the *N. tabacum* genome databases, we identified 88 complete *AQP* genes, almost twice the number of *AQP* genes identified in tomato and potato [3,7]. The number of *AQP* homologs always varies between plant species, the dicot plant genomes usually encoding more homologs than the monocot plants, except for the 68 full-length *AQP* genes found in *P. virgatum*, a polyploid monocot species [9]. The great number of *AQP* genes in the *N. tabacum* genome arose from an allotetraploidization event that occurred about 200,000 years ago [36,38] between *N. tomentosiformis* and *N. sylvestris*, which each have 49 *AQP* genes. The difference between the identified gene number in *N. tabacum* (88) and the sum of the *N. tomentosiformis* and *N. sylvestris AQP* genes (98) suggests that some were lost after the polyploidization event. In addition, we also could not exclude the recent local duplication events in each species, as deduced by the protein phylogenetic tree, shown in Figure 2, in which two very close isoforms from the same species are found on the same branch (i.e., NtPIP2;1 and 2;2, NtoPIP2;2 and 2;3, NsNIP3;1 and 3;2, NtoNIP6;1 and 6;2, etc.). Models have been proposed to explain duplicated gene fate: pseudogenization, sub-functionalization, and neo-functionalization [49]. Redundancy also allows one of the copies to accumulate mutations without affecting plant fitness, and new allelic variants or changes in the gene expression pattern can be observed [50]. While activity determination of the duplicated isoforms would be required to determine a sub- or neo-functionalization, changes in expression patterns can be deduced from the rough *NtAQP* expression data analysis. For instance, the duplicated *NtPIP2;1* and *NtPIP2;2* showed different expression levels, which can be organ dependent.

We identified five subfamilies (PIP, TIP, NIP, SIP, and XIP) among the three *Nicotiana* species, similar to most other dicots, except for Brassicaceae and monocots, which have no XIP subfamily [15]. Several *N. tabacum* AQPs have been characterized [51–55], and some became paradigms in the plant AQP community [21,22]. NtAQP1, corresponding to NtPIP1;5 in our study, is a PIP1 protein located both in the plasma membrane and the chloroplast envelope, which exhibits water and CO<sup>2</sup> channel permeability [21]. This discovery highlighted the important diverse roles of AQPs in plant physiology and, more particularly, in photosynthesis, through their contribution in facilitating CO<sup>2</sup> membrane diffusion [28]. More recently, the membrane diffusion of another gas, O2, was reported to be facilitated by NtPIP1;3 when expressed in yeast, and an increased *NtPIP1;3* transcript level was measured in *N. tabacum* roots after a seven day hypoxia treatment [22], suggesting a potential new physiological role of plant AQPs in O<sup>2</sup> membrane permeability. NtXIP1s are the first plant XIP isoforms that have been functionally characterized [15]. NtXIP1;1 is located in the plasma membrane and is shown in a functional assay in heterologous systems to facilitate the membrane diffusion of H2O2, glycerol, boron, and urea, but not water [15]. NtXIP1;1 overexpression in *N. tabacum* results in disturbed boron tissue distribution, leading to boron deficient phenotypes in meristems and young leaves [43]. Interestingly, the *NtXIP1;1* gene contains a sequence motif in the first intron that initiates an RNA-processing mechanism that results in two splice variants (α and β), resulting in two amino acid residue differences [15]. We also identified XIP spliced variants for *NtXIP1;2*, *NtoXIP1;1*, *NsXIP1;1*, and *NtoXIP2;1* isoforms, and also XIPs from *S. tuberosum* and *S. lycopersicum* [15], indicating a conservation of this genomic feature in the Solanaceae family.

To elucidate the substrate specificity of NtAQPs, different signature sequences, including SDPs, NPA motifs, ar/R filter, and FPs were identified, as shown in Figure 3 and Table 2. From this multiple analysis, a majority of PIP1s and PIP2s were predicted to facilitate CO<sup>2</sup> and H2O<sup>2</sup> diffusion, respectively, in addition to water, as shown in Figure 3. This was confirmed in functional assays performed for NtPIP1;5 (NtAQP1) and NtPIP2;1 [21,53,54]. Most TIPs have similar NPA and FPs, suggesting that differences in their substrate transport selectivity might be regulated by the ar/R filter residues. Based on this ar/R filter, Group I and Group II TIPs have a wider pore aperture, which might facilitate the diffusion of relatively larger substrates than water, such as urea, ammonia, and H2O<sup>2</sup> [56–58]. NtTIP4;1 (NtTIPa) was indeed shown to be permeable to water and urea, but also glycerol [51]. NIPs are most diverse in their NPA motifs, ar/R filter, and FPs, suggesting various substrate transport selectivities for these subfamily members and putatively important physiological roles. NIPs are

also known to facilitate the transport of metalloids, such as arsenic and boron, as shown in Figure 3. NtXIPs are predicted to transport H2O2, boric acid, and urea, and were confirmed in transport assays performed with NtXIP1;1 [15,43]. In addition, NtXIP1;1 is not a water channel but is able to facilitate glycerol diffusion [15]. Finally, limited information is available for plant SIP specificity. Water channel activity was determined for AtSIP1s, unlike for AtSIP2;1 [40]. This global substrate specificity study, based on prediction is, however, to be taken with caution, as a single amino acid change, even in the transmembrane domains, could affect the channel characteristic or conformation [59]. Therefore functional assays in heterologous or homologous systems will have to be carried out when analyzing the functional role of specific NtAQP.

As expected, *NtAQP* transcript levels are dependent on the plant organs, but it is quite surprising to observe that 68 of 88 *AQP* genes are ubiquitously expressed in roots, young leaves, and flowers. *PIP* and *TIP* transcripts are relatively more abundant than other subfamily mRNAs, as shown in Figure 5. Considering that the main role of these isoforms is the water facilitated permeation through plasma and vacuolar membranes, this observation confirms their primordial role in water movement through plant tissues, in cell expansion, and cell water homeostasis [24,60]. The *NIP* expression level is low, except for *NtNIP5;1*, but due to their metalloid substrate specificity, a more restricted tissue/cell expression pattern in specific physiological conditions might be expected [43,61,62]. mRNA of 53 *NtAQP* genes were also detected in BY-2 suspension cells growing in a standard MS medium, even if the relative expression level between them was different to what was observed in plant organs. This could be due to the dedifferentiated nature of those cells and/or the specific cell environment of the culture medium. The most expressed *NtAQP* genes in BY-2 cells are 11 *NtPIP1s*, *NtTIP1;1*, the three *NtSIPs*, and *NtXIP1;1*. High *NtPIP* gene expression was also reported in maize Black Mexican Sweet (BMS) suspension cells [63], but in this case, the two most expressed genes belonged to the PIP2 group. Plant PIP1s physically interact with PIP2s within heterotetramers, leading to PIP1 relocalization from the endoplasmic reticulum to the plasma membrane [59,64]. We might wonder whether PIP2 abundance in BY-2 cells is sufficient to bring all PIP1s to the plasma membrane. The increased PIP expression in suspension cells suggests that they are important in controlling membrane water permeability during suspension cell growth. In fact, *PIP* expression varies according to BMS cell growth stages, and this is correlated with greater cell water permeability, measured at the end of the log phase and stationary phase [63]. This might be dependent on variations in the medium composition and/or internal osmotic pressure. *PIP* and *TIP* gene expression in BY-2 suspension cells might also be involved in the control of cell expansion. Cauliflower BobTIP26–1 overexpression in suspension cells (*N. tabacum* cv. Wisconsin 38) increases the cell volume [65], cell enlargement being mostly accounted by vacuole swelling. The quite high expression of *NtSIPs* is also intriguing, knowing that SIPs are mostly expressed in the endoplasmic reticulum and their function is still unknown. *ZmSIP1;2* is also expressed in BMS suspension cells, and its expression is not dependent on the growth stage [63]. Suspension cells might be a promising model to investigate the physiological role at the cell level as well as the biochemical properties of this AQP subfamily. Actually, BY-2 suspension cells represent very useful tools to study AQP function, localization regulation, substrate specificity, and structure, as the cells are easily transformed by *Agrobacterium tumefaciens* or biolistics, and great cell amounts could be obtained for protein purification and reconstitution [66].

In this comprehensive analysis, we identified a highly diverse AQP gene family in *N. tabacum* as well as in its two ancestors, *N. tomentosiformis* and *N. sylvestris*. The signature sequence for substrate selectivity and the possible biological function of NtAQPs were predicted. The transcriptomic data of *N. tabacum* and BY-2 suspension cells represent an excellent resource to guide further analysis of the function of any selected AQP isoform.

#### **4. Materials and Methods**

#### *4.1. Identification and Sequence Analysis of NtAQPs*

The genomes of *N. tabacum*, *N. tomentosiformis*, and *N. sylvestris* available at the Sol Genomics Network (https://solgenomics.net/organism/Nicotiana\_tabacum/genome), were searched for AQPs using BLASTp (http//http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Proteins) tools with the protein sequences of 47 AQPs from *S. lycopersium* (tomato) and 41 AQPs from *S. tuberosum* (potato) as queries. Every sequence from each species was individually compared with functional annotations by browsing the *N. tabacum* databases.

#### *4.2. Phylogenetic Analysis of N. Tabacum AQPs (NtAQPs)*

NtAQPs amino acid sequences were separately aligned with *S. lycopersium* AQPs (SlAQPs) and *S. tuberosum* AQPs (StAQPs) using the Clustal Omega program (https://www.ebi.ac.uk/Tools/msa/clustalo/) and a phylogenetic tree was built using Molecular Evolution Genetic Analysis (MEGA), version 7.0 [67]. The phylogenetic analysis was conducted using the Maximum Likelihood method, based on the Jones–Taylor–Thornton (JTT) matrix-based model with 1000 bootstraps. The identified NtAQPs were classified into different subfamilies according to the phylogenetic relationships with SlAQPs and StAQPs.

#### *4.3. Identification of NtAQP Gene Structure and Transmembrane Helices*

Gene structures were determined by the GSDS 2.0 software (http://gsds.cbi.pku.edu.cn/) using the *NtAQP* gene and CDS sequences as input. The TM α-helices were predicted by TMpred (http://www. ch.embnet.org/software/TMPRED\_form.html) and SOSUI (http://bp.nuap.nagoya-u.ac.jp/sosui/).

#### *4.4. Prediction of Subcellular Localization*

The subcellular localization of NtAQPs was predicted by using the WoLF PSORT (http://wolfpsort. org/), TargetP (www.cbs.dtu.dk/Services/TargetP), Cello prediction system (http://cello.life.netu.edu. tw/), and MultiLoc2 (www.abi.inf.uni-tuebingen.de/Services/MultiLoc2) tools.

#### *4.5. Identification of Substrate Specificity Determining Positions (SDPs)*

The aligned NtAQP sequences were searched manually for SDPs by following the prediction explained previously [9,17] and clustered into different functional groups. The functional group sequences were aligned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/).

#### *4.6. Expression Profile of NtAQP Genes*

Transcript levels as FPKM (Fragments per Kilobase of Transcript per Million Mapped Reads) values of *NtAQP* genes in different organs (mature flowers, leaves and roots) were obtained from the Gene Expression Omnibus (GEO) repository and GenBank Sequence Read Archive (SRA) under the accession code SRP029183 (SRX338104: *N. tabacum* TN90 root; SRX338101: *N. tabacum* TN90 leaf; SRX495520: *N. tabacum* TN90 mature flower). Three biological replicates were obtained from each organ. The FPKM values of the respective *NtAQP* genes were extracted from the databases and transformed into logarithmic (log10) values to generate the heatmap. A heatmap showing the logarithmic *NtAQP*s transcript levels in root, leaf, and flower was generated using Microsoft Excel conditional formatting, based on the normalized FPKM values. In our analysis, a logarithmic FPKM value > 0 was used as a threshold to consider whether a gene is expressed.

#### *4.7. RNA-Seq Experiment*

*N. tabacum* cv. BY-2 suspension cells were grown in the dark at 25 ◦C with agitation on a rotary shaker (90 rpm) in liquid MS medium (4.4 g/L Murashige and Skoog salts (MP BIOMEDICALS, Solon, OH), 30 g/L sucrose, 0.2 g/L KH2PO4, 2.5 mg/L thiamine, 50 mg/mL myo-inositol, and 0.2 mg/L 2,4-D, pH 5.8 (KOH)). Cultures were grown in 50 mL of medium in a 250 mL Erlenmeyer flask and a 5% inoculum was transferred each week into fresh medium. BY-2 cells (100 mg) were collected three days after inoculation (exponential phase) and the total RNA was extracted from three biological replicates and sent to the Macrogen Company, which performed the library preparation, RNA sequencing, and data analysis. For the library preparation, the mRNA was purified from total RNA and transformed into a template molecule library, appropriate for subsequent cluster generation using the Illumina® TruSeq™ RNA Sample Preparation Kit. The first step in the workflow encompassed purifying the poly-A-containing mRNA molecules using poly-T oligo-attached magnetic beads. After purification, the mRNA was split into small pieces using divalent cations under high temperature. The cleaved RNA fragments were copied into first strand cDNA using reverse transcriptase and random primers. This was followed by the second strand cDNA synthesis using DNA polymerase I and RNase H. These cDNA fragments then went through an end repair process, the addition of a single "A" base, and then the ligation of adapters. Finally, the products were purified and enriched with PCR to generate the final cDNA library. The library was then submitted for paired-end 2 × 100 bp sequencing in Illumina HiSeq2000. Sequencing data were analyzed through the Trinity pipeline, which permitted de novo transcriptome reconstruction. The transcript abundances were calculated using RSEM (1.2.15) software [68]. Blast-X (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE\_TYPE= BlastSearch&LINK\_LOC=blasthome) was used to compare the six-frame translation products of a nucleotide query sequence against a protein sequence database (go\_v20150407). Finally, the FPKM values for the respective *AQP* genes were identified from the annotated BY-2 cell transcriptomic data. A heatmap was generated based on the transformed logarithmic (log10) FPKM values. Similar to organ specific expression data, FPKM values > 0 were used as a threshold to consider whether a gene is expressed.

#### **Supplementary Materials:** Supplementary Materials can be found at http://www.mdpi.com/1422-0067/21/13/4743/s1.

**Author Contributions:** All authors designed the experiments. J.A. and S.M. performed the experiments. All the authors analyzed the data. J.A. and F.C. wrote the manuscript. J.A., M.B., and F.C. contributed to the discussion and revision of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Belgian National Fund for Scientific Research (FNRS, FRFC 2.4.501.06F), the "Communauté française de Belgique-Actions de Recherches Concertées" (grants ARC16/21-075). J.A. and S.M. were supported by a research fellow at the Fonds de Formation à la Recherche dans l'Industrie et l'Agriculture.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**



#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
