**1. Introduction**

ATP-binding cassette (ABC) proteins form a very large family found across all domains of life, responsible for the primary active uptake and export of nutrients, toxins, lipids, peptides and other metabolites. ATP hydrolysis is carried out at two cytoplasmic nucleotide-binding domains (NBDs), and the energy released is coupled to conformational changes in two transmembrane domains (TMDs) to power transport of substrate. In mammals, ABC proteins are divided into seven subfamilies, A–G, although the ABCE and ABCF subfamilies lack TMDs and are associated with ribosome function [1]. Members within a subfamily, though sharing common descent, can often have very different functions. The family investigated here is the G subfamily of ABC transporters in mammals (ABCGs). In most mammals, there are five members of this subfamily [2]. All mammalian ABCGs share a common arrangement of domains, all being "half-transporters" with just a single NBD and TMD in the primary amino acid sequence. A unique property of the ABCG arrangement is that the NBD is N-terminal to the TMD, so they are referred to as "reverse" half-transporters.

Four of the mammalian ABCGs have a repertoire of substrates limited to lipids. Two of these, ABCG1 and ABCG4, have sequences much more similar to one another than they are to the rest of the ABCGs. They also seem to share much of their function, regulating cholesterol metabolism by transporting cholesterol into high-density lipoprotein [3]. Precise differences in their function are yet to be determined, but they do seem to differ significantly in their tissue expression profiles [3–5].

**Citation:** Mitchell-White, J.I.; Stockner, T.; Holliday, N.; Briddon, S.J.; Kerr, I.D. Analysis of Sequence Divergence in Mammalian ABCGs Predicts a Structural Network of Residues That Underlies Functional Divergence. *Int. J. Mol. Sci.* **2021**, *22*, 3012. https://doi.org/10.3390/ijms22063012

Academic Editor: Thomas Falguières

Received: 1 February 2021; Accepted: 12 March 2021; Published: 16 March 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The two other lipid-transporting ABCGs, ABCG5 and ABCG8, are also more closely related to one another than they are to the other ABCGs, though to a lesser extent than ABCG1 and ABCG4. They have taken on necessarily different roles by forming an obligate heterodimer, with neither protomer being trafficked to the membrane if expressed alone [6]. ABCG5/G8, expressed in the liver and intestine, limits the uptake of toxic plant and shellfish sterols and is responsible for 35% of the efflux of cholesterol in the intestine. The ABCG5/G8 dimer has only one functional ATP-binding site, indicating that ABCG5 and ABCG8 have diverged in function in this respect.

The final ABCG, ABCG2, has a much broader substrate specificity. It was first isolated in placental tissue and breast cancer cell lines [7,8], and has since been identified as a multidrug resistance (MDR) protein. It can export a wide variety of substrates, including many chemotherapy drugs, making it a target of great therapeutic interest. For this reason, it is the best studied of the ABCG subfamily.

Recently, structures have been solved for ABCG5/G8 and ABCG2. First came the structure of ABCG5/G8 [9], which was used to model a structure of ABCG2 [10]. Docking substrates to this model identified multiple possible binding sites, as already suggested by previous biochemical work [11]. With the first structures of ABCG2 [12–14], its broader substrate specificity was explained through a relatively large internal cavity, compared to the deep, slit-like cavity of ABCG5/G8, forming part of the transport pathway. In more recent structures, however, this cavity is only present when substrates are bound [15]. In spite of these structural advances, the molecular basis for differences in function between ABCG family members is largely unknown. As their differences ultimately arise from differences in their sequence, it is possible that comparison of conservation between ABCGs could provide clues to help ascertain this molecular basis.

Families of genes can arise when a gene duplicates and the different copies start to take on different functions, a process known as functional divergence [16,17]. When this happens, the evolutionary pressures on the duplicated genes start to differ, which affects the sequences of the proteins they encode. Non-synonymous mutations in structural elements with functional importance are less likely to persist [18]. In two functionally divergent proteins, a structural element may be more important to the function of one than the other, which will be reflected in this region being better conserved in the protein for which the element is more important. This has been called type I divergence [16]. A similar phenomenon, type II divergence, occurs if the same element is important for the function of both proteins, but the important properties of the amino acid found there are different. This is reflected in the region being conserved in both proteins, but with different amino acids being conserved. The differences in sequence conservation caused by functional divergence have been used to identify important sites in proteins.
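The distinction between type I and type II divergence can be made concrete with a small sketch. The Python below is an illustration only, not the method used in this work: the function names, the consensus-frequency measure of conservation and the 0.9 threshold are all assumptions chosen for clarity. Each argument is one alignment column, restricted to the orthologues of one of the two paralogues being compared.

```python
from collections import Counter

def consensus(column):
    """Return (most common residue, its frequency) for one group's column."""
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)

def classify_site(col_a, col_b, threshold=0.9):
    """Label one alignment position from the residues seen in two paralogue groups.

    `threshold` is a hypothetical cut-off: a group counts as conserved at this
    position if its most common residue reaches this frequency.
    """
    res_a, freq_a = consensus(col_a)
    res_b, freq_b = consensus(col_b)
    conserved_a = freq_a >= threshold
    conserved_b = freq_b >= threshold
    if conserved_a and conserved_b:
        # Conserved in both groups: type II only if the conserved residues differ.
        return "type II" if res_a != res_b else "conserved in both"
    if conserved_a or conserved_b:
        # Conserved in one group but variable in the other.
        return "type I"
    return "variable in both"

# Toy columns (made-up residues, not real ABCG data)
site_1 = classify_site("LLLLL", "IVLMA")  # conserved in A only
site_2 = classify_site("DDDDD", "KKKKK")  # conserved in both, different residues
```

In this toy case the first position is conserved in only one group and the second is conserved in both groups with different residues, matching the two definitions above. A real analysis would replace the consensus frequency with a more robust conservation score.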

To analyse the conservation between the members of the G subfamily of ABC transporters in mammals, we have calculated functional divergence of residues based on Shannon entropy [19] from a large multiple sequence alignment of ABCGs. We have examined residues with particular patterns of type II divergence between ABCGs, which reflect some of the functional divergence responsible for their differences. Hypotheses regarding the structural basis of these functional differences were derived by mapping positions in the alignment that share particular types of conservation onto the apo-closed structure of ABCG2. Specifically, we have identified a top-to-bottom signature, passing through the polar relay of ABCG5/G8 [9,20], which may contribute to allosteric differences in the G subfamily responsible for differences in substrate specificity.
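As a minimal illustration of the quantity underlying this approach, the sketch below computes the Shannon entropy of each column of a toy alignment. It is not the pipeline used in this work: the function names and sequences are hypothetical, and gap characters are treated as ordinary symbols, which a real analysis would likely handle differently. An entropy of zero indicates a fully conserved column, and higher values indicate more variable positions.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # H = sum over residues of p * log2(1/p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropies(sequences):
    """Entropy of every column in a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]

# Toy alignment (hypothetical sequences, not real ABCG data)
toy_alignment = ["MKLA", "MKIA", "MRLA", "MKVA"]
entropies = alignment_entropies(toy_alignment)
```

Comparing such per-column entropies within each paralogue group, rather than over the whole alignment, is one way to see where conservation differs between family members.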
