*4.1. DNA and Protein Sequences from Databases*

DNA sequences of the 5′UTRs of the 12 human ABC protein-coding genes grouped in the ABCA subfamily (*ABCA1-10*, *ABCA12*, and *ABCA13*) and the amino acid sequences of the complete proteins were downloaded from the Ensembl database (EMBL-EBI; https://www.ensembl.org/index.html) in FASTA format. The transcripts of the principal isoforms were chosen for further analyses based on the APPRIS classification system, the UniProt annotation score and the MANE Select system (the transcript flags are described on the Ensembl web pages: https://www.ensembl.org/info/genome/genebuild/transcript_quality_tags.html). We first surveyed the number of protein-coding isoforms of the ABCA genes and the genomic positions of their 5′UTRs. The number of protein-coding isoforms of individual ABCA genes ranges from three to seven. There are 48 protein-coding transcripts altogether, 36 of which are non-principal isoforms. Fifteen of the 36 non-principal isoforms have 5′UTRs that lie within the same genomic regions as those of the corresponding principal isoforms and are shorter than or equal to them in length. Another eight of the 36 non-principal isoforms have no annotated 5′UTR sequence. Because of this heterogeneity and unevenness of the data, we decided to include just one representative 5′UTR for each ABCA gene.
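The choice of one representative transcript per gene can be illustrated with a minimal Python sketch. It is not the procedure used here, which relied on manual inspection of the Ensembl flags. The record fields, the placeholder transcript IDs and, in particular, the priority order (MANE Select first, then an APPRIS principal tag, then the higher UniProt annotation score) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    gene: str
    transcript_id: str   # placeholder IDs, not real Ensembl accessions
    mane_select: bool    # True if flagged MANE Select
    appris: str          # e.g. "principal1", "alternative1", or "" if untagged
    uniprot_score: int   # UniProt annotation score, 1 (lowest) to 5 (highest)


def rank(t: Transcript) -> tuple:
    # Smaller tuples are preferred. The order of the criteria is an assumption.
    return (not t.mane_select,
            not t.appris.startswith("principal"),
            -t.uniprot_score)


def pick_representatives(transcripts):
    """Return a dict mapping each gene to its best-ranked transcript."""
    best = {}
    for t in transcripts:
        if t.gene not in best or rank(t) < rank(best[t.gene]):
            best[t.gene] = t
    return best


# Hypothetical records for two genes (values invented for the example).
records = [
    Transcript("ABCA1", "T1", True, "principal1", 5),
    Transcript("ABCA1", "T2", False, "alternative1", 3),
    Transcript("ABCA2", "T3", False, "principal2", 4),
    Transcript("ABCA2", "T4", False, "", 5),
]
reps = pick_representatives(records)
for gene, t in sorted(reps.items()):
    print(gene, t.transcript_id)
```

Encoding the criteria as a sort key keeps the priority order explicit and easy to change if a different ranking of the flags is preferred.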

5′UTR sequences of orthologous genes from 10 other vertebrate species were selected for each human ABCA gene according to the same criteria and downloaded from the same database. Based on the availability of species in Ensembl, five vertebrate subgroups increasingly phylogenetically distant from humans were covered: primates, rodents, other placental mammals, reptiles and birds, and ray-finned fishes. Table S1 lists the names, IDs and basic characteristics of all downloaded transcripts. The numbers, positions and lengths of the 5′UTR introns were also recorded.
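As an illustration of how the number, positions and lengths of 5′UTR introns follow from the annotation, the sketch below derives them from the exonic segments of a 5′UTR. It assumes 1-based, inclusive genomic coordinates; the function name and the example coordinates are hypothetical and do not correspond to any transcript in Table S1.

```python
def utr_introns(segments):
    """Return (start, end, length) for each gap between 5'UTR exonic segments.

    segments: list of (start, end) pairs in 1-based inclusive genomic
    coordinates (an assumed input format). Introns are reported in ascending
    genomic order, which is the reverse of transcript order on the minus strand.
    """
    ordered = sorted(segments)
    introns = []
    for (_, end_prev), (start_next, _) in zip(ordered, ordered[1:]):
        if start_next > end_prev + 1:  # a gap between segments is an intron
            introns.append((end_prev + 1,
                            start_next - 1,
                            start_next - end_prev - 1))
    return introns


# Hypothetical 5'UTR split across three exonic segments.
example = [(100, 150), (301, 340), (1000, 1020)]
introns = utr_introns(example)
print(len(introns), introns)
```

A 5′UTR contained in a single exon yields an empty list, i.e. zero introns.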
