### *3.1. Construction of Ribosomal Proteins S1 Dataset*

To build a representative dataset of records for the ribosomal protein S1 family from the UniProt database (release 2018\_04), all bacterial records were selected whose protein name contained any one of the keywords «30s ribosomal protein s1», «ribosomal protein s1», «30s ribosomal protein s1 (ec 1.17.1.2)», «30s ribosomal protein s1 (ribosomal protein s1)», «ribosomal protein s1 domain protein», «rna binding protein s1», «rna binding s1 domain protein», «s1 rna binding domain protein». From the resulting data, only proteins encoded by the rpsA gene or its analogs (for example, rpsA\_1, rpsA\_2, rpsA\_3, etc.) were retained. Only this gene, which codes for ribosomal protein S1, is assigned in the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena) to the STD class, that is, the class of standard annotated sequences. From the obtained records, those with six-character identification numbers (annotated records in the UniProt database) were selected. All data were collected in one file that served as the basis for further analysis, namely for collecting data on the number of structural domains and for phylogenetic grouping into the main bacterial phyla (http://bioinfo.protres.ru/other/uniprot\_S1.xlsx). Records containing the word "candidate" were removed from our dataset. This automated exhaustive analysis yielded 1374 records matching these search parameters.
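The selection criteria above can be sketched as a simple record filter. This is only an illustration under stated assumptions, not the pipeline actually used: the record layout (a dictionary with `accession`, `protein_name`, `gene` and `organism` keys), the function name `keep_record`, substring matching of keywords, and searching every field for the word "candidate" are all hypothetical choices.

```python
import re

# Protein-name keywords quoted in the text (lower-cased).
KEYWORDS = (
    "30s ribosomal protein s1",
    "ribosomal protein s1",
    "30s ribosomal protein s1 (ec 1.17.1.2)",
    "30s ribosomal protein s1 (ribosomal protein s1)",
    "ribosomal protein s1 domain protein",
    "rna binding protein s1",
    "rna binding s1 domain protein",
    "s1 rna binding domain protein",
)

# rpsA or a numbered analog such as rpsA_1, rpsA_2, ...
RPSA_GENE = re.compile(r"rpsA(_\d+)?")
# Assumption: "six-character identification number" means a
# six-character alphanumeric UniProt accession.
SIX_CHAR_ACCESSION = re.compile(r"[A-Z0-9]{6}")


def keep_record(record):
    """Return True if a (hypothetical) record dict passes all filters."""
    name = record["protein_name"].lower()
    # Assumption: a keyword matches if it occurs anywhere in the name.
    if not any(keyword in name for keyword in KEYWORDS):
        return False
    if not RPSA_GENE.fullmatch(record["gene"]):
        return False
    if not SIX_CHAR_ACCESSION.fullmatch(record["accession"]):
        return False
    # Assumption: the word "candidate" is searched for in every field.
    if any("candidate" in str(value).lower() for value in record.values()):
        return False
    return True


def build_dataset(records):
    """Keep only the records that satisfy every criterion."""
    return [record for record in records if keep_record(record)]
```

With substring matching, the short keyword «ribosomal protein s1» already covers several of the longer ones; the full list is kept here only to mirror the text.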

### *3.2. Number and Identification of Structural Domains in Protein Sequences*

For each analyzed record, the number of S1 domains according to the SMART database (about 1200 domains) was taken. If no data on the number of domains was available in one of the analyzed databases (None), this number was taken to be zero (these records were removed from the investigated dataset). Exact boundaries of each S1 domain in each record were taken from the UniProt database (position, domain and repeats field).

### *3.3. Prediction of Disordered Regions and Tendency for Intrinsic Disorder*

#### 3.3.1. FoldUnfold and IsUnstruct Programs

The FoldUnfold program is accessible at http://bioinfo.protres.ru/ogu/. The principle of its operation is described elsewhere [26,27]. The program relies on a single property of residues: the observed average number of contacts within a given distance in the globular state. To predict IDRs (intrinsically disordered regions) in the protein chain from the amino acid sequence, every residue was assigned an expected number of contacts in the globular state. These values were then averaged over a number of residues equal to the window width. The obtained average value of expected contacts was ascribed to the central residue of the chosen window. After that, the window was shifted by one residue, and the procedure was repeated. On the profile of expected contacts, a boundary was marked that separated structured and unstructured residues. The mean expected number of contacts for the protein, estimated from the sequence, was equal to the sum of the expected contacts of its residues divided by the number of amino acid residues in the protein. According to the algorithm of the program, the size of disordered (flexible) regions in such a protein must be equal to or greater than the size of the averaging window. Therefore, the number of predicted regions depended on the window size. A window size of 11 amino acid residues was optimal for the search for relatively short disordered regions in the polypeptide chain. When searching for long disordered regions in partially disordered proteins, the window size must be increased to several tens of amino acid residues. At the same time, for short loops one should use an averaging window of five amino acid residues, which is optimal for this task.
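The window-averaging procedure described above can be illustrated with a short sketch. It is not the FoldUnfold implementation: the contact scale `TOY_CONTACTS`, the threshold value, and the handling of chain ends (the window is simply truncated there) are placeholder assumptions chosen for illustration, and the function names are hypothetical.

```python
# Placeholder contact scale for two residue types only; these are NOT
# the published FoldUnfold values.
TOY_CONTACTS = {"L": 25.0, "G": 17.0}


def contact_profile(sequence, expected_contacts, window=11):
    """Average expected contacts over a sliding window.

    The average is ascribed to the central residue of each window.
    Assumption: near the chain ends the window is truncated.
    """
    values = [expected_contacts[aa] for aa in sequence]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def mean_expected_contacts(sequence, expected_contacts):
    """Sum of expected contacts divided by the number of residues."""
    return sum(expected_contacts[aa] for aa in sequence) / len(sequence)


def disordered_regions(profile, threshold, min_length):
    """Return (start, end) index pairs of runs below the threshold.

    Runs shorter than min_length (the window size) are discarded.
    """
    regions = []
    start = None
    for i, value in enumerate(profile):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    if start is not None and len(profile) - start >= min_length:
        regions.append((start, len(profile) - 1))
    return regions


# Toy sequence: a low-contact stretch flanked by high-contact stretches.
sequence = "L" * 10 + "G" * 15 + "L" * 10
profile = contact_profile(sequence, TOY_CONTACTS, window=5)
print(disordered_regions(profile, threshold=20.0, min_length=5))
```

Requiring each run to be at least as long as the window reflects the statement that predicted disordered regions cannot be shorter than the averaging window, which is why the number of predicted regions changes with the window size.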

The IsUnstruct program (v.2.02) is accessible at http://bioinfo.protres.ru/IsUnstruct/. The algorithm of the IsUnstruct program is based on the Ising model. To estimate the energy of any state, the energy of the border between ordered and disordered residues and the energies of initiation of the disordered state at the ends were used [39]. After the optimization procedure [28], 20 energy potentials for residues in the disordered state were obtained, together with the border energy and the energies of initiation of the disordered state at the ends. The energy of the completely ordered state was taken to be zero.
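As a rough illustration of how these terms might combine, the sketch below scores one order/disorder assignment of a chain. The additive form of the energy, the function name `state_energy`, the use of separate N- and C-terminal initiation energies, and all numerical values are assumptions made for this example; they are not taken from IsUnstruct.

```python
def state_energy(sequence, state, residue_energy, border_energy,
                 n_end_energy, c_end_energy):
    """Energy of one order/disorder assignment (hypothetical form).

    state is a string of 'O' (ordered) and 'D' (disordered) labels,
    one per residue. The fully ordered state has zero energy.
    """
    if len(sequence) != len(state):
        raise ValueError("sequence and state must have equal length")
    # Per-residue potentials count only for disordered residues.
    energy = sum(residue_energy[aa]
                 for aa, label in zip(sequence, state) if label == "D")
    # Each switch between ordered and disordered costs one border.
    borders = sum(1 for a, b in zip(state, state[1:]) if a != b)
    energy += borders * border_energy
    # Initiation terms apply when a chain end is disordered.
    if state and state[0] == "D":
        energy += n_end_energy
    if state and state[-1] == "D":
        energy += c_end_energy
    return energy


# Placeholder parameters for two residue types (illustrative only).
toy_potentials = {"A": 0.5, "G": -0.25}
print(state_energy("AGA", "DDO", toy_potentials,
                   border_energy=1.0, n_end_energy=-0.5, c_end_energy=-0.25))
```

In an actual Ising-type treatment, such energies would feed a sum over all possible states rather than the scoring of a single one; the sketch only shows which terms enter the energy of a given state.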
