**2. Results**

Although bacterial proteomes are relatively poor in IDPs/IDRs [42–44], we restricted our search to bacterial IDP–partner interactions to ensure a sufficient number and diversity of orthologous sequences for residue co-variation analysis. While a few hundred eukaryotic genomes are already available, they are unevenly distributed among phylogenetic groups (a large fraction of them are from mammals) and therefore do not provide enough sequence diversity at the protein level. This is consistent with the fact that recent large-scale residue co-variation analyses have all focused on bacterial protein complexes [9,14,15]. We therefore analyzed the 42 bacterial IDPs bound to their folded partners available in the Database of Disordered Binding Sites (DIBS) [45], because in DIBS the structural states of the constituent protein chains are backed by experimental evidence. After removing 4 redundant structures and one in which the IDP chain was only four residues long, we selected the 19 complexes with a fairly wide phylogenetic distribution (those with >130 sequences for the IDP in the Pfam database 31 [46] full alignments) for co-variation analysis (Supplementary Tables S1 and S2). In this way, species-specific complexes, such as virulence factors and certain toxin–antitoxin, effector–chaperone and effector–immunity protein pairs, could be filtered out rather than needlessly occupying the co-evolution analysis servers. The sequences were trimmed to the interacting regions/domains, or extended to reach the minimum length of 30 residues required for Gremlin analysis, as indicated in Table 1, and the Gremlin [15] and EVcomplex [14] servers

were used to identify interprotein ECs (See Methods for further details). These methods perform co-variation analyses along similar lines. They first prepare a so-called paired alignment for the protein pair provided, which means that they detect the closest homolog of each of the two provided proteins in each analyzed proteome, and build an alignment wherein the interacting sequences are linked together and filtered for similarity. If the resulting paired alignment has enough sequences, they proceed with detecting co-varying residue pairs (evolutionary couplings, ECs) therein. They use somewhat different approaches to score the pairs of residues based on evolutionary co-variation. From the outputs we exclusively take interprotein ECs, where the coupled residues come from the two different proteins because those provide clues on the interaction. Gremlin could be successfully run on 13 complexes (Table 1), while it stopped in 6 cases due to the alignments being insufficient for analysis. For 7 of the 13 successfully analyzed complexes, Gremlin detected coupled interprotein residue pairs with a scaled score ≥1.3 and probability in the top 12% (*p* > 0.88), hereafter referred to as ECs. For EVcomplex, interprotein ECs with an EVcomplex score >0.9 have been accepted as ECs. EVcomplex also identified ECs for 7/19 complexes (Supplementary Table S1). Both the identified complexes and the detected ECs showed a good overlap between the two methods. Since Gremlin ran the sequence search on a more up-to-date sequence database, it obtained better sequence coverage values and consequently identified more ECs than EVcomplex for almost all the analyzed protein pairs. Based on their residue–residue distances, almost all Gremlin EC pairs fell spatially close, implying that they could be correctly identified co-varying pairs. Thus, we decided to continue the analysis and show the results for Gremlin ECs. 
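
The paired-alignment step and the interprotein filtering described above can be illustrated with a minimal Python sketch. It is not the Gremlin or EVcomplex implementation. The `Hit` and `Coupling` records, the rule that the closest homolog is the lowest-E-value hit, and the greedy 80% identity filter are hypothetical simplifications (the servers use their own homolog pairing and sequence weighting). The coupling scores are assumed to come from a co-variation model that is not shown. Only the thresholds (scaled score ≥1.3, *p* > 0.88) are taken from the text.

```python
from dataclasses import dataclass


# Hypothetical record for one homology-search hit; sequences are assumed to be
# already aligned to the query (same length for all hits of one protein).
@dataclass
class Hit:
    genome: str
    seq: str       # aligned sequence, gaps as "-"
    evalue: float


# Hypothetical record for one residue-pair coupling in the paired alignment.
@dataclass
class Coupling:
    i: int               # 0-based column in the concatenated (A+B) alignment
    j: int
    scaled_score: float
    probability: float


def best_hit_per_genome(hits):
    """Closest homolog per genome, assumed here to be the lowest E-value hit."""
    best = {}
    for h in hits:
        if h.genome not in best or h.evalue < best[h.genome].evalue:
            best[h.genome] = h
    return best


def identity(a, b):
    """Fraction of identical columns between two equal-length aligned rows."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def paired_alignment(hits_a, hits_b, max_identity=0.8):
    """Concatenate the best hits of A and B from the same genome, then greedily
    drop rows more than max_identity identical to an already kept row."""
    best_a = best_hit_per_genome(hits_a)
    best_b = best_hit_per_genome(hits_b)
    kept = []
    for genome in sorted(best_a.keys() & best_b.keys()):
        row = best_a[genome].seq + best_b[genome].seq
        if all(identity(row, k) <= max_identity for k in kept):
            kept.append(row)
    return kept


def interprotein_ecs(couplings, len_a, min_score=1.3, min_prob=0.88):
    """Keep couplings linking a column of A (< len_a) to a column of B (>= len_a)
    that pass the thresholds quoted in the text (score >= 1.3, p > 0.88)."""
    kept = []
    for c in couplings:
        lo, hi = sorted((c.i, c.j))
        if lo < len_a <= hi and c.scaled_score >= min_score and c.probability > min_prob:
            kept.append(c)
    return kept


# Example: g3 has no homolog of B, so only g1 and g2 are paired.
hits_a = [Hit("g1", "MKV-", 1e-10), Hit("g1", "MKA-", 1e-3),
          Hit("g2", "MRV-", 1e-8), Hit("g3", "MKVL", 1e-9)]
hits_b = [Hit("g1", "DE", 1e-5), Hit("g2", "DQ", 1e-6)]
print(paired_alignment(hits_a, hits_b))  # ['MKV-DE', 'MRV-DQ']
```

Pairing only genomes that contain homologs of both proteins is what makes interprotein co-variation detectable at all; it is also why complexes with a narrow phylogenetic distribution tend to yield alignments too small for analysis.
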

The ECs were then checked with PDBe PISA [47] to determine whether they lie at the interface (IF) and whether they engage in physical interactions (hydrogen bonds or salt bridges; see Table 1).
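
PISA assigns interfaces and bonds using its own criteria, which are not reproduced here. As a rough, hypothetical stand-in, an EC residue pair can be labelled with simple heavy-atom distance cut-offs: any atom pair within 4.0 Å marks an interface contact, an N/O pair within 3.5 Å a possible hydrogen bond (no angle check), and an Asp/Glu carboxylate O within 4.0 Å of a Lys/Arg side-chain N a salt bridge. The `Atom` record, the atom sets and all cut-offs are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


# Hypothetical heavy-atom record; coordinates in Å.
@dataclass
class Atom:
    name: str    # PDB atom name, e.g. "OD1"
    xyz: tuple   # (x, y, z)


# Charged side-chain atoms (simplification: His is ignored).
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def classify_pair(res_a, res_b, contact_cut=4.0, hbond_cut=3.5, salt_cut=4.0):
    """res_a, res_b: (residue_name, [Atom, ...]). Returns a set of labels."""
    name_a, atoms_a = res_a
    name_b, atoms_b = res_b
    labels = set()
    for a in atoms_a:
        for b in atoms_b:
            d = math.dist(a.xyz, b.xyz)
            if d <= contact_cut:
                labels.add("IF")
            # Polar N/O pair at hydrogen-bond distance; element taken from
            # the first letter of the atom name, no angle criterion.
            if d <= hbond_cut and a.name[0] in "NO" and b.name[0] in "NO":
                labels.add("H-bond")
            ka, kb = (name_a, a.name), (name_b, b.name)
            if d <= salt_cut and ((ka in ACIDIC and kb in BASIC)
                                  or (ka in BASIC and kb in ACIDIC)):
                labels.add("salt bridge")
    return labels


lys = ("LYS", [Atom("NZ", (0.0, 0.0, 0.0)), Atom("CE", (1.5, 0.0, 0.0))])
asp = ("ASP", [Atom("OD1", (2.9, 0.0, 0.0)), Atom("CG", (4.0, 0.0, 0.0))])
print(sorted(classify_pair(lys, asp)))  # ['H-bond', 'IF', 'salt bridge']
```

A distance-only check like this overcounts hydrogen bonds relative to tools that also test donor–acceptor geometry, so it is best read as a quick screen rather than a replacement for the PISA assignments reported in Table 1.
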
