*2.5. General Trends Observed for the Protein Complexes with High-Scoring Interprotein ECs*

The seven protein pairs yielding high-scoring ECs mostly had Gremlin alignment coverage values (sequences per residue) >2, whereas the pairs for which the run stopped or that did not provide high-scoring ECs tended to have values <1 (Table 1). The coverage values did not correlate with the number of IDP homologs in PFAM. Longer interacting IDP regions with wider phylogenetic spread (for the corresponding phylogenetic groups, see Supplementary Table S1) had a higher chance of reaching sufficient coverage values, whereas short IDP chains that required significant extension to reach the minimum length of 30 residues were unlikely to reach coverage values >1, even if they mediated a phylogenetically widely conserved interaction. Consequently, the complexes in which the IDPs interact with their partners in the typical IDP manner, through ELMs/SLiMs or molecular recognition features (MoRFs), namely the enolase- and PNPase-interacting motifs of RNase E [53], the SspB-interacting region of the N-terminal part of RseA, and the C-terminal interaction motif of single-stranded DNA-binding protein (SSB) recruiting diverse partner proteins [54,55], either did not have enough PFAM sequences for analysis or, despite a vast number of sequences in PFAM, did not show any high-scoring ECs. The lack of a sufficient number or diversity of detectable homologs for these important interaction motifs/regions could be due to their short length, the relatively small fraction of actual specificity-determining residues, and the fast evolutionary turnover of the other surrounding residues [31].
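The coverage thresholds discussed above can be illustrated with a minimal sketch. It assumes that the coverage value is simply the number of sequences in the paired alignment divided by the alignment length in residues; the exact definition used by Gremlin (for example, any redundancy filtering) may differ. The function names and the example numbers are hypothetical and are not taken from Table 1.

```python
def coverage(n_sequences: int, alignment_length: int) -> float:
    """Sequences-per-residue ratio (assumed definition of the coverage value)."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return n_sequences / alignment_length


def coverage_band(value: float) -> str:
    """Bin a coverage value using the >2 and <1 cut-offs mentioned in the text."""
    if value > 2:
        return "high"          # range where most pairs with high-scoring ECs fell
    if value < 1:
        return "low"           # range where runs tended to stop or yield no ECs
    return "intermediate"


# Hypothetical alignments: (number of sequences, alignment length in residues)
examples = {"pair_A": (600, 200), "pair_B": (150, 200)}
for name, (n_seq, length) in examples.items():
    value = coverage(n_seq, length)
    print(name, value, coverage_band(value))
```

Under this assumed definition, a pair needs more than twice as many sequences as alignment columns to fall in the range where high-scoring ECs were mostly observed.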

In the case of the interactions mediated by the SSB C-terminal motif, which are among the phylogenetically most widespread ones, the lack of detected ECs could be attributed to different reasons. The C-terminal motif is 9 residues long, so it had to be extended by 21 residues from the poorly conserved SSB linker region to reach a total length of 30 residues. The lack of conservation in the linker segment dilutes the information in the motif, whereas the multitude of interaction partners simultaneously restricting the evolution of the motif [55,56] leads to a complete lack of sequence variation in most of its residues; both effects could contribute to the lack of detected ECs.
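The extension arithmetic and its diluting effect can be made concrete with a short sketch. Only the 9-residue motif length and the 30-residue minimum come from the text; the function name and the idea of reporting the motif's share of the analyzed segment are illustrative assumptions, not part of the analysis pipeline.

```python
MIN_LENGTH = 30  # minimum IDP segment length stated in the text


def extension_needed(motif_length: int, min_length: int = MIN_LENGTH) -> int:
    """Residues that must be added to a motif to reach the minimum length."""
    return max(0, min_length - motif_length)


ssb_motif_length = 9  # SSB C-terminal motif
padding = extension_needed(ssb_motif_length)
segment_length = ssb_motif_length + padding

# Share of the analyzed segment that actually belongs to the motif
motif_share = ssb_motif_length / segment_length
print(padding, segment_length, motif_share)
```

For the SSB motif, less than a third of the analyzed segment belongs to the motif itself, and the remainder comes from the poorly conserved linker.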

While EC residues showed a strong preference for interface helices, only 58.8% of ELM instances form secondary structure elements, with only 16.2% of them being mostly helical and 7.6% being partially helical, according to a large-scale analysis of the eukaryotic linear motif (ELM) database [31]. Although our dataset contains only a few bacterial short linear motifs, which are not part of the ELM database, they show a distribution among secondary structure types similar to that proposed for their eukaryotic counterparts. The SSB C-terminal motif does not form a secondary structure with any of its five partners, while the SspB-binding motif of RseA forms two very short helices. The PNPase- and enolase-binding motifs of RNase E were not subjected to EC analysis because they did not have a sufficient number of PFAM sequences, but the former binds through beta-sheet augmentation, while the latter forms a short helix. Thus, the identified SLiMs do not show a preference for helical conformation and, because they mostly span only 4–9 residues, even the helix-forming ones are not long enough to form extended helical structures on the surface of binding partners, which could also contribute to the lack of detected ECs. In all, the complete lack of ECs for SLiM-mediated interactions, regardless of their phylogenetic spread, represents a major limitation of applying residue co-variation-based approaches to the analysis of IDPs.

Although multisubunit enzymes often contain predicted disordered chains that adopt a completely extended conformation in the complex, and such cases would be perfect candidates for IDP–partner co-evolution analysis, the structural features of these subunits are rarely analyzed on their own. For this reason, only 4 of the 37 complexes in our DIBS-derived dataset are permanent protein complexes, while most of the complexes depict transient interactions (Supplementary Table S1). The seven complexes with identified ECs comprised both permanent (1) and transient (6) complexes, in fractions similar to those of the overall dataset, so at first glance the permanence of the interactions does not seem to strongly affect their co-evolution patterns. However, it is interesting to note that all the complexes with identified ECs had IDP–partner interface areas >1000 Å², except for the sole permanent complex, formed by two subunits of the ATP synthase, which had an interface area of only 610.5 Å². This might imply that in permanent protein complexes, which are phylogenetically widely distributed, co-evolution of the interacting subunits is so prominent that it can be detected even if the corresponding interaction surfaces are relatively small.
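The statement that the two fractions are similar can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text (4 permanent complexes among 37, and 1 among the 7 with identified ECs); the variable names are illustrative, and no statistical test is implied.

```python
from fractions import Fraction

# Counts taken from the text
permanent_overall = Fraction(4, 37)   # permanent complexes in the whole dataset
permanent_with_ecs = Fraction(1, 7)   # permanent complexes among those with ECs

# Express both fractions as percentages, rounded to one decimal place
pct_overall = round(float(permanent_overall) * 100, 1)
pct_with_ecs = round(float(permanent_with_ecs) * 100, 1)

print(f"overall: {pct_overall}%  with ECs: {pct_with_ecs}%")
```

With only one permanent complex in the EC-yielding subset, the two percentages (roughly 11% and 14%) differ by less than the change a single complex would cause, which is consistent with the cautious "at first glance" wording.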
