**4. Materials and Methods**

The MFIB database contains 49 heterodimeric proteins. As in the case of homodimers, entries belonging to the "coils and zippers" structural class were excluded. Since 25 of the 49 heterodimers are histones, the dataset had to be filtered to avoid overrepresentation and sequence redundancy of this protein class. Proteins were clustered with the BLASTClust toolkit 2.2.26 [26] and assigned to the same cluster if their sequence identity was over 90%. The 2mv7 entry was discarded because its fuzzy NMR structure made it an outlier in the SASA calculations. One representative structure was kept for each of the remaining 30 clusters, creating the filtered MFHE dataset (see Table S3). A reference dataset of globular heterodimers (GLHE) was created from the PDBSelect [27] database, restricted to entries with a total number of residues below 240 to match the size distribution of the MFIB heterodimer dataset (see Table S3).

We described the methods in our latest article [7]; briefly, the term "interface" refers to the contact surface area of the two subunits in the heterodimeric structures. Where the term "monomeric structure" is used, calculations were carried out on single polypeptide chains after the other chain had been removed from the PDB files. Residues belonging to the interface region were identified based on solvent accessible surface area (SASA) calculations. All-atom SASA values were calculated using the FreeSASA 2.03 [28] program. Residues whose SASA value in the dimeric structure was less than or equal to 20% of the monomeric value were defined as belonging to the interface.
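
The interface criterion can be illustrated with a minimal Python sketch. It assumes that per-residue all-atom SASA values have already been computed for the dimeric and the monomeric structure (for example with FreeSASA) and stored in dictionaries keyed by chain and residue number. The function name, the data layout, the toy numbers and the handling of residues with zero monomeric SASA are illustrative assumptions, not part of the original workflow.

```python
def interface_residues(sasa_dimer, sasa_monomer, ratio_cutoff=0.20):
    """Return residues whose dimer SASA is <= ratio_cutoff * monomer SASA.

    Both arguments map (chain, residue_number) -> all-atom SASA in A^2.
    """
    selected = []
    for key, mono in sasa_monomer.items():
        if mono <= 0.0:
            # Assumption: residues already fully buried in the monomer
            # are skipped, since they cannot lose accessible surface.
            continue
        if sasa_dimer[key] <= ratio_cutoff * mono:
            selected.append(key)
    return sorted(selected)


# Toy values, for illustration only (not taken from any PDB entry).
sasa_monomer = {("A", 10): 100.0, ("A", 11): 50.0, ("A", 12): 0.0, ("A", 13): 80.0}
sasa_dimer = {("A", 10): 15.0, ("A", 11): 30.0, ("A", 12): 0.0, ("A", 13): 5.0}

print(interface_residues(sasa_dimer, sasa_monomer))
```

With these toy values, residues 10 and 13 of chain A lose more than 80% of their accessible surface upon dimerization and are therefore selected.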

We searched the interface for residues whose main chain has solvent accessible patches in the monomeric structure that become buried in the dimeric structure. We identified residues where the main-chain SASA in the dimeric form was less than 20% of its value in the monomeric form. Only residues with exposed main chains, that is, with a relative main-chain SASA larger than 0.2 in the monomeric structure, were taken into account. These residues with solvent accessible main-chain patches are called RSAMPs and are believed to be important for structural ordering upon dimerization of the disordered polypeptide chains collected in the MFIB database.
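
The two RSAMP conditions can be summarized in a short Python sketch. It assumes that main-chain SASA values (absolute in both forms, and relative in the monomer) are already available for the interface residues. The `MainChainSasa` record, the function name and the toy numbers are hypothetical and serve only to make the thresholds concrete.

```python
from typing import NamedTuple


class MainChainSasa(NamedTuple):
    monomer_abs: float  # main-chain SASA in the monomeric structure (A^2)
    monomer_rel: float  # relative main-chain SASA in the monomer (0..1)
    dimer_abs: float    # main-chain SASA in the dimeric structure (A^2)


def find_rsamps(records, exposure_cutoff=0.2, burial_ratio=0.20):
    """Return keys of residues that satisfy both RSAMP conditions.

    records maps (chain, residue_number) -> MainChainSasa and is expected
    to contain interface residues only.
    """
    hits = []
    for key, r in records.items():
        # Condition 1: main chain is exposed in the monomer.
        exposed = r.monomer_rel > exposure_cutoff
        # Condition 2: main-chain SASA drops below 20% of the monomer value.
        buried = r.dimer_abs < burial_ratio * r.monomer_abs
        if exposed and buried:
            hits.append(key)
    return sorted(hits)


# Toy values, for illustration only.
records = {
    ("A", 5): MainChainSasa(monomer_abs=30.0, monomer_rel=0.6, dimer_abs=2.0),
    ("A", 6): MainChainSasa(monomer_abs=5.0, monomer_rel=0.1, dimer_abs=0.0),
    ("A", 7): MainChainSasa(monomer_abs=25.0, monomer_rel=0.5, dimer_abs=20.0),
}

print(find_rsamps(records))
```

In this example only residue 5 qualifies: residue 6 is not exposed in the monomer, and residue 7 remains largely accessible in the dimer.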

We used an additional Small Globular Protein (SGP) dataset to determine the minimal buried core size of proteins (see Table S3). We collected X-ray structures of monomeric single-domain proteins with fewer than 120 residues and no disulfide bonds from the PDBSelect database. Since there was a significant gap in the size distribution of the X-ray structures, monomeric single-domain NMR structures without disulfide bonds were added to the dataset. We excluded rod-like and fuzzy NMR structures using a volume/surface cutoff criterion. Protein volumes were calculated using the ProteinVolume 1.3 program [29].

Secondary structural elements were identified using the DSSP [21] program. Hydrogen bonds were identified using the find\_pairs PyMOL command with 3.5 Å distance and 45-degree angle criteria [30]. Wrapping of hydrogen bonds was calculated using the dehydron\_ter.py program [31]. Stabilization centers (SCs) are special pairs of residues involved in cooperative long-range interactions. The two residues that form a stabilization center are called stabilization center elements (SCEs). SCEs were identified using our SCide server [32]. Ion pairs were defined as pairs of positively and negatively charged residues whose charged groups are closer to each other than a cutoff distance. For strong ion pairs, this cutoff is 4 Å [33]; in addition, we introduce a weak ion-pair definition with a distance cutoff of 6 Å. Histidine residues were assumed to be neutral in these calculations because of the uncertainty of their protonation states. Ion pairs were identified using our own C++ program. We calculated the total charge of each protein by adding the numbers of Arg and Lys residues and subtracting the numbers of Asp and Glu residues.
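
The ion-pair and total-charge definitions can be sketched in Python as follows. This is not the C++ program used in this work; it is a simplified illustration. In particular, the choice of side-chain atoms that represent the charged groups and the use of the minimum atom-atom distance are assumptions, as are the function names and the toy coordinates.

```python
import math

# Assumption: these side-chain atoms represent the charged groups.
POSITIVE_ATOMS = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}}
NEGATIVE_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}


def charged_groups(atoms, table):
    """Group coordinates of charged atoms by residue.

    atoms is a list of (chain, resnum, resname, atom_name, (x, y, z)).
    Histidine is absent from both tables, so it is treated as neutral.
    """
    groups = {}
    for chain, resnum, resname, atom, xyz in atoms:
        if atom in table.get(resname, ()):
            groups.setdefault((chain, resnum, resname), []).append(xyz)
    return groups


def ion_pairs(atoms, cutoff):
    """Return (positive, negative) residue pairs closer than cutoff (in A)."""
    pos = charged_groups(atoms, POSITIVE_ATOMS)
    neg = charged_groups(atoms, NEGATIVE_ATOMS)
    pairs = []
    for p, p_xyz in pos.items():
        for n, n_xyz in neg.items():
            # Assumption: distance = minimum over all charged-atom pairs.
            d = min(math.dist(a, b) for a in p_xyz for b in n_xyz)
            if d < cutoff:
                pairs.append((p, n))
    return pairs


def net_charge(sequence):
    """Number of Arg and Lys minus number of Asp and Glu (one-letter codes)."""
    return sum(sequence.count(aa) for aa in "RK") - sum(sequence.count(aa) for aa in "DE")


# Toy coordinates, for illustration only.
atoms = [
    ("A", 1, "LYS", "NZ", (0.0, 0.0, 0.0)),
    ("A", 2, "ASP", "OD1", (3.0, 0.0, 0.0)),
    ("A", 2, "ASP", "OD2", (5.0, 0.0, 0.0)),
    ("B", 3, "GLU", "OE1", (0.0, 5.0, 0.0)),
    ("A", 4, "HIS", "NE2", (1.0, 0.0, 0.0)),
]

strong = ion_pairs(atoms, cutoff=4.0)  # strong ion pairs
weak = ion_pairs(atoms, cutoff=6.0)    # weak definition (includes strong ones)
print(strong, weak, net_charge("MKRDEEH"))
```

In the toy data, the Lys-Asp pair (3 Å) satisfies both definitions, whereas the Lys-Glu pair (5 Å) is found only with the 6 Å cutoff.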

Amino acid compositions were determined using MEGA7 software [34]. The amino acid compositions of the protein subunits and complexes were visualized by nonmetric multidimensional scaling (NMDS) in PAST3 [35]. In the plot, each point represents one amino acid composition, and points lying closer together are more similar in composition (based on Bray-Curtis distances). This was followed by a SIMPER analysis (also based on Bray-Curtis distances, in PAST3) to identify the amino acids that contributed most to the observed differences among the types of subunits and complexes. Disorder was predicted using the IUPred2A [17] and MoRFpred [16] algorithms.
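
For readers unfamiliar with the Bray-Curtis distance that underlies both the NMDS and the SIMPER analysis, the following Python sketch computes it for two amino acid compositions using the standard definition (sum of absolute differences divided by the sum of all values). The analyses in this work were performed in PAST3; the helper names and the representation of compositions as fractions over the 20 standard amino acids are assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(sequence)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def bray_curtis(x, y):
    """Bray-Curtis distance: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


# Toy sequences, for illustration only.
same = bray_curtis(composition("AAKK"), composition("KAKA"))
disjoint = bray_curtis(composition("AAAA"), composition("KKKK"))
print(same, disjoint)
```

Identical compositions give a distance of 0 and compositions that share no amino acid give a distance of 1, which is why nearby points in the NMDS plot correspond to similar compositions.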

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/20/20/5136/s1. Table S1. Contribution of the amino acids to the observed differences in NMDS by using SIMPER analysis in the subunits (A) and the complexes (B). Table S2. Pfam domains and the intermolecular SCs in globular (A) and MFIB (B) heterodimeric complexes. Table S3. List of PDB entries in the MFHE, GLHE and SGP datasets. Figure S1. The number of buried residues in MFHO (A) and GLHO (B) complexes (black: number of all residues in a complex, red: number of buried residues in a heterodimeric complex, blue: sum of number of buried residues in the two monomeric subunits). Figure S2. The number of total and buried residues of SGP, GLHE, GLHO, MFHE and MFHO.

**Author Contributions:** Conceptualization, I.S. and C.M.; methodology, A.M., E.F.; software, A.M., E.F., C.M.; validation, A.M., C.M.; formal analysis, C.M.; investigation, A.M., E.F.; resources, A.M., E.F.; data curation, A.M., E.F., C.M.; writing – original draft preparation, A.M., C.M., I.S.; writing – review and editing, E.F., A.M., C.M.; visualization, A.M.; supervision, I.S.; project administration, I.S.; funding acquisition, I.S.

**Funding:** This work was financially supported by the National Research, Development and Innovation Office (grant no. K115698). IS was supported by project no. FIEK\_16-1-2016-0005 financed under the FIEK\_16 funding scheme (National Research, Development and Innovation Fund of Hungary). The work of AM was supported through the New National Excellence Program of the Ministry of Human Capacities (Hungary).

**Acknowledgments:** The authors acknowledge the support of ELIXIR Hungary (www.elixir-hungary.org).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
