**1. Introduction**

Intrinsically disordered proteins (IDPs) are crucial elements of the molecular machinery indispensable for complex life [1,2]. IDPs are parts of regulatory pathways [3], control the cell cycle [4,5], function as chaperones [6,7], and regulate protein degradation [8,9], among other functions. Accordingly, IDPs are typically under tight regulation at several levels [3,10]. While some IDPs fulfill their functions directly through their lack of structure, for example as spring-like entropic chains, the majority of disordered proteins interact with other macromolecules, most often other proteins [11]. IDP-mediated interactions are essential for many hub proteins [12,13], and several IDPs serve as interaction scaffolds/platforms for macromolecular assembly [14,15]. Mounting evidence also shows that protein disorder plays a crucial role in the assembly of membraneless organelles formed through liquid–liquid phase separation [16].

Depending on the partner protein and the specifics of the interaction, IDPs can bind through several mechanisms. Many IDPs recognize and bind to ordered protein domains, usually through a linear sequence motif [17]. While some IDPs retain their inherent flexibility in the bound form [18], in most known cases the complex structure is amenable to standard structure determination methods, such as X-ray crystallography or NMR spectroscopy. These cases of coupled folding and binding have been studied intensively [19–21]. However, IDPs can also use a fundamentally different molecular mechanism for interaction, through which they likewise reach a folded state. Complexes whose constituent protein chains are all IDPs, with no previously folded domain present, are formed via a process called mutual synergistic folding (MSF) [22], a much less understood way in which protein folding and binding merge into a single biophysical process.

A major advance in the field of IDP interactions in recent years has been the development of specialized databases for various interaction mechanisms, including coupled folding and binding [23,24], fuzzy complexes [25], mutual synergistic folding [26], and proteins driving liquid–liquid phase separation [27]. Of these, mutual synergistic folding is possibly the least studied, because MSF complexes are the only interactions in which none of the partner proteins has a well-defined structure outside the complex, forcing us to revise the approaches currently used to describe protein structures and complexes. The biological and biophysical properties of these interactions are markedly different from those of other interaction types. Whereas in other interaction types a stable, folded hydrophobic core is already present in at least one partner, in MSF complexes folding and binding happen simultaneously for all partners. Comparative analyses have shown not only that MSF complexes constitute a separate, biologically meaningful class, but also that these complexes are highly heterogeneous in terms of sequence and structural properties [28–30].

We now have knowledge of over 140,000 protein structures deposited in the Protein Data Bank (PDB) [31], a large fraction of which contain several proteins in complex. In each of these cases, the proteins achieve stability either before or upon interacting. A major question is how this stability is achieved, and whether it can serve as the basis for a biologically meaningful classification. For ordered proteins, current hierarchical classification schemes, such as SCOP (Structural Classification of Proteins) [32] and CATH (Class, Architecture, Topology, Homologous superfamily) [33], are rooted in tertiary protein structure. Although these methods have been extended to classify protein complexes as well, they do not explicitly factor in parameters that describe the interactions, or the differences in sequence composition between complexes of similar overall structure. In the case of MSF complexes, however, these differences are defining features, as the interaction is the primary reason for the emergence of the structure itself, and this interaction usually requires highly specialized residue compositions [28]. Other classification methods were developed specifically for protein–protein interactions, but they aim only to describe the interface, without taking the overall resulting structure into account [34].

Here we present the first classification method designed to identify biologically relevant types of protein complexes formed via mutual synergistic folding. Our work aims to answer specific questions about the types of MSF complexes, based on the more than 200 currently known examples. Are there intrinsic classes of MSF complexes, or are all known examples essentially unique in terms of sequence and structure? If meaningful groups can be defined objectively, what characterizes each group in terms of sequence composition and adopted structure? In addition, how is the formation of MSF complexes regulated? Are mechanisms known to be important for other molecular interactions relevant to these complexes as well? If so, do the various MSF groups differ in these regulatory mechanisms and in other biologically relevant properties, such as binding strength and subcellular localization?
