**About the Editor**

**Pier Luigi Gentili** received his Ph.D. in Chemistry from the University of Perugia in 2004. His research and teaching activities are focused on Complex Systems. He is the author of the book "Untangling Complex Systems: A Grand Challenge for Science" published by CRC Press in 2018. Being aware that inanimate matter is driven by force-fields, whereas the interactions between biological systems are also information-based, Gentili is led by some questions like the following: When does a chemical system become intelligent? Is it possible to develop a "chemical artificial intelligence?" For the development of chemical artificial intelligence, Pier Luigi Gentili relies upon the theory and tools of Natural Computing. In particular, he is tracing a new path in the field of Neuromorphic Engineering by using non-linear chemical systems and by encoding information mainly through UV-visible signals. He is proposing methods to process fuzzy logic by molecular, supramolecular, and systems chemistry.

## *Editorial* **The Fuzziness in Molecular, Supramolecular, and Systems Chemistry**

#### **Pier Luigi Gentili**

Department of Chemistry, Biology, and Biotechnology, Università degli Studi di Perugia, Via elce di sotto 8, 06123 Perugia, Italy; pierluigi.gentili@unipg.it; Tel.: +39-075-585-5573

Received: 6 August 2020; Accepted: 7 August 2020; Published: 10 August 2020

#### **1. Introduction**

The global challenges of the XXI century require a more in-depth analysis and investigation of complex systems [1]. A promising research line for better understanding complex systems and for proposing new algorithms and computing devices is natural computing. Natural computing is based on a fundamental rationale: every causal phenomenon can be conceived as a computation, and every distinguishable physicochemical state of matter and energy can be used to encode information. Any physicochemical law can be exploited to make computations. For instance, the laws of quantum mechanics can be exploited for quantum computing, the laws of chemical kinetics for chemical computing, the laws of chaos for chaos computing, etc. On the other hand, we might draw inspiration from living beings, which have the exclusive attribute of using matter and energy to encode, collect, store, process, and send information [1,2]. Living beings show different information systems. Their basic information system is the cell, also called the biomolecular information system (BIS). In most multicellular organisms, we encounter nervous systems that constitute neural information systems (NISs). The defense systems that help repel antigens and disease-causing organisms are defined as immune information systems (IISs). Finally, most living beings live in societies, and the resulting aggregations constitute the so-called social information systems (SISs).

#### **2. Artificial Intelligence and Fuzzy Logic**

Among the natural information systems, the human nervous system (HNS) is particularly alluring for facing XXI century challenges. Its performance is astonishing. Based on a complex architecture of billions of nerve cells, our nervous system allows us to handle accurate and vague information by computing with numbers and words. Furthermore, it allows us to recognize variable patterns quite easily and make decisions in complex situations. Therefore, it is worthwhile trying to understand how it works and to mimic it by developing artificial intelligence (AI). Within AI, fuzzy logic stands out as a good model of the human ability to compute with words and make decisions in complex circumstances [3,4]. Its descriptive and modeling power hinges on the structural and functional analogies it has with the HNS [5,6]. The entire architecture of the HNS is related to that of any fuzzy logic system. Our natural sensors act as fuzzifiers, our brain as a fuzzy inference engine, and our effectors as defuzzifiers. Every sensory system, physical and chemical, such as the visual or olfactory system, consists of a tissue made of a spatially distributed array of sensory cells that behave as fuzzy sets [5,6]. Within each sensory cell, there is a multitude of sensory proteins that work as molecular fuzzy sets. The multiple information of any stimulus, i.e., its modality, intensity, spatial distribution, and time-evolution, is encoded hierarchically as degrees of membership to the molecular and cellular fuzzy sets. The imitation of these features allows us to design new artificial sensory systems with enhanced discriminative power due to the parallel activity of different molecular fuzzy sets. A concrete example is the recent implementation of biologically inspired photochromic fuzzy logic systems that extend human vision to the UV [7,8].

#### **3. Neuromorphic Engineering and Chemical Artificial Intelligence**

The mimicry of nonlinear neural dynamics is a promising alternative strategy to approach the performance of human intelligence. Surrogates of neurons can be achieved through oscillatory, excitable, or chaotic chemical systems in solution (i.e., wetware) [9,10] or in the solid phase (i.e., hardware) [11–13]. In this Special Issue, Szaciłowski and his team present an experimental characterization of an optoelectronic device constituted by a polycrystalline cadmium sulfide electrode [14]. Such a device realizes a type of short-term plasticity, i.e., paired-pulse facilitation (PPF). PPF consists of an enhancement in the postsynaptic current when the excitatory signal frequency increases. This short-term memory effect gives the device an appreciable ability to recognize hand-written numbers. Szaciłowski's work blazes a trail for the optoelectronic implementation of neural network architectures that will allow the processing of fuzzy logic and the recognition of variable patterns. Suffice it to say that fuzzy logic has already been implemented through a pacemaker neuron model, such as the Belousov-Zhabotinsky reaction [15], and a chaotic neuron model, such as the "photochemical oscillator" [16,17]. When UV–visible radiation is chosen as a signal, it is straightforward to implement neuromodulation [18] and hence, fuzzy logic.

In the orthodox AI, fuzzy logic is processed through software running in digital electronic computers; it is even better if the electronic circuits are analog, since fuzzy logic is an infinite-valued logic. In the burgeoning field of chemical artificial intelligence (CAI) [19], unconventional chemical systems have been put forward for implementing fuzzy logic systems. Some examples can be found in the references [20–26]. The fundamental requirement is to have smooth analog input–output relationships between physicochemical variables, either linear or hyperbolic, but certainly not sigmoid. Sigmoid functions are adequate for processing discrete logics [27,28].

#### **4. Cellular Fuzziness**

The relentless investigation of the working principles of the cells or BISs has been revealing cellular fuzziness. Some proteins play within any cell as if they were the neurons of the "cellular nervous system" [6]. They are the protagonists of the signaling and genetic networks, and they make the cell capable of responding to ever-changing environmental conditions. As Fuxreiter points out in her perspective included in this Special Issue [29], often, a protein exists as a heterogeneous ensemble of conformers. For these proteins, the deterministic inference "amino-acidic sequence → 3D structure → function" is not applicable. In fact, the conformational ensemble may perform multiple functions, depending on the context. Such a collection of conformers looks like a macromolecular fuzzy set. The dynamical power of a protein to autonomously select a context-dependent function constitutes what we might name as its fuzzy inference engine. In their review, Jeffery and Liu tell us that there are moonlighting proteins in the metabolic network of a cell, in which one polypeptide chain performs more than one physiologically relevant biochemical or biophysical function [30]. The kind of function that is executed might depend on cellular localization, concentrations of substrates or ligands, or environmental stress. Any type of moonlighting protein is fuzzy because some of its copies can perform one function, some another, and some both functions simultaneously. As cellular conditions change due to metabolism and environmental conditions, the functions of these proteins change as well. Uversky informs us that within a cell, the supramolecular interactions between specific intrinsically disordered proteins and hybrid proteins, having ordered domains and intrinsically disordered protein regions, drive biological liquid–liquid phase transitions that form proteinaceous membrane-less organelles (PMLOs) [31]. 
PMLOs are intracellular hot spots that serve as organizers of cellular biochemistry. Such PMLOs are fuzzy, and their fuzziness resides in their compositional and compartmental variety and variability. Dodero and her team gained tangible experience of supramolecular fuzziness by investigating the interaction between a Transcription Factor and double-stranded DNA [32]. After annealing a proper DNA sequence and synthesizing a photosensitive surrogate of the GCN4 Transcription Factor, Dodero and her colleagues furnish experimental evidence of the fuzziness of protein-DNA complexation by using different techniques, such as NMR, electrophoretic mobility shift assay, and circular dichroism spectroscopy. To monitor the conformational fuzziness of macromolecules and smaller molecules, Gentili relies upon the maximum entropy method to extract the distributions of conformers from any kinetic trace [6,33]. After determining the distribution, quantifying its fuzzy entropy is also possible [6,34].

#### **5. Non-Arrhenius Kinetics**

If we consider the conformational distributions of compounds, the original transition-state theory and the Arrhenius law might appear far-fetched. There is a peculiar distribution of conformers at every temperature, and every conformer traces its unique reactive path. It is not fair to define just one kinetic constant and one activation energy for all the coexistent conformers. It is necessary to add that both the original transition-state theory and the Arrhenius law have been already questioned by the most recent theoretical and experimental developments, as evidenced by Carvalho-Silva, Coutinho, and Aquilanti [35]. Quantum mechanical effects, such as tunneling and resonance, stochastic motions of particles in condensed environments, and non-equilibrium effects in classical and quantum formulations, are responsible for deviations from the traditional Arrhenius equation. In such situations, the transitivity function, defined in terms of the reciprocal of the apparent activation energy, measures the propensity for a reaction to proceed. The transitivity function provides a tool for implementing phenomenological kinetic models. In reference [36], Machado, Sanches-Neto, Coutinho, Mundim, Palazzetti, and Carvalho-Silva document the general scope of a transitivity code that can estimate the kinetic and thermodynamic parameters of physicochemical processes and deal with non-Arrhenius behavior.
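As a worked illustration of the transitivity function mentioned above, the following sketch writes the definitions in terms of $\beta = 1/(RT)$. The symbols $\gamma$, $\varepsilon$, and $d$, and the choice of a deformed Arrhenius rate law as the example, are assumptions made here for illustration; they are not taken from this editorial and should be checked against references [35,36].

$$
E_a(\beta) = -\frac{\mathrm{d}\ln k}{\mathrm{d}\beta}, \qquad \gamma(\beta) = \frac{1}{E_a(\beta)}
$$

For the Arrhenius law, $k = A\,e^{-E_a \beta}$, the transitivity is constant, $\gamma = 1/E_a$. If one instead assumes a deformed Arrhenius form,

$$
k(\beta) = A\,(1 - d\,\varepsilon\,\beta)^{1/d}
\;\;\Rightarrow\;\;
E_a(\beta) = \frac{\varepsilon}{1 - d\,\varepsilon\,\beta}
\;\;\Rightarrow\;\;
\gamma(\beta) = \frac{1}{\varepsilon} - d\,\beta ,
$$

then the transitivity becomes linear in $\beta$, and the Arrhenius case is recovered in the limit $d \to 0$. Under this assumption, a plot of $\gamma$ against $\beta$ would show a deviation from Arrhenius behavior as a non-zero slope.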

#### **6. Conclusions and Perspectives**

This Special Issue's multidisciplinary contributions highlight that fuzzy set theory and fuzzy logic are valuable conceptual tools to understand the molecular and supramolecular world. Of course, quantum mechanics already exists for this purpose, but fuzzy logic is becoming an alternative approach that might have still undiscovered common points with quantum logic [37,38]. Fuzzy logic appears particularly suitable for dealing with conformers. Although this approach is in its infancy, it is worth pursuing. It will allow us to describe any cell's activities, the constitutive elements of the human nervous system, and the immune system's performance more deeply. Such knowledge will be translated into new strategies to control cellular processes and develop chemical artificial intelligence and chemical robots [6]. If cutting-edge technologies emerge from this approach, then biomolecular, supramolecular, and systems chemistry will surely be considered fuzzy worldwide!

**Funding:** This research was funded by ANVUR grant number n.20/2017.

**Acknowledgments:** P.L. Gentili acknowledges all the contributors to this Special Issue, the anonymous reviewers of the papers published within this Special Issue, and Lola Huo for her valuable editorial assistance.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Perspective* **The Fuzziness of the Molecular World and Its Perspectives**

#### **Pier Luigi Gentili**

Dipartimento di Chimica, Biologia e Biotecnologie, Università di Perugia, Via Elce di sotto 8, 06123 Perugia, Italy; pierluigi.gentili@unipg.it; Tel.: +39-075-585-5573

Received: 30 July 2018; Accepted: 17 August 2018; Published: 19 August 2018

**Abstract:** Scientists want to comprehend and control complex systems. Their success depends on the ability to face also the challenges of the corresponding computational complexity. A promising research line is artificial intelligence (AI). In AI, fuzzy logic plays a significant role because it is a suitable model of the human capability to compute with words, which is relevant when we make decisions in complex situations. The concept of fuzzy set pervades the natural information systems (NISs), such as living cells, the immune and the nervous systems. This paper describes the fuzziness of the NISs, in particular of the human nervous system. Moreover, it traces three pathways to process fuzzy logic by molecules and their assemblies. The fuzziness of the molecular world is useful for the development of the chemical artificial intelligence (CAI). CAI will help to face the challenges that regard both the natural and the computational complexity.

**Keywords:** fuzzy logic; complexity; chemical artificial intelligence; human nervous system; fuzzy proteins; conformations; photochromic compounds; qubit

#### **1. Introduction**

The scientific method, officially born in the seventeenth century with the contributions of Galileo Galilei and Isaac Newton, has allowed humanity to become acquainted with natural phenomena as never before. The acquisition of new scientific knowledge has also promoted an outstanding technological development in the last three hundred years or so. A mutual positive feedback relationship subsists between science and technology. To date, amazing scientific and technological achievements have been reached. For example, we can explore regions of the universe that are 10<sup>26</sup> m away from us. At the same time, we can detect subatomic particles that have radii of the order of 10<sup>−15</sup> m. We can record microscopic phenomena that occur in 10<sup>−18</sup> s, but we can also retrieve traces of cosmic events that happened billions of years ago. Our technology allows us to send robots to other planets of our solar system (e.g., the NASA Spirit rover on Mars), manipulate atoms, and interfere with the expression of genes in living beings. Despite many efforts, there are still challenges that must be met. For instance, (I) we cannot predict catastrophic events on Earth (such as earthquakes and volcanic eruptions); (II) we strive to avert climate change; (III) we would like to exploit energy and food resources without deteriorating the natural ecosystems and their biodiversity; (IV) there are diseases that are still incurable; (V) we would like to eradicate poverty in the world; (VI) we make efforts to avoid or at least predict both economic and political crises. Whenever we try to address such challenges, we experience frustrating and seemingly insurmountable obstacles. Why? Because whenever we cope with one of them, we deal with a complex system. A complex system is one of which science is unable to give a complete and accurate description. In other words, scientists find it difficult to rationalize and predict the behaviors of complex systems. 
Examples of complex systems are the geology and the climate of the Earth; ecosystems; and every living being, in particular humans, who give rise to economic and social organizations, which are themselves complex systems. The description of complex systems requires the collection, manipulation, and storage of big data [1], and the solution of problems of computational complexity. The description of complex systems from their ultimate constituents, i.e., atoms, is beyond our reach since the computational cost grows exponentially with the number of particles [2]. Moreover, many complex systems exhibit variable patterns. These variable patterns are objects (both inanimate and animate) or events whose recognition is made difficult by their multiple features, variability, and extreme sensitivity to the context. We still lack universally valid and effective algorithms for recognizing variable patterns [3]. Therefore, the obvious question is: How can we try to tackle the challenges regarding complex systems which involve issues of computational complexity? There are two principal strategies [4,5]. One consists in improving current electronic computers to make them faster and faster, and with increasingly large memory space. The other strategy is the interdisciplinary research line of natural computing. Researchers working on natural computing draw inspiration from Nature to propose: (I) new algorithms, (II) new materials and architectures for computing, and (III) new models to interpret complex systems. The sources of inspiration are the natural information systems, such as (a) the cells (i.e., the biomolecular information systems or BIS), (b) the nervous system (i.e., the neural information systems or NIS), (c) the immune system (i.e., the immune information systems or IIS), and (d) the societies (i.e., the societal information systems or SIS). Alternatively, we may exploit any causal event involving inanimate matter to make computations. In fact, in a causal event, the causes are the inputs and the effects are the outputs of a computation whose algorithm is defined by the laws governing the transformation (see Figure 1).

**Figure 1.** The contribution of the natural computing in coping with the challenges of the computational and natural complexity.

Among the natural information systems, the attention of many scientists worldwide is focused on the human nervous system that has human intelligence as its emergent property. The imitation of human intelligence is having a revolutionary impact in science, medicine, economy, security and well-being [6]. In fact, conventional quantitative techniques of system analysis are intrinsically unsuited for dealing with biological, social, economic, and any other type of system in which it is the behavior of the animate constituents that plays a dominant role. For such "humanistic systems", the principle of incompatibility holds [7]: as the complexity of a system increases, our ability to make accurate and yet significant statements about its behavior diminishes until a threshold is reached beyond which accuracy and significance (or relevance) become almost mutually exclusive characteristics. An alternative approach is based on the human intelligence that has the remarkable power of handling both accurate and vague information. Information is vague when it is based on sensory perceptions. Vague information is coded through the words of our natural languages. Therefore, humans compute by using not only numbers but also and especially words. We have the remarkable capability to reason, speak, discuss and make rational decisions without any quantitative measurement and any numerical computation, in an environment of uncertainty, partiality, and relativity of truth. Moreover, we recognize quite easily variable patterns, such as human faces and voices. Therefore, a major challenge of the artificial intelligence research line is the comprehension and implementation of the capabilities of the human intelligence to compute with words [8]. The use of classical, Aristotelian, divalent logic implemented in electronic circuits and computers has allowed reproducing and even overcoming the human ability to compute with numbers. 
The imitation of the human ability to compute with words is still challenging. Fuzzy logic is a good model. In fact, fuzzy logic has been defined as a rigorous logic of vague and approximate reasoning [9]. In this paper, after describing the principal features of fuzzy logic, it is shown that one reason why fuzzy logic is a valid model of the human power to compute with words can be found at the molecular level. Therefore, we propose the use of molecular, supramolecular, and chemical systems as an innovative strategy for implementing fuzzy logic. This article pursues the idea of developing a chemical artificial intelligence [10], i.e., an artificial intelligence that is based not on electronic circuits and software, but on chemical reactions in wetware. Probably, chemical artificial intelligence will promote the design of a new generation of computational machines, more similar to the brain than to electronic computers. These new brain-like "chemical computers" should help to cope with the challenges regarding complex systems mentioned earlier in this Introduction.

#### **2. Some Features of Fuzzy Logic**

Fuzzy logic is based on the theory of fuzzy sets proposed by the engineer Lotfi Zadeh in 1965 [11]. A fuzzy set is different from a classical Boolean set. A classical set, also named as a crisp set, is a container that wholly includes or wholly excludes any given element. The theory of classical sets is based on the Law of Excluded Middle formulated by Aristotle in the fourth century BC. The Law of Excluded Middle states that an element x belongs to either set S or to its complement, i.e., set not-S. Zadeh proposed a refinement of the theory of the classical sets. In fact, a fuzzy set is more than a classical set: it can wholly include or wholly exclude elements, but it can also partially include and exclude other elements. The theory of fuzzy sets breaks the Law of Excluded Middle because an element x may belong to both set S and its complement not-S. An element x may belong to any set, but with different degrees of membership. The degree of membership (μ) of an element to a fuzzy set can be any real number included between 0 and 1. If μ = 0, the element does not belong at all to the set; if μ = 1, it completely belongs to the set; if 0 < μ < 1, the element belongs partially to the set. The Law of Excluded Middle is the foundation of the binary logic. In binary logic any variable is partitioned in two classical sets after fixing a threshold value: one set includes all the values below the threshold, whereas the other one contains those above. In the case of a positive logic convention, all the values of the first set become the binary 0, whereas those of the other set become the binary 1. The shape of a classical set is like that shown in Figure 2A. The degree of membership function for such a set discontinuously changes from 0 (below the threshold) to 1 (above the threshold). On the other hand, fuzzy sets can have different shapes. They can be sigmoidal, triangular, trapezoidal, Gaussian (see Figure 2), to cite a few. 
For a fuzzy set, the degree of membership function (μ) changes gradually between 0 and 1. μ is the fuzzy unit of information, called "fit". It follows that fuzzy logic is an infinite-valued logic.

**Figure 2.** Shapes of the membership functions (μ) for a generic variable x: the case of a classical Boolean set in **A**; examples of fuzzy sets with sigmoidal, triangular, trapezoidal, and Gaussian shapes are shown in **B**–**E** plots, respectively.
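The shapes listed in Figure 2 can be written down as simple functions. The following is a minimal sketch in Python; the function names and parameters (`threshold`, `center`, `slope`, and the breakpoints `a`, `b`, `c`, `d`) are illustrative choices made here, not notation from the paper.

```python
import math

def crisp(x, threshold):
    """Classical (Boolean) set: membership jumps from 0 to 1 at the threshold.
    Assigning the threshold itself to the lower set is a convention chosen here."""
    return 1.0 if x > threshold else 0.0

def sigmoidal(x, center, slope):
    """Smooth step: membership is 0.5 at `center`; `slope` sets the steepness."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def triangular(x, a, b, c):
    """Rises linearly from a to the peak at b, then falls linearly to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Rises from a to b, stays at 1 between b and c, falls from c to d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, mean, sigma):
    """Bell-shaped set with full membership at `mean`."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)
```

Only `crisp` returns exclusively 0 or 1; every other function can return any value in between, which is what makes the logic infinite-valued.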

Fuzzy logic can be used to describe any non-linear cause and effect relationship by building a fuzzy logic system (FLS). The construction of an FLS requires three fundamental steps. First, the granulation of all the variables in fuzzy sets. The number, position, and shape of the fuzzy sets are context-dependent. Second, the graduation of all the variables. A word, often an adjective, labels every fuzzy set. Third, the relationships between input and output fuzzy sets are described through syllogistic statements of the type "IF ... , THEN ... .", called fuzzy rules. The "IF ... " part is the antecedent and involves the linguistic labels chosen for the input fuzzy sets. The "THEN ... " part is the consequent and involves the linguistic labels chosen for the output fuzzy sets.

When we have multiple inputs, these are connected through the AND, OR, NOT operators [12]. AND corresponds to the intersection (e.g., the intersection of two fuzzy sets, whose membership functions are $\mu_{S_1}$ and $\mu_{S_2}$, can be $\mu_{S_1 \cap S_2} = \min(\mu_{S_1}, \mu_{S_2})$ or $\mu_{S_1 \cap S_2} = \mu_{S_1} \times \mu_{S_2}$); OR corresponds to the union (e.g., the union of the two sets $S_1$ and $S_2$ can be $\mu_{S_1 \cup S_2} = \max(\mu_{S_1}, \mu_{S_2})$ or $\mu_{S_1 \cup S_2} = \mu_{S_1} + \mu_{S_2} - \mu_{S_1} \times \mu_{S_2}$); NOT corresponds to the complement (e.g., the membership function for the fuzzy complement of $S$ is $\mu_{\bar{S}} = 1 - \mu_{S}$). Fuzzy rules may be provided by experts or can be extracted from numerical data. After the granulation, the graduation of all the input and output variables, and the formulation of the fuzzy rules, we have an FLS that is a predictive tool or a decision support system for the particular phenomenon it describes. The way an FLS works is schematically depicted through an example in Figure 3.
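The three operators can be sketched in a few lines of Python. The function names and the `method` argument are hypothetical conveniences introduced here; the formulas are the minimum/product, maximum/probabilistic-sum, and complement options mentioned in the paragraph above.

```python
def fuzzy_and(mu1, mu2, method="min"):
    """Intersection: minimum by default, product otherwise."""
    return min(mu1, mu2) if method == "min" else mu1 * mu2

def fuzzy_or(mu1, mu2, method="max"):
    """Union: maximum by default, probabilistic sum otherwise."""
    return max(mu1, mu2) if method == "max" else mu1 + mu2 - mu1 * mu2

def fuzzy_not(mu):
    """Complement."""
    return 1.0 - mu

# An element with membership 0.3 in S has membership 0.7 in not-S,
# so it belongs partially to both S and its complement.
mu = 0.3
both = fuzzy_and(mu, fuzzy_not(mu))    # non-zero overlap of S and not-S
either = fuzzy_or(mu, fuzzy_not(mu))   # union of S and not-S is less than 1
```

With crisp memberships (0 or 1) the same functions reduce to the ordinary Boolean AND, OR, and NOT, which is one way to see fuzzy sets as a refinement of classical sets.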

**Figure 3.** The flow of information in a fuzzy logic system where AND, OR and the implication have been implemented through the minimum, the maximum, and the minimum operators, respectively.

The information flows along the path traced by the arrows. First, the two crisp inputs are transformed into degrees of membership to the input fuzzy sets. This step is the so-called fuzzification process. It turns on all the fuzzy rules that involve the input fuzzy sets "activated" by the crisp inputs. Second, the logic operators (AND, OR in Figure 3) combine the degrees of membership of the input fuzzy sets belonging to the two input variables. Third, the fuzzy implication method transforms the output fuzzy sets of each activated fuzzy rule through either the minimum or the product operator (in Figure 3, the minimum operator is used). Fourth, the activated output fuzzy sets are in turn aggregated through the maximum operator. Finally, the defuzzification procedure converts the aggregated output fuzzy sets into a crisp output value. The defuzzification method can be "the mean of the maxima", "the centroid", and others (for more information, see the tutorial by Mendel [12]). In a control-system application, the crisp output corresponds to a control action. In a signal processing application, such a number corresponds to a forecast or the location of a target. Fuzzy logic systems with adaptive capabilities are also used to predict chaotic time series [13,14]. The fuzzy logic rules work as patches covering the chaotic attractors in their phase space. The rules are established through a learning procedure requiring a training data set.
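The five steps described above can be traced in a small, self-contained sketch. Everything specific in it is a hypothetical choice made for illustration and is not the system of Figure 3: the two inputs and the output live on a 0 to 10 scale, the sets `LOW`, `HIGH`, `SMALL`, and `LARGE` are triangular, and there are only two rules. The operators follow the figure caption (minimum for AND and for the implication, maximum for OR and for the aggregation), and the defuzzifier is a discretized centroid.

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b and support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical input fuzzy sets (same granulation for both inputs)
LOW, HIGH = (-10.0, 0.0, 10.0), (0.0, 10.0, 20.0)
# Hypothetical output fuzzy sets
SMALL, LARGE = (0.0, 2.5, 5.0), (5.0, 7.5, 10.0)

def infer(x1, x2, n=1001):
    """Return the crisp output for crisp inputs x1, x2 in [0, 10]."""
    # 1. Fuzzification: crisp inputs -> degrees of membership
    low1, high1 = tri(x1, *LOW), tri(x1, *HIGH)
    low2, high2 = tri(x2, *LOW), tri(x2, *HIGH)
    # 2. Logic operators combine the antecedents of each rule
    w1 = min(low1, low2)    # Rule 1: IF x1 is LOW AND x2 is LOW THEN y is SMALL
    w2 = max(high1, high2)  # Rule 2: IF x1 is HIGH OR x2 is HIGH THEN y is LARGE
    num = den = 0.0
    for i in range(n):
        y = 10.0 * i / (n - 1)
        # 3. Implication (minimum) clips each output set at its rule strength
        clipped_small = min(w1, tri(y, *SMALL))
        clipped_large = min(w2, tri(y, *LARGE))
        # 4. Aggregation (maximum) merges the clipped output sets
        mu = max(clipped_small, clipped_large)
        # 5. Defuzzification: accumulate the centroid of the aggregated set
        num += y * mu
        den += mu
    return num / den
```

For instance, `infer(2.0, 3.0)` activates Rule 1 more strongly than Rule 2 and returns a value below the midpoint of the output scale, whereas `infer(8.0, 7.0)` does the opposite.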

The simulation and analysis of the dynamics of complex systems can be accomplished by fuzzy cognitive maps (FCMs) [15]. FCMs are an extension of the cognitive maps introduced by Axelrod [16]. An FCM is a graph, which consists of nodes and edges. The nodes represent concepts relevant to a given complex system, and the edges represent the causal relationships among the nodes. Each edge is associated with a number that determines the degree of causal relation. The strengths of the relationships are usually normalized to the [−1, +1] range. A value of −1 denotes a fully negative causal effect, +1 a fully positive one, and 0 no causal effect. The structure of an FCM is represented by a square matrix, called the connection matrix, which reports all the weight values for the edges between the concepts represented by its rows and columns. A complex system with *n* nodes will be represented by an *n* × *n* connection matrix. The prediction of the evolution of a complex system is carried out after assigning (I) a vector of initial values to the states of the nodes and (II) a function that transforms the product of the connection matrix with the vector of the initial values into a vector representing the values of the nodes at a later instant. The transformation function can be discrete (such as the Heaviside function) or continuous (such as the logistic function). In the case of discrete functions, the complex system can evolve into an attractor constituted by a stable node or a limit cycle. In the case of continuous functions, even strange attractors can emerge [17].
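The update rule of a fuzzy cognitive map can be sketched as follows. The convention that `weights[i][j]` is the influence of node `i` on node `j`, the two-node example matrices, and the function names are assumptions made here for illustration; they do not come from the cited references.

```python
import math

def heaviside(z):
    """Discrete transformation: a node is on (1) only if its net input is positive."""
    return 1.0 if z > 0 else 0.0

def logistic(z):
    """Continuous transformation: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fcm_step(weights, state, transform):
    """One update: node j receives the weighted sum of all node states."""
    n = len(state)
    return [transform(sum(weights[i][j] * state[i] for i in range(n)))
            for j in range(n)]

def fcm_run(weights, state, transform, steps):
    """Return the list of states visited, starting from the initial state."""
    trajectory = [list(state)]
    for _ in range(steps):
        state = fcm_step(weights, state, transform)
        trajectory.append(state)
    return trajectory

# Hypothetical map 1: the two nodes excite each other -> period-2 limit cycle
mutual = [[0.0, 1.0],
          [1.0, 0.0]]
# Hypothetical map 2: node 0 excites node 1, node 1 inhibits node 0 -> stable node
feedback = [[0.0, 1.0],
            [-1.0, 0.0]]

cycle = fcm_run(mutual, [1.0, 0.0], heaviside, 4)
settled = fcm_run(feedback, [1.0, 0.0], heaviside, 4)
smooth = fcm_run(feedback, [1.0, 0.0], logistic, 4)
```

With the Heaviside function, the first map alternates between the states `[1, 0]` and `[0, 1]`, while the second one decays to `[0, 0]` and stays there. With the logistic function, the node values are no longer restricted to 0 and 1.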

Both fuzzy logic systems and fuzzy cognitive maps can be built either by human experts or automatically through learning algorithms. It may happen that the membership functions of the fuzzy sets are not certain but have definite degrees of uncertainty. For these cases, Zadeh introduced the concept of type-2 fuzzy sets [18], which is an extension of the concept of an ordinary fuzzy set, i.e., a type-1 fuzzy set. Type-2 fuzzy sets have grades of membership that are themselves fuzzy. At each value of the primary variable *x*, the membership is a function and not just a point value: it is the secondary membership function (*w*). The domain of *w* is in the interval [0, 1] and its range is also in [0, 1] (see Figure 4). Therefore, the membership function of a type-2 fuzzy set is three-dimensional [19]. If projected on a plane, it gives rise to the footprint of uncertainty, which is bounded by a lower membership function (LB) and an upper membership function (UB). In Figure 4, LB and UB are represented as continuous black lines. The footprint of uncertainty embeds the type-1 fuzzy set delimited by dashed lines. Type-2 fuzzy sets find many applications in intelligent control, pattern recognition, intelligent manufacturing, time series prediction, and other fields [20].
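A type-2 fuzzy set can be sketched by attaching a lower bound, an upper bound, and a secondary membership to an ordinary triangular set. All numerical choices below are hypothetical and are not read off Figure 4: the breakpoints `A`, `B`, `C`, the `SPREAD` that widens or narrows the support, the reduced height of the lower bound, and the triangular shape assumed for the secondary membership.

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b and support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

A, B, C = 2.0, 5.0, 8.0   # hypothetical type-1 triangular set
SPREAD = 1.0              # hypothetical uncertainty on the support
LB_HEIGHT = 0.8           # hypothetical maximum height of the lower bound

def primary(x):
    """The original type-1 membership (dashed set in the figure)."""
    return tri(x, A, B, C)

def upper_bound(x):
    """UB: a wider triangle that lies on or above the type-1 set."""
    return tri(x, A - SPREAD, B, C + SPREAD)

def lower_bound(x):
    """LB: a narrower, lower triangle that lies on or below the type-1 set."""
    return LB_HEIGHT * tri(x, A + SPREAD, B, C - SPREAD)

def secondary(u, x):
    """Secondary membership w of the grade u at a fixed x.
    Assumed triangular: 1 at the type-1 grade, 0 at the edges of the footprint."""
    lo, mid, up = lower_bound(x), primary(x), upper_bound(x)
    if u < lo or u > up:
        return 0.0
    if u == mid:
        return 1.0
    if u < mid:
        return (u - lo) / (mid - lo)
    return (up - u) / (up - mid)
```

At any fixed `x`, the interval from `lower_bound(x)` to `upper_bound(x)` is the vertical slice of the footprint of uncertainty, and `secondary(u, x)` plays the role of *w* along that slice.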

**Figure 4.** An example of type-2 fuzzy set. The original type-1 fuzzy set is the dashed triangular set. The lower (LB) and upper (UB) bounds define the footprint of uncertainty. The plot on the left shows the trend of the secondary membership (*w*) at a fixed value of *x*.

#### **3. Fuzzy Logic and the Human Nervous System**

Fuzzy logic is a valid model of the human capability to compute with words because there are structural and functional analogies between the human nervous system (HNS) and a fuzzy logic system [21,22]. The HNS is a complex network of billions of nerve cells distributed throughout our organism [23]. It monitors the environment and our body, and it governs our behavior by collecting information, processing it, and making decisions. The HNS comprises three elements: (I) the sensory system; (II) the central nervous system; (III) the effectors' system. The sensory system catches physical and chemical signals and transduces them into electrochemical information that is sent to the brain. In the brain, information is integrated, stored, and processed. The outputs of the cerebral computations are electrochemical commands sent to the components of the effectors' system, i.e., glands and muscles. Our sensory system encompasses eight sensory subsystems: a visual system to detect light; an olfactory and a gustatory system to probe chemicals in the air we breathe and in what we take in through our mouth, respectively; an auditory, a tactile, and a proprioceptive system provided with mechanoreceptors that perceive steady, vibrating, or instantaneous mechanical forces; thermoreceptors to distinguish cold from warm stimuli; nociceptors to alert our body in the presence of noxious situations. Each sensory subsystem has a hierarchical structure. At the lowest level, there is a collection of receptor proteins. At an upper level, there are receptor cells that contain several replicas of the receptor molecules. We have many copies of the receptor cells properly distributed in space, often covering a tissue. The tissue may be located in an organ provided with an accessory structure that conveys the stimuli to the receptor cells. Every sensory subsystem encodes four aspects of a stimulus: its modality (*M*), intensity (*IM*), spatial distribution (*IM*(*x*, *y*, *z*)), and time evolution (*IM*(*t*)). This multiple information is encoded hierarchically. In fact, the modality is encoded at the molecular level. The ensemble of the molecular receptors of a specific sensory subsystem works as a collection of molecular fuzzy sets: they granulate the modality of the kind of stimulus they sense. Signals that are perceived by the same sensory subsystem but have distinct modalities belong to the collection of the molecular fuzzy sets at different degrees. In other words, the modality of the signals is encoded as fuzzy information at the molecular level through the molecular fuzzy sets that work in parallel.

An example is shown in Figure 5. It concerns our visual system. The modality is the spectral composition of the light. We have three types of photoreceptor proteins, labeled as "Blue", "Green", and "Red", respectively. They allow us to distinguish colors. Their absorption spectra granulate the visible spectral region into three molecular fuzzy sets. Each band is due to the vibrational energies of the lowest excited π∗ state of the retinal chromophore. Light beams having distinct spectral compositions belong to the three molecular fuzzy sets at different degrees (in Figure 5, the memberships of a green and a red light are depicted).

**Figure 5.** Absorption spectra of the "Blue", "Green", and "Red" photoreceptors that partition the visible spectral region into three fuzzy sets. Beams having different colors belong to the three molecular fuzzy sets at different degrees. The degrees of membership of one pure green and one pure red beam to the three fuzzy sets are shown (see the arrows).
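The encoding of a color as a vector of degrees of membership can be sketched numerically. In the example below, the three absorption bands are approximated by Gaussian curves; the Gaussian shape, the peak wavelengths, and the common band width are assumptions chosen only for illustration and do not reproduce the real spectra of Figure 5.

```python
import math

# Assumed peak wavelengths (nm) and a common band width (nm); illustrative only.
BANDS = {"Blue": 420.0, "Green": 534.0, "Red": 564.0}
WIDTH_NM = 40.0


def memberships(wavelength_nm):
    """Degrees of membership of a monochromatic beam to the three bands."""
    return {
        name: math.exp(-0.5 * ((wavelength_nm - peak) / WIDTH_NM) ** 2)
        for name, peak in BANDS.items()
    }


if __name__ == "__main__":
    for label, wavelength in (("green beam", 530.0), ("red beam", 650.0)):
        vector = memberships(wavelength)
        print(label, {name: round(mu, 3) for name, mu in vector.items()})
```

Each beam is represented by three numbers in [0, 1] rather than by a single crisp label, and beams of different wavelengths produce different vectors.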

In living cells, when a stimulus actively interplays with a molecular receptor that is a protein, it promotes its structural change. Within cells, there are several copies of the molecular sensors (see Figure 6A). The number of molecular receptors that are activated in a cell depends on the intensity of the stimulus. Each cell acts as a cellular fuzzy set, and the degree of membership of a stimulus to a cellular fuzzy set encodes the intensity of the stimulus. The molecular structural modifications induced by the stimulus trigger intracellular cascade reactions, finally modifying the electrochemical permeability of the receptor cells' membranes. The extent of the change in the electrochemical permeability depends on how many molecular receptors have changed their structure and hence on the intensity of the stimuli. The receptor cells produce graded potentials that are analog signals. The information of such signals is usually converted into the firing rate of the action potential trains. Often, the action potentials are produced by an architecture of afferent neurons that integrate the information regarding the spatial distribution of the stimuli (see Figure 6B). In fact, every afferent neuron has a receptive field that works as a fuzzy set encompassing specific receptor cells. For instance, in the visual subsystem, the photoreceptor cells are granulated by the bipolar cells. Light shining on the center of a bipolar cell's receptive field and light shining on its surround produce opposite changes in the cell's membrane potential. The purpose of the bipolar fuzzy sets is to improve the contrast and definition of the visual stimuli. The center-surround structure of the receptive fields of the bipolar cells is transmitted to the ganglion cells, so that the accentuation of contrasts is preserved and passed on. The presence of overlapping receptive fields (like overlapping fuzzy sets) allows processing the information of a light stimulus in parallel and increasing the acuity by highlighting the contrasts in space and time. The action potentials generated by the afferent neurons are the ideal code for sending the information up to the brain. In the cerebral cortex, there are areas having different intrinsic rhythms [24–26]. They form a neural dynamic space partitioned into overlapping cortical fuzzy compartments (see Figure 6C). Such cortical fuzzy sets are activated at different degrees by separate attributes of the perceptions and produce a meaningful experience of the external and internal worlds.
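The contrast-enhancing effect of center-surround receptive fields can be illustrated with a deliberately simplified one-dimensional model. In this sketch, each unit responds with the difference between the signal at its center and the average signal of its nearest neighbours; this linear rule, the function name, and the test signal are assumptions made for the example and are not a physiological model of bipolar cells.

```python
def center_surround(signal, surround=1):
    """Center minus mean-of-surround response for each position.

    `signal` is a list of light intensities (at least two values);
    `surround` is the number of neighbours considered on each side.
    """
    responses = []
    for i in range(len(signal)):
        lo = max(0, i - surround)
        hi = min(len(signal), i + surround + 1)
        neighbours = [signal[j] for j in range(lo, hi) if j != i]
        responses.append(signal[i] - sum(neighbours) / len(neighbours))
    return responses


if __name__ == "__main__":
    # A step edge between a dark and a bright region.
    step_edge = [1, 1, 1, 1, 5, 5, 5, 5]
    print(center_surround(step_edge))
```

Uniform regions give a null response, whereas the two units adjacent to the edge respond with opposite signs, so only the contrast is signalled.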

**Figure 6.** Scheme of the action of a sensory subsystem made of three principal elements described as three collections of fuzzy sets. First, the sensory cellular fuzzy sets (**A**) that encode the information of a signal as graded potentials. Second, the afferent neurons (**B**) whose receptive fields are fuzzy sets: they encode the information as firing rates of the action potential trains. Third, the cortical areas (**C**) that are partitioned into different dynamic regimes giving rise to an infrastructure of fuzzy sets encoding distinct syntactic and semantic attributes of the original signals.

Based on this description, it might seem that sensory perception is objective, universal, reproducible, and deterministic. However, this is not the case. In fact, sensory perception depends on the physiological state of the perceiver and on his/her past experiences, and each sensory system is unique and not universal. Moreover, every human brain must deal with the uncertainty in the perception. Under uncertainty, an efficient way of performing tasks is to represent knowledge with probability distributions and acquire new knowledge by following the rules of probabilistic inference [27,28]. Therefore, it is reasonable to assume that the human brain performs probabilistic reasoning, and human perception can be described as a subjective process of Bayesian probabilistic inference [29,30]. In fact, the frequentist probability can be used only in the case of a large number of trials. According to the Bayesian probabilistic inference, the perception of a signal *IM*(*x*, *y*, *z*, *t*) by cortical cells *CCM* is given by the "posterior probability" *p*(*IM*|*CCM*):

$$p(I_M|\text{CC}_M) = \frac{p(\text{CC}_M|I_M)\,p(I_M)}{p(\text{CC}_M)}.\tag{1}$$

In Equation (1), *p*(*CCM*|*IM*) is the "likelihood", *p*(*IM*) is the "prior probability", and *p*(*CCM*) is the "plausibility". The plausibility is only a normalization factor. In agreement with the theory of Bayesian probabilistic inference generalized in fuzzy context [31], the likelihood may be identified with the hierarchical and deterministic fuzzy information described previously in this section (see also Figure 7). The prior probability *p*(*IM*) comes from the knowledge of the regularities of the signals and represents the influence of the brain on human perception. In fact, human perception is a trade-off between the likelihood and the prior probability [32]. Whereas the likelihood represents the deterministic and objective part of human perception, the prior probability represents its subjective contribution. The noisier and more ambiguous the features of a signal are, the more the perception is driven by the prior probability, and the less reproducible and universal the sensation is.
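The trade-off between likelihood and prior probability can be made concrete with a small numerical sketch of Equation (1) applied to a discrete set of candidate stimuli. The two candidate stimuli "A" and "B" and all probability values below are hypothetical and serve only to show the mechanism.

```python
def posterior(likelihood, prior):
    """Apply Equation (1) to a discrete set of candidate stimuli.

    likelihood[s] plays the role of p(CC_M | I_M = s),
    prior[s] plays the role of p(I_M = s).
    """
    evidence = sum(likelihood[s] * prior[s] for s in prior)  # p(CC_M)
    return {s: likelihood[s] * prior[s] / evidence for s in prior}


if __name__ == "__main__":
    prior = {"A": 0.8, "B": 0.2}  # hypothetical past experience favouring "A"

    sharp = posterior({"A": 0.1, "B": 0.9}, prior)    # clear sensory evidence for "B"
    noisy = posterior({"A": 0.45, "B": 0.55}, prior)  # ambiguous sensory evidence

    print("sharp signal:", sharp)
    print("noisy signal:", noisy)
```

With clear sensory evidence, the posterior follows the likelihood and favours "B"; with ambiguous evidence, the same prior pulls the posterior back towards "A", in line with the qualitative statement above.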

**Figure 7.** Hierarchical mechanism of encoding the information of a stimulus.

Sometimes, we receive multimodal signals that interact with more than one sensory subsystem. Each activated sensory subsystem produces its own mono-sensory fuzzy information. Physiological and behavioral experiments have shown that the brain integrates the mono-sensory perceptions to generate the final sensation [33]. Multisensory processing pieces together signals of different modalities if stimuli fall on the same or adjacent receptive fields (according to the "spatial rule") and within close temporal proximity (according to the "temporal rule"). Since sensory modalities are not equally reliable, and their reliability can change with context, multisensory integration involves statistical issues, and it is often assumed to be a Bayesian probabilistic inference [34]. Clearly, the experience of the world is influenced by past perceptive events, which are stored in the memory presumably in the form of fuzzy rules. These stored events and rules give humans the remarkable power of making decisions in complex situations and recognizing variable patterns.
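One common way to formalize reliability-dependent integration, offered here only as an illustrative assumption and not as the model adopted in [34], treats each mono-sensory estimate as a Gaussian with its own variance and combines the estimates with weights proportional to their precision (inverse variance). The function name and the numbers are hypothetical.

```python
def combine_cues(estimates):
    """Precision-weighted fusion of independent Gaussian estimates.

    `estimates` is a list of (mean, variance) pairs, one per sensory
    modality. Returns the (mean, variance) of the combined estimate.
    """
    precision = sum(1.0 / variance for _, variance in estimates)
    mean = sum(m / variance for m, variance in estimates) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Hypothetical position estimates: a reliable cue and a less reliable one.
    reliable = (10.0, 1.0)
    unreliable = (14.0, 4.0)
    mean, variance = combine_cues([reliable, unreliable])
    print(f"combined mean = {mean:.2f}, combined variance = {variance:.2f}")
```

The combined estimate lies closer to the more reliable cue, and its variance is smaller than that of either cue taken alone.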

#### **4. The Methodologies to Implement Fuzzy Sets and Process Fuzzy Logic at the Molecular Level**

Fuzzy logic is routinely implemented in digital electronic circuits. However, the best accomplishments of fuzzy logic systems (FLSs) have been achieved through analog electronic circuits. Whereas digital circuits are based on electrical signals that vary steeply in a sigmoid manner, analog circuits are based on signals that vary smoothly in a hyperbolic or linear manner. Analog circuits guarantee the best implementations of an infinite-valued logic such as fuzzy logic.

In recent years, fuzzy logic has even been implemented by using molecules and chemical reactions. Three principal strategies can be outlined:

The first strategy is an imitation of the sensory subsystems described in the previous section. In every sensory subsystem, there is a collection of distinct sensory cells that works as an ensemble of cellular fuzzy sets embedding molecular fuzzy sets. The cellular fuzzy sets work in parallel. The information of a stimulus is encoded as a vector of degrees of membership of the stimulus to the cellular fuzzy sets. This strategy will be called the "fuzzy parallelism" approach.

The second strategy is an imitation of how proteins work in the immune and the biomolecular information systems. Almost every protein is a fuzzy set because it exists as an ensemble of many conformers that have context-dependent dynamic behavior. The macromolecular conformers are adaptable and subject to the laws of natural selection. They are the "words" of the cellular language. The imitation of the proteins of the cells and the immune system makes it possible to implement the so-called "conformational fuzziness" strategy.

Finally, the third strategy derives from the fuzziness of the quantum world, and it will be called "quantum fuzziness". When superimposed quantum states undergo decoherence, it is possible to exploit heaps of molecules to process fuzzy logic through macroscopic, smooth, analog input and output variables.

Examples of the three strategies are described in the following three subsections.

#### *4.1. The "Fuzzy Parallelism" Approach*

In Section 3, we have seen that the absorption bands of the three photoreceptor proteins present on the fovea of the retina act as three molecular fuzzy sets. Lights that differ in their spectral compositions belong to the three bands at distinct degrees, and they are perceived as different colors. Moreover, the millions of replicas of the three photoreceptor proteins within each photoreceptor cell allow determining the intensity of the signals at every wavelength. The imitation of the way we distinguish colors has allowed the design and implementation of chemical systems that extend human vision to the UV [35,36]. Such chemical systems are based on direct thermally reversible photochromic compounds. A direct thermally reversible photochromic compound is a species that, in the absence of any radiation, exists in a structure (i.e., A in Figure 8) that absorbs just in the UV and is uncolored. Upon UV irradiation, it transforms into B, which also absorbs in the visible region. When B is formed, the system becomes colored (see Figure 8). The transformation of A into B is thermally reversible. In other words, if we discontinue the UV irradiation, the color bleaches because the B molecules transform back to the original structure A, spontaneously at room temperature. Mixtures of properly chosen direct thermally reversible photochromic compounds extend the human capability of distinguishing electromagnetic spectra to the UV region. Such mixtures, called biologically inspired photochromic fuzzy logic (BIPFUL) systems, are designed by the following procedure. First, the absorption bands of the uncolored forms, *Ai*, are assumed to be input fuzzy sets. Second, the absorption bands of the colored forms, *Bi*, are assumed to be output fuzzy sets. Third, the algorithm expressing the degree of membership of the UV radiation, having intensity *I*0(*λirr*) at the wavelength *λirr*, to the absorption band of the *Ai* compound is:

$$
\mu_{UV, A_i} = \Phi_{PC, A_i}(\lambda_{irr}) \, I_0(\lambda_{irr}) \left(1 - 10^{-\varepsilon_{A_i} C_{0, i} l}\right), \tag{2}
$$

**Figure 8.** Example of a direct thermally reversible photochromic compound.

In Equation (2), Φ*PC*,*Ai*(*λirr*) is the photochemical quantum yield of photo-coloration for *Ai*, *εAi* is the absorption coefficient at *λirr* for the *Ai* photochromic species, *C*0,*i* is its analytical concentration, and *l* is the optical path length. Finally, the equation expressing the activation of the *Bi* output fuzzy sets is:

$$A_{B_i} = \frac{\varepsilon_{B_i}(\lambda_{an})}{k_{\Delta, i}} \mu_{UV, A_i}.\tag{3}$$

In Equation (3), *ABi* is the absorbance at the wavelength *λan* in the visible region due to the colored form of the *i*-th photochromic species; *εBi*(*λan*) is its absorption coefficient, and *k*Δ,*i* is the kinetic constant of the bleaching reaction for *Bi*. Each absorption spectrum recorded at the photo-stationary state will be the sum of as many terms represented by Equation (3) as there are photochromic components within the BIPFUL system. The BIPFUL systems that have been devised are made of naphthopyrans and spirooxazines, and they make it possible to discriminate the three regions of the UV spectrum, i.e., UV-A, UV-B, and UV-C.
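Equations (2) and (3) can be evaluated directly once the parameters of each photochromic component are known. The sketch below is a minimal numerical transcription for a single irradiation wavelength and a single analysis wavelength; the function names, the dictionary keys, and every parameter value are hypothetical, and units are left to the user to keep consistent.

```python
def membership_uv(phi_pc, i0, eps_a, c0, path_length):
    """Equation (2): membership of the UV radiation to the band of A_i."""
    return phi_pc * i0 * (1.0 - 10.0 ** (-eps_a * c0 * path_length))


def absorbance_colored(eps_b, k_delta, mu_uv):
    """Equation (3): absorbance at the analysis wavelength due to B_i."""
    return eps_b / k_delta * mu_uv


def total_absorbance(components, i0, path_length):
    """Sum of the Equation (3) terms, one per photochromic component."""
    total = 0.0
    for c in components:
        mu = membership_uv(c["phi_pc"], i0, c["eps_a"], c["c0"], path_length)
        total += absorbance_colored(c["eps_b"], c["k_delta"], mu)
    return total


if __name__ == "__main__":
    # Two hypothetical photochromic components with arbitrary parameters.
    mixture = [
        {"phi_pc": 0.5, "eps_a": 4000.0, "c0": 1e-4, "eps_b": 30000.0, "k_delta": 0.05},
        {"phi_pc": 0.2, "eps_a": 9000.0, "c0": 5e-5, "eps_b": 20000.0, "k_delta": 0.10},
    ]
    print(total_absorbance(mixture, i0=1e-7, path_length=1.0))
```

Repeating the calculation for each irradiation wavelength yields a different pattern of activation of the output fuzzy sets, which is what allows the mixture to discriminate UV regions.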

The imitation of all the other sensory subsystems, conceived as hierarchical fuzzy systems where a collection of distinct molecular and cellular fuzzy sets work in parallel (see Section 3), should make it possible to devise artificial sensory systems that have the power of extracting the essential features of stimuli and recognizing variable patterns.

#### *4.2. "Conformational Fuzziness"*

Within every living cell, there are many proteins that work as if they were the neurons of the "cellular nervous system". They participate in the signaling and genetic networks and allow the cell to respond to the ever-changing environmental conditions. Specific proteins, called antibodies, are also the fundamental ingredients of the immune system that protects our bodies from intruders. A limited set of flexible antibodies can bind a wide range of antigens. Proteins are ubiquitous in living beings and they play multiple roles, due to their "dynamism and evolvability" [37]. In fact, proteins are conformationally dynamic and exhibit functional promiscuity. Conformational dynamism and heterogeneity enable context-specific functions to emerge in response to changing environmental conditions and, furthermore, allow a single structural motif to be used in multiple settings [38]. The conformational flexibility and heterogeneity of proteins represent their fuzziness.

Conformational fuzziness is not an exclusive feature of proteins. Even the long polymer of chromatin in the nucleus of eukaryotic cells is fuzzy. Some portions contain heterochromatin made of DNA packed tightly around histones. Some other areas contain euchromatin, that is, DNA loosely packed. Usually, genes in euchromatin are active, whereas those in heterochromatin are inactive. Euchromatin exposes a broader and rougher surface to the proteins scanning for their target sequences. Heterochromatin is flatter, smoother, and with a less extended surface [39]. Chromatin organization is highly dynamic, varying both during the cell cycle and among different cell types [40].

Conformational fuzziness is not unique to macromolecules, but it can be experienced even with simple molecules. An example is the fuzziness of the merocyanine (MC) that is generated by UV irradiation of the spirooxazine (SpO) shown in Figure 9 [41]. Since MC has a flexible molecular skeleton, it gives rise to many conformers. The number and type of conformers depend on the physical and chemical context (for example, temperature, solvent, and the presence of a docking glycine).

**Figure 9.** Just a few of all the possible conformers of a merocyanine (MCi) produced by irradiation of a spirooxazine (SpO).

Whatever the compound is, either a macromolecule or a simple molecule, the ensemble of its conformers acts as a molecular fuzzy set. Its fuzziness may be quantified by determining its fuzzy entropy. A definition of fuzzy entropy based on Shannon's function of information entropy is [42,43]:

$$H = -K \sum_{i=1}^{n} \left(\mu_i \log_{10}(\mu_i) + (1 - \mu_i) \log_{10}(1 - \mu_i)\right),\tag{4}$$

where *μi* is the relative weight of the *i*-th conformer, *n* is the total number of conformers, and *K* = (1/*n*) is a normalization factor. The fuzzy entropy of a compound is context-dependent, like the meaning of a word in natural language. In fact, conformationally heterogeneous structures are adaptable to many different contexts. Of course, the fuzzy entropy of a macromolecule is significantly larger than that of a simple molecule. Among proteins, those completely or partially disordered [44] are the fuzziest. Their pronounced fuzziness makes them multifunctional and even able to moonlight [45], i.e., play distinct functions, depending on their context.
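Equation (4) is straightforward to evaluate for a given set of conformer weights. The following sketch is a direct transcription; the convention that a term with zero weight contributes nothing (the limit of *p* log *p* as *p* tends to zero) and the example weights are assumptions added here for the sake of a runnable example.

```python
import math


def fuzzy_entropy(weights):
    """Fuzzy entropy of Equation (4) with K = 1/n and base-10 logarithms.

    `weights` holds the relative weights mu_i of the n conformers,
    each in the interval [0, 1].
    """
    n = len(weights)
    total = 0.0
    for mu in weights:
        for p in (mu, 1.0 - mu):
            if p > 0.0:  # convention: p * log(p) -> 0 as p -> 0
                total += p * math.log10(p)
    return -total / n


if __name__ == "__main__":
    # Hypothetical relative weights of three conformers in two contexts.
    print(fuzzy_entropy([0.6, 0.3, 0.1]))
    print(fuzzy_entropy([0.9, 0.08, 0.02]))
```

The entropy vanishes when every weight is either 0 or 1 and reaches its maximum, log<sub>10</sub>(2), when all the weights equal 0.5.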

#### *4.3. "Quantum Fuzziness"*

Isolated microscopic systems exist in a superposition of states. For instance, if there are two accessible states, indicated as |0⟩ and |1⟩, the isolated microscopic system exists in a quantum state |Ψ⟩ that is a linear combination of |0⟩ and |1⟩:

$$|\Psi\rangle = a|0\rangle + b|1\rangle,\tag{5}$$

where *a* and *b* are complex numbers that satisfy the normalization condition |*a*|<sup>2</sup> + |*b*|<sup>2</sup> = 1. The states |0⟩ and |1⟩ can be imagined as two fuzzy sets. Their vagueness, i.e., their fuzziness, is outlined by Heisenberg's Uncertainty Principle. The |Ψ⟩ state belongs to |0⟩ and |1⟩ with degrees that are |*a*|<sup>2</sup> and |*b*|<sup>2</sup>, respectively. |Ψ⟩ is a qubit, i.e., the elementary unit of quantum information. The qubit can be described as a unit vector in a two-dimensional Hilbert space. The state of the qubit can also be represented by the following equation:

$$|\Psi\rangle = \cos\left(\frac{\theta}{2}\right)|0\rangle + e^{i\varphi}\sin\left(\frac{\theta}{2}\right)|1\rangle\tag{6}$$

where *θ* and *ϕ* define a point on the unit sphere in three dimensions, called the Bloch sphere. Logic operations on qubits can be visualized as rotations of the unit vectors on the Bloch sphere, preserving the norm of the quantum states. If a microscopic system is a superposition of *n* qubits, it has 2<sup>*n*</sup> accessible states, simultaneously. If we make an operation on this system, we manipulate 2<sup>*n*</sup> states at the same time. Therefore, the alluring computational power of quantum logic is evident. However, the main difficulty is to avoid the decoherence of the superimposed quantum states, which can be induced by deleterious interactions with the surrounding environment [46]. The decoherence induces the collapse of a qubit into one of its two originally accessible states, either |0⟩ or |1⟩, with probabilities |*a*|<sup>2</sup> and |*b*|<sup>2</sup>, respectively. Whenever the decoherence is unavoidable, the single particles can be used to process discrete logics, i.e., binary or multi-valued crisp logics [47,48]. Of course, specific microscopic techniques, reaching atomic resolution, are needed to carry out the computations. Alternatively, we may think of making computations by exploiting large assemblies of particles, e.g., molecules. Vast collections of molecules (amounting to the order of Avogadro's number) appear as bulky materials. The inputs and outputs for making computations become macroscopic variables that can change in a continuous manner. The relations established between the inputs and the outputs can be either steep or smooth. Steep, sigmoid functions are suitable to implement discrete logics, whereas both linear and nonlinear smooth functions are suitable to build fuzzy logic systems [49]. Some fuzzy logic gates and operations have been implemented by the hybridization reaction of DNA [50,51] and the supramolecular interactions between carbohydrates and proteins [52].
Other fuzzy logic systems have been built by exploiting the dependence of the fluorescence quantum yield on physical and chemical inputs. One example is the dependence of the fluorescence of 6(5*H*)-phenanthridinone (see Figure 10A) on the hydrogen bonding donation ability of the solvent (HBD) and the temperature [53]. Another example is given by tryptophan, both as an isolated molecule and bound to serum albumin, whose fluorescence depends on the temperature and the amount of the quencher flindersine (see Figure 10B) [54]. A further example is a ruthenium complex, whose fluorescence depends on Fe<sup>2+</sup> and F<sup>−</sup> (see Figure 10C) [55]. A final example is the fluorescence of europium bound to a metal-organic framework, which depends on metal cations, such as Hg<sup>2+</sup> and Ag<sup>+</sup> (see Figure 10D) [56]. The emission of light is a preferable output because it bridges the gap between the microscopic and the macroscopic world. A multi-responsive chromogenic compound, belonging to the class of spirooxazines, has been used for the implementation of all the fundamental fuzzy logic gates, AND, OR, and NOT [57]. Protons, Cu<sup>2+</sup>, and Al<sup>3+</sup> ions were used as inputs, and the color coordinates (R, G, B) or the colorability [41] of the chromogenic compound as outputs. Other platforms have since been proposed, for example, a multi-state tantalum oxide memristive device [58] and an anthraquinone-modified titanium dioxide electrode [59]. Even the Belousov-Zhabotinsky reaction, carried out in an oscillatory regime and in an open system [60], makes it possible to implement all the fundamental fuzzy logic gates by using bromide and silver ions as chemical inputs and the period of the oscillations as output. Finally, the "hydrodynamic photochemical oscillator", which is a thermally reversible photochromic compound combined with the convective motion of the solvent, is suitable to implement fuzzy logic systems when it works in a chaotic regime [61].
All these examples show that fuzzy logic can be processed not only by conventional electronic circuits but also by unconventional chemical systems exhibiting analog input-output relationships in either the liquid or the solid phase.
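The distinction between steep and smooth input-output relations, and the meaning of the fuzzy AND, OR, and NOT gates mentioned above, can be sketched as follows. The logistic response with an adjustable `steepness` is a hypothetical stand-in for a generic chemical input-output curve, and the minimum, maximum, and complement operators are Zadeh's standard choice, adopted here as an assumption; the cited works may rely on different operators.

```python
import math


def response(x, steepness):
    """Hypothetical normalized output for a normalized input x in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))


def fuzzy_and(a, b):
    """Zadeh AND: minimum of the two degrees of membership."""
    return min(a, b)


def fuzzy_or(a, b):
    """Zadeh OR: maximum of the two degrees of membership."""
    return max(a, b)


def fuzzy_not(a):
    """Zadeh NOT: complement of the degree of membership."""
    return 1.0 - a


if __name__ == "__main__":
    for x in (0.2, 0.4, 0.6, 0.8):
        steep = response(x, steepness=100.0)  # nearly binary output
        smooth = response(x, steepness=2.0)   # graded output
        print(f"x={x}: steep={steep:.3f}, smooth={smooth:.3f}")

    a, b = 0.3, 0.8  # two graded inputs
    print(fuzzy_and(a, b), fuzzy_or(a, b), fuzzy_not(a))
```

With a steep response the output is essentially 0 or 1, as required by crisp logic, whereas a smooth response preserves intermediate values on which the fuzzy gates can operate.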

**Figure 10.** Dependence of the fluorescence quantum yield of 6(5*H*)-phenanthridinone (**A**), tryptophan (**B**), a ruthenium complex (**C**), and europium bound to a metal-organic framework (**D**) on physical and chemical inputs.

#### **5. Perspectives of the Fuzziness of the Molecular World**

Fuzzy logic is a valid model of the human power to compute with words and make decisions in complex situations. The closer one looks at real-world problems, the fuzzier their solutions become. Fuzzy logic is playing a relevant role in the field of artificial intelligence when we deal with complex systems.

This work highlights that even the molecular world is fuzzy. In fact, quantum logic is fuzzy ("quantum fuzziness"). A qubit is a superposition of two distinct quantum states that are like fuzzy sets. Therefore, quantum logic might be considered a particular kind of fuzzy logic. When decoherence induces the collapse of qubits, it is not possible to process quantum logic. However, by working with large collections of molecules, it is feasible to implement fuzzy logic systems, when causal, macroscopic, smooth, analog input-output relationships are found.
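The reading of a qubit as an element that belongs to two fuzzy sets can be checked numerically from Equations (5) and (6). The sketch below computes the amplitudes for given Bloch-sphere angles and returns the squared moduli as degrees of membership; the function names and the chosen angles are hypothetical.

```python
import cmath
import math


def qubit_amplitudes(theta, phi):
    """Amplitudes (a, b) of Equation (6) for Bloch-sphere angles theta, phi."""
    a = complex(math.cos(theta / 2.0), 0.0)
    b = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return a, b


def membership_degrees(a, b):
    """Degrees of membership of the qubit to |0> and |1>: |a|^2 and |b|^2."""
    return abs(a) ** 2, abs(b) ** 2


if __name__ == "__main__":
    for theta in (0.0, math.pi / 3.0, math.pi / 2.0, math.pi):
        a, b = qubit_amplitudes(theta, phi=math.pi / 4.0)
        mu0, mu1 = membership_degrees(a, b)
        print(f"theta={theta:.3f}: mu0={mu0:.3f}, mu1={mu1:.3f}, sum={mu0 + mu1:.3f}")
```

The two degrees always add up to one, and they do not depend on the phase *ϕ*, which only matters when qubits interfere.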

In the human sensory system, the sensory cells, which are fuzzy sets containing molecular fuzzy sets, collect a large amount of data. The hierarchical architecture of the afferent and cortical neurons, which is based on the overlapping of their receptive fields, allows extracting only the meaningful information from the big data contained in the stimuli. The imitation of the principal features of the sensory system, in particular of what we called "fuzzy parallelism", should allow devising artificial sensory systems able to extract the essential characteristics of complex stimuli. Hence, such artificial sensory systems should be suitable to recognize variable patterns.

The computational power of the cells and the human immune system derives from the "conformational fuzziness" of their macromolecules. By exploiting the conformational elasticity of molecules, especially proteins, it is possible to process fuzzy logic. In fact, the "conformational fuzziness" makes molecules adaptable to their microenvironment. This feature is suitable to implement the dependence of the information on the context.

By processing fuzzy logic at the molecular level, we want to promote the development of chemical artificial intelligence. The purpose of chemical artificial intelligence is to mimic the performances of human intelligence by using not software or hardware, but rather chemical and photochemical reactions in wetware. In fact, there exist chemical systems that can work as surrogates of the neural dynamics [62–65]. These systems can interact and communicate by exploiting chemical, electrical, and optical signals. They are the fundamental components of a futuristic opto-/electro-brain-like computing machine that should be suitable to recognize variable patterns and compute with words. There is a long path before the concrete implementation of this new generation of computing machines, which are more similar to the brain than to the electronic computer from both the structural and the functional point of view. Further analysis of the human nervous system and further development of the theory of fuzzy logic are needed. For example, the receptive field of a neuron can inspire a new kind of fuzzy set (i.e., Type-III fuzzy set) where we distinguish inhibitory and excitatory actions. With this new kind of fuzzy set, implemented somehow artificially, the recognition of variable patterns should become easier. Moreover, chemical artificial intelligence will boost the development of soft robotics. Soft robots, also called "chemical robots", will be easily miniaturized and implanted in living beings [66–71]. They will interplay with cells and organelles for biomedical applications. They will become auxiliary elements of the human immune system to defeat diseases that are still incurable.

Finally, this field of research could give clues about the origin of life on Earth. In fact, the appearance of life on Earth, which occurred roughly 3.5 billion years ago, was like a "phase transition". It was a transition from inanimate chemical systems, unable to encode, process, communicate, and store information, to living chemical systems, able to exploit matter and energy to encode, process, send, and store information. The development of chemical artificial intelligence could unveil how that unique "phase transition" happened.

**Funding:** This research was funded by ANVUR grant number [n.20/2017].

**Acknowledgments:** P.L. Gentili is grateful to his closest collaborators, who are B.M. Heron of the Huddersfield University (UK), R. Germani of the University of Perugia (Italy), I.R. Epstein and M. Dolnik of the Brandeis University (MA, USA), H. Gotoda of the Tokyo University of Science (Japan), and J.-C. Micheau of the Université Paul Sabatier-Toulouse III (France).

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Hardware Realization of the Pattern Recognition with an Artificial Neuromorphic Device Exhibiting a Short-Term Memory**

## **Dawid Przyczyna** †**, Maria Lis** †**, Kacper Pilarczyk \* and Konrad Szaciłowski \***

Academic Centre for Materials and Nanotechnology, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland

**\*** Correspondence: kpilarcz@agh.edu.pl (K.P.); szacilow@agh.edu.pl (K.S.)

† These authors contributed equally.

Received: 20 June 2019; Accepted: 26 July 2019; Published: 28 July 2019

**Abstract:** Materials exhibiting memory or those capable of implementing certain learning schemes are the basic building blocks used in hardware realizations of neuromorphic computing. One of the common goals within this paradigm assumes the integration of hardware and software solutions, leading to a substantial efficiency enhancement in complex classification tasks. At the same time, the use of unconventional approaches towards signal processing based on information carriers other than electrical carriers seems to be an interesting trend in the design of modern electronics. In this context, the implementation of light-sensitive elements appears particularly attractive. In this work, we combine the abovementioned ideas by using a simple optoelectronic device exhibiting a short-term memory for a rudimentary classification performed on a handwritten digits set extracted from the Modified National Institute of Standards and Technology Database (MNIST), which is one of the standards used for benchmarking such systems. The input data was encoded into light pulses corresponding to black (ON-state) and white (OFF-state) pixels constituting a digit and used in this form to irradiate a polycrystalline cadmium sulfide electrode. An appropriate selection of time intervals between pulses allows the complex kinetics of charge trapping/detrapping events to be exploited, yielding a short-term synaptic-like plasticity which in turn leads to the improvement of data separability. To the best of our knowledge, this contribution presents the simplest hardware realization of a classification system capable of performing neural network tasks without any sophisticated data processing.

**Keywords:** photoelectrochemistry; wide bandgap semiconductor; artificial neuron; in materio computing; neuromorphic computing

#### **1. Introduction**

Pattern recognition is one of the basic cognitive functions, which, due to its complexity and required accuracy, has challenged researchers for decades in an effort to mimic it in an artificial setup. The development of such systems is fueled by various possible applications in medicine, security, economics and many other fields of human activity. At the moment, the majority of available solutions are based on various software implementations of the machine learning approach, including above all the use of artificial neural networks (ANN) of different architectures. In most cases, the principle of operation of ANNs is based on the optimization of weights associated with individual connections between nodes (neurons), and the information flow is inspired by the functions of biological structures found in the nervous system. It has been proven on numerous occasions that these algorithms provide an excellent efficiency in various classification tasks with both supervised and unsupervised learning procedures [1–3].

In spite of this, the use of software implementations for ANN algorithms often requires heavy pre- and post-processing of the analysed data and/or a high degree of network complexity translating into a high energy consumption [4]. The use of ANN-based methods may also be associated with a potentially low tolerance towards deliberate attacks [5], emphasized in the case of the one-pixel attack capable of deceiving certain deep neural networks [6]. To address these drawbacks, some researchers propose the development of hardware implementations for neural network architectures, incorporating novel materials, non-classic electronic elements and unconventional computing paradigms, such as multi-valued and fuzzy logic systems [7].

The development of neuromorphic computing (NC) can be perceived as one of the manifestations of this trend. The key idea here is to design a brain-inspired hardware computing platform which is optimized towards the implementation of selected aspects of ANN algorithms [7,8]. Among the most advantageous concepts, the use of circuitry employing spiking artificial synapses has been proven to be more energy efficient than the implementation in silico [8]. The construction of these systems is currently being investigated in terms of new materials [9–11] which are applied within integrated networks capable of performing sophisticated information processing [12–14]. At the same time, a number of studies aim at simplifying the circuitry realizing the neuromorphic computations. Wang et al. [15] demonstrated a hybrid convolutional neural network with only one spiking synapse based on a HfO2 memristor. The system was capable of recognizing handwritten digits with the efficiency of 784 neurons. It was achieved through the time division multiplexing access technique. Nonetheless, the "network" required multilayer information pre-processing and several thousands of software neurons to operate.

The research on photochemical and photoelectrochemical in materio computing devices indicates the possibility of their integration into larger computing systems with the use of optical [16,17] or electrical [18] signals. This can lead to the construction of more complex photoelectrochemical circuits [18], molecular arithmetic-logic units [17] or molecular-scale neural networks [16] and communication systems [19] capable of the sophisticated information processing. The studies on artificial photoelectrochemical synapses [20], devices that may realize elementary learning processes (e.g., paired pulse facilitation, PPF), stimulated further development of the neuromorphic systems combining the neuromimetic approach towards data processing with in materio computing concepts [21–25]. The operation principle of a photoelectrochemical synapse is based on the competition between light-induced charge carrier generation, charge carrier trapping and other interfacial processes affecting the photocurrent generation. Whereas the information processing realized with the use of unconventional molecular or nanoscale devices has several drawbacks compared to classic, silicon-based electronics (usually low speed, some problems with data encoding and concatenation) [26], its combination with classic techniques and algorithms seems to be promising. The term heterotic computation encompasses hybrid systems, in which information processing is performed on various platforms depending on the optimal scenario, utilizing the speed and maturity of in silico computing or the flexibility of the unconventional approaches [27,28].

Here, we present an extremely simplified, robust circuit made of only one photoelectrochemical element, the operation of which is similar to a simple classification system. In the discussed case, a Modified National Institute of Standards and Technology Database (MNIST) set of handwritten digits serves as the input data under consideration [29], without the use of any data pre-processing or software ANNs. The presented optoelectronic device realizes paired-pulse facilitation (PPF), a type of short-term plasticity (STP) seen as an enhancement of the postsynaptic current resulting from the increase in the frequency of stimulating events [30,31]. Therefore, we are testing the information processing capability of a single artificial neuron made of nanocrystalline cadmium sulfide. The obtained results show that the use of such a simple system may improve the separation in the phase space based solely on the characteristics of the input data (unsupervised learning). It seems possible that the proposed approach could be scaled up and a network of similar, interconnected devices could serve as a complex hardware neural network implementing the fuzzy logic formalism and selected concepts of reservoir computing [32].

#### **2. Results and Discussion**

#### *2.1. Material Characterization*

In order to determine the band gap width (Eg) of cadmium sulfide (CdS), the reflectance spectrum was recorded. Kubelka–Munk's function FKM was calculated based on the raw data and a Tauc plot was made (Figure 1a). CdS is typically considered a direct semiconductor with the Eg value equal to 2.42 eV for the hexagonal phase and 2.33 eV for the tetragonal polymorph. The value determined for the discussed material (2.33 eV, Figure 1a) may suggest the dominance of the latter polymorph, but this value is usually also observed for mixtures of both crystalline phases [33].
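The band-gap estimate described above can be illustrated with a minimal Python sketch. It assumes the standard Kubelka–Munk transform, F<sub>KM</sub> = (1 − R)²/(2R), and a direct-gap Tauc plot, (F<sub>KM</sub>·hν)² versus hν, with the gap read from the x-intercept of a straight line fitted to the linear absorption edge. The function names and the selection of points used for the fit are illustrative assumptions and not necessarily the procedure used in this study.

```python
# Minimal sketch of a Kubelka-Munk / Tauc band-gap estimate (illustrative only).
HC_EV_NM = 1239.84  # h*c expressed in eV*nm


def photon_energy_ev(wavelength_nm):
    """Convert a wavelength in nm to a photon energy in eV."""
    return HC_EV_NM / wavelength_nm


def kubelka_munk(reflectance):
    """Kubelka-Munk function for a diffuse reflectance R (0 < R <= 1)."""
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)


def tauc_direct(reflectance, wavelength_nm):
    """Return one (energy, (F_KM * energy)^2) point of a direct-gap Tauc plot."""
    energy = photon_energy_ev(wavelength_nm)
    return energy, (kubelka_munk(reflectance) * energy) ** 2


def band_gap_from_edge(points):
    """Least-squares line through (energy, tauc) points; return its x-intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return mean_x - mean_y / slope
```

In practice only the points lying on the linear part of the absorption edge should be passed to `band_gap_from_edge`; choosing that region is a judgment call that the sketch leaves to the user.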

Powder X-ray diffraction measurements were employed to assess the composition of the CdS sample. The obtained data were analyzed using the HighScore Plus software [34], in which the so-called Rietveld refinement was applied [35]. This method allows evaluation of certain parameters, including the volume fraction of phases. The analysis conducted for the diffraction pattern shown in Figure 1b indicates that both the tetragonal hawleyite [36] and hexagonal greenockite [37] phases are present in an approx. 1:1 volume ratio. The energy dispersive X-ray spectroscopy (Figure 1c) indicates the absence of significant impurities; therefore, the electronic trap states (vide infra) most likely originate from CdS lattice defects. The SEM image (Figure 1d) reveals a heavily agglomerated material, for which the particle size statistics were calculated using the ImageJ software [38]. The distribution of crystallite diameters is relatively narrow, ranging from 25 to 110 nm, with an average diameter of 71 ± 3 nm, whereas the distribution maximum is found at 53 ± 1 nm.

**Figure 1.** The Tauc plot (**a**), the powder X-ray diffractogram (**b**), the energy-dispersive X-ray spectrum (EDS) (**c**) and the crystallite diameter distribution of the CdS sample (**d**) discussed in this study. Inset shows the SEM image of the studied sample.

#### *2.2. Plasticity of the Artificial Neuron*

The composites of cadmium sulfide with multiwalled carbon nanotubes reported in our previous works exhibited memory features that can be functionally associated with neuronal facilitation (particularly paired-pulse facilitation, PPF) in terms of short-term synaptic plasticity [20]. In neuroscience, PPF is considered a neuronal enhancement mechanism which consists of four distinctive processes characterized by different time constants and different physiology [39]. PPF causes an amplification of the postsynaptic response as a consequence of the increase in the frequency of stimulating events at the presynaptic axon [40]. It is believed that PPF is realized mainly through the accumulation of depolarizing Ca<sup>2+</sup> ions in the presynaptic neuron [30]. High-frequency components of the PPF mechanism (i.e., those characterized by low time constant values) may be useful from the point of view of information processing. These include the fast-decaying facilitation F1 and the slow-decaying facilitation F2 [30,39,41]. The influence of both components manifests itself in the double exponential decay of the postsynaptic response depicted in Figure 2.

**Figure 2.** An example of the paired-pulse facilitation observed in the nervous system. Adapted from [31].

A similar phenomenon was observed for the photoelectrodes made of nanocrystalline cadmium sulfide (Figure 3). As in the case of the multiwalled carbon nanotube (MWCNT)-CdS composite [20], when the interval between light pulses is sufficiently long (over 300 ms in this study), the subsequent photocurrent spikes are unaffected by the previous states of the device. However, if the interval between irradiations becomes shorter (e.g., 80 ms), the amplification of the second photocurrent response becomes significant. Detailed analysis reveals that the ratio of pulse intensities (the facilitation rate) vs. the time interval between stimuli is best fitted with a biexponential function (1):

$$\frac{A_2}{A_1} = a_1 e^{-\frac{t}{\tau_1}} + a_2 e^{-\frac{t}{\tau_2}} + y_0 \tag{1}$$

which is fully consistent with the previous reports on MWCNT-CdS composite photoelectrodes [20]. The result of the fitting procedure is presented in Figure 3b, and the fitted parameters are: *a*<sub>1</sub> = 0.218 ± 0.023, *a*<sub>2</sub> = 0.340 ± 0.014, τ<sub>1</sub> = 19.8 ± 4.6 ms, τ<sub>2</sub> = 167 ± 23 ms and *y*<sub>0</sub> = 1.008 ± 0.013. The time constant values, which are representative of polycrystalline CdS samples, are slightly higher than those obtained for CdS/MWCNT composites [20]. Interestingly, the determined values are consistent with the parameters typically observed in the case of biological structures [31].
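To make the fitted facilitation curve concrete, the following short Python sketch evaluates Equation (1) with the central values of the parameters quoted above (uncertainties are ignored). The function name and the chosen intervals are illustrative assumptions.

```python
import math

# Central values of the fitted parameters of Equation (1); times in ms.
A1, A2 = 0.218, 0.340
TAU1_MS, TAU2_MS = 19.8, 167.0
Y0 = 1.008


def facilitation_ratio(interval_ms):
    """Ratio of the second to the first photocurrent spike for a given pulse interval."""
    return (A1 * math.exp(-interval_ms / TAU1_MS)
            + A2 * math.exp(-interval_ms / TAU2_MS)
            + Y0)


# Compare a short interval with a long one (values chosen to mirror the text).
for interval in (80.0, 300.0):
    print(f"{interval:5.0f} ms -> facilitation ratio {facilitation_ratio(interval):.3f}")
```

The ratio decays monotonically towards *y*<sub>0</sub> as the interval grows, which is the behavior exploited later for filtering pixels according to their neighbors.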

The double exponential decay can be associated with two distinctive trapping/detrapping events characterized by two time constants, τ<sub>1</sub> and τ<sub>2</sub>. This diversity may originate from the presence of two CdS polymorphs, whose charge trapping states most likely differ. At the same time, through the comparison with selected natural learning processes, these two mechanisms may be associated with two components of neuronal plasticity: the fast-decaying facilitation F1 and the slow-decaying facilitation F2. Alternatively, they can be described as manifestations of short- and long-term memory, respectively [42].

**Figure 3.** The photocurrent spikes resulting from the pulsed light illumination of CdS-based photoelectrodes (**a**) and the analysis of the photocurrent amplification vs. time interval between subsequent pulses (**b**).

The overall mechanism of photocurrent generation and spike amplification is summarized in Figure 4. Photoexcitation leads to the transition of electrons from the valence band to the conduction band (1), and electron-hole recombination occurs spontaneously afterwards (1'). Electrons in the conduction band can subsequently be transferred through the interface to the conducting substrate (2), and holes can migrate to the surface and react with redox mediators in the electrolyte (2'). At the same time, a fraction of electrons from the conduction band becomes trapped within interband states in a very fast process (3). This process efficiently competes with the interfacial electron transfer (2), but once the traps are filled this pathway becomes inactive. The trapped electrons undergo relaxation with the time constants τ<sub>1</sub> (3') and τ<sub>2</sub> (3").

**Figure 4.** A tentative mechanism of the photocurrent generation and charge carriers trapping in the nanocrystalline CdS sample under consideration.

This simple mechanism provides a platform for the implementation of neuronal dynamics in an artificial, fully inorganic system. Due to its simplicity it can be applied for signal and pattern processing and could be integrated into larger neuromimetic systems. Furthermore, along with bioinspired neuromorphic computing, other information processing paradigms may be implemented within the same system: Boolean logic [43], ternary logic [44] and fuzzy logic [45]. The latter one is especially tempting, as it may contribute to the development of novel neuro-fuzzy information processing devices [46–48].

#### *2.3. Recognition of Digits*

A dataset containing 1000 handwritten digits (100 samples of each digit 0, ..., 9) was randomly selected from the MNIST database (Figure 5a). All the images were transformed into binary strings and used for the modulation of a light source. In order to eliminate possible errors resulting from the photoelectrode equilibration or photodegradation, the first 20 and the last 20 recorded photocurrent profiles were discarded for each digit and the remaining 60 patterns were subjected to further processing. First of all, a set of simple classification rules was developed. These are based on pixel counting and therefore cannot provide a significant separation of the input data. In the first step, each sign (in the form of a 28 × 28-pixel image, Figure 5a) was divided into four quadrants labelled κ<sub>1</sub>, ..., κ<sub>4</sub> (Figure 5b) and the sum of black pixels confined within each quadrant was calculated. In other words, an individual character was associated with a vector [Σκ<sub>1</sub>, Σκ<sub>2</sub>, Σκ<sub>3</sub>, Σκ<sub>4</sub>], i.e., a point in 4-dimensional space. Subsequently, four 3-dimensional projections were formulated in the following manner: [Σκ<sub>1</sub>, Σκ<sub>2</sub>, Σκ<sub>3</sub>], [Σκ<sub>1</sub>, Σκ<sub>2</sub>, Σκ<sub>4</sub>], [Σκ<sub>1</sub>, Σκ<sub>3</sub>, Σκ<sub>4</sub>] and [Σκ<sub>2</sub>, Σκ<sub>3</sub>, Σκ<sub>4</sub>]. For each input class (0, ..., 9) an ellipsoid with a confidence level of 65% was fitted using the 3D Confidence Ellipsoid toolbox in OriginPro 2019.
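The pixel-counting step can be sketched in a few lines of Python. The example below assumes a binarized image given as a list of rows (1 for a black pixel, 0 for a white one) and a quadrant labelling in which κ<sub>1</sub> is top-left, κ<sub>2</sub> top-right, κ<sub>3</sub> bottom-left and κ<sub>4</sub> bottom-right; the actual labelling is defined in Figure 5b and may differ. The function names are hypothetical.

```python
from itertools import combinations


def quadrant_sums(image):
    """Count black pixels in each quadrant of a binary image (list of rows).

    Assumed order of the returned sums: top-left, top-right,
    bottom-left, bottom-right.
    """
    half_rows = len(image) // 2
    half_cols = len(image[0]) // 2
    sums = [0, 0, 0, 0]
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel:
                # Index 0..3 built from the row half and the column half.
                sums[2 * (r >= half_rows) + (c >= half_cols)] += 1
    return sums


def projections_3d(vector):
    """Return the four 3-dimensional projections of a 4-component vector."""
    return [tuple(vector[i] for i in idx) for idx in combinations(range(4), 3)]


# Toy example: a vertical stroke resembling the digit "1".
stroke = [[1 if (c == 14 and 4 <= r < 24) else 0 for c in range(28)]
          for r in range(28)]
print(quadrant_sums(stroke))
print(projections_3d(quadrant_sums(stroke)))
```

`combinations(range(4), 3)` yields the index triples in the same order as the four projections listed in the text.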

**Figure 5.** A small sample of the MNIST handwritten digits (**a**) and an image depicting the definition of quadrants for the 28 × 28-pixel image (**b**).

The collection of data points representing all 600 characters under consideration for the [κ<sub>1</sub>, κ<sub>2</sub>, κ<sub>3</sub>] combination of quadrants is shown in Figure 6a and the fitted ellipsoids in Figure 6b,c. The applied analysis procedure provides a rather poor separation, as the fitted ellipsoids overlap excessively in most cases, with the exception of the "1" and "9" pair. This result is fully consistent with the initial assumption that simple pixel counting cannot serve as an efficient method for handwritten character recognition.

**Figure 6.** A complete collection of input data points (before feeding them into the single-node neural network) for a set of 600 handwritten digits in one arbitrarily chosen 3D projection (**a**) and an example of a relatively well-separated pair, which is associated with digits "1" and "9" (**b**). Other ellipsoids overlap significantly, e.g., those for digits "3" and "4" (**c**) or "2" and "5" (**d**).

It can be noticed that only two pairs are completely separated, whereas three others are close to complete separation. Most of these cases concern digit "1" which is substantially different (when the symmetry and number of pixels are taken into account) from any other handwritten digit.

In order to quantify the efficiency of digit recognition in various scenarios, a separability index was defined. Let *V*(*m*(κ<sub>*i*</sub>, κ<sub>*j*</sub>, κ<sub>*k*</sub>)) be the volume of an ellipsoid fitted to the representation of digit *m* in the κ<sub>*i*</sub>, κ<sub>*j*</sub>, κ<sub>*k*</sub> projection. Then the separability index of digit *m* with respect to *n* is defined as the ratio of the volume of the relative complement of the *n*-ellipsoid in the *m*-ellipsoid to the volume of the *m*-ellipsoid, for *m* ≠ *n* (2):

$$\xi_{m/n} = \frac{V\big(m(\kappa_i, \kappa_j, \kappa_k) \setminus n(\kappa_i, \kappa_j, \kappa_k)\big)}{V\big(m(\kappa_i, \kappa_j, \kappa_k)\big)} \tag{2}$$

The calculated separability indices for the input data in one of the possible projections are collected in Table 1. It is noteworthy that the matrix containing separability indices for a given combination of quadrants is not symmetrical, i.e., ξ<sub>*m*/*n*</sub> ≠ ξ<sub>*n*/*m*</sub>, since the ellipsoids have different volumes. If *m* = *n* then ξ<sub>*m*/*n*</sub> = 0.
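The separability index can be estimated numerically without an analytical expression for the overlap of two ellipsoids. The sketch below is a simplified illustration and not the procedure used in this work: it assumes axis-aligned ellipsoids described by a hypothetical `(center, semi_axes)` pair, whereas fitted confidence ellipsoids are in general rotated, and it uses Monte Carlo sampling, so the result is approximate.

```python
import random


def inside(point, center, semi_axes):
    """True if the point lies inside an axis-aligned ellipsoid."""
    return sum(((p - c) / a) ** 2
               for p, c, a in zip(point, center, semi_axes)) <= 1.0


def separability_index(ell_m, ell_n, samples=20000, seed=0):
    """Monte Carlo estimate of V(m without n) / V(m) for two ellipsoids."""
    rng = random.Random(seed)
    center, axes = ell_m
    kept = 0
    outside_n = 0
    while kept < samples:
        # Rejection sampling of a uniform point in the unit ball.
        u = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if sum(x * x for x in u) > 1.0:
            continue
        # Scale and shift the point so that it is uniform inside ellipsoid m.
        point = [c + a * x for c, a, x in zip(center, axes, u)]
        kept += 1
        if not inside(point, *ell_n):
            outside_n += 1
    return outside_n / samples


small = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
large = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
print(separability_index(small, large), separability_index(large, small))
```

The toy pair illustrates the asymmetry noted above: the small ellipsoid lies entirely within the large one, so its index is zero, while most of the volume of the large ellipsoid lies outside the small one.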


**Table 1.** The collection of separability indices for the input data in the κ<sub>1</sub>, κ<sub>2</sub>, κ<sub>3</sub> projection. The efficiency of data separation is color coded (vide infra) from red (no separation, ξ = 0) to green (perfect separation, ξ = 1).

In most cases the separation is insufficient to allow the unequivocal recognition of handwritten shapes. The exception is the pair {1, 9}, for which the separability index is equal to one, as the corresponding ellipsoids do not overlap. This is because "1" differs significantly in terms of the distribution of pixels between the quadrants. Poor separation can, however, be greatly improved with the use of even the simplest single-node hardware neural network.

In the first step all the characters were converted row-by-row into a stream of bits ("0" for a white pixel and "1" for a black one) and used to modulate the light source according to the scheme presented in Figure 7a,b. The recorded photocurrent spikes (Figure 7c) reflected the sequence of light pulses, but their intensity varied according to the previous states the photoelectrode was in (the short-term memory, vide supra, Figure 3). The photocurrent patterns were subsequently normalized: the amplitude of each signal was divided by the highest intensity recorded for the particular character. The application of various threshold values (Figure 7c,d) acted as a filter for the photocurrent spikes depending on their amplitude. The obtained images with the lowest intensity pixels removed at different thresholds are shown in Figure 8.
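The processing chain described above (bit stream, memory-dependent spike amplitudes, normalization, thresholding) can be mimicked with a toy Python model. It is a sketch under strong assumptions and not a model of the actual device: each spike is amplified only according to the interval since the closest preceding black pixel, using Equation (1) with the fitted central values, and the duration of one pixel slot (`SLOT_MS`) is a hypothetical value. The function names are illustrative.

```python
import math

# Central values of the fitted parameters of Equation (1); times in ms.
A1, A2, TAU1_MS, TAU2_MS, Y0 = 0.218, 0.340, 19.8, 167.0, 1.008
SLOT_MS = 40.0  # hypothetical time slot assigned to one pixel


def image_to_bits(image):
    """Flatten a binary image row by row (1 = black pixel, 0 = white pixel)."""
    return [1 if pixel else 0 for row in image for pixel in row]


def simulate_spikes(bits, slot_ms=SLOT_MS):
    """Toy spike amplitudes: pairwise facilitation from the previous ON pulse."""
    amplitudes = []
    last_on = None
    for index, bit in enumerate(bits):
        if not bit:
            amplitudes.append(0.0)
            continue
        if last_on is None:
            amplitude = Y0  # no preceding pulse: long-interval limit
        else:
            dt = (index - last_on) * slot_ms
            amplitude = (A1 * math.exp(-dt / TAU1_MS)
                         + A2 * math.exp(-dt / TAU2_MS) + Y0)
        amplitudes.append(amplitude)
        last_on = index
    return amplitudes


def normalize_and_threshold(amplitudes, theta):
    """Divide by the highest amplitude and keep spikes at or above theta."""
    peak = max(amplitudes)
    if peak == 0.0:
        return [0] * len(amplitudes)
    return [1 if a / peak >= theta else 0 for a in amplitudes]


bits = [1, 1] + [0] * 10 + [1]  # two adjacent black pixels, then an isolated one
spikes = simulate_spikes(bits)
print(normalize_and_threshold(spikes, 0.3))
print(normalize_and_threshold(spikes, 0.9))
```

In this toy example a low threshold keeps every black pixel, whereas a high threshold keeps only the pixel that directly follows another black pixel, which is the filtering effect discussed in the text.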

**Figure 7.** A 28 × 28-pixel image of a handwritten character with a marked row (**a**) translated into a sequence of bits and corresponding light pulses (**b**). A pattern of photocurrent spikes for a given binary input with three thresholds indicated (**c**). An image of the character reconstructed from the normalized photocurrent amplitudes (**d**).

The application of different thresholds (from Θ = 0.3, with virtually no signal filtration, to Θ = 0.9, corresponding to the removal of all but the most intense pixels) leads to the evolution of the character image, which depends directly on the neighbors of each particular pixel in the row. Significantly, the "distance" (originally in space, translated into time intervals between the light pulses) from the closest preceding black pixel determines the weight of the subsequent photocurrent spike amplification. As a result, a simple type of classification according to the scattering of pixels can be achieved. As in the case of the input data, the output images are subjected to the evaluation of the respective separability indices at various threshold values. An example is shown in Table 2.

Figure 9a shows a collection of all data points obtained for one selected projection (κ<sub>1</sub>, κ<sub>2</sub>, κ<sub>3</sub>) and one threshold value (Θ = 0.7). They seem to be as scattered as the points for unprocessed data (cf. Figure 6a), but the fitting procedure reveals significant differences. Some ellipsoids that were initially well separated (e.g., the {1, 9} pair) overlap significantly (Figure 9b). Some others remain unchanged (Figure 9c). More interestingly, there are numerous pairs (e.g., {2, 5}, Figure 9d) which are significantly separated upon data processing with the neuromimetic element.

**Figure 8.** An image of the character from Figure 7 reconstructed from the normalized photocurrent spikes at different threshold values.

**Table 2.** The collection of separability indices for the output data in the κ<sub>1</sub>, κ<sub>2</sub>, κ<sub>3</sub> projection at the threshold Θ = 0.7. The efficiency of data separation is color coded from red (no separation, ξ = 0) to green (perfect separation, ξ = 1).


Upon data treatment with the neuromimetic element the separability is significantly improved. Six pairs of digits are completely separated and two others are close to complete separation. Furthermore, these pairs are different than those separated with the use of the pixel counting method (cf. Table 1).

The improvement of digit classification can be visually evaluated by comparing the color-coded Tables 1 and 2. In order to perform a global quantitative evaluation of the separation efficiencies and to assess the improvement associated with the use of the single-node hardware neural device, a separability ratio (Ξ<sub>*m*/*n*</sub>) was defined as the ratio of the separability index calculated for the processed data to the separability index determined for the input data (for *m* ≠ *n*) (3):

$$\Xi_{m/n} = \frac{\xi_{m/n}^{\mathrm{output}}}{\xi_{m/n}^{\mathrm{input}}} \tag{3}$$

and Ξ<sub>*m*/*n*</sub> = 0 for *m* = *n*. Detailed analysis of the calculated values provides information on the efficiency of the discussed procedure even in the case of significantly overlapping ellipsoids. Selected separability ratios for the κ<sub>1</sub>, κ<sub>2</sub>, κ<sub>3</sub> projection at the threshold Θ = 0.7 are presented in Table 3.
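Given two matrices of separability indices, the separability ratio of Equation (3) is a simple element-wise operation. The Python sketch below is illustrative: the handling of pairs whose input index is zero (returned here as `None`) is an assumption, since that case is not specified in the text, and the interval boundaries of the color classes are taken from the caption of Table 3 with the boundary values assigned arbitrarily to the upper class.

```python
def separability_ratio(xi_input, xi_output):
    """Element-wise ratio of output to input separability indices."""
    size = len(xi_input)
    ratios = [[0.0] * size for _ in range(size)]  # diagonal stays 0 (m = n)
    for m in range(size):
        for n in range(size):
            if m == n:
                continue
            if xi_input[m][n] == 0.0:
                ratios[m][n] = None  # assumption: ratio treated as undefined
            else:
                ratios[m][n] = xi_output[m][n] / xi_input[m][n]
    return ratios


def color_class(ratio):
    """Map a separability ratio to the color classes used in Table 3."""
    if ratio < 0.5:
        return "red"
    if ratio < 1.0:
        return "yellow"
    if ratio < 1.5:
        return "green"
    if ratio < 2.0:
        return "blue"
    return "navy blue"


# Toy 2 x 2 matrices (made-up values, not data from this study).
xi_in = [[0.0, 0.5], [0.25, 0.0]]
xi_out = [[0.0, 1.0], [0.125, 0.0]]
print(separability_ratio(xi_in, xi_out))
```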

**Figure 9.** A complete collection of output data points (after feeding them into the single-node neural network) for a set of 600 handwritten digits in one arbitrarily chosen 3D projection (corresponding to the one shown in Figure 6) (**a**) and examples of significantly overlapping ellipsoids, corresponding to digits "1" and "9" (**b**) and "3" and "4" (**c**). The majority of other ellipsoids are better separated than in the case of untreated data, e.g., those associated with digits "2" and "5" (**d**).

**Table 3.** The collection of separability ratios (Ξ<sub>*m*/*n*</sub>) for the output data in the κ<sub>1</sub>, κ<sub>2</sub>, κ<sub>3</sub> projection at the threshold Θ = 0.7. The efficiency of data separation is color coded as follows: red, significantly decreased separability (Ξ < 0.5); yellow, slightly decreased separability (0.5 < Ξ < 1); green, moderately improved separability (1 < Ξ < 1.5); blue, significantly improved separability (1.5 < Ξ < 2); navy blue, outstanding improvement of separability (Ξ > 2).


An overall improvement of separability is achieved, with at least a twofold increase of the index value for 10 pairs compared to a twofold decrease in five instances. A similar situation is also observed for other projections at this threshold. This qualitative picture suggests an improvement of handwritten digit recognition upon application of a neuromimetic element in data processing. A quantitative estimation of the classification improvement can be obtained through a simple numerical analysis of the output data. Due to the complexity of the analysis (four quadrant combinations for eight different threshold values), various separation scenarios (depending on the chosen threshold and projection) are possible. Their overall efficiency can be evaluated using an integral separability index Ω, which acts as a global parameter indicating the performance of the system for all of the above-mentioned variables. For each separation scenario it can be defined as follows (4):

$$\Omega\left(\kappa_i, \kappa_j, \kappa_k, \Theta\right) = \frac{\sum_{m}\sum_{n}\xi_{m/n}^{\mathrm{output}}}{\sum_{m}\sum_{n}\xi_{m/n}^{\mathrm{input}}}, \quad m \neq n \tag{4}$$

The above-mentioned dependency of the Ω function values is depicted in Figure 10. In the majority of the investigated separation scenarios, the integral separability index after the treatment with the single-node hardware neural network is significantly higher than the value calculated for the unprocessed input data. In three cases the selection of a low threshold value (a situation which results in an insufficient filtration of pixels due to inadequate exploitation of memory features) leads to inferior separation. On the other hand, when the short-term memory of the system is optimally utilized, the recognition of handwritten digits improves. For two quadrant combinations an optimal threshold value exists, which is fully consistent with the expectations: too deep a filtration removes too many pixels and all the data points (cf. Figures 6a and 9a) merge at the origin of the coordinate system.
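Equation (4) reduces each separation scenario to a single number. A minimal Python sketch, assuming the separability indices are stored as square nested lists with the digit *m* as the row index and *n* as the column index, could look as follows; the function names and the toy values are illustrative and are not data from this study.

```python
def off_diagonal_sum(matrix):
    """Sum of all matrix elements with m != n."""
    return sum(value
               for m, row in enumerate(matrix)
               for n, value in enumerate(row)
               if m != n)


def integral_separability(xi_input, xi_output):
    """Integral separability index of Equation (4) for one scenario."""
    return off_diagonal_sum(xi_output) / off_diagonal_sum(xi_input)


# Toy 2 x 2 matrices (made-up values).
xi_in = [[0.0, 0.5], [0.25, 0.0]]
xi_out = [[0.0, 1.0], [0.125, 0.0]]
print(integral_separability(xi_in, xi_out))
```

A value above one indicates that, summed over all digit pairs, the processed data are better separated than the input data for the given projection and threshold.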

#### **3. Conclusions**

Surprisingly, even a primitive hardware realization of the neural network architecture based on a single node exhibiting short-term memory can significantly improve pattern recognition. Classification based solely on the number of black pixels encompassed by each of the four quadrants into which the character image is divided (cf. Figure 5b) is insufficient: only a few characters of a specific symmetry (e.g., "1") could be distinguished using this primitive procedure. The application of the optoelectronic element with PPF functionality tremendously enhances (for such a simple device) the classification efficiency. This change is based on the extraction of a new feature of the studied data: the scattering of pixels within the 28 × 28 matrix. High dispersion leads to negligible photocurrent amplification. In contrast, digits with large groups of pixels are characterized by a higher number of counts at relatively high thresholds, for which the short-term memory of the system and the resulting photocurrent amplification are strongly pronounced.

The system presented in this work is a single node neural device, the operation of which is based on the unsupervised learning paradigm involving the short-term memory of the device. The simple pixel counting method gives precise information on the size of the characters (therefore digit "1" separates well in all of the cases) and indirectly on their symmetry (it can be achieved by an appropriate selection of quadrant combinations). Application of a neuromimetic element allows further information processing, particularly the extraction of information on pixel "agglomeration", at least at the single row level. This process is analogous to Sammon mapping [49], but does not involve the reduction of data space dimensionality (Figure 11).

**Figure 10.** The dependence of the integral separability index Ω vs. the threshold value Θ. The horizontal line indicates the integral separability values determined for the input data. Threshold Θ = 0 corresponds to unprocessed input data.

**Figure 11.** A diagram depicting the data flow and the efficiency evaluation of the unsupervised classification system under consideration.

The presented optoelectronic single-node neural device is superior compared to software implementations in terms of the error resistance and the energy efficiency. These features of the discussed system make it a potential low-cost pre-processing unit. Furthermore, due to the operation based on time-series and the intrinsic short-term memory, it can be combined with selected aspects of the reservoir computing paradigm—in one of the possible scenarios, where a delayed feedback loop is used, the virtual neurons could significantly affect the system performance. At the same time, the analog character of the system allows the implementation of the fuzzy logic system, yielding a new class of hardware neuro-fuzzy devices.

The research presented in this paper supports the concept of heterotic computation [27,28]. It clearly shows that the performance of a simple numerical algorithm (classification based on pixel counting) can be improved by an in materio computational component, which itself cannot perform any classification tasks.

#### **4. Materials and Methods**

Commercially available cadmium sulfide (POCH, Las Condes, Chile), potassium nitrate (Avantor, Radnor, PA, USA), potassium iodide (Aldrich, St. Louis, MO, USA) and iodine (Aldrich, St. Louis, MO, USA) were used as received.

Working electrodes were prepared from polyethylene terephthalate (PET, Camarillo, CA, USA) foil coated with indium tin oxide (Aldrich, St. Louis, MO, USA). These substrates were washed carefully with diluted detergent solution, deionized water and isopropanol. They were then dried in air. The cadmium sulfide was ground with deionized water in an agate mortar to a thick paste and deposited onto the freshly cleaned substrates using a screen-printing machine (MikMetal, Masis St, Yerevan, Armenia) equipped with an 80-mesh polymer grid.

The UV/Vis spectra were recorded using a Lambda 750 spectrophotometer (Perkin Elmer, Waltham, MA, USA) within the wavelength range of 200–2000 nm. Barium sulfate of spectral purity was used as a reference material. The X-ray diffraction patterns were recorded with an Empyrean diffractometer (Cu Kα<sub>1</sub>, λ = 1.54060 Å; PANalytical, Malvern, UK) at room temperature with 2θ values ranging from 20 to 80 degrees. The scanning electron images were taken on a Versa 3D (FEI, Lausanne, Switzerland) scanning electron microscope operating at 20 kV with an Everhart-Thornley detector. The chemical composition of the CdS sample was confirmed using energy dispersive X-ray spectroscopy. All electrochemical measurements were performed with an SP-300 potentiostat (BioLogic, Cary, NC, USA). A Luxeon Star/O Royal Blue diode (465 nm, total radiometric power of 110 mW) was used as the light source. It was powered through a WA-301 wideband amplifier (Aim-TTI, Cambridgeshire, England). Pulse sequences were generated with a TG2512A arbitrary function generator (Aim-TTI, Cambridgeshire, England) triggered with an Arduino Uno R3 system.

The photoelectrochemical experiments were performed in air-equilibrated electrolytes using a three-electrode configuration. The screen-printed CdS electrodes, immersed in an aqueous electrolyte containing 0.1 M KNO<sub>3</sub>, 0.001 M KI and 0.0001 M I<sub>2</sub>, were used as the photoactive component. A saturated Ag/AgCl electrode was used as the reference electrode and a platinum wire as the counter electrode. A positive voltage (400 mV) was applied to the working electrode and the photocell was irradiated with short light pulses (300 μs).

The experiment automation was realized based on the program written in Arduino C++ language. All the necessary data processing was performed with the use of programs written in Python 3.7.2.

**Author Contributions:** M.L., D.P. and K.P. have performed all photoelectrochemical experiments. D.P. and K.P. have selected appropriate data sets. K.P. and K.S. have developed algorithms for data processing and designed the experimental setup. K.P. wrote the software necessary for data handling and measurements automation. All the authors contributed to the data analysis and the manuscript preparation.

**Funding:** This research was funded by National Science Centre (Poland) within the MAESTRO project, contract No. UMO-2015/18/A/ST4/00058. D.P. acknowledges partial support from European Union within the EU Project POWR.03.02.00-00-I004/16. K.P. was supported by the Foundation for Polish Science (FNP) and acknowledges partial support from National Science Centre within the PRELUDIUM project, contract No. UMO-2016/21/N/ST3/00469.

**Acknowledgments:** The authors thank Grzegorz Cios and Marianna Marciszko for the XRD measurements and Kapela Pilaka for countless discussions of an extremely fruitful character on the design of hardware neural networks.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Sample Availability:** Samples of the polycrystalline CdS are available from the authors on request. The full set of output data with separability indices for all possible classification scenarios is available on request. The programs used for the data processing are available on request.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Perspective* **Towards a Stochastic Paradigm: From Fuzzy Ensembles to Cellular Functions**

#### **Monika Fuxreiter \***

MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, H-4032 Debrecen, Hungary

Academic Editor: Pier Luigi Gentili

Received: 21 October 2018; Accepted: 16 November 2018; Published: 17 November 2018

**Abstract:** The deterministic sequence → structure → function relationship is not applicable to describe how proteins dynamically adapt to different cellular conditions. A stochastic model is required to capture functional promiscuity, redundant sequence motifs, dynamic interactions, or conformational heterogeneity, which facilitate the decision-making in regulatory processes, ranging from enzymes to membraneless cellular compartments. The fuzzy set theory offers a quantitative framework to address these problems. The fuzzy formalism allows the simultaneous involvement of proteins in multiple activities, the degree of which is given by the corresponding memberships. Adaptation is described via a fuzzy inference system, which relates heterogeneous conformational ensembles to different biological activities. Sequence redundancies (e.g., tandem motifs) can also be treated by fuzzy sets to characterize structural transitions affecting the heterogeneous interaction patterns (e.g., pathological fibrillization of stress granules). The proposed framework can provide quantitative protein models, under stochastic cellular conditions.

**Keywords:** protein dynamics; conformational heterogeneity; promiscuity; fuzzy complexes; higher-order structures; protein evolution; fuzzy set theory; artificial intelligence

#### **1. The Structure-Function Paradigm**

Protein functions take place in space and time. Structure-function principles, however, relate a protein sequence to biological activity, only via the spatial coordinates of the residues [1,2]:

$$\begin{array}{ccccc} \text{SEQUENCE} & \rightarrow & \text{STRUCTURE} & \rightarrow & \text{FUNCTION} \\ & & (x, y, z) & & (x, y, z, t) \end{array} \tag{1}$$

The three-dimensional organization of amino acids brings different chemical groups into proximity [3,4], creating specific microenvironments for biological activities. The emerging active sites, for example, can catalyze chemical reactions at significantly faster rates than in solution [5,6]. The classical, deterministic paradigm (1) establishes an unambiguous connection between the protein sequence and its function via a unique structure.

#### **2. The Ensemble View**

The energy landscapes of proteins are, in reality, more complicated. Proteins fluctuate among various conformations ('macrostates') and sub-states ('microstates'), which need to be considered for their relevant functioning [7–9]. A wide spectrum of dynamical transitions [10,11]—from local movements (e.g., sidechain rotations [12]) to large-amplitude collective motions (e.g., domain repositioning [13])—generates conformational ensembles, which, however, are not trivial to link to the function. How can structure-function relationships account for protein dynamics? If protein structure is described as an ensemble, the populations of the relevant sub-states, as well as the rate of interconversion between them, must be experimentally determined for each biological activity.

$$\begin{array}{ccccc} \text{SEQUENCE} & \rightarrow & \text{CONFORMATIONAL ENSEMBLE} & \rightarrow & \text{FUNCTION} \\ & & \{p_{CS_1}, \dots, p_{CS_N}\} & & \\ & & \{k_{CS_1}, \dots, k_{CS_N}\} & & \end{array} \tag{2}$$

where *pCSi* is the probability of the given conformational sub-state *CSi*, *N* is the number of sub-states, and {*kCSi* } is the set of rates, corresponding to the conversions between *CSi*→*CSj*, where *j* corresponds to all the other sub-states. Even if the number of conformational states is reduced to a few functionally relevant ones, characterizing both their thermodynamic and kinetic properties is a daunting task [14,15]. Furthermore, the deterministic relationship between the ensemble parameters and a unique function is also influenced by the environmental conditions.
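To make the scale of this task concrete, the following minimal sketch counts the quantities that paradigm (2) asks for as the number of sub-states grows. It assumes that the populations sum to one (so only *N* − 1 are independent) and that one rate is needed for every ordered pair of sub-states; the function name `ensemble_parameter_count` is hypothetical and only illustrative.

```python
def ensemble_parameter_count(n_substates: int) -> dict:
    """Count the quantities required by paradigm (2) for N sub-states.

    Assumptions (not stated in the text): populations sum to 1, so only
    N - 1 are independent; one rate is needed per ordered pair CS_i -> CS_j.
    """
    if n_substates < 1:
        raise ValueError("at least one sub-state is required")
    populations = n_substates - 1
    rates = n_substates * (n_substates - 1)
    return {
        "populations": populations,
        "rates": rates,
        "total": populations + rates,
    }


# Even a handful of sub-states requires many measurements per activity.
for n in (2, 3, 10):
    print(n, ensemble_parameter_count(n))
```

Under these assumptions the count grows quadratically with the number of sub-states, and it has to be repeated for each biological activity of interest.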

#### **3. Adaptation to Stochastic Cellular Conditions**

Proteins function under rapidly changing extracellular signals and intracellular milieu, which is shaped by cellular diffusion and transport, stochastic gene expression, degradation, and other environmental fluctuations. These factors present stochastic conditions for protein evolution [16–18] leading to 'noise' in biological innovations [19], which is reflected by redundancies and ambiguities in sequences [20], structures [21], and functions [22]. On the one hand, proteins attempt to minimize functional noise. For example, higher-order structures emerge to reduce noise-to-signal ratio for low-affinity substrates [23–25]. On the other hand, ambiguities and redundancies in sequence, structure and function facilitate dynamic adaptation [26]. Proteins evolve under these two opposing constraints to optimize fitness under given cellular conditions.

#### **4. Ambiguity and Redundancy in Sequence, Structure, and Function**

The reformulated paradigm (2) still implies that a given sequence generates a well-defined ensemble, which belongs to a specific function. The stochastic cellular conditions lead to the following observations, which violate the classical paradigm: (i) A considerable proportion of proteins exhibit multiple, simultaneous activities, often referred to as promiscuity or moonlighting [27]. (ii) Certain biological activities (e.g., signaling) are related to heterogeneous conformational ensembles, which are mixtures of different functional ensembles [28]. (iii) Some proteins exhibit a weak sequence dependence, i.e., a large degree of tolerance towards sequence modifications [29]. These observations stem from redundancies in sequence or structure, coupled to ambiguities in function. The same ensemble may perform multiple functions (*functional promiscuity*); the same sequence may be organized into multiple functional ensembles, depending on the context (*conformation and interaction heterogeneity*); and multiple sequences may encode the same conformational ensemble (*sequence redundancy*). These problems, which reflect a more complex relationship between the sequence, structure, and function of proteins, are detailed below.

#### **5. Functional Promiscuity**

Metabolic enzymes often catalyze reactions on non-canonical substrates, some of which are also relevant physiologically [27,30,31]. Functional promiscuity may parallel organism complexity [32], or be driven by network context [33]. Promiscuous activities can serve as starting points to engineer new enzymes [34]. Tailored selection pressures may optimize latent activities to overcome the primary function by >10<sup>9</sup>-fold [35]. Functional transitions are usually initiated by 'neutral drifts', with a negligible impact on the original activity [36,37]. That is, the optimization of a promiscuous function initially exploits the inherent variations in structure [38] and dynamics [39]. Functional transition of a phosphotriesterase to arylesterase [35], for example, is coupled to increasing structural divergence between the two subunits, until the two activities become comparable (Figure 1A). In contrast, specialization for the new activity is accompanied by structural convergence (Figure 1A). Similarly, 'freezing' out unnecessary motions offers another route to optimize enzymatic efficiency [6]. Along these lines, principal modes derived from structure [40] often presage or follow the evolutionary changes [41,42].

**Figure 1.** Towards a stochastic structure-function relationship. (**A**) Structural diversity increases with functional promiscuity. The distance between the L5 (*lime*, *green*) and L7 (*wheat*, *orange*) loops (A204 Cα–G273 Cα) deviates in the two subunits (*superimposed*) of a dimeric phosphotriesterase (PTE) enzyme (PDB code: 4xag [39]). During laboratory evolution into arylesterase, the structural difference increases as the two activities become comparable (R1 → R6), while it decreases during specialization (R8 → R22). (**B**) Free energy landscape changes upon adaptation of proteins. Functional alterations shift the relative populations of conformational sub-states, but may not impact the ruggedness of the landscape. (**C**) Conformational sub-states (CSs) contribute to multiple free landscapes. The functional noise (uncertainty of F1, F2, F3) of the main activity (*bold*) can be quantified by fuzzy membership functions. (**D**) The fuzzy structure-function model. In the fuzzy inference system, the logical relationship is established between the fuzzy sets of the input and output (*top*). In proteins, fuzzification generates sets of interaction patterns amongst functional sequence motifs, which can be linked to conformational sub-states. The connection between structure and function is a knowledge-based logical rule between the set of conformational sub-states and the set of alternative functions, from which the most likely activity can be selected (*bottom*).

#### **6. Conformational Heterogeneity**

Dynamic signals perturb conformational ensembles by changing the relative populations of the different sub-states [43] (Figure 1B). The co-existence of functionally different conformations, in a broad regime, may enable the same protein to be simultaneously engaged in multiple pathways [44]. An agonist binding to a β2-adrenergic receptor, for example, does not stabilize the active conformation of the cytoplasmic domain; it rather increases the conformational heterogeneity of the active, intermediate, and inactive states, for the complex signaling outputs [28].

Intriguing observations indicate that specific biomolecular recognition can also be achieved in heterogeneous conformational ensembles [45–47]. Although the underlying molecular forces are often puzzling [48,49], conformational ambiguities often enable context-dependent responses via alternative interaction patterns [50,51]. Conformational heterogeneity along the binding trajectory has recently been found to critically influence the structures of the complexes formed with different partners [52,53]. Structural ambiguities might even be a prerequisite, for example, for efficient transcription [54] via a fuzzy 'free-for-all' mechanism [55].

Conformational heterogeneity often leads to dynamic interaction profiles, where the functional output (specificity, signal, and polymerization) is controlled by transient contacts [56,57]. Dynamic interactions may also balance between the auto-inhibited and active states [58,59] and can be significantly influenced by post-translational modifications (PTMs) [60,61]. Although the modification pattern inducing the functional response can be defined, its impact on the underlying heterogeneous conformational ensembles often remains unclear.

#### **7. Redundant Sequence Motifs**

Multiple, weakly-restrained sequence motifs are frequently distinguished in signaling pathways, via mediating protein interactions [62]. Regions linking the motifs exhibit increased conformational plasticity and reduced sensitivity to mutations or scrambling [63], leading to a phenomenon, often referred to as 'sequence independence' [64]. Tandem repeats of a few residues, for example, are often involved in the organization of higher-order structures [65], ranging from amyloids to signaling complexes and nuclear pores [66]. Motif redundancy leads to the redundancy of interaction patterns and the co-existence of different contact topologies. Although the interactions of the individual motifs are often sub-optimal, their cooperativity may result in high-affinity associations [25,67].

Both the dynamics of the motif-linking regions and the variations in contact patterns lead to conformational heterogeneity in higher-order assemblies [68]. The Fused in Sarcoma (Fus) protein, for example, is involved in the formation of stress granules via a liquid–liquid phase transition, which is driven by its low-complexity (LC) domain, composed of 27 [S/G]Y[S/G] repeats. The NMR spectra of the LC domain in the droplet are similar to those of the unbound state, indicating conformational heterogeneity in the assembly [69]. Single-point mutations may gradually decrease conformational heterogeneity, leading to pathological aggregation [70]. Additional studies corroborate the finding that pathological mutations initially induce minor perturbations [71], which simultaneously affect multiple conformations/interaction patterns and induce their shift towards the fibril form.

#### **8. Generalized Structure-Function Ensembles**

The experimental data summarized in the above three sections are difficult to interpret via the classical structure-function paradigm (2). We may attempt to solve these problems by treating the sequences, conformations, and functions as generalized ensembles:

$$\text{SEQUENCE}\ (\mu, \sigma) \rightarrow \text{CONFORMATIONAL ENSEMBLE}\ (\mu, \sigma) \rightarrow \text{FUNCTION}\ (\mu, \sigma) \tag{3}$$

where μ is the mean, and σ is the variance of the respective distribution.

Evaluating the structure-function paradigm in the form (3) requires decoupling all the respective activities to analyze the underlying distributions of conformational ensembles and sequences. Careful experimental studies along these lines [72] demonstrate that such approaches are hardly feasible: first, the dimensionality of the problem is overwhelming; second, deconvolution of the different functionalities may not be possible in vivo, owing to their intricate connections.

#### **9. Fuzzy Sets Quantify Sequence and Conformation Ambiguities**

I propose that fuzzy set theory [73] offers a quantitative framework to derive stochastic structure-function relationships. In a fuzzy set *U* = {*x*1, *x*2, ..., *xN*}, a membership function *m*(*xi*) → [0, 1], *xi* ∈ *U*, is assigned to each element and characterizes to what extent *xi* belongs to the given set. For example, the membership of a protein conformational sub-state (*CSi*) in a specific functional set (*F*1) can vary between 0 and 1 (*m*1(*CSi*): *F*1 → [0, 1]), allowing the conformation to contribute to additional activities (e.g., *F*2 and *F*3, Figure 1C). Memberships for other possible biological functions could also be defined using this formalism (Figure 1C). In a similar manner, memberships of sequences in given conformational ensembles (*m*1(*SEQi*): *CS*1 → [0, 1]) or in given functions (*m*1(*SEQi*): *F*1 → [0, 1]) could also be quantified.
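As an illustration of this formalism, the sketch below stores memberships of conformational sub-states in several functional sets. All numerical values, the `memberships` table, and the helper names are hypothetical and chosen only to show that, unlike probabilities, the memberships of one sub-state need not sum to one.

```python
# Hypothetical memberships m_k(CS_i) of three conformational sub-states
# in three functional sets F1..F3 (illustrative numbers, not measured data).
memberships = {
    "CS1": {"F1": 0.9, "F2": 0.3, "F3": 0.0},
    "CS2": {"F1": 0.5, "F2": 0.6, "F3": 0.2},
    "CS3": {"F1": 0.1, "F2": 0.2, "F3": 0.8},
}


def validate(table):
    """Check that every membership value lies in the interval [0, 1]."""
    for substate, row in table.items():
        for function, value in row.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"m({substate}, {function}) = {value} is outside [0, 1]")


def functions_of(substate, table, threshold=0.0):
    """Return the functional sets to which a sub-state contributes."""
    return sorted(f for f, value in table[substate].items() if value > threshold)


validate(memberships)
print(functions_of("CS2", memberships))   # CS2 contributes to all three activities
print(sum(memberships["CS2"].values()))   # memberships are not normalized to 1
```

Keeping a separate membership for each functional set is what lets a single sub-state contribute to several activities at the same time.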

The structure-function paradigm could thus be reformulated by treating the sequences and conformational ensembles as fuzzy sets:

$$\begin{array}{ccccc} \text{SEQUENCE} & \rightarrow & \text{CONFORMATIONAL ENSEMBLE} & \rightarrow & \text{FUNCTION} \\ m_i(PI): CS_i \rightarrow [0, 1] & & m_i(CS): F_i \rightarrow [0, 1] & & \\ m_i(PI): F_i \rightarrow [0, 1] & & & & \end{array} \tag{4}$$

where *mi*(*PI*) is the respective membership function of a sequence, defined with respect to the conformational states (*CSi*) or biological activity (*Fi*), as a pattern of interacting elements/motifs (*PI*). *mi*(*CS*) is the membership function of the conformational sub-state/ensemble (*CS*), in a given function.

Here sequence, structure, and function are considered as different co-existing distributions (Figure 1C), and their contributions change according to the cellular conditions. For example, in the case of a β2-adrenergic receptor, the active, intermediate, and inactive states (represented by three ensembles) are mixed differently, depending on the signaling input. The fuzzy formalism handles combinations of activities aiming to determine the individual contributions of the different conformational ensembles.

#### **10. The Stochastic Structure-Function Relationship**

Within this framework, the structure-function relationship can be quantified by a fuzzy inference system [74,75] (Figure 1D). Parameters describing the elements of the sequence (motifs) or conformational space (distinguished secondary structures) are used as the input, and the different biological activities serve as the output of the system. The first step is the fuzzification of the input, when the fuzzy sets and their membership functions are defined to describe the interaction patterns, and the corresponding conformational sub-states (Figure 1D). The fuzzy inputs are then combined and knowledge-based logical rules ('IF-THEN') are applied to obtain the output membership functions of the different biological activities in the system. These rules could be derived using machine-learning or neural network algorithms. Defuzzification of the output can select the most likely activity, under a given condition, while also accounting for other, promiscuous activities (Figure 1D).
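The three steps described above (fuzzification, IF-THEN rules, defuzzification) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the two input parameters, the linear membership functions, the rule table `RULES`, and the use of min for AND and max for combining rules are all hypothetical choices made for clarity.

```python
def fuzzify(x, low_label, high_label):
    """Split a parameter x in [0, 1] into two complementary fuzzy sets.

    Linear membership functions are an assumption made for simplicity.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return {low_label: 1.0 - x, high_label: x}


# Hypothetical knowledge-based rules:
# IF interaction pattern is P AND conformation is C THEN activity is F.
RULES = [
    ("dense", "compact", "F1"),
    ("dense", "extended", "F2"),
    ("sparse", "compact", "F2"),
    ("sparse", "extended", "F3"),
]


def infer(pattern_m, conformation_m, rules=RULES):
    """Apply the rules: min for AND within a rule, max across rules."""
    output = {}
    for pattern, conformation, activity in rules:
        strength = min(pattern_m[pattern], conformation_m[conformation])
        output[activity] = max(output.get(activity, 0.0), strength)
    return output


def defuzzify(output):
    """Select the most likely activity and keep the promiscuous ones."""
    main = max(output, key=output.get)
    others = {f: m for f, m in output.items() if f != main}
    return main, others


# Hypothetical inputs: contact density 0.75, degree of extension 0.4.
pattern_m = fuzzify(0.75, "sparse", "dense")
conformation_m = fuzzify(0.4, "compact", "extended")
activities = infer(pattern_m, conformation_m)
main_activity, promiscuous = defuzzify(activities)
print(main_activity, promiscuous)
```

In this sketch the rule table is written by hand; as the text notes, such rules could instead be derived with machine-learning or neural network algorithms.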

The fuzzy model quantifies the functional ambiguities of the conformational sub-states:

$$
\Delta F_{main} = \left(\sum_{i}^{n} \delta F_{i,main}\right) / n \tag{5}
$$

where *n* is the number of alternative (promiscuous) activities, and *Fmain* is the main function with membership function *mmax*. The contribution of function *Fi*, with respect to the main function, is computed from the corresponding membership functions: *δFi,main* = *mi*/*mmax*.
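Equation (5) can be evaluated directly once the memberships are known. The sketch below assumes that the sum runs over the alternative activities only (the main function is excluded) and takes the main function to be the one with the largest membership; the function name and the example values are hypothetical.

```python
def functional_ambiguity(memberships):
    """Evaluate Equation (5): the mean of m_i / m_max over the alternatives.

    `memberships` maps each activity to its membership value. The activity
    with the largest membership is taken as the main function (assumption).
    """
    main = max(memberships, key=memberships.get)
    m_max = memberships[main]
    if m_max <= 0.0:
        raise ValueError("the main function must have a positive membership")
    alternatives = [m for f, m in memberships.items() if f != main]
    if not alternatives:
        return 0.0  # no promiscuous activities, hence no ambiguity
    return sum(m / m_max for m in alternatives) / len(alternatives)


# Illustrative values: F1 is the main activity, F2 and F3 are promiscuous.
example = {"F1": 0.8, "F2": 0.4, "F3": 0.2}
print(functional_ambiguity(example))  # (0.4/0.8 + 0.2/0.8) / 2
```

A value near 0 indicates a sub-state dominated by its main function, whereas a value near 1 indicates alternative activities that are comparable to the main one.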

Here, the challenge is to define the membership functions. To this end, the efficiencies of the alternative activities (e.g., catalytic rates) are determined via functional assays on well-characterized conformations (e.g., crystal structures, chip-bound proteins, or those selected by conformational antibodies) or ensembles (solution techniques, NMR, FRET, and single-molecule methods). Different membership functions could be probed computationally, based on the regulatory characteristics (e.g., changing an auto-inhibited to an active state).

The fuzzy formalism (4) is particularly useful to relate the sequence sets to function. Here structural features, which could be predicted from the sequence (e.g., secondary structure elements, disordered regions, or post-translational modification sites) may serve to generate the pattern of interaction elements (PI), to define the fuzzy sets. This approach has been implemented in simulations of higher-order protein organizations [76].

#### **11. Conclusion and Outlook**

Proteins deal with uncertain information, regarding cellular conditions. The information is not only imprecise, but various components are unknown or are unpredictable, owing to the non-random fluctuations in the system. The functional characteristics of proteins need to be adjusted to this poorly defined environment. The classical models in protein science, such as the structure-function paradigm, are based on well-defined properties and cannot deal with the ambiguities related to "noise". The fuzzy set theory offers a quantitative framework to reformulate the structure-function paradigm for describing the stochastic cellular behavior of proteins (Figure 1D). This approach will provide a more holistic protein model, which can be applied to generate interaction or metabolic networks of different cell lines as well as more reliable pharmacophore models.

**Funding:** Financial support is provided by GINOP-2.3.2-15-2016-00044, HAS 11015 and the DE Excellence Program.

**Conflicts of Interest:** The author declared no conflicts of interest.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Moonlighting Proteins in the Fuzzy Logic of Cellular Metabolism**

#### **Haipeng Liu <sup>1</sup> and Constance J. Jeffery <sup>2,\*</sup>**


Academic Editor: Pier Luigi Gentili

Received: 30 May 2020; Accepted: 23 July 2020; Published: 29 July 2020

**Abstract:** The numerous interconnected biochemical pathways that make up the metabolism of a living cell comprise a fuzzy logic system because of its high level of complexity and our inability to fully understand, predict, and model the many activities, how they interact, and their regulation. Each cell contains thousands of proteins with changing levels of expression, levels of activity, and patterns of interactions. Adding more layers of complexity is the number of proteins that have multiple functions. Moonlighting proteins include a wide variety of proteins where two or more functions are performed by one polypeptide chain. In this article, we discuss examples of proteins with variable functions that contribute to the fuzziness of cellular metabolism.

**Keywords:** moonlighting proteins; fuzzy logic; intrinsically disordered proteins; metamorphic proteins; morpheeins

#### **1. Introduction**

Fuzzy logic systems include variables that can be any real number between 0 and 1 instead of being limited to the Boolean logic variables of only 0 and 1. This enables expression of complexity, uncertainty, and imprecision. In general, the vast interconnected biochemical pathways that make up the metabolism of a living cell can appear fuzzy because they are complex and hard to predict, and we have incomplete and not yet accurate knowledge. A single cell contains thousands of proteins performing a wide variety of activities, and the proteins have complex and constantly changing levels of expression, levels of activity, and patterns of interactions with other proteins and other molecules. Adding even more layers of complexity is the ability of many proteins, called moonlighting proteins, to perform more than one function. Moonlighting proteins are proteins in which one polypeptide chain performs more than one physiologically relevant biochemical or biophysical function [1–3] (Figure 1). The MoonProt Database (www.moonlightingproteins.org) contains annotations for over 300 experimentally confirmed moonlighting proteins, of which about 130 proteins are from human [4,5]. Although the mechanisms by which one protein performs two different functions are not always understood, it is clear that the function (or functions) performed at any specific time can be affected by multiple factors, and sometimes combinations of factors, including targeting to different cellular compartments, changes in the intracellular concentration of ligands, and changes in environmental conditions. In this paper, we describe examples of moonlighting proteins and some of the mechanisms by which they change function. 
These examples help illustrate and complement the ideas in this collection of papers on the topic of "The Fuzziness in Molecular, Supramolecular, and Systems Chemistry", where Gentili presents the "Fuzziness of the Molecular World" and describes natural information systems that involve fuzzy logic in large part due to proteins having multiple features and functions that vary in a context-dependent manner [6]. In addition, the paper by Fuxreiter describes using fuzzy set theory in a quantitative framework for describing the relationships between changing protein structures, interactions, and functions under changing, and somewhat unknown or unpredictable, cellular conditions [7].
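The difference between Boolean and fuzzy variables mentioned at the start of this section can be made concrete with a short sketch. It uses the min, max, and complement operators, which are one common choice for fuzzy AND, OR, and NOT; this choice is an assumption for illustration and is not prescribed by the papers in this collection.

```python
def fuzzy_and(a, b):
    """Fuzzy conjunction using the minimum operator."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy disjunction using the maximum operator."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy negation as the complement to 1."""
    return 1.0 - a


# On the Boolean values 0 and 1 the operators reproduce classical logic.
for a in (0, 1):
    for b in (0, 1):
        assert fuzzy_and(a, b) == int(bool(a) and bool(b))
        assert fuzzy_or(a, b) == int(bool(a) or bool(b))

# Intermediate values express partial truth, e.g. a protein that is
# "mostly" performing one function and "partly" performing another.
print(fuzzy_and(0.75, 0.25), fuzzy_or(0.75, 0.25), fuzzy_not(0.25))
```

With these operators a statement such as "the protein acts as an enzyme" no longer has to be simply true or false, which is the sense in which a moonlighting protein can be described as fuzzy.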

**Figure 1.** In a moonlighting protein (purple oval), more than one physiologically relevant biochemical or biophysical function is performed by a single polypeptide chain. Note: This figure was "Created with BioRender.com".

#### **2. Examples of Moonlighting Proteins and Factors that Affect Function**

#### *2.1. Cellular Localization*

The most often observed subclass of moonlighting protein includes proteins that perform different functions in different cellular localizations. Over 100 enzymes and chaperones that catalyze reactions in the cytosol can be secreted and act as cytokines that modify the host's immune system or become bound to the cell membrane where they serve as cell surface receptors, and in some cases these second functions contribute to virulence [8–11].

Enolase is one of these intracellular/surface moonlighting proteins in many species, including eukaryotes as well as prokaryotes. Inside the cell, it catalyzes the conversion of 2-phosphoglycerate to phosphoenolpyruvate in glycolysis. When displayed on the cell surface, it binds to host proteins (Figure 2a). The enolases from *Aeromonas hydrophila*, *Bacillus anthracis*, *Neisseria meningitidis*, *Streptococcus pneumoniae*, *Trichomonas vaginalis* and *Lactobacillus crispatus* can bind to host plasminogen [12–17]. The binding of plasminogen plays an important role in invasion of host tissues because, once bound to the cell surface receptor, the plasminogen becomes converted to the active protease, plasmin, which can aid in breaking down host extracellular matrix and invasion of tissues [18,19]. In some species, surface-located enolase and other intracellular/surface moonlighting proteins bind to other host proteins for colonization or for modulating the host immune system. *Streptococcus suis* enolase can also bind to host fibronectin, and *Staphylococcus aureus* enolase exhibits laminin binding activity [20,21].

Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) is another commonly found intracellular/surface moonlighting protein. It catalyzes the conversion of glyceraldehyde 3-phosphate to glycerate 1,3-bisphosphate in glycolysis in the cytoplasm. Some commensal bacteria that colonize the human gut use GAPDH on the cell surface to bind to host mucin and enable colonization of the gut [22,23]. When expressed on the cell surface, *Streptococcus pyogenes* GAPDH can bind to plasminogen and fibronectin, and can also function as an ADP-ribosylating enzyme and assist with neutrophil evasion [24–27]. In addition, *Streptococcus agalactiae* GAPDH can act as a modulator of the host's immune system [28].

As the bacterial HSP70 (heat shock protein 70), DnaK is abundantly expressed in the cytosol as a stress-inducible chaperone. *Mycobacterium tuberculosis* DnaK can be displayed on the cell surface to bind to plasminogen, and it can also bind to CD40 (cluster of differentiation 40) to stimulate the synthesis of monocyte chemokines and the maturation of dendritic cells [29,30].

**Figure 2.** Examples of moonlighting proteins. (**a**) Enolase is a cytosolic enzyme in glycolysis and also a plasminogen receptor when displayed on the cell surface (PDB ID: 1W6T [31]). (**b**) Aconitase is an enzyme in the citric acid cycle when it contains an iron/sulfur cluster bound in the active site of the protein (PDB ID: 2B3Y [32]). When the cellular iron level decreases and the iron/sulfur cluster disassociates, aconitase undergoes a large conformational change that enables it to bind to iron-responsive elements in mRNA (PDB ID: 3SNP [33]) to promote the expression of proteins involved in iron uptake. (**c**) Under normal cellular conditions, peroxiredoxin is predominantly a dimer (PDB ID: 5B6M [34]) that functions as a peroxidase that converts hydrogen peroxide to water. Under heat shock or oxidative stress, it converts to a higher molecular weight form, a decamer, that acts as a molecular chaperone that assists with protein folding (PDB ID: 6E0F [35]). Note: This figure was "Created with BioRender.com", and the visualizations of the protein structures were created with Mol\* [36] on the RCSB PDB website (rcsb.org) [37,38].

#### *2.2. Interactions with Other Proteins and Molecules*

Changes in cellular concentration of substrates or other ligands can serve as a trigger for changing protein functions. Aconitase is an enzyme in the citric acid cycle that contains an iron–sulfur cluster in the active site (Figure 2b). When the intracellular concentration of iron is high, the iron–sulfur cluster in the active site enables the isomerization of citrate to isocitrate [39]. In contrast, when the cellular level of iron is low, the iron–sulfur cluster is lost, and the enzyme changes conformation to expose an RNA binding surface and becomes an iron-responsive element binding protein (IRBP). As an IRBP, it binds to Iron Responsive Element (IRE) sequence motifs in mRNA, leading to increased translation of proteins involved in cellular iron uptake [40,41]. Another example is BirA, which performs functions as an enzyme and a transcription repressor in the biotin regulatory system [42]. The enzyme's function is determined by the cellular need for biotin. When the need is high, when the cells are growing rapidly, BirA functions as a biotin–[acetyl–CoA-carboxylase] ligase that transfers a biotinyl moiety to the biotin carboxyl carrier protein subunit of acetyl–CoA carboxylase. When the demand decreases, under slower growth rates, BirA binds to DNA and functions as a biotin–operon repressor to inhibit the production of biotin [43].

As with aconitase and BirA, the different functions of many moonlighting proteins require interactions with different proteins, multiprotein complexes, DNA, RNA and other macromolecules. Many ribosomal proteins are moonlighting proteins, such as ribosomal proteins S3, S13, S14, L2, L4, L5, L7, L11, L13a, L23 and L26 [44–54]. These proteins have a second function when they dissociate from the ribosome and interact with other molecules. For example, the ribosomal protein S3 is a component of the 40S subunit of the ribosome, which is located in the cytoplasm. When S3 dissociates from the ribosome, it can enter the nucleus and act as a deoxyribonuclease (DNase) that cleaves apurinic/apyrimidinic sites during DNA repair [55]. Like the cytosolic aconitase mentioned above, mitochondrial aconitase acts as an enzyme in the citric acid cycle to catalyze the isomerization of citrate to isocitrate. However, instead of a function in translation, mitochondrial aconitase plays an important role in the maintenance of mitochondrial DNA that is independent of its catalytic role in the citric acid cycle [56].

#### *2.3. Environmental Stress*

Environmental stress is another common factor that causes moonlighting proteins to switch functions. One example is peroxiredoxin, which in many species changes function from an enzyme to a protein-folding chaperone in response to oxidative stress or heat shock [57] (Figure 2c). In a lower molecular weight form, peroxiredoxin acts as a peroxidase that reduces hydrogen peroxide to water. However, under stress conditions, peroxiredoxin undergoes a shift to a higher molecular weight homo-oligomeric complex, composed of five dimers connected by hydrophobic interactions [35]. The high molecular weight complex is a molecular chaperone that helps with folding and stabilizing proteins disrupted by the cell stress conditions.

Another example is the DegP protease, which also has a temperature-dependent change of function, where it transitions between a protease and a molecular chaperone. Under low temperature conditions, DegP functions as a molecular chaperone with the proteolytic site inactivated. When temperatures increase, the proteolytic site is activated and DegP can catalyze protein degradation [58].

#### *2.4. Changes in Protein Structure*

Changes in cellular localization, interaction partners, and environmental conditions can trigger a change in protein function with little or no change in protein structure. For example, in some moonlighting enzymes that are also transcription factors, ligand binding can turn on the transcription factor function by causing a relatively small change in overall structure that increases its DNA binding affinity [59]. Other moonlighting proteins undergo a large conformational change, such as cellular aconitase becoming IRBP, which involves a large relative movement of several domains within a protein subunit to uncover a previously buried mRNA binding surface. The peroxiredoxins described above undergo changes in quaternary structure to become chaperones.

In general, many moonlighting proteins undergo changes in structure, which can range from small movements of surface loops to the more drastic change in tertiary or quaternary structure observed in intrinsically disordered proteins (IDPs), metamorphic proteins, or morpheeins.

Intrinsically disordered proteins have a region or subunit that is unfolded and can in some cases enable a switch from unfolded to multiple folded structures that enable interactions with different proteins. Metamorphic proteins contain a domain or subunit that folds into more than one stable structure. Morpheein proteins have subunits that disassemble, change conformation, and reassemble into different quaternary structures. In some cases, these structural changes are involved in regulation of a single function, but in other cases, the structural changes are correlated with a switch between two different functions.

In this next section, we describe examples of IDPs, metamorphic proteins, and morpheeins and how variability in protein structure contributes to some of these proteins being moonlighting proteins with "fuzziness" in protein function.

#### 2.4.1. Intrinsically Disordered Proteins

Intrinsically disordered proteins contain regions, sometimes the entire polypeptide chain, that are unfolded under physiological conditions. Some unfolded regions are fully functional without becoming completely folded, and others undergo reversible folding when binding with other molecules [60,61]. In some cases, this flexibility enables one IDP to bind to a variety of cellular components, including small molecules, other proteins, DNA, or RNA, which enables many proteins to perform more than one function and contributes to the complexity in cellular metabolism and regulation of transcription and translation, molecular translocation, DNA repair and replication, and cell signaling [62–65].

Thirteen proteins in the database of moonlighting proteins (MoonProt, moonlightingproteins.org) are also found in the database of intrinsically disordered proteins (DisProt.org) (Table 1) [5,66]. Two proteins are discussed further below, p53 and thymosin beta-4.

The tumor suppressor protein p53 is a moonlighting protein and also an IDP with roles in cell cycle regulation, DNA repair and apoptosis [67,68]. In normal cells, the levels of p53 are low, but if cells sense environmental dangers that cause DNA damage, such as toxins, viruses or radiation, the level of p53 rises. Then, p53 binds to regulatory elements in the genome, activating a cascade of cellular responses to stop cell division and prevent cells from uncontrolled growth. In DNA repair, p53 can interact directly with DNA polymerase and AP endonuclease to stimulate base excision repair [69]. Outside the nucleus, p53 also has several cytoplasmic functions, including in centrosome duplication, induction of apoptosis and inhibition of autophagy [68].

In the native state, p53 has both folded and unfolded domains. The folded core domain is a DNA binding domain that recognizes specific regulatory elements. The folded tetramerization domain at the center of the protein joins protein subunits together into a homo-tetramer [70]. The *N*-terminal transactivation domain that interacts with and activates transcription factors is intrinsically disordered [71]. The *C*-terminal domain and the linker regions in between domains are also intrinsically disordered and fairly flexible, which enables the protein to adjust its conformation upon binding to specific regulatory sites in the DNA [72]. The flexibility of the intrinsically disordered domains allows p53 to recognize and bind to a large number of regulatory elements, so that it can regulate transcription in many different sites of the genome.

Thymosin beta-4 (Tβ4) is another moonlighting protein that is also an IDP. Tβ4 mainly functions in sequestering G-actin (monomeric actin) to prevent it from polymerization [73,74]. Tβ4 also has multiple moonlighting functions that are involved in diverse cellular roles including enhancement of endothelial cell differentiation, stimulation of angiogenesis, tissue regeneration and inhibition of inflammatory responses [75–78].

The free form of Tβ4 is intrinsically disordered and predominantly unstructured in solution. However, upon binding with G-actin, Tβ4 becomes fully folded and structured, with an extended conformation in the central region and two helices at the *N*- and *C*-termini [79,80]. In addition, Tβ4 forms complexes with PINCH (Particularly Interesting New Cys His-containing protein), ILK (Integrin-Linked Kinase), and stabilin-2 (an endocytic receptor for hyaluronic acid), in which weak, transient and structurally ambiguous protein–protein interactions take place [81–83].




Protein structures in which significant conformational heterogeneity, disorder or ambiguity remain after formation of the complex are referred to as "fuzzy interactions" or "fuzzy complexes" and the remaining flexibility or disorder can be important in the assembly or activity of the complexes [116–118]. Fuzzy interactions and complexes can enable interactions with alternative partners and sensitivity to post-translational modifications. Fuzzy interactions also play a large part in several types of supramolecular interactions including in intracellular lipid droplets, which are described in another paper in this collection by Uversky [119].

The fuzzy complexes formed between IDPs and their binding partners, as mentioned above, can be an important feature of IDPs in fulfilling their functional versatility. Moonlighting proteins GCN4, HMGB1, CFTR, and Ure2 are found to be part of fuzzy complexes in the Fuzzy Complex Database (http://protdyn-database.org) [120]. As an example, GCN4 is a transcription activator for several genes, and is also a ribonuclease [86]. As a transcription activator, GCN4 binds Gal11 (an activator) in a weak and low affinity mode with multiple conformations. This conformational ambiguity is a typical example of a fuzzy complex where no single binding conformational state has been identified [117].

#### 2.4.2. Metamorphic Proteins

Metamorphic proteins add another layer of complexity to our understanding of protein structure and function [121,122]. In stark contrast to the dogma of one sequence, one structure, one function, metamorphic proteins have two or more folded structures as their native structures, and in some cases the different structures have different functions. Distinct from the intrinsically disordered proteins, where the native states are folded or unfolded, the native states of metamorphic proteins are both folded and structured. The interconversions between native structures are reversible, meaning that at equilibrium there is a balance between the native structures [123]. Although the existence of two native structures appears to contradict the thermodynamic principle of protein folding, in which the native structure of a protein has the lowest overall free energy, it has been shown that the two native structures can have similar energies with a low activation barrier of refolding [124]. So far, a small number of metamorphic proteins have been discovered, including lymphotactin, RfaH, CLIC1, Mad2, KaiB, IscU, Selecase and HIV-1 reverse transcriptase, of which the first three are also moonlighting proteins [125–132].

The C family chemokine lymphotactin (Ltn) is a metamorphic protein and also a moonlighting protein with heparin-binding activity [133]. Under normal physiological conditions, 37 °C and 150 mM NaCl, lymphotactin exists in an equilibrium between two native states, Ltn10 and Ltn40. Ltn10 is a monomer possessing a mix of beta sheet and alpha helix in a canonical chemokine fold that undergoes refolding and dimerization to become Ltn40, which contains a beta sandwich (Figure 3). While Ltn10 is an agonist for the X-C G-protein coupled chemokine receptor 1 (XCR1), Ltn40 cannot bind to XCR1 but instead can bind to heparin, a glycosaminoglycan component of the extracellular matrix. At equilibrium, there are nearly equal amounts of Ltn10 and Ltn40. The interconversions between Ltn10 and Ltn40 can be controlled by small changes in salt concentration and temperature. When the salt concentration is high and the temperature is low, Ltn10 predominates; at lower salt concentrations and higher temperatures, but still below 40 °C, Ltn40 is the predominant species [125]. Most other chemokines do not appear to undergo these transformations because they contain two disulfide bonds. Because lymphotactin only has one disulfide bond, it is less restricted and more flexible in changing conformations compared to other chemokines, which partially explains the reversibility between two distinct native structures. In experiments where an extra disulfide bond was introduced, lymphotactin could be locked in only one native state and the transition to the other state was prohibited, suggesting that a single amino acid modification can change the functionality of lymphotactin significantly [134].
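The link between "nearly equal amounts" of two native states and "similar energies" can be illustrated with the standard two-state Boltzmann relation. The following is a minimal sketch rather than a model of lymphotactin itself: it treats the interconversion as a simple A ⇌ B equilibrium (ignoring the dimerization step), and the function name `two_state_fractions` and the free energy differences are hypothetical values chosen for illustration, not measurements from the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def two_state_fractions(delta_g_kcal, temp_c=37.0):
    """Return (fraction_A, fraction_B) for an A <-> B equilibrium.

    delta_g_kcal is G_B - G_A in kcal/mol (hypothetical input).
    """
    temp_k = temp_c + 273.15
    # Equilibrium constant K = [B]/[A] = exp(-dG / RT)
    k_eq = math.exp(-delta_g_kcal / (R_KCAL * temp_k))
    frac_b = k_eq / (1.0 + k_eq)
    return 1.0 - frac_b, frac_b


# Illustrative free energy differences only (assumed, not from the source).
for dg in (-1.0, 0.0, 1.0):
    frac_a, frac_b = two_state_fractions(dg)
    print(f"dG = {dg:+.1f} kcal/mol -> A: {frac_a:.2f}, B: {frac_b:.2f}")
```

With a free energy difference of zero the two states are equally populated, and a difference of only about 1 kcal/mol already shifts the balance strongly toward one state. This is consistent with the observation that small changes in salt concentration or temperature are enough to change which form predominates.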

Another metamorphic protein with moonlighting activity is RfaH, which functions as a transcription factor that inhibits termination and is also a translation factor. RfaH has two domains: a *C*-terminal domain (CTD) and an *N*-terminal domain (NTD). As a transcription factor, these two domains are tightly bound together, and the CTD is in an all-helical conformation that masks the RNAP (RNA polymerase) binding surface on the NTD, preventing the NTD from interacting with RNAP. When the NTD binds to specific operons in DNA, the two domains are separated, which enables the NTD to interact with RNAP. Meanwhile, the CTD undergoes a transformation from an all-helical conformation to an all-beta one, after which the CTD is able to recruit ribosomes and potentiates the translation of operons controlled by RfaH [126,135].

**Figure 3.** Structures of lymphotactin, a metamorphic protein that is also a moonlighting protein. There are two tertiary folds for lymphotactin, Ltn10 and Ltn40. Ltn10 has a classical chemokine fold with a mix of alpha-helix and beta-sheet (PDB ID: 2HDM [134]). Ltn40 possesses a dimeric form with each subunit composed of mainly beta-sheets (PDB ID: 2JP1 [125]). Note: This figure was "Created with BioRender.com", and the visualizations of the protein structures were created with Mol\* [36] on the RCSB PDB website (rcsb.org) [37,38].

In addition, CLIC1 is a metamorphic protein with two native folded states and also a moonlighting protein with two different functions [136]. In an oligomeric form, CLIC1 functions as a transmembrane chloride ion channel with its *N*-terminus folded in an all-alpha native state. In a monomeric and soluble form, CLIC1 functions as an oxidoreductase with a transformed *N*-terminus that is a mixed structure containing an alpha helix and a beta sheet [137,138].

#### 2.4.3. Morpheein Proteins

The subunits of morpheeins form a multimer that can disassemble, change conformation (without refolding), and reassemble into a different multimer [139].

Porphobilinogen synthase (also known as delta-aminolevulinic acid dehydratase) is the prototype of a morpheein. It has two oligomeric states that correlate with different levels of enzyme activity and binding of allosteric effectors [140]. An octamer can disassemble into dimers. While part of a dimer, domains within the subunits can shift in their relative positions to result in subunits with a different conformation that can then assemble into a hexamer (Figure 4). While these different homomultimers vary in their level of activity of one function, porphobilinogen synthase enzyme activity, the protein is also a moonlighting protein because it has a second function in which it binds to and inhibits the proteasome [141,142].

**Figure 4.** Structures of porphobilinogen synthase, which is both a morpheein and a moonlighting protein. Porphobilinogen synthase can form two homo-multimers, a low activity hexamer (PDB ID: 1PV8 [143]) and a high activity octamer (PDB ID: 1I8J [144]). The two multimers can interconvert through two homo-dimers, with different subunit conformations. In addition to its catalytic function, porphobilinogen synthase has a second function as an inhibitor of the proteasome. Note: This figure was "Created with BioRender.com", and the visualizations of the protein structures were created with Mol\* [36] on the RCSB PDB website (rcsb.org) [37,38].

The ebolavirus VP40 is a morpheein and moonlighting protein that has three different functions in the virus life cycle, and each of these functions corresponds to a different arrangement of subunits [145]. An octameric ring structure binds to viral RNA to regulate its transcription while in host cells. A butterfly-shaped dimer moves to the host cell's plasma membrane. Then a linear hexameric form assembles into a larger structure that is needed for budding.

#### **3. Moonlighting Proteins in Cellular Complexity**

The variety of functions and combinations of functions of moonlighting proteins contribute to the complexity of cellular metabolism. Protein function, and in many cases the structure, is dependent on cellular factors that can vary due to intracellular conditions and the extracellular environment, and the output can change in a dynamic way. The examples given above are just some of the known factors affecting the functions of moonlighting proteins, and one protein often responds to multiple signals or combinations of signals. Some switches in function are reversible, others are not. Within a single cell, some copies of a moonlighting protein can be performing one function, some another, and some both simultaneously, depending on the protein, the cell type, and on the individual cell's metabolic state and environmental conditions.

These factors that make our understanding of the cell difficult are valuable to the cell because they help enable dynamic responses to fluctuations in conditions within the cell and in its environment. Because the function of moonlighting proteins can depend on multiple factors, they can also be components of controllable cellular responses and be involved in processing information. Some moonlighting proteins also help regulate the level of activity of other proteins in the cell, for example by an enzyme with a second function in a cell signaling pathway or as translation or transcription factors. Because moonlighting proteins can have different activities in different cell types, they also contribute to different cell types having different phenotypes with specialized functions.

In addition to actually contributing to the complexity of the many interconnected pathways and processes in the cell, the ability of a protein to perform two different functions adds to the "fuzziness" that limits our ability to fully understand, predict, and model the activities in a cell and how they interact and are regulated. There are many things we still do not know about moonlighting proteins. First, we do not know how many proteins are moonlighting proteins. Many protein functions were found by serendipity, and researchers are often looking for one type of function when they study a protein, not all of the functions that might be there. There are also many proteins identified through sequencing projects for which we do not know any functions. We also do not completely understand the triggers and mechanisms for switching between different functions (the cellular conditions, ligands, protein–protein interactions, conformational changes, PTMs, etc. involved) and how all the different triggers combine.

Our understanding of moonlighting proteins and our ability to predict which proteins are moonlighting proteins and what their functions are is also complicated because a protein can have one second function, but a homologous protein can have a different second function, for example the cytoplasmic and mitochondrial aconitases mentioned above. Leucyl-tRNA synthetase is another example. It is an enzyme that attaches leucine to tRNA, but it has additional functions that vary in yeast and humans. In the yeast *Saccharomyces cerevisiae*, leucyl-tRNA synthetase is involved in intron splicing in RNA [146]. However, in humans it is involved in cell signaling, where it senses the cellular leucine concentration and binds to and activates Rag GTPase, leading to the activation of mTORC1 (mammalian target of rapamycin complex 1) [147].

In fact, a family of homologous enzymes can include enzymes with the canonical catalytic function of the protein family, moonlighting proteins with different combinations of catalytic and non-catalytic functions, as well as enzymes with variations on the canonical catalytic function (i.e., different substrates or chemical reaction catalyzed) and even pseudoenzymes, which can resemble active enzymes but have no catalytic function [148,149]. Though noncatalytic, pseudoenzymes play important roles in regulating the activity of their catalytic homologues, facilitating the assembly of scaffolding complexes and coordinating transcription and translation [150]. The loss of the catalytic functions can be attributed to a variety of aspects, such as the loss of essential amino acid residues needed for catalysis in the active site, a blockage of the entrance to the active site, and mutations of the amino acids involved in binding the substrate [151,152]. The first reported pseudoenzyme was alpha-lactalbumin, which is homologous to the enzyme lysozyme but does not have catalytic activity [153,154]. Instead, alpha-lactalbumin is a component of lactose synthase, serving as a regulatory subunit that increases the substrate binding affinity for the catalytic subunit of the enzyme [155]. Another example is found in the argininosuccinate lyase protein family. The canonical argininosuccinate lyase enzymes catalyze the breakdown of argininosuccinate into arginine and fumarate. Delta1-crystallin is a pseudoenzyme member of the family with a function as a structural protein in the lens of the eye in birds and reptiles. Another member of the protein family, delta2-crystallin, shares 94% amino acid sequence identity with the delta1-crystallin and is both a crystallin and a catalytically active argininosuccinate lyase that catalyzes the breakdown of argininosuccinate into arginine and fumarate [156,157].

#### **4. Conclusions**

The variability in the functions of moonlighting proteins, including some intrinsically disordered, metamorphic and morpheein proteins, contributes immensely to the fuzziness concept of cellular metabolism as described by Gentili [6]. Many types of proteins have multiple functions that interact in a complex pattern of interacting pathways and processes. As cellular conditions change due to metabolism and environmental conditions, the functions of these proteins change, resulting in different combinations of interactions and processes. The fuzziness concept also represents our limited understanding of these players and our inability to fully predict and model their actions and interactions. We do not yet know how many proteins are moonlighting proteins or the full complement of their functions, and in many cases, we also do not know the cellular factors that affect their functions. Moreover, while many advances have been made in our ability to predict protein functions, the variety of functions found even among homologous proteins adds to the fuzziness of our predictions.

**Author Contributions:** Writing—original draft preparation, H.L. and C.J.J.; writing—review and editing, H.L. and C.J.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** Research on moonlighting proteins in the Jeffery Lab is supported by an award from the University of Illinois Cancer Center.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Perspective* **Supramolecular Fuzziness of Intracellular Liquid Droplets: Liquid–Liquid Phase Transitions, Membrane-Less Organelles, and Intrinsic Disorder**

## **Vladimir N. Uversky 1,2,\***


Academic Editor: Pier Luigi Gentili

Received: 5 August 2019; Accepted: 6 September 2019; Published: 7 September 2019

**Abstract:** Cells are inhomogeneously crowded, possessing a wide range of intracellular liquid droplets abundantly present in the cytoplasm of eukaryotic and bacterial cells, in the mitochondrial matrix and nucleoplasm of eukaryotes, and in the chloroplast's stroma of plant cells. These proteinaceous membrane-less organelles (PMLOs) not only represent a natural method of intracellular compartmentalization, which is crucial for successful execution of various biological functions, but also serve as important means for the processing of local information and rapid response to the fluctuations in environmental conditions. Since PMLOs, being complex macromolecular assemblages, possess many characteristic features of liquids, they represent highly dynamic (or fuzzy) protein–protein and/or protein–nucleic acid complexes. The biogenesis of PMLOs is controlled by specific intrinsically disordered proteins (IDPs) and hybrid proteins with ordered domains and intrinsically disordered protein regions (IDPRs), which, due to their highly dynamic structures and ability to facilitate multivalent interactions, serve as indispensable drivers of the biological liquid–liquid phase transitions (LLPTs) giving rise to PMLOs. In this article, the importance of the disorder-based supramolecular fuzziness for LLPTs and PMLO biogenesis is discussed.

**Keywords:** intrinsically disordered protein; intrinsically disordered protein region; liquid–liquid phase transition; protein–protein interaction; protein–nucleic acid interaction; proteinaceous membrane-less organelle; fuzzy complex.

#### **1. Introduction to Proteinaceous Membrane-Less Organelles**

It is recognized now that the cellular interior represents a highly crowded space, where various biological macromolecules (such as nucleic acids, polysaccharides, proteins, and ribonucleoproteins) occupy 5–40% of the cellular volume, and where the total concentration of these biological macromolecules can be as high as 80–400 mg/mL [1,2], with the total intracellular concentration of protein being expected to be up to 300 mg/mL, while the RNA levels can range from 20–100 mg/mL [3]. Importantly, recent studies revealed that all these biomacromolecules are distributed within a cell in a highly inhomogeneous manner, often forming different intracellular bodies or intracellular liquid droplets, which are known by different names, such as cellular (or nuclear) micro-domains, cellular (or nuclear, or mitochondrial) subdomains, intracellular (or intranuclear, or intramitochondrial, or intrachloroplast) bodies, non-membranous cytoplasmic (or nucleoplasmic) granules, and proteinaceous membrane-less organelles (PMLOs), which are commonly found in eukaryotic cells and bacteria [4–12]. Since PMLOs reversibly and controllably isolate target molecules in specialized compartments, they constitute an intricate answer to the cellular need to facilitate and control molecular interactions [5]. In fact, PMLOs serve as an important complement to the common membrane-encapsulated organelles, such as nucleus, mitochondria, Golgi apparatus, Golgi vesicles, smooth endoplasmic reticulum, rough endoplasmic reticulum, lysosomes, peroxisomes, secretory vesicles or granules (e.g., insulin granules), chloroplasts, and vacuoles. These membrane-bound organelles represent evolutionarily conserved compartments with complex barriers (membranes) permitting spatial isolation as well as energy-efficient and passive buffering of stochastic events [13].

Although traditional membrane-encapsulated organelles represent functionally optimized (and evolutionarily conserved) compartments, where membranes provide the physical separation within a cell needed for some specialized processes to occur, PMLOs, which are also functionally optimized compartments, are not surrounded by a membrane (as follows from their name). PMLOs represent condensed heterogeneous liquid-like mixtures of proteins and nucleic acids formed via liquid–liquid phase separation (LLPS) or biological liquid–liquid phase transitions (LLPTs).

By concentrating specific proteins (and frequently RNA and/or DNA), biological LLPTs generate PMLOs, which are considered as intracellular functional hot spots that serve as organizers of cellular biochemistry [14,15]. The resulting PMLOs are numerous: cytoplasmic granules include centrosomes [16], germline P-granules (germ cell granules or nuage) [17,18], neuronal RNA granules [19], processing bodies or P-bodies [20], and stress granules (SGs) [21]. Mitochondria and chloroplasts each contain only one type of PMLO: mitochondrial RNA granules and chloroplast SGs, respectively. On the other hand, the nucleus contains a large realm of nuclear PMLOs, such as nucleoli [22], nuclear pores [23], chromatin [24], Cajal bodies (CBs; [25]), nuclear stress bodies (nSBs) [26,27], nuclear gems (Gemini of coiled bodies or Gemini of Cajal bodies) [28,29], Sam68 nuclear bodies (SNBs) [30], perinucleolar compartment (PNC) [30], promyelocytic leukemia nuclear bodies (PML nuclear bodies) or PML oncogenic domains (PODs) [31], PcG bodies (polycomb bodies, subnuclear organelles containing polycomb group proteins) [32], paraspeckles [33], Oct1/PTF/transcription (OPT) domains [34], nuclear speckles or interchromatin granule clusters [35], histone locus bodies (HLBs) [36], and cleavage bodies [37], to name a few. This list represents only the tip of the iceberg, as recent studies suggest that 50+ different PMLOs can be found in eukaryotic cells and bacteria [4,5], and this number is increasing on a regular basis.

PMLOs are characterized by different physical properties, molecular compositions, subcellular localizations, cell type-specific features, and fast responsiveness to changes in cellular surroundings and environmental cues. In fact, PMLOs are dynamic, cell size-dependent, liquid-like bodies [9] with dimensions ranging from tens of nm to tens of μm and specific cellular distributions [11]. On the other hand, it has been shown that, although many intracellular bodies are liquid-like droplets with highly dynamic organization [10,15,38–43], some other PMLOs, e.g., amyloid bodies, centrosomes, nuclear pores, and Balbiani bodies, are characterized as "bioreactive gels" whose properties vary from solid-like to gels and viscous liquids [44]. Also, PMLOs are characterized by a high variability of their organizational complexity and compositions. In fact, based on their protein compositions (number of droplet-specific proteins), human PMLOs can be arranged in the following order: nucleolus (1626) > chromatin (1350) > nuclear speckles (650) > centrosome (530) > mitochondrial RNA granules (229) > promyelocytic leukemia protein (PML) nuclear bodies (104) > SGs (57) > perinuclear compartment (55) > Cajal bodies (CBs) (54) > polycomb group (PcG) bodies (48) > P-granules (Perinuclear RNA granules specific to the germline) (19) > nuage (18) > cleavage bodies (14) > Gemini (10) > SAM68 bodies (8) > paraspeckles (6) > nuclear SGs (5) = OPT (Oct1/PTF/transcription) domain (5) > histone locus bodies (HLBs) (4) = neuronal ribonucleoprotein (RNP) granules (4) [45]. Furthermore, environmental changes can also affect the protein composition and the physical properties of PMLOs [11], and this variability is controlled by different cellular factors, including (but not limited to) the stage of the cell cycle, the presence of growth stimuli, or stress [11].
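The ranking by number of droplet-specific proteins is easier to inspect as data. The sketch below simply transcribes the counts quoted above [45] into a dictionary and sorts them; the variable names and the shortened organelle labels are illustrative choices, not identifiers from the cited study.

```python
# Droplet-specific protein counts for human PMLOs, as quoted in the text [45].
# Keys are shortened labels chosen for this sketch.
pmlo_protein_counts = {
    "nucleolus": 1626,
    "chromatin": 1350,
    "nuclear speckles": 650,
    "centrosome": 530,
    "mitochondrial RNA granules": 229,
    "PML nuclear bodies": 104,
    "stress granules": 57,
    "perinuclear compartment": 55,
    "Cajal bodies": 54,
    "PcG bodies": 48,
    "P-granules": 19,
    "nuage": 18,
    "cleavage bodies": 14,
    "Gemini": 10,
    "SAM68 bodies": 8,
    "paraspeckles": 6,
    "nuclear SGs": 5,
    "OPT domain": 5,
    "histone locus bodies": 4,
    "neuronal RNP granules": 4,
}

# Sort from most to fewest proteins; Python's sort is stable, so ties
# (e.g. nuclear SGs and OPT domain) keep the order used in the text.
ranked = sorted(pmlo_protein_counts.items(), key=lambda item: -item[1])

# Ratio between the most and least protein-rich PMLOs in this list.
complexity_span = ranked[0][1] / ranked[-1][1]

for name, count in ranked:
    print(f"{name:28s} {count:5d}")
print(f"span (largest / smallest): {complexity_span:.1f}x")
```

The span between the nucleolus (1626 proteins) and the smallest entries (4 proteins) is roughly 400-fold, which gives a sense of how widely the organizational complexity of PMLOs varies.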

LLPTs causing the PMLO formation may be triggered by a variety of environmental factors, such as: fluctuations in levels of biomacromolecules (proteins and nucleic acids) undergoing phase separation; variations in the concentrations of specific small molecules or salts; changes in temperature, osmolarity, and/or pH of the solution; various alterations of the phase-forming proteins caused by a multitude of posttranslational modifications (PTMs), alternative splicing, or binding of certain partners; or alterations of the environmental conditions modulating the protein–nucleic acid or the protein–protein interactions [8,9,14,46,47]. One should also keep in mind that the biological LLPTs and the related processes of PMLO formation are strongly condition-dependent, completely reversible, and tightly controlled [4,5]. This is schematically represented by Figure 1 showing LLPT and factors triggering these transitions.

**Figure 1.** Schematic representation of liquid–liquid phase transitions (LLPT) and thermodynamic factors (top) and intrinsic disorder-related features controlling liquid–liquid phase transitions in protein solutions. This figure is reprinted from Current Opinion in Structural Biology, Vol. 44, Uversky V.N. Intrinsically disordered proteins in overcrowded milieu: membrane-less organelles, phase separation, and intrinsic disorder, Pages No. 18–30, Copyright 2017, with permission from Elsevier.

Obviously, since there are no membranes around PMLOs, their biogenesis and structural coherence are exclusively governed by the intra-organelle protein–protein, protein–RNA, and/or protein–DNA interactions [48]. Furthermore, due to lack of surrounding membranes, the components of PMLOs are not protected from the environment and rapidly circulate between the organelle and its adjoining surroundings [49,50]. As a result, PMLOs exhibit several features characteristic of liquids. In fact, they show wettability (i.e., they can uphold contact with a solid surface) and possess sufficient surface tension for maintenance of their spherical shape. They can fuse upon contact, flow in response to shear stresses, and drip [17,21,51,52]. Therefore, based on their properties, PMLOs can be classified as a special liquid state (or liquid phase) of cytoplasm, matrix, nucleoplasm, or stroma characterized by the major physico-chemical properties that are rather close to the features of the corresponding intracellular fluids in which they are found [9]. On the other hand, although the intrinsic density and the viscosity of many PMLOs are relatively low, being not very different from those of the cytoplasm or the nucleoplasm [17,21,51–56], the PMLO interior is classified as an overcrowded milieu [4]. This is due to the fact that PMLOs contain noticeably higher total protein concentrations than those found within the crowded cytoplasm and the nucleoplasm [4]. An illustrative example of this overcrowded nature of PMLOs is given by nucleoli, speckles, and Cajal bodies of the *Xenopus* oocyte nucleus with the total protein concentrations of 215, 162, and 136 mg/mL, respectively. These values are noticeably higher than the total protein concentration of 106 mg/mL in the surrounding nucleoplasm [55]. More globally, although the dilute phase in a cell is maintained at the critical phase separation concentrations of proteins and nucleic acids [11], these biomacromolecules can be concentrated ~10–100-fold within the droplets [53,57], reaching millimolar concentrations [58].
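The total protein concentrations quoted above for the *Xenopus* oocyte nucleus [55] can be expressed as fold enrichment over the surrounding nucleoplasm. The following is a minimal sketch of that arithmetic; the function and variable names are hypothetical, and the only data used are the four concentrations given in the text.

```python
# Total protein concentrations (mg/mL) in the Xenopus oocyte nucleus,
# as quoted in the text [55].
NUCLEOPLASM_MG_ML = 106.0
pmlo_protein_mg_ml = {
    "nucleoli": 215.0,
    "speckles": 162.0,
    "Cajal bodies": 136.0,
}


def fold_enrichment(concentration, reference=NUCLEOPLASM_MG_ML):
    """Ratio of a PMLO's total protein concentration to the reference."""
    return concentration / reference


enrichment = {
    name: fold_enrichment(conc) for name, conc in pmlo_protein_mg_ml.items()
}

for name, ratio in enrichment.items():
    print(f"{name:13s} {ratio:.2f}x nucleoplasm")
```

By this measure, total protein is enriched only about 1.3- to 2-fold in these bodies. That is a statement about total protein; the ~10–100-fold figure cited above refers to how strongly phase-separating biomacromolecules can be concentrated within droplets relative to the dilute phase.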

Importantly, recent studies revealed that PMLOs are not homogeneous themselves. In fact, SGs were shown to be characterized by a heterogeneous structure, where the core was more densely packed and less easily accessible than the more diffused shell with easier exchange of the constituents between the SGs and the adjacent cytoplasm [59]. Because the components of a dense core are brought together at early stages of the SG assembly, whereas a diffused shell of these PMLOs is formed at later steps, these different SG phases are kinetically formed at different stages of the SG assembly [59]. Furthermore, using a combination of various in vivo and in vitro approaches with computational modeling, it was recently shown that one of the most studied PMLOs, the nucleolus, possesses layered droplet organization containing internal sub-compartments [60]. These sub-compartments were shown to represent distinct, coexisting, non-coalescing liquid phases formed by LLPTs of specific nucleolar proteins, suggesting that biological phase separation can generate multilayered liquids [60].

PMLOs are crucial for cellular functionality and are now considered as key organizers and regulators of many cellular processes [11]. Since multiple cellular components are concentrated within the PMLOs, they regulate a broad cohort of cellular processes ranging from transcription to translational repression, to RNP assembly and processing, to biogenesis of ribosomes, to transport and degradation of mRNA, and to intracellular signaling [15]. Because the LLPTs causing PMLOs are fast under normal physiological conditions, and because the PMLO components are concentrated in a dynamic, selective, and reversible manner, such intracellular bodies are well suited for processing of local information and for handling rapid and controllable responses to environmental alterations, indicating that at least some PMLOs can serve as dynamic sensors of localized signals [61].

Normally, the highly dynamic structure and composition of polyfunctional and multicomponent PMLOs allow them to provide finely tuned regulation of various intracellular processes. On the other hand, as with many other protein intrinsic disorder-based events and activities [62], even the slightest disruption of the activity of PMLOs and their biogenesis can lead to an imbalance of intracellular regulatory pathways, resulting in the development of various pathological conditions [40,63–73]. For example, although in their normal state, the majority of PMLOs (including SGs) possess liquid-like properties, their aging can promote development of a much less dynamic state that typically coincides with the appearance of fibrous structures [74]. Such aging-related alterations in the mechanical and the physical properties of PMLOs can be of biological and pathological significance [74]. For example, it was shown that the time-dependent changes in the dense core of aging SGs can promote formation of insoluble protein aggregates linked to neurodegenerative diseases [70,75].

#### **2. Proteinaceous Membrane-Less Organelles, Liquid–Liquid Phase Transitions, and Intrinsic Disorder**

The facts presented in the previous section indicate that, typically, specific sets of resident proteins can be found in PMLOs. Among the characteristic properties uniting many of these PMLO-residing proteins is the presence of high intrinsic disorder levels, suggesting the overall importance of intrinsically disordered proteins (IDPs) or hybrid proteins with ordered domains and intrinsically disordered protein regions (IDPRs) for LLPTs and PMLOs [4,5,8,45,74,76–84]. In fact, the biogenesis of several PMLOs (e.g., nuages [57], nucleolus [85], P-granules [80], and RNA granules [74]) was shown to be critically dependent on IDPs/IDPRs. This is because the LLPTs driving the PMLO formation are determined by weak multivalent interactions between multi-domain proteins and/or IDPs, hybrid proteins with ordered domains and IDPRs [4,5,86], proteins with RNA-binding domains [87], proteins containing repeats of amino acids with polar and charged groups, or proteins with low complexity domains (LCDs) [5,9,88].

There are multiple reasons why IDPs/IDPRs serve as the most appropriate candidates for biological LLPTs leading to PMLO formation. These reasons include: the overall high abundance of IDPs/IDPRs in various proteomes [89–93] {e.g., among the eukaryotic proteins, ~25–30% are mostly disordered [91], long IDPRs (longer than 30 residues) are found in more than half of eukaryotic proteins [89–91], whereas such long IDPRs are present in >70% of signaling proteins [94]}; their lack of fixed structure [95–100]; their high spatio-temporal heterogeneity and mosaic structural organization, which constitutes a mix of foldons, inducible foldons, morphing inducible foldons, non-foldons, semi-foldons, and unfoldons [86,100–103]; the ability of these proteins to serve as highly promiscuous binders engaged in a multitude of interactions with highly diversified partners and to thereby regulate and control a wide spectrum of cellular processes [95,97–100,104–108]; and their ability to preserve their mostly disordered status within PMLOs, which defines the fluidity of these organelles and determines PMLOs as supramolecular fuzzy complexes (see below).

Weak, multivalent, and rather non-specific interactions between one or more IDPs/IDPRs and between IDPs/IDPRs and nucleic acids are expected to drive biological LLPTs, leading to the PMLO formation. The physico-chemical nature of these interactions driving phase separation can be highly diversified and range from π–π contacts to cation–π interactions [15], to hydrophobic interactions, and to heterologous and homologous electrostatic attraction between differently charged biological polymers and differently charged parts of the same protein molecules [4,5]. By virtue of the peculiarities of their amino acid sequences and biophysical properties, IDPs/IDPRs are uniquely positioned in the category of biological macromolecules capable of undergoing LLPTs and controlling the biogenesis of PMLOs. For example, the conformational behavior of IDPs/IDPRs is, at least in part, determined by the presence of a large number of charged residues and depletion in hydrophobic residues [95], which explains the mostly electrostatic nature of their interactions [109]. Since IDPs/IDPRs do not possess stable structures, existing in a form of highly dynamic conformational ensembles of rapidly interconverting flexible structures, mean electrostatic fields are created that are used in polyelectrostatic attraction [110]. Furthermore, since charged residues are typically heterogeneously distributed within the amino acid sequences of many IDPs/IDPRs, patches of similarly charged residues are generated, and such "block co-polymer"-like structure might serve as a good template for the electrostatics-driven LLPTs [5]. More generally, common presence in IDPs/IDPRs of arrays of tandem repeats with different physico-chemical properties [111] creates a foundation of the flexible multivalency needed for LLPTs [5]. Also, IDPs/IDPRs are known to be commonly subjected to various post-translational modifications (PTMs) [112,113]. 
As LLPTs can be regulated by PTMs [53], this PTM-controlled conformational and functional variability of IDPs/IDPRs is well suited to the regulation of PMLO biogenesis [5]. Being "edge of chaos" systems [63,101,114,115], IDPs/IDPRs are known for their high sensitivity and responsiveness to (even rather subtle) environmental changes. Because of this environmental sensitivity and receptivity as well as the capability to undergo fast, highly controllable, environment-modulated transitions, IDPs/IDPRs play crucial roles in the regulation of LLPTs and PMLOs [5].
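
The notion of "block co-polymer"-like patches of similarly charged residues can be illustrated with a small calculation. The following is a minimal sketch, not a method taken from the cited works: it assigns nominal charges (K/R = +1, D/E = −1, everything else 0, an assumed simplification that ignores histidine, termini, and PTMs), computes the mean net charge in a sliding window, and reports runs of windows with the same sign. The function names, the window size, the threshold, and the toy sequence are all hypothetical choices made for illustration.

```python
# Illustrative sketch: locate same-sign charge patches in a protein sequence.
# Charge assignments are an assumed simplification (K/R = +1, D/E = -1, else 0).

CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def residue_charges(sequence):
    """Map each residue to a nominal charge at neutral pH."""
    return [CHARGE.get(aa, 0) for aa in sequence.upper()]


def window_net_charge(sequence, window=5):
    """Mean net charge per residue in each sliding window."""
    charges = residue_charges(sequence)
    if window < 1 or window > len(charges):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(charges[i:i + window]) / window
        for i in range(len(charges) - window + 1)
    ]


def charge_patches(sequence, window=5, threshold=0.5):
    """Return (start, end, sign) for runs of consecutive windows whose mean
    net charge reaches the threshold with the same sign.

    start and end are 0-based window start positions (end is inclusive);
    sign is +1 for a positive patch and -1 for a negative patch.
    """
    patches = []
    current_sign, start = 0, None
    profile = window_net_charge(sequence, window)
    for i, value in enumerate(profile):
        sign = 1 if value >= threshold else -1 if value <= -threshold else 0
        if sign != current_sign:
            if current_sign != 0:
                patches.append((start, i - 1, current_sign))
            current_sign, start = sign, i
    if current_sign != 0:
        patches.append((start, len(profile) - 1, current_sign))
    return patches


# Hypothetical toy sequence, not taken from any real protein.
toy = "KRKKRGSGSGDEDEEGSGSGRKRKK"
for start, end, sign in charge_patches(toy):
    print(start, end, "+" if sign > 0 else "-")
```

For the toy sequence, the sketch reports a positive patch, a negative patch, and a second positive patch separated by neutral linkers, which is the kind of alternating-block pattern the text describes as a template for electrostatics-driven LLPTs. A real analysis would need a more careful charge model and a sequence drawn from an actual PMLO-resident protein.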

#### **3. Dysregulation of the Biogenesis of Intracellular Liquid Droplets and Disease**

It was pointed out that, since the local concentrations of proteins in PMLOs are noticeably higher than those in the surrounding crowded media (and, as a result, the interior of PMLOs is considered an overcrowded milieu [4]), and since some amyloidogenic proteins can be found in PMLOs and many of these proteins can undergo LLPS both in vitro and in vivo, dysregulation of the biogenesis of intracellular liquid droplets can be related to various human diseases [63]. This suggests the existence of a definite spatio-temporal window of safe existence, where a given PMLO appears at a definite cell location in response to a definite environmental cue and exists there for a definite amount of time, whereas the pathological conversion from liquid to solid or gel form within the highly concentrated milieu of PMLO might happen outside of this window of safe existence [40,63,64,67,69–72,74,75,79,116–118]. Generally, molecular mechanisms associated with the said pathological transformations are related to the dysregulated biogenesis of PMLOs, eventually leading to the distortions of their dynamics and the promotion of pathological aggregation. Some of these mechanisms include pathological "aging" of PMLOs (or going beyond the safe time window), increased content of proteins involved in LLPTs, aberrant PTMs, some chromosomal translocations, and pathological mutations [73]. Among proteins for which the aberrant LLPTs are associated with pathological aggregation are TAR DNA binding protein-43 (TDP-43) linked to amyotrophic lateral sclerosis (ALS) [116], microtubule-associated protein tau involved in Alzheimer's disease (AD) [117,118], α-synuclein associated with Parkinson's disease (PD) [66], TDP-43, heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) linked to ALS [66], fused in sarcoma (FUS) associated with the pathogenesis of ALS and frontotemporal lobar degeneration (FTLD) [119], prion protein [120], and many RNA-binding IDPs possessing low complexity domains (LCDs) [64]. In other words, pathogenic transformations of PMLOs are often associated with the decreased fuzziness of these intracellular liquid droplets.

#### **4. Supramolecular Fuzziness of Intracellular Liquid Droplets**

An important feature of PMLOs is their fluidity. The liquid-like properties of phase-separated droplets facilitate the functions of their constituents, which are accumulated within droplets at high concentrations and show slowed diffusion but remain dynamic. In fact, the concentrations of proteins residing within these liquid droplets can be ~10–100-fold higher than the protein content of the dilute phase [53,57]. Furthermore, being intrinsically disordered, these PMLO-residing proteins can be engaged in multivalent interactions. These observations raise an important question regarding how fluidity can be preserved within the overcrowded milieu of the PMLO interior. It is likely that one can find an answer to this question by analyzing the structural properties of proteins within PMLOs or artificial phase-separated liquid droplets. In fact, if an IDP/IDPR underwent global folding as a result of an LLPT, then the resulting condensed phase would not be liquid but would contain ordered protein–protein or protein–RNA complexes stabilized by multivalent rigid body–rigid body interactions. Therefore, the fact that PMLOs are liquid indicates that IDPs/IDPRs undergoing LLPTs preserve high levels of intrinsic disorder. Several recent NMR-based studies are in agreement with this hypothesis [121]. For example, the DEAD box protein 4 (DDX4), which is a probable ATP-dependent RNA helicase that serves as a primary constituent of nuage or germ granules [122], was shown by ¹H–¹⁵N HSQC spectroscopy to remain disordered within the droplets [57]. Similarly, the LCD of the fused in sarcoma (FUS) protein, which is associated with two devastating neurodegenerative disorders, amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) [123], remained mostly disordered within the droplet phase [58]. Also, the microtubule-associated protein tau, which is an IDP involved in Alzheimer's disease [124,125] and other tauopathies [126], was shown to undergo LLPT in solution in a phosphorylation-dependent manner and to preserve its disordered state in the condensed phase [117]. Another IDP, BUB3-interacting and GLEBS motif-containing protein (BuGZ), the phase separation of which is involved in spindle matrix formation and function [127], was shown to remain dynamic in the spindle and its matrix [128].

The capability of IDPs/IDPRs to preserve high levels of disorder in their bound states is known as fuzziness, an important phenomenon emphasizing that formation, function, and/or regulation of the protein-based complexes/assemblies are critically dependent on the intrinsic disorder of the constituent proteins [129,130]. Furthermore, it was emphasized that the biological activity of the resulting fuzzy complexes could be affected by fuzzy regions, which not only remained disordered but often escalated their conformational diversity in the bound state [131]. In fact, fuzzy regions are engaged in transient interactions, thereby establishing alternate contacts with specific partners. Flexibility and interactability of such regions can be regulated and controlled by PTMs and alternative splicing [131]. Because of the preservation of high disorder levels, PMLOs and artificial phase-separated liquid droplets represent fuzzy supramolecular complexes.

#### **5. Conclusions**

In summary, data accumulated to date indicate that high levels of intrinsic disorder are found in many PMLO resident proteins and show that the PMLO formation often relies on IDPs/IDPRs, indicating that PMLO biogenesis is crucially dependent on intrinsic disorder [8]. In other words, the lack of stable structure in IDPs/IDPRs and the ability of such proteins to be engaged in highly dynamic, weak, multivalent interactions, combined with their capability to retain a highly mobile character after undergoing LLPTs, define the liquid-like nature of PMLOs [5]. It is likely that the structural resilience of PMLOs and their capability to exist as stable entities in the absence of enclosing membranes, combined with the free exchange of the constituents with the environment, are also defined by the same properties of IDPs/IDPRs [5]. Overall, PMLOs are an enthralling form of disorder-based protein assemblages [4,5,86], which are formed without noticeable structural changes or ordering of their constituent IDPs/IDPRs when undergoing LLPTs, and which, as a result, are characterized by a highly dynamic nature defining their liquid-like appearance [57]. Thus, supramolecular fuzziness is crucial for many aspects of PMLO biogenesis, stability, and functionality.

**Author Contributions:** Conceptualization, validation, formal analysis, investigation, writing, V.N.U.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


**Sample Availability:** Not available.

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
