2.1.1. Linguistic Constraint

A linguistic **constraint** is a **relation that puts together two or more linguistic elements** such as *linguistic categories* or *parts-of-speech*. Formally, a linguistic constraint is an *n*-tuple ⟨*A1*, ..., *An*⟩ where the *Ai* are linguistic categories. We usually have *n* = 2. For example, the following linguistic categories can be distinguished for this work:


There are four types of constraints in the Fuzzy Property Grammars (FPGr):


The constraints from FPGr that we will work with to describe linguistic universality and complexity are the following (the *A* and *B* are understood as linguistic categories):


2.1.2. Definition of a Fuzzy Property Grammar

**Definition 1.** *A Fuzzy Property Grammar (FPGr) is a couple*

$$FPGr = \langle \mathcal{U}, FGr \rangle \tag{1}$$

*where U is a universe*

$$
\mathcal{U} = \text{Ph}\_{\rho} \times \text{Mr}\_{\mu} \times \text{X}\_{\chi} \times \text{S}\_{\delta} \times \text{L}\_{\theta} \times \text{Pr}\_{\sigma} \times \text{Ps}\_{\kappa} \,. \tag{2}
$$

*The subscripts ρ*, ... , *κ denote types, and the sets in Equation (2) are sets of the following constraints:*


*The second component is a function:*

$$FGr: \mathcal{U} \to [0, 1] \tag{3}$$

*which can be obtained as a composition of functions Fρ* : *Phρ* → [0, 1]*, ..., Fκ* : *Psκ* → [0, 1]*. Each of the latter functions characterizes the degree to which the corresponding element x belongs to each of the above linguistic domains (with respect to a specific grammar).*

Technically speaking, *FGr* in Equation (3) is a fuzzy set with the membership function computed as follows:

$$FGr(\langle \mathbf{x}\_{\rho}, \mathbf{x}\_{\mu}, \dots, \mathbf{x}\_{\kappa} \rangle) = \min \{ F\_{\rho}(\mathbf{x}\_{\rho}), F\_{\mu}(\mathbf{x}\_{\mu}), \dots, F\_{\kappa}(\mathbf{x}\_{\kappa}) \} \tag{4}$$

where ⟨*xρ*, *xμ*, ..., *xκ*⟩ ∈ *U*.

Let us now consider a set of constraints from an external linguistic input *D* = {*d* | *d* is a dialect constraint}. Each *d* ∈ *D* can be seen as an *n*-tuple *d* = ⟨*dρ*, *dμ*, ... , *dκ*⟩. Then, the membership degree *FGr*(*d*) ∈ [0, 1] is a degree of grammaticality of the given utterance in an arbitrary dialect (of the given grammar).
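As a minimal sketch of Equation (4) (an illustrative stand-in, not the authors' implementation), the grammaticality degree of a constraint tuple can be computed by composing per-domain membership functions and taking their minimum; the two domain functions and all the degrees below are made up for illustration:

```python
# Sketch of Equation (4): the grammaticality degree of a constraint tuple
# is the minimum of its per-domain membership degrees. The two domain
# functions and their degrees are hypothetical, not from the paper.

def make_fgr(domain_functions):
    """Compose F_rho, ..., F_kappa into a single membership function FGr."""
    def fgr(x):
        return min(f(xi) for f, xi in zip(domain_functions, x))
    return fgr

# Toy per-domain membership functions (made-up constraints and degrees):
def f_syntax(c):
    return {"NOUN[nsubj] < VERB[root]": 0.9}.get(c, 0.0)

def f_prosody(c):
    return {"rising-contour": 0.6}.get(c, 0.0)

fgr = make_fgr([f_syntax, f_prosody])
degree = fgr(("NOUN[nsubj] < VERB[root]", "rising-contour"))  # min(0.9, 0.6) = 0.6
```

Any component with degree 0 (an unknown constraint) drags the whole tuple to 0, which is the intended behavior of the min-composition.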

#### *2.2. Fuzzy Property Grammars for Linguistic Universality*

To take linguistic universality into account, we must add the following considerations to the previous definitions.

We constrain the universe of our FPGr to the syntactic domain *Xχ* only. At this point, it is only possible to generate all the possible constraints for the syntactic domain. However, we assume that this formulation is a proof of concept for future work on the remaining domains.

Therefore, the universe of *U*-*FPGr* will be understood as a restriction of the one shown in Equation (2).

**Definition 2.** *A Universal Fuzzy Property Grammar (U-FPGr ) is a couple*

$$
\mathcal{U}\text{-}FPGr = \langle \mathcal{U}, FGr \rangle \tag{5}
$$

However, its (linguistic) universe in written language stands for a simplified version of that of an *FPGr* = ⟨*U*, *FGr*⟩: only the syntactic domain ⟨*x*⟩ is relevant for the proof of concept presented in this work, and the other domains are neglected.

**Definition 3.** *The universe of a U-FPGr is*

$$
\mathcal{U} = X\_{\chi}\,. \tag{6}
$$

In this case, *U* is generated as a Cartesian product of all the possible constraints.


$$\mathcal{U} = \operatorname{Pos}\_{\alpha} \times \operatorname{Dep}\_{\beta} \times X\_{\chi} \times \operatorname{Pos}\_{\alpha} \times \operatorname{Dep}\_{\beta} \times \operatorname{Pos}\_{\alpha} \times \operatorname{Dep}\_{\beta}.\tag{7}$$

The subscripts *α*, ... , *χ* denote types, and the sets in Equation (7) are sets of the following constraints:


From the linguistic point of view, each combination *Posα* × *Depβ* is interpreted as a linguistic element, such as a noun with a subject dependency (*NOUN*[*nsubj*]), a determiner with a determiner dependency (*DET*[*det*]), or a verb as the root of the sentence (*VERB*[*root*]). By repeating this combination three times, we assume that every rule follows a linguistic constituent: a linguistic element (category and dependency) related, in terms of syntactic linguistic constraints, to a second element (category and dependency) and to a third element (category and dependency). Because some constraints do not need this third element, we also include in our universe the possibility of a rule without it.

Any language to be computed in terms of linguistic universality will need to follow this formalism to describe its universe. The targeted language will be our linguistic input *L* = {*l* | *l* is a language constraint}. Each *l* ∈ *L* can be seen as an *n*-tuple *l* = ⟨*lα*, *lβ*, ... , *lχ*⟩. Then, the membership degree *FGr*(*l*) ∈ [0, 1] is a degree of universality of the given language seen as a set. As can be seen, this is just an adaptation of how FPGr treats grammaticality: the universality of a targeted language is computed as the membership degree of that language's set of constraints with respect to *U*-*FPGr*. Our gradient model therefore suggests the convenience of a terminological change. We consider that it is not necessary to frame our proposal as a "search for universals" task; on the contrary, what we intend is to define a "spectrum of the universal", that is, any linguistic rule can fit a membership degree of universality in [0, 1].

Additionally, we have implemented an *IF* − *THEN* rule to assign a weight value to each rule of the *U*-*FPGr* .


This is quite a natural way of representing universality, since our knowledge of the universals depends on the system of languages that we know. A rule that might be considered universal can become a *quasi-universal* the moment new languages are discovered that do not follow it. Therefore, we always compute universality in terms of a finite representative set out of the infinite set of languages. In this case, *U*-*FPGr* is flexible and re-usable, since it can update the weight of universality according to any new language inserted as a linguistic input.
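The update mechanism can be sketched as follows (a hypothetical illustration of the idea, not the authors' code): the universality weight of a rule is the fraction of languages in the current representative set that satisfy it, so inserting a new language simply recomputes the weight:

```python
# Illustrative sketch (not the authors' implementation): the universality
# weight of a rule is the fraction of languages in the representative set
# that satisfy it; all rule strings below are made up.

def universality_weight(rule, languages):
    """languages: mapping language name -> set of rules induced for it."""
    satisfied = sum(1 for rules in languages.values() if rule in rules)
    return satisfied / len(languages)

languages = {
    "Arabic": {"VERB[root] > NOUN[nsubj]"},
    "English": {"NOUN[nsubj] < VERB[root]"},
    "Spanish": {"NOUN[nsubj] < VERB[root]"},
}
w = universality_weight("NOUN[nsubj] < VERB[root]", languages)  # 2/3

# A rule weakens toward quasi-universal when a newly documented language
# that lacks it is inserted into the set:
languages["Basque"] = set()
w_updated = universality_weight("NOUN[nsubj] < VERB[root]", languages)  # 2/4
```

This is what makes the grammar re-usable: no rule is ever deleted; only its weight moves along [0, 1] as the representative set grows.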

#### *2.3. Fuzzy Natural Logic Computing Universals and Linguistic Complexity with Words*

In order to better grasp gradient terminology as it relates to linguistic universals and complexity, we propose to compute the continuum with natural language words. For this, the concepts of *universality* and *complexity* are assumed.

Fuzzy natural logic is based on six fundamental concepts: the concept of *fuzzy set*, *Lakoff's universal meaning hypothesis*, the *evaluative expressions*, the concept of *possible world*, and the concepts of *intension* and *extension*. The most remarkable aspect of this work is the theory of *evaluative linguistic expressions*.

An evaluative linguistic expression is defined as an expression used by speakers when they want to refer to the characteristics of objects or their parts [37,38,40–44] such as *length*, *age*, *depth*, *thickness*, *beauty*, and *kindness*, among others. In this case, we will take into account "*universality*" and "*complexity*" as evaluative expressions.

FNL assumes that the simple evaluative linguistic expression has the general form:

$$
\langle \text{linguistic hedge} \rangle \langle \text{TE-head} \rangle \tag{8}
$$

*TE-heads* can be grouped together to form a *fundamental evaluative trichotomy* consisting of two antonyms and a middle term, for example, *good*, *normal*, *bad*. For our work, we will take into account the trichotomy *low*, *medium*, *high*. In this sense, as proposed in [45], the membership scale of universality in linguistic rules recognizes:
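One possible realization of this trichotomy over the universality scale [0, 1] is a set of trapezoidal membership functions with an even three-way split; the cut-offs and transition widths below are assumptions for illustration, not parameters taken from [45]:

```python
# Hedged sketch of a low / medium / high evaluative trichotomy over the
# universality scale [0, 1], with an even three-way split and linear
# transitions. Cut-offs and overlap width are illustrative assumptions.

def trichotomy_memberships(x, overlap=0.1):
    """Return fuzzy membership degrees of x in (low, medium, high)."""
    def trapezoid(x, a, b, c, d):
        # 0 outside (a, d), 1 on [b, c], linear ramps in between.
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    low = trapezoid(x, -1.0, 0.0, 1/3 - overlap, 1/3 + overlap)
    medium = trapezoid(x, 1/3 - overlap, 1/3 + overlap,
                       2/3 - overlap, 2/3 + overlap)
    high = trapezoid(x, 2/3 - overlap, 2/3 + overlap, 1.0, 2.0)
    return low, medium, high
```

In the overlap zones a weight belongs partially to two adjacent sets, which is exactly the fuzzy transition the trichotomy is meant to capture.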


The value of complexity is obtained from *IF* − *THEN* rules such as:

**Definition 4.** *We characterize fuzzy IF* − *THEN rules for complexity as follows:*


*Similarly, we can express:*


• *IF the value of complexity is* low, *THEN the rule is highly universal.*

The membership scale of complexity in linguistic rules is [45]:


• *High Complexity*. Linguistic rules that have a *low* truth value in terms of weight in *U*-*FPGr*: rules satisfied in almost none of the languages.

A *possible world* is defined as a specific context in which a linguistic expression is used. In the case of evaluative expressions, it is characterized by a triple *w* = ⟨*vL*, *vS*, *vR*⟩. Without loss of generality, it can be defined by three real numbers *vL*, *vS*, *vR* ∈ R where *vL* < *vS* < *vR*.

*Intension and extension*: Our intension will simply be the membership degree in [0, 1], while our extension will depend on the number of languages taken into account in a representative set for evaluating universality and complexity.

Figure 1 represents how Fuzzy Natural Logic accounts for the fuzzy-gradient notion of universality in fuzzy sets. The fuzzy limits between sets must be established. In terms of mathematical fairness rather than from a cognitive perspective, the possible world of 7000 languages has been divided into three parts, one for each fuzzy set; roughly, each set thus covers about 2333 language grammars. The proposed cut-off could be changed; however, we consider that there would not be a big difference between the perceived fuzzy transitions and the three-cut criterion. We claim that the concept of universality is better captured with a trichotomous expression of *small* − *medium* − *big*, here in terms of *low* − *medium* − *high*. This new way of accounting for universals may have advantages over the classical nomenclature found in the literature [29,46,47] (*universal trend*, *statistical universal*, *rara*, *rarissima*, *typological generalization*, etc.).

**Figure 1.** Linguistic Universality as an Evaluative Expression.

The advantages of the proposed model can be summarized as follows:


The proposed model aims to collect the work already done in linguistics and present a universal characterization for the description of fuzzy linguistic universals and linguistic complexity.

#### **3. Materials and Methods**

#### *3.1. A Fuzzy Universal Grammar with a Representative Set*

Of the 7000 languages in the world (an oscillating and debatable number), a large number still lack adequate documentation. Therefore, when one wants to predict possible trends in the set of human languages as a whole, one has to investigate a selection of languages, hoping that the results will extend to the rest. This selection of languages is what, in FNL, we will consider our extension regarding the possible world of the evaluative expressions of linguistic *"universality"* and *"complexity"*. To this end, creating a representative and balanced set is essential. However, this task is by no means easy, as there are many other limitations [29,50,51].

For this reason, linguistic typology has classically proposed different ways of configuring a set that is as varied and independent as possible in order to be as close as possible to the reality of the 7000 languages. The selection of this independence between languages can be based on different criteria: typological, genetic, areal or a combination of them. However, it is still very difficult to find perfect samples due to what is known as bibliographical bias: the data available to us are very limited.

The representative set is built from linguistic corpus data. Such data allow us to create sets of languages. Working with a linguistic corpus helps us obtain deeper and more quantitative knowledge of cross-linguistic tendencies [52,53]. The problem with this methodology is that the available data are still very limited given their novelty and level of depth, especially in comparison with resources based on manual notes such as the World Atlas of Language Structures (WALS) [48,50]. Therefore, in order to reduce as much as possible the bibliographic bias of the languages present in Universal Dependencies [54], we have opted for a typological balance.

To create our set, we have taken into consideration three basic typological requirements that influence many other grammatical aspects in languages and their behavior [55]:


We have managed to find a good balance on points (2) and (3); however, this has not been the case with the first aspect, since it is very uncommon for the verb to precede the subject as an unmarked order. Therefore, in the representative set, its presence is also lower. We respect the proportion seen in WALS of a one-tenth part. However, it should be noted that the ascription to a particular typological order is a convenient discrete simplification [56,57].

Subsequently, we have also tried to consider the following aspects of languages to set a useful representative set:


After setting all these requirements, primary and secondary, we decided to use the data from the Universal Dependencies corpora [58]. This data source was chosen, firstly, because it annotates many different languages by part-of-speech, constituents, and dependencies and, secondly, because it is the only formalism to which MarsaGram [59] can be applied to automatically induce sets of syntactic constraints, which can then be matched against our *U*-*FPGr*. After looking at the possibilities offered by Universal Dependencies, the established set consists of the following languages:


Our extension will have a value of 9, and the sets *low*, *medium*, *high* will range as shown in Figure 2:

**Figure 2.** Linguistic Universality of the representative set as an Evaluative Expression.
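Under one assumed reading of Figure 2 (the exact cut-offs between the three sets are an assumption here), a rule's universality weight is its satisfied-language count divided by the extension of 9, and *low*, *medium*, *high* split the counts into thirds:

```python
# Assumed reading of Figure 2: with an extension of 9 languages, a rule's
# universality weight is n/9, and the low / medium / high sets split the
# counts as 0-3 / 4-6 / 7-9. These cut-offs are illustrative assumptions.

def universality_label(n_satisfied, extension=9):
    """Return (weight, crisp label) for a rule satisfied in n languages."""
    weight = n_satisfied / extension
    if n_satisfied <= 3:
        return weight, "low"
    if n_satisfied <= 6:
        return weight, "medium"
    return weight, "high"
```

A fuzzy version would replace the crisp thresholds with the trapezoidal memberships of the trichotomy, but the crisp labels suffice to read the figure.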


Additionally, we have implemented an *IF* − *THEN* rule to assign a weight value to each rule of the *U*-*FPGr*.


As we have mentioned, we are aware that we cannot completely avoid bibliographical bias; surely, a language representative of an unrepresented area, or another language whose verb precedes the subject, should be added. However, the model proposed here allows us to enrich the set once Universal Dependencies offers such data in the future.

#### *3.2. Application of the Tasks to Computationally Build a Universal Fuzzy Property Grammar*

We have downloaded the Universal Dependencies corpora for each language in our representative set [58], and we have applied MarsaGram [59]. Universal Dependencies provides us with the dependency constraints between constituents, and MarsaGram automatically induces the constraints of *linearity*, *co-occurrence*, *exclusion*, and *uniqueness* over a Universal Dependencies corpus.

MarsaGram already provides quantitative data; however, from it alone it is impossible to know which rules coincide across languages in a *U*-*FPGr*. Therefore, the interpretations obtained at this stage relate more to the notion of complexity than to the notion of universality.

Marsagram presents data and rules in the following way:

Figure 3 is an extract from the MarsaGram output for the Arabic corpus. The rule means that a verb as root excludes an adjective as advcl next to an ADJ as c-sub. Because we are not interested in the other numbers, we will clean the data, erasing such noise. Therefore, to check the coincidences, we will only keep the elements in #headproperty, symbol1, and symbol2.

**Figure 3.** Rule of the corpus of Arabic in MarsaGram.
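The cleaning step can be sketched as follows; since the exact MarsaGram export schema is not reproduced here, the field layout below (three rule fields plus a frequency figure) is an assumed stand-in:

```python
# Hypothetical cleaning of a MarsaGram rule row: keep only the fields
# headproperty, symbol1, symbol2 and drop the numeric noise. The column
# layout and the values are assumed stand-ins for the real export format.

def clean_rule(row):
    """row: mapping parsed from one MarsaGram export line (assumed schema)."""
    return (row["headproperty"], row["symbol1"], row["symbol2"])

raw = {
    "headproperty": "VERB-root:exclude",  # made-up encoding of the rule head
    "symbol1": "ADJ-advcl",
    "symbol2": "ADJ-c-sub",
    "frequency": 1342,                    # noise we discard
}
rule_key = clean_rule(raw)
```

Reducing each row to this triple is what makes rules from different languages directly comparable when matching coincidences against the *U*-*FPGr*.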

#### 3.2.1. Building the Universal Fuzzy Property Grammar

To build the Universal Fuzzy Property Grammar (*U*-*FPGr*), we applied Equation (7). Table 1 represents this construction. To clarify, we take into account all the categories or parts-of-speech (POS) for all languages according to the Universal Dependencies tagging, which gives 17 POS elements. We consider the 64 dependencies present in the whole Universal Dependencies system, and the four constraint types. We combine this with two contextual linguistic elements, so POS-dep is repeated twice; we thereby obtain 4,242,536,496 rules. We repeat the same process considering the possibility that a rule needs only one contextual element (POS-dep–properties–POS-dep), obtaining a further 4,734,976 rules. After summing both outputs in terms of linguistic constraints, we obtain a *U*-*FPGr* with 4,247,273,472 rules belonging to the syntactic domain.


**Table 1.** Representation of the elements involved in the production of a *U*-*FPGr* for syntactic constraints.
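The counting above can be sketched with a quick back-of-the-envelope computation, assuming the 17 POS tags, 64 dependency relations, and four constraint types; the single-context figure follows directly from the naive product, while the reported two-context figure evidently reflects an enumeration with additional restrictions, so the naive product is shown only for comparison:

```python
# Back-of-the-envelope sketch of the universe size in Table 1, assuming
# 17 POS tags, 64 dependency relations, and 4 syntactic constraint types.
# Each contextual element is a POS-dep pair.

N_POS, N_DEP, N_CONSTRAINTS = 17, 64, 4
pairs = N_POS * N_DEP  # 1088 distinct POS-dep elements

# Rules with one contextual element: POS-dep x constraint x POS-dep.
one_context = N_CONSTRAINTS * pairs ** 2  # 4,734,976, matching the text

# Rules with two contextual elements add one more POS-dep factor. The
# paper reports 4,242,536,496 here, which implies an enumeration slightly
# different from this unrestricted product:
two_context_naive = N_CONSTRAINTS * pairs ** 3
```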

The technical summary is the following:


The MongoDB database is not strictly necessary, but it solves many problems compared with storing the universal grammar in a plain text file. Such a text file would be huge and would affect the time complexity of searching for grammar rules.

3.2.2. Preparing Languages for the Universal Fuzzy Property Grammar

As shown in Figure 3, the data of each language set had to be cleaned and prepared before checking coincidences. Therefore, we have followed these steps:

	- (a) Load all possible files with language grammar rules.
	- (b) Preprocessing (for example: remove empty spaces, replace \*, . . . ).

Finally, we have applied the weights to each rule, so we can measure its universality and complexity:

	- (1) Send a query to the MongoDB database to search for the rule in the Universal Grammar.
	- (2) If the rule is found in the Universal Grammar, we insert a new row into the Pandas DataFrame, where the Universal Grammar column holds the current rule, the column for the current language holds 1, and the columns of the other languages hold 0.
	- (3) If the rule is not found in the Universal Grammar, we insert the Universal Grammar rule with 0 for all language columns.
	- (4) Then, we can compute the totals for each rule and put them into the Total column.
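Steps (1)–(4) can be sketched as follows; to keep the example self-contained, a Python set stands in for the MongoDB collection and a list of row dictionaries stands in for the Pandas DataFrame (all rule strings and column names are illustrative):

```python
# Self-contained stand-in for steps (1)-(4): a set replaces the MongoDB
# lookup and a list of dicts replaces the Pandas DataFrame rows.

def score_rules(universal_grammar, language_rules, languages):
    """universal_grammar: set of U-FPGr rules; language_rules: mapping
    language -> set of rules induced for it by MarsaGram."""
    rows = []
    for rule in sorted(universal_grammar):
        row = {"rule": rule}
        for lang in languages:
            # 1 if this language satisfies the rule, 0 otherwise.
            row[lang] = int(rule in language_rules.get(lang, set()))
        row["Total"] = sum(row[lang] for lang in languages)
        rows.append(row)
    return rows

ug = {"VERB-conj:exclude ADV ADV", "NOUN-nsubj:linearity VERB-root"}
induced = {"Arabic": {"VERB-conj:exclude ADV ADV"},
           "English": {"VERB-conj:exclude ADV ADV",
                       "NOUN-nsubj:linearity VERB-root"}}
table = score_rules(ug, induced, ["Arabic", "English"])
```

Dividing a row's Total by the number of language columns then yields the universality weight of that rule.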

Table 2 is a visualization of the output, in this case, of the verb with the conjunction dependency excluding two elements next to each other. We can observe how the final weights vary from one rule to another. It is also clear how robust and flexible the system is: we would only need to add another column for any other language set we wished to include to make our final universality weight more representative.


**Table 2.** Example of output of rules with universality weight.

To evaluate complexity, it is only necessary to take the complement of the value (apply 1 − *x*), since universality and complexity work as opposites in terms of *low*, *medium*, *high*. Therefore, if a rule's universality weighs 0.7, its complexity is 0.3; if its universality is 0.4, its complexity is 0.6, and so on.
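As a one-line illustration of this complement relation (trivial, but it makes the mapping explicit):

```python
# Complexity as the complement of universality on the [0, 1] scale.

def complexity(universality):
    return 1.0 - universality
```

For example, `complexity(0.7)` gives 0.3 and `complexity(0.4)` gives 0.6, up to floating-point rounding.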
