*Article* **The Effects of Weighting Functions on the Performances of Robust Control Systems** †

## **Mircea Dulau and Stelian-Emilian Oltean \***

Department of Electrical Engineering and Information Technology, George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Targu Mures, Gh. Marinescu Street, no. 38, 540139 Targu Mures, Romania; mircea.dulau@umfst.ro


Published: 25 December 2020

**Abstract:** An important stage in robust control design is to define the desired performances of the closed loop control system using the models of the frequency sensitivity functions S. If the frequency sensitivity functions remain within the limits imposed by these models, the control performances are met. In terms of the sensitivity functions, the specifications include: shape of S over selected frequency ranges, peak magnitude of S, bandwidth frequency, and tracking error at selected frequencies. In this context, this paper presents a study of the effects of the specifications of the weighting functions on the performances of robust control systems.

**Keywords:** H-infinity synthesis; robust control; robust performances; sensitivity functions; weighting functions

## **1. Introduction**

In general, the design objectives of any control system are defined using different models which implement the desired responses to a specified reference. Thus, the closed-loop control system becomes stable, achieves the imposed performances, rejects disturbances and measurement noise, and avoids actuator saturation, even in the presence of modeling uncertainties or changes in the operating point [1–5].

In the H-infinity synthesis, the objectives refer to the optimization of the H-infinity norm of the closed-loop system, considering all the external input variables (references, disturbances, noises) and all the output variables according to the block diagram from Figure 1, where: *Hp*(*s*) and *HR*(*s*) are the plant and controller transfer functions, *r*—reference input, *e*—control error, *yr*—feedback signal, *y*—the plant output, *u*—control signal, *v*, *l*—disturbances, and η—measurement noise.

**Figure 1.** Block diagram of the closed-loop system.

*Proceedings* **2020**, *63*, 46

The input–output behavior of the system is characterized by the energy transfer from the external variables *r*, *v*, η, *l* to the output variable *y* (and sometimes to the control variable *u*). Considering the relation *Hd*(*s*) = *HR*(*s*)*Hp*(*s*), there are four important transfer functions that fully describe the system: the sensitivity *S* = 1/(1 + *Hd*), the complementary sensitivity *T* = *Hd*/(1 + *Hd*), the control (noise) sensitivity *RS* = *HR*/(1 + *Hd*), and the load sensitivity *SHp* = *Hp*/(1 + *Hd*).


If *l* = 0, the relations between *y* and *r*, *v*, η, respectively, between *e* and *r*, *v*, η are given by:

$$y = \frac{H_d(s)}{1 + H_d(s)} \cdot r - \frac{H_d(s)}{1 + H_d(s)} \cdot \eta + \frac{1}{1 + H_d(s)} \cdot v = T \cdot r - T \cdot \eta + S \cdot v,\tag{1}$$

$$e = \frac{1}{1 + H_d(s)} \cdot r - \frac{1}{1 + H_d(s)} \cdot v + \frac{H_d(s)}{1 + H_d(s)} \cdot \eta = S \cdot r - S \cdot v + T \cdot \eta. \tag{2}$$

The sensitivity function *S* describes the input–output behavior from the input *v* to the output *y* when the other inputs are *r* = 0, η = 0, *l* = 0. In (1), if η = 0, then *T* = 1 and *S* = 0 yield perfect reference tracking and disturbance rejection.
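These relations can be checked numerically. The sketch below uses the case-study plant from Section 4 and a hypothetical proportional controller (the gain value is an assumption for illustration only); it evaluates *S* and *T* on a frequency grid and confirms the identity *S* + *T* = 1:

```python
import numpy as np

# Illustrative open-loop transfer function Hd(s) = HR(s)*Hp(s);
# the controller gain is an assumption for demonstration only.
def Hd(s):
    HR = 5.0                              # hypothetical proportional controller
    Hp = 2.0 / (s**2 + 0.05*s + 0.2)      # second-order plant (case study)
    return HR * Hp

w = np.logspace(-2, 2, 500)               # frequency grid [rad/s]
s = 1j * w
S = 1.0 / (1.0 + Hd(s))                   # sensitivity
T = Hd(s) / (1.0 + Hd(s))                 # complementary sensitivity

# S + T = 1 must hold at every frequency;
# at low frequency |S| is small (good tracking) and |T| is close to 1.
print(abs(S[0]), abs(T[0]))
```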

In the block diagram from Figure 1, it is often necessary to include some weighting cost functions, chosen to reflect the design objectives and information about noise and disturbance [5]. The modified block diagram including these weighting functions is presented in Figure 2.

**Figure 2.** Block diagram of the closed-loop system including the weighting functions.

Although there are some recommendations for choosing the weighting functions, the choice depends on the designer's skill and involves several iterations until a final form is achieved which guarantees the control performances imposed on the closed-loop system.

The paper [6] presents the robust analysis of a positioning control system where the weighting-functions-based tuning method simplifies the H-infinity design procedure. In [7], the μ-synthesis robust design method is used for a multi-model control problem. The selection of the weighting functions is made for low, medium and high frequencies. The studies from [8] present the disadvantages of the H-infinity design method. At the same time, the authors chose the weighting functions for the μ-synthesis of a Proportional-Integral (PI) controller in order to improve the performances of the robust system.

A theoretical guide for choosing the weighting functions and the design procedure which assures the system gains are developed in the paper [1].

The paper [9] is focused on determining the weighting functions under two aspects: initial selection and tuning procedure which improves the performances of the closed-loop system. An interesting procedure for choosing the weighting functions for the optimal H-infinity design formulated as an optimization problem is presented in [10]. The paper [11] contains the synthesis issue for a nominal controller with unstable weighting functions. The authors proposed a simplification of the robust controller design procedure.

The papers [2,12,13] present some techniques for choosing the weighting functions, reducing the order of the transfer functions and designing the robust controllers using the Matlab Toolbox. Robust control methods are developed in [3,4] for both linear and nonlinear systems, and approaches of robust control theory based on weighting functions are also addressed in [5].

In [14], the authors proposed a weighting function modeling method which is used in H-infinity loop-shaping design and tested in numerical simulation. Other methodologies for selecting the sensitivity functions are shown in [15]. For multiple-input multiple-output (MIMO) systems, an application of choosing reduced-order weighting functions is developed in [16].

The paper presents in Sections 2 and 3 the models for choosing the weighting functions in accordance with the robust theory, a short analysis of two of these models (*S* and *RS*) and the influence of some parameters (magnitude, bandwidth frequency, tracking error). Section 4 contains the analysis of the performances of the closed-loop control system which depend on the parameters of the weighting functions. The conclusions are highlighted in Section 5.

## **2. Recommendations for Choosing the Models for the Weighting Functions**

The robustness performance requirements depend on the sensitivity functions, whose specifications are included in the frequency behavior models. If these sensitivity functions remain inside the imposed limits, the robustness objectives are met [2–5].

So, for a standard second order model, the sensitivity function depends on the damping ratio and natural frequency according to Figure 3a and relation:

$$S(s) = \frac{s^2 + 2\xi\omega_n s}{s^2 + 2\xi\omega_n s + \omega_n^2}. \tag{3}$$

**Figure 3.** Damping ratio effect on: (**a**) sensitivity function; (**b**) magnitude (peak sensitivity).

On the other hand, the magnitude *MS* depends on the damping ratio, according to the relation:

$$M_S := \|S\|_{\infty} = \frac{x\sqrt{x^2 + 4\xi^2}}{\sqrt{4x^2\xi^2 + \left(1 - x^2\right)^2}}, \quad x = \sqrt{0.5 + 0.5\sqrt{1 + 8\xi^2}}. \tag{4}$$
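Relation (4) can be cross-checked against a frequency sweep of (3). In the sketch below, the damping ratio ξ = 0.2 is an arbitrary illustrative value:

```python
import numpy as np

xi, wn = 0.2, 1.0   # damping ratio (illustrative value) and natural frequency

# Peak sensitivity from relation (4)
x = np.sqrt(0.5 + 0.5*np.sqrt(1 + 8*xi**2))
Ms = x*np.sqrt(x**2 + 4*xi**2) / np.sqrt(4*x**2*xi**2 + (1 - x**2)**2)

# Numerical peak of |S(jw)| from relation (3)
w = np.linspace(0.01, 10, 200000)
s = 1j * w
S = (s**2 + 2*xi*wn*s) / (s**2 + 2*xi*wn*s + wn**2)
Ms_num = np.max(np.abs(S))

print(Ms, Ms_num)   # the analytic and numerical peaks should agree
```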

As a result, the performance specifications can be given by:

$$\left|S(s)\right| \le \left|\frac{s}{s/M_S + \omega_{bs}}\right|, \; s = j\omega, \; \forall\omega, \text{ where } \omega_{bs} \text{ is the bandwidth.} \tag{5}$$

In the ideal case, relation |*WeS*| ≤ 1 provides the reference tracking to a step input signal (and a zero steady state control error), meaning:

$$W_e \le \frac{s/M_S + \omega_{bs}}{s}. \tag{6}$$

Important for practical situations is to have a steady-state error less than an imposed value (|*S*(0)| ≤ ε*S*). Thus, it is sufficient to choose |*We*(0)| ≥ 1/ε*S* (ε*S* is the tracking error), which can be achieved by correcting Function (6) with the modified form:

$$W_e(s) = \frac{s/M_S + \omega_{bs}}{s + \omega_{bs}\varepsilon_S}. \tag{7}$$

A proper design in terms of the sensitivity function is obtained if both conditions imposed to ω*bs* and *MS* are satisfied according to the relations:

$$\|W_e(j\omega)S(j\omega)\|_{\infty} \le 1, \quad |S(j\omega)| \le \frac{1}{|W_e(j\omega)|}, \tag{8}$$

where the upper limit (Figure 4a) is:

$$\frac{1}{W_e(s)} = \frac{s + \omega_{bs}\varepsilon_S}{s/M_S + \omega_{bs}}. \tag{9}$$

**Figure 4.** Design models for: (**a**) *S*(*j*ω); (**b**) *RS*(*j*ω).

For improved performances, Model (7) may have a higher order, as follows:

$$W_e(s) = \left(\frac{s/M_S + \omega_{bs}}{s + \omega_{bs}\varepsilon_S}\right)^k, \; k \ge 1. \tag{10}$$
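As a quick numerical check of the bound (9), the sketch below evaluates 1/*We*(*j*ω) for illustrative parameter values (assumed, not taken from the paper); the magnitude approaches ε*S* at low frequencies and *MS* at high frequencies:

```python
import numpy as np

# Illustrative weighting-function parameters (assumed values)
Ms, wbs, epsS = 2.0, 1.0, 0.01

def inv_We(s):
    # Upper bound 1/We(s) = (s + wbs*epsS) / (s/Ms + wbs), relation (9)
    return (s + wbs*epsS) / (s/Ms + wbs)

w = np.logspace(-4, 4, 9)
mag = np.abs(inv_We(1j*w))

# Low-frequency limit ~ epsS (small tracking error),
# high-frequency limit ~ Ms (bounded peak sensitivity)
print(mag[0], mag[-1])
```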

For the noise sensitivity function, the weighting function *Wu*(*s*) is chosen, which influences the control signal, *u*, according to the relations:

$$W_u(s) = \frac{s + \omega_{bu}/M_u}{\varepsilon_u s + \omega_{bu}}, \quad \|W_u(j\omega)RS(j\omega)\|_{\infty} \le 1, \quad |RS(j\omega)| \le \frac{1}{|W_u(j\omega)|}, \tag{11}$$

where *Mu*, ω*bu*, ε*<sup>u</sup>* are the maximum gain, the bandwidth and the error. The upper limit is:

$$\frac{1}{W_u(s)} = \frac{\varepsilon_u s + \omega_{bu}}{s + \omega_{bu}/M_u}. \tag{12}$$

The magnitude of |*RS*| at low frequencies is essential to limit the control signal. The procedure is similar for the complementary sensitivity function and the load sensitivity function.
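A similar numerical check applies to the bound (12); the parameter values below are illustrative assumptions. The magnitude of 1/*Wu*(*j*ω) approaches *Mu* at low frequencies (limiting |*RS*| and hence the control effort) and ε*u* at high frequencies:

```python
import numpy as np

# Illustrative parameters for the control-weighting function (assumed values)
Mu, wbu, epsU = 2.0, 10.0, 0.01

def inv_Wu(s):
    # Upper bound 1/Wu(s) = (epsU*s + wbu) / (s + wbu/Mu), relation (12)
    return (epsU*s + wbu) / (s + wbu/Mu)

w = np.logspace(-3, 5, 9)
mag = np.abs(inv_Wu(1j*w))

# Low-frequency limit ~ Mu, high-frequency limit ~ epsU
print(mag[0], mag[-1])
```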

## **3. The Effect of the Parameters on the Weighting Functions**

Several possibilities may be used in order to design the weighting functions. One possible choice is to consider a combination of cost functions providing the mixed-sensitivity formulation.

Figures 5 and 6 show the behaviors imposed using the functions 1/*We*(*s*), Equation (9), and 1/*Wu*(*s*), Equation (12), considering different values of the parameters *MS*, ω*bs*, ε*S*, *Mu*, ω*bu*, ε*<sup>u</sup>* [8,16].

**Figure 5.** Models imposed with function 1/*We*(*s*).

**Figure 6.** Models imposed with function 1/*Wu*(*s*).

The study of the dependence of the shapes of the weighting functions *We*(*s*) and *Wu*(*s*) on their parameters considers the following values: magnitudes *MS*, *Mu* = 1, 2, 3, bandwidths ω*bs*, ω*bu* = 1, 2, and errors ε*S*, ε*u* > 0.01.

## **4. Results of the Analysis of the Closed-Loop Control System**

The main objectives required in the control system (defined by the block diagram with the weighting functions from Figure 7) are: good reference tracking and a limited control signal [1–5].

**Figure 7.** Control diagram with the weighting functions.

The robust design consists in determining the controller *HR* so that the H-infinity norm of the closed-loop transfer function is less than a positive number (γ):

$$\left\| \begin{array}{c} W_e S \\ W_u H_R S \end{array} \right\|_{\infty} < \gamma. \tag{13}$$


The plant model used for the case study of the closed-loop system from Figure 7 is described by the transfer function *Hp*(*s*) = 2/(*s*<sup>2</sup> + 0.05*s* + 0.2).

The block diagram also includes the weighting functions and the controller designed using the H-infinity synthesis. The simulations from Figures 8 and 9 show the behaviors of the system [13].
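The norm condition (13) can be evaluated numerically on a frequency grid. The sketch below uses the case-study plant, a hypothetical proportional controller, and assumed weighting parameters; it is not the H-infinity controller designed in the paper, only an illustration of how the stacked norm is computed:

```python
import numpy as np

w = np.logspace(-3, 3, 2000)
s = 1j * w

Hp = 2.0 / (s**2 + 0.05*s + 0.2)   # case-study plant
HR = 5.0                            # hypothetical controller gain (assumption)
Hd = HR * Hp

S  = 1.0 / (1.0 + Hd)               # sensitivity
RS = HR * S                         # control sensitivity HR*S

# Weighting functions (7) and (11); parameter values are illustrative
Ms, wbs, epsS = 2.0, 0.5, 0.01
Mu, wbu, epsU = 10.0, 100.0, 0.01
We = (s/Ms + wbs) / (s + wbs*epsS)
Wu = (s + wbu/Mu) / (epsU*s + wbu)

# Peak of the stacked mixed-sensitivity magnitude approximates the
# H-infinity norm in relation (13)
stack = np.sqrt(np.abs(We*S)**2 + np.abs(Wu*RS)**2)
gamma = stack.max()
print(gamma)
```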

**Figure 8.** Responses to the step input considering different weighting functions 1/*We*(*s*).

**Figure 9.** Responses to the step input considering different weighting functions 1/*Wu*(*s*).

## **5. Conclusions**

In this paper, a short study of the effects of the weighting functions on the performances of the robust control systems was presented.

In this context, the basic requirements imposed on the control system (from Figure 1) were:


In the robust control diagram (Figure 2), the performances regarding reference tracking, disturbance and measurement noise rejection, and control signal effort should be achieved for each external input *r*, *v*, η, *l* whose energy does not exceed a predetermined value. As a result, the weighting functions must be properly designed and used.

So, for the weighting function *We*(*s*):


**Author Contributions:** Conceptualization, M.D. and S.-E.O.; methodology, M.D.; software, validation, M.D. and S.-E.O.; formal analysis, S.-E.O.; investigation, resources, and data curation, M.D.; writing—original draft preparation, M.D.; writing—review and editing, S.-E.O.; visualization, supervision, M.D.; project administration, M.D.; funding acquisition, M.D. and S.-E.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Concept Lattice-Based Classification in NLP** †

## **László Kovács**

Department of Information Science, University of Miskolc, 3515 Miskolc, Hungary; kovacs@iit.uni-miskolc.hu; Tel.: +36-20-3319765

† Presented at the 14th International Conference INTER-ENG 2020 Interdisciplinarity in Engineering, Targu Mures, Romania, 8–9 October 2020.

Published: 24 December 2020

**Abstract:** Classification in discrete object space is a widely used machine learning technique. In this case, we can construct a rule set using attribute level implication rules. In this paper, we apply the technique of formal concept analysis to generate the rule base of the classification. This approach is suitable for cases where the number of possible attribute subsets is limited. For testing of this approach, we investigated the problem of the part of speech prediction in natural language texts. The proposed model provides a better accuracy and execution cost than the baseline back-propagation neural network method.

**Keywords:** machine learning; morphological classification; formal concept analysis; natural language processing

## **1. Introduction**

In the area of machine learning, classification is the most widely used technique. In the case of classification, we have an object space *O*, where every object *o* ∈ *O* is given by a set of attribute–value pairs, and the objects are assigned to a category/class value:

$$O = \{(\{(a, v)\}, c) \mid a \in A, v \in \mathrm{Dom}_a, c \in C\}, \tag{1}$$

where *A* denotes the attribute set and *C* is the category set. The main goal of the classifier is to generate a prediction function:

$$f_c : \{(a, v)\} \to C, \tag{2}$$

where *fc* assigns a category value for every attribute–value pair set of the problem domain.

A special case of the classification problem is when the *Doma* sets contain only discrete values. Such discrete domains can be found, among others, in assignment and permutation problems. Several methods can be used to perform efficient classification in a discrete domain space. Among the most widely known candidates, we can mention the decision tree classifier and the neural network classifier.

Neural network (NN) [1] classifiers are based on composition of elementary binary perceptron classifiers using separation hyperplanes. The elementary classification nodes are structured into a network where the nodes are connected by weighted directed edges. In the simplest case, the output of a node is given by:

$$a = \mathrm{sigmoid}\left(\sum_i w_i \cdot s_i + b\right), \tag{3}$$

where *wi* denotes the edge weight values, *si* is the notation for signal strength and *b* is a bias value. During the training process, the weight values are adjusted to produce a minimal misclassification error. In the case of back-propagation NN, the input is given by a feature vector and the output is a category vector. The network contains only one hidden layer. Besides this simple classification network, many other special complex network structures were developed in recent years. For the classification of multi-dimensional objects, the convolutional NN models provide an efficient solution. For the handling of sequences such as words or transaction events, the family of recurrent networks is the suitable tool. The LSTM (long short-term memory) NNs are very popular in the domain of natural language processing.
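Relation (3) corresponds to a few lines of code; the sketch below (weights, inputs and bias are illustrative values) computes the output of a single perceptron node:

```python
import math

def node_output(weights, signals, bias):
    """Output of a single perceptron node, relation (3)."""
    z = sum(w*s for w, s in zip(weights, signals)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

# Small usage example with illustrative weights and inputs
print(node_output([0.5, -0.3], [1.0, 2.0], 0.1))
```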

In the case of decision trees [2], each child node corresponds to a distinct attribute value. As the child node set is discrete, the attribute value set is also assumed to be discrete. In the ID3 decision tree construction method, the attribute with the smallest weighted entropy is selected:

$$\arg\min_a \{\mathrm{entropy}(\{p_{(i,a)}\})\}, \tag{4}$$

where the entropy refers to the homogeneity level of the category (class label) distribution.
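The selection rule (4) can be sketched as follows; the toy training set and attribute indices are illustrative assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label distribution."""
    n = len(labels)
    return -sum((c/n) * math.log2(c/n) for c in Counter(labels).values())

def best_attribute(rows, attrs, label_idx=-1):
    """Pick the attribute with the smallest weighted entropy, relation (4)."""
    def weighted_entropy(a):
        groups = {}
        for r in rows:
            groups.setdefault(r[a], []).append(r[label_idx])
        return sum(len(g)/len(rows) * entropy(g) for g in groups.values())
    return min(attrs, key=weighted_entropy)

# Toy training set: (outlook, windy, play?) -- illustrative data
rows = [("sunny", "no", "yes"), ("sunny", "yes", "no"),
        ("rainy", "no", "yes"), ("rainy", "yes", "no")]
print(best_attribute(rows, attrs=[0, 1]))   # attribute 1 splits perfectly
```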

The decision process of a decision tree can be represented by a set of logical formulas of the form:

$$\mathrm{If}\ a_1 = v_1 \land a_2 = v_2 \land \dots \land a_m = v_m \ \mathrm{then}\ \mathrm{category} = c. \tag{5}$$

If we consider all formulas related to a decision tree, we can see that attribute *a*<sup>1</sup> is contained in all formulas, while the last attribute is used in only a few formulas. The decision tree thus constructs a priority order of the attributes.

A more general ruleset can be constructed if we consider not a decision tree but a decision lattice. The theory of Formal Concept Analysis (FCA) can be used to construct a decision lattice as a generalization of the decision tree.

In the next section, the basic concepts of formal concept analysis are presented. This formalism is used to generate the frequent attribute patterns in the training set. Section 3 introduces a morphological classification model based on the concept lattice approach. The results of the experimental tests on efficiency comparison are analyzed in Section 4.

## **2. Classification with FCA**

The theory of Formal Concept Analysis (FCA) [3] provides a tool for conceptualization in an object–attribute relational context. A formal concept corresponds to a pair of related closed sets. The first component containing objects is called the extent part, while the second component containing attributes is the intent part. Formal concepts created from the context can be structured into a concept lattice based on the set containment relationship. The ordering among the lattice elements can be considered as a specialization and generalization relationship among the concepts. The roots of FCA originate in the theory of Galois connections [4] and in the applied lattice and order theory developed later by Birkhoff [5]. The terminology and theoretical foundation of FCA was introduced and built up in the 1980s by Rudolf Wille and Bernhard Ganter [6].

In FCA, two special, partially ordered sets are considered, namely, G is the set of objects and M is the set of attributes. The corresponding fixpoints are called formal concepts, which are built up from a matching pair of object and attribute sets. Using the notations defined in [7], the terminology of FCA can be summarized in the following way:

A formal context is defined as a triplet (*G*, *M*, *I*), where *I* is a binary relation between *G* and *M*. The condition (*x*, *y*) ∈ *I* is met if and only if the attribute *y* is true for the object *x*. Two derivation operators are introduced as mappings between the powersets of *G* and *M*. For *A* ⊆ *G*, *B* ⊆ *M*:

$$\begin{array}{l} f(A) = A^I = \{ m \in M \mid \forall g \in A : (g, m) \in I \} \\ g(B) = B^I = \{ g \in G \mid \forall m \in B : (g, m) \in I \} \end{array} \tag{6}$$

For a context (*G*, *M*, *I*), a formal concept is defined as a pair (*A*, *B*), where *A* ⊆ *G*, *B* ⊆ *M*, *A* = *B<sup>I</sup>*, *B* = *A<sup>I</sup>* are met. The composition of these derivations is a closure operator. Regarding this derivation, the components of a formal concept satisfy the *A* = *A<sup>II</sup>*, *B* = *B<sup>II</sup>* conditions, too. The *A* component is called the extent of the concept, while *B* is the intent part.
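The derivation operators and the closure property can be illustrated on a tiny context; the objects, attributes, and relation below are invented for demonstration:

```python
def derive_attrs(A, I, M):
    """f(A): attributes shared by every object in A, relation (6)."""
    return {m for m in M if all((g, m) in I for g in A)}

def derive_objs(B, I, G):
    """g(B): objects having every attribute in B, relation (6)."""
    return {g for g in G if all((g, m) in I for m in B)}

# Tiny illustrative context: objects 1..3, attributes a..c
G = {1, 2, 3}
M = {"a", "b", "c"}
I = {(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a")}

A = {1, 2}
B = derive_attrs(A, I, M)    # common attributes of {1, 2}
A2 = derive_objs(B, I, G)    # objects sharing those attributes

# (A2, B) is a formal concept: applying the derivations again is stable
print(A2, B)
```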

On the set of formal concepts, *C*, generated from the context *G*, *M*, *I*, a partial ordering relation is defined in the following way:

$$(A_1, B_1) \lhd (A_2, B_2) \Leftrightarrow A_1 \subseteq A_2. \tag{7}$$

FCA can also be used as a classification tool in machine learning [8,9]. In classification problems, the context is extended with a category attribute and the lattice is used similarly to a decision tree, where the tree leaves correspond to the maximal consistent concepts of the lattice. A pioneering work on the application of concept lattices in the retrieval of semantic information is presented in [10]. The area of FCA-based classification is nowadays an active research problem domain [11,12].

Taking an object ({(*a*, *v*)}, *c*) from the training set, we can convert the information on the object into the following implication rule:

$$a_1 = v_1 \land a_2 = v_2 \land \dots \land a_m = v_m \Rightarrow c. \tag{8}$$

Thus, the training set can be considered as a ruleset. The objects usually correspond to the atomic nodes in the FCA lattice. Every node in the lattice can be constructed as the intersection of some object-level nodes of the lattice, resulting in a node of the following form:

$$a_1 = v_1 \land a_2 = v_2 \land \dots \land a_m = v_m \Rightarrow c_1 \lor c_2 \lor \dots \lor c_k, \tag{9}$$

where *a*1 = *v*1, *a*2 = *v*2, ... , *am* = *vm* are present in every operand node of the intersection. If the right side of the rule contains only one class label, then this node is called a consistent node.

$$a_1 = v_1 \land a_2 = v_2 \land \dots \land a_m = v_m \Rightarrow c. \tag{10}$$

Among the consistent nodes of the lattice, a special role is assigned to the maximal consistent nodes having a minimal attribute set. A consistent node

$$a_1 = v_1 \land a_2 = v_2 \land \dots \land a_m = v_m \Rightarrow c \tag{11}$$

is maximal if none of the attribute–value pairs can be removed without breaking the rule.

The set of maximal consistent nodes is considered as the core ruleset for the classification process. The concept lattice structure can be used to generate the elements of the core ruleset. The naive way to generate the lattice is to perform all intersections on the lattice elements. The main bottleneck of this approach is that the size of the lattice and the number of required intersections can be huge. In the survey paper [13], the efficiency of two popular methods, CbO and NextClosure, is investigated. It is shown that the most efficient lattice construction methods have a theoretical complexity of *O*(*N*<sup>2</sup>*MC*) (see Figure 1). Theoretically, the value of *C* is bounded by 2*<sup>M</sup>*, but usually the density of the training set is lower, yielding a smaller lattice. A sharper theoretical upper bound was shown by Prisner in [14] and by Albano and Chornomaz in [15]. They investigated a special type of context, the contranominal scale context, involving both theoretical analysis and test experiments. They could show that the execution cost is in general of polynomial shape and in the worst case of exponential shape.

Based on these cost properties, the processing of all possible intersections cannot be completed in acceptable time. Thus, to apply an FCA-based classification for problem domains with several millions of objects, we need an optimized and simplified method to determine the maximal consistent elements.

**Figure 1.** Execution cost function (left: N = 1000, M = 100; right: M = 150, density level = 7).

## **3. Cost Reduction Methods for FCA-Based NLP POS Classifiers**

In FCA, the main goal is to construct all possible formal concepts from the input object set. In the prediction task, we do not need all implication rules; one matching rule may be enough to determine the related class label. Thus, in the construction of FCA-based classifiers, we require only a subset of all possible concept nodes. The relevance of a node can be given by the following three parameters:


In the classic approach, the engine builds up first the whole concept lattice. The maximal consistent nodes are selected from this lattice using a top-down traversing approach. In the lattice, the top node is usually inconsistent while the atomic nodes are usually consistent. The standard way of lattice construction is to process the atomic nodes first and generate the more general concepts later. This process corresponds to a bottom-up approach.

In the proposed method, the support level is also an important factor, as we need such rules that can cover a larger set of objects. Using rules with large support values results in a reduced ruleset.

We allow also inconsistent nodes in the lattice as in some cases only the atoms are consistent. Thus, the method yields a probability distribution of the different classes.

In the case of part of speech classifiers (NLP POS) [16], the classifier engine determines the grammatical category of the input words. We have analyzed the POS prediction for the Hungarian language, where we use eight basic POS units (noun, verb, adjective, adverb, article, conjunction, number, date) and 42 compound POS units. The composite POS labels come from the fact that some word forms have different meanings with different POS labels. For example, the word *vár* can be a verb (*to wait*) or a noun (*castle*).

For the representation of the words, we have used the character sequence format, thus for the word *almákat*, the corresponding letters (*a*, *l*, *m*, *á*, *k*, *a*, *t*) are the description attributes. A special constraint is used in our model, namely, we allow only a subset of the word subsequences as attribute tuples used in the generalization process. For example, for the word *almákat*, only the following strings are valid attribute subsets:

#### *t#, at#, kat#, ákat#, mákat#, lmákat#, almákat#*

where symbol # denotes the terminal symbol.

With this kind of simplification, for a word with a length of *n* characters, the number of possible subsets is equal to *O*(*n*), instead of the usual subset count *O*(2*<sup>n</sup>*). Thus, the number of concept nodes is linear in the total number of characters in the word set. This means that we can process all candidate nodes in linear time to generate the maximal consistent nodes. Based on this consideration, we constructed the following algorithm to generate all maximal consistent nodes.

```
SDict = dictionary()
for all words w in the training set:
    S = generate all substrings of w
    for each s in S:
        h = class_homogeneity(s)
        f = support(s)
        merge (s, h, f) into SDict
```

In the merge process, the class frequency distribution of *s* is aggregated with the existing (*h*, *f*) values in the dictionary.
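A runnable Python version of the algorithm above can be sketched with class_homogeneity and support replaced by simple class-count statistics; the tiny training set (ASCII-simplified word forms) is an illustrative assumption:

```python
from collections import defaultdict

def suffixes(word):
    """Terminal-anchored substrings, e.g. 'var' -> ['r#', 'ar#', 'var#']."""
    w = word + "#"
    return [w[i:] for i in range(len(word) - 1, -1, -1)]

# Toy training set (word, POS); illustrative, accent-free word forms
training = [("almakat", "noun"), ("hazakat", "noun"), ("varakat", "noun"),
            ("latogat", "verb"), ("mutogat", "verb")]

# SDict: substring -> class frequency counts; support = total count,
# homogeneity = share of the majority class
SDict = defaultdict(lambda: defaultdict(int))
for word, pos in training:
    for s in suffixes(word):
        SDict[s][pos] += 1

counts = SDict["kat#"]
support = sum(counts.values())
homogeneity = max(counts.values()) / support
print(support, homogeneity)   # 'kat#' occurs only in noun entries
```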

In the purging process, we can eliminate nodes having a weaker relevance value.

*for all (s,h,f) in SDict: w* = *w(h,f) if w* < *w\_limit: remote (s,h,f,) from SDict*

The reduced dictionary can be used as the rule space, with every node acting as a separate rule of the form

$$f_c : \{(a, v)\} \to C. \tag{12}$$

In the prediction phase, we perform a nearest neighbor classification process. In the first step, the set of substrings is generated for the query word in the standard way mentioned previously. Then, for each substring *s*, the entry for *s* in the dictionary is located. Every entry contains a class distribution vector; thus, we can calculate an aggregated class distribution for the query word. The winner category is the class with the highest probability.
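The prediction phase can be sketched as follows; the reduced dictionary below is a hypothetical example, not data from the paper:

```python
from collections import defaultdict

def predict(word, SDict):
    """Aggregate the class distributions of all matching terminal substrings
    of the query word and return the winner category."""
    w = word + "#"
    total = defaultdict(float)
    for i in range(len(word)):
        counts = SDict.get(w[i:])
        if counts:
            n = sum(counts.values())
            for c, k in counts.items():
                total[c] += k / n       # accumulate class probabilities
    return max(total, key=total.get) if total else None

# Hypothetical reduced dictionary: substring -> class counts
SDict = {"kat#": {"noun": 3}, "gat#": {"verb": 2}, "t#": {"noun": 3, "verb": 2}}
print(predict("ablakat", SDict))        # suffix 'kat#' dominates
```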

## **4. Experimental Results**

For the tests, we have used a training set containing 2,200,000 word-entries with POS values. As baseline methods, we have selected the back-propagation neural network and the LSTM neural network engine. In the tests, we first used a character-level representation of the words to compare with the baseline neural network classifiers: the standard back-propagation network and the LSTM recurrent network. Both NN units were implemented in the Python Keras framework. The test results were a bit surprising, as both NN engines provided very similar accuracy, near 70%, when the size of the training set was 250,000. For the test, we used a disjoint sample of 250,000 items. This experiment has shown that the sequence approach is not superior to the standard classification approach for our problem domain.

Later, we introduced a different representation approach, where an extended character representation form was used. In this approach, the feature vector also contained the phonetical attributes of the characters. Using this approach, the NN engines could achieve 86% accuracy, while the combined pattern matching and NN method could reach a 96% accuracy. The test results are summarized in Table 1. In the table, PM denotes our proposed method and the symbol NN stands for the neural network approach.

**Table 1.** Efficiency comparison of the proposed method (PM) with neural network (NN) classifiers.


In the tests, we have investigated two factors:


Based on the performed test experiments, we experienced two surprising results:


## **5. Conclusions**

The FCA concept lattice-based structure can also be used in classification problems, where the maximal consistent nodes are the basic rules for the prediction process. The FCA-based approach is suitable for cases where the number of possible attribute subsets is limited. In this paper, we introduced the adaptation of the FCA-based classifier to solve an NLP POS classification problem. Based on the test experiments, the proposed model provides a better accuracy and execution cost than the baseline back-propagation neural network method.

**Funding:** This research received no external funding.

**Acknowledgments:** The described article/presentation/study was carried out as part of the EFOP-3.6.1-16-2016-00011 "Younger and Renewing University—Innovative Knowledge City—institutional development of the University of Miskolc aiming at intelligent specialisation" project implemented in the framework of the Szechenyi 2020 program. The realization of this project is supported by the European Union, co-financed by the European Social Fund.

**Conflicts of Interest:** The author declares no conflict of interest.

## **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

