**1. Introduction**

In this paper, we focus on spaces with closeness that are characterized by weighted graphs, as in [1], or by local neighborhoods of elements, as in [2]. Formally, closeness can be described by a graph adjacency matrix containing weights, where a higher weight is assigned to an edge connecting closer objects. In these works, the matrices were square, as they contained the mutual closeness values of all objects in the dataset.
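As a minimal illustrative sketch (not taken from the cited works), such a weight matrix can be built from pairwise distances with a Gaussian kernel, so that closer objects receive larger weights; the function name and the kernel choice are our own assumptions:

```python
import numpy as np

def closeness_matrix(points, sigma=1.0):
    # Gaussian-kernel weights w(x, y) = exp(-||x - y||^2 / (2 sigma^2)):
    # closer objects are connected by edges with larger weights.
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

pts = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])
W = closeness_matrix(pts)

assert np.allclose(W, W.T)   # symmetric
assert (W >= 0).all()        # non-negative
assert W[0, 1] > W[0, 2]     # the closer pair gets the larger weight
```

The resulting matrix is square precisely because every pair of objects is compared.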

In our previous research [1,3], the notion of closeness was used in a new approach to dimensionality reduction, for which the fuzzy transform (F-transform for short, [4]) proved to be useful. More broadly, the concept of closeness appears in multiple contexts and under different names, where it serves auxiliary purposes. For example, in [5], the phase space of a dynamical system is described by a network (graph) in which each point represents one state of the system, and the closeness-describing weight between two points equals the frequency with which a transition occurred between the two states. A related concept is closeness centrality (an aggregation of the closeness-describing weights with respect to all neighbors except oneself, computed by the arithmetic or harmonic mean), which describes the data density around one particular point in a network; it is used, e.g., in [6]. In the area of image processing (image compression, image segmentation, image retrieval, e.g., [7], etc.), the related concept of proximity space is widely used. It originated in [8] and has since evolved; currently, it is used, e.g., to describe the similarity between pixels or patches.
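The closeness centrality just mentioned can be sketched as follows; this is our own minimal illustration (the function name is hypothetical), assuming strictly positive weights for the harmonic variant:

```python
import numpy as np

def closeness_centrality(W, kind="arithmetic"):
    # Aggregate each point's closeness weights over all neighbours
    # except the point itself (the diagonal is excluded).
    n = W.shape[0]
    off = W[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    if kind == "arithmetic":
        return off.mean(axis=1)
    # harmonic mean; assumes strictly positive weights
    return (n - 1) / (1.0 / off).sum(axis=1)

W = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
c = closeness_centrality(W)   # the middle point sits in the densest region
```

Either aggregation ranks the middle point highest here, reflecting the higher data density around it.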

**Citation:** Janeček, J.; Perfilieva, I. Preimage Problem Inspired by the F-Transform. *Mathematics* **2022**, *10*, 3209. https://doi.org/10.3390/math10173209

Academic Editor: Michael Voskoglou

Received: 10 June 2022 Accepted: 1 September 2022 Published: 5 September 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


In this research, we consider closeness determined by a fuzzy partition of a universe of discourse. This particular space structure is used in the theory of F-transforms [4,9]. The notion of F-transform was introduced in [4], where we explained modelling with fuzzy IF-THEN rules as a specific transformation. From this point of view, F-transform bridges fuzzy modelling and the theory of linear (in particular, integral) transforms. The generalization to a higher degree version was proposed in [9]. Generally speaking, F-transform performs a transformation of the original universe of functions into a universe of their skeleton models (vectors or matrices of F-transform components), for which further computations are easier. In [4], the approximation property of F-transform was described, and in [10], the effect of the shapes of basic functions on the approximation quality was demonstrated. F-transform has many other useful properties and great potential in various applications, such as special numerical methods as well as solutions to ordinary and partial differential equations with fuzzy initial conditions, mining dependencies from numerical data, signal processing, compression and decompression of images [11] and image fusion.

Among the recent applications of F-transform, we refer to an image classification problem, in which data is scarce [12], a numerical solution to fuzzy integral equations [13] and improving the JPEG compression algorithm in the cases where a high ratio compression is required [11].

Similarly to the F-transform, which, in its direct phase, aggregates a large number of functional values of all points into a small number of components associated with selected nodes, we consider that within our (possibly large) dataset there are several selected points (nodes) with known closeness values with respect to all the other points, while the closeness among the other points is undefined. This gives rise to a rectangular closeness matrix that describes the closeness space. Nodes can be thought of as the most prominent data points, because the dataset can be sparsely represented as a collection of their neighborhoods.
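To make the direct phase concrete, the following sketch (our own illustration, not code from the paper) computes discrete F-transform components over a uniform triangular fuzzy partition; the rectangular matrix `A` plays the role of the node-to-point closeness matrix described above:

```python
import numpy as np

def triangular_partition(xs, nodes):
    # A[k, i] = A_k(x_i); for uniformly spaced nodes the triangles satisfy
    # the Ruspini condition sum_k A_k(x) = 1 on [nodes[0], nodes[-1]].
    h = nodes[1] - nodes[0]
    return np.maximum(0.0, 1.0 - np.abs(xs[None, :] - nodes[:, None]) / h)

def direct_ft(f_vals, A):
    # each component is a weighted average of f over one node's neighbourhood
    return (A @ f_vals) / A.sum(axis=1)

def inverse_ft(F, A):
    # the inverse F-transform blends components back into a function on xs
    return F @ A

xs = np.linspace(0.0, 1.0, 101)
nodes = np.linspace(0.0, 1.0, 6)
A = triangular_partition(xs, nodes)   # rectangular 6 x 101 closeness matrix
f = xs.copy()                         # a linear test function f(x) = x
F = direct_ft(f, A)                   # 6 components, one per node
f_hat = inverse_ft(F, A)              # reproduces linear f away from the ends
```

For a linear function, the interior components coincide with the function values at the nodes, so the inverse F-transform reconstructs it exactly on the inner part of the domain; for general functions, the components act as a low-dimensional skeleton model.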

Following the methodology of the F-transform, we aim to demonstrate that partial knowledge about the mutual position of all points in the dataset is sufficient for a successful retrieval of a discrete function (up to non-significant differences that distinguish particular functions from each other within the same equivalence class) from its F-transform components. Thus, we present a tool, of more than theoretical interest, for demarcating the set of functions that can be mutually replaced without losing an important feature in the space.

This allows us to express similarity between functions defined on spaces with closeness. In specific cases, a representative function is given by the inverse F-transform. Moreover, we contribute to the general theory of F-transforms in the aspects described in this section. The task of reconstructing an element of an input space (or finding an approximate solution if it does not exist) based on (a lower-dimensional representation of) an element of a feature space has been studied in various machine learning domains, e.g., in kernel-based methods (e.g., [14], where nonlinear dimensionality reduction is performed using the kernel trick on an input image and the task is to recover the image from its denoised version in the feature space, or [15], where algebraic tools based on the simple relationship between distances in the input space and in the feature space induced by a kernel are used to find the preimage of a feature vector) and graph-based methods (where the task is to recover structured data from a single point of a feature space, e.g., to find a representative graph based on the features of a found data cluster). In these applications, the concept of the preimage problem has already been used. This paper is focused on finding all functions that share a set of features (given by the F-transform components) utilizing the above-described concept. We do not assume that these features are noisy. On the other hand, in [16], we used a similar initial setting of the input space and assumed that the input signal (a function placed in this space) is noisy in a certain sense. The signal was processed in the same manner (based on closeness), and we proposed how to denoise the signal by finding the appropriate closeness parameters. The inverse F-transform is known to reduce the noise of the input signal (provided a suitable fuzzy partition), as was shown for the continuous case, e.g., in [17].

As explained in [18], preimage problems are useful not only in (pattern) denoising and Kernel Dependency Estimation but also in signal compression, where the kernel technique serves as an encoder and the preimage technique as a decoder. Moreover, in [18], a technique to learn the preimage without the need to solve a difficult nonlinear optimization problem is presented. Additionally, in [15], the authors consider that a noisy pattern is mapped to a feature space using Kernel Principal Component Analysis, and then the approximate preimage is found using their technique. To summarize the mainstream of preimage problems in machine learning, we can say that a nonlinear optimization, nonlinear iteration, or learning method is used to find a function (the space of all possible functions comprises the input space) such that its image under a specified direct mapping has the minimal squared distance from the given point in the feature space. The existence of a precise solution is considered to be a coincidence. The purposes of computing the preimage in machine learning are various (to denoise a pattern, reconstruct a signal, compress an image, find a representative example of a data cluster, or learn a general mapping between the input space and the feature space), but all of them are highly application-oriented, and, hence, the initial settings are adjusted accordingly. This limits their transferability to other tasks.

In contrast, our approach pays close attention to the structure of the input space, and this structure is quite flexible. We focus on finding the precise solution (we ensure that it always exists) to a special case of the preimage problem. Using the assumption of the finiteness of the universe and endowing the input space with a fuzzy partition and the corresponding closeness, we solve it by means of linear algebra. The resulting mapping forms a special case of the direct mappings mentioned above. Since our theoretical work is not aimed at solving any specific task, we compute the whole class of solutions (usually infinite), from which we do not need to choose a particular one (all of them are equivalent).
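A minimal linear-algebra sketch of what "the whole class of preimages" means (a toy example with a hypothetical 2 x 4 direct-mapping matrix, not the paper's construction): every solution of A f = F is one particular solution plus an arbitrary element of the null space of A.

```python
import numpy as np

# toy rectangular direct mapping: 2 components aggregated from 4 points
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
F = np.array([1.0, 3.0])                   # given components

f0 = np.linalg.lstsq(A, F, rcond=None)[0]  # one particular preimage
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
null_basis = Vt[rank:]                     # basis of the null space of A

# every member of the class f0 + span(null_basis) has the same image
f1 = f0 + 2.5 * null_basis[0]
assert np.allclose(A @ f0, F) and np.allclose(A @ f1, F)
```

In this toy setting, the class is a two-parameter affine subspace; the paper's point is that all of its members are equivalent with respect to the given components, so no single representative needs to be singled out.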

In Section 2, we give the details of the notion of closeness considered in this article. After that, the fuzzy partition and the F-transform are recalled and modified for the discrete case. We show that a certain fuzzy partitioned space (a space with a fuzzy partition) is a special case of the space with closeness. Section 3 discusses the formulation of the main topic of the paper, called the preimage problem, which takes place in the space introduced in Section 2. The preimage problem belongs to the basic problems of algebra (e.g., in [19]), where it is considered between two structured spaces (we propose to use the space with closeness and the space of scaled F-transform components). The solution to this problem is described in three different ways. Theorem 1 is based on a commonly used solution to a set of algebraic equations. In Section 4, we examine the conditions under which the inverse F-transform forms the solution to this problem. In Section 5, we propose a technique that checks whether a solution to the preimage problem can be obtained by a certain transformation of the given element of the space of scaled F-transform components, given by a new set of fuzzy partition units. Appendix A provides numerical examples related to the four preceding sections; Section 6 concludes the paper.

#### *Relationship of Closeness, Metric and Similarity*

A space with closeness (*X*, *w*) is more general than a metric space (*X*, *ρ*): they share the axioms of symmetry and non-negativity, but a metric is more restrictive, as it requires two additional axioms (*ρ*(*x*, *y*) = 0 iff *x* = *y*, and the triangle inequality *ρ*(*x*, *y*) ≤ *ρ*(*x*, *z*) + *ρ*(*y*, *z*) for all *x*, *y*, *z* ∈ *X*). As closeness is a relaxed version of a reciprocal distance metric, it is more suitable for the description of data with a graph structure (where the triangle inequality is generally non-enforceable). Another example of a context where closeness is more suitable is data assumed to lie on a topological manifold, as there is no straightforward way to establish a metric there. Similarly to a metric, closeness encodes the mutual relations between data points.
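A tiny numerical illustration (our own example) of why the triangle inequality cannot be enforced: the weights below form a valid closeness (symmetric, non-negative), yet reading reciprocal weights as tentative distances violates the triangle inequality.

```python
import numpy as np

# symmetric, non-negative closeness weights on three graph vertices
W = np.array([[1.0, 1.0, 0.1],
              [1.0, 1.0, 1.0],
              [0.1, 1.0, 1.0]])
assert np.allclose(W, W.T) and (W >= 0).all()  # closeness axioms hold

# tentative "distance" d = 1/w (off-diagonal entries are what matter here)
d = 1.0 / W
# d(0,2) = 10 exceeds d(0,1) + d(1,2) = 2, so no metric is induced
assert d[0, 2] > d[0, 1] + d[1, 2]
```

Such weight patterns arise naturally in graphs, e.g., when two vertices are each strongly tied to a common neighbor but only weakly tied to each other.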

There is a close connection between closeness and similarity measures; each of these concepts applies in different contexts. A similarity measure is usually assumed to have an algebraic, fuzzy, or probabilistic background, and there is no standard axiomatization of its definition. Similarly to closeness, a similarity value is higher for more similar objects (those that are, e.g., more correlated, have a larger intersection, a smaller cross-entropy, or belong to the same cluster); its values can be negative, it can be non-symmetric, etc. Similarity spaces emerging in various applied fields are thus even more general than spaces with closeness. A solution to a particular type of problem is often associated with a certain type of similarity measure.

This explains why we introduce the notion of closeness and why we prefer it over metric (too-restrictive axioms) and similarity (too-loose axiomatization). This trade-off allows us to express various established concepts, e.g., from graph theory (edge weights used, e.g., in the dimensionality reduction technique of Laplacian eigenmaps [2]), fuzzy logic (values of the biresiduum of the truth values of two formulae), and clustering problems (class membership degrees in, e.g., the k-NN algorithm), in terms of closeness, and to build a theoretical apparatus on top of it. Therefore, as the closeness values can be extracted from data in various ways, the closeness space creates a platform on which both data-driven and model-driven approaches can be utilized.
