Article

A Gaussian Process Decoder with Spectral Mixtures and a Locally Estimated Manifold for Data Visualization

by Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
1 Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
2 Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(14), 8018; https://doi.org/10.3390/app13148018
Submission received: 12 April 2023 / Revised: 1 July 2023 / Accepted: 5 July 2023 / Published: 9 July 2023

Abstract: Dimensionality reduction plays an important role in interpreting and visualizing high-dimensional data. Previous methods for data visualization overemphasize local structure and give little consideration to global preservation. In this study, we develop a Gaussian process latent variable model (GP-LVM) for data visualization. GP-LVMs are a probabilistic, nonlinear extension of principal component analysis and preserve global structure effectively. Their drawbacks are the absence of local structure preservation and the use of kernel functions with limited expressiveness. We therefore introduce a regularization term for local preservation and an expressive kernel function into GP-LVMs to overcome these limitations. As a result, both the global and local structures are reflected in the low-dimensional representation, improving the reliability and visibility of the embedding. We conduct qualitative and quantitative experiments comparing our method with baselines and state-of-the-art methods on image and text datasets.

1. Introduction

Real-world data generally contain high-dimensional structures with nonlinear correlations. Under this condition, it is necessary to identify a low-dimensional representation and to present a visible expression of the data that helps in understanding its structure. Dimensionality reduction [1] is one technique for achieving this: it searches for low-dimensional manifolds that preserve the original structures as much as possible. Principal component analysis (PCA) [2,3] is a classical approach to dimensionality reduction; however, PCA is a linear method, which limits the expressiveness of the derived low-dimensional representation.
There are several nonlinear techniques for dimensionality reduction. Deep generative models (DGMs) [4], such as variational autoencoders [5,6] and diffusion models [7,8], have achieved remarkable success. DGMs are based on probabilistic modeling and are parameterized by deep neural networks [9], enabling them to obtain highly expressive low-dimensional structures. Although DGMs provide strong representations, they require the selection of many hyperparameters and network architectures and are typically overkill for data preprocessing and visualization. Graph-based approaches, such as t-distributed stochastic neighbor embedding (t-SNE) [10,11] and uniform manifold approximation and projection (UMAP) [12], are visualization-aided dimensionality reduction techniques frequently used for biological data visualization [13,14]. Graph-based approaches use few hyperparameters and scale well; however, they overemphasize local structures and tend to produce misleading visualizations that overlook global structures. Gaussian process latent variable models (GP-LVMs) [15,16] are another framework for dimensionality reduction and a nonlinear extension of classical PCA. The Gaussian process has properties similar to deep neural networks [17,18] and requires fewer hyperparameters than DGMs. GP-LVMs are therefore suitable for data preprocessing and visualization and have gained significant attention for dimensionality reduction as a counterpart to graph-based approaches [19,20,21,22,23].
The kernel function and prior distribution used in GP-LVMs influence the quality of the low-dimensional representations. The kernel function determines the properties of the embedding space, and the prior distribution corresponds to the regularization term on the low-dimensional embedding. Previously, researchers have proposed prior distributions for various purposes, such as time-series analyses [24,25], WiFi localization [26,27], and face verification [28]. Furthermore, many kernel functions have been developed in the regression literature [29,30,31] and have achieved successful results. However, in GP-LVMs, the prior distribution has been used to express supervised information [28,32,33] and is not designed for the unsupervised cases typical of data visualization. Furthermore, the kernel functions used in GP-LVMs are limited to the class of homogeneous kernels, such as the squared exponential (SE) kernel and the Matérn family, which have limited expressiveness. Therefore, a comprehensive approach to GP-LVMs for data visualization purposes is required.
In this study, we develop a GP-LVM for data visualization. Our main contributions are the development of a stationary kernel for GP-LVMs and of a prior distribution for unsupervised cases. The details of each component are as follows.
  • Kernel function. We focus on the spectral mixture (SM) kernel [30] and introduce it into a GP-LVM. Generally, the SE kernel or Matérn kernel is applied to GP-LVMs; these are homogeneous kernels that depend only on the distance between latent features. In contrast, the SM kernel is a broader-class kernel that depends not only on the distance but also on the periodicity of the input features. Although the SM kernel is more expressive than the SE and Matérn kernels, it has so far been applied to regression models and not to GP-LVMs. In this study, we develop the SM kernel for GP-LVMs and reveal its high ability for pattern discovery and visualization.
  • Prior distribution. We focus on the use of a locally estimated manifold in the prior distribution. Local structures, such as cluster structure, are crucial for improving the visibility of a low-dimensional representation. Therefore, we regularize the low-dimensional embedding with a local estimate of the data manifold. Specifically, we approximate the manifold locally with a neighborhood graph and use it to regularize the low-dimensional representation. Through this strategy, we account for local structures and improve the visibility of the low-dimensional representation. Regularization based on the graph Laplacian in supervised cases has been previously studied as the Gaussian process latent random field (GPLRF) [33], and we apply the GPLRF scheme in an unsupervised manner.
The remainder of this paper is organized as follows. In Section 2 and Section 3, we present related works and the preliminaries of our method. Section 4 presents our method, and we validate it in Section 5.

2. Related Works

This section presents previous studies on dimensionality reduction approaches aimed at data visualization and preprocessing. We explain graph-based methods in Section 2.1 and PCA-based methods in Section 2.2.

2.1. Graph-Based Approaches

Graph-based approaches first encode high-dimensional data into a neighborhood graph to preserve the local structures of the data and then embed the neighborhood graph into a low-dimensional, generally two-dimensional, space. Laplacian eigenmaps (LEs) [34] are one of the classical graph-based approaches and derive a low-dimensional representation from the eigenvectors of the graph Laplacian of the neighborhood graph. LE is a spectral method with rich theoretical properties regarding its convergence and loss function [35,36]. t-Distributed stochastic neighbor embedding (t-SNE) [10] is a long-standing nonlinear graph embedding approach that provides a low-dimensional representation by matching neighborhood structures. Specifically, t-SNE minimizes the Kullback–Leibler (KL) divergence between the neighborhood graphs in the high- and low-dimensional spaces. Owing to its strong visibility, t-SNE is frequently used to visualize complex structures, such as biological data and the intermediate layers of deep neural networks, and researchers have developed computationally efficient versions of t-SNE by introducing the Barnes–Hut algorithm [11] or hierarchical optimization [37]. UMAP [12] is a state-of-the-art graph-based approach and a more efficient algorithm than t-SNE. UMAP derives a k-nearest-neighbor graph as a fuzzy topological representation of the data. UMAP then minimizes the cross-entropy between the neighborhood graphs in the high- and low-dimensional spaces, which does not require normalization of the graph, in contrast to t-SNE, which requires this normalization to compute the KL divergence. Furthermore, the cross-entropy objective can be viewed as a combination of an attractive force between neighbors and a repulsive force between non-neighbors, allowing negative-sampling-based optimization that boosts the scalability of the UMAP algorithm. LE, t-SNE, and UMAP have similar properties in their loss functions [38] and produce more visible two-dimensional representations than other dimensionality reduction approaches. However, their representations do not preserve the global structures of data because the data are first encoded into neighborhood graphs. Researchers have developed approaches for global preservation by introducing graph construction schemes based on triplets [39,40] or on a diffusion process [41]. However, these methods estimate global structures by connecting local information, which inevitably leads to failures in preserving the global structure. In this study, we focus on PCA-based approaches, which are suited to global preservation, and introduce them in the following subsection.

2.2. PCA-Based Approaches

PCA [2,3] is a classical dimensionality reduction technique and has been applied in various situations, such as data visualization and preprocessing. PCA derives a linear projection onto a subspace that maximizes the variance of the projected high-dimensional data. This formulation yields a closed-form expression of the embedding, improving the applicability and scalability of PCA. Probabilistic PCA (PPCA) [42] introduces a probabilistic formulation into PCA. Specifically, PPCA defines latent variables in a low-dimensional space and derives the projection by maximizing the likelihood after marginalizing these latent variables. Although PCA and PPCA have produced successful results, both are linear methods, whereas data generally contain nonlinear correlations. GP-LVMs [15] are a nonlinear extension of the PCA framework and realize nonlinear embedding through two modifications: the optimization of latent variables instead of a projection, and the introduction of the kernel method [43] for nonlinear expression. These modifications arise from the Gaussian-process-based formulation, in which GP-LVMs assume that high-dimensional data are generated from a Gaussian process [44] whose inputs are the latent variables. GP-LVMs can derive nonlinear latent variables according to the nonlinearity of the kernel, and several extensions introduce prior distributions, including discriminative [28,32], dynamic [24,25], and hierarchical GP-LVMs [45]. Recently, GP-LVMs have been extended to scalable training with inducing-point methods [46,47,48], to Bayesian inference of the latent variables [16,23], and to deep modeling [49,50]. Although GP-LVMs and PCA-based approaches preserve the global structures of data, they do not consider the local structures that are beneficial for visualization-aided dimensionality reduction. Furthermore, the kernel functions used in GP-LVMs are selected from homogeneous kernels, which have limited expressiveness. In this study, we design a GP-LVM for visualization with a stationary kernel and a local estimate of the data.

3. Preliminary

In this section, we present the preliminaries of our method. Specifically, we introduce the basic idea of a GP-LVM, the kernel function, and the UMAP algorithm for the local estimation of the data manifold.

3.1. Gaussian Process Latent Variable Model

Let $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N]^\top \in \mathbb{R}^{N \times D}$ be $D$-dimensional observed variables with $N$ samples. A GP-LVM assumes the existence of a projection from the $Q$-dimensional latent space to the $D$-dimensional observed space. Specifically, a GP-LVM assumes that the observed variables are generated from the Gaussian process [44] input by $Q$-dimensional latent variables $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^\top \in \mathbb{R}^{N \times Q}$ as follows:
$$\mathbf{y}_{:,d} = f(\mathbf{X}) + \boldsymbol{\epsilon}, \tag{1}$$
$$f \sim \mathcal{GP}(\mathbf{0}, k(\cdot,\cdot)), \tag{2}$$
$$\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \beta^{-1}\mathbf{I}), \tag{3}$$
where $\mathbf{y}_{:,d} \in \mathbb{R}^N$ is the $d$-th column of $\mathbf{Y}$, $f \sim \mathcal{GP}(\mathbf{0}, k(\cdot,\cdot))$ is a zero-mean Gaussian process prior with a kernel function $k(\cdot,\cdot)$, and $\boldsymbol{\epsilon}$ is Gaussian noise with precision $\beta$. The GP-LVM estimates the latent variables $\mathbf{X}$ by maximum likelihood estimation after marginalizing the Gaussian process prior $f$. The log-likelihood function of the GP-LVM is derived from Equations (1)–(3) as
$$\log p(\mathbf{Y}\,|\,\mathbf{X}) = \sum_{d=1}^{D} \log \mathcal{N}(\mathbf{y}_{:,d}\,|\,\mathbf{0},\, \mathbf{K}_{NN} + \beta^{-1}\mathbf{I}) = -\frac{ND}{2}\log 2\pi - \frac{D}{2}\log\big|\mathbf{K}_{NN} + \beta^{-1}\mathbf{I}\big| - \frac{1}{2}\mathrm{tr}\big((\mathbf{K}_{NN} + \beta^{-1}\mathbf{I})^{-1}\mathbf{Y}\mathbf{Y}^\top\big), \tag{4}$$
where $\mathbf{K}_{NN} \in \mathbb{R}^{N \times N}$ denotes a Gram matrix whose $ij$-th entry is $k(\mathbf{x}_i, \mathbf{x}_j)$. Extensions of the GP-LVM generally introduce a prior distribution over the latent variables $p(\mathbf{X})$ and estimate them by maximum a posteriori (MAP) estimation. The log-posterior distribution is given as the following equation:
$$\log p(\mathbf{X}\,|\,\mathbf{Y}) = \log p(\mathbf{Y}\,|\,\mathbf{X}) + \log p(\mathbf{X}) + C, \tag{5}$$
where $C$ represents a constant term. From these equations, the selection of the kernel function $k(\cdot,\cdot)$ and the prior $p(\mathbf{X})$ significantly influences the quality of the derived latent variables. Furthermore, the evaluation of Equation (4) contains the inversion of an $N \times N$ matrix $\mathbf{K}_{NN}$, which requires cubic time complexity. This complexity limits the scalability of the model, and the original GP-LVM cannot handle datasets with thousands of samples. Therefore, we use the inducing-point method [47] to evaluate the likelihood function and describe it in Section 4.
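To make the computation in Equation (4) concrete, the following minimal NumPy sketch evaluates the GP-LVM log-likelihood for given latent variables by the naive O(N³) route; the function names, the SE kernel choice, and the random initialization are our own illustrative assumptions rather than the implementation used in this paper.

```python
import numpy as np

def se_kernel(X1, X2, sigma=1.0, lengthscale=1.0):
    # Squared exponential kernel (Equation (6)), used here only as an example choice.
    sq_dists = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return sigma**2 * np.exp(-0.5 * sq_dists / lengthscale**2)

def gplvm_log_likelihood(Y, X, kernel, beta=1.0):
    """Naive O(N^3) evaluation of log p(Y | X) in Equation (4)."""
    N, D = Y.shape
    C = kernel(X, X) + np.eye(N) / beta            # K_NN + beta^{-1} I
    _, logdet = np.linalg.slogdet(C)
    Cinv_Y = np.linalg.solve(C, Y)                 # (K_NN + beta^{-1} I)^{-1} Y
    return (-0.5 * N * D * np.log(2.0 * np.pi)
            - 0.5 * D * logdet
            - 0.5 * np.trace(Y.T @ Cinv_Y))        # tr((K_NN + beta^{-1} I)^{-1} Y Y^T)

# Toy usage: random data and a rough linear initialization of the latents.
rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 20))
X = 0.1 * (Y @ rng.normal(size=(20, 2)))
print(gplvm_log_likelihood(Y, X, se_kernel))
```

In practice, the latent variables would be initialized with PCA, as noted in Section 5.1.2, and the latent variables and kernel parameters would then be optimized jointly.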

3.2. Kernel Function

The kernel function can be considered a similarity measure of two inputs $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^Q$ and must be a positive definite function. The most well-known kernel function is the SE or radial basis function (RBF) kernel, defined as follows:
$$k_{SE}(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\!\left(-\frac{1}{2l^2}\,\|\mathbf{x} - \mathbf{x}'\|^2\right), \tag{6}$$
where $\sigma$ indicates a variance and $l$ indicates a lengthscale parameter. Another popular kernel is the Matérn kernel, a generalization of the SE kernel with an additional smoothness parameter. The Matérn kernel is given as
$$k_{\text{Matérn}}(\mathbf{x}, \mathbf{x}') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,\|\mathbf{x} - \mathbf{x}'\|}{\kappa}\right)^{\nu} K_{\nu}\!\left(\frac{\sqrt{2\nu}\,\|\mathbf{x} - \mathbf{x}'\|}{\kappa}\right), \tag{7}$$
where $\nu$ and $\kappa$ denote kernel parameters, $\Gamma(\cdot)$ denotes the gamma function, and $K_{\nu}(\cdot)$ denotes the modified Bessel function of the second kind. $\nu$ is generally set to $\frac{3}{2}$ or $\frac{5}{2}$, for which the kernel takes the following simple forms:
$$k_{\text{Matérn}\,3/2}(\mathbf{x}, \mathbf{x}') = \sigma^2 \left(1 + \frac{\sqrt{3}\,\|\mathbf{x} - \mathbf{x}'\|}{\kappa}\right) \exp\!\left(-\frac{\sqrt{3}\,\|\mathbf{x} - \mathbf{x}'\|}{\kappa}\right), \tag{8}$$
$$k_{\text{Matérn}\,5/2}(\mathbf{x}, \mathbf{x}') = \sigma^2 \left(1 + \frac{\sqrt{5}\,\|\mathbf{x} - \mathbf{x}'\|}{\kappa} + \frac{5\,\|\mathbf{x} - \mathbf{x}'\|^2}{3\kappa^2}\right) \exp\!\left(-\frac{\sqrt{5}\,\|\mathbf{x} - \mathbf{x}'\|}{\kappa}\right). \tag{9}$$
Furthermore, as $\nu \to \infty$, the Matérn kernel converges to the SE kernel in Equation (6). Although the SE and Matérn kernels have different expressions, both are homogeneous kernels, i.e., $k(\mathbf{x}, \mathbf{x}') = k(\|\mathbf{x} - \mathbf{x}'\|)$. Stationary kernels, for which $k(\mathbf{x}, \mathbf{x}') = k(\mathbf{x} - \mathbf{x}')$, form a broader class containing the homogeneous kernels. The SM kernel [30] is a stationary kernel and can measure the periodicity of input signals. The SM kernel is defined by the following equation:
$$k_{SM}(\mathbf{x}, \mathbf{x}') = \sum_{p=1}^{N_w} w_p \prod_{q=1}^{Q} \exp\!\big(-2\pi^2 (x_q - x'_q)^2 v_q^{(p)}\big)\,\cos\!\big(2\pi (x_q - x'_q)\,\mu_q^{(p)}\big), \tag{10}$$
where $N_w$ denotes the number of mixtures, $w_p$ denotes the weight parameter of each mixture, $v_q^{(p)}$ denotes the lengthscale parameter of dimension $q$ of mixture $p$, and $\mu_q^{(p)}$ denotes the mean parameter of dimension $q$ of mixture $p$. We collectively denote $\boldsymbol{\mu}_p = [\mu_1^{(p)}, \mu_2^{(p)}, \ldots, \mu_Q^{(p)}]^\top \in \mathbb{R}^Q$. The SM kernel is derived from Bochner's theorem, which relates any stationary kernel to its spectral density. On this basis, $\mu_q^{(p)}$ indicates the center of a component in the spectral domain, and the SE kernel is the special case of the SM kernel with $\mu_q^{(p)} = 0$. The SM kernel measures the similarity between two inputs through the distance $\|\mathbf{x} - \mathbf{x}'\|$ and the periodicity represented by the cosine function of the spectral means $\boldsymbol{\mu}_p$. By optimizing these parameters, we can find the optimal centers in the spectral domain from the input signals and derive a data-specific expression of the kernel. For an intuitive comparison of these kernels, we present samples from $\mathcal{GP}(\mathbf{0}, k(\cdot,\cdot))$ for each kernel in Figure 1.
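For reference, the sketch below evaluates the SM kernel of Equation (10) for two sets of Q-dimensional inputs with NumPy; the parameter shapes and names (weights, v, mu) are our own assumptions for illustration.

```python
import numpy as np

def sm_kernel(X1, X2, weights, v, mu):
    """Spectral mixture kernel of Equation (10).

    X1: (N1, Q) and X2: (N2, Q) inputs.
    weights: (P,) mixture weights w_p.
    v: (P, Q) lengthscale parameters v_q^(p).
    mu: (P, Q) spectral mean parameters mu_q^(p).
    """
    tau = X1[:, None, :] - X2[None, :, :]                    # (N1, N2, Q) differences
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for p in range(len(weights)):
        envelope = np.exp(-2.0 * np.pi**2 * tau**2 * v[p])   # Gaussian envelope per dimension
        periodic = np.cos(2.0 * np.pi * tau * mu[p])         # cosine factor per dimension
        K += weights[p] * np.prod(envelope * periodic, axis=-1)
    return K

# Toy usage with two mixtures on 2-D latent inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
w = np.array([1.0, 0.5])
v = np.abs(rng.normal(size=(2, 2)))
mu = np.abs(rng.normal(size=(2, 2)))
print(sm_kernel(X, X, w, v, mu).shape)   # (5, 5)
```

Setting mu to zeros recovers an SE-like kernel up to the parameterization of the lengthscales, mirroring the special-case relationship noted above.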

3.3. Uniform Manifold Approximation and Projection

The approximation of the data manifold is crucial for graph-based dimensionality reduction approaches. UMAP is a state-of-the-art method that incorporates ideas from algebraic topology and category theory. As a result, the approximation of the data manifold reduces to a $k$-nearest-neighbor graph computation, which is computationally efficient because of the very sparse structure of the graph. Specifically, UMAP first calculates a weight matrix $\mathbf{W}_{n|m}$ as follows:
$$[\mathbf{W}_{n|m}]_{ij} = \begin{cases} \exp\!\left(-\dfrac{d(\mathbf{y}_i, \mathbf{y}_j) - \rho_i}{\sigma_i}\right) & (j \in \{i_1, i_2, \ldots, i_k\}) \\ 0 & (\text{otherwise}), \end{cases} \tag{11}$$
where $[\mathbf{W}_{n|m}]_{ij}$ denotes the $(i,j)$ element of $\mathbf{W}_{n|m}$, $i_l\ (l = 1, 2, \ldots, k)$ denotes the $l$-th nearest neighbor of $\mathbf{y}_i$, $d(\cdot,\cdot)$ denotes an arbitrary distance function, $\rho_i = d(\mathbf{y}_i, \mathbf{y}_{i_1})$ denotes the nearest-neighbor distance, and $\sigma_i$ denotes the local connectivity around $\mathbf{y}_i$. The weight matrix represents the similarities between samples, and UMAP approximates the data manifold as follows:
$$\mathbf{W} = \mathbf{W}_{n|m} + \mathbf{W}_{m|n} - \mathbf{W}_{n|m} \circ \mathbf{W}_{m|n}, \tag{12}$$
where $\circ$ denotes the Hadamard product and $\mathbf{W}_{m|n} = \mathbf{W}_{n|m}^\top$. Equation (12) corresponds to the symmetrization of $\mathbf{W}_{n|m}$, and $\mathbf{W}$ is called the fuzzy topological representation of the observed variables $\mathbf{Y}$. $\mathbf{W}$ is a locally estimated data manifold that contains local information about the data. UMAP then minimizes the cross-entropy between the fuzzy topological representation and the corresponding local structure of the low-dimensional space. In this study, we focus on the estimated manifold $\mathbf{W}$ and regularize the embedding with $\mathbf{W}$ to improve the visibility of the representation.
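The sketch below computes the fuzzy topological representation of Equations (11) and (12) from a k-nearest-neighbor search; for brevity it replaces UMAP's binary search for the local connectivity σ_i with a simple heuristic, so it should be read as an approximation of the actual UMAP procedure rather than a faithful reimplementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fuzzy_topological_representation(Y, k=15):
    """Locally estimated manifold W from Equations (11) and (12)."""
    N = Y.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Y)
    dists, idx = nn.kneighbors(Y)            # column 0 is the query point itself
    dists, idx = dists[:, 1:], idx[:, 1:]
    rho = dists[:, 0]                        # nearest-neighbor distance rho_i
    # Heuristic stand-in for UMAP's binary-searched local connectivity sigma_i.
    sigma = np.maximum(dists.mean(axis=1) - rho, 1e-12)
    W_cond = np.zeros((N, N))                # conditional weights W_{n|m}
    for i in range(N):
        W_cond[i, idx[i]] = np.exp(-np.maximum(dists[i] - rho[i], 0.0) / sigma[i])
    # Symmetrization of Equation (12): W = W_{n|m} + W_{m|n} - W_{n|m} o W_{m|n}.
    return W_cond + W_cond.T - W_cond * W_cond.T

W = fuzzy_topological_representation(np.random.default_rng(0).normal(size=(200, 10)))
```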

4. Proposed Method

In Section 4.1, we define our model based on GP-LVMs and introduce the regularization with the graph Laplacian based on GPLRF. In Section 4.2, we derive the scalable lower bound of the log-posterior distribution, which enables our model to deal with a larger dataset than the original GP-LVM. In Section 4.3, we present the optimization method of our model and extend the SM kernel for the use of latent variable models.

4.1. Model Formulation

As in GP-LVMs, we define the generation process from the latent space to the observed space through the Gaussian process. Furthermore, we establish a prior distribution of latent variables by applying the GPLRF scheme in an unsupervised manner. The model of our method is as follows:
$$\mathbf{y}_{:,d} = f(\mathbf{X}) + \boldsymbol{\epsilon}, \tag{13}$$
$$\mathbf{x}_{:,q} \sim \mathcal{N}(\mathbf{0}, \alpha^{-1}\mathbf{L}^{\dagger}), \tag{14}$$
$$f \sim \mathcal{GP}(\mathbf{0}, k_{SM}(\cdot,\cdot)), \tag{15}$$
$$\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \beta^{-1}\mathbf{I}), \tag{16}$$
where $\mathbf{x}_{:,q} \in \mathbb{R}^N$ denotes the $q$-th column of $\mathbf{X}$, $\alpha$ denotes a hyperparameter that controls the strength of the regularization, $\mathbf{L} \in \mathbb{R}^{N \times N}$ is the graph Laplacian of the fuzzy topological representation $\mathbf{W}$ in Equation (12), and $\dagger$ denotes the pseudoinverse. From these definitions, we estimate the latent variables and kernel parameters by MAP estimation, and the objective function is given as follows:
$$\log p(\mathbf{X}\,|\,\mathbf{Y}) = \log p(\mathbf{Y}\,|\,\mathbf{X}) + \log p(\mathbf{X}) + C = \sum_{d=1}^{D} \log p(\mathbf{y}_{:,d}\,|\,\mathbf{X}) + \sum_{q=1}^{Q} \log p(\mathbf{x}_{:,q}) + C, \tag{17}$$
where
$$p(\mathbf{x}_{:,q}) = \mathcal{N}(\mathbf{x}_{:,q}\,|\,\mathbf{0}, \alpha^{-1}\mathbf{L}^{\dagger}). \tag{18}$$
The novelty of our method is the use of the expressive kernel in Equation (15) and of the prior distribution utilizing the locally estimated manifold $\mathbf{W}$ in Equation (14). We further explain the effect of the prior distribution in Equation (18). The log-prior term in Equation (17) can be written as follows:
$$\log p(\mathbf{X}) = -\frac{\alpha}{2} \sum_{q=1}^{Q} \mathbf{x}_{:,q}^\top \mathbf{L}\, \mathbf{x}_{:,q} + C = -\frac{\alpha}{4} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, \|\mathbf{x}_i - \mathbf{x}_j\|^2 + C, \tag{19}$$
where $C$ denotes a constant. From Equation (19), the log-prior term equals the negative weighted sum of squared distances between samples that are connected on the neighborhood graph. Maximizing the log-posterior therefore pulls connected samples closer together, so the local information in $\mathbf{W}$ is reflected in the latent representation. In Equation (17), the first term is equal to Equation (4) and requires $O(N^3)$ time complexity. In the following section, we introduce the inducing-point method [47] into Equation (17) and derive a scalable objective function.
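To illustrate Equations (14) and (19), the following sketch builds the graph Laplacian of W and evaluates the log-prior term as a function of the latent variables; the dense matrices and the numerical check are for clarity only, and a practical implementation would use sparse operations.

```python
import numpy as np

def graph_laplacian(W):
    # L = D - W, where D is the degree matrix of the neighborhood graph.
    return np.diag(W.sum(axis=1)) - W

def log_prior(X, L, alpha):
    """Log-prior of Equation (19) up to an additive constant: -(alpha/2) tr(X^T L X)."""
    return -0.5 * alpha * np.trace(X.T @ L @ X)

# Numerical check against the pairwise form -(alpha/4) sum_ij w_ij ||x_i - x_j||^2.
rng = np.random.default_rng(0)
W = rng.random((5, 5)); W = 0.5 * (W + W.T); np.fill_diagonal(W, 0.0)
X = rng.normal(size=(5, 2))
L = graph_laplacian(W)
pairwise = -0.25 * sum(W[i, j] * np.sum((X[i] - X[j]) ** 2)
                       for i in range(5) for j in range(5))
assert np.isclose(log_prior(X, L, alpha=1.0), pairwise)
```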

4.2. Lower Bound of Log-Posterior Distribution

The inducing-point method [47] uses the properties of conditional Gaussian distributions and replaces the inversion of a large matrix with that of a small one. Specifically, we define a small number of inducing points $\mathbf{u} \in \mathbb{R}^M$ and their positions $\mathbf{Z} = [\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_M]^\top \in \mathbb{R}^{M \times Q}$ ($M$ being the number of inducing points) in the latent space and evaluate the log-likelihood conditioned on these inducing positions, $\log p(\mathbf{y}_{:,d}\,|\,\mathbf{X}, \mathbf{Z})$. First, the joint distribution of $\mathbf{y}_{:,d}$, $\mathbf{f}$, and $\mathbf{u}$ is given as follows:
$$p(\mathbf{y}_{:,d}, \mathbf{f}, \mathbf{u}\,|\,\mathbf{X}, \mathbf{Z}) = p(\mathbf{y}_{:,d}\,|\,\mathbf{f})\, p(\mathbf{f}\,|\,\mathbf{u}, \mathbf{X}, \mathbf{Z})\, p(\mathbf{u}\,|\,\mathbf{Z}). \tag{20}$$
Each factor is given as the following equation:
$$p(\mathbf{y}_{:,d}\,|\,\mathbf{f}) = \mathcal{N}(\mathbf{y}_{:,d}\,|\,\mathbf{f}, \beta^{-1}\mathbf{I}), \tag{21}$$
$$p(\mathbf{f}\,|\,\mathbf{u}, \mathbf{X}, \mathbf{Z}) = \mathcal{N}(\mathbf{f}\,|\,\mathbf{K}_{NM}\mathbf{K}_{MM}^{-1}\mathbf{u},\ \mathbf{K}_{NN} - \mathbf{K}_{NM}\mathbf{K}_{MM}^{-1}\mathbf{K}_{MN}), \tag{22}$$
$$p(\mathbf{u}\,|\,\mathbf{Z}) = \mathcal{N}(\mathbf{u}\,|\,\mathbf{0}, \mathbf{K}_{MM}), \tag{23}$$
where $\mathbf{K}_{NM} = \mathbf{K}_{MN}^\top \in \mathbb{R}^{N \times M}$ denotes the cross-covariance matrix whose $ij$-th entry is $k_{SM}(\mathbf{x}_i, \mathbf{z}_j)$ and $\mathbf{K}_{MM} \in \mathbb{R}^{M \times M}$ is the Gram matrix whose $ij$-th entry is $k_{SM}(\mathbf{z}_i, \mathbf{z}_j)$. To evaluate $\log p(\mathbf{y}_{:,d}\,|\,\mathbf{X}, \mathbf{Z})$, we need to marginalize the Gaussian process prior $\mathbf{f}$ and the inducing points $\mathbf{u}$. The variational method [47] applies Jensen's inequality to lower-bound the log-likelihood as follows:
$$\log p(\mathbf{y}_{:,d}\,|\,\mathbf{X}, \mathbf{Z}) = \log \int \left[ \int p(\mathbf{y}_{:,d}\,|\,\mathbf{f})\, p(\mathbf{f}\,|\,\mathbf{u}, \mathbf{X}, \mathbf{Z})\, d\mathbf{f} \right] p(\mathbf{u}\,|\,\mathbf{Z})\, d\mathbf{u} \;\geq\; \log \int \exp\!\left( \int \log p(\mathbf{y}_{:,d}\,|\,\mathbf{f})\, p(\mathbf{f}\,|\,\mathbf{u}, \mathbf{X}, \mathbf{Z})\, d\mathbf{f} \right) p(\mathbf{u}\,|\,\mathbf{Z})\, d\mathbf{u} \;\triangleq\; \mathcal{L}_d. \tag{24}$$
We evaluate the lower bound $\mathcal{L}_d$ in Equation (24) instead of the original log-likelihood. $\mathcal{L}_d$ can be computed in closed form as follows:
$$\mathcal{L}_d = -\frac{N}{2}\log 2\pi - \frac{1}{2}\log\big|\mathbf{Q}_{NN} + \beta^{-1}\mathbf{I}\big| - \frac{1}{2}\mathbf{y}_{:,d}^\top (\mathbf{Q}_{NN} + \beta^{-1}\mathbf{I})^{-1}\mathbf{y}_{:,d} - \frac{\beta}{2}\mathrm{tr}(\mathbf{K}_{NN} - \mathbf{Q}_{NN}), \tag{25}$$
where $\mathbf{Q}_{NN} = \mathbf{K}_{NM}\mathbf{K}_{MM}^{-1}\mathbf{K}_{MN}$. Comparing Equation (25) with Equation (4), the $N \times N$ matrix $\mathbf{K}_{NN}$ is replaced by $\mathbf{Q}_{NN}$, which is a low-rank approximation of the original matrix $\mathbf{K}_{NN}$. Therefore, by applying the matrix inversion lemma to Equation (25), we can evaluate the lower bound of the likelihood in $O(NM^2)$ time, which is more efficient than the likelihood function in Equation (4). Substituting Equation (25) into Equation (17) and neglecting constants, we obtain the following objective function $\mathcal{L}$:
$$\mathcal{L} = -\frac{D}{2}\log\big|\mathbf{Q}_{NN} + \beta^{-1}\mathbf{I}\big| - \frac{1}{2}\mathrm{tr}\big((\mathbf{Q}_{NN} + \beta^{-1}\mathbf{I})^{-1}\mathbf{Y}\mathbf{Y}^\top\big) - \frac{\beta D}{2}\mathrm{tr}(\mathbf{K}_{NN} - \mathbf{Q}_{NN}) - \frac{\alpha Q}{2}\mathrm{tr}\big(\mathbf{L}\mathbf{X}\mathbf{X}^\top\big). \tag{26}$$
We maximize the lower bound of the likelihood function in Equation (26) and explain the methodology of the optimization in the following section.
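A minimal sketch of evaluating the bound of Equation (25) with O(NM²) cost is shown below; it relies on the Woodbury identity and the matrix determinant lemma applied to Q_NN + β⁻¹I, and the function and variable names are our own assumptions rather than the paper's implementation.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def sparse_lower_bound(y, Knn_diag, Knm, Kmm, beta, jitter=1e-6):
    """Lower bound L_d of Equation (25), evaluated in O(N M^2).

    y: (N,) one column of Y.   Knn_diag: (N,) diagonal of K_NN.
    Knm: (N, M) cross-covariance.   Kmm: (M, M) inducing covariance.
    """
    N, M = Knm.shape
    Kmm_j = Kmm + jitter * np.eye(M)
    B = Kmm_j + beta * Knm.T @ Knm                     # K_MM + beta * K_MN K_NM
    cB, cK = cho_factor(B), cho_factor(Kmm_j)
    # log|Q_NN + beta^{-1} I| via the matrix determinant lemma.
    logdet = (-N * np.log(beta)
              + np.linalg.slogdet(B)[1] - np.linalg.slogdet(Kmm_j)[1])
    # Quadratic term via the Woodbury identity.
    Kmn_y = Knm.T @ y
    quad = beta * (y @ y) - beta**2 * (Kmn_y @ cho_solve(cB, Kmn_y))
    # Trace correction tr(K_NN - Q_NN), needing only the diagonal of K_NN.
    trace = Knn_diag.sum() - np.trace(cho_solve(cK, Knm.T @ Knm))
    return (-0.5 * N * np.log(2.0 * np.pi) - 0.5 * logdet
            - 0.5 * quad - 0.5 * beta * trace)
```

Summing this bound over the D columns of Y and adding the log-prior term of Equation (19) yields the objective of Equation (26).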

4.3. Optimization

The objective function in Equation (26) is a continuous function of the latent variables and kernel parameters and can be maximized by gradient-based optimization, such as conjugate gradient descent [51] or the quasi-Newton method [52]. The gradients are derived via the chain rule of differentiation, and we therefore need the derivative of the SM kernel with respect to the latent variables. This derivative is given as follows:
$$\frac{\partial}{\partial x_q} k_{SM}(\mathbf{x}, \mathbf{x}') = \sum_{p=1}^{N_w} w_p \left[ -4\pi^2 (x_q - x'_q)\, v_q^{(p)} - 2\pi \mu_q^{(p)} \tan\!\big(2\pi (x_q - x'_q)\,\mu_q^{(p)}\big) \right] \times \prod_{q'=1}^{Q} \exp\!\big(-2\pi^2 (x_{q'} - x'_{q'})^2 v_{q'}^{(p)}\big)\cos\!\big(2\pi (x_{q'} - x'_{q'})\,\mu_{q'}^{(p)}\big). \tag{27}$$
Furthermore, the derivatives with respect to the kernel parameters $w_p$, $v_q^{(p)}$, and $\mu_q^{(p)}$ are given as follows:
$$\frac{\partial}{\partial w_p} k_{SM}(\mathbf{x}, \mathbf{x}') = \prod_{q=1}^{Q} \exp\!\big(-2\pi^2 (x_q - x'_q)^2 v_q^{(p)}\big)\cos\!\big(2\pi (x_q - x'_q)\,\mu_q^{(p)}\big), \tag{28}$$
$$\frac{\partial}{\partial v_q^{(p)}} k_{SM}(\mathbf{x}, \mathbf{x}') = -2\pi^2 (x_q - x'_q)^2 \exp\!\big(-2\pi^2 (x_q - x'_q)^2 v_q^{(p)}\big)\cos\!\big(2\pi (x_q - x'_q)\,\mu_q^{(p)}\big), \tag{29}$$
$$\frac{\partial}{\partial \mu_q^{(p)}} k_{SM}(\mathbf{x}, \mathbf{x}') = -2\pi (x_q - x'_q) \tan\!\big(2\pi (x_q - x'_q)\,\mu_q^{(p)}\big) \times \exp\!\big(-2\pi^2 (x_q - x'_q)^2 v_q^{(p)}\big)\cos\!\big(2\pi (x_q - x'_q)\,\mu_q^{(p)}\big). \tag{30}$$
We optimize the parameters using Equations (27)–(30) together with $\partial \mathcal{L} / \partial \mathbf{K}_{MN}$ and $\partial \mathcal{L} / \partial \mathbf{K}_{MM}$, which have been derived in previous research [48]. After optimization, we obtain the optimal latent variables $\mathbf{X}$, spectral means $\mu_q^{(p)}$, weights $w_p$, and lengthscales $v_q^{(p)}$. As a result, we reduce the number of hyperparameters that must be tuned by hand and improve the tractability of our model.
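Because the trigonometric factors make these gradients easy to get wrong, a finite-difference check is a useful safeguard; the sketch below compares the derivative of Equation (27), specialized to a one-dimensional single-mixture kernel and written in the equivalent sine form to avoid the tangent singularity, against a numerical derivative (all names are illustrative).

```python
import numpy as np

def k_sm_1d(x, xp, w, v, mu):
    # Single-mixture, 1-D spectral mixture kernel (special case of Equation (10)).
    tau = x - xp
    return w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)

def dk_dx_analytic(x, xp, w, v, mu):
    # Equation (27) with Q = 1 and N_w = 1, using sin instead of cos * tan.
    tau = x - xp
    envelope = np.exp(-2.0 * np.pi**2 * tau**2 * v)
    return w * envelope * (-4.0 * np.pi**2 * tau * v * np.cos(2.0 * np.pi * tau * mu)
                           - 2.0 * np.pi * mu * np.sin(2.0 * np.pi * tau * mu))

x, xp, w, v, mu, eps = 0.3, -0.2, 1.5, 0.8, 1.1, 1e-6
numeric = (k_sm_1d(x + eps, xp, w, v, mu) - k_sm_1d(x - eps, xp, w, v, mu)) / (2.0 * eps)
assert np.isclose(dk_dx_analytic(x, xp, w, v, mu), numeric, atol=1e-5)
```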

5. Experiment

In this section, we validate our method through qualitative and quantitative experiments. We implemented our method with GPy [53], an open library for the Gaussian process, and ran it on an Intel Core i7-10700 CPU and 16 GB random access memory.

5.1. Experimental Setup

5.1.1. Dataset

We used four real-world datasets. Table 1 summarizes the datasets used in this study.
  • MNIST (http://yann.lecun.com/exdb/mnist/ (accessed on 25 July 2022)) contains 70K images of hand-written digits and labels corresponding to each digit. We randomly selected 20K images and colored the embeddings according to the corresponding labels. MNIST contains a cluster structure for each digit, and the embedding should preserve this cluster structure.
  • COIL-20 [54] contains 1440 grayscale images of rotated objects. We used the first ten objects and colored the embeddings according to each object. The images of COIL-20 have a rotating structure, and the embedding should preserve it.
  • DBPedia (https://wiki.dbpedia.org/ (accessed on 29 July 2022)) contains 530K Wikipedia articles classified into 14 categories. We used 20K random articles and extracted feature vectors with FastText following [37]. We colored the embedding according to the category information, and the low-dimensional representation should preserve the cluster structure.
  • Fashion MNIST (FMNIST) [55] contains 70K images of 10 kinds of fashion items and labels corresponding to each item. We randomly selected 20K images and colored the embeddings according to the corresponding labels. Although FMNIST has a cluster structure similar to MNIST, the categories of FMNIST are more correlated than those of MNIST, and separating the clusters in a low-dimensional representation is more difficult.

5.1.2. Comparative Methods

To validate the novelty of our method, we compared the proposed method (PM) with its version without the SM kernel in Equation (10) (PM w/o SM kernel) and without the prior in Equation (18) (PM w/o prior). Furthermore, we compared our method with the following comparative methods.
  • PCA [3] is a classical method for dimensionality reduction and linearly derives its embedding. We compared our method with PCA as a benchmark method. Note that GP-LVM-based methods typically use PCA as the initial values of their latent variables.
  • LE [34] is a classical graph-based approach and derives its low-dimensional embedding as the eigenvectors of the graph Laplacian. We used LE as the benchmark in the graph-based approaches.
  • t-SNE [10,11] is a long-standing graph-based data visualization technique that derives the low-dimensional representation by minimizing the KL divergence between the observed and low-dimensional spaces. We used t-SNE as the baseline method of the graph-based dimensionality reduction approaches.
  • Bayesian GP-LVM (BGP-LVM) [16,25] is a GP-LVM that introduces Bayesian inference over the latent variables. We used BGP-LVM as a baseline method and visualized the posterior mean vectors as the low-dimensional representation. We used the RBF kernel in Equation (6) and a standard normal prior distribution, following the original work [16].
  • Potential of heat diffusion for affinity-based transition embedding (PHATE) [41] is a recently proposed graph-based approach that derives the neighborhood graph on the basis of a diffusion operation [56], enabling global preservation of the data. We used PHATE as a state-of-the-art method among those aimed at preserving global structures.

5.1.3. Evaluation

We used average-based and correlation-based metrics to evaluate the local and global preservation quality, respectively. The average-based metric is calculated by averaging the preservation quality around each data point and evaluates the local preservation around them. The correlation-based metric is a correlation value between the observed and embedding spaces and quantifies the global similarity between the two spaces. We used (1) trustworthiness [57] as the average-based metric and (2) Shepard goodness [58] as the correlation-based one, both of which have been used in previous studies [58,59]. Trustworthiness $T(k) \in [0,1]$ is defined as follows:
$$T(k) = 1 - \frac{2}{Nk(2N - 3k - 1)} \sum_{i=1}^{N} \sum_{j \in \mathcal{N}_i^k} \max\big(0,\ r(i,j) - k\big), \tag{31}$$
where $\mathcal{N}_i^k$ denotes the index list of the $k$-nearest neighbors of sample $i$ in the low-dimensional space, and $r(i,j)$ is the rank of sample $j$ among the nearest neighbors of sample $i$ in the observed space. Therefore, trustworthiness penalizes unexpected neighbors in the low-dimensional space and evaluates the preservation quality of the local structures of data. We set the number of neighbors to $k = 5$ but observed that the results are robust to this setting. Shepard goodness is a global metric and is calculated from the Shepard diagram [60]. The Shepard diagram is a scatterplot of the pairwise distances of all points in the observed and embedding spaces. Shepard goodness is the Spearman rank correlation of the Shepard diagram and evaluates the global similarity between the original data and the embedding. It is computed as follows:
$$\rho = 1 - \frac{6 \sum_{n=1}^{N_s} D_n^2}{N_s^3 - N_s}, \tag{32}$$
where $N_s = \frac{N(N-1)}{2}$ is the number of points on the Shepard diagram and $D_n$ is the difference between the ranks of the two coordinates of the $n$-th point on the Shepard diagram. Both metrics are within $[0,1]$, and a higher value indicates a better result. Importantly, the global and local metrics trade off against each other and should be balanced to realize reliable visualization.
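For reproducibility, a small sketch of both metrics follows; it uses scikit-learn's neighbor search and SciPy's Spearman correlation, and for datasets of the sizes in Table 1 the pairwise distances would need to be subsampled (these implementation details are our assumptions, not a description of the original evaluation code).

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.neighbors import NearestNeighbors

def trustworthiness(Y, X, k=5):
    """Trustworthiness T(k) of Equation (31); O(N^2) memory, for small N only."""
    N = Y.shape[0]
    # Rank r(i, j) of each sample j as a neighbor of i in the observed space.
    order_high = np.argsort(np.linalg.norm(Y[:, None] - Y[None, :], axis=-1), axis=1)
    ranks = np.argsort(order_high, axis=1)
    nn_low = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx_low = nn_low.kneighbors(X, return_distance=False)[:, 1:]   # k-NN in the embedding
    penalty = sum(max(0, ranks[i, j] - k) for i in range(N) for j in idx_low[i])
    return 1.0 - 2.0 / (N * k * (2 * N - 3 * k - 1)) * penalty

def shepard_goodness(Y, X):
    """Spearman rank correlation of the Shepard diagram (Equation (32))."""
    rho, _ = spearmanr(pdist(Y), pdist(X))
    return rho

rng = np.random.default_rng(0)
Y = rng.normal(size=(300, 20))
X = Y[:, :2]                       # toy "embedding": the first two coordinates
print(trustworthiness(Y, X, k=5), shepard_goodness(Y, X))
```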

5.1.4. Training Procedure

Our method optimizes the latent variables and kernel parameters by maximizing the lower bound in Equation (26). The hyperparameters of our method are the regularization strength $\alpha$ and the number of inducing points $M$. Because the number of dimensions $D$ scales the log-likelihood terms in Equation (26), we set $\alpha$ proportional to the dimensionality, $\alpha = 0.1D$. The number of inducing points significantly impacts the quality of the low-dimensional representation and can be chosen as a small natural number (e.g., 10–50); we therefore selected $M$ adaptively for each dataset. We optimized the objective function in Equation (26) using the L-BFGS-B implementation [52] in SciPy [61].
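The overall optimization loop can be written with SciPy's L-BFGS-B as in the sketch below, which flattens the latent variables into a single vector and relies on numerical gradients for brevity instead of Equations (27)–(30); the placeholder objective and all names are our own assumptions, with the real objective being the bound of Equation (26) over both the latents and the kernel parameters.

```python
import numpy as np
from scipy.optimize import minimize

def fit_latents(Y, objective, Q=2, seed=0, maxiter=200):
    """Maximize objective(X, Y) over the latent variables X with L-BFGS-B.

    `objective` is assumed to return the scalar lower bound of Equation (26);
    kernel parameters are omitted here to keep the sketch short.
    """
    N = Y.shape[0]
    rng = np.random.default_rng(seed)
    x0 = 1e-2 * rng.normal(size=N * Q)               # in practice, initialize with PCA

    def negative_bound(x_flat):
        return -objective(x_flat.reshape(N, Q), Y)   # SciPy minimizes, so negate

    res = minimize(negative_bound, x0, method="L-BFGS-B", options={"maxiter": maxiter})
    return res.x.reshape(N, Q)

# Toy usage with a stand-in objective; replace it with the bound of Equation (26).
Y = np.random.default_rng(1).normal(size=(50, 5))
toy_objective = lambda X, Y: -np.sum((X @ X.T - Y @ Y.T) ** 2)
X_opt = fit_latents(Y, toy_objective)
```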

5.2. Ablation Study

Figure 2 shows the ablation results on MNIST. We observed that the low-dimensional representation of PM w/o SM kernel split individual clusters into separate parts, whereas PM and PM w/o prior kept each cluster properly connected. PM and PM w/o prior use the SM kernel, which accounts for the spectral density of the latent variables and contributes to the proper embedding of the cluster structure. Although the effect of the prior distribution is unclear from the MNIST results, it is demonstrated in the COIL-20 results in Figure 3: PM w/o prior does not preserve the object variation and rotation structure because it does not consider local structure. From these comparisons, we confirm the effectiveness of both proposed components.

5.3. Qualitative Results

Figure 4 shows the visualization results of the comparative methods. On MNIST, PCA and LE do not reflect the cluster structure corresponding to each digit, which these simpler embeddings struggle to capture. In contrast, the embeddings of the nonlinear methods t-SNE, BGP-LVM, PHATE, and PM contain several cluster structures and successfully preserve the local structure of the data. In the COIL-20 results, PCA and LE do not preserve the object variation. t-SNE, BGP-LVM, and PM retain the object variation, and PM retains the rotation structure more faithfully than t-SNE and BGP-LVM, which produce several linear, separated clusters. PHATE embeds COIL-20 with complete object variation and rotation structure, owing to its graph embedding optimization. On DBPedia, PCA and LE also fail to preserve local clusters. t-SNE, BGP-LVM, PHATE, and PM preserve them, but their shapes differ, and their correctness needs to be evaluated with quantitative metrics. On FMNIST, although none of the embeddings fully preserve the cluster structure, t-SNE, PHATE, and PM split the data into several independent clusters, which is beneficial for gaining insight into the data structure.

5.4. Quantitative Results

Table 2 and Table 3 show the quantitative results for trustworthiness and Shepard goodness on each dataset, respectively. We first evaluate these results by separating them into graph-based methods (LE, t-SNE, and PHATE) and PCA-based methods (PCA and BGP-LVM). LE, t-SNE, and PHATE exhibit high values of trustworthiness (the local metric) but low values of Shepard goodness (the global metric). They embed a neighborhood graph, i.e., a local estimate of the given data, and thus over-reflect the local structures in their low-dimensional representations. This drawback results in misleading visualizations due to the lack of global preservation, such as the shapes of the clusters produced for DBPedia by t-SNE and PHATE in Figure 4. In contrast, PCA and BGP-LVM, especially PCA, produce reliable reductions in terms of global preservation with high Shepard goodness values, but they do not show good local preservation, resulting in less visible representations. Our method exhibits high values for both trustworthiness and Shepard goodness, indicating that it balances global and local structure preservation, i.e., it ensures both the reliability and visibility of the low-dimensional embedding. Our method is based on GP-LVMs and regularizes the latent variables with a neighborhood graph; this combination contributes to retaining both local and global structures and achieving high reliability and visibility of the low-dimensional representations. From the above, we confirm the effectiveness of our method quantitatively.

6. Conclusions

In this study, we proposed a novel visualization-aided dimensionality reduction technique based on the Gaussian process. Our proposed method derives the latent representation using an expressive kernel function and a regularization term based on a graph Laplacian that retains the local structure of the data. Furthermore, we introduced a sparse approximation and realized scalable optimization with a lower bound of the log-posterior distribution. Our method preserves the global structure of data through its Gaussian-process-based formulation and the local structure through a locally estimated data manifold. In the experiments, we validated our method on multiple datasets and confirmed its effectiveness qualitatively and quantitatively. However, graph embedding methods still achieve better visibility of low-dimensional representations, and PCA-based methods have limited scalability. In future work, we will incorporate local information and scalable optimization in a more efficient way to improve the applicability of our approach to visualization-aided dimensionality reduction.

Author Contributions

Conceptualization, K.W., K.M., T.O. and M.H.; methodology, K.W., K.M. and T.O.; software, K.W.; validation, K.W.; data curation, K.W.; writing—original draft preparation, K.W.; writing—review and editing, K.M., T.O. and M.H.; visualization, K.W.; funding acquisition, K.M., T.O. and M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported in part by JSPS KAKENHI grant numbers JP21H03456 and JP20K19856 and AMED grant number JP22zf0127004h0002.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were used in this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Van der Maaten, L.; Postma, E.; Van den Herik, J. Dimensionality reduction: A comparative review. J. Mach. Learn. Res. 2009, 10, 66–71. [Google Scholar]
  2. Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 417–441. [Google Scholar] [CrossRef]
  3. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52. [Google Scholar] [CrossRef]
  4. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  6. Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot, X.; Botvinick, M.; Mohamed, S.; Lerchner, A. β-vae: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  7. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 2256–2265. [Google Scholar]
  8. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. NeurIPS 2020, 33, 6840–6851. [Google Scholar]
  9. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  10. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  11. Van der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 2014, 15, 3221–3245. [Google Scholar]
  12. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]
  13. Kobak, D.; Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 2019, 10, 5416. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Becht, E.; McInnes, L.; Healy, J.; Dutertre, C.A.; Kwok, I.W.; Ng, L.G.; Ginhoux, F.; Newell, E.W. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 2019, 37, 38–44. [Google Scholar] [CrossRef]
  15. Lawrence, N.D. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Mach. Learn. Res. 2005, 6, 1783–1816. [Google Scholar]
  16. Titsias, M.; Lawrence, N.D. Bayesian Gaussian process latent variable model. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy, 13–15 May 2010; pp. 844–851. [Google Scholar]
  17. Neal, R.M. Bayesian Learning for Neural Networks; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 118. [Google Scholar]
  18. Lee, J.; Bahri, Y.; Novak, R.; Schoenholz, S.S.; Pennington, J.; Sohl-Dickstein, J. Deep neural networks as Gaussian processes. arXiv 2017, arXiv:1711.00165. [Google Scholar]
  19. Märtens, K.; Campbell, K.; Yau, C. Decomposing feature-level variation with covariate Gaussian process latent variable models. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 4372–4381. [Google Scholar]
  20. Jensen, K.; Kao, T.C.; Tripodi, M.; Hennequin, G. Manifold GPLVMs for discovering non-Euclidean latent structure in neural data. Adv. Neural Inf. Process. Syst. 2020, 33, 22580–22592. [Google Scholar]
  21. Liu, Z. Visualizing single-cell RNA-seq data with semisupervised principal component analysis. Int. J. Mol. Sci. 2020, 21, 5797. [Google Scholar] [CrossRef]
  22. Jørgensen, M.; Hauberg, S. Isometric Gaussian process latent variable model for dissimilarity data. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 18–24 July 2021; pp. 5127–5136. [Google Scholar]
  23. Lalchand, V.; Ravuri, A.; Lawrence, N.D. Generalised GPLVM with stochastic variational inference. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual Event, 28–30 March 2022; pp. 7841–7864. [Google Scholar]
  24. Wang, J.M.; Fleet, D.J.; Hertzmann, A. Gaussian process dynamical models for human motion. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 283–298. [Google Scholar] [CrossRef] [Green Version]
  25. Damianou, A.C.; Titsias, M.K.; Lawrence, N.D. Variational Inference for Latent Variables and Uncertain Inputs in Gaussian Processes. J. Mach. Learn. Res. 2016, 17, 1–62. [Google Scholar]
  26. Ferris, B.; Fox, D.; Lawrence, N. WiFi-SLAM using Gaussian process latent variable models. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 6–12 January 2007; pp. 2480–2485. [Google Scholar]
  27. Zhang, G.; Wang, P.; Chen, H.; Zhang, L. Wireless indoor localization using convolutional neural network and Gaussian process regression. Sensors 2019, 19, 2508. [Google Scholar] [CrossRef] [Green Version]
  28. Lu, C.; Tang, X. Surpassing human-level face verification performance on LFW with GaussianFace. In Proceedings of the AAAI conference on Artificial Intelligence (AAAI), Austin, TX, USA, 25–30 January 2015. [Google Scholar]
  29. Cho, Y.; Saul, L. Kernel methods for deep learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 7–10 December 2009. [Google Scholar]
  30. Wilson, A.; Adams, R. Gaussian process kernels for pattern discovery and extrapolation. In Proceedings of the International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013; pp. 1067–1075. [Google Scholar]
  31. Lloyd, J.; Duvenaud, D.; Grosse, R.; Tenenbaum, J.; Ghahramani, Z. Automatic construction and natural-language description of nonparametric regression models. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Quebec City, QC, Canada, 27–31 July 2014. [Google Scholar]
  32. Urtasun, R.; Darrell, T. Discriminative Gaussian process latent variable model for classification. In Proceedings of the International Conference on Machine Learning (ICML), Hong Kong, China, 19–22 August 2007; pp. 927–934. [Google Scholar]
  33. Zhong, G.; Li, W.J.; Yeung, D.Y.; Hou, X.; Liu, C.L. Gaussian process latent random field. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Atlanta, GA, USA, 11–15 July 2010; pp. 679–684. [Google Scholar]
  34. Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef] [Green Version]
  35. Belkin, M.; Niyogi, P. Convergence of Laplacian eigenmaps. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 4–7 December 2006. [Google Scholar]
  36. Carreira-Perpinán, M.A. The Elastic Embedding Algorithm for Dimensionality Reduction. In Proceedings of the International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010; pp. 167–174. [Google Scholar]
  37. Fu, C.; Zhang, Y.; Cai, D.; Ren, X. AtSNE: Efficient and robust visualization on GPU through hierarchical optimization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 176–186. [Google Scholar]
  38. Böhm, J.N.; Berens, P.; Kobak, D. Attraction-repulsion spectrum in neighbor embeddings. J. Mach. Learn. Res. 2022, 23, 1–32. [Google Scholar]
  39. Amid, E.; Warmuth, M.K. TriMap: Large-scale dimensionality reduction using triplets. arXiv 2019, arXiv:1910.00204. [Google Scholar]
  40. Wang, Y.; Huang, H.; Rudin, C.; Shaposhnik, Y. Understanding how dimension reduction tools work: An empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization. J. Mach. Learn. Res. 2021, 22, 9129–9201. [Google Scholar]
  41. Moon, K.R.; van Dijk, D.; Wang, Z.; Gigante, S.; Burkhardt, D.B.; Chen, W.S.; Yim, K.; Elzen, A.v.d.; Hirn, M.J.; Coifman, R.R.; et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 2019, 37, 1482–1492. [Google Scholar] [CrossRef] [PubMed]
  42. Tipping, M.E.; Bishop, C.M. Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 1999, 61, 611–622. [Google Scholar] [CrossRef] [Green Version]
  43. Hofmann, T.; Schölkopf, B.; Smola, A.J. Kernel methods in machine learning. Ann. Stat. 2008, 36, 1171–1220. [Google Scholar] [CrossRef] [Green Version]
  44. Rasmussen, C.E.; Williams, C.K. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006. [Google Scholar]
  45. Lawrence, N.D.; Moore, A.J. Hierarchical Gaussian process latent variable models. In Proceedings of the International Conference on Machine Learning (ICML), Corvallis, OR, USA, 20–24 June 2007; pp. 481–488. [Google Scholar]
  46. Quinonero-Candela, J.; Rasmussen, C.E. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 2005, 6, 1939–1959. [Google Scholar]
  47. Titsias, M. Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the International Conference on Artificial intelligence and statistics (AISTATS), Clearwater Beach, FL, USA, 16–18 April 2009; pp. 567–574. [Google Scholar]
  48. Lawrence, N.D. Learning for larger datasets with the Gaussian process latent variable model. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), San Juan, Puerto Rico, 21–24 March 2007; pp. 243–250. [Google Scholar]
  49. Damianou, A.; Lawrence, N.D. Deep Gaussian processes. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, Scottsdale, AZ, USA, 29 April–1 May 2013; pp. 207–215. [Google Scholar]
  50. Dai, Z.; Damianou, A.; González, J.; Lawrence, N. Variational auto-encoded deep Gaussian processes. In Proceedings of the International Conference on Learning Representation (ICLR), San Juan, Puerto Rico, 2–4 May 2016; pp. 1–11. [Google Scholar]
  51. Shewchuk, J.R. An Introduction to the Conjugate Gradient Method without the Agonizing Pain; Carnegie Mellon University: Pittsburgh, PA, USA, 1994. [Google Scholar]
  52. Zhu, C.; Byrd, R.H.; Lu, P.; Nocedal, J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 1997, 23, 550–560. [Google Scholar] [CrossRef]
  53. GPy. GPy: A Gaussian Process Framework in Python. 2012. Available online: http://github.com/SheffieldML/GPy (accessed on 26 June 2020).
  54. Nene, S.A.; Nayar, S.K.; Murase, H. Columbia Object Image Library (COIL-20); Technical Report CUCS-006-96; Columbia University: New York, NY, USA, 1996. [Google Scholar]
  55. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
  56. Coifman, R.R.; Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006, 21, 5–30. [Google Scholar] [CrossRef] [Green Version]
  57. Venna, J.; Kaski, S. Neighborhood preservation in nonlinear projection methods: An experimental study. In Proceedings of the International Conference on Artificial Neural Networks (ICANN), Vienna, Austria, 21–25 August 2001; pp. 485–491. [Google Scholar]
  58. Espadoto, M.; Martins, R.M.; Kerren, A.; Hirata, N.S.; Telea, A.C. Toward a quantitative survey of dimension reduction techniques. IEEE Trans. Vis. Comput. Graph. 2019, 27, 2153–2173. [Google Scholar] [CrossRef] [PubMed]
  59. Zu, X.; Tao, Q. SpaceMAP: Visualizing High-dimensional Data by Space Expansion. In Proceedings of the International Conference on Machine Learning (ICML), Baltimore, MD, USA, 17–23 July 2022; pp. 27707–27723. [Google Scholar]
  60. Joia, P.; Coimbra, D.; Cuminato, J.A.; Paulovich, F.V.; Nonato, L.G. Local affine multidimensional projection. IEEE Trans. Vis. Comput. Graph. 2011, 17, 2563–2571. [Google Scholar] [CrossRef] [PubMed]
  61. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Sampling results from $\mathcal{GP}(\mathbf{0}, k(\cdot,\cdot))$ for each kernel function. Since the SE kernel is a special case of the SM kernel, they produce similar results.
Figure 2. Ablation study of our method on MNIST visualization.
Figure 3. Ablation study of our method on COIL-20 visualization.
Figure 4. Qualitative results of visualization on each dataset. We color the embedding following the category given in the datasets.
Table 1. Details of the datasets used in this study.

Dataset     Samples   Dimensions   Categories   Type    Features
MNIST       20,000    784          10           Image   Pixels
COIL-20     720       16,384       10           Image   Pixels
DBPedia     20,000    100          14           Text    FastText
FMNIST      20,000    784          10           Image   Pixels
Table 2. Quantitative results of trustworthiness. We boldface the best results and underline the second-best results on each dataset.

Dataset     PCA     LE      t-SNE   BGP-LVM   PHATE   PM
MNIST       0.738   0.756   0.994   0.823     0.871   0.881
COIL-20     0.898   0.882   0.997   0.984     0.931   0.974
DBPedia     0.883   0.956   0.998   0.989     0.986   0.990
FMNIST      0.912   0.927   0.994   0.909     0.958   0.953
Table 3. Quantitative results of Shepard goodness. We boldface the best results and underline the second-best results on each dataset.

Dataset     PCA     LE      t-SNE   BGP-LVM   PHATE   PM
MNIST       0.503   0.431   0.349   0.464     0.368   0.512
COIL-20     0.818   0.633   0.611   0.525     0.355   0.687
DBPedia     0.778   0.490   0.339   0.594     0.361   0.694
FMNIST      0.876   0.692   0.579   0.883     0.615   0.862
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
