Article

Visualization and Analysis of Three-Way Data Using Accumulated Concept Graphs

1
School of Science and Engineering, Tokyo Denki University, Hatoyama, Saitama 350-0394, Japan
2
Department of Computer Systems, Tallinn University of Technology, Ehitajate Tee 5, 19086 Tallinn, Estonia
*
Author to whom correspondence should be addressed.
AppliedMath 2024, 4(3), 1162-1180; https://doi.org/10.3390/appliedmath4030062
Submission received: 28 June 2024 / Revised: 19 August 2024 / Accepted: 26 August 2024 / Published: 9 September 2024

Abstract

This paper introduces the Accumulated Concept Graph (ACG), a visualization tool based on the quantile method designed to analyze three-way data, including distributional data. Such data often have complex structures that make it difficult to identify patterns using conventional visualization techniques. The ACG represents each object with one or more monotonic line graphs. As a result, the three-way data are visualized as a set of parallel monotonic line graphs that never intersect. This non-intersecting property allows for the easy identification of both macroscopic and microscopic patterns within the data. We demonstrate the usefulness of ACGs and principal component analysis in the analysis of real three-way datasets.

1. Introduction

Visual perception is a powerful tool for humans, aiding in the recognition of complex patterns within data. We can identify details and patterns more easily through visual input than by analyzing large volumes of numeric data. This ability is a crucial factor in visual data mining and is particularly beneficial during the exploratory data analysis phase, where little is known about the data or the patterns within them [1]. Numerous ideas have been proposed and examined within traditional data analysis (e.g., [1,2,3,4,5]).
The visualization of multi-dimensional data is a complex procedure, even more so than traditional data visualization, yet it is essential for a comprehensive understanding of the data. Symbolic data, a type of multi-dimensional data, allow for the aggregation of large datasets (including big data that cannot be analyzed using classical approaches), reducing them to a more compact format and thus enabling researchers to analyze and process the data. The most thorough overview of the uniqueness, benefits, and available approaches for symbolic data remains [6]. Due to the complexity of this problem, many different ideas have been explored over the years (e.g., [7,8,9,10,11,12,13,14,15,16,17,18]). The most influential research in the field of symbolic data visualization has likely been led by Noirhomme-Fraiture, who has authored multiple works on symbolic data visualization, using a radial chart-based “zoomstar” approach to describe various features and different data types (e.g., [8,9]). Visualization of multiway (including three-way) data has also been covered in works such as [16,17,18].
The main drawback of the currently available approaches is that they can only handle a limited number of features, require a very specific type of symbolic data, or are unable to support datasets with varying types of symbolic data. Most of the focus has been on visualizing the results of principal component analysis or clustering. However, detailed visualizations designed for exploratory data analysis or human perception, which would allow for the identification of both microscopic and macroscopic details in the data, have been largely overlooked.
The aim of this paper is to propose a solution that overcomes these shortcomings by using a quantile method to analyze distributional symbolic data with Accumulated Concept Graphs (ACGs) [19]. The main advantage of distributional symbolic data is that any other type of symbolic data can be transformed into a distributional format. The proposed approach enables readers to observe both macroscopic and microscopic properties of the underlying aggregated data. In Section 2, we describe three ACGs for the Hardwood dataset [20,21], a three-way dataset comprising ten hardwoods. Each hardwood is represented by seven quantile vectors, and each quantile vector is described by eight features. We define the Quantile Vectors ACG (QV-ACG), the Feature-Wise ACG (FW-ACG), and the Total ACG (T-ACG) for the given numerical data: (10 objects) × (7 quantile vectors) × (8 features). The QV-ACG represents each object using a set of seven monotonically increasing line graphs, obtained by accumulating eight (0–1)-normalized feature values for each quantile vector. The FW-ACG represents each object with a set of eight monotonically increasing line graphs, obtained by accumulating seven (0–1)-normalized quantile values. By further accumulating the FW-ACG, we obtain the T-ACG. In the T-ACG, each line graph describing the hardwood is obtained by accumulating the eight feature-wise line graphs in a given order. We describe various macroscopic and microscopic properties of the Hardwood dataset using the ACGs, along with the results of the PCA.
In Section 3, we examine the Prefecture Profile Data I [22] dataset, another three-way dataset. This dataset represents Japan’s 47 prefectures based on the number of people employed in 10 different job categories over six distinct time periods. By applying the quantile method with ACGs and PCA, we visualize and analyze this numerical data, structured as (47 objects) × (6 quantile vectors) × (10 features).
In Section 4, we analyze another three-way dataset: Prefecture Profile Data II [22]. This dataset represents the 47 prefectures using seven different types of numerical data tables. By applying the quantile method, we transform this general three-way data into the following format: (47 objects) × (5 quantile vectors) × (11 features). We then visualize and analyze the transformed data by combining ACGs with the quantile method of PCA.
Section 5 summarizes the major collaborative properties of ACGs and PCA for the given three-way datasets.

2. Analysis of Distributional Data (Hardwood Data)

2.1. Accumulated Concept Graphs for the Hardwood Data

As a typical three-way data problem, we selected the following ten hardwoods based on five species from the US Geological Survey [20]:
  • Acer East, Acer West; Alnus East, Alnus West; Fraxinus East, Fraxinus West;
  • Juglans East, Juglans West; and Quercus East, Quercus West.
Table 1 lists the eight histogram-valued features describing the selected hardwoods. For example, Acer East is described by seven quantile vectors, corresponding to the quantile values at the 0th, 10th, 25th, 50th, 75th, 90th, and 100th percentiles (Table 2). The quantile vector QV4, associated with the 50th percentile, has feature values such as ANNT = 9.2, JANT = −5.1, JULT = 22.2, and so on.
In Table 2, the quantile values for each feature satisfy the monotonicity property, meaning that the seven quantile vectors follow a consistent vector order:
QV1 ≤ QV2 ≤ ⋯ ≤ QV7.
We apply (0–1) normalization to the quantile values of the ten hardwoods for each feature. This normalization is essential because the features are measured in different units: it converts every feature to unitless values that can be compared and accumulated. Table 3 shows the results for Acer East.
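As a minimal sketch of this step, the following Python snippet rescales one feature to the range [0, 1] using the minimum and maximum taken over all objects and all quantile levels. The pooling over objects and quantile levels, the function name `normalize_feature`, and the toy numbers are assumptions made for illustration; the text does not spell out the exact formula.

```python
def normalize_feature(values_by_object):
    """Min-max rescale one feature across all objects and quantile levels.

    values_by_object maps an object name to its list of quantile values
    for a single feature (hypothetical layout, chosen for illustration).
    """
    pooled = [v for values in values_by_object.values() for v in values]
    lo, hi = min(pooled), max(pooled)
    if hi == lo:
        raise ValueError("feature is constant; (0-1) normalization is undefined")
    return {
        name: [(v - lo) / (hi - lo) for v in values]
        for name, values in values_by_object.items()
    }


# Toy quantile values for one feature and two objects (not from the paper).
toy = {"A": [2.0, 4.0, 6.0], "B": [0.0, 5.0, 10.0]}
normalized = normalize_feature(toy)
print(normalized)
```

Because the rescaling is an increasing affine map, the quantile values of each object keep their order after normalization.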
Let $x_{ij}$, $j = 1, 2, \ldots, 8$; $i = 1, 2, \ldots, 7$, be the (0–1)-normalized quantile values in Table 3. Then, the accumulated quantile values $y_{ij}$, $j = 1, 2, \ldots, 8$, for the quantile vector $\mathrm{QV}_i$, $i = 1, 2, \ldots, 7$, are given by:
$$y_{ij} = x_{ij} + y_{i(j-1)}, \quad j = 1, 2, \ldots, 8; \; i = 1, 2, \ldots, 7,$$
where we assume that $y_{i0} = 0$ for all $i$. We accumulate the (0–1)-normalized feature values for each quantile vector and obtain the result in Table 4 for Acer East.
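The accumulation can be sketched in a few lines of Python: each accumulated value is the previous accumulated value plus the current normalized value, starting from zero. The input rows are the Acer East values from Table 3; the function name `accumulate_rows` is only illustrative. Because Table 3 is rounded to two decimals, the results agree with Table 4 only up to a rounding difference of about 0.02.

```python
from itertools import accumulate

# (0-1)-normalized quantile vectors for Acer East, copied from Table 3.
# Rows are QV1..QV7; columns are ANNT, JANT, JULT, ANNP, JANP, JULP, GDC5, MITM.
acer_east = [
    [0.25, 0.11, 0.16, 0.07, 0.01, 0.12, 0.05, 0.59],
    [0.32, 0.22, 0.36, 0.14, 0.03, 0.17, 0.13, 0.88],
    [0.41, 0.33, 0.42, 0.16, 0.06, 0.20, 0.17, 0.93],
    [0.54, 0.45, 0.57, 0.20, 0.10, 0.22, 0.29, 0.97],
    [0.68, 0.58, 0.70, 0.24, 0.14, 0.25, 0.42, 0.99],
    [0.76, 0.68, 0.76, 0.28, 0.19, 0.30, 0.56, 0.99],
    [0.91, 0.87, 0.81, 0.34, 0.25, 0.49, 0.80, 1.00],
]


def accumulate_rows(rows):
    """Return y with y[i][j] = x[i][0] + ... + x[i][j] for every row i."""
    return [list(accumulate(row)) for row in rows]


qv_acg = accumulate_rows(acer_east)
for i, row in enumerate(qv_acg, start=1):
    print(f"QV{i}:", [round(v, 2) for v in row])
```

The last entry of each row is the total accumulated size of that quantile vector, which is the quantity compared across objects in the QV-ACG.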
We organized the data into a long Excel column by vertically arranging the QV1–QV7 values from Table 4, with one space inserted between different quantile vectors. The QV-ACG was then generated using the scatter plot command, as shown in Figure 1. By swapping the positions of the quantile vectors and features, we create the FW-ACG for Acer East, shown in Figure 2.
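The long-column layout can be imitated outside a spreadsheet. The sketch below stacks line graphs into one column, inserts empty cells between them, and pairs every non-empty cell with its row index. Using the row index as the horizontal coordinate is an assumption about how the scatter plot is drawn, and the helper names `long_column` and `to_points` are hypothetical. The sample values are the first three entries of QV1 and QV2 in Table 4.

```python
def long_column(line_graphs, gap=1):
    """Stack line graphs into one column, separated by `gap` empty cells (None)."""
    column = []
    for k, graph in enumerate(line_graphs):
        if k > 0:
            column.extend([None] * gap)
        column.extend(graph)
    return column


def to_points(column):
    """Pair each non-empty cell with its 1-based row index."""
    return [(row, y) for row, y in enumerate(column, start=1) if y is not None]


# First three accumulated values of QV1 and QV2 for Acer East (Table 4).
graphs = [[0.25, 0.36, 0.53], [0.32, 0.55, 0.90]]
column = long_column(graphs, gap=1)
points = to_points(column)
print(column)
print(points)
```

A plotting tool that breaks lines at empty cells would then draw each quantile vector as its own segment; a larger `gap` can be used to separate different objects.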
The following important observations should be noted:
  • The resulting line graphs display the accumulated sizes of the quantile vectors for the given set of features, and they do not intersect within the ACGs.
  • The shapes of the line graphs in the QV-ACG change when the order of features is altered. However, their overall lengths, or accumulated sizes, remain unchanged.
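Both observations can be checked numerically. The sketch below accumulates QV1 and QV2 of Acer East (Table 3) in the original feature order and in a hypothetical reordering that places MITM first. Since QV1 is not larger than QV2 in any feature, the accumulated line for QV1 never rises above the line for QV2, and the totals stay the same under reordering. The helper name `acg_line` and the chosen reordering are illustrative assumptions.

```python
from itertools import accumulate

# QV1 and QV2 for Acer East, copied from Table 3.
qv1 = [0.25, 0.11, 0.16, 0.07, 0.01, 0.12, 0.05, 0.59]
qv2 = [0.32, 0.22, 0.36, 0.14, 0.03, 0.17, 0.13, 0.88]


def acg_line(values, order):
    """Accumulate the values in the given feature order."""
    return list(accumulate(values[j] for j in order))


original = list(range(8))
reordered = [7, 0, 1, 2, 3, 4, 5, 6]  # hypothetical order with MITM first

results = []
for order in (original, reordered):
    low, high = acg_line(qv1, order), acg_line(qv2, order)
    never_cross = all(a <= b for a, b in zip(low, high))
    results.append((never_cross, low[-1], high[-1]))
    print(order, never_cross, round(low[-1], 2), round(high[-1], 2))
```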
We generated the QV-ACG (Figure 3) and FW-ACG (Figure 4) for the Hardwood data, with two spaces placed between different hardwoods in the Excel column. The Total ACG (T-ACG) in Figure 5 was obtained by further accumulating the line graphs for each hardwood in the FW-ACG.
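A rough sketch of the FW-ACG and T-ACG constructions for a single hardwood follows. The FW-ACG part accumulates the seven quantile values of each feature. For the T-ACG, the code assumes one possible reading of "further accumulating": the eight feature-wise line graphs are chained end to end, each starting from the final value of the previous one. That reading, and the variable names, are assumptions rather than the authors' implementation.

```python
from itertools import accumulate

# (0-1)-normalized quantile vectors for Acer East, copied from Table 3.
# Rows are QV1..QV7; columns are ANNT, JANT, JULT, ANNP, JANP, JULP, GDC5, MITM.
acer_east = [
    [0.25, 0.11, 0.16, 0.07, 0.01, 0.12, 0.05, 0.59],
    [0.32, 0.22, 0.36, 0.14, 0.03, 0.17, 0.13, 0.88],
    [0.41, 0.33, 0.42, 0.16, 0.06, 0.20, 0.17, 0.93],
    [0.54, 0.45, 0.57, 0.20, 0.10, 0.22, 0.29, 0.97],
    [0.68, 0.58, 0.70, 0.24, 0.14, 0.25, 0.42, 0.99],
    [0.76, 0.68, 0.76, 0.28, 0.19, 0.30, 0.56, 0.99],
    [0.91, 0.87, 0.81, 0.34, 0.25, 0.49, 0.80, 1.00],
]

# FW-ACG: for each feature (column), accumulate its seven quantile values.
columns = list(zip(*acer_east))
fw_acg = [list(accumulate(col)) for col in columns]

# T-ACG (assumed reading): chain the eight feature-wise graphs end to end,
# so each graph starts from the final value of the previous one.
t_acg = []
offset = 0.0
for graph in fw_acg:
    t_acg.extend(offset + y for y in graph)
    offset = t_acg[-1]

print(len(fw_acg), "feature-wise graphs of length", len(fw_acg[0]))
print(len(t_acg), "points in the T-ACG line, final value", round(t_acg[-1], 2))
```

Under this reading the final value of the T-ACG line equals the sum of all normalized values of the object, which is why it serves as a macroscopic size measure.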
The following insights can be drawn from these ACGs:
  • Among the hardwoods, Alnus West and Quercus West have the largest line graph sizes for the 7th quantile in the QV-ACG, the 8th line graph in the FW-ACG, and the T-ACG (macroscopic property).
  • In the T-ACG, East and West hardwoods show similarities in their final segments, with the West hardwoods being larger than the East hardwoods.
  • In the QV-ACGs for each species, the difference between East and West is mainly influenced by the 7th quantile vector. Removing the 7th quantile line graphs from Figure 3 significantly reduces the differences between East and West for each species.
  • The lower portions of the graphs in both the QV-ACG and FW-ACG display similar shapes for each species.
  • In both the QV-ACG and FW-ACG, the East hardwoods exhibit general similarity to each other, with the exception of Alnus East.
  • In the QV-ACG and FW-ACG, Fraxinus West, Juglans West, and Quercus West display a general similarity, whereas Acer West and Alnus West also exhibit comparable characteristics.
Figure 6 displays a scatter plot of the Hardwood data, utilizing the minimum value of QV1 and the maximum value of QV7 for each hardwood. This plot effectively demonstrates the macroscopic properties of the Hardwood data, as uncovered by the ACGs.

2.2. Principal Component Analysis of the Hardwood Data

Principal component analysis (PCA) is a valuable tool for visualizing the relationships between objects in factor planes defined by the principal components. In the PCA for the Hardwood data [21], we calculate Spearman’s rank order correlation matrix from the (10 × 7) 8-dimensional quantile vectors, with the results shown in Table 5. In this table, the first principal component (Pc1) represents a size factor, with similar weights assigned to all eight features, and has a significantly high contribution ratio. The second principal component (Pc2) represents a shape factor, distinguishing two groups: precipitation and moisture index; temperature and growing degree days. Figure 7 illustrates the positions of the eight features, clearly separating them into the two groups: (precipitation and moisture) and (temperature and growing degree days).
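The computation behind the first principal component can be sketched with the standard library alone: rank each feature, take the Pearson correlation of the ranks (which is Spearman's correlation), and run power iteration on the resulting matrix. This is a generic illustration, not the authors' software. It uses only the seven Acer East quantile vectors from Table 3, whereas the paper uses all 70 quantile vectors, so the numbers will not match Table 5. All function names are hypothetical.

```python
import math

# Seven Acer East quantile vectors from Table 3 (rows QV1..QV7, 8 features).
acer_east = [
    [0.25, 0.11, 0.16, 0.07, 0.01, 0.12, 0.05, 0.59],
    [0.32, 0.22, 0.36, 0.14, 0.03, 0.17, 0.13, 0.88],
    [0.41, 0.33, 0.42, 0.16, 0.06, 0.20, 0.17, 0.93],
    [0.54, 0.45, 0.57, 0.20, 0.10, 0.22, 0.29, 0.97],
    [0.68, 0.58, 0.70, 0.24, 0.14, 0.25, 0.42, 0.99],
    [0.76, 0.68, 0.76, 0.28, 0.19, 0.30, 0.56, 0.99],
    [0.91, 0.87, 0.81, 0.34, 0.25, 0.49, 0.80, 1.00],
]


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman_matrix(rows):
    """Spearman correlation matrix between the columns (features) of rows."""
    ranked = [average_ranks(list(col)) for col in zip(*rows)]
    return [[pearson(a, b) for b in ranked] for a in ranked]


def leading_component(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m * x for m, x in zip(row, v)) for row in matrix]
    return sum(a * b for a, b in zip(v, mv)), v


eigenvalue, weights = leading_component(spearman_matrix(acer_east))
contribution = 100 * eigenvalue / len(weights)  # trace of a correlation matrix is d
print(round(eigenvalue, 3), round(contribution, 1), [round(w, 3) for w in weights])
```

Because every feature increases with the quantile level, the rank correlations are close to one and the first component carries nearly equal positive weights, which is the pattern described above as a size factor.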
Figure 8 shows the results of the PCA, where each hardwood is represented by six connected lines spanning from the minimum to the maximum quantile vector. Three distinct groups are evident: (Acer West, Alnus West), (East Hardwoods), and (Fraxinus West, Juglans West, Quercus West). Alnus West and Quercus West are the largest. Additionally, the minimum and maximum quantile vectors effectively highlight the similarities and differences between hardwoods, as identified by the ACGs. The final arrow lines, connecting the 90th and 100th percentile quantile vectors, are particularly long for the West hardwoods, a characteristic that is also observed in both the QV-ACG and FW-ACG.
This highlights that visualizations using ACGs and the quantile method of PCA are effective tools for gaining insights into the data.

3. Analysis of Periodically Summarized Multiple Data Tables (Prefecture Profile Data I)

3.1. Accumulated Concept Graphs for the Prefecture Profile Data I

We have n periodically summarized data tables with the same structure, where each table consists of N objects described by d features. This results in a three-way data table in the form of n × N × d.
In the Prefecture Profile Data I [22] dataset, the number of people employed in ten different job categories, as shown in Table 6, was recorded across 47 prefectures of Japan, from Hokkaido to Okinawa, for the years 1980, 1985, 1990, 1995, 2000, and 2005. We represent these data as (47 objects) × (6 quantile vectors) × (10 features). As part of our analysis, Table 7 and Table 8 present a summary of the (0–1)-normalized quantile vectors and the accumulated quantile vectors for Hokkaido, respectively.
Figure 9 presents the QV-ACG, where each prefecture is represented by six monotone line graphs, with a spacing of one unit between each line graph and a gap of two units between different prefectures. Similarly, Figure 10 shows the FW-ACG, where each prefecture is depicted by ten monotone line graphs, with the same spacing arrangement.
From the QV-ACG and FW-ACG, we can observe the following:
  • Tokyo is the largest prefecture, while Tottori is the smallest. The length of the line graphs primarily reflects the population size of each prefecture.
  • By analyzing the patterns in the QV-ACGs and FW-ACGs, it is easy to identify similar prefectures. For example, Aomori and Iwate, Akita and Yamagata, Tochigi and Gunma, and Toyama and Ishikawa share similar patterns. Additionally, it is straightforward to distinguish between rural and urban areas.
  • As a microscopic observation, Tokyo, Kanagawa, and Osaka have significantly higher numbers of people employed in unclassified jobs compared to other prefectures.
  • In 1980, many rural prefectures, such as Akita and Aomori, show very low starting values in the first six positions in the line graph (corresponding to service jobs), while the last four positions (related to production jobs) display noticeably higher values.
  • In many cases, such as in Kochi, Iwate, and Tochigi, the line graphs for service jobs remain short. In contrast, prefectures like Ibaraki and Chiba show a significant increase in the length of line graphs for service and management jobs in later years.
It is important to emphasize that visualizing data through ACGs allows us to effectively capture both the macroscopic and microscopic similarities and differences between prefectures in two-dimensional figures.

3.2. Principal Component Analysis for the Prefecture Profile Data I

We derive the Spearman’s rank–order correlation matrix from the (47 × 6) 10-dimensional quantile vectors. Table 9 presents the resulting principal components. The first principal component, Pc1, represents the size factor, with a very high contribution ratio. In the second principal component, Pc2, F7 (agriculture, forestry, and fishery) has a notably large positive value. Similarly, in the third principal component, Pc3, F10 (unclassified jobs) shows a significantly large positive value. Figure 11a,b illustrate the relationships among the ten features, as represented by pairs of eigenvectors (Pc1, Pc2) and (Pc1, Pc3), respectively.
Figure 12 and Figure 13 show the results of the PCA. In these figures, the zoomed-in results are obtained by removing ten large prefectures from Tokyo to Shizuoka. We observe the following:
  • In 1980, with the exceptions of Tokyo, Kanagawa, and Osaka, the prefectures were concentrated in a narrow region of the factor planes. Over time, each prefecture spreads out in its own direction in the factor planes.
  • In the factor plane (Pc1, Pc2), many prefectures grow through the addition of other job types to F7, i.e., agriculture, forestry, and fishery.
  • In the factor plane by (Pc1, Pc3), nine large prefectures from Tokyo to Fukuoka are affected by job type F10, i.e., unclassified jobs. In the zoomed-in factor plane, Aomori, Kumamoto, Kyoto, Ibaraki, and Hiroshima show the same tendency.
These findings, obtained through the PCA, may be useful in understanding the Prefecture Profile Data I together with the QV-ACG data, shown in Figure 9, and the FW-ACG data, shown in Figure 10.

4. Analysis of Multiple Different Sized Data (Prefecture Profile Data II)

4.1. Total ACG of the Prefecture Profile Data II

In the Prefecture Profile Data II [22] dataset, seven different data tables, summarized in Table 10, describe the profiles of 47 Japanese prefectures in 2010. Each feature is represented by a numerical value drawn from a set of possible feature values.
By merging the seven data tables from Table 10 into one large table, we create a two-way dataset with a size of (47 prefectures) × (49 features). The Total ACG (T-ACG) shown in Figure 14 is obtained by accumulating the 49 (0–1)-normalized features. From this figure, we can observe the following insights:
  • The largest ten prefectures are consistent with those identified in the QV-ACG and FW-ACG for Prefecture Profile Data I, while Tottori remains the smallest.
  • Macroscopically similar prefectures include: (Aomori and Iwate), (Tochigi and Gunma), (Toyama and Ishikawa), (Fukui and Yamanashi), (Gifu and Mie), (Okayama and Kumamoto), and (Shiga, Nara, and Oita).
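The T-ACG construction for a two-way table can be sketched end to end: rescale each feature column to [0, 1] across the objects, accumulate along each row, and order the objects by their final accumulated value. The toy table, the object names, the handling of constant columns, and the function name `normalize_columns` are assumptions for illustration and are not taken from the prefecture data.

```python
from itertools import accumulate

# Toy two-way table: rows are objects, columns are features (values are made up).
table = {
    "P": [120.0, 3.0, 40.0],
    "Q": [80.0, 9.0, 10.0],
    "R": [200.0, 6.0, 25.0],
}


def normalize_columns(table):
    """Min-max rescale every feature column across the objects."""
    names = list(table)
    scaled_columns = []
    for col in zip(*table.values()):
        lo, span = min(col), max(col) - min(col)
        # A constant column carries no information; map it to 0 (an assumption).
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return {name: [col[k] for col in scaled_columns] for k, name in enumerate(names)}


scaled = normalize_columns(table)
t_acg = {name: list(accumulate(row)) for name, row in scaled.items()}

# Macroscopic size: the final accumulated value of each line graph.
by_size = sorted(t_acg, key=lambda name: t_acg[name][-1], reverse=True)
print(by_size)
```

Objects whose lines end at similar heights and follow similar paths would be read as macroscopically similar in the figure.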
Figure 15 presents a scatterplot of the 47 prefectures, using two values: F1 (agriculture and forestry) and the total accumulated value, F49. This figure effectively illustrates the macroscopic properties of the 47 prefectures as revealed by the T-ACG in Figure 14.
Further findings using the QV-ACG, FW-ACG, and PCA are discussed in the next section.

4.2. The Quantile Method of the ACGs for the Prefecture Profile Data II

In the analysis of the Hardwood data and Prefecture Profile Data I, the quantile methods effectively detected the microscopic patterns within the datasets. To extend this approach, we combined the seven tables from Table 10 into a single table, as shown in Table 11, where each of the eleven features is represented by five quantile values. In Table 11, features F4, F5, F9, F10, and F11 include a value labeled as “Dummy”, which we assume to be zero. Consequently, our dataset takes the following form: (47 prefectures) × (5 quantile vectors) × (11 features).
For each of the eleven features, we calculated the (0–1)-normalized quantile values for the 47 prefectures. Given that the features have different units, it is essential to emphasize the importance of (0–1)-normalization. Table 12 and Table 13 display the (0–1)-normalized quantile vectors and the accumulated quantile vectors for Hokkaido, respectively. Figure 16 and Figure 17 show the QV-ACG and FW-ACG for Prefecture Profile Data II, respectively.
The following insights can be drawn from the QV-ACG and FW-ACG:
  • The ten largest prefectures remain in the same order in terms of macroscopic size, consistent with the T-ACG: Tokyo > Kanagawa > Osaka > Aichi > Saitama > Hokkaido > Chiba > Hyogo > Fukuoka > Shizuoka.
  • The smallest prefecture is not Tottori but Tokushima.
  • Some prefecture pairs that appeared similar in the T-ACG become dissimilar in the QV-ACG and FW-ACG. For instance, Akita and Yamagata differ in features F1–F4, Tochigi and Gunma differ in F9–F11, and Toyama and Ishikawa differ in F4–F11.
Overall, the QV-ACG and FW-ACG provide more detailed information than the T-ACG in the analysis of Prefecture Profile Data II.

4.3. The Quantile Method of PCA for the Prefecture Profile Data II

We calculate the Spearman’s rank–order correlation matrix from the (47 × 5) 11-dimensional quantile vectors. Table 14 presents the resulting principal components. In this table, the first principal component (Pc1) represents the size factor, with a notably high contribution ratio. In Pc2, features F1–F4 have positive values, with F1 carrying a significant weight, while features F5–F11 show relatively negative weights, as illustrated in Figure 18a. In Pc3, F4 has a substantial positive weight, while F1 and F5 exhibit large negative weights. The remaining features span a broad range of values, as shown in Figure 18b.
Figure 19 and Figure 20 show the results of the quantile method of PCA in the factor planes (Pc1, Pc2) and (Pc1, Pc3), respectively. Figure 21 shows the zoomed-in result in the factor plane (Pc1, Pc3).
The following observations have been made:
  • The ten largest prefectures in the factor planes (Pc1, Pc2) and (Pc1, Pc3) align with the results from the QV-ACG and FW-ACG analyses.
  • Many prefectures are concentrated in a narrow region of the factor plane (Pc1, Pc2).
  • Tottori is isolated from other prefectures in the plane (Pc1, Pc3), where QV3–QV5 almost overlap.
  • In the zoomed-in factor plane, most prefectures show an upward trend, except Tottori. Notably, Gunma displays significant movement in the final portion due to feature F9, while Tokushima exhibits the smallest size. Among similar prefecture pairs, Okayama and Kumamoto trace comparable curves.
It is important to note that the results from both PCA and ACGs complement each other, enhancing the analysis and understanding of the three-way data.

5. Discussion

This paper demonstrates the effectiveness of Accumulated Concept Graphs (ACGs) and their collaborative use with the quantile method of PCA for analyzing three-way datasets. The advantages of ACGs, as illustrated by the examples, are as follows:
  • Universal Approach with Data Specificity: ACGs are versatile and can be applied to various types of symbolic data while still capturing detailed microscopic properties within unaggregated data.
  • Simplicity: Transforming three-way data into a distributional format is computationally efficient, making it suitable for large datasets.
  • Microscopic and Macroscopic Properties: ACGs highlight macroscopic differences between objects through the total lengths of line graphs, while microscopic differences are revealed through the local shapes created by accumulated values at specific points.
  • Outlier Detection: Unlike traditional visualizations like parallel coordinates or radar charts, ACGs feature parallel monotone line graphs that never intersect, making it easier to detect outliers.
  • Enabling Classical Analysis on Symbolic Data: ACGs and the transformation of symbolic data into a distributional format allow classical methods like PCA to be applied to symbolic data, which would otherwise be computationally demanding or impractical.
Additionally, ACGs can be easily created using scatter plots in Excel, and the proposed methods can be extended to more complex symbolic data. Future work could explore how ACGs can enhance other areas of data analysis, such as clustering, especially considering the computational challenges of existing clustering methods for symbolic data.
In conclusion, this paper proposes a visualization technique using simple line graphs, termed Accumulated Concept Graphs, for three-way and symbolic data. This approach enables the visualization of both macroscopic and microscopic details embedded in the data. The primary contribution of the method is its ability to provide a simple yet comprehensive visual overview of complex relationships within the dataset. By facilitating exploratory data analysis through visual interpretation, the proposed method aids analysts in making informed decisions about further analyses. Furthermore, this method can be applied to datasets with intricate internal structures that are difficult to visualize using currently available techniques.

Author Contributions

Conceptualization and methodology, M.I. and H.Y.; software and validation, K.U.; original draft preparation, M.I. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JSPS KAKENHI (Grants-in-Aid for Scientific Research) Grant Number 253000268. Part of this work was conducted under the JSPS International Research Fellow program.

Data Availability Statement

The data presented in this study are openly available on the websites listed in references [21,22].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Franconeri, S.L.; Padilla, L.M.; Shah, P.; Zacks, J.M.; Hullman, J. The Science of Visual Data Communication: What Works. Psychol. Sci. Public Interest 2021, 22, 110–161.
  2. Healy, K. Data Visualization: A Practical Introduction; Princeton University Press: Princeton, NJ, USA, 2018.
  3. Keim, D.A. Information Visualization and Visual Data Mining. IEEE Trans. Vis. Comput. Graph. 2002, 8, 1–8.
  4. Tukey, J.W. Exploratory Data Analysis; Addison-Wesley Publishing Company: San Francisco, CA, USA, 1977; Volume 2.
  5. Tukey, P. Graphical Methods for Data Analysis; Wadsworth: Belmont, CA, USA, 1983.
  6. Billard, L.; Diday, E. Symbolic Data Analysis: Conceptual Statistics and Data Mining; Wiley & Sons: Hoboken, NJ, USA, 2006.
  7. Khalid, Z.M.; Zeebaree, S.R. Big Data Analysis for Data Visualization: A Review. Int. J. Sci. Bus. 2021, 5, 64–75.
  8. Elhamdadi, H.; Stefkovics, A.; Beyer, J.; Moerth, E.; Pfister, H.; Bearfield, C.X.; Nobre, C. Vistrust: A Multidimensional Framework and Empirical Study of Trust in Data Visualizations. IEEE Trans. Vis. Comput. Graph. 2023, 30, 348–358.
  9. Noirhomme-Fraiture, M. Visualization of Large Data Sets: The Zoomstar Solution. Int. Electron. J. Symb. Data Anal. 2002, 0, 26–39. Available online: https://www.researchgate.net/publication/228615915_Visualization_of_large_data_sets_The_Zoom_Star_solution (accessed on 25 August 2024).
  10. Diday, E.; Noirhomme-Fraiture, M. Symbolic Data Analysis and the SODAS Software; Wiley & Sons: Hoboken, NJ, USA, 2008.
  11. Verde, R.; Lechevallier, Y.; Chavent, M. Symbolic Clustering Interpretation and Visualization. Electron. J. Symb. Data Anal. 2003, 1, 1.
  12. Cui, W.; Strazdins, G.; Wang, H. Visual Analysis of Multidimensional Big Data: A Scalable Lightweight Bundling Method for Parallel Coordinates. IEEE Trans. Big Data 2021, 9, 106–117.
  13. Su, E.C.Y.; Wu, H.M. Dimension Reduction and Visualization of Multiple Time Series Data: A Symbolic Data Analysis Approach. Comput. Stat. 2024, 39, 1937–1969.
  14. Pelka, M. Outlier Identification for Symbolic Data with the Application of the DBSCAN Algorithm. In Conference of the Section on Classification and Data Analysis of the Polish Statistical Association; Springer International Publishing: Cham, Switzerland, 2021; pp. 53–62.
  15. Umbleja, K.; Ichino, M.; Yaguchi, H. Improving Symbolic Data Visualization for Pattern Recognition and Knowledge Discovery. Vis. Inform. 2020, 4, 23–31.
  16. Nguyen, H.; Rosen, P.; Wang, B. Visual Exploration of Multiway Dependencies in Multivariate Data. In SIGGRAPH ASIA 2016 Symposium on Visualization; ACM: New York, NY, USA, 2016.
  17. Sarkar, D. Lattice: Multivariate Data Visualization with R; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008.
  18. Todorov, V.; Di Palma, M.; Gallo, M. Robust Tools for Visualization in Three-way Analysis. In Proceedings of the International Conference on Robust Statistics, Leuven, Belgium, 2–6 July 2018.
  19. Ichino, M. The Quantile Method for Symbolic Principal Component Analysis. Stat. Anal. Data Min. ASA Data Sci. J. 2011, 4, 184–198.
  20. Ichino, M.; Britto, P. The Data Accumulation Graph (DAG) to Visualize Multidimensional Symbolic Data. In Proceedings of the Workshop in Symbolic Data Analysis, Taipei, Taiwan, 14–16 June 2014.
  21. US Geological Survey. Tables of Histogram Data: Climate-Vegetation Atlas of North America. 2013. Available online: http://pubs.usgs.gov/pp/p1650-b/datatables/hgtable.xls (accessed on 2 October 2018).
  22. National Statistics Center Japan. Statistics of Japan. 2018. Available online: https://www.e-stat.go.jp/en (accessed on 8 October 2018).
Figure 1. The Quantile Vectors ACG (QV-ACG) for Acer East.
Figure 2. The Feature-Wise ACG (FW-ACG) for Acer East.
Figure 3. The QV-ACG for the Hardwood data.
Figure 4. The FW-ACG for the Hardwood data.
Figure 5. The T-ACG for the Hardwood data.
Figure 6. Scatter plot by the minimum value of QV1 and the maximum value of QV7.
Figure 7. Scatter plot by the eigenvectors.
Figure 8. The result of PCA for the Hardwood data.
Figure 9. The QV-ACG for Prefecture Profile Data I.
Figure 10. The FW-ACG for Prefecture Profile Data I.
Figure 11. Scatter plots by (Pc1, Pc2) and (Pc1, Pc3) for Prefecture Profile Data I.
Figure 12. Results of PCA in the factor plane by (Pc1, Pc2) for Prefecture Profile Data I.
Figure 13. Results of PCA in the factor plane by (Pc1, Pc3) for Prefecture Profile Data I.
Figure 14. The T-ACG for Prefecture Profile Data II.
Figure 15. Plot by the minimum and the maximum values of the T-ACGs.
Figure 16. The QV-ACG for the Prefecture Profile Data II.
Figure 17. The FW-ACG for the Prefecture Profile Data II.
Figure 18. The FW-ACG for the Prefecture Profile Data II.
Figure 19. The FW-ACG for the Prefecture Profile Data II.
Figure 20. The FW-ACG for the Prefecture Profile Data II.
Figure 21. The FW-ACG for the Prefecture Profile Data II.
Table 1. Features for Hardwood data.
| Feature | Description |
| --- | --- |
| F1 | Annual Temperature (ANNT) (°C) |
| F2 | January Temperature (JANT) (°C) |
| F3 | July Temperature (JULT) (°C) |
| F4 | Annual Precipitation (ANNP) (mm) |
| F5 | January Precipitation (JANP) (mm) |
| F6 | July Precipitation (JULP) (mm) |
| F7 | Growing Degree Days on 5 °C base × 1000 (GDC5) |
| F8 | Moisture Index (MITM) |
Table 2. Acer East, represented with 7 quantile vectors by 8 feature values.
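| QV | ANNT | JANT | JULT | ANNP | JANP | JULP | GDC5 | MITM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | −2.30 | −24.6 | 11.5 | 415 | 10 | 56 | 0.5 | 0.62 |
| 2 | 0.60 | −18.3 | 16.6 | 720 | 23 | 77 | 1.2 | 0.89 |
| 3 | 3.80 | −12.3 | 18.2 | 835 | 40 | 89 | 1.5 | 0.94 |
| 4 | 9.20 | −5.10 | 22.2 | 1010 | 69 | 100 | 2.5 | 0.97 |
| 5 | 14.4 | 2.30 | 25.8 | 1200 | 96 | 113 | 3.6 | 0.99 |
| 6 | 17.9 | 7.90 | 27.3 | 1355 | 127 | 135 | 4.8 | 0.99 |
| 7 | 23.8 | 18.9 | 28.8 | 1630 | 166 | 226.8 | | 1.00 |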
Table 3. Acer East, represented with 7 quantile vectors by 8 (0–1)-normalized feature values.
QV | ANNT | JANT | JULT | ANNP | JANP | JULP | GDC5 | MITM
1 | 0.25 | 0.11 | 0.16 | 0.07 | 0.01 | 0.12 | 0.05 | 0.59
2 | 0.32 | 0.22 | 0.36 | 0.14 | 0.03 | 0.17 | 0.13 | 0.88
3 | 0.41 | 0.33 | 0.42 | 0.16 | 0.06 | 0.20 | 0.17 | 0.93
4 | 0.54 | 0.45 | 0.57 | 0.20 | 0.10 | 0.22 | 0.29 | 0.97
5 | 0.68 | 0.58 | 0.70 | 0.24 | 0.14 | 0.25 | 0.42 | 0.99
6 | 0.76 | 0.68 | 0.76 | 0.28 | 0.19 | 0.30 | 0.56 | 0.99
7 | 0.91 | 0.87 | 0.81 | 0.34 | 0.25 | 0.49 | 0.80 | 1.00
Table 4. Representation of Acer East by 7 accumulated quantile vectors.
QV | ANNT | JANT | JULT | ANNP | JANP | JULP | GDC5 | MITM
1 | 0.25 | 0.36 | 0.53 | 0.60 | 0.61 | 0.74 | 0.78 | 1.37
2 | 0.32 | 0.55 | 0.90 | 1.04 | 1.07 | 1.24 | 1.37 | 2.25
3 | 0.41 | 0.73 | 1.15 | 1.31 | 1.37 | 1.57 | 1.73 | 2.67
4 | 0.54 | 0.99 | 1.56 | 1.76 | 1.86 | 2.08 | 2.37 | 3.34
5 | 0.68 | 1.26 | 1.96 | 2.20 | 2.34 | 2.59 | 3.01 | 4.00
6 | 0.76 | 1.44 | 2.20 | 2.48 | 2.67 | 2.96 | 3.52 | 4.51
7 | 0.91 | 1.79 | 2.60 | 2.93 | 3.18 | 3.67 | 4.47 | 5.47
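The step from Table 3 to Table 4 appears to be a running sum of each normalized quantile vector over the features, taken in the order ANNT to MITM. The following is a minimal sketch of that accumulation, not the authors' implementation; the names `NORMALIZED`, `accumulate_rows`, and `ACCUMULATED` are illustrative. Because the Table 3 entries are already rounded to two decimals, the sums agree with Table 4 only to within about 0.02.

```python
from itertools import accumulate

# Feature order as in Tables 1-4.
FEATURES = ["ANNT", "JANT", "JULT", "ANNP", "JANP", "JULP", "GDC5", "MITM"]

# Table 3: (0-1)-normalized quantile vectors for Acer East (rows QV 1..7).
NORMALIZED = [
    [0.25, 0.11, 0.16, 0.07, 0.01, 0.12, 0.05, 0.59],
    [0.32, 0.22, 0.36, 0.14, 0.03, 0.17, 0.13, 0.88],
    [0.41, 0.33, 0.42, 0.16, 0.06, 0.20, 0.17, 0.93],
    [0.54, 0.45, 0.57, 0.20, 0.10, 0.22, 0.29, 0.97],
    [0.68, 0.58, 0.70, 0.24, 0.14, 0.25, 0.42, 0.99],
    [0.76, 0.68, 0.76, 0.28, 0.19, 0.30, 0.56, 0.99],
    [0.91, 0.87, 0.81, 0.34, 0.25, 0.49, 0.80, 1.00],
]


def accumulate_rows(rows):
    """Return the running sum of every quantile vector over the feature axis."""
    return [list(accumulate(row)) for row in rows]


ACCUMULATED = accumulate_rows(NORMALIZED)

if __name__ == "__main__":
    print("QV  " + "  ".join(f"{name:>5}" for name in FEATURES))
    for qv, row in enumerate(ACCUMULATED, start=1):
        print(f"{qv:<3} " + "  ".join(f"{value:5.2f}" for value in row))
```

Since every normalized value is non-negative, each accumulated row is non-decreasing from left to right, which is what makes each line of the graph monotone.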
Table 5. Principal components for the Hardwood data.
Spearman | Pc1 | Pc2
Eigenvalues | 6.691 | 0.909
Contribution (%) | 83.635 | 11.357
Eigenvectors | Pc1 | Pc2
ANNT | 0.362 | −0.363
JANT | 0.346 | −0.427
JULT | 0.372 | −0.208
ANNP | 0.359 | 0.369
JANP | 0.337 | 0.365
JULP | 0.352 | 0.170
GDC5 | 0.365 | −0.331
MITM | 0.335 | 0.484
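The contribution figures in Table 5 can be cross-checked against the eigenvalues. The sketch below assumes the components come from an 8 × 8 correlation matrix (the "Spearman" label suggests rank correlations), so that the eigenvalues sum to the number of features and each contribution is the eigenvalue divided by 8. It also checks that the two tabulated eigenvectors are close to unit length and mutually orthogonal. All names are illustrative, and the small deviations come from the three-decimal rounding in the table.

```python
import math

N_FEATURES = 8  # assumption: PCA on an 8 x 8 correlation matrix, trace = 8

# Values copied from Table 5.
EIGENVALUES = {"Pc1": 6.691, "Pc2": 0.909}
LOADINGS = {
    "ANNT": (0.362, -0.363),
    "JANT": (0.346, -0.427),
    "JULT": (0.372, -0.208),
    "ANNP": (0.359, 0.369),
    "JANP": (0.337, 0.365),
    "JULP": (0.352, 0.170),
    "GDC5": (0.365, -0.331),
    "MITM": (0.335, 0.484),
}


def contribution(eigenvalue, n_features):
    """Percentage of total variance explained by one component."""
    return 100.0 * eigenvalue / n_features


PC1 = [pair[0] for pair in LOADINGS.values()]
PC2 = [pair[1] for pair in LOADINGS.values()]

NORM_PC1 = math.sqrt(sum(x * x for x in PC1))
NORM_PC2 = math.sqrt(sum(x * x for x in PC2))
DOT_PC1_PC2 = sum(a * b for a, b in zip(PC1, PC2))

if __name__ == "__main__":
    for name, value in EIGENVALUES.items():
        print(f"{name}: contribution = {contribution(value, N_FEATURES):.3f} %")
    print(f"|Pc1| = {NORM_PC1:.4f}, |Pc2| = {NORM_PC2:.4f}")
    print(f"Pc1 . Pc2 = {DOT_PC1_PC2:.4f}")
```

The Pc1 loadings are all positive and of similar size, so under this reading Pc1 acts as an overall size factor, while Pc2 contrasts the temperature-related features (negative loadings) with the precipitation and moisture features (positive loadings).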
Table 6. Ten job types in Prefecture Profile Data I [22].
Feature | Description
F1 | Professional skills
F2 | Management jobs
F3 | Office works
F4 | Sales jobs
F5 | Service jobs
F6 | Security jobs
F7 | Agricultural forestry and fisheries
F8 | Transportation and communication
F9 | Industrial process work
F10 | Unclassified jobs
Table 7. The (0–1)-normalized quantile vectors for Hokkaido for Prefecture Profile Data I.
Hokkaido | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10
1980 | 0.040 | 0.070 | 0.054 | 0.064 | 0.055 | 0.059 | 0.129 | 0.087 | 0.134 | 0.002
1985 | 0.096 | 0.134 | 0.121 | 0.137 | 0.119 | 0.128 | 0.279 | 0.185 | 0.286 | 0.009
1990 | 0.165 | 0.204 | 0.198 | 0.216 | 0.189 | 0.202 | 0.408 | 0.287 | 0.448 | 0.025
1995 | 0.243 | 0.280 | 0.279 | 0.303 | 0.269 | 0.285 | 0.532 | 0.395 | 0.608 | 0.041
2000 | 0.327 | 0.331 | 0.360 | 0.392 | 0.358 | 0.377 | 0.642 | 0.501 | 0.764 | 0.082
2005 | 0.413 | 0.373 | 0.442 | 0.476 | 0.455 | 0.480 | 0.744 | 0.601 | 0.920 | 0.180
Table 8. The accumulated quantile vectors for Hokkaido for Prefecture Profile Data I.
Hokkaido | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10
1980 | 0.040 | 0.110 | 0.164 | 0.228 | 0.282 | 0.341 | 0.470 | 0.557 | 0.691 | 0.693
1985 | 0.096 | 0.230 | 0.351 | 0.488 | 0.607 | 0.735 | 1.015 | 1.200 | 1.486 | 1.495
1990 | 0.165 | 0.369 | 0.567 | 0.783 | 0.972 | 1.174 | 1.582 | 1.869 | 2.317 | 2.342
1995 | 0.243 | 0.524 | 0.802 | 1.106 | 1.374 | 1.659 | 2.192 | 2.587 | 3.195 | 3.236
2000 | 0.327 | 0.658 | 1.018 | 1.410 | 1.768 | 2.145 | 2.787 | 3.288 | 4.052 | 4.134
2005 | 0.413 | 0.786 | 1.228 | 1.703 | 2.158 | 2.638 | 3.382 | 3.983 | 4.903 | 5.083
Table 9. The principal components for Prefecture Profile Data I.
Spearman | Pc1 | Pc2 | Pc3
Eigenvalues | 9.035 | 0.651 | 0.161
Contribution (%) | 90.354 | 6.514 | 1.612
Eigenvectors | Pc1 | Pc2 | Pc3
F1 | 0.331 | −0.057 | 0.077
F2 | 0.327 | −0.119 | −0.256
F3 | 0.331 | −0.087 | −0.121
F4 | 0.331 | −0.078 | −0.094
F5 | 0.331 | −0.030 | −0.055
F6 | 0.322 | −0.026 | 0.112
F7 | 0.211 | 0.954 | 0.152
F8 | 0.328 | 0.041 | 0.180
F9 | 0.325 | −0.039 | −0.363
F10 | 0.306 | −0.233 | 0.838
Table 10. Seven different data tables for Prefecture Profile Data II.
Table | Feature | Description
1 | Number of employed persons (19 job types, 1000 people) | 1. Agriculture and forestry; 2. Fisheries; 3. Mining and quarrying of stone and gravel; 4. Construction; 5. Manufacturing; 6. Electricity, gas, heat supply, and water; 7. Information and communications; 8. Transport and postal activities; 9. Wholesale and retail trade; 10. Finance and insurance; 11. Real estate and goods rental and leasing; 12. Scientific research, professional, and technical services; 13. Accommodations, eating, and drinking services; 14. Living-related and personal services and amusement services; 15. Education, learning support; 16. Medical, healthcare, and welfare; 17. Compound services; 18. Services (not elsewhere classified); 19. Government, except elsewhere classified.
2 | Nominal GDP | (JPY 10 billion)
3 | Temperature | 1. Minimum temperature; 2. Maximum temperature
4 | Area | (Square kilometer)
5 | People in 18 age categories | 1. [0, 4], 2. [5, 9], 3. [10, 14], 4. [15, 19], 5. [20, 24], 6. [25, 29], 7. [30, 34], 8. [35, 39], 9. [40, 44], 10. [45, 49], 11. [50, 54], 12. [55, 59], 13. [60, 64], 14. [65, 69], 15. [70, 74], 16. [75, 79], 17. [80, 84], 18. [85-]
6 | Number of people | 1. Birth; 2. Death; 3. Marriage; 4. Divorce.
7 | Number of households | Private household: 1. Number of households; 2. Number of household members. Industrial households: 3. Number of households; 4. Number of household members.
Table 11. Eleven features described by five quantile values.
Feature | Description
F1 | 1. Agriculture and forestry; 2. Fisheries; 3. Mining and quarrying of stone and gravel; 4. Construction; 5. Manufacturing
F2 | 1. Electricity, gas, heat supply, and water; 2. Information and communications; 3. Transport and postal activities; 4. Wholesale and retail trade; 5. Finance and insurance
F3 | 1. Real estate and goods rental and leasing; 2. Scientific research, professional, and technical services; 3. Accommodations, eating, and drinking services; 4. Living-related and personal services and amusement services; 5. Education, learning support
F4 | 1. Medical, healthcare, and welfare; 2. Compound services; 3. Services (not elsewhere classified); 4. Government, except elsewhere classified; 5. Dummy
F5 | 1. Nominal GDP; 2. Minimum temperature; 3. Maximum temperature; 4. Area; 5. Dummy.
F6 | 1. [0, 4], 2. [5, 9], 3. [10, 14], 4. [15, 19], 5. [20, 24]
F7 | 1. [25, 29], 2. [30, 34], 3. [35, 39], 4. [40, 44], 5. [45, 49]
F8 | 1. [50, 54], 2. [55, 59], 3. [60, 64], 4. [65, 69], 5. [70, 74]
F9 | 1. [75, 79], 2. [80, 84], 3. [85-], 4. Dummy, 5. Dummy.
F10 | 1. Birth; 2. Death; 3. Marriage; 4. Divorce; 5. Dummy.
F11 | Private household: 1. Number of households; 2. Number of household members. Industrial households: 3. Number of households; 4. Number of household members; 5. Dummy.
Table 12. The (0–1) normalized quantile vectors for Hokkaido for Prefecture Profile Data II.
QV | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11
1 | 0.270 | 0.124 | 0.043 | 0.166 | 0.081 | 0.075 | 0.059 | 0.089 | 0.164 | 0.086 | 0.117
2 | 0.540 | 0.144 | 0.078 | 0.455 | 0.081 | 0.159 | 0.120 | 0.196 | 0.349 | 0.210 | 0.243
3 | 0.776 | 0.247 | 0.164 | 0.580 | 0.081 | 0.250 | 0.148 | 0.293 | 0.548 | 0.282 | 0.259
4 | 0.946 | 0.334 | 0.248 | 0.806 | 0.516 | 0.334 | 0.251 | 0.383 | 0.548 | 0.396 | 0.587
5 | 1.000 | 0.374 | 0.331 | 0.806 | 0.516 | 0.398 | 0.321 | 0.476 | 0.548 | 0.396 | 0.587
Table 13. The accumulated quantile vectors for Hokkaido for Prefecture Profile Data II.
QV | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11
1 | 0.270 | 0.394 | 0.437 | 0.603 | 0.684 | 0.759 | 0.818 | 0.907 | 1.070 | 1.156 | 1.273
2 | 0.540 | 0.684 | 0.763 | 1.217 | 1.298 | 1.458 | 1.578 | 1.774 | 2.123 | 2.333 | 2.576
3 | 0.776 | 1.023 | 1.187 | 1.767 | 1.848 | 2.097 | 2.281 | 2.574 | 3.122 | 3.405 | 3.663
4 | 0.946 | 1.280 | 1.528 | 2.334 | 2.850 | 3.184 | 3.434 | 3.817 | 4.365 | 4.761 | 5.349
5 | 1.000 | 1.374 | 1.705 | 2.511 | 3.027 | 3.425 | 3.746 | 4.222 | 4.770 | 5.167 | 5.754
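The non-intersecting property stated in the abstract can be checked directly on the accumulated quantile vectors of Table 13. The sketch below is an illustrative check, not part of the authors' method: it tests that each row is non-decreasing across F1 to F11 (a monotone line graph) and that each quantile row lies on or below the next one at every feature (lines that do not cross). The helper names `is_monotone` and `is_nested` are hypothetical.

```python
# Table 13: accumulated quantile vectors for Hokkaido (rows QV 1..5, columns F1..F11).
ACCUMULATED = [
    [0.270, 0.394, 0.437, 0.603, 0.684, 0.759, 0.818, 0.907, 1.070, 1.156, 1.273],
    [0.540, 0.684, 0.763, 1.217, 1.298, 1.458, 1.578, 1.774, 2.123, 2.333, 2.576],
    [0.776, 1.023, 1.187, 1.767, 1.848, 2.097, 2.281, 2.574, 3.122, 3.405, 3.663],
    [0.946, 1.280, 1.528, 2.334, 2.850, 3.184, 3.434, 3.817, 4.365, 4.761, 5.349],
    [1.000, 1.374, 1.705, 2.511, 3.027, 3.425, 3.746, 4.222, 4.770, 5.167, 5.754],
]


def is_monotone(row):
    """True if the values never decrease from one feature to the next."""
    return all(a <= b for a, b in zip(row, row[1:]))


def is_nested(lower, upper):
    """True if `lower` lies on or below `upper` at every feature."""
    return all(a <= b for a, b in zip(lower, upper))


ALL_MONOTONE = all(is_monotone(row) for row in ACCUMULATED)
ALL_NESTED = all(
    is_nested(lower, upper) for lower, upper in zip(ACCUMULATED, ACCUMULATED[1:])
)

if __name__ == "__main__":
    print("each line monotone:", ALL_MONOTONE)
    print("lines do not cross:", ALL_NESTED)
```

Both checks follow from the construction: the normalized quantile values are non-negative and ordered by quantile, so their running sums preserve that ordering at every feature.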
Table 14. The principal components for the Prefecture Profile Data II.
Spearman | Pc1 | Pc2 | Pc3
Eigenvalues | 10.155 | 0.660 | 0.146
Contribution (%) | 92.322 | 6.003 | 1.325
Eigenvectors | Pc1 | Pc2 | Pc3
F1 | 0.249 | 0.699 | −0.529
F2 | 0.296 | 0.387 | 0.187
F3 | 0.304 | 0.256 | 0.334
F4 | 0.305 | 0.172 | 0.487
F5 | 0.305 | −0.196 | −0.411
F6 | 0.307 | −0.211 | −0.270
F7 | 0.308 | −0.217 | −0.179
F8 | 0.309 | −0.207 | −0.082
F9 | 0.310 | −0.179 | 0.057
F10 | 0.310 | −0.176 | 0.128
F11 | 0.309 | −0.172 | 0.188
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Ichino, M.; Umbleja, K.; Yaguchi, H. Visualization and Analysis of Three-Way Data Using Accumulated Concept Graphs. AppliedMath 2024, 4, 1162-1180. https://doi.org/10.3390/appliedmath4030062