### *5.1. Case 1: Exploration of Multivariate Patterns*

Using AirInsight, the analyst, an environmental expert, began by exploring the multivariate patterns of air quality data in China. To perceive preliminary patterns, he first inspected the projection view and encoded point color by point density (Figure 9a, Scatter mode). He found a tightly grouped mass of points in the top-left corner, around the *O*<sub>3</sub> symbol (Figure 9a, Point cloud I). Another group of closely plotted points lay in the bottom-right corner, far from the *O*<sub>3</sub> symbol and near the other attribute symbols (Figure 9a, Point cloud II). He therefore inferred that these two groups of points have distinct multivariate features. He also realized that *O*<sub>3</sub> is only weakly correlated with the other attributes because of their large spatiotemporal differences.
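
A minimal sketch of density-based point coloring follows. The section does not say how AirInsight estimates density, so this assumes a Gaussian kernel density estimate over the 2D projection coordinates; the function name `point_densities` and the `bandwidth` parameter are hypothetical. The normalized densities can then index any sequential color map, so tightly grouped points such as Point cloud I stand out.

```python
import math


def point_densities(points, bandwidth=1.0):
    """Gaussian kernel density at each projected 2D point, rescaled to [0, 1].

    `points` is a list of (x, y) projection coordinates. The Gaussian kernel
    and the `bandwidth` value are illustrative assumptions, not AirInsight's.
    """
    raw = []
    for xi, yi in points:
        # Sum the kernel contributions of every point (including itself).
        total = sum(
            math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2.0 * bandwidth ** 2))
            for xj, yj in points
        )
        raw.append(total)
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # Uniform density: give every point the same (top) color.
        return [1.0] * len(raw)
    return [(v - lo) / (hi - lo) for v in raw]


# Three tightly packed points (like Point cloud I) and one isolated point.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
densities = point_densities(pts)
print([round(d, 2) for d in densities])
```

Because every point is compared with every other point, this sketch is O(n²); a grid-based or binned estimate is the usual choice for large projections.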

To inspect these differences further and identify more intricate patterns, the analyst switched the color mapping to reflect the clustering results. He quickly found two clusters, *C1* and *C2* (Figure 12a), located at the peripheries of the attribute symbols, consistent with his previous findings. Next, he mapped AQI values to point color to distinguish the two clusters preliminarily. The samples in *C1* have higher AQI values, indicating more severe air pollution, whereas all the samples in *C2* indicate better air quality. To compare the two clusters in detail, the analyst selected and brushed each of them in turn and checked the linked views: the popup radar chart (Figure 12b,c), map view (Figure 12d,e), and trend view (Figure 12f,g) displayed their multivariate spatiotemporal contexts. He observed clear distinctions between the two clusters:


Drawing on his expertise, the analyst confirmed these findings. He explained that *O*<sub>3</sub> production is closely related to solar radiation, so *C2* cases are more common in summer and in sun-intensive areas. The other pollutants mainly derive from the burning of fossil fuels and are produced in large amounts, particularly during the central-heating season in cold regions. This explains why air pollution in northern China is especially serious in winter.
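
The clustering method behind Case 1 is not specified in this section. Purely as an illustration, the sketch below assumes plain k-means with k = 2 on raw attribute vectors and then compares the mean AQI of each cluster, mirroring how the analyst used AQI to tell *C1* from *C2*. The functions, the initialization from the first k samples, and the sample values are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_aqi_per_cluster(labels, aqi):
    """Average AQI of the samples in each cluster."""
    sums, counts = {}, {}
    for lab, value in zip(labels, aqi):
        sums[lab] = sums.get(lab, 0.0) + value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


# Hypothetical [PM2.5, O3] samples: three polluted, three ozone-dominated.
samples = [[150, 20], [160, 25], [140, 30], [30, 120], [25, 110], [35, 130]]
aqi = [200, 210, 190, 80, 70, 90]
labels = kmeans(samples, k=2)
means = mean_aqi_per_cluster(labels, aqi)
print(labels, means)
```

In practice the attributes would typically be standardized first, since pollutants are measured on different scales.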

**Figure 12.** Multivariate patterns and corresponding spatiotemporal context. (**a**) Projection view after brushing *C1* and *C2*. (**b**) Multivariate patterns of *C1*. (**c**) Multivariate patterns of *C2*. (**d**) Spatial context of *C1*. (**e**) Spatial context of *C2*. (**f**) Temporal context of *C1*. (**g**) Temporal context of *C2*.

### *5.2. Case 2: Finding and Understanding Temporal Anomalies*

The analyst next set out to identify hidden temporal anomalies and to understand them by tracing the whole time-varying process. First, he checked the abnormity classification view (Figure 8) and found an interesting point with a high TD value (0.565): the point representing the 3rd timestamp of Nanning. He selected it and switched the projection view to glyph mode (Figure 13). He also set the colors of the A-Shields to reflect each attribute's variation relative to the previous timestamp. From the R-Shields in Figure 13, the analyst observed three regular patterns:


By tracing the temporal links, the analyst found a clear recurrent state between *R2* and *R3*. Combining this observation with his previous analyses, he recognized that this recurrent state is caused by the transition from spring to autumn. Since all the samples aggregated into these three R-Shields have low TD and low GS values, the analyst defined them as "ordinary samples". Because low GS values indicate little geographic anomaly, he inferred that the cities in the same area as Nanning share similar regular patterns.
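
The recurrent state that the analyst traced through the temporal links can be expressed as a simple check over the sequence of R-Shields that a city's timestamps fall into: two shields form a recurrent pair if the sequence moves between them in both directions. This is a hypothetical formulation (including the function name and the shield labels), not AirInsight's definition.

```python
from collections import Counter


def recurrent_pairs(shield_sequence):
    """Return pairs of shields that the sequence moves between in both directions.

    `shield_sequence` lists the (hypothetical) R-Shield label of each timestamp in order.
    """
    transitions = Counter()
    for a, b in zip(shield_sequence, shield_sequence[1:]):
        if a != b:
            transitions[(a, b)] += 1
    pairs = set()
    for a, b in transitions:
        if (b, a) in transitions:
            # Store each pair once, regardless of direction.
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


sequence = ["R1", "R1", "R2", "R2", "R3", "R3", "R2", "R3"]
print(recurrent_pairs(sequence))
```

The one-way move from R1 to R2 is not reported, because the sequence never returns from R2 to R1.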

**Figure 13.** Projection view in glyph mode after selecting Nanning.

To locate temporal anomalies, the analyst continued by inspecting the A-Shields with high TD values, which often correspond to saltation states and unstable states. By examining the inner colors of the A-Shields and the sequential temporal links, he became interested in *A1*, which has a high TD value and is linked to *R1* by a long curve. This indicates a typical saltation state, in which air quality changes dramatically at the 3rd timestamp. Looking more closely, he found that the outer sector representing *PM*<sub>2.5</sub> in *A1* is deep green, meaning that *PM*<sub>2.5</sub> pollution decreased considerably compared with the previous timestamp. By consulting the relevant climatic information, he found that a thick temperature inversion layer existed over Nanning at the 1st and 2nd timestamps. A thick inversion layer suppresses atmospheric convection and can lead to air pollution. The arrival of cold air at the 3rd timestamp broke the inversion and dispersed the fog and haze.
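
The A-Shield coloring in this case encodes each attribute's change relative to the previous timestamp, with deep green for a large decrease (as for *PM*<sub>2.5</sub> in *A1*) and orange for an increase. The sketch below assumes relative change as the measure and a hypothetical 50% cut-off for the "deep" shades; the function names and sample values are illustrative.

```python
def relative_changes(prev, curr):
    """Relative change of each attribute versus the previous timestamp.

    Assumes strictly positive concentrations at the previous timestamp.
    """
    changes = {}
    for name, value in curr.items():
        if prev[name] <= 0:
            raise ValueError(f"{name}: previous value must be positive")
        changes[name] = (value - prev[name]) / prev[name]
    return changes


def sector_shade(change, strong=0.5):
    """Map a relative change to a diverging shade name (cut-offs are hypothetical)."""
    if change <= -strong:
        return "deep green"
    if change < 0:
        return "light green"
    if change >= strong:
        return "deep orange"
    if change > 0:
        return "light orange"
    return "neutral"


# Illustrative values: PM2.5 drops sharply, O3 rises slightly.
prev = {"PM2.5": 80.0, "O3": 40.0}
curr = {"PM2.5": 30.0, "O3": 44.0}
shades = {name: sector_shade(c) for name, c in relative_changes(prev, curr).items()}
print(shades)
```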

Beyond that, the analyst found two sequential saltation states between *A2* and *R2*: the air quality changes dramatically and then immediately returns to its original condition. The outer sectors in *A2* are colored in different shades of orange, suggesting that all the attributes rose at the 13th timestamp, whereas *R2* reflects much better air conditions. From this, the analyst inferred that a short, unexpected burst of pollution occurred at the 13th timestamp. Since this timestamp has a low GS value and is identified as a "susceptible sample", it reminds domain experts analyzing pollution causes to consider not only Nanning's local factors but also the influence of surrounding cities.
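
A short burst such as the one at the 13th timestamp consists of two consecutive saltations: a large jump away from the previous state and a large jump back to roughly where it started. One hypothetical way to detect this pattern on a city's sequence of normalized attribute vectors is sketched below; the thresholds `jump` and `tolerance` and the sample values are assumptions, not AirInsight's.

```python
import math


def short_bursts(series, jump, tolerance):
    """Timestamps t where the state jumps away at t and jumps back at t+1.

    A burst requires both steps (t-1 -> t and t -> t+1) to exceed `jump`,
    while the states at t-1 and t+1 stay within `tolerance` of each other.
    """
    bursts = []
    for t in range(1, len(series) - 1):
        out_step = math.dist(series[t - 1], series[t])
        back_step = math.dist(series[t], series[t + 1])
        returned = math.dist(series[t - 1], series[t + 1]) <= tolerance
        if out_step > jump and back_step > jump and returned:
            bursts.append(t)
    return bursts


# Normalized attribute vectors per timestamp; index 2 is a one-step spike.
series = [[0.3, 0.3], [0.3, 0.3], [0.8, 0.9], [0.32, 0.31], [0.3, 0.3]]
print(short_bursts(series, jump=0.3, tolerance=0.1))
```

`math.dist` requires Python 3.8 or later.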

One unstable state exists between *A3* and *A4*. During this period, *PM*<sub>2.5</sub> increases at the 27th timestamp, decreases at the 28th timestamp, and stabilizes after the 29th timestamp. Consulting the calendar, the analyst realized that this period falls around the Chinese New Year, and he deduced that the increase in air pollution resulted from the heavy burning of fireworks and firecrackers.
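
The unstable state around the Chinese New Year follows a rise, fall, and settle pattern in a single attribute. A sketch of that pattern check, assuming hypothetical thresholds `big` and `small` and made-up *PM*<sub>2.5</sub> values:

```python
def unstable_windows(values, big, small):
    """Indices t where an attribute rises sharply at t, falls sharply at t+1,
    and then changes by less than `small` at t+2 (thresholds are hypothetical)."""
    windows = []
    for t in range(1, len(values) - 2):
        rise = values[t] - values[t - 1]
        fall = values[t + 1] - values[t]
        settle = abs(values[t + 2] - values[t + 1])
        if rise > big and fall < -big and settle < small:
            windows.append(t)
    return windows


# Illustrative PM2.5 series: spike at index 2, then a stable level.
pm25 = [60, 62, 120, 70, 72, 71]
print(unstable_windows(pm25, big=20, small=5))
```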

### *5.3. Case 3: Exploration of Geographic Anomalies*

AirInsight also allows users to explore geographic anomalies, which can help domain experts analyze the causes of pollution in different cities from a macro perspective. Returning to the abnormity classification view (Figure 8), the analyst became interested in the sample representing the 17th timestamp for Ordos, a typical "insusceptible sample" with a high GS value (0.21) and a low TD value (−0.75). He therefore regenerated the projection view to focus on Ordos (Figure 14a).
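
The case studies use three sample labels from the abnormity classification view: "ordinary" (low TD, low GS), "susceptible" (low GS; in Case 2 it also marks a temporal anomaly, so high TD is assumed here), and "insusceptible" (high GS, low TD). The quadrant mapping below is inferred from those cases, the cut-offs `td_cut` and `gs_cut` are hypothetical, and the fourth quadrant is left unnamed because this section does not name it.

```python
def classify(td, gs, td_cut=0.0, gs_cut=0.1):
    """Quadrant label for a sample from its TD and GS values (cut-offs hypothetical)."""
    high_td = td > td_cut
    high_gs = gs > gs_cut
    if not high_td and not high_gs:
        return "ordinary"
    if high_td and not high_gs:
        return "susceptible"
    if not high_td and high_gs:
        return "insusceptible"
    return "high TD and high GS"  # label not given in this section


# The 17th timestamp of Ordos: TD = -0.75, GS = 0.21.
print(classify(-0.75, 0.21))
```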

**Figure 14.** Projection view in glyph mode, trend view, and radar chart for Ordos. (**a**) Projection view; (**b**) Trend view; (**c**) Radar chart.

To investigate why the 17th timestamp behaves so differently, he clicked the red point on the corresponding grid. In response, the mean condition of the cities in the same area was shown as additional gray outer sectors (Figure 14, *G1*). By comparing the heights of the gray sectors with those of the original sectors, he realized that this geographic anomaly is caused by lower values of *PM*<sub>2.5</sub>, *PM*<sub>10</sub>, and *SO*<sub>2</sub>. The analyst then examined the whole view and quickly observed that most timestamps are aggregated into two nearby R-Shields, which means that the air quality in Ordos is relatively stable. Hence, he surmised that the disparities between Ordos and its adjacent cities are not the result of accidental events.
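
The gray sectors compare a city with the mean of the cities in its area. A minimal sketch of that comparison, assuming the area mean excludes the selected city and using illustrative values (the attribute names follow the text; the numbers do not come from the data):

```python
def area_mean(neighbors):
    """Per-attribute mean over the cities in the same area."""
    names = neighbors[0].keys()
    return {n: sum(city[n] for city in neighbors) / len(neighbors) for n in names}


def deviations(city, neighbors):
    """Signed difference between a city's values and the area mean.

    Negative values mean the city is below its neighbors for that attribute.
    """
    mean = area_mean(neighbors)
    return {n: city[n] - mean[n] for n in city}


city = {"PM2.5": 20, "PM10": 50, "SO2": 10}
neighbors = [
    {"PM2.5": 40, "PM10": 90, "SO2": 30},
    {"PM2.5": 60, "PM10": 110, "SO2": 20},
]
print(deviations(city, neighbors))
```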

To verify this hypothesis, he lowered the GS threshold to flag more samples with milder anomalies. After the marks of the samples exceeding the threshold were updated, he discovered two more anomalous events (Figure 14, *G2* and *G3*) and clicked their red points to explore the specific abnormal manifestations. Surprisingly, the adjacent cities in *G2* have more serious *PM*<sub>2.5</sub> pollution and better *O*<sub>3</sub> conditions, while the cities in *G3* have higher *NO*<sub>2</sub> and *SO*<sub>2</sub> levels. Based on this analysis, he conjectured that Ordos has had better air quality than its adjacent cities over prolonged periods.
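
Lowering the GS threshold flags more samples as geographic anomalies. The sketch assumes a sample is flagged when its GS value exceeds the threshold; apart from the 0.21 of the 17th timestamp, the GS values and the function name are invented for illustration.

```python
def flag_geographic_anomalies(gs_by_timestamp, gs_threshold):
    """Return the timestamps whose GS value exceeds `gs_threshold`, in order."""
    return sorted(t for t, gs in gs_by_timestamp.items() if gs > gs_threshold)


gs_values = {5: 0.12, 11: 0.15, 17: 0.21, 20: 0.03}
print(flag_geographic_anomalies(gs_values, 0.2))  # strict threshold
print(flag_geographic_anomalies(gs_values, 0.1))  # relaxed threshold
```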

To confirm this conjecture, he brought up the highlighted trend view (Figure 14b) to examine the degree of geographic anomaly over the entire timeline. This view yielded a satisfactory result: although all the city lines follow similar trends, the line for Ordos is always above the others, especially in winter. Combining this with the radar chart (Figure 14c), he found that the cluster labels at the top of the trend view have higher pollution values. He therefore concluded that the air in Ordos remains clean in the long term and is better than that in other cities in the same area.
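
The trend view supports this conclusion visually. As a hypothetical numeric counterpart, one could count the share of timestamps at which a city's pollution value is below the mean of its neighbors; a share close to 1 corresponds to being cleaner over the entire timeline. The function and values are illustrative only.

```python
def share_cleaner(city_series, neighbor_series):
    """Fraction of timestamps where the city's value is below the neighbor mean.

    `city_series` is a list of per-timestamp pollution values (e.g. monthly AQI);
    `neighbor_series` is a list of such lists, one per adjacent city.
    """
    cleaner = 0
    for t, value in enumerate(city_series):
        neighbor_mean = sum(s[t] for s in neighbor_series) / len(neighbor_series)
        if value < neighbor_mean:
            cleaner += 1
    return cleaner / len(city_series)


city = [50, 60, 90, 40]
neighbors = [[70, 80, 150, 60], [65, 75, 140, 55]]
print(share_cleaner(city, neighbors))
```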

### *5.4. User Evaluations*

The system received positive feedback from the environmental expert at Northeast Normal University. Among all the modules, the expert found the linkage of the projection view, map view, and trend view the most useful. He acknowledged that visual analysis fusing temporal, spatial, and multivariate perspectives was indispensable for drawing his conclusions. In addition, he noted that summarizing the multidimensional time-varying patterns of an individual city with glyphs gives the system a clear advantage over general-purpose software. He also agreed that the interactive operations, especially brushing, made it easy to explore the cities and months of interest. Finally, the expert appreciated the effectiveness of the abnormity classification view. He considered it a novel direction for air quality data analysis, and it inspired him to further analyze the causes of pollution on the basis of the detected anomalies.

To the best of our knowledge, a fully quantitative comparison between AirInsight and baseline systems is not feasible, because few studies [12,16] have focused on the comprehensive exploration of regular patterns and anomalies in air quality data, and those studies have objectives that differ from the goal of AirInsight. To further evaluate the effectiveness of our visual system, we developed a simple baseline system for comparison. As shown in Figure 15, the simple system integrates a map view and a line chart, both common conventional visualization methods. The map view plays the same role as in AirInsight, showing the geographic locations of the studied data. The line chart shows the temporal variation in monthly mean values, with each line representing one pollutant at one location. When a city of interest is clicked, the line chart focuses on the selected city. As an example, Figure 16 presents the variation in air quality for Changchun.
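
The baseline line chart plots monthly mean values per pollutant and location. Assuming the raw data arrive as daily readings (the granularity is not stated here), the aggregation could look like the sketch below; the record layout, function name, and values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_means(records):
    """Average daily readings into monthly means per (city, pollutant).

    `records` is a list of (city, pollutant, day, value) tuples, where `day`
    is a datetime.date. Returns {(city, pollutant): {(year, month): mean}}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for city, pollutant, day, value in records:
        key = (city, pollutant, day.year, day.month)
        sums[key] += value
        counts[key] += 1
    result = defaultdict(dict)
    for (city, pollutant, year, month), total in sums.items():
        result[(city, pollutant)][(year, month)] = total / counts[(city, pollutant, year, month)]
    return dict(result)


records = [
    ("Changchun", "PM2.5", date(2015, 1, 1), 80.0),
    ("Changchun", "PM2.5", date(2015, 1, 2), 100.0),
    ("Changchun", "PM2.5", date(2015, 2, 1), 60.0),
]
print(monthly_means(records))
```

Each (city, pollutant) entry then becomes one line in the chart, which matches the "one pollutant at one location" encoding described above.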

**Figure 15.** The simple system integrating a map and line chart.

**Figure 16.** The line chart of Changchun.

We performed a task-oriented user study with 13 participants (5 females and 8 males). All the participants were graduate students with little knowledge of air pollution. The tasks were as follows:


After a brief introduction to AirInsight and the simple system, the participants were asked to explore both systems freely and individually and to complete the tasks. The comparison results are as follows:


In summary, AirInsight can help users effectively capture the features of air quality data.
