*4.4. Visual Analytic System*

Integrating the above analysis methods leads to the proposed visual analytic system, AirInsight (Figure 9), which consists of three main views: (a) projection view, (b) trend view, and (c) abnormity classification view. Together, these three views support most of the analysis requirements. In addition, (d) a map view and (e) a radar view display supporting information, and (f) a control panel is used to manage color mapping schemes and filter data points of interest. In this section, we describe the designs of the three main views, as well as the rich interactions provided in AirInsight.

**Figure 9.** Analyzing regular patterns and anomalies in air quality data using AirInsight. (**a**) Projection view. (**b**) Trend view. (**c**) Abnormity classification view. (**d**) Map view. (**e**) Radar view. (**f**) Control panel.

#### 4.4.1. Projection View

The design of the projection view is based on the layout described in Section 4.1. It includes two modes: (1) scatter mode, which is devoted to providing an overview of data and attributes, and (2) glyph mode, which contains multiple linked glyphs to show the time-varying process of a chosen city with rich context.

**Scatter Mode**: As shown in Figures 4b and 9a, we use large gray symbols with different shapes to represent attribute points, while the remaining small round points represent sample points. For the scatter diagram, color is an essential visual encoding channel. Our design includes three color mapping schemes for sample points:


**Glyph Mode**: To gain deeper insight into temporal variations in the air quality of a city of interest, we designed two artistic glyphs that visually summarize the regular patterns and emphasize the abnormal timestamps, as shown in Figure 10. These are named R-Shield (Figure 10a) and A-Shield (Figure 10c), respectively. After obtaining the clustering results, an R-Shield is assigned to each cluster whose timestamps satisfy two conditions: (1) the TD value of a timestamp, as well as that of its temporal neighbors, equals −1; (2) the group consists of at least two consecutive timestamps. The remaining isolated timestamps and the outliers extracted by the clustering process are rendered as A-Shields.
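The assignment rule above can be sketched as follows. This is a minimal illustration of one reading of the two conditions, not the system's actual implementation: the function name `assign_shields`, the list-based inputs, and the convention that a cluster label of -1 marks a clustering outlier are all assumptions.

```python
from collections import defaultdict

def assign_shields(labels, td):
    """Split one city's timestamps into R-Shield groups and A-Shield singles.

    labels[i] is the cluster label of timestamp i (assumed: -1 marks an outlier).
    td[i] is the TD value of timestamp i.
    Returns (r_shields, a_shields): a dict mapping cluster label to the
    timestamp indices summarized by that cluster's R-Shield, and a list of
    timestamp indices drawn as individual A-Shields.
    """
    n = len(labels)

    def regular(i):
        # A timestamp can join an R-Shield only if it is not an outlier
        # and its TD value equals -1.
        return labels[i] != -1 and td[i] == -1

    r_shields = defaultdict(list)
    i = 0
    while i < n:
        if not regular(i):
            i += 1
            continue
        # Extend the run while the next timestamp is regular and in the same cluster.
        j = i
        while j + 1 < n and regular(j + 1) and labels[j + 1] == labels[i]:
            j += 1
        if j > i:  # at least two consecutive timestamps
            r_shields[labels[i]].extend(range(i, j + 1))
        i = j + 1

    grouped = {t for members in r_shields.values() for t in members}
    a_shields = [t for t in range(n) if t not in grouped]
    return dict(r_shields), a_shields
```

Under this reading, a timestamp with TD equal to −1 that has no qualifying neighbor in the same cluster still becomes an A-Shield, because condition (2) is not met.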

**Figure 10.** Design of glyphs.

The glyphs encode four fundamental metrics: attribute values, temporal cues, TD, and GS. A-Shield comprises two parts: the inner circle and the outer sectors (Figure 10c). The color of the inner circle depicts TD, where red represents a high value and green represents a low value. The number marked in the center is the index of the current timestamp. The six outer sectors correspond to the six kinds of pollutants, and their heights represent the deviations between the attribute values and the mean AQI of the current city. This mean value is drawn as a fixed circular baseline. A sector protrudes outward when the deviation is positive and is recessed inward when the deviation is negative. The major pollutant can therefore be identified as the sector that protrudes furthest outward.
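The sector encoding can be illustrated with a short sketch. It assumes that the six attribute values are expressed on the same scale as the baseline (for example, as per-pollutant sub-indices); the pollutant names and the function names are illustrative assumptions rather than details given in this section.

```python
# Hypothetical pollutant names; the section only states that there are six.
POLLUTANTS = ("PM2.5", "PM10", "SO2", "NO2", "CO", "O3")

def sector_deviations(values, baseline):
    """Signed sector heights: positive protrudes outward, negative is recessed."""
    return {name: values[name] - baseline for name in POLLUTANTS}

def major_pollutant(values, baseline):
    """The pollutant whose sector protrudes furthest beyond the baseline."""
    deviations = sector_deviations(values, baseline)
    return max(deviations, key=deviations.get)

# Example with made-up values for one timestamp and a city mean of 80.
sample = {"PM2.5": 150, "PM10": 120, "SO2": 20, "NO2": 40, "CO": 10, "O3": 60}
heights = sector_deviations(sample, baseline=80)
```

In a rendering layer, each signed height would then be scaled and added to the baseline radius to obtain the outer radius of the sector.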

R-Shield (Figure 10a) is designed by extending A-Shield, and it contains information for multiple timestamps. Its size indicates the number of the included timestamps. The interior of R-Shield is a spiral heatmap (Figure 10b), whose radial axis and angular axis represent years and months, respectively. The color of each grid encodes the corresponding TD value. In addition, the wavy outer sectors depict the attribute deviations of all the timestamps included in that specific R-Shield.
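One way to lay out such a spiral heatmap is to map each (year, month) pair to polar coordinates, as sketched below. The function name, the inner radius, the ring width, and the choice to let the radius grow continuously within a year are assumptions made for illustration; the section does not specify these details.

```python
import math

def spiral_cell(year, month, first_year, inner_radius=10.0, ring_width=6.0):
    """Polar position (angle in radians, radius) of the cell for a given month.

    The angular axis encodes the month (1 to 12) and the radial axis encodes
    the year; the radius also grows within a year so that consecutive months
    form a continuous spiral instead of closed rings.
    """
    turn = (month - 1) / 12.0              # fraction of a full revolution
    angle = 2.0 * math.pi * turn
    radius = inner_radius + (year - first_year + turn) * ring_width
    return angle, radius

def to_cartesian(angle, radius):
    """Convert a polar position to x/y coordinates for drawing."""
    return radius * math.cos(angle), radius * math.sin(angle)
```

Each cell would then be filled with a color derived from the TD value of the corresponding timestamp.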

We provide two options for the color scheme of the outer sectors. One option is based on attribute deviations (Figure 10a,c). This facilitates the comparison between two glyphs and the recognition of abnormal attributes. However, it is not intuitive enough when glyphs with dissimilar sizes and scattered locations are compared only by their heights. The other option is based on the attribute variations in A-Shield relative to the preceding timestamp (Figure 10d), while the colors in R-Shield are green by default. This mechanism emphasizes the variation in an attribute over time.

Additionally, abnormal timestamps from a geographical perspective are highlighted by red dots on timestamps whose GS value exceeds a specified threshold (Figure 10e). The threshold can be set from the control panel. After clicking one red dot, additional gray sectors appear along the circular baseline; these new sectors represent the mean values of other cities in the same geographical area at the current timestamp. This function can help users to recognize the causative pollutants of the anomaly in detail.
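The thresholding step and the regional comparison can be sketched as follows. The data layout (`values[city][t]` as a mapping from pollutant to value), the `region_of` lookup, and both function names are hypothetical and serve only to make the example self-contained.

```python
def geo_anomalies(gs, threshold):
    """Indices of timestamps whose GS value exceeds the user-set threshold."""
    return [t for t, value in enumerate(gs) if value > threshold]

def regional_mean(values, region_of, city, t):
    """Mean attribute values at timestamp t over the other cities in the same region.

    values[c][t] is assumed to be a dict mapping pollutant name to value.
    Returns an empty dict if the city has no peers in its region.
    """
    peers = [c for c in values if c != city and region_of[c] == region_of[city]]
    if not peers:
        return {}
    return {
        pollutant: sum(values[c][t][pollutant] for c in peers) / len(peers)
        for pollutant in values[city][t]
    }
```

The returned means correspond to the gray reference sectors, which can be compared with the sectors of the selected city to see which pollutants drive the anomaly.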

Visual clutter, which is common in glyph-based visualizations, is a potential drawback of this design. To mitigate it, a force-directed collision detection method is implemented to separate overlapping glyphs. In addition, when a glyph contains too many timestamps to be read comfortably, users can hover over the glyph to enlarge the spiral heatmap and bring it to the foreground.
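The idea of separating overlapping glyphs can be illustrated with a simplified iterative scheme that treats each glyph as a circle and pushes overlapping pairs apart. This is a stand-in for the force-directed method mentioned above, not a reproduction of it; the function name, the padding, and the iteration cap are assumptions.

```python
import math

def separate(glyphs, padding=2.0, max_iter=200):
    """Push overlapping circular glyphs apart.

    glyphs is a list of (x, y, radius) tuples; a new list of [x, y, radius]
    lists is returned and the input is left unchanged.
    """
    out = [list(g) for g in glyphs]
    for _ in range(max_iter):
        moved = False
        for i in range(len(out)):
            for j in range(i + 1, len(out)):
                dx = out[j][0] - out[i][0]
                dy = out[j][1] - out[i][1]
                dist = math.hypot(dx, dy)
                min_dist = out[i][2] + out[j][2] + padding
                if dist >= min_dist:
                    continue
                if dist == 0.0:
                    # Coincident centers: pick an arbitrary direction.
                    dx, dy, dist = 1.0, 0.0, 1.0
                    overlap = min_dist
                else:
                    overlap = min_dist - dist
                # Move both glyphs half of the overlap along the line joining them.
                ux, uy = dx / dist, dy / dist
                out[i][0] -= ux * overlap / 2.0
                out[i][1] -= uy * overlap / 2.0
                out[j][0] += ux * overlap / 2.0
                out[j][1] += uy * overlap / 2.0
                moved = True
        if not moved:
            break
    return out
```

A full force-directed layout would typically also pull each glyph back toward its projected position so that separation does not distort the layout more than necessary; that term is omitted here for brevity.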

State transitions [46–48], which are important features in visual analysis, are usually extracted and explored in the form of a node-link diagram. For example, Natalia et al. [49] designed state transition graphs for the semantic analysis of movement behaviors. To analyze state transitions of air quality, we use Bézier curves to connect scattered glyphs (Figure 10a, Glyph mode) in time order, and a curve's color transition (from green to yellow) indicates the direction of the time flow. These curves preserve the continuity of time and show the transitions between the stable state, unstable state, recurrence state, and saltation state in a time-varying process (Figure 11).
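A curve of this kind can be sampled as a sequence of points, each with an interpolated color, as in the sketch below. The use of a quadratic curve, the perpendicular offset of the control point, and the specific RGB values for green and yellow are assumptions made for illustration.

```python
GREEN = (0, 128, 0)     # assumed start color (earlier timestamp)
YELLOW = (255, 255, 0)  # assumed end color (later timestamp)

def quadratic_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier curve with control points p0, p1, p2 at t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2.0 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2.0 * u * t * p1[1] + t * t * p2[1]
    return x, y

def lerp_color(c0, c1, t):
    """Linear interpolation between two RGB colors."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def time_curve(start, end, bend=0.25, steps=16):
    """Sample a curve from the earlier glyph (start) to the later glyph (end).

    The control point is offset perpendicular to the chord by `bend` times the
    chord length. Returns a list of ((x, y), (r, g, b)) samples.
    """
    mid_x = (start[0] + end[0]) / 2.0
    mid_y = (start[1] + end[1]) / 2.0
    dx, dy = end[0] - start[0], end[1] - start[1]
    control = (mid_x - dy * bend, mid_y + dx * bend)
    return [
        (quadratic_bezier(start, control, end, k / steps),
         lerp_color(GREEN, YELLOW, k / steps))
        for k in range(steps + 1)
    ]
```

Drawing short line segments between consecutive samples, each in its own color, yields the gradient that signals the direction of time.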

**Figure 11.** Typical states in a time-varying process.

The stable state (Figure 11a), which is indicated by the absence of lines in the layout, reflects a stationary time-varying process in which all the timestamps belong to a single R-Shield. In contrast, a line connecting two A-Shields indicates an unstable state (Figure 11b), in which the air quality changes from one anomalous condition to another. The recurrence state (Figure 11c) corresponds to a loop between two regular patterns and indicates periodic behavior. Finally, dramatically changing data define the saltation state (Figure 11d), a noteworthy turning point that prompts analysts to look into its causes.

#### 4.4.2. Trend View

The trend view (Figure 9b) displays the temporal trends of different cities using the clustering results, together with the distribution of multivariate patterns across timestamps. As in the traditional parallel coordinates plot (PCP) [17], the successively placed axes represent consecutive timestamps. The ticks on the axes are the cluster labels, and each line represents a city. By tracing and comparing the lines, we can observe the temporal variations in the air quality of a city. However, when the lines become dense, it is hard to pinpoint the axis tick that contains the largest amount of data or, in other words, the tick that represents the major multivariate pattern at a specific timestamp. To compensate for this limitation, we added a gray bar to each tick whose width encodes the number of lines passing through it. By tracing the bars on the same tick across all axes, we can identify the cyclical temporal behavior of the corresponding pattern.
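The quantity encoded by the gray bars is a count of cities per (timestamp, cluster label) pair, which can be computed as in the following sketch. The input layout, with one list of cluster labels per city, and the function names are assumptions.

```python
from collections import Counter

def tick_counts(city_labels):
    """Number of city lines passing through each tick.

    city_labels maps a city name to its cluster label at each timestamp.
    Returns a Counter keyed by (timestamp index, cluster label).
    """
    counts = Counter()
    for labels in city_labels.values():
        for t, label in enumerate(labels):
            counts[(t, label)] += 1
    return counts

def major_pattern(counts, t):
    """Cluster label with the widest bar on the axis of timestamp t."""
    at_t = {label: n for (ts, label), n in counts.items() if ts == t}
    return max(at_t, key=at_t.get)

# Hypothetical example: three cities observed at two timestamps.
counts = tick_counts({"A": [0, 1], "B": [0, 1], "C": [2, 1]})
```

Reading `counts[(t, label)]` for a fixed label across all timestamps gives the sequence of bar widths that the text describes tracing along one tick.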

#### 4.4.3. Interactivity

We provide the following interactive functions, which allow users to switch between different temporal and spatial contexts and to draw in-depth conclusions from multiple linked views.

**Context switching**. We provide interfaces for users to switch color mapping schemes. As described in Section 4.4.1, there are various schemes for the scatters or glyphs in the projection view. In addition, the points in the map view can be colored according to the values of any attribute for a specific month; this scheme displays the geographical distribution of air quality.

**Filtering**. When users want to inspect the air quality samples for a specific situation, they can set various conditions to filter the scatter points in the projection view. AirInsight offers multiple selectors and range sliders that allow users to jointly filter the data on the basis of both spatiotemporal information and statistical indicators, such as geographic labels and clustering results.
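Joint filtering of this kind amounts to a conjunction of optional conditions, as in the sketch below. The `Sample` fields and the parameter names are hypothetical; they stand in for whatever selectors and sliders the control panel exposes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    city: str
    region: str    # geographic label
    month: int     # timestamp index
    cluster: int   # clustering result
    aqi: float

def filter_samples(samples, regions=None, clusters=None,
                   month_range=None, aqi_range=None):
    """Keep the samples that satisfy every condition that is set.

    A condition left as None is ignored. Ranges are inclusive (low, high) pairs.
    """
    def keep(s):
        if regions is not None and s.region not in regions:
            return False
        if clusters is not None and s.cluster not in clusters:
            return False
        if month_range is not None and not month_range[0] <= s.month <= month_range[1]:
            return False
        if aqi_range is not None and not aqi_range[0] <= s.aqi <= aqi_range[1]:
            return False
        return True

    return [s for s in samples if keep(s)]
```

For example, `filter_samples(samples, regions={"north"}, month_range=(0, 5))` would keep only samples labeled "north" in the first six timestamps.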

**Brushing**. AirInsight links the projection view, trend view, and map view through brushing. When sample points are brushed in the projection view, a hovering radar chart (Figure 9e) appears and shows the attribute values of the selected samples. Simultaneously, the sizes of the marks in the map view change according to the number of brushed samples in the corresponding cities. The trend view highlights the lines and time axes of a specified city. A bar is included above each time axis, and the bar's height encodes the number of samples related to this timestamp. Conversely, when the city lines in the trend view are brushed, the projection view highlights the sample points of the selected cities, and the map view highlights the corresponding city marks.
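The counts that drive the linked views after a brush can be sketched as two aggregations over the selected samples. The representation of a brushed sample as a (city, timestamp) pair, the function names, and the square-root size scale are assumptions for illustration.

```python
import math
from collections import Counter

def brush_summary(brushed):
    """Aggregate brushed samples for the linked views.

    brushed is an iterable of (city, timestamp) pairs.
    Returns (per_city, per_timestamp): the first drives the map mark sizes,
    the second drives the bar heights above the time axes.
    """
    brushed = list(brushed)
    per_city = Counter(city for city, _ in brushed)
    per_timestamp = Counter(t for _, t in brushed)
    return per_city, per_timestamp

def mark_radius(count, base=3.0, scale=2.0):
    """Map a sample count to a mark radius; sqrt keeps the area roughly proportional."""
    return base + scale * math.sqrt(count)
```

Cities with no brushed samples have a count of zero in the `Counter` and therefore keep the base radius.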

**Focusing**. Users are able to focus on a city of interest and perform detailed inspections. When users click a city mark in the map view, the glyphs will appear in the projection view as time curves are drawn dynamically. At the same time, the trend view will highlight this city and erase lines that are not geographically adjacent, and circles are added to the time axes in colors that encode the GS value of the corresponding timestamps (Figure 9b).

#### **5. Case Studies and User Evaluations**
