#### *3.1. Data*

#### 3.1.1. Data Sources and Preprocessing

This article draws its raw data from sources used in [1]. Thomson Reuters' DataStream provided price data on a range of precious metals, base metals, energy commodities, and agricultural commodities. Specifically, this article relies upon daily prices from 18 September 2000 through 31 July 2020 for gold, silver, platinum, palladium (precious metals); copper, zinc, tin, lead, nickel, aluminum (base metals); Brent, West Texas Intermediate crude (WTI), gasoil, gasoline (energy commodities); and palm oil, wheat, corn, soybeans, coffee, cocoa, cotton, and lumber (agricultural commodities).

The preprocessing pipeline took two further steps. Transforming daily prices into continuous logarithmic returns shortened all series by a single day: 18 September 2000. The resulting log return data (as well as the conditional volatility data derived from log returns) therefore covered the period from 19 September 2000 to 31 July 2020. Two additional days were excluded. On 20 April 2020, WTI closed at –37.63. This event rendered it mathematically impossible to calculate the log return for WTI on that day and the next, 21 April 2020. Those two trading dates were also omitted.
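The return transformation and the exclusion of undefined dates can be sketched as follows. This is a minimal illustration under stated assumptions, not this article's actual pipeline: the helper `log_returns`, the column names, and every price except the negative WTI close are hypothetical. A non-positive price leaves the logarithm undefined on that day and on the next, which is why two dates drop out for every series.

```python
import numpy as np
import pandas as pd

def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Continuous log returns; any date with an undefined return is dropped."""
    # Mask non-positive prices so the logarithm is never taken of them.
    valid = prices.where(prices > 0)
    # r_t = ln(P_t) - ln(P_{t-1}); the first row is NaN by construction.
    returns = np.log(valid).diff()
    # Drop the first day and every day on which any series lacks a return.
    return returns.dropna(how="any")

# Made-up prices for illustration; only the -37.63 close comes from the text.
dates = pd.to_datetime(
    ["2020-04-16", "2020-04-17", "2020-04-20", "2020-04-21", "2020-04-22"]
)
prices = pd.DataFrame(
    {"WTI": [20.0, 18.0, -37.63, 10.0, 14.0],
     "Brent": [28.0, 28.1, 25.6, 19.3, 20.4]},
    index=dates,
)
returns = log_returns(prices)
```

In this toy example only 17 April and 22 April survive: the first date has no prior price, and 20 and 21 April both depend on the negative close.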

The second preprocessing step involved forecasts of conditional volatility from log returns. We calculated the conditional, time-variant volatility for all 22 commodities according to a GJR-GARCH(1, 1, 1) process using Student's *t* distribution [1,164]. The mathematical underpinnings of GJR-GARCH(1, 1, 1) have been thoroughly documented [165,166]. GJR-GARCH outperforms alternative time-series models in forecasting financial markets [167].
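For orientation, the conditional variance recursion of a GJR-GARCH(1, 1, 1) process is commonly written as shown below. The notation is the standard textbook form rather than anything taken from this article, which defers to [165,166] for details. The constant mean $\mu$ is an assumption made here for concreteness.

```latex
r_t = \mu + \varepsilon_t, \qquad
\varepsilon_t = \sigma_t z_t, \qquad
z_t \sim t_{\nu} \ \text{(standardized Student's } t\text{)}

\sigma_t^{2} = \omega
  + \alpha\,\varepsilon_{t-1}^{2}
  + \gamma\,\varepsilon_{t-1}^{2}\,\mathbb{1}\{\varepsilon_{t-1} < 0\}
  + \beta\,\sigma_{t-1}^{2}
```

Here $r_t$ is the log return, $\sigma_t^2$ is the conditional variance, and the indicator term lets negative shocks raise next-period variance by an additional $\gamma\,\varepsilon_{t-1}^2$ when $\gamma > 0$. The three orders in "(1, 1, 1)" correspond to one lag each of the symmetric shock term ($\alpha$), the asymmetric term ($\gamma$), and the lagged variance ($\beta$).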

For purposes of analysis and discussion, we aggregated log return and volatility data according to a precalculated ontology of commodity markets. The vocabulary of commodities trading distinguishes between mined, nonrenewable "hard" commodities (such as metals and fossil fuels) and grown, renewable "soft" commodities [20,88,168,169]. The term "soft" is sometimes reserved for tropical crops such as cocoa, coffee, and sugar [170]. We adopt that narrower definition of "softs" and describe the temperate commodities of wheat, corn, and soybeans as "crops." Because cotton and lumber span tropical and temperate climates, these commodities can be assigned to either agricultural subcategory. Results from the clustering of log returns support the classification of cotton and lumber as tropical or semitropical softs [1].

These distinctions, paired with traditional divisions among metals and fuels, can be summarized as a traditional ontology of commodities trading:


#### 3.1.2. Visualizations of Logarithmic Return and Conditional Volatility Data

This subsection visualizes this article's core data. Although log return and conditional volatility calculations were performed on all 22 commodities, this article compares only energy-related commodities with one another on an individual basis. This article compares crude oil and refined fuels as an asset class alongside the aggregate categories for metals and agricultural commodities.

Figure 2a depicts cumulative log returns for commodities as asset classes. Relative to other classes, energy-related commodities show many sharper price movements. Figure 2b illustrates cumulative log returns for individual crude oil and fuel markets. Although comovement among individual oil and fuel markets is far tighter (as one should expect) than among broad classes of commodities, sharper upward and downward price spikes, particularly for gasoline, are evident to the naked eye.

(**a**) Log returns: All commodities (**b**) Log returns: Energy-specific commodities

**Figure 2.** Cumulative logarithmic returns: (**a**) All classes of commodities; (**b**) Crude oil and refined fuel markets.

Figure 3a,b depict conditional volatility. By analogy to Figure 2a,b, Figure 3a portrays the five broad classes of commodities, while Figure 3b focuses on the four individual energy-related markets. Visibly greater volatility in energy markets dominates Figure 3b. Relative to crude oil markets and even gasoil, the market for gasoline is palpably more volatile. These acute volatility spikes confirm the intuition motivating the conventional exclusion of food and fuel prices from core inflation indices used in the making of macroeconomic policy [171–174].

(**a**) Conditional volatility: All commodities (**b**) Conditional volatility: Energy-specific commodities 

**Figure 3.** Conditional volatility forecasts: (**a**) All classes of commodities; (**b**) Crude oil and refined fuel markets.

#### *3.2. Clustering Methods*

#### 3.2.1. General Considerations

Many applications within economics and finance exploit clustering and related forms of unsupervised machine learning [175–178]. This article applies five clustering methods: Spectral, mean-shift, affinity propagation, *k*-means, and hierarchical agglomerative clustering. Each of these methods is available in the SciKit-Learn package for Python. The implementation of hierarchical agglomerative clustering in SciPy generated visually distinctive dendrograms for that method.

Previous research had established that temporal clustering should be based on conditional volatility rather than logarithmic returns [1]. All five clustering methods were applied to volatility data arrayed in *n* rows of trading days and *p* columns corresponding to the number of distinct commodity markets. For the full volatility array covering all 22 commodities, *p* = 22. For the energy-specific subarray, *p* = 4. The two arrays, however, had the same number of trading days: *n* = 5182.

For both the full 5182 × 22 array and the energy-specific 5182 × 4 subarray, clustering results underwent a crude aggregation inspired by voting classifiers in machine learning [179]. Since clustering of the full 5182 × 22 array reached rough consensus on the financial crisis of 2008–2009 and the COVID-19 pandemic as the two periods of interest, that analysis relied on the union and the intersection of the five sets of clustering results. Using the union of sets is tantamount to allowing a single vote to drive a positive result. The intersection of those sets indicates unanimity. These set theory concepts therefore define the logical extremes of voting methodologies [180,181].

Greater variability in the results for the energy-specific 5182 × 4 subarray required a more flexible approach. For that array, this article aggregated all positive results registered by two or more of the five clustering methods. The most generous voting method, consisting of the union of all positive results, generated a wider range of dates. Though unexamined in this article, those results remain available for future research.
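The three voting rules described above (union, intersection, and agreement by two or more methods) reduce to a threshold on the number of positive votes per trading day. The following minimal sketch assumes that each method's output has already been reduced to one Boolean flag per day. The function name `vote` and the toy flags are hypothetical and are not drawn from this article's code.

```python
import numpy as np

def vote(flags: np.ndarray, min_votes: int) -> np.ndarray:
    """flags has shape (n_methods, n_days); True marks a day flagged as critical."""
    # Count positive votes per day and compare against the threshold.
    return flags.sum(axis=0) >= min_votes

# Hypothetical flags from five clustering methods over six trading days.
flags = np.array(
    [
        [0, 1, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
    ],
    dtype=bool,
)

union = vote(flags, min_votes=1)                      # a single vote suffices
at_least_two = vote(flags, min_votes=2)               # two or more methods agree
intersection = vote(flags, min_votes=flags.shape[0])  # unanimity
```

By construction the intersection is a subset of the two-vote result, which in turn is a subset of the union.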

The balance of this subsection will describe each of the five clustering methods.

#### 3.2.2. Spectral Clustering

Spectral clustering operates on a projection of the normalized Laplacian [182,183]. Since this article's conditional volatility arrays represent 4 or 22 commodity markets as simple functions of a common vector of trading dates, the Laplacian (Δ*f* = ∇<sup>2</sup>*f*) is the sum of the partial second derivatives for each of those variables.

Spectral clustering should work very well with financial data. This method exposes individual clusters within highly non-convex structures [184,185]. Since each volatility vector is plotted against the same vector of trading dates, the resulting arrays of volatility forecasts by date are tantamount to overlapping curves on a two-dimensional plane. Spectral clustering therefore excels precisely where conventional statistical measures of central tendency and variability fail to describe the shape of the data to be clustered.

These properties have made spectral clustering especially popular in computer vision and image processing [186,187]. The ability of spectral clustering to detect blobs and edges suggests potential success with economic time series. In mathematical terms, image and time-series data are quite similar. Unlike documents that have been vectorized for natural language processing, these data sources consist of perfectly dense arrays whose columns observe the same scale, or at least nearly so. Still images and simple, harmonized arrays of economic time series can be rendered in a nominally two-dimensional format.

Spectral clustering generated the fewest discrete clusters. Consequently, the spectral method may be regarded as setting the most conservative clustering baseline.
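A minimal sketch of spectral clustering on an *n* × *p* array follows. It uses a small synthetic stand-in for the volatility data, and every setting shown (two clusters, the RBF kernel, the seed) is an assumption chosen for this toy example rather than a report of this article's configuration.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(1)
# Synthetic stand-in for a volatility array: 300 days by 4 markets,
# with an artificially turbulent stretch on days 200-229.
vol = rng.gamma(shape=20.0, scale=0.05, size=(300, 4))
vol[200:230] += 2.0

model = SpectralClustering(
    n_clusters=2,            # chosen by hand for this toy example
    affinity="rbf",          # kernel used to build the similarity graph
    assign_labels="kmeans",  # k-means on the spectral embedding
    random_state=1,
)
labels = model.fit_predict(vol)  # one cluster label per trading day
```

The resulting `labels` vector can be plotted against the ordered dates to check whether the flagged days form a contiguous period.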

#### 3.2.3. Mean-Shift Clustering

An extension of more traditional pattern-recognition algorithms, mean-shift clustering uses nonparametric techniques to identify deviant blobs in an otherwise smooth space [188]. Alongside *k*-means, mean-shift is one of two centroid-based methods in this article. The distinctive process that gives mean-shift its name relies on a recursive updating of potential centroids that would represent the mean of the points within a given region. A final postprocessing stage eliminates near-duplicates before reporting the final list of centroids. Hybridizing the mean-shift method with agglomeration can reduce the computation cost of mean-shift clustering [189].
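The mean-shift procedure can be sketched with SciKit-Learn as follows. The synthetic array and the bandwidth quantile are assumptions made for illustration; this article does not report its bandwidth setting.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(1)
# Synthetic stand-in for a volatility array (300 days by 4 markets).
vol = rng.gamma(shape=20.0, scale=0.05, size=(300, 4))
vol[200:230] += 2.0

# The bandwidth sets the radius of the region whose mean updates each candidate
# centroid; here it is estimated from nearest-neighbor distances in the data.
bandwidth = estimate_bandwidth(vol, quantile=0.3, random_state=1)

ms = MeanShift(bandwidth=bandwidth)
labels = ms.fit_predict(vol)        # cluster label for each trading day
centroids = ms.cluster_centers_     # centroids remaining after near-duplicates are merged
n_clusters = centroids.shape[0]     # the method chooses this number itself
```

Unlike *k*-means, the number of clusters is an output rather than an input; a smaller bandwidth generally yields more clusters.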

#### 3.2.4. Hierarchical Agglomerative Clustering

Hierarchical clustering methods decompose and arrange mathematical objects according to dendrograms, or trees expressing phylogenetic relationships [190–192]. The agglomerative method begins from the "bottom" of a dataset and combines instances into clusters until all data has been assigned to a single, overarching cluster [193].

Bottom-up agglomeration is less computationally demanding than top-down division [194,195]. Four methods for computing distances in hierarchical clustering are widely used: Ward's method and single-, average-, and complete-linkage [196–199].

In economics and finance, hierarchical clustering has evaluated stock markets [200,201], buildings and real estate [202,203], broader financial indicators [204], and the relationship between financial markets and the real economy [177]. Hierarchical clustering of cryptocurrency markets [205] intensifies the urgency of research into this asset class during market turbulence [206].

One source has used hierarchical clustering to identify correlation patterns similar enough to comprise distinct market states [207]. Aside from our own work [1] and the use of multidimensional scaling to evaluate comovement among commodities during subjectively defined crises [164], this application of hierarchical clustering represents the most extensive effort to classify periods in financial history through unsupervised machine learning.

#### 3.2.5. Affinity Propagation

Affinity propagation identifies typical cluster members by exchanging quantitative messages between data pairs until the algorithm converges on a high-quality set of exemplars [208–210]. This property distinguishes affinity propagation from mean-shift and *k*-means clustering, which are centroid-based methods.

Under SciKit-Learn's default settings, however, affinity propagation generates far too many distinct exemplars. To the extent that other methods (specifically spectral, mean-shift, and hierarchical clustering) can better estimate the optimal number of clusters, an instance of affinity propagation can alter the element preference from its default value of the median of the array of input similarities [211]. To a limited extent, this adjustment enables affinity propagation to alter the number of clusters that it finds.
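The effect of the preference setting can be sketched as follows. The synthetic data, the damping and iteration settings, and the multiplier of 20 are assumptions chosen for illustration only. The helper `count_exemplars` is hypothetical.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.default_rng(1)
# Synthetic stand-in for a volatility array (300 days by 4 markets).
vol = rng.gamma(shape=20.0, scale=0.05, size=(300, 4))
vol[200:230] += 2.0

# With the default Euclidean affinity, similarities are negative squared
# distances, and the default preference is the median of those similarities.
similarities = -euclidean_distances(vol, squared=True)
default_pref = np.median(similarities)

def count_exemplars(preference: float) -> int:
    """Fit affinity propagation and report how many exemplars it selects."""
    ap = AffinityPropagation(
        preference=preference, damping=0.9, max_iter=500, random_state=1
    )
    ap.fit(vol)
    return len(ap.cluster_centers_indices_)

n_default = count_exemplars(default_pref)
# The median similarity is negative, so multiplying it makes the preference
# more negative, which generally reduces the number of exemplars.
n_tuned = count_exemplars(20 * default_pref)
```

The preference may also be supplied as one value per observation, which is one way to read the per-vector preferences mentioned later in this article. That reading is an interpretation, not a confirmed detail.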

Affinity propagation spans an impressive range of applications. Affinity propagation is used to cluster microarray and gene expression data [212–214] and in sequence analysis [215]. Applications beyond bioinformatics [216] include natural language processing [217–219] and computer vision [220,221]. Especially if calibrated so that element preference yields something close to the optimal number of exemplars, this versatile clustering method should accommodate financial time series.

#### 3.2.6. *k*-Means Clustering

One of the oldest clustering algorithms [222], *k*-means clustering remains a popular way to partition mathematical space [223]. *k*-means clustering excels in detecting fraud [224] and firms at risk of default or failure [225]. Other financial applications include the forecasting of returns and the management of investment risk [176,226–228]. Our own previous research on commodity markets relied heavily on *k*-means clustering [1].

*k*-means clustering does require more careful handling. More than other methods, *k*-means clustering depends on algorithms for determining the ideal number of clusters [229,230]. In addition to *k*, the optimal number of clusters, this centroid-based method depends entirely on randomized instantiation [231]. To ensure replicability of results, this article seeded SciKit-Learn's pseudo-random number generator with the value of 1. Finally, *k*-means clustering cannot detect objects lacking a hyper-ellipsoidal shape [232].
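The role of the seed can be sketched as follows. Passing `random_state=1` to the estimator is an assumption about how the seeding was done; the synthetic data, the helper `fit_kmeans`, and the value of *k* used here are likewise illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic stand-in for a volatility array (300 days by 4 markets).
vol = rng.gamma(shape=20.0, scale=0.05, size=(300, 4))
vol[200:230] += 2.0

def fit_kmeans(data: np.ndarray, k: int, seed: int = 1) -> KMeans:
    """Fit k-means with a fixed seed so that centroid initialization repeats."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)

first = fit_kmeans(vol, k=6)
second = fit_kmeans(vol, k=6)

# Same data and same seed: the two runs should agree label for label.
reproducible = np.array_equal(first.labels_, second.labels_)
```

Without a fixed seed, repeated runs can assign different cluster numbers, and occasionally different memberships, to the same trading days.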

#### *3.3. t-Distributed Stochastic Neighbor Embedding (t-SNE)*

This article uses a single method of manifold learning: *t*-distributed stochastic neighbor embedding, or *t*-SNE [233–235]. *t*-SNE reduces distances between similar instances and maintains distances between dissimilar instances. Although this article applies *t*-SNE solely for visualization, *t*-SNE can be a valuable form of unsupervised learning on its own. Preprocessing with *t*-SNE can detect and remove outliers in preparation for the application of convolutional neural networks to computer vision [236].
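A three-dimensional *t*-SNE projection for visualization can be sketched as follows. The synthetic 22-column array, the perplexity, the initialization, and the seed are all assumptions; this article does not report its *t*-SNE hyperparameters.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# Synthetic stand-in for a volatility array: 200 days by 22 markets.
vol = rng.gamma(shape=20.0, scale=0.05, size=(200, 22))
vol[150:170] += 2.0

tsne = TSNE(
    n_components=3,   # three output dimensions for a 3-D scatter plot
    perplexity=30.0,  # must be smaller than the number of observations
    init="pca",
    random_state=1,
)
embedding = tsne.fit_transform(vol)  # shape (200, 3), one row per trading day
```

Because the embedding depends only on the input array and these settings, cluster labels from any method can be overlaid on the same coordinates for comparison.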

#### **4. Results, Part 1: Temporal Clustering**

Clustering results differ dramatically according to the underlying array of conditional volatility forecasts. This section accordingly separates results for the full 5182 × 22 array of all commodities from results based on the smaller 5182 × 4 energy-specific array.

Differences among clustering methods are also stark. Clustering differs from classification through supervised machine learning in a crucial respect. Clustering results do not correspond to *a priori* labels assigned by a human. Analyst judgment therefore plays a subtler role. Each clustering method must be evaluated on its own terms. Moreover, each method's results must be evaluated in light of all others and against the backdrop of unavoidably subjective judgment. Each method's underlying mathematics, however, offers principled guidance on the exercise of that discretion.

#### *4.1. Temporal Clustering of the Full Array of Conditional Volatility Forecasts*

#### 4.1.1. The Naïve Biennial Baseline

The naïve clustering of all 20 years of commodities trading data provides a valuable starting point. Consider the possibility that a fixed and predetermined period of time should define each segment of financial interest. This hypothetical is far from absurd; monthly, quarterly, or annual reporting slices financial time in precisely this way. In the interest of convenience, we select intervals of two calendar years each.

Figure 4 establishes a visual baseline for all temporal clustering. Consistent reliance on *t*-SNE to reduce all 22 dimensions produces a uniform three-dimensional projection of conditional volatility forecasts. Synthetic centroids generated by the average of all observations for each biennium supply a rough sense of those two years.

**Figure 4.** Naïve, biennially defined clusters of trading days in commodity markets.

Cluster 9 is particularly interesting because the 2019–2020 biennium includes the global maximum for cumulative log returns on precious metals and the global minimum in cumulative log returns on oil and fuels. That cluster's synthetic centroid falls very near the global center. Its corresponding observations, in cyan, stretch across the financial firmament, as measured by its width across the zeroth *t*-SNE dimension.

Expanding all spheres from Figures 4 and 5 reveals the futility of arbitrary biennial clusters. If spherical radii corresponding to each cluster define the mean distance of each observation from its corresponding synthetic centroid, then the size of each sphere and its overlap with other spheres suggest the extent to which each cluster is internally cogent and externally distinct. Internal cogency, if present, should reveal itself through contiguous or nearly contiguous clusters in an ordered, one-dimensional projection along a temporal axis. An ordered, horizontal representation would indeed display 10 perfectly contiguous, nonoverlapping clusters. That is an artifact of the arbitrary definition of those clusters, however, and not any mathematical property captured by *t*-SNE.
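Synthetic centroids and sphere radii of this kind can be computed directly from an embedding and a vector of cluster labels. The following minimal sketch assumes Euclidean distance in the embedded space; the helper `centroids_and_radii` and the toy coordinates are hypothetical.

```python
import numpy as np

def centroids_and_radii(points: np.ndarray, labels: np.ndarray) -> dict:
    """Map each label to (synthetic centroid, mean distance of members from it)."""
    summary = {}
    for label in np.unique(labels):
        members = points[labels == label]
        centroid = members.mean(axis=0)  # average of the cluster's observations
        radius = np.linalg.norm(members - centroid, axis=1).mean()
        summary[int(label)] = (centroid, float(radius))
    return summary

# Toy 3-D embedding with two clusters of two points each.
points = np.array(
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 4.0, 0.0]]
)
labels = np.array([0, 0, 1, 1])
summary = centroids_and_radii(points, labels)
```

In this toy case, cluster 0 has centroid (1, 0, 0) and radius 1, while cluster 1 has centroid (10, 2, 0) and radius 2. Two spheres overlap when the distance between their centroids is smaller than the sum of their radii.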

**Figure 5.** Naïve, biennially defined clusters in commodity markets—with spheres representing the mean distance of each cluster's observations from its synthetic centroid.

#### 4.1.2. Spectral Clustering

Spectral clustering of all conditional volatility forecasts identifies eight clusters. Although this method does not generate centroids, finding the mean of each cluster's members in the three-dimensional *t*-SNE manifold produces synthetic centroids.

Figure 6 reveals the complete *t*-SNE manifold of spectral clusters. Clusters 1, 2, 3, and 5 appear in a tight group at upper left. Clusters 1 and 2 contain only two days each, while cluster 3 adds only nine more. The tiny size of these clusters is implied by their compactness.

**Figure 6.** Spectral clustering of commodity markets: A *t*-SNE (*t*-distributed stochastic neighbor embedding) manifold.

Two other groupings also stand out. Clusters 4 and 7 occupy the lower foreground. Cluster 6 stands alone. As with clusters 1, 2, and 3, a tight radius implies that cluster 6 consists of a small number of days. Indeed, cluster 6 contains only 22 days.

The vast majority of trading days—4920 out of 5182—belong to cluster 0. The *t*-SNE manifold suggests that cluster 0 may be the fallback cluster representing ordinary trading days, when volatility levels do not substantially deviate from their central tendency.

The most useful representation of temporal clusters, of course, is the one plotted against the ordered vector of dates. Figure 7 reveals how the eight spectral clusters almost perfectly identify two critical periods of interest from 2000 to 2020. The height of the bars communicates categorical rather than ordinal or numerical information. Because of the fortuity that spectral clustering assigned the number 0 to the default, catch-all category, all clusters numbered 1 and above identify periods of interest.

**Figure 7.** Spectral clustering of commodity markets—An ordered timeline.

Spectral clustering identified the financial crisis of 2008–2009 and the COVID-19 pandemic. Almost miraculously, six of the remaining seven clusters are perfectly contiguous. Instances from cluster 5, though split by clusters 1, 2, and 3, joined those other clusters to form a continuum covering the beginning of the pandemic. Cluster 6 covers the final 22 days in the dataset. Whether those days belong with the earliest phase of the pandemic or instead indicate a transition toward noncritical cluster 0 may be inferred from the location of cluster 6 in Figure 6 as well as the statistical summary of each cluster.

The resolution of Figure 7, however, is not sharp enough to reveal additional insights. Cluster 5 consists of two subclusters separated by nearly 19 years. The earliest instances in cluster 5 occur on 25–28 September 2001, exactly two weeks after the terrorist attacks of 11 September 2001. The remaining 67 days in cluster 5 started in March 2020, coinciding with the outbreak of COVID-19 in Europe and North America. This represents evidence, however faint, that an event unequivocally related to energy markets might sway the commodities market as a whole.

#### 4.1.3. Mean-Shift Clustering

Mean-shift clustering generated results remarkably similar to spectral clustering. In certain respects, mean-shift clustering might be even more parsimonious.

Figure 8 identifies two periods of potential interest: The tight clump formed by clusters 2, 4, and 5 at left and the looser pair of clusters 1 and 3 at bottom. Because *t*-SNE manifolds are shaped by their underlying data, Figure 8 can be compared directly with other *t*-SNE manifolds. Figures 4–6 make it apparent that clusters 2, 4, and 5 correspond to COVID-19, while clusters 1 and 3 track the financial crisis of 2008–2009.

The ordered timeline in Figure 9 confirms these intuitions. Clusters 2, 4, and 5 indeed cover the COVID-19 pandemic. Notably, the final 39 trading days (9 June through 31 July 2020) fall within cluster 0. Mean-shift results suggest that the final 22 days might be better classified as "ordinary" trading days rather than part of the COVID-19 crisis.

#### 4.1.4. Hierarchical Agglomerative Clustering

The visual signature of hierarchical clustering is the dendrogram. The dendrogram has the added benefit of offering principled guidance on the optimal number of clusters.

Figure 10 displays the dendrogram for hierarchical agglomerative clustering using Ward's method and Euclidean distances. The height of the branches offers guidance on the ideal number of clusters. In principle, the ideal number of hierarchical clusters may be as low as two. The height of the blue branches exceeds the vertical distance between any other set of splits. Splitting this dataset into two temporal clusters is tantamount to the binary classification between crises and ordinary (or non-critical) periods.

**Figure 8.** Mean-shift clustering of commodity markets—A *t*-SNE manifold.

**Figure 9.** Mean-shift clustering of commodity markets—An ordered timeline.

The dotted horizontal line in Figure 10 intersects five vertical branches. The comfortable vertical distance on either side of 75 implies that 5 is a near-optimal number, if we are unwilling to abandon multiclass in favor of binary clustering. In any event, the logic of agglomeration makes it easy to rearrange the five clusters as two.

**Figure 10.** Hierarchical agglomerative clustering of commodity markets: A dendrogram truncated after four levels with a horizontal cut indicating five clusters.

Hierarchical agglomerative clustering in Python can designate an arbitrary number of clusters, *k* ∈ [1, *n*]. Having determined *k* = 5, we can project the *t*-SNE manifold in three dimensions as well as the ordered timeline.
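Building the tree, truncating it for display, and cutting it into a chosen number of clusters can be sketched with SciPy as follows. The synthetic data are illustrative, and the use of `fcluster` with the `maxclust` criterion is an assumption; this article does not state which function performed the cut.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(1)
# Synthetic stand-in for a volatility array (300 days by 4 markets).
vol = rng.gamma(shape=20.0, scale=0.05, size=(300, 4))
vol[200:230] += 2.0

# Ward's method with Euclidean distances; each row of Z records one merge.
Z = linkage(vol, method="ward", metric="euclidean")

# Layout of a dendrogram truncated after four levels. With no_plot=True the
# function returns the coordinates without drawing anything.
tree = dendrogram(Z, truncate_mode="level", p=4, no_plot=True)

# Cut the tree into at most k = 5 flat clusters; subtract 1 for 0-based labels.
k = 5
labels = fcluster(Z, t=k, criterion="maxclust") - 1
```

Cutting at a fixed height instead (`criterion="distance"`) corresponds to drawing a horizontal line across the dendrogram and counting the branches it intersects.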

The three-dimensional *t*-SNE manifold of hierarchical clustering results differs in striking ways from its spectral and mean-shift counterparts. Figure 11 divides noncritical trading days more evenly among three clusters: 0, 1, and 2. Clusters 3 and 4 are the outliers. Cluster 3 surely represents the financial crisis, while cluster 4 captures COVID-19.

The ordered timeline in Figure 12 confirms the intuitive interpretation of the *t*-SNE manifold. Again, departures from ordinary trading are designated by higher-numbered clusters. The spike for cluster 3 coincides with the financial crisis, while cluster 4 rises during the COVID-19 pandemic.

#### 4.1.5. Affinity Propagation

The final two clustering methods, affinity propagation and *k*-means clustering, require more computation and discretionary judgment. These difficulties arise from a simple difference: Default settings for affinity propagation and *k*-means clustering generate a larger number of smaller clusters. Worse, many of those clusters cover non-consecutive days, despite their relatively small size.

Adjusting the element preference matrix enables affinity propagation to generate a desired number of exemplars. This trait of affinity propagation is not infinitely elastic. Nevertheless, a simple matrix of element preferences generated five clusters, the same value of *k* in hierarchical agglomerative clustering. Those element preferences consisted of the median (not mean) of each vector of volatility forecasts, uniformly scaled by −3000.

**Figure 11.** Hierarchical agglomerative clustering of commodity markets—A *t*-SNE manifold.

**Figure 12.** Hierarchical agglomerative clustering of commodity markets—An ordered timeline.

Figure 13 shows how closely affinity propagation, once nudged toward five clusters, resembles hierarchical agglomerative clustering. Critical days appear in clusters 2 and 4, which respectively define the financial crisis and the pandemic.

**Figure 13.** Affinity propagation of commodity markets—A *t*-SNE manifold.

Figure 14 places these clusters within an ordered timeline. Cluster 2, however, covers not only the financial crisis of 2008–2009 but also the three days immediately following cluster 4's definition of the pandemic. Consistent with other clustering results, this minor deviation from perfect contiguity suggests that volatility during the COVID-19 crisis drifted toward conditions characterizing the longer-lasting "great recession."

**Figure 14.** Affinity propagation of commodity markets—An ordered timeline.

#### 4.1.6. *k*-Means Clustering

This article's exercise in *k*-means clustering on all conditional volatility forecasts duplicates the temporal clustering in [1], with a salient difference: The value of *k*, now fixed at six, is the average number of clusters found by other methods (Figure 15). Conventional methods for optimizing *k* did not prove particularly satisfying. It remains possible to determine *k* through other clustering methods.

**Figure 15.** *k*-means clustering of commodity markets—A *t*-SNE manifold.

Like mean-shift clustering, *k*-means clustering relies on the stochastic instantiation of centroids. *k*-means clustering, however, generates the least contiguous and the least visibly cogent set of clusters. Figure 16 reveals only two wholly contiguous clusters (1 and 4), which coincide with the financial crisis and the pandemic.

**Figure 16.** *k*-means clustering of commodity markets—An ordered timeline.

#### 4.1.7. The Union and Intersection of Clustering Results for the Full Volatility Array

Though similar, these five clustering methods differ subtly, just enough to require human intervention. Some methods confine all results for the financial crisis or the pandemic to a single cluster. Others divide results among as many as four clusters. Affinity propagation associated three days after its COVID cluster with the earlier financial crisis.

Prior intuitions about any particular clustering method are just that: prior intuitions. The "no-free-lunch" theorem of machine learning posits that no single method can be expected to outperform others in every task [237]. Moreover, machine-learning ensembles typically outperform any individual model [238]. Some method of aggregating results from different clustering models seems advisable.

Elementary set theory provides a simple solution. The *union* of all clustering results identifies a critical period as long as *any* method assigns a date to a critical period. The *intersection* of those results demands agreement among *all* methods. Given the simplicity of finding agreement over exactly two periods (the financial crisis and the pandemic), these opposite extremes of any plausible voting algorithm define the range of answers.

Figure 17 depicts this simple voting algorithm's parsimonious results. The union of all results defines the financial crisis as 16 September 2008 to 24 April 2009. The intersection of those results narrows the timeframe so that it runs from 16 October 2008 to 17 March 2009.

The definition of the COVID-19 pandemic is likewise perfectly contiguous by either criterion. The union of results defines the COVID crisis as 10 March to 1 July 2020. The narrower intersection of those sets also begins on 10 March but ends on 26 May 2020.

**Figure 17.** Set theory aggregations of temporal clusters—An ordered timeline.

#### *4.2. Temporal Clustering of the Energy-Specific Array of Conditional Volatility Forecasts*

We now apply all five clustering methods to the energy-specific 5182 × 4 subarray of conditional volatility forecasts. The smaller size of this array nudges all methods toward finding more clusters. That property makes some clustering models more difficult to manage. On the other hand, the relative stability of clustering on the grand array of 22 commodities suggests that this suite of unsupervised machine-learning methods can be successfully extended to larger financial markets (including equity markets with hundreds or thousands of stocks) and to arrays of macroeconomic indicators.

Dispensing with the naïve clustering of observations by arbitrary two-year periods, we begin with spectral clustering and progress through all other methods.

#### 4.2.1. Spectral Clustering

Figure 18 reports spectral clustering results for the time periods within the subarray of energy-specific conditional volatility.

**Figure 18.** Spectral clustering of energy-related markets—A *t*-SNE manifold.

On the energy-specific subarray, as with the full array, spectral clustering is a very conservative method. It finds fewer and smaller clusters apart from a single large cluster of ordinary observations. In Figure 18, clusters 1 through 6 adhere together during the COVID-19 pandemic. Cluster 7 stands apart in time and contains 17 consecutive trading days. Cluster 0 accounts for nearly 99 percent of the full 5182 days.

Figure 19's ordered timeline reveals that cluster 7 does not overlap any period associated with the financial crisis of 2008–2009. Rather, cluster 7 consists of 17 days in August and September 2005. This is the first energy-specific event not identified by the broader array of all commodities. As will become apparent, these days coincided with Hurricane Katrina, which profoundly affected oil production and gasoline refining in and near the Gulf of Mexico [239,240]. Indeed, an enduring structural break between crude oil and spot gasoline prices is attributed to this event [241].

**Figure 19.** Spectral clustering of energy-related markets: An ordered timeline.

#### 4.2.2. Mean-Shift Clustering

Relative to spectral clustering, the mean-shift method finds nearly twice as many clusters. More intriguingly, mean-shift clusters deviating from the central tendency of energy-specific volatility gather on a single side of the three-dimensional *t*-SNE manifold.

Figure 20 shows how mean-shift clustering is based on centroids. The centroids indicated by numerals are visibly distinct from the apparent center of gravity for each cluster within the *t*-SNE manifold's stylized three-dimensional space. Clusters 0 and 1, the two largest, exhibit the greatest apparent dislocation between centroids and individual instances. All other clusters, except perhaps clusters 2 and 7, are more likely to identify brief, compact events in the trading in crude oil and refined fuels. Such events likely arise from supply disruptions, as opposed to longer-lasting shifts in demand associated with broader crises affecting all commodities.

Figure 21 renders mean-shift results on an ordered timeline. Mean-shift clustering is manifestly more sensitive than spectral clustering. Cluster 0 plays its usual role as the fallback category. All clusters numbered higher than 1 are much smaller, containing (in two instances) as few as two days. Pronounced spikes are associated with the global financial crisis and the pandemic, as well as a previously undetected 2016 event.

Clusters 1 and 2, as the second- and third-largest clusters among the 15, fall between the extremes represented by cluster 0 and collectively by clusters 3 through 14. In addition to indicating several periods in the early 2000s, cluster 1 brackets better-known, previously identified volatility events. It may be reasonably surmised that this cluster indicates the beginning or the end of distinctive events. Its appearance at the end of the peak of the pandemic reinforces what all-commodity clustering has already suggested: The pandemic arrived suddenly and began to relax almost as quickly.

Cluster 2 recurs on multiple occasions in the first half of this 20-year period and again in 2015. Those 60 trading days should share characteristics that distinguish them from the financial crisis, the 2016 event, and the pandemic.

Recombining mean-shift clusters from 15 into four—0, 1, 2, and all clusters numbered 3 or higher—provides a clearer picture. Figure 22 reports this summarized timeline.
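
The recombination described above amounts to a relabeling of trading days. The following is a minimal sketch, assuming that cluster labels are stored as one integer per trading day; the `labels` list and the function name `collapse_labels` are hypothetical and do not reproduce this article's data or code.

```python
from collections import Counter

def collapse_labels(labels, keep=(0, 1, 2), other=3):
    """Keep the labels listed in `keep`; merge every other label into `other`."""
    return [lab if lab in keep else other for lab in labels]

# Hypothetical mean-shift labels for ten trading days (illustration only).
labels = [0, 0, 1, 2, 7, 14, 0, 3, 1, 0]
collapsed = collapse_labels(labels)
print(collapsed)           # [0, 0, 1, 2, 3, 3, 0, 3, 1, 0]
print(Counter(collapsed))  # days per summarized cluster
```

Clusters 0, 1, and 2 keep their identities, while every label numbered 3 or higher falls into a single residual group.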

**Figure 20.** Mean-shift clustering of energy-related markets—A *t*-SNE manifold.

**Figure 21.** Mean-shift clustering of energy-related markets—An ordered timeline.

**Figure 22.** Mean-shift clustering of energy-related markets—A simplified timeline compressing 15 clusters into four.

#### 4.2.3. Hierarchical Agglomerative Clustering

As a matter of visual interpretability as well as mathematical logic, hierarchical clustering begins with a dendrogram. Figure 23 suggests that the ideal number of clusters may be as low as three: A concentrated cluster of 51 trading days (not necessarily consecutive) in the middle in red, a moderately large supercluster of 848 days at right in cyan, and a very large supercluster of the remaining 4283 days at left in green. Deviating from cluster distance as a guide to the optimal value of *k* yields the 12 clusters along the bottom.

Distances within these 12 clusters average less than 30, as opposed to the distance of 60 separating a three-cluster configuration from its five-cluster alternative. Even so, many of these clusters will exhibit so little contiguity that it will take considerably more analyst judgment to cogently interpret hierarchical clustering.

The *t*-SNE manifold of hierarchical clustering in Figure 24 looks decidedly unlike the manifolds for spectral and mean-shift clustering. The affinity propagation and *k*-means manifolds will exhibit a shape similar to the hierarchical results. The greatest difference lies in the relative sizes and overlapping locations of the spheres representing the clusters. Aside from clusters 9, 2, 10, and perhaps 5, these clusters have large radii and overlap their neighbors. The centroids are synthetic, as in spectral clustering, and not stochastically instantiated, as in *k*-means. Overlapping spheres suggest that the adjoining clusters will not be perfectly contiguous, or even close to being so.

**Figure 24.** Hierarchical agglomerative clustering of energy-related markets—A *t*-SNE manifold.

The ordered timeline in Figure 25 confirms these fears. Cluster 0, the closest representation of normal trading, has shorter stretches of uninterrupted, contiguous cogency than the default, background trading clusters under the spectral or mean-shift methods. Cluster 1, which appears during the financial crisis and the pandemic, also appears in 2001. Reducing the total number of clusters below the 15 clusters generated by mean-shift did not bring visible order to the timeline. Additional analyst judgment seems advisable.

Figure 26, the revised manifold, highlights the six smallest hierarchical clusters. A principled case can be made to include cluster 8, the seventh smallest among 12, because of its proximity to cluster 5 in the *t*-SNE manifold and in Figure 23's dendrogram. On the other hand, cluster 8 adds 484 days to the 415 total days in clusters 1, 2, 5, 7, 9, and 10. At 415 total days, those clusters comprise almost exactly 8 percent of the 5182 trading days. Adding 484 days from cluster 8 would raise the share of critical trading days to more than 17 percent. For the sake of comparison, mean-shift clustering identified 609 trading days of interest, while spectral clustering found only 70.

Whether critical periods in energy commodity trading comprise 8 or 17 percent of an entire timeframe requires delicate analyst judgment. An incidental benefit of forecasting conditional volatility through GARCH is the ability to estimate the degrees of freedom for the *t*-distribution that best fits each series of returns. Figure 27 shows that the estimated degrees of freedom for energy-related commodities ranged between 3.03 (WTI) and 3.71 (gasoil). For an equally weighted market basket of oils and refined fuels, *ν* ≈ 3.51.

**Figure 25.** Hierarchical agglomerative clustering of energy-related markets—An ordered timeline.

**Figure 26.** Hierarchical agglomerative clustering of energy-related markets—the *t*-SNE manifold revisited, with clusters of interest indicated by their centroids.

**Figure 27.** Estimated degrees of freedom (*ν*) for Student's *t*-distribution for each series of log returns. Energy-related commodities, including an equally weighted market basket, appear in red.

The degrees of freedom estimate enables the cumulative distribution function for Student's *t*-distribution with location = 0 and scale = 1 to describe the size of the tails at a given value of *ν*. At this dataset's estimates for *ν*, the two-tailed estimate for *F*(*x* | |*x*| > 2) ranges from 0.121634 for gasoil to 0.138395 for WTI. The estimate is 0.125950 for the equally weighted market basket. The one-tailed estimate would be exactly half of those values. The one-tailed estimate for *F*(*x* | *x* > 2) might be justified on the reasoning that volatility is invariably non-negative and that outliers found through clustering are likely to exhibit extremely high rather than extremely low volatility. That rationale, to say nothing of methodological conservatism, supports a smaller number of clusters.
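
The tail probabilities above can be approximated with nothing more than the density of Student's *t*-distribution. The sketch below is an illustration rather than the computation used for this article: it integrates the density numerically with Simpson's rule, using only the Python standard library, and the function names `t_pdf` and `two_tailed` are hypothetical. A statistical library's survival function (for example, `scipy.stats.t.sf`) would serve the same purpose.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom (location 0, scale 1)."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def two_tailed(nu, cutoff=2.0, steps=2000):
    """P(|T| > cutoff), from composite Simpson integration on [0, cutoff].

    `steps` must be even for Simpson's rule.
    """
    h = cutoff / steps
    total = t_pdf(0.0, nu) + t_pdf(cutoff, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central_half = total * h / 3          # P(0 < T < cutoff)
    return 1.0 - 2.0 * central_half

# Degrees of freedom as reported in the text.
for name, nu in [("WTI", 3.03), ("gasoil", 3.71), ("basket", 3.51)]:
    p2 = two_tailed(nu)
    print(f"{name}: two-tailed {p2:.6f}, one-tailed {p2 / 2:.6f}")
```

Because the reported values of *ν* are rounded to two decimals, the printed probabilities should land close to, but not necessarily exactly on, the figures quoted above. Lower degrees of freedom produce heavier tails, which is why WTI shows the largest tail probability.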

By either measure, the six or seven smallest clusters occupy a distinct edge within Figure 26. All of the candidate clusters lie a palpable distance from the *t*-SNE manifold's center of gravity. This is intriguing (if not altogether conclusive) visual evidence that a size-based criterion can successfully isolate outliers among trading days.

Figure 28 simplifies the ordered timeline in Figure 25 by reducing the more conservative six-cluster interpretation of hierarchical clustering into binary classification. Those six clusters have been aggregated into a single "critical" supercluster, while all other days are classified as a normal, noncritical background. In addition to the financial crisis and the pandemic, simplified hierarchical clustering identifies periods of interest in 2000, 2001, 2003, 2005, 2015, and 2016.

**Figure 28.** Hierarchical agglomerative clustering of energy-related markets—A simplified timeline aggregating the smallest six among 12 clusters.

#### 4.2.4. Affinity Propagation

The smaller size of the energy-specific subarray created immense difficulty with affinity propagation. Scaling the element preference matrix according to the median values for each series cannot reduce the number of clusters close to the range of eight to 15, the number of clusters found by the spectral and mean-shift methods. More aggressive efforts prevented the algorithm from converging. The smallest number of viable clusters in affinity propagation appears to be 32.

Affinity propagation generates a beautiful but deadly *t*-SNE manifold (Figure 29). The large number of overlapping clusters, many enveloped in spheres with moderate to large radii, suggests that this method yields highly atomized, noncontiguous clusters.

**Figure 29.** Affinity propagation of energy-related markets—A *t*-SNE manifold.

Figure 30 displays an ordered timeline whose clusters are extremely hard to interpret. Affinity propagation is even more chaotic than hierarchical clustering (Figure 25). The larger the number of clusters, the likelier that individual clusters will splinter internally. Identifying financially meaningful groups of trading days requires extensive work.

Experience with more tractable clustering methods suggests a way forward. Critical and ordinary trading days are not uniformly distributed. The very process used to forecast volatility—GJR(1, 1, 1)-GARCH—presumes heteroskedasticity in the sequence of logarithmic returns. All else being equal, clusters identifying extreme levels of volatility are likely to be smaller than clusters describing lower background levels.

A viable filter therefore consists of tagging affinity propagation clusters for further evaluation until the cumulative number of trading days reaches a certain threshold. The 415 out of 5182 days selected by hierarchical clustering provide a workable benchmark. Isolating the 14 smallest among 32 clusters yields 384 trading days, roughly 7.4 percent of the total. Adding a 15th cluster would add the 78 days from cluster 12 and raise the number of potentially critical days to 459, or nearly 8.9 percent. Because cluster 12 is so close to the 14 even smaller clusters, we included it. Fortuitously, that choice ultimately made no difference in aggregation through voting.
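
The size-based filter can be sketched as a running total over clusters sorted from smallest to largest. The cluster sizes below are hypothetical and are not the affinity propagation output; the function name `smallest_clusters_until` and the 8 percent budget (borrowed from the hierarchical benchmark) are likewise assumptions made for illustration.

```python
def smallest_clusters_until(sizes, max_days):
    """Tag clusters from smallest to largest while the total stays within max_days.

    sizes: dict mapping cluster id -> number of trading days.
    Returns (selected cluster ids, cumulative number of days).
    """
    selected, total = [], 0
    for cid, n in sorted(sizes.items(), key=lambda kv: kv[1]):
        if total + n > max_days:
            break
        selected.append(cid)
        total += n
    return selected, total

# Hypothetical cluster sizes (illustration only).
sizes = {0: 900, 1: 40, 2: 15, 3: 700, 4: 60, 5: 25, 6: 260}
total_days = sum(sizes.values())            # 2000
budget = round(0.08 * total_days)           # roughly 8 percent of all days
picked, days = smallest_clusters_until(sizes, budget)
print(picked, days, f"{days / total_days:.1%}")   # [2, 5, 1, 4] 140 7.0%
```

The sketch stops strictly below the budget. Whether to admit the next cluster that would cross the threshold, as was done here with cluster 12, remains a matter of analyst judgment that the code does not encode.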

**Figure 30.** Affinity propagation of energy-related markets—An ordered timeline.

Figure 31 isolates the 15 smallest affinity propagation clusters. As expected, these clusters occupy the left edge of the *t*-SNE manifold and resemble the critical clusters chosen by hierarchical clustering (Figure 26). Four subgroups are evident. Two appear closer to the top: clusters 7 and 8 in one supercluster and clusters 11, 23, and 26 in another beneath it. Clusters 28 through 31 occupy the far upper left. Finally, clusters 1, 12 through 15, and 25 comprise a more diffuse but still distinct supercluster at lower left.

Figure 32 isolates these four superclusters. The first three superclusters cover contiguous or nearly contiguous periods corresponding to energy-trading events in 2005, 2016, and 2020. The last of these plainly covers the COVID-19 pandemic, specifically its frantic first weeks. Clusters in 2005 and 2016, wholly distinct from the financial crisis and the pandemic, imply the occurrence of events quantitatively distinct from the fourth supercluster. That fourth supercluster unites several events in the early 2000s and the back half of the pandemic with the financial crisis.

Analyst judgment, aided by the heuristic tool of choosing the *k* smallest clusters until some fraction of all trading days is attained, rescued an initially frustrating set of results from affinity propagation. We will apply a similar approach to *k*-means clustering.

**Figure 31.** Affinity propagation of energy-related markets—A *t*-SNE manifold, with clusters of interest indicated by their synthetic centroids.

**Figure 32.** Affinity propagation of energy-related markets—A simplified timeline showing the 15 smallest clusters, further subdivided into four groups of interest.

#### 4.2.5. *k*-Means Clustering

Finding the optimal number of clusters is as difficult as it is pivotal for *k*-means clustering [229,230]. Other methods have yielded as few as eight and as many as 32 clusters. Without reliable guidance from other tests, we proceed with *k* = 12, as suggested by hierarchical clustering and roughly halfway between spectral and mean-shift clustering.

Figure 33 shows another treacherously beautiful, highly overlapping set of clusters. Although *k*-means clustering proceeded on a value of *k* akin to the number of clusters found by mean-shift and hierarchical clustering, it attains less clarity. The failure to deliver cogent clusters vexed affinity propagation and ultimately required considerable human intervention. Finally, the radial sizes of the spheres within the *t*-SNE manifold, aside from clusters 2, 6, 10, and maybe 11, suggest that few if any clusters will be close to contiguous.

**Figure 33.** *k*-means clustering of energy-related markets—A *t*-SNE manifold.

As expected, Figure 34 shows a deeply fractured *k*-means timeline. Only clusters 6 and 10 approached perfect contiguity. Cluster 10 is more readily associated with the COVID-19 pandemic. Cluster 6 identifies the September 2005 Katrina event, which eluded detection by temporal clustering of all commodities.

The previously deployed size-based filtering technique converts the superficial chaos of *k*-means clustering into a credible division of energy-trading history. Figure 35 isolates the six smallest clusters (2, 6, 10, 11, 8, and 0) at the familiar left edge of the *t*-SNE manifold.

**Figure 34.** *k*-means clustering of energy-related markets—An ordered timeline.

**Figure 35.** *k*-means clustering of energy-related markets—A *t*-SNE manifold, with clusters of interest indicated by their synthetic centroids.

Figure 36 reduces the apparent chaos in *k*-means clustering (Figure 34) into a binary indicator of critical events. Familiar episodes have emerged: In addition to the financial crisis and the pandemic, *k*-means clustering isolates events in the early 2000s (including August/September 2005) as well as events in 2015 and 2016.

**Figure 36.** *k*-means clustering of energy-related markets—A simplified timeline showing the six smallest clusters, aggregated as indicators of critical events.

#### 4.2.6. Aggregating Clustering Results through Voting

All that remains is the aggregation of clustering results through voting. The much smaller number of energy-related commodities makes clustering more sensitive and more likely to find a larger number of critical events. In addition, spectral clustering is much more conservative than other methods. Consequently, some gradations in addition to the extreme outcomes of set theory might be warranted.

The union of all sets of clustering results is tantamount to a one-vote regime. The intersection of those sets effectively imposes a unanimous hard voting regime. Tabulating positive results from each clustering method as a single, equally weighted vote facilitates as many gradations as there are models. In this instance, five distinct models can generate votes ranging from 0 to 5. Any positive result is an element of the union of all five sets. The more votes required, the more stringent the voting regime becomes, until the intersection of all sets reaches the extreme of unanimity.
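
The voting regimes described above can be sketched by treating each method as a binary classifier and summing its flags day by day. The flags below are hypothetical and cover only six trading days; the method keys and function names are illustrative assumptions, not this article's code.

```python
def tally_votes(flags_by_method):
    """Count, for each trading day, how many methods flagged it as critical."""
    return [sum(day) for day in zip(*flags_by_method.values())]

def critical_days(votes, threshold):
    """Indices of the days receiving at least `threshold` votes."""
    return [i for i, v in enumerate(votes) if v >= threshold]

# Hypothetical binary flags for six trading days (1 = critical).
flags = {
    "spectral":     [0, 0, 1, 0, 0, 0],
    "mean_shift":   [1, 0, 1, 1, 0, 1],
    "hierarchical": [0, 0, 1, 1, 0, 0],
    "affinity":     [0, 0, 1, 1, 0, 0],
    "k_means":      [0, 0, 1, 1, 0, 0],
}
votes = tally_votes(flags)
print(votes)                      # [1, 0, 5, 4, 0, 1]
print(critical_days(votes, 1))    # union of all sets: [0, 2, 3, 5]
print(critical_days(votes, 2))    # two-vote threshold: [2, 3]
print(critical_days(votes, 5))    # intersection (unanimity): [2]
```

A threshold of one vote reproduces the union, a threshold equal to the number of methods reproduces the intersection, and intermediate thresholds supply the gradations between those extremes.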

Figure 37 displays voting results. The only trading days receiving a single vote were those identified by mean-shift clustering but by no other method. Aggregation through voting becomes most interesting at the threshold of two votes. Moreover, the 70 days receiving unanimous support are coextensive with the days found by spectral clustering. Of the other 400 days, 333 received unanimous support from the four remaining methods.

**Figure 37.** A voting-based aggregation of temporal clusters and critical periods in energy-related commodities trading—An ordered timeline.

## **5. Results, Part 2: Evaluating Critical Periods in Energy-Related Markets**

#### *5.1. Identifying and Classifying Critical Periods Located through Temporal Clustering*

If all periods receiving two or more votes in Figure 37 are treated as critical, or at least as candidates for such a classification, the following events emerge from the temporal clustering of energy-related markets between 2000 and 2020:


Three of these 12 events may be too brief or incoherent for proper examination. The noncontiguous days in fall 2000 and December 2004, as well as 30 September 2003, comprise a total of 11 trading days. The shortest span among the nine other events is the 13 days of the December 2000 event. Even if those three events are excluded from in-depth analysis, however, the 11 days they collectively span may be worth including in a broader definition of critical (as distinct from ordinary, noncritical) trading days.
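
Separating coherent events from scattered days comes down to finding runs of consecutive critical trading days. The following sketch assumes a list of binary flags, one per trading day, such as the output of a two-vote rule; the flags and the minimum run length are hypothetical and were chosen only to illustrate the idea.

```python
from itertools import groupby

def critical_runs(flags):
    """Return (start_index, length) for each run of consecutive critical days."""
    runs, pos = [], 0
    for is_critical, group in groupby(flags):
        length = len(list(group))
        if is_critical:
            runs.append((pos, length))
        pos += length
    return runs

# Hypothetical flags for twelve trading days (1 = critical).
flags = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0]
runs = critical_runs(flags)
print(runs)                                       # [(1, 1), (4, 4), (9, 2)]

min_length = 3   # illustrative cutoff, not a value taken from this article
print([r for r in runs if r[1] >= min_length])    # [(4, 4)]
```

Short or isolated runs can then be set aside from in-depth analysis while remaining part of a broader tally of critical days.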

A more generous definition of critical days remains available. Several clustering methods could have been expanded to include closer to 800 rather than 400 days. Days that are noncontiguous under this aggregation of clustering results may cohere once more days of possible interest are investigated.

Among the nine surviving events, it makes sense to distinguish between (a) events uncovered by temporal clustering of all commodities and (b) events unique to the energy-specific subarray. There are three possible and non-mutually exclusive justifications for separate treatment. First, the financial crisis of 2008–2009 and the COVID-19 pandemic may have affected *all* commodity asset classes in ways that meaningfully departed from the ordinary course of trading. Second, crises affecting all commodities are likelier to be deeper recessions affecting the broader economy across a wider geographic swath. In other words, events affecting other commodities in addition to oil and refined fuels arise from comprehensive declines in demand. By contrast, crises unique to energy markets are likelier to arise from disruptions in supply, attributable to acts of war, natural disasters, or even OPEC production decisions. Finally, the impact of the financial crisis or the pandemic on energy may have been so profound as to sway the overall commodities market.

#### *5.2. Visualizing and Evaluating Critical Periods Uncovered by Temporal Clustering*

#### 5.2.1. Conditional Volatility Forecasts

In principle, temporal clustering precedes and enables more extensive analysis. Identifying events such as the global financial crisis, the COVID-19 pandemic, and energy-market disruptions associated with American military engagements offers even greater value when those events' financial characteristics are distinguished from those of calmer, ordinary conditions. This section visualizes conditional volatility and cumulative logarithmic returns during critical events.

Since temporal clustering operated on arrays of conditional volatility, it makes sense to depict conditional volatility during critical events. Cumulative log returns describe the experience of commodity traders during those events. They, too, are worth illustrating.
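
Cumulative log returns over an event window are simply the running sum of daily log returns. The sketch below uses hypothetical closing prices, and the function names are illustrative. It assumes strictly positive prices, which is the reason the preprocessing step dropped the two dates affected by the negative WTI close.

```python
import math
from itertools import accumulate

def log_returns(prices):
    """Continuous (logarithmic) returns between consecutive positive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def cumulative_log_returns(returns):
    """Running sum of log returns from the start of the window."""
    return list(accumulate(returns))

# Hypothetical closing prices over a short event window (illustration only).
prices = [50.0, 48.0, 45.0, 47.0, 52.0]
cum = cumulative_log_returns(log_returns(prices))
print([round(c, 4) for c in cum])

# The final cumulative value equals the log of the end-to-start price ratio.
print(math.isclose(cum[-1], math.log(prices[-1] / prices[0])))
```

Because log returns are additive, the last value in the cumulative series summarizes the trader's experience over the whole window, while the intermediate values trace the path taken to get there.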

Figure 38 shows the volatility conditions during the nine critical periods identified through temporal clustering. Throughout 20 years, an equally weighted market basket of Brent, WTI, gasoil, and gasoline exhibited an average GJR(1, 1, 1)-GARCH conditional volatility forecast of 1.918575. Collectively, all critical events exhibited average conditional volatility of 4.009828, while noncritical periods averaged 1.709983. Many but not all of the periods in Figure 38 exhibited peak volatility exceeding 4.00.
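
As a consistency check on the three averages above, the share of critical days can be backed out of them. The sketch assumes that the overall average is a day-weighted mean of the critical and noncritical averages and that every one of the 5182 trading days falls into exactly one of the two groups; both assumptions are ours and are made only for illustration.

```python
overall, critical, noncritical = 1.918575, 4.009828, 1.709983
total_days = 5182

# overall = w * critical + (1 - w) * noncritical, solved for the critical share w
w = (overall - noncritical) / (critical - noncritical)
print(f"implied critical share: {w:.2%}")
print(f"implied critical days:  {w * total_days:.1f}")
```

Under those assumptions, the implied share is roughly 9 percent, or about 470 trading days, which appears to line up with the 70 and 400 days discussed in the voting results.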

**Figure 38.** Conditional volatility forecasts during critical periods for an equally weighted market basket of four oil and fuel commodities, with details for each of the constituent markets.

The real question is why some periods showed elevated volatility for energy-related commodities, but others did not. Notably, both the financial crisis and the COVID-19 pandemic showed sustained volatility above 4.00. By contrast, a majority of the energy-specific critical events managed to stay below 4.00. Conditions of active warfare do not explain the difference. The Second Gulf War in 2003 remained below 4.00, while the event of 2016, comparable in duration and overall volatility, did crest above 4.00.

Every energy-related crisis does exhibit an upward volatility spike in at least one of four oil and fuel markets. The two episodes associated with the September 11 terrorist attacks and the American military response, the global financial crisis, and the COVID-19 pandemic all show the four individual markets spiking together and early. To a limited degree, the same can be said for Gulf War II in 2003.

The five other energy-specific events appear to be driven by a volatility spike in a single constituent market. Only the December 2000 event involved a spike in a crude oil market, as volatility in WTI rocketed in the middle of that month. Gulf War II occasioned a sudden rise in gasoil volatility, which remained high until markets eased seven weeks later. All other events—Hurricane Katrina in 2005 and the temporally proximate events of September 2015 and late winter 2016—involved spikes in gasoline.

At least to some degree, all nine critical periods identified by temporal clustering of volatility exhibit the imbalanced triangular shape associated with the rockets-and-feathers account of oil pricing and the Edgeworth price cycles in refined fuel markets. At or near the beginning of each event, volatility in at least one constituent market spikes. Volatility then eases slowly. Whether to describe the relaxation of volatility by analogy to feathers or gradations on a sawtooth blade appears to be a strictly esthetic question. Volatility during these critical events exhibits the triangular signature associated with either account of pricing dynamics in energy-related markets.

On the other hand, critical periods identified through temporal clustering do not invariably exhibit the peak-to-trough shape that characterizes traditional definitions of recessions and bull and bear markets. Though several episodes open with peak volatility for at least one of the four energy-related commodities, others do not. Given the mathematical basis of clustering, critical periods do not end because volatility reaches a local trough. Rather, they end because volatility has relaxed and returned to background levels.

Differences in the volatility profile of these events provide a reminder that temporal clustering by any one method reflects subtleties that can be erased during aggregation by voting. To be workable, the voting process must treat each method as though it were a binary classifier. Either a period is critical, or it is not.

Each of the individual methods nevertheless achieved subtleties by finding more than two clusters. For instance, the very conservative spectral clustering method isolated the 17 days it associated with Hurricane Katrina from six wholly separate clusters that collectively identified 53 days during the pandemic. Differences among those six periods become unrecoverable once they are aggregated as a "pandemic" supercluster.

Other methods reflect a similar subtlety. Mean-shift clustering suggested that a single cluster characterized much of the financial crisis as well as the geopolitically fraught energy crises of the early 2000s, but Katrina stood entirely apart. Hierarchical agglomerative clustering could have been interpreted as recommending three superclusters: one for 51 days during the crisis, another 848 days worthy of attention for abnormal volatility readings, and a third supercluster comprising all other trading days across two decades.

Analyst judgment looms large again. There may be no quantitatively consistent rule for striking the desired balance between the ease of isolating outliers on a binary basis and the nuance of discerning differences *among* outlier, critical periods.

#### 5.2.2. Logarithmic Returns

These periods' log returns do provide another tool. Visualizing log returns also depicts markets as investors understand them: by the ebb and flow of profit and loss.

Volatility events are associated, perhaps stereotypically and simplistically, with harrowing declines in asset prices. This perception is reinforced by the popular depiction of VIX as the "fear index." The log returns in Figure 39 suggest far greater diversity and subtlety in the temporal clustering of energy-related markets. For the steep, sustained decline in demand associated with the financial crisis, the stereotype does apply.

Other events tell a subtly different story. The suspension of air travel in the United States after 11 September 2001 inflicted losses on all oil and fuel markets. That episode may represent a rare instance of an energy-specific crisis arising from an acute disruption in demand as well as supply, or instead of it. After a steep decline at the beginning of the ensuing invasion of Afghanistan, prices stabilized and rose. Though they were separated by less than a week, these were distinct events.

Although the rockets-and-feathers hypothesis and Edgeworth pricing cycles are associated with prices rather than volatility, the triangular charts associated with those accounts of energy markets do not appear in Figure 39. Temporal clustering of the volatility array did not isolate periods where prices rose rapidly and eased slowly. If anything, some critical periods exhibit the opposite "boulders and balloons" pattern, by which gasoline prices steeply decline in response to oil price decreases, and then recover slowly [28].

**Figure 39.** Cumulative logarithmic returns during critical periods for an equally weighted market basket of four oil and fuel commodities, with details for each of the constituent markets. Cumulative log returns on precious metals are shown for purposes of comparison.

On the other hand, to the extent that these signature descriptions of price- or return-based time series apply to energy markets under normal conditions, we might find that energy markets follow differently shaped arcs during critical periods. Indeed, it is entirely plausible that sawtooth-shaped or rockets-and-feathers patterns characterize volatility but not return during critical periods, while the opposite relationship governs ordinary, background trading. It is also possible that the iconic shapes associated with Edgeworth pricing cycles or rockets-and-feathers behavior do appear throughout these time series, but over time horizons longer than those of acute events isolated by temporal clustering. The behavior of energy markets during temporal clusters associated with ordinary, background trading invites further research.

The movement of precious metal prices also highlights the difference between the terrorist attacks and the Afghan invasion. Precious metals are considered hedges against inflation and geopolitical turbulence. The latter property is probably the dominant driver of precious metal prices during military activities affecting petroleum-exporting regions. Precious metal prices fell after 11 September 2001 but recouped their losses during the Afghan invasion. Precious metal prices fell again during Gulf War II, when they accompanied even steeper declines in oil and fuel prices.

At least two events proved to be net winners for energy investors and companies. Despite a few downward spikes, winter 2016 eventually rallied these energy markets.

Even more dramatically, the onset of the COVID-19 pandemic inflicted catastrophic losses on oil and fuel markets, only to spark a ferocious rally. The price of gasoil, a fuel associated with industrial uses and long-haul transport, remained more stable throughout both phases. (Despite gasoil's superior fuel efficiency and lower levels of pollution [242], and despite the popularity of diesel-powered cars in Europe, gasoline engines in passenger vehicles outnumber diesel engines four to one [243].) It is little wonder that this historically unprecedented episode generated such diverse clustering results. At the same time, aggregating all methods enables the evaluation of four months of prices, returns, and volatility that know no equal in financial history.

#### *5.3. Comparing Energy-Market Impacts with Other Commodity Asset Classes*

Energy-specific crises may be best understood through a comparison with other commodity classes. Subjectively defined crisis periods offer a good starting point. In addition to six critical periods in broader commodity markets between 2000 and 2019 [164], we propose a seventh—the COVID-19 pandemic—as defined by temporal clustering of energy-specific volatility. The critical periods are as follows:


Figure 40 overlays these periods on conditional volatility for all commodity asset classes. A majority of these seven human-designated crises accompany visible spikes in volatility in energy-related markets, even though many such crises are either defined neutrally (for example, Chinese deceleration) or wholly by reference to other commodity markets (the coffee shock). Indeed, deceleration of the Chinese economy would explain the energy markets' September 2015 and winter 2016 events.

Aggregate statistics on energy-specific crises show elevated volatility for these markets (Figure 41). Energy-specific markets are more volatile on the whole, but the gap between volatility in these commodities and in all other asset classes grows considerably during volatility outliers in energy-related markets.

Unsurprisingly, defining crises according to a single asset class has the effect of highlighting volatility events unique to that class. An even more striking implication of Figure 41 is the reduction of volatility in almost every other asset class, even relative to noncritical periods generally. Only tropical and semitropical softs experienced increased volatility during energy-related events. Akin to the way VIX options and other volatility-based strategies can hedge equity portfolios, stakeholders in the fossil fuel sector might consider broader holdings as a way to offset energy-specific turbulence.

**Figure 40.** Six human-defined commodity crises, 2000–2020, plus the COVID-19 pandemic.

**Figure 41.** Volatility for commodity asset classes, overall and during noncritical and critical periods.

The opposite directions of annualized log returns on all commodity classes, as shown in Figure 42, reinforce the intuition that other commodities move separately during events affecting solely energy-related markets. This exercise vindicates the wisdom of clustering all commodity markets before focusing on energy-specific events. There have been exactly two crises affecting all commodity markets since 2000: the global financial crisis of 2008–2009 and the COVID-19 pandemic. Aside from assets related to energy, no asset class lost ground during energy-specific events. Base metals did suffer steep price declines overall and lost ground relative to baseline rates of return during energy-specific events. Even that class did not decline in the aggregate, however, during the American military interventions of the early 2000s and the energy-market disturbances of 2015 and 2016.

**Figure 42.** Annualized logarithmic returns for commodity asset classes, overall and during noncritical and critical periods.

Figures 43 and 44 highlight the effects of the financial crisis and the pandemic. Though these broad events affected all commodities, they made a far deeper impression on energy-related markets. Collapses in demand had a far greater impact on energy-related commodities and (to a lesser extent) base metals during the financial crisis. COVID-19, on the other hand, benefited the energy sector overall after historically unprecedented gyrations in both directions.

**Figure 43.** Volatility for commodity asset classes, overall and during the financial crisis of 2008–2009 and the COVID-19 pandemic.

**Figure 44.** Annualized logarithmic returns for commodity asset classes, overall and during the financial crisis of 2008–2009 and the COVID-19 pandemic.

#### *5.4. Comparing Crude Oil with Refined Fuels*

The examination of volatility and log return for energy-related markets in Section 5.2 suggested dramatic differences among individual markets. Internal differences among these markets may be more economically meaningful than differences separating oil and refined fuels from other commodities.

Volatility for Brent, WTI, gasoil, and gasoline is elevated during all energy-related events. Figures 45 and 46 should come as no surprise at all. Differences in scaling may obscure the fact that, across the board, the crises of 2008–2009 and COVID-19 in Figure 46 were more volatile than the energy-specific events in Figure 45.

There is a noticeable difference between refined fuels. The palpably lower levels of volatility for gasoil in all conditions suggest that this fuel enjoys a floor of demand that undergirds prices and returns throughout varying economic conditions. The flip side of gasoil's relative stability is greater susceptibility for gasoline. Faster and less consistent changes in demand for gasoline generate greater turbulence.

**Figure 46.** Volatility for energy-related commodities, overall and during the financial crisis and the pandemic.

Annualized logarithmic returns on Brent, WTI, gasoil, and gasoline tell a more dramatic story (Figures 47 and 48). Relative to crude oil, refined fuels absorb far more punishing losses in critical periods. Such losses—though by no means universal, as demonstrated by the winter 2016 event and the COVID-19 pandemic—are far steeper for gasoil and especially gasoline. WTI essentially broke even during the two greatest economic crises of the past two decades. Brent pulled affirmatively ahead of the breakeven point.

By contrast, gasoil and gasoline staggered during the financial crisis. They cratered during the onset of the COVID-19 pandemic, only to regain their footing and actually advance as pandemic conditions retreated during the summer of 2020.

**Figure 47.** Annualized logarithmic returns on energy-related commodities, overall and during noncritical and critical periods.

**Figure 48.** Annualized logarithmic returns on energy-related commodities, overall and during the financial crisis and the pandemic.

## **6. Discussion**

#### *6.1. Implications for Firms, Investors, and Governments*

"The interconnected nature of oil, metal, and agro-commodity price movements through the transmission of price shocks have serious implications for policymakers and investors" [57] (p. 1). Oil price volatility also affects strategic investment decisions by individual firms [244,245]. All stakeholders in energy markets should pay close heed to the identification of critical periods through temporal clustering.

As expected, the temporal clustering of the limited market basket of four energy-specific commodities generated a larger number of discrete critical events. The parallel exercise of clustering the broader basket of 22 commodities proves valuable in distinguishing between supply-related and demand-related events. Disruptions in demand affect multiple commodity classes. They tend to be associated with recessions, depressions, and other events of global scale. By contrast, supply disruptions tend to arise from acute crises associated with military operations and extreme weather. At least since 2000, supply-related crises have been unique to energy-related markets and tend to be shorter in duration.

These patterns confirm the value of the trichotomy identified in [126,127]. Though commodity prices are generally endogenous with respect to the global business cycle, they respond to demand shocks slowly but steadily. They respond to supply shocks with sharp but small and momentary movements. Though these effects may not be unique to energy-related markets, this article's focus on oil, gasoline, and gasoil certainly isolated all three effects.

The different duration associated with each of the two types of critical events affects managerial, investment, and policy prescriptions. Different stakeholders in energy markets and adjacent areas of the economy have different time horizons. At one extreme, the brevity of supply-related disruptions suggests that crises identified through the temporal clustering of the energy-specific subarray of volatility forecasts carry the greatest weight for short-term hedging and managerial decisions.

Longer-term investors and strategic managerial decisions (as distinct from tactical hedging decisions) depend more heavily on demand-related crises. These tend to be crises that emerge from temporal clustering of the all-commodities array as well as clustering of the narrower, energy-specific subarray. Changes in comovement and connectedness during these periods tend to be slower but also more enduring. Structural shifts in economic dynamics are likelier to occur during these overlapping crises, as opposed to acute events arising from disruptions in the supply of oil or its distillates.

The difference between long-term structural shifts due to changes in demand and episodic disruptions in supply carries profound macroeconomic implications. Conventional measures of core inflation exclude putatively volatile food and fuel commodities [171–174]. Consumer demand, at least for fuel, turns out to be quite inelastic in the short term. Temporal clustering uncovered rapid and extreme movements in fuel prices, only a few of which coincided with broader drops in demand detected by temporal clustering of all commodities.

Generalizations of the methods demonstrated in this article promise powerful insights, microscopic as well as telescopic. Extensions of this research can and should be both introspective and teleological. Opportunities for further research lurk within the data gathered for this article. In addition to the array of log returns for all commodities, as well as those related to energy, temporal clustering can use different variations on the theme of volatility. Historical volatility or additional conditional volatility forecasts at higher frequencies may yield different results, as would implied volatility derived from options trading.

The temporal clusters also invite closer examination. Hierarchical clustering could easily have been expanded to treat 17 percent rather than 8 percent of all trading days as potentially critical. The threshold for votes among clustering methods could be reduced from two to one. A softer definition of periods to be identified by temporal clustering may uncover, as hypothesized at the beginning of this article, inflection points as well as local minima and maxima within the history of commodities trading.

Obvious extensions beyond crude oil and refined fuels involve other asset classes among commodities, such as precious metals or the surprisingly placid market for tropical and subtropical softs. Although this article did gather data for as many as four additional asset classes among commodities—precious metals, base metals, temperate crops, and semitropical and tropical "softs"—the thoroughness needed to evaluate even one of those commodity classes would have required a considerable effort.

The value of examining temperate crops alongside oil, gasoline, and gasoil could be considerable. At a bare minimum, temporal clustering would enhance the understanding of connectedness between markets for fossil fuel commodities and food crops [78–80]. Corn as a feedstock for ethanol and soybeans as a feedstock for biodiesel directly affect oil markets [84]. Sugarcane, a crop not included in this article's data sources, is an obvious candidate for inclusion in such a comparison [85].

Financialization of commodities raises the premium on hedging. First-order opportunities for diversification and hedging lie within commodity markets. Precious metals experienced relatively less volatility and retained more of their value throughout all crises. During energy-specific events, if not in broader crises, agricultural commodities as a superclass proved resilient. This was particularly true of tropical and semitropical softs. Returns on those commodities mitigated many of the losses incurred by crude oil and refined fuels during energy-specific events. They even fared reasonably well during the financial crisis.

The relationship between energy-specific and agricultural commodities should provide especially useful guidance in emerging markets. The decoupling of energy commodities from softs may reveal hedging and diversification opportunities among investment opportunities in emerging markets. Petrostates tend not to depend on agricultural exports, and coffee and cocoa producers are not coextensive with OPEC. Extensions of this work can compare critical moments identified through unsupervised machine learning with event studies. In addition to OPEC announcements [144,145], the public disclosure of decisions affecting major agricultural markets and the resolution of global trade disputes over agriculture can serve as bases for comparative analysis.

All capital markets invite temporal clustering. Deeper research should examine equities and sovereign debt as well as commodities. Although many sources addressing diversification opportunities affecting oil and refined fuels have specifically addressed other commodities (including but not limited to precious metals) [55,57], equity holdings can also contribute to diversification [50,114–116].

In addition to markets for equity and sovereign debt, the entire fixed-income marketplace presents an enticing target for temporal clustering. The market for debt includes Islamic sukuk [246]. Clustering by market movements should operate at two levels: initially in financial space, as different instruments respond to interest-rate, default, and prepayment risk, and again in time as crises overtake and release different segments of the bond market.

#### *6.2. Additional Directions for Research: Temporal Clustering and Machine Learning*

This article has demonstrated the feasibility of using unsupervised machine learning to isolate and interpret critical periods in financial and economic history. In terms of mathematical complexity, the methods demonstrated in this article lie somewhere between the most familiar benchmarks in the literature on the identification of regime shifts throughout economics. The clustering of all commodity markets, followed by a narrower focus on four energy-related markets—Brent, WTI, gasoil, and gasoline—encompasses subtleties that elude methodologies based on arbitrary 10 or 20 percent changes from short-term minima and maxima in stock market prices. By the same token, temporal clustering does not purport to capture all of the nuances of the dynamic-factor, Markov-switching model that the NBER uses to identify recessions in the United States.

The amount of subjective judgment used in this application of unsupervised machine learning likewise occupies middle ground. Since conventional definitions of bull and bear markets are based on fixed changes in stock prices, those exercises rely exclusively on the definition of peaks and valleys in recent financial history. Conversely, the selection of commodity markets and the admittedly crude taxonomy distinguishing oil and refined fuels from precious and base metals, temperate crops, and tropical and semitropical softs does not approach the depth of the research supporting the NBER's focus on non-farm employment, industrial production, real personal income, and real manufacturing and trade sales as broad macroeconomic indicators.

Much of the mathematical elaboration in temporal clustering arises from unsupervised machine learning itself. The categorical ontology of commodity markets is an artifact of the clustering of daily logarithmic returns for each commodity [1]. The clustering of trading days according to volatility forecasts generates far more diverse results. The vast difference in scale between two dozen commodities, give or take, and thousands of trading days makes temporal clustering that much more challenging.

Fixing the optimal number of clusters continues to pose a formidable barrier. One possible solution lies in using more deterministic methods, such as spectral or mean-shift clustering, to guide more malleable methods. Leading use cases include the calibration of element preferences in affinity propagation or the stipulation of *k* in *k*-means clustering.

By its nature, clustering as a branch of unsupervised machine learning divides large quantities of data into more tractable classes. The concurrent application of multiple clustering methods with wholly disparate algorithms highlights the applicability of an ensemble technique from supervised machine learning: the voting classifier. This article used voting methods to aggregate clustering results.
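The voting step can be illustrated with a minimal sketch. It assumes that each clustering method has already been reduced to one Boolean flag per trading day (critical or not). The function name `vote_on_critical_days`, the method labels, and the six-day sample are hypothetical and serve only as illustration. They are not this article's data or code.

```python
from typing import Dict, List


def vote_on_critical_days(
    flags_by_method: Dict[str, List[bool]], min_votes: int = 2
) -> List[bool]:
    """Mark a trading day as critical when at least `min_votes` methods flag it."""
    lengths = {len(flags) for flags in flags_by_method.values()}
    if len(lengths) != 1:
        raise ValueError("every method must label the same trading days")
    n_days = lengths.pop()
    # Count, for each day, how many methods flagged it as critical.
    votes = [
        sum(flags[day] for flags in flags_by_method.values())
        for day in range(n_days)
    ]
    return [count >= min_votes for count in votes]


# Hypothetical flags for six trading days from three clustering methods.
flags = {
    "spectral": [False, True, True, False, False, False],
    "k_means": [False, True, False, True, False, False],
    "affinity_propagation": [False, True, True, False, False, True],
}

critical_two_votes = vote_on_critical_days(flags, min_votes=2)
critical_one_vote = vote_on_critical_days(flags, min_votes=1)
```

Lowering `min_votes` from two to one widens the set of candidate critical days, at the cost of admitting days that only a single method regards as unusual.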

This article also exploited an intuition arising from clustering as a method for outlier detection. Especially for methods predisposed to generate a large number of clusters (affinity propagation) or to select noncontiguous clusters (*k*-means), one method for imposing order on temporal clustering consists of selecting clusters until some threshold fraction of all trading days has been reached.
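A minimal sketch of this threshold-based selection follows. It rests on two assumptions that the text does not specify: clusters are ranked by their mean volatility, and selection stops before the chosen clusters would exceed the threshold share of trading days. The function name `select_critical_clusters` and the 50-day sample are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def select_critical_clusters(
    labels: List[int], volatility: List[float], max_fraction: float = 0.08
) -> Set[int]:
    """Pick clusters, most volatile first, while the selected share of
    trading days stays within `max_fraction`."""
    if len(labels) != len(volatility):
        raise ValueError("labels and volatility must cover the same days")
    members: Dict[int, List[float]] = defaultdict(list)
    for label, vol in zip(labels, volatility):
        members[label].append(vol)
    # Assumption: rank clusters by mean volatility, highest first.
    ranked = sorted(
        members, key=lambda c: sum(members[c]) / len(members[c]), reverse=True
    )
    selected: Set[int] = set()
    covered = 0
    for cluster in ranked:
        size = len(members[cluster])
        if (covered + size) / len(labels) > max_fraction:
            break  # the next cluster would push past the threshold
        selected.add(cluster)
        covered += size
    return selected


# Hypothetical sample: 43 calm days, 3 highly volatile days, 4 moderately volatile days.
labels = [0] * 43 + [1] * 3 + [2] * 4
volatility = [0.01] * 43 + [0.05] * 3 + [0.03] * 4

narrow = select_critical_clusters(labels, volatility, max_fraction=0.08)
broad = select_critical_clusters(labels, volatility, max_fraction=0.17)
```

In this sample, the 8 percent threshold admits only the most volatile cluster, while the 17 percent threshold also admits the moderately volatile one. Stopping once the threshold is first crossed, rather than before, would be an equally defensible reading, but it risks sweeping in a single very large cluster.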

This method does inflict costs of its own. Any reduction in the number of clusters pushes clustering closer to binary classification and away from the nuances attained by multiclass clustering. Even the conservative spectral clustering method distinguished between the pandemic and the energy-specific event associated with Hurricane Katrina.

The extreme turbulence associated with COVID-19 provides a unique lesson. The four months after the pandemic's outbreak in March 2020 revealed radical shifts that had no precedent in this 20-year survey. Indeed, there may be no other period like it in modern economic history. The sudden shock to demand, to say nothing of uncertainty over the progression of the greatest threat to human health apart from war, destroyed normal channels for conveying economic information [247].

Utmost care in the volatility-based clustering of critical periods is advised, especially if clustering is treated as an exercise in binary classification. The nine discernible events highlighted in this article are quite diverse, even as they were treated as outliers in the ordinary fabric of financial spacetime. Cataclysms such as the financial crisis of 2008–2009 and the COVID-19 pandemic swamp all commodities, though by no means equally. Other events exhibit unusual volatility in a single energy market, often (but not always) gasoline.

Even the direction of the impact on prices and returns is not uniform. Two events, notably the winter 2016 event and the pandemic, witnessed sharp increases in energy prices. More precisely, these events represented superclusters of temporally contiguous but economically distinct periods. Temporal clustering can steer analysts toward intriguing moments.

On the other hand, clustering cannot dictate the course of economic history. Nor can clustering define the inferences to be drawn from economic analysis. As the poet T.S. Eliot wrote [248] (p. 26):

"The knowledge imposes a pattern, and falsifies, /For the pattern is new in every moment/And every moment is a new and shocking/Valuation of all we have been."

The comparison of temporal clustering across all commodities with the energy-specific subarray carries broad and important implications. The inescapably narrow focus on any fraction of the universe of valuable assets necessarily undermines efforts to model the entire economy according to that limited sample [128].

The reduction of complexity may ultimately prove more of a virtue than a vice. As economics advances by devising ever more elaborate models, from the decision-making level to that of the broader macroeconomy, simplification often holds the key to success [151]. The deeper the data, so it seems, the more vital it becomes to reduce complex relationships to their bare essence [151].

Unsupervised machine learning's greatest contribution may lie in its ability to reveal those moments where other analytical methods are most likely to fail. Such failures include the shortcomings of other branches of artificial intelligence. Failures in otherwise accurate deep learning models for forecasting economic time series may reveal macroeconomic regime shifts in an unintended and unsupervised fashion [249].

Temporal clustering may reveal the mirror image of this phenomenon. The application of unsupervised machine learning to economic time series can identify such shifts, or at least smaller breaks or departures, from otherwise prevalent financial or macroeconomic regimes. Such recognition, one can only hope, should happen *ex ante*, before policymakers adopt predictive models as elaborate and consequential as they are flawed.

Disruptions in financial or economic spacetime represent deviations from the "normal science" of economic exchange. Even if temporal crises do not shift economic paradigms, they raise departures from prior factual suppositions that warrant analytical calibration [250]. Posterior probabilities in Bayesian statistics and the concept of backpropagation in deep learning through neural networks embody this wisdom.

At the very least, critical periods identified through temporal clustering should not be expected to behave according to the usual rules of financial or economic engagement. *Ceteris paribus*, temporal outliers identify wrinkles in economic time when conventional wisdom and entrenched forecasting methods are most likely to fail. As necessity is the mother of invention, crisis is the font of philosophical foment and the father of discovery [250].
