### *5.1. The K-Means Clustering*

To avoid the influence of sparse data and to speed up convergence, PCA is first used to reduce the data dimension [27]. The PCA scree plot is displayed in Figure 1.

**Figure 1.** PCA Scree Plot. The red line is the variance plot and shows the proportion of variance explained by each component from PCA; the green dotted line marks the eigenvalue threshold of 1, separating the components whose variance is greater than 1.

There are generally two ways to choose the number of principal components: retain a certain percentage of the variance of the original data, or retain only the principal components with eigenvalues greater than 1 according to Kaiser's rule [28,29]. The PCA results show that there are five principal components with eigenvalues greater than 1, and when the number of principal components is 6, the cumulative variance contribution rate exceeds 0.8. We therefore choose to keep six principal components, that is, to compress the 32-dimensional original data to six dimensions. It is worth mentioning that several indicators ignored in previous research prove to contribute significantly according to the PCA results shown above. This is a strong testament to the effectiveness of big data.
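As an illustration, the component selection described above can be sketched with scikit-learn. This is a minimal sketch, not the paper's actual pipeline; the data matrix `X` (observations × 32 standardized indicators) is an assumed placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: assumed placeholder for the (n_samples, 32) indicator matrix
X_std = StandardScaler().fit_transform(X)

pca = PCA().fit(X_std)
eigenvalues = pca.explained_variance_

# Kaiser's rule: keep components with eigenvalue > 1
n_kaiser = int(np.sum(eigenvalues > 1))

# Cumulative-variance rule: keep enough components to explain 80% of variance
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_var80 = int(np.argmax(cum_var >= 0.8)) + 1

# The section keeps six components (cumulative variance above 0.8)
X_pca = PCA(n_components=6).fit_transform(X_std)
```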

There are many methods for deciding the number of clusters *K*. One simple way is to observe how the sum of squared errors (SSE) changes with *K* and select the point where the SSE curve turns from steep to gentle. However, Figure 2 shows that there is no clear elbow point. As a consequence, we use the Gap Statistic method [30]. Each *K* corresponds to a Gap<sub>K</sub> and an s<sub>K</sub>, and *K* is selected as the smallest *K* that satisfies Gap<sub>K</sub> − Gap<sub>K+1</sub> + s<sub>K+1</sub> ≥ 0. We run the simulation 50 times, as random sampling is also used in the Gap Statistic. The results are shown in Figure 3, and Figure 4 shows the most common case. It can be seen that when *K* = 4, 6, the GapDiffs are most likely to be greater than 0. Although less frequent than *K* = 4, 6, *K* = 5 also shows a considerable frequency. Considering that academic performance evaluation needs an adequate *K* to produce reasonable results, we finally chose *K* as 4, 5 and 6.
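For reference, the selection rule above can be sketched as follows. This is a minimal sketch of the Gap Statistic of Tibshirani et al. [30], not the authors' code; `X_pca` is the assumed six-dimensional data from the PCA step, and the number of reference datasets is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans

def log_dispersion(X, k, seed=0):
    """log of the within-cluster sum of squared errors (SSE) for k clusters."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return np.log(km.inertia_)

def gap_statistic(X, k_max=8, n_refs=20, seed=0):
    """Gap_k and s_k for k = 1..k_max (Tibshirani et al., 2001)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        # dispersion of uniform reference datasets drawn over the data's range
        ref = np.array([log_dispersion(rng.uniform(lo, hi, X.shape), k, seed)
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_dispersion(X, k, seed))
        s.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(s)

# smallest K with Gap_K - Gap_{K+1} + s_{K+1} >= 0 (the rule used above)
gaps, s = gap_statistic(X_pca)
gap_diff = gaps[:-1] - gaps[1:] + s[1:]
best_k = int(np.argmax(gap_diff >= 0)) + 1  # caveat: argmax gives 0 if none hold
```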

In order to obtain credible results, we limit the number of iterations of each simulation to 20 so as to avoid bad cases caused by random initialization. In addition, we merge simulations that have very similar initializations and cluster results, and we select the most representative case by comparing their clustering evaluation criteria [31,32]. This strategy makes it easier for us to analyze the performance of different algorithms. For each *K*, we conduct 30 independent simulations and report the cluster details. To better visualize the clusters, we map the original data points to a plane using PCA. The results are shown in the table and graph below.
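A possible shape for this simulation loop is sketched below, again assuming `X_pca` from the PCA step; the Davies-Bouldin index (DBI) and silhouette coefficient (SC) come from scikit-learn, and the trial count and iteration cap mirror the numbers stated above.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score

def kmeans_trials(X, k, n_trials=30, max_iter=20):
    """Run independent K-means simulations and score each clustering."""
    results = []
    for seed in range(n_trials):
        km = KMeans(n_clusters=k, init="random", n_init=1,
                    max_iter=max_iter, random_state=seed).fit(X)
        results.append({
            "labels": km.labels_,
            "DBI": davies_bouldin_score(X, km.labels_),  # lower is better
            "SC": silhouette_score(X, km.labels_),       # higher is better
        })
    return results

# 2-D PCA projection used only for plotting the clusters
X_2d = PCA(n_components=2).fit_transform(X_pca)
```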

**Figure 2.** Sum of the Squared Errors Plot.

**Figure 3.** Results of Gap Statistic Simulations.

**Figure 4.** Gap Statistic Typical Result.

When *K* = 4, we can see from Figure 5 that cluster completeness is well preserved. Only Xi'an Jiaotong University and Tongji University have small parts assigned to different clusters; for every other university, all data points fall in the same cluster.

**Figure 5.** Clustering results of K-means when *K* = 4.

When *K* = 5, the clustering result in Figure 6 still shows very good completeness. However, some universities have moved from one cluster to another. Peking University forms a new cluster by itself, and Wuhan University is now clustered with Beijing Normal University and Fudan University.

**Figure 6.** Clustering results of K-means when *K* = 5.

When *K* = 6, things begin to change. We can see from Figure 7 that so-called rag bags, i.e., small groups of data points that cannot be well clustered, begin to increase. This harms cluster homogeneity. Shanghai Jiao Tong University and Sun Yat-sen University now also change clusters and join Fudan University, while Wuhan University and Beijing Normal University remain together.

**Figure 7.** Clustering results of K-means when *K* = 6.

It can be seen from Table 3 that the clustering indicators of the K-means algorithm are relatively stable: DBI stays roughly between 1.3 and 1.6, DI between 0.05 and 0.08, and SC is mostly above 0.3. This is in line with the previous SSE result and supports the reasonableness of the clustering.

As for the clustering results themselves, across all values of *K*, the data points of Tsinghua University and Zhejiang University in all years always form a cluster of their own, which indicates that the academic levels of these two universities are very close and that there is a large gap between them and the remaining universities. In addition, the data points of Central South University, Jilin University, Sichuan University, Huazhong University of Science and Technology, Shandong University, Tongji University, etc. always appear in the same cluster in all years, indicating that their academic levels are close; Northwestern Polytechnical University, Beihang University, Beijing Institute of Technology, Harbin Institute of Technology and Southeast University are in the same situation. The difference between these two clusters may be that the universities in the latter have a strong science and engineering character along with a national defense background. Considering that Xi'an Jiaotong University is distributed relatively evenly across these two clusters as *K* changes, its academic level is likely close to both.

We also notice that the cluster assignments of Wuhan University, Sun Yat-sen University, Fudan University, Shanghai Jiao Tong University, Beijing Normal University, Peking University and other universities change greatly with *K*. When *K* = 4, Beijing Normal University and Peking University are in the same cluster, but they are separated as *K* increases. One explanation is that when *K* is small, Beijing Normal University and Peking University are clustered together because they have similar backgrounds in the humanities and social sciences; however, because of the large difference in academic level, the two are later separated. This also explains the cluster variation of Fudan University, Sun Yat-sen University, Wuhan University, and Shanghai Jiao Tong University. These are all comprehensive universities whose characteristics in both (1) the humanities and social sciences and (2) science and engineering are relatively distinct. Therefore, for different *K*, they can fall in the same cluster as Beijing Normal University or in the cluster with science and engineering backgrounds.
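Of the three criteria in Table 3, the Dunn index (DI) has no scikit-learn implementation; a minimal sketch of the usual definition (smallest between-cluster distance divided by largest within-cluster diameter) might look like the following. This is an assumed reconstruction, since the paper does not specify which DI variant it uses.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dunn_index(X, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    # largest diameter of any single cluster
    max_diam = max(cdist(c, c).max() for c in clusters)
    # smallest distance between points of two different clusters
    min_sep = min(cdist(a, b).min()
                  for i, a in enumerate(clusters)
                  for b in clusters[i + 1:])
    return min_sep / max_diam  # higher is better
```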


**Table 3.** K-means clustering results.

### *5.2. The GMM Clustering*

Different from the K-means algorithm, the Gaussian mixture model (GMM) describes the data as a mixture of Gaussian distributions and softly assigns each point a weight for each component via the Expectation-Maximization (EM) algorithm. Consequently, the GMM can form clusters of more complicated shapes, which makes it suitable for the university academic data. For consistency with K-means, and drawing on the experience of previous work [33], we use the same simulation conditions as for K-means. The Gap Statistic method can also be applied to the GMM, so it is reasonable to choose the same *K* values. The results are shown in the table and graph below.
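Under those shared simulation conditions, a GMM trial could be sketched with scikit-learn as below; `X_pca` is again the assumed six-dimensional input, and hard labels are obtained from the soft responsibilities.

```python
from sklearn.mixture import GaussianMixture

def gmm_trials(X, n_components, n_trials=30, max_iter=20):
    """Fit GMMs under the same simulation conditions as the K-means runs."""
    results = []
    for seed in range(n_trials):
        gmm = GaussianMixture(n_components=n_components,
                              max_iter=max_iter, random_state=seed).fit(X)
        # predict() converts the soft per-component weights to hard labels
        results.append({"labels": gmm.predict(X),
                        "resp": gmm.predict_proba(X)})  # soft assignments
    return results
```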

We can see from Table 4 that the overall performance of the GMM is better than that of K-means in terms of clustering criteria. As the number of components changes, two patterns actually emerge. The results in Figures 8 and 9 are very similar to those of K-means. However, Figures 10 and 11 present a very unbalanced result: in their case, almost all the universities of science and technology are clustered together, and the remaining universities are always the same ones. Although good cluster criteria scores are obtained, these GMM results cannot be used for university academic evaluation, as they make no effective divisions. This indicates that a different feature extraction method is needed, and we therefore use the SKM algorithm.


**Table 4.** GMM clustering results.



**Figure 8.** One case of the GMM when N = 4.

**Figure 9.** One case of the GMM when N = 5.

**Figure 10.** The other case of the GMM when N = 4.

**Figure 11.** The other case of the GMM when N = 5.
