**1. Introduction**

As populations continue to grow, water resources are becoming increasingly important for urbanization and agricultural intensification, especially in developing countries [1–3]. In water resource planning, streamflow forecasting plays a key role in hydrological risk assessment, reservoir operation, drought and flood prevention, and water resource allocation [4–6]. Moreover, the efficiency of water resource system management depends largely on the reliability and accuracy of hydrological predictions. Streamflow forecasting models are therefore essential for effective water resources planning and management.

**Citation:** Li, H.; Huang, G.; Li, Y.; Sun, J.; Gao, P. A C-Vine Copula-Based Quantile Regression Method for Streamflow Forecasting in Xiangxi River Basin, China. *Sustainability* **2021**, *13*, 4627. https://doi.org/10.3390/su13094627

Academic Editor: Wen-Cheng Liu

Received: 8 March 2021; Accepted: 19 April 2021; Published: 21 April 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Over the last few decades, considerable effort has gone into developing advanced forecasting techniques to improve hydrological prediction, including process-based models and data-driven statistical approaches [7–9]. Process-based models rest on the water balance principle and couple various physical processes, such as precipitation, evaporation, and infiltration [10,11]. These models require large amounts of data (e.g., hydrometeorology, topography, and land use/cover) and robust calibration techniques. Data-driven models, by contrast, can be built readily without physical process information and have been used extensively [12–14]. Data-driven techniques are therefore a useful and practical option for streamflow forecasting.

A variety of data-driven techniques have been proposed for streamflow forecasting, including autoregressive moving average (ARMA), multiple linear regression (MLR), stepwise cluster analysis, artificial neural networks (ANN), genetic programming, and support vector regression (SVR) [15–17]. For example, Besaw et al. [18] applied an ANN to streamflow forecasting in ungauged basins and showed that using time-lagged local climate measurements as model inputs is key to improving hydrological forecasts. Guo et al. [19] coupled an SVR model with an adaptive insensitive factor to predict monthly streamflow and found the approach to be effective and accurate. Terzi and Ergin [20] used autoregressive (AR) modeling, gene expression programming (GEP), and an adaptive neuro-fuzzy inference system (ANFIS) to predict the monthly mean flow of a watershed in Turkey, and all of the developed models performed well. Fan et al. [21] established a stepwise cluster forecasting (SCF) model for monthly streamflow forecasting that effectively reflected the nonlinear and discrete relationships between climatic factors and streamflow. In general, these data-driven techniques can simulate hydrological variables effectively by capturing the complex interrelationships among multiple hydrometeorological inputs. However, they often perform poorly when predicting extreme values (such as flood events), which can lead to spurious relationships between the response and independent variables [22].
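MLR, one of the techniques listed above and one of the benchmark models compared against CVQR in this study, amounts to an ordinary least squares fit of streamflow on the predictors. The sketch below is a minimal illustration assuming NumPy and a predictor matrix whose columns are hydrometeorological inputs (e.g., precipitation, temperature); `fit_mlr` and `predict_mlr` are hypothetical names, and the study's actual predictor set is not reproduced here.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; returns [b0, b1, ..., bp]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Conditional-mean prediction from fitted MLR coefficients."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]
```

Because MLR estimates only the conditional mean, it offers no direct description of the tails of the streamflow distribution, which is the limitation that motivates a quantile-based approach.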

To overcome these limitations, this study employs copulas, which flexibly construct joint distributions describing complex dependence structures among random variables. Copula functions have been applied extensively to multivariate modeling and forecasting in areas such as flood frequency and drought analysis, rainfall and climate prediction, financial risk, and energy [23–26]. However, multivariate copulas are difficult to derive directly. Vine copulas, also known as pair-copula constructions (PCCs), address this by decomposing a high-dimensional joint density into a cascade of bivariate copulas, which provides an efficient and flexible tool for analyzing the dependence structures among complex, correlated response and independent variables [27]. Coupled with quantile regression, vine copulas offer a more complete statistical description of the relationships between random variables, including tail and asymmetric dependence.

Quantile regression (QR), introduced by Koenker and Bassett [28], estimates conditional quantiles of the response variable rather than only its conditional mean. It can therefore capture the full variation, heavy tails, skewness, and kurtosis of the variables, supports the calculation of confidence intervals, and can estimate risk levels in extreme cases [29,30]. QR has been applied successfully in fields such as economics, finance, and medicine [31–33]. This study therefore integrates copula and quantile regression methods to explore the complex dependence among variables.

Notably, the performance of data-driven models is often sensitive to how the data are divided into training and validation sets and to the input data themselves, especially under a changing climate. To reduce the influence of different data inputs and of random errors in the simulation process, the calibration and validation data are partitioned with five-fold cross-validation, so that the predictions are repeated five times using different training and test sets.
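The five-fold procedure can be sketched as follows. This is a minimal illustration assuming NumPy and contiguous (unshuffled) folds, which keeps each test block a continuous stretch of the record; the exact fold boundaries used in the study are not stated here. `k_fold_indices` and `cross_validate` are hypothetical helper names, `fit` and `predict` stand in for any of the models being compared, and RMSE is used only as an example score.

```python
import numpy as np


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds of a time-ordered record.

    Every sample index appears in exactly one test fold."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def cross_validate(fit, predict, X, y, k=5):
    """Return one RMSE per fold; fit(X, y) -> model, predict(model, X) -> y_hat."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        model = fit(X[train_idx], y[train_idx])
        y_hat = predict(model, X[test_idx])
        scores.append(float(np.sqrt(np.mean((y[test_idx] - y_hat) ** 2))))
    return scores
```

Since each observation is tested exactly once, the five fold scores together cover the whole record, and their spread indicates how strongly a model's skill depends on a particular split.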

Accordingly, this study develops a C-vine copula-based quantile regression (CVQR) model for streamflow forecasting. The CVQR model constructs a conditional copula prediction model to capture the relationship between streamflow and hydrometeorological variables. The developed method has advantages in (i) modeling the dependence among multidimensional response and independent variables, (ii) revealing the complex interrelationships among hydrometeorological factors, and (iii) outperforming MLR and ANN in capturing upper-tail dependence (i.e., flood events). These features can support decision-makers in hydrological process identification and water resource management.
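The core idea of copula-based quantile regression can be sketched in the bivariate case: transform the predictor to the uniform scale, invert the copula's conditional distribution (the h-function) at probability τ, and map the result back through the streamflow marginal. The sketch below is a simplified illustration, not the CVQR model itself. It assumes a single predictor, a Clayton copula rotated by 180 degrees (survival Clayton) so that tail dependence sits in the upper tail, a copula parameter obtained from Kendall's tau via θ = 2τ_K / (1 - τ_K), and empirical marginals. All function names are hypothetical; the C-vine structure, copula families, and estimation methods of the actual model are introduced in Section 2.

```python
import numpy as np


def clayton_h_inverse(p, v, theta):
    """Solve h(u | v) = p for u, where h(u | v) = dC(u, v)/dv for the Clayton
    copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    v = np.asarray(v, dtype=float)
    inner = (p * v ** (theta + 1.0)) ** (-theta / (1.0 + theta)) - v ** (-theta) + 1.0
    return inner ** (-1.0 / theta)


def conditional_quantile_u(tau, v, theta, upper_tail=True):
    """tau-quantile of U given V = v on the uniform (copula) scale.

    With upper_tail=True the copula is the 180-degree rotated (survival)
    Clayton, whose tail dependence sits in the upper tail (high flows)."""
    v = np.asarray(v, dtype=float)
    if upper_tail:
        # For the rotated copula, P(U <= u | V = v) = 1 - h(1 - u | 1 - v)
        return 1.0 - clayton_h_inverse(1.0 - tau, 1.0 - v, theta)
    return clayton_h_inverse(tau, v, theta)


def kendall_tau(x, y):
    """Kendall's tau-a in O(n^2); assumes no ties (continuous data)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair is counted twice, hence n(n - 1) in the denominator
    return float((dx * dy).sum() / (n * (n - 1)))


def theta_from_kendall(tau_k):
    """Clayton parameter from Kendall's tau, using tau_k = theta / (theta + 2)."""
    if tau_k <= 0:
        raise ValueError("Clayton copula requires positive dependence (tau_k > 0)")
    return 2.0 * tau_k / (1.0 - tau_k)


def copula_quantile_forecast(x_train, y_train, x_new, tau, upper_tail=True):
    """Conditional tau-quantile of streamflow y given predictor value(s) x_new."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    n = len(x_train)
    theta = theta_from_kendall(kendall_tau(x_train, y_train))
    # Empirical CDF of the predictor, kept strictly inside (0, 1)
    v = np.searchsorted(np.sort(x_train), x_new, side="right") / (n + 1.0)
    v = np.clip(v, 1.0 / (n + 1.0), n / (n + 1.0))
    u = conditional_quantile_u(tau, v, theta, upper_tail)
    # Back-transform through the empirical quantile function of streamflow
    return np.quantile(y_train, u)
```

Calling `copula_quantile_forecast` at several values of τ (e.g., 0.05, 0.5, 0.95) yields a median forecast together with a prediction band; in a C-vine, the same h-function inversion is applied recursively through the tree levels to condition on several predictors at once.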

In this study, the CVQR model is applied to the Xiangxi River basin to illustrate its applicability to streamflow prediction with multiple hydrometeorological factors. The remainder of this article is organized as follows. Section 2 introduces the MLR, ANN, and CVQR models. Section 3 describes the study area, the data, and the evaluation methods. Sections 4 and 5 present the results of the proposed model in the study area and compare and discuss the results of the different models.
