1. Introduction
The idea that the branched structures of river networks are multifractal, non-plane-filling objects was first suggested by several authors [1,2,3]. The importance of the relation between the geometric characteristics of a river basin and its behavior under intense rainfall was pointed out by Fiorentino et al. (2002) [4], who introduced the Fractal Instantaneous Unit Hydrograph (FIUH) model, in which they directly linked the lag time distribution function of a basin with its fractal dimension D, under the hypothesis that a single fractal measure could characterize the whole river network. The model was further refined by De Bartolo et al. (2003) [5], who substituted the fractal dimension D of Fiorentino et al. (2002) with the generalized fractal dimension $D_{-\infty}$, under the assumption that a river basin is better represented as a multifractal object than as a simple fractal, thus proposing the so-called Multifractal Instantaneous Unit Hydrograph (MIUH). Moreover, the same authors showed that, in real cases, the MIUH can produce simulated hydrographs in fairly good agreement with the observed ones.
It thus becomes evident that the possibility of effectively forecasting a flood peak in a river basin during rainfall is strictly connected with the need to evaluate the geometric properties of the river network in reasonable times; in particular, according to the MIUH model, its generalized dimension $D_{-\infty}$. The problem of determining the fractal dimensions of the basin can be summarized in the following steps: first, the map of the river basin is projected onto a plane, and its structure is represented by picking up (manually, or through automatic algorithms) a meaningful set of points lying on the branches of such 2D maps, the so-called net-points; second, a multifractal analysis is carried out in order to derive the generalized fractal dimensions $D_q$, the sequence of Lipschitz–Hölder exponents $\alpha$ and the multifractal spectrum $f(\alpha)$.
The latter analysis can be performed through several numerical algorithms, which have been widely investigated in the past (see, for instance, De Bartolo et al. (2006) [6] and references therein for a discussion of the different approaches): fixed-size algorithms (FSAs, e.g., box-counting and sandbox algorithms), the correlation integral method (CIM) introduced by Pawelzik and Schuster (1987) [7], and the fixed-mass algorithm (FMA) originally introduced by Badii and Politi (1984) [8]. Among the different approaches, the most promising one for an accurate determination of the parameters entering the MIUH model is the FMA. This is because the generalized fractal dimension $D_{-\infty}$, present in the MIUH model, accounts for the contribution of the regions of the basin less populated by net-points. Since both the FSAs and the CIM are based on the idea of counting the number of points falling inside a region of a given size, when such regions are filled with few points their contribution to the computation of high-order moments yields rather oscillating values due to the poor statistics, which, in turn, produce large errors in the determination of the generalized dimensions $D_q$. The FMA, on the other hand, fixes the mass a priori and computes the characteristic sizes of the subsets containing that mass; therefore, it does not suffer from such a problem.
The drawback of the FMA is the slowness of the computation. Whereas the analyses carried out numerically through the FSAs and the CIM produce results in a computational time that scales linearly with the total number of points N of the multifractal set, the FMA requires a number of operations proportional to $N^2\log_2 N$, which can translate into a huge CPU time even on modern fast processors (see below for a detailed explanation of this high number of operations). This produces a hardly solvable dilemma: on the one hand, modern Digital Elevation Models (DEMs) and Laser Imaging Detection and Ranging (LIDAR) techniques help in producing high-quality databases of river basins, with a very high number of net-points, which can improve the precision in the computation of the multifractal generalized dimensions, useful for flood forecasts through the MIUH model, for instance; on the other hand, the multifractal analysis that supplies the best results for the predictive models, the FMA, requires very long computational times, which limits the effectiveness of real-time forecasts of flood peaks.
However, although the number of operations needed by the FMA cannot be easily decreased, modern computational facilities are typically equipped with multicore CPUs or are even multi-CPU. Moreover, it is not difficult nowadays to assemble computational facilities by clustering several moderately expensive PCs interconnected with good network switches, so as to run parallelizable algorithms on a considerable number of computational cores. In a previous article [9] (hereinafter referred to as P1), we discussed in detail a possible implementation of a parallel FMA by using the Message Passing Interface (MPI) paradigm [10], which is nowadays a standard for parallel numerical computations on distributed memory machines (such as the clusters of PCs cited above). Such a paradigm can also be used on shared memory machines, such as single-CPU workstations. However, better speed-ups can be obtained on single-node workstations by using the Open Multi-Processing (OpenMP) paradigm, which is also a consolidated standard for parallel numerical computations on shared-memory machines [11].
In the present article, we improved the parallel algorithm presented in P1 to include the use of OpenMP. This can improve the performance of the parallelization when the computation is carried out on single-CPU, multi-core workstations instead of clusters equipped with many computational nodes. Moreover, we further improved the analysis by modifying the algorithm to include a direct computation of the multifractal spectrum through the FMA [12]. In fact, in P1, the assessment of the multifractal spectra had been carried out by first computing the generalized dimensions $D_q$ through linear regressions of the moments, and then obtaining the Lipschitz–Hölder exponents $\alpha$ and the multifractal spectrum $f(\alpha)$ by means of numerical differentiation with a finite-difference approach. In the new version of the algorithm, instead, we used the procedure suggested by Mach et al. (1995) [12] for the direct computation of $\alpha$ and $f(\alpha)$. This improves the precision in the computation of the latter two quantities with respect to P1.
Besides giving more precise and reliable results in the determination of the multifractal properties of river networks, the code used in the present work exhibits an almost ideal scaling of the CPU time with the number of computational cores, thanks to the hybrid MPI/OpenMP parallelization, as shown in the section of the paper devoted to the presentation of the results and the discussion. The improvements in computational speed are remarkable and are limited only by the number of cores available on the machine. It is worth noticing that the speed-up is rather good both on single-node, multi-core machines (like most workstations) and on multi-node, multi-core clusters, as will be shown below.
The plan of the article is the following. In the next section, Materials and Methods, we explain the serial FMA, the reasons why it requires the very long computational times scaling as $N^2\log_2 N$, and the implementation of the direct computation of the Lipschitz–Hölder exponents and of the multifractal spectrum; then we summarize the parallelization of the algorithm with the MPI paradigm proposed in P1 and the new OpenMP parallelization strategy implemented in the present work. In the third section, devoted to the results, we apply the proposed numerical methods to the computation of the properties of a theoretical multifractal set to evaluate the improvements brought by the direct computation of the spectra, and then we analyze how the algorithm scales in different parallel configurations, with MPI only, OpenMP only, or with a hybrid MPI/OpenMP approach. Finally, in the section devoted to the discussion, we draw some conclusions on the effectiveness of the proposed algorithm and the possible implications this may have on flood forecasts.
2. Materials and Methods
The multifractal analysis of a set of points is based on generating different partitions of the set itself and studying the scaling properties of the characteristic lengths with respect to the number of points (the so-called “measure” or “mass”) falling in each partition of the set. In particular, we assume that $N$ is the total number of points of the set and that we can partition the set in $N_c$ non-overlapping cells. Let us call $s_i$, $i = 1, \dots, N_c$, the size of the $i$-th cell and $p_i$ the total “mass” (in our case, the number of net-points) falling inside the same cell. For points belonging to a multifractal set, it is possible to show that the following relation must hold:

$$\sum_{i=1}^{N_c} \frac{p_i^{\,q}}{s_i^{\,\tau}} = k, \qquad (1)$$

$k$ being a constant, generally taken equal to 1 without loss of generality.
In Equation (1), it is possible to show that, for each value of $k$, an infinite ensemble of solutions $(q, \tau(q))$ of the problem exists. The generalized fractal dimensions $D_q$ are related to those solutions through the relation:

$$D_q = \frac{\tau(q)}{q-1}. \qquad (2)$$
When the solutions $\tau(q)$ are known, the Lipschitz–Hölder exponents (or “spectrum of the singularities”) $\alpha(q)$ and the multifractal spectrum $f(\alpha)$ are given through the relations:

$$\alpha(q) = \frac{d\tau(q)}{dq}, \qquad (3.1)$$

$$f(\alpha(q)) = q\,\alpha(q) - \tau(q). \qquad (3.2)$$
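As a quick consistency check of the formalism (an illustrative example, not part of the original derivation), consider the simple case of a homogeneous, monofractal set of dimension D: every cell of size $s$ then carries the same normalized mass $p_i \simeq s^{D}$, and the number of non-empty cells scales as $N_c(s) \propto s^{-D}$. Substituting into Equation (1) gives

$$N_c(s)\, s^{qD - \tau} \simeq k \quad \Longrightarrow \quad \tau(q) = (q-1)\,D,$$

so that Equations (2) and (3) yield $D_q = D$, $\alpha(q) = D$ and $f(\alpha) = D$ for every $q$: all the generalized dimensions collapse onto a single value, whereas for a genuine multifractal $D_q$ varies with $q$.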
The approach of the FSAs (just to fix the concepts, we use as an example the procedure followed in the box-counting technique) to find the solutions $\tau(q)$ of Equation (1) is to consider an ensemble of partitions: for each of them, the sizes of the cells are kept constant ($s_i = s$ for every $i$, whence the name fixed-size algorithm), but this constant value decreases exponentially when changing the partition. By taking the logarithms of both sides of Equation (1), in the limit of a vanishing size $s$, one obtains:

$$\log M_q(s) \simeq \tau(q)\,\log s + \log k, \qquad (4)$$

with:

$$M_q(s) = \sum_{i=1}^{N_c(s)} p_i^{\,q} \qquad (5)$$

being the $q$-th moment of the “mass” $p_i$ falling inside each box of the partition. Therefore, the solution $\tau(q)$, for a given set of values of $q$, can be determined as the slope of the linear relation shown in Equation (4) for vanishing values of the size $s$. In practice, for each value of $q$, $\tau(q)$ is computed through a linear regression of the logarithms of the moments $M_q(s)$ with respect to the logarithm of the size $s$, by considering $s \to 0$. Once the solutions $\tau(q)$ of Equation (1) have been determined, the generalized dimensions $D_q$, the Lipschitz–Hölder exponents $\alpha(q)$ and the multifractal spectrum $f(\alpha)$ can be determined by using Relations (2) and (3).
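As a concrete illustration of the box-counting step just described, the following minimal C sketch computes the moment $M_q(s)$ of Equation (5) for a set of 2D points in the unit square at a single scale $s$ (an illustrative sketch under an assumed data layout, not the code used in this work); repeating it for a sequence of decreasing $s$ and regressing $\log M_q(s)$ against $\log s$ gives $\tau(q)$.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* q-th moment M_q(s) of Equation (5) for n points in the unit square,
       at a single box size s. Masses are normalized, p_i = n_i / n, and only
       non-empty boxes contribute. Illustrative sketch only. */
    double moment_fsa(const double *x, const double *y, int n, double s, double q)
    {
        int nb = (int)ceil(1.0 / s);                 /* boxes per side */
        long *count = calloc((size_t)nb * nb, sizeof(long));
        for (int i = 0; i < n; i++) {
            int ix = (int)(x[i] / s), iy = (int)(y[i] / s);
            if (ix >= nb) ix = nb - 1;               /* guard points on the border */
            if (iy >= nb) iy = nb - 1;
            count[(long)ix * nb + iy]++;
        }
        double mq = 0.0;
        for (long b = 0; b < (long)nb * nb; b++)
            if (count[b] > 0)
                mq += pow((double)count[b] / n, q);
        free(count);
        return mq;   /* regress log(M_q(s)) against log(s) over several s to get tau(q) */
    }

    int main(void)
    {
        double x[] = {0.10, 0.20, 0.80, 0.90}, y[] = {0.10, 0.15, 0.80, 0.85};
        printf("M_2(s = 0.25) = %g\n", moment_fsa(x, y, 4, 0.25, 2.0));
        return 0;
    }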
For the FMA, the situation is the opposite. Instead of keeping the size $s_i$ constant for each cell of the partition, one fixes the “mass” $p$ (in our case, the number of net-points inside each cell of the partition) and computes the characteristic size $s_i(p)$ of the cell containing that specific number of points. Again, the solution $\tau(q)$ of Equation (1) can be determined by taking the logarithms of both sides of the equation and considering the limit $p/N \to 0$:

$$\log \tilde M_\tau(p) \simeq -q(\tau)\,\log\!\left(\frac{p}{N}\right) + \log k, \qquad (6)$$

with:

$$\tilde M_\tau(p) = \sum_{i=1}^{N} s_i(p)^{-\tau}. \qquad (7)$$

Here, we took into account the fact that $p_i = p$ for every cell whenever the “mass” is fixed a priori. In this case, the solution of Relation (1) is obtained by fixing a set of values for $\tau$ and finding the corresponding value of $q$ as the slope of the linear relation shown in Equation (6); this yields the pairs $(q, \tau)$ and, through Equation (2), the values of $D_q$. The latter operation can be carried out as a linear regression of the logarithms of the moments $\tilde M_\tau(p)$ as a function of the logarithms of the “mass” (always in the limit $p/N \to 0$).
In practice, the FMA works in the following way: (i) first, one fixes a set of values for $\tau$ in a given interval and a discrete set of values for the “mass” $p$ (between 1 and $N$); (ii) for a given value of $p$, the size of the circle containing a “mass” equal to $p$ is computed by finding, for each of the $N$ points of the set, its first $p$ nearest neighbors; (iii) once the radii $s_i(p)$ of the circles containing at most $p$ points are found, one can compute, for each point, the moments (Equation (7)) for all specified values of $\tau$. The true bottleneck of the algorithm lies in point (ii): for each point of the set, the distances to all the other points must be computed and, subsequently, the vector of such distances must be sorted in increasing order, so that the $p$-th element of the vector represents the radius of the cell containing, for that point, its $p$ nearest neighbors. The sorting operation can be carried out in a “fast” way, in about $N\log_2 N$ operations, for instance with a heap-sort algorithm, but the computation of the distances between each pair of points requires about $N$ operations per point. Therefore, the total number of operations required for a set containing $N$ points is about

$$N\left(N + N\log_2 N\right) = N^2\left(1 + \log_2 N\right),$$

which is a really huge number for high values of $N$, much higher than the simple $N$ counting operations needed by many of the FSAs.
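To make steps (i)–(iii) concrete, the following serial sketch builds, for each centre point, the sorted vector of distances and accumulates the moment of Equation (7) for a set of masses $p$ and one value of $\tau$ (an illustrative sketch: the data layout is assumed, and the standard library qsort is used here in place of the heap-sort mentioned above for brevity). The nested loop and the sort are precisely the $N^2(1+\log_2 N)$ bottleneck discussed above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    static int cmp_double(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    /* Accumulates the fixed-mass moment of Equation (7), sum_j s_j(p)^(-tau),
       for every requested mass p, given n 2D net-points. This is the
       O(N^2 log N) bottleneck discussed in the text: n distance vectors,
       each of length n, are built and sorted. */
    void fma_moments(const double *x, const double *y, int n,
                     const int *p_list, int np, double tau, double *mom)
    {
        double *dist = malloc((size_t)n * sizeof(double));
        for (int k = 0; k < np; k++) mom[k] = 0.0;

        for (int j = 0; j < n; j++) {                 /* loop over centre points  */
            for (int i = 0; i < n; i++) {             /* ~N distance evaluations  */
                double dx = x[i] - x[j], dy = y[i] - y[j];
                dist[i] = sqrt(dx * dx + dy * dy);
            }
            qsort(dist, (size_t)n, sizeof(double), cmp_double);  /* ~N log N sort */
            for (int k = 0; k < np; k++) {
                double r = dist[p_list[k]];           /* radius enclosing p neighbours */
                if (r > 0.0)
                    mom[k] += pow(r, -tau);
            }
        }
        free(dist);
    }

    int main(void)
    {
        double x[] = {0.0, 0.1, 0.5, 0.9}, y[] = {0.0, 0.1, 0.5, 0.9};
        int p_list[] = {1, 2};                        /* masses p = 1, 2 neighbours */
        double mom[2];
        fma_moments(x, y, 4, p_list, 2, 1.0, mom);
        printf("M(tau=1, p=1) = %g,  M(tau=1, p=2) = %g\n", mom[0], mom[1]);
        return 0;
    }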
On the other hand, the FMA is superior to the FSAs in many respects, at least for the assessment of the multifractal properties of river networks, as shown by [6]. In fact, as mentioned above, the main quantity connected to the geometrical properties of the set and entering the MIUH model is $D_{-\infty}$, which corresponds to the zones of the fractal set where the points are more rarefied; this also corresponds to the maximum value of the Lipschitz–Hölder exponent, $\alpha_{\max}$. Therefore, the precise determination of the generalized dimensions $D_q$ in the zones where the points are more rarefied is crucial for obtaining an optimal forecast with the MIUH model. Unfortunately, FSAs exhibit strong oscillations in the computation of the moments (Equation (5)) in the more rarefied zones of the set due to poor statistics (because the size is fixed, and the “mass” falling in each cell of the partition is “counted”). The FMA, on the other hand, does not suffer from such a problem since, in that case, the number of points is fixed and the size of the cell is computed accordingly. Therefore, the assessment of the generalized dimensions for negative values of $q$ with the FMA is much more precise than with the FSAs, i.e., the former is more suitable for flood forecasting models like the MIUH.
The computational cost of the FMA, however, represents a problem in view of the fact that modern DEM and LIDAR techniques allow the extraction of a quite high number of net-points, which would help improve the predictive quality of the MIUH. Moreover, an attempt has recently been made to investigate scaling laws in flow patterns obtained by numerically simulating the behavior of a river basin through a two-dimensional shallow-water approach [13]. In such cases, virtually every point of the numerical grid can be considered as a net-point, thus requiring a significant numerical effort to characterize the multifractal properties of the basin under study.
A possible solution to the problem of the long computational times for high values of N, as evidenced in P1, is the parallelization of the FMA. In P1, we proposed a possible parallelization of the algorithm based on the Message Passing Interface (MPI) paradigm. This computational model has become a standard for parallel computing, especially on distributed memory machines. It requires that the computations needed by the numerical code can be divided among a team of computational cores participating in the parallel computation. This condition is realized by starting, at the same time, separate instances of a process on the different cores; whenever one of the cores needs to be informed about the status of the computation carried out by the other cores, the processes communicate among themselves through a Local Area Network (LAN) interconnection. This is quite effective on distributed memory machines, in which several independent nodes are interconnected through a LAN, but it can, in principle, also work on single-node machines by running the different instances of the code on the different cores of a single, multi-core CPU. However, simulating the presence of a network communication on such machines, where no network is necessary since the cores communicate through the physical bus of the processor, generates some latency in the communications. In P1, we measured the speed-up of the code, finding a fairly good scaling of the computational speed with the number of processes $N_p$, although the speed-up curve is linear (i.e., very close to the theoretical scaling) only for small $N_p$ and tends to saturate for an increasing number of computational cores due to the above-cited latency in the communications.
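A minimal sketch of the MPI scheme just described is shown below (illustrative only: the point count and the routine compute_partial_moment are placeholders, not part of the actual code). Each process works on its own slice of centre points and the partial moments are combined with MPI_Reduce; every such call implies explicit inter-process communication, which is the source of the latency mentioned above.

    #include <mpi.h>
    #include <stdio.h>

    /* Skeleton of the pure-MPI decomposition described in the text: every
       process would read the whole set of net-points, works on its own slice
       of centre points, and the partial moments are summed on rank 0 with
       MPI_Reduce. compute_partial_moment() is a stand-in for the
       distance/sort/moment loop of the FMA. */
    static double compute_partial_moment(int first, int last)
    {
        double s = 0.0;
        for (int j = first; j < last; j++)      /* placeholder for the real work */
            s += 1.0 / (j + 1.0);
        return s;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 100000;                 /* total number of net-points (example) */
        int chunk = (n + size - 1) / size;    /* slice of centres for this process    */
        int first = rank * chunk;
        int last  = first + chunk > n ? n : first + chunk;

        double local = compute_partial_moment(first, last);
        double total = 0.0;
        /* explicit communication: this is where the network latency enters */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("moment = %g (computed by %d processes)\n", total, size);
        MPI_Finalize();
        return 0;
    }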
For this reason, we implemented the OpenMP directives into the code in order to improve the computational speed on shared memory machines. This paradigm is nowadays a standard for this kind of computation, and it works through the use of “directives”, which can be considered as simple comments, and are therefore ignored unless the compiler is specifically instructed to consider them with special compilation flags. The parallelization in OpenMP works in a fork/join fashion: a single process executing the code forks into a team of “threads” (whose number can be specified through environment variables before the execution of the run) when it encounters a #pragma omp parallel directive, which marks the beginning of a region of the code that can be executed in parallel. The threads join at the end of the parallel region, and the execution of the code continues as a serial run. The real effectiveness of such a paradigm depends strictly on the problem solved by the code: codes involving large, time-consuming loops can be parallelized much more effectively than a sequence of small loops, due to the latency in the creation of the threads, which, being almost independent execution streams, can require a fairly large amount of CPU time to be created.
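For readers unfamiliar with the paradigm, a minimal fork/join example is the following (a generic illustration, not taken from the code discussed here): the serial process forks into a team of threads at the parallel directive, the loop iterations are divided among the threads, and the partial sums are recombined by the reduction clause before the threads join.

    #include <omp.h>
    #include <stdio.h>

    /* Minimal fork/join example: the process forks into a team of threads at
       the "#pragma omp parallel for" directive, the iterations of the loop are
       divided among the threads, the partial sums are combined by the
       reduction clause and the threads join at the end of the parallel region.
       The number of threads is taken from the OMP_NUM_THREADS environment
       variable. */
    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += 1.0 / (i + 1.0);            /* stand-in for real per-point work */

        printf("threads available: %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }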
In the case of the FMA, however, the main bottleneck resides in the big loop that computes the mutual distances between the net-points and sorts them, and this represents an almost ideal case for an OpenMP implementation. As we will show below, the improvements obtained with the adoption of the OpenMP paradigm are indeed relevant with respect to the pure MPI case. Moreover, the combined use of the two paradigms (MPI and OpenMP) is not at all new in hydraulics (see, for instance, [14,15,16] for relevant applications, albeit in different contexts).
Moreover, beyond improving the computational times of the code on single-node, multi-core machines, we also improved the precision in the determination of the multifractal spectrum by including in the code the direct computation of the Lipschitz–Hölder exponents $\alpha$ and of the multifractal spectrum $f(\alpha)$, adopting the technique first introduced by Mach et al. (1995) [12]. In P1, for instance, once the solutions $\tau(q)$ of Equation (1) were known, the determination of $\alpha$ and $f(\alpha)$ was carried out by computing the derivative in Equation (3.1) with a second-order finite-difference method on a non-uniform stencil, due to the fact that the values of $\tau$ are chosen uniformly distributed in a given interval, whereas the corresponding values of $q$, which are numerically calculated from the linear regression of the moments as in Equation (6), are not equally spaced (for details about the formulae used, see P1). This introduces some amount of numerical error, which makes the determination of $\alpha$ and
$f(\alpha)$ less precise. Instead, the direct computation method introduced in [12] computes the spectrum of the singularities and the multifractal spectrum through the relations:

$$\log\!\left(\frac{p}{N}\right) \simeq \alpha(\tau)\,\sum_{i=1}^{N}\mu_i(\tau,p)\,\log s_i(p), \qquad (8.1)$$

$$\sum_{i=1}^{N}\mu_i(\tau,p)\,\log \mu_i(\tau,p) \simeq f(\alpha)\,\sum_{i=1}^{N}\mu_i(\tau,p)\,\log s_i(p), \qquad (8.2)$$

where:

$$\mu_i(\tau,p) = \frac{s_i(p)^{-\tau}}{\sum_{j=1}^{N} s_j(p)^{-\tau}}$$

represents a normalized measure of the characteristic lengths raised to the power $-\tau$. The advantage of the above method is that it does not require any numerical differentiation: a linear regression of the quantities on the left-hand sides of Equation (8) as functions of $\sum_i \mu_i(\tau,p)\,\log s_i(p)$ is enough to compute both $\alpha$ and $f(\alpha)$. A detailed derivation of Equation (8) is shown in Appendix A.
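In terms of implementation, for each pair $(\tau, p)$ the direct method only requires accumulating the weighted sums entering Equations (8.1) and (8.2). A minimal sketch of this accumulation, written against the form of Equation (8) given above and assuming the radii $s_i(p)$ are already available, is the following:

    #include <stdio.h>
    #include <math.h>

    /* For one mass p and one value of tau, builds the normalized measure
       mu_j = s_j^(-tau) / sum_l s_l^(-tau) and accumulates the weighted sums
       sum_j mu_j log s_j and sum_j mu_j log mu_j entering Equations (8.1)
       and (8.2). Repeating this for several p and regressing gives alpha
       and f(alpha). */
    void direct_sums(const double *s, int n, double tau,
                     double *sum_mu_logs, double *sum_mu_logmu)
    {
        double z = 0.0;
        for (int j = 0; j < n; j++)
            z += pow(s[j], -tau);              /* normalization sum_l s_l^(-tau) */

        *sum_mu_logs = *sum_mu_logmu = 0.0;
        for (int j = 0; j < n; j++) {
            double mu = pow(s[j], -tau) / z;
            *sum_mu_logs  += mu * log(s[j]);
            *sum_mu_logmu += mu * log(mu);
        }
    }

    int main(void)
    {
        double s[] = {0.02, 0.05, 0.04, 0.10};     /* example radii s_j(p) */
        double a, b;
        direct_sums(s, 4, 1.0, &a, &b);
        printf("sum mu log s = %f, sum mu log mu = %f\n", a, b);
        return 0;
    }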
The parallelization strategy adopted in order to divide the computational effort among the different nodes and cores of the machine is oriented towards accelerating the computation of the mutual distances among the net-points, which is an $\mathcal{O}(N^2)$ operation. Following the approach of P1, the file with all the net-points is read by all the MPI processes; the computation of the distances, however, is carried out on a different subset of points on each computational node. The parallelization with OpenMP follows the same idea: on each node, the loop for the computation of the distances is further divided among the computational cores through a #pragma omp parallel directive, by suitably specifying the role (private or shared) of the loop variables.
Once the distances of each point from all the others have been calculated, the corresponding vector is sorted in ascending order with a heap-sort method. This operation is still carried out in parallel on each node and core. By sorting the vector of the distances, we ensure that its $p$-th element corresponds to the radius of the circle containing at most $p$ nearest neighbors. At this point, the contributions to the moments in Equations (7), (8.1) and (8.2) are computed separately on each core. Then, two “reduce” operations are performed in order to evaluate the final values of the different moments: the first adds up the contributions computed on each core through an OpenMP reduction clause, and the second sums the contributions of the single nodes through an MPI_REDUCE operation.
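The following compact sketch illustrates the hybrid decomposition described in this section (a simplified illustration under assumed data, sizes and a single $(\tau, p)$ pair, not the production code): each MPI process owns a slice of the centre points, the loop over that slice is shared among the OpenMP threads with a reduction on the partial moment, and a final MPI_Reduce sums the contributions of the nodes.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define NPTS 2000                    /* number of synthetic net-points */

    static int cmp_double(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    int main(int argc, char **argv)
    {
        const int p = 8;                 /* fixed "mass": number of nearest neighbours */
        const double tau = 1.0;
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* every process holds the whole set of net-points (synthetic and
           identical on all ranks, since the same seed is used) */
        static double x[NPTS], y[NPTS];
        srand(12345);
        for (int i = 0; i < NPTS; i++) {
            x[i] = (double)rand() / RAND_MAX;
            y[i] = (double)rand() / RAND_MAX;
        }

        int chunk = (NPTS + size - 1) / size;   /* slice of centre points of this rank */
        int first = rank * chunk;
        int last  = first + chunk > NPTS ? NPTS : first + chunk;

        double local = 0.0;
        /* the centre loop of this rank is shared among the OpenMP threads */
        #pragma omp parallel for reduction(+:local)
        for (int j = first; j < last; j++) {
            double dist[NPTS];                  /* distance vector, private per iteration */
            for (int i = 0; i < NPTS; i++) {
                double dx = x[i] - x[j], dy = y[i] - y[j];
                dist[i] = sqrt(dx * dx + dy * dy);
            }
            qsort(dist, NPTS, sizeof(double), cmp_double);
            if (dist[p] > 0.0)
                local += pow(dist[p], -tau);    /* contribution to Equation (7) */
        }

        double total = 0.0;
        /* node-level reduction: sums the contributions of all the MPI processes */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("moment(tau = %g, p = %d) = %g\n", tau, p, total);

        MPI_Finalize();
        return 0;
    }

In the real code, the heap-sort replaces the simplified library sort used here, and the reductions involve all the moments of Equations (7), (8.1) and (8.2) at once.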
Finally, the linear regressions necessary to compute the values of $D_q$, $\alpha$ and $f(\alpha)$ through the direct method [12] are carried out with a Python script. Examples of the results obtained in the case of a deterministic multifractal are shown in the next section.
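The regression step itself is elementary; while the actual workflow performs it in the Python script mentioned above, an equivalent least-squares slope routine is sketched here in C, simply to keep all the examples in a single language:

    #include <stdio.h>

    /* Ordinary least-squares slope of y against x, the operation used for all
       the regressions in Equations (4), (6) and (8). Illustrative equivalent
       of the Python post-processing step. */
    double ls_slope(const double *x, const double *y, int n)
    {
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    int main(void)
    {
        double x[] = {1.0, 2.0, 3.0, 4.0}, y[] = {2.1, 3.9, 6.0, 8.1};
        printf("slope = %f\n", ls_slope(x, y, 4));   /* ~2.0 */
        return 0;
    }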
4. Discussion
In the past, several studies have shown that river networks are non-plane-filling, multifractal structures and that their geometric properties strongly influence the behavior of the instantaneous hydrograph during rainfall. Models such as the MIUH, discussed above, require a detailed knowledge of the generalized dimensions of the river basin. Several authors have shown that, among the possible methods, the FMA is the one yielding the most precise assessment of the multifractal measures for hydrological applications. However, the FMA has the disadvantage, with respect to other methods, of requiring rather long computational times to accomplish the analysis.
In order to tackle this problem, we transformed a serial code, developed in the past to implement the FMA, into a parallel code so as to reduce the computational times. Nowadays, modern computers have an increasingly high number of computational cores, which can be used together to speed up the computation. The present study investigates the possibility of improving this already existing parallel (pure MPI) numerical code by including the parallelization through the OpenMP paradigm, which is nowadays a standard for parallel computations on shared memory machines. Although the code parallelized with MPI already allowed parallel execution not only on multi-node clusters but also on single-node workstations, we have shown here that the adoption of the OpenMP parallel directives is quite beneficial for the speed-up of the computation. In any case, the code can be run in a hybrid mode by using the MPI library to manage the execution on multi-node clusters and OpenMP to ensure the best possible execution time on each single node. Our results have shown that this hybrid mode yields the best results in terms of computational speed-up in the case of distributed memory machines.
As an example, the serial version of the code takes about 1203 CPU seconds to analyze the study-case considered here, according to the results shown in Table 1. This means that, with ten times more net-points than in our simple study-case, by retaining only the most important term in the number of operations ($\propto N^2$), we can expect the CPU time needed by the serial version of the code to be about 120300 CPU seconds, namely around 33.4 CPU hours. If a machine with 16 cores is available, thanks to the quasi-linear scaling of the code in the hybrid MPI/OpenMP case, the results could be obtained in roughly 2.5 CPU hours. This is, of course, not an ideal situation, but, in principle, the speed-up is limited only by the number of cores available on the machine. The pure MPI implementation realized in P1, for instance, yields an almost ideal speed-up only for a lower number of processes, due to the latency of the network interconnection. This limitation can be mitigated by using the hybrid MPI/OpenMP implementation of the code described in the present work.
Along with the improvements in the parallelism of the computation, another considerable step towards a better assessment of the generalized dimensions of a river basin was taken by introducing a direct method for the computation of the Lipschitz–Hölder exponents and the multifractal spectrum. This allowed us to obtain, at least for the known case of a deterministic multifractal, a very precise assessment of those quantities, better than that obtained in the past through numerical differentiation.