*Article* **Roadmap Optimization: Multi-Annual Project Portfolio Selection Method**

**Ran Etgar 1,\* and Yuval Cohen <sup>2</sup>**


**Abstract:** The process of project portfolio selection is crucial in many organizations, especially R&D organizations. There is a need to make informed decisions on the investment in various projects or lack thereof. As projects may continue over more than one year, and as there are connections between various projects, there is a need to decide not only which projects to invest in but also when to invest. Since future benefits from projects are discounted relative to near-term ones, and due to the interdependency among projects, the question of allocating the limited resources becomes quite complex. This research provides a novel heuristic method for allocating the limited resources over multi-annual planning horizons and examines its results in comparison with an exact branch and bound solution and various heuristic ones. This paper culminates with an efficient tool that can provide both practical and academic benefits.

**Keywords:** metaheuristics; project selection; portfolio management; resource; R&D; roadmap; program management

**MSC:** 90B35

**1. Introduction**

Most project-based organizations face a long list of proposed projects that compete for a limited set of resources such as money, manpower, and equipment [1]. Project portfolio selection (PPS) aims to determine which projects an organization should undertake [2–4]. The decision to allocate and prioritize projects today affects the organization's future competitive position [5], and decisions to initiate (and terminate) projects are strategic in nature, since they commit substantial enterprise resources [6]. It is therefore recognized worldwide that projects need to be managed as an overall portfolio [7] rather than as separate projects [8].

Evidently, organizations wish to maximize their return on investment when selecting projects [9], and therefore, the selection process should be based on criteria that take into account this objective function [10].

The operational research problem of PPS was defined [11] as a situation in which several projects are available for investment, differing in their resource needs (both the resource types and the demand levels).

The current research and methods are lacking in two aspects: they do not take into consideration the time value of the projects (i.e., the contribution depends on the project's completion time), and they do not consider the dependencies between the projects (i.e., precedence and competition over resources). This research aims to present a novel formulation of the problem, namely one that incorporates these aspects. The article also provides an exact solution algorithm (branch and bound) that can solve small to medium problems. However, due to the NP-hard nature of the problem, large-scale problems require a different approach. Thus, several metaheuristic algorithms are proposed and analyzed.

**Citation:** Etgar, R.; Cohen, Y. Roadmap Optimization: Multi-Annual Project Portfolio Selection Method. *Mathematics* **2022**, *10*, 1601. https://doi.org/10.3390/ math10091601

Academic Editors: Humberto Rocha and Ana Maria Rocha

Received: 20 March 2022 Accepted: 27 April 2022 Published: 8 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **2. Literature**

Research in the area of project portfolio selection has been growing rapidly in the past decade, with a proliferation of research articles tackling myriad variations of this problem. Some of this proliferation was summarized in several review papers. For example, Frey and Buxmann [12] reviewed the literature on portfolio selection of IT projects. Weissenberger-Eibl and Teufel [13] provided a strategic and political review on project selection. Padhy [14] and Condé and Martens [15] reviewed six-sigma project selections. Danesh et al. [16] provided a broad review of multi-criteria portfolio management, and Mohagheghi et al. [17] reviewed models, uncertainty approaches, solution techniques, and case studies.

The simplest model of the project selection problem is single-attribute optimization under resource constraints, which resembles the well-known knapsack problem [18,19]. The main extensions of this model incorporate synergies and interactions between projects and optimization under uncertainty [18,20,21].

Another classical model of the project selection problem is financial project portfolio selection, which is based on mean profit vs. profit variance [22]. However, this model is rooted in stock-market investment, and its assumptions and characteristics are more suited to decisions about portfolios of stocks and other financial assets than to a company's or organization's project decision problem. Thus, only a few papers on project portfolio selection adopted this modeling approach [9].

Biobjective models were a natural evolution of single-objective optimization [20,23]. While multi-objective optimization became the research mainstay of project portfolio selection, biobjective models, due to their simplicity, still attract some research attention [19,23–25].

The multi-objective optimization models followed the aforementioned models [26–30]. The literature on multi-objective project portfolio selections proliferated [31–36] and became the main branch of the project portfolio research [16], and it is still prevalent [37]. Within this multi-objective framework, robustness became a prevalent requirement and an objective to address the uncertainties of project portfolio selection [29,32].

During the last two decades, fuzzy logic started to play an important role in decision making, and in the past decade, it made its way into the portfolio selection literature. For example, Perez and Gomez [33] and Perez et al. [38] used fuzzy constraints between projects, while others [1,35,39,40] used them in the objective function.

Another branch of research gathered both project scheduling and project selection into a single decision frame (some examples are in [19,30,41–43]).

While the above discussion dealt with project selection in technical terms, strategic project portfolio selection presents a very different approach [25,44–46]. Killen et al. [4] identify three strategic perspectives of project portfolio selection: (1) the resource-based view, (2) the dynamic capabilities view, and (3) the absorptive capacity view. Kaiser et al. [47] stressed the role of structural alignment of selected projects with the organization's values, vision, and strategy. Kopmann et al. [46] suggested fostering both deliberate and emergent strategies. Finally, Guo et al. [25] suggested balancing strategic contributions and financial returns.

While the existing reviews cover their relevant part of the literature and suggest some classifications of the project selection problems, a more formal and general classification system can contribute to them. We suggest a system that would be close to Kendall's notation in queuing theory [48]. To foster a discussion for filling this gap, we present here our initial attempt at classification of project portfolio selection's main problem types using the major characteristics of these problems. It is hoped that the suggested classification scheme will initiate a discussion (or even a debate) which will culminate in an agreed-upon standard classification method. In Figure 1, we propose a classification method for the PPS problem. The suggestion is to classify the problems by their objective type, the solution method used to make the selection, the nature of the data, and the constraints. Therefore, the proposed method has four classifiers:


**Figure 1.** Project portfolio selection classification scheme.


Some examples of this classification scheme are shown in Table 1.


**Table 1.** Examples of the suggested classifications.

As stated above, enhancements to this classification scheme and even challenges are welcome as part of future research.

The problem researched in this article has a single objective (maximum value); the data are assumed to be deterministic, and no depletable resources are involved. The article utilizes two solution types: optimal for small-to-medium-sized problems and metaheuristic search for larger ones. Therefore, according to the hierarchical classification of Figure 1, this problem is classified as multi-attribute, optimized, and deterministic.

This article does not consider random, fuzzy, or gray data.

#### **3. Problem Description**

The problem deals with an ongoing R&D situation in which projects are developed over time. The decision is which projects to schedule for each year of the planning horizon. Since a company has a limited amount of resources each year, it is impossible to perform all projects at once, and less-lucrative projects must therefore be scheduled for later years. Without constraints, all projects would be planned for the first year; however, this is not the case. Two reasons compel postponing projects to future years:


## *3.1. Problem Assumptions*


## *3.2. Problem Notations*

To assist understanding of the formulation, Table 2 depicts the notations used for the formulation.

**Table 2.** Problem notations.


#### *3.3. Problem Formulation*

The objective function maximizes the cumulative value of the project portfolio such that

$$\max V = \sum_{i=1}^{H} \sum_{k=1}^{N} y_i \, p_k \, x_{k,i} \tag{1}$$

Each project is to end in one year only, so the relevant constraints are

$$\sum_{\forall i} x_{k,i} = 1 \;\forall k \in \{1, 2, \dots, N\} \tag{2}$$

Since a project can stretch over more than one year (i.e., start before its final year), there is a need for an auxiliary set of variables (*Z*), denoting the years in which a project can use resources (i.e., the project cannot use resources after its ending):

$$z_{k,i} \le \sum_{l=i}^{H} x_{k,l} \;\forall k \in \{1, 2, \dots, N\}, \forall i \in \{1, 2, \dots, H\} \tag{3}$$

Thus, *zk*,*<sup>i</sup>* can have the maximum value of 1 for each year until its ending year and 0 onward.

To ensure that the project spreads over consecutive years, we use

$$z_{k,i} \le z_{k,i+1} \;\forall k \in \{1, 2, \dots, N\}, \forall i \in \{1, 2, \dots, H-1\} \tag{4}$$

This notation enables the setting of the resource-consumption variables (*W*), denoting the level of resource *j* consumed by project *k* in year *i*:

$$w_{k,i,j} \le q_{k,j} z_{k,i} \;\forall k \in \{1, 2, \dots, N\},\; \forall i \in \{1, 2, \dots, H\},\; \forall j \in \{1, 2, \dots, P\} \tag{5}$$

Thus, a project may consume part of the resources it needs in year *n* and part in year *n* + 1 (e.g., use part of 2023's budget and part of 2024's budget). Additionally, each project must consume all the resources it needs:

$$\sum_{i=1}^{H} w_{k,i,j} = q_{k,j} \;\forall k \in \{1, 2, \dots, N\}, \forall j \in \{1, 2, \dots, P\} \tag{6}$$

To prevent over-consumption of resources at any given year, the resource level constraints are

$$\sum_{\forall k} w_{k,i,j} \le r_{j,i} \;\forall j \in \{1, 2, \dots, P\}, \forall i \in \{1, 2, \dots, H\} \tag{7}$$

Finally, the technical dependencies among the projects are expressed as

$$z_{k,i} \ge z_{m,i} \;\forall d_{k,m} = 1, \forall i \in \{1, 2, \dots, H\} \tag{8}$$

A small example illustrating this formulation is detailed in Appendix A.
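To make the objective of Equation (1) concrete, the following minimal sketch evaluates the discounted portfolio value of a given completion-year assignment. All data are invented toy values for illustration, and constraints (2)–(8) are not checked here:

```python
# Toy evaluator for the objective of Equation (1): the cumulative,
# year-discounted value of a portfolio schedule.

def portfolio_value(x, p, y):
    """x[k][i] = 1 iff project k ends in year i; p[k] = project value;
    y[i] = discount weight of year i (earlier years weigh more)."""
    return sum(y[i] * p[k] * x[k][i]
               for k in range(len(p))
               for i in range(len(y)))

# Three projects, a two-year horizon, 10% annual discount (invented).
p = [100.0, 60.0, 40.0]
y = [1.0, 0.9]
x = [[1, 0],   # project 0 ends in year 1
     [0, 1],   # project 1 ends in year 2
     [0, 1]]   # project 2 ends in year 2

v = portfolio_value(x, p, y)   # 100*1.0 + 60*0.9 + 40*0.9 = 190.0
```

Postponing a project from year 1 to year 2 here costs 10% of its value, which is exactly the pressure that forces the trade-off described in Section 3.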

#### **4. Problem Complexity and Exact Solution**

Although the formulation depicted in Section 3.3 is accurate and necessary, it contributes little when attempting to solve actual problems. The described problem is NP-complete, yet exact solutions can be obtained via the branch and bound (B&B) method. Section 4.1 establishes the problem's complexity, and Section 4.2 describes a B&B solution for it.

Since a B&B solution is practical only for some of the problems, and the NP-completeness of the general problem hinders exact solutions for large instances, a practical and efficient metaheuristic solution is described in Section 5. This solution method is useful for large-sized problems and also provides an initial lower bound for the B&B algorithm.

## *4.1. Complexity*

To prove the NP-completeness of PPS, a reduction from a well-known NP-complete problem is needed: the precedence-constrained knapsack problem [54]. The reduction is performed as follows:


Since PPS is a generalization of this problem, it is clear that PPS is NP-complete as well.

#### *4.2. Branch and Bound Algorithm*

An efficient B&B algorithm is based on the following components:


The following subsections describe the components of the algorithm.

#### 4.2.1. Initial Solution

It is convenient (though not necessary) to start the run of the algorithm with a lower bound (LB). The higher the LB, the better. A reasonable algorithm should consider all important attributes of the projects, namely the resource requirements and precedence. The following algorithm can provide a decent initial solution:

- 1.1 Calculate for each project the ratio of the project's value to its resource requirement, *θ<sub>k</sub>* = *p<sub>k</sub>*/*q<sub>k,j</sub>*, where *θ<sub>k</sub>* denotes the value that can be obtained from one unit of the resource by performing the project.
- 1.2 For each project, calculate its total impact. The impact is calculated by aggregating the project's *θ<sub>k</sub>* value and the values of all its successors (direct and indirect). For example, for project 1, the impact is *I*<sub>1</sub> = *θ*<sub>1</sub> + *θ*<sub>5</sub> + *θ*<sub>7</sub> + *θ*<sub>8</sub> (since projects 5, 7, and 8 are successors of project 1).
- 1.3 Sort the projects in descending order of total impact.
- 1.4 Schedule the projects in the order obtained in Step 1.3, assigning each project to the first available year.
- 1.5 Calculate the total value obtained from the schedule of Step 1.4, denoted V.
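A minimal sketch of Steps 1.1–1.4, assuming a single resource type for brevity; the project values, demands, and successor lists below are invented for illustration:

```python
# Sketch of the initial-solution heuristic (Steps 1.1-1.4), assuming
# one resource type. succ[k] lists the direct successors of project k.

def total_impact(k, theta, succ):
    """theta_k plus theta of all direct and indirect successors (Step 1.2)."""
    seen, stack, impact = set(), [k], 0.0
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            impact += theta[m]
            stack.extend(succ[m])
    return impact

def initial_schedule(p, q, r, succ):
    """Sort by impact, then place each project in the first year with
    enough remaining resource (Steps 1.3-1.4)."""
    theta = [p[k] / q[k] for k in range(len(p))]           # Step 1.1
    order = sorted(range(len(p)),
                   key=lambda k: total_impact(k, theta, succ),
                   reverse=True)                           # Step 1.3
    left = list(r)                                         # budget per year
    year = {}
    for k in order:                                        # Step 1.4
        for i in range(len(r)):
            if left[i] >= q[k]:
                year[k] = i
                left[i] -= q[k]
                break
    return year

p = [50.0, 30.0, 20.0]         # invented project values
q = [5.0, 5.0, 5.0]            # invented resource demands
r = [10.0, 10.0]               # invented yearly budgets (2-year horizon)
succ = {0: [2], 1: [], 2: []}  # project 2 is a successor of project 0
sched = initial_schedule(p, q, r, succ)   # {0: 0, 1: 0, 2: 1}
```

Project 0 has the highest impact (its own *θ* plus that of its successor), so it is placed first; project 2 overflows to the second year once the first year's budget is exhausted.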

## 4.2.2. Branching

The branching process is quite simple. Each "level" of the tree represents a year. Therefore, the tree depth can only be as deep as the planning horizon.

Each branch is simply a set of projects that fully utilizes at least one of the resources available for that year; that is, when branching at year *i*, each branch is a set *J* of unscheduled projects that fulfills the following requirements:

• There exists a resource type (*j*) for which

$$\sum_{\forall k \in J} q_{k,j} \ge r_{j,i} \tag{9}$$

• For every proper subset of *J*, denoted by *J*<sup>−</sup> (i.e., *J*<sup>−</sup> ⊂ *J* and *J*<sup>−</sup> ≠ *J*), and for every resource type (*j*), the following holds:

$$\sum_{\forall k \in J^{-}} q_{k,j} < r_{j,i} \tag{10}$$

The two conditions may look cumbersome, but all they mean is that the set *J* fully exploits at least one resource, while no proper subset of *J* does; that is, any proper subset can still be extended by adding a project or projects.
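The two conditions can be checked directly. The sketch below (with invented demand and availability data) tests whether a candidate set *J* saturates some resource for the year while no proper subset does:

```python
# Sketch of the branching-set test (conditions (9) and (10)): J must
# saturate at least one resource of the year, while every proper
# subset of J saturates none.

from itertools import combinations

def is_branch_set(J, q, r_year):
    """J: tuple of project ids; q[k][j] = demand of project k for
    resource j; r_year[j] = availability of resource j that year."""
    n_res = len(r_year)
    # Condition (9): some resource is fully used by J.
    if not any(sum(q[k][j] for k in J) >= r_year[j] for j in range(n_res)):
        return False
    # Condition (10): no proper subset saturates any resource.
    for size in range(len(J)):
        for sub in combinations(J, size):
            if any(sum(q[k][j] for k in sub) >= r_year[j]
                   for j in range(n_res)):
                return False
    return True

q = {1: [4.0], 2: [4.0], 3: [9.0]}   # invented: one resource type
r_year = [8.0]
ok = is_branch_set((1, 2), q, r_year)          # 4+4 saturates; subsets do not
too_big = is_branch_set((1, 2, 3), q, r_year)  # the subset {3} already saturates
```

The subset enumeration is exponential in |J|, which is acceptable here because the branch sets are, by construction, minimal saturating sets and therefore small.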

## 4.2.3. Bounding Rule

An efficient bounding rule should satisfy the following demands:


The following algorithm satisfies both demands. The algorithm is based on two relaxations of the problem: the first is ignoring the precedence constraints, and the second is ignoring the multi-resource nature of the problem (by dealing with one resource at a time).

	- 1.1 Ignore all resource requirements (other than resource *j*) and precedence constraints.

The proposed algorithm is quite simple and rapid for small-scale problems.

## **5. Metaheuristic Search**

Although the previous section describes an exact algorithm, it is impractical to apply it (or any exact algorithm) successfully to large-size problems. The algorithm provided in Section 4.2.1 may prove unsatisfactory even for medium-scale, let alone large-scale, problems.

To provide near-optimal solutions, a metaheuristic approach is suggested. The benefit of this approach is that, given enough runtime, solutions close to the optimum can be reached. Since the first conception of metaheuristics, many approaches have been suggested. The proposed solution is based on the CLONALG metaheuristic, developed by de Castro and Von Zuben [55]. This basic method can be applied to various scheduling problems as long as the following requirements are met [56]:


Section 5.1 elaborates on the application of the first requirement to PPS. The second requirement application is provided in Section 5.2. Finally, several different mutation algorithms (third requirement) are detailed in Section 6.

#### *5.1. Vector Representation*

Although the formulation provided in Section 3 (and the notations of Table 2) is mathematically accurate, the representation of the decision variables (the matrix *X*) is impractical for the proposed CLONALG process. The main problem is that most possible instances of *X* are not feasible, either because they represent schedules that violate the dependency constraints or because they violate the resource constraints. An ideal representation is one in which every possible instance represents a feasible solution; thus, the mutation process would never yield infeasible solutions. Another requirement is that all feasible solutions can be represented (i.e., the representation should span the entire solution space) so that the mutation process does not omit any possible solution.

To achieve this, a vector *G* is introduced. This vector represents the "genotype" of the solution (i.e., it is not the schedule itself), but from each possible instance of *G*, a feasible "phenotype" (*X*, the solution) can be derived. *G* is a vector of *N* natural numbers (from 1 to *N*), where *g<sub>k</sub>* is the *k*-th project to be scheduled.

The transformation from *G* to *X* ("genotype-to-phenotype translation") is performed as follows:

MAIN

- 1.1 If $\sum_{j=1}^{H} x_{k,j} = 0$ (i.e., *g<sub>k</sub>* has not been scheduled yet): run procedure "FindYear" with parameter *k*.
- 1.2 If $\sum_{j=1}^{H} x_{k,j} = 1$ (i.e., *g<sub>k</sub>* has already been scheduled): continue.

FindYear(*k*)

- 1.1 If $d_{m,k} = 1$ and $\sum_{j=1}^{H} x_{m,j} = 0$ (project *k* depends on project *m*, and project *m* has not been scheduled): run procedure "SCHEDULE" with parameter *m*.
- 1.2 Otherwise (either project *k* does not depend on project *m*, or project *m* has already been scheduled): continue.

This simple algorithm satisfies the two requirements: (1) any genotype *G* can be converted into a feasible "phenotype" (a feasible *X*) through a simple process, and (2) every feasible solution can originate from some genotype *G*.
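A minimal sketch of this translation, under simplifying assumptions not stated above (a single resource type, each project consuming its whole demand in its completion year, and a predecessor finishing strictly before its successor):

```python
# Sketch of the genotype-to-phenotype translation of Section 5.1.
# pred[k] lists the predecessors of project k; all data are invented.

def decode(G, q, r, pred):
    """Return year[k] for every project: walk the genotype in order and
    recursively schedule unscheduled predecessors first."""
    left = list(r)
    year = {}

    def find_year(k):
        if k in year:
            return
        for m in pred[k]:          # schedule predecessors first
            find_year(m)
        earliest = max((year[m] + 1 for m in pred[k]), default=0)
        for i in range(earliest, len(r)):
            if left[i] >= q[k]:    # first feasible year
                year[k] = i
                left[i] -= q[k]
                return
        year[k] = len(r)           # beyond the horizon: not performed

    for k in G:
        find_year(k)
    return year

G = [2, 0, 1]                  # the genotype: scheduling order
q = [3.0, 3.0, 3.0]            # invented demands
r = [6.0, 6.0]                 # invented yearly budgets
pred = {0: [], 1: [], 2: [0]}  # project 2 depends on project 0
year = decode(G, q, r, pred)   # {0: 0, 2: 1, 1: 0}
```

Note how gene 2, listed first, pulls its predecessor (project 0) into the schedule before itself, so every decoded phenotype respects the dependency constraints by construction.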

## *5.2. Initial Solution Generation*

Since the aforementioned procedure makes any vector containing the natural numbers from 1 to *N* a feasible representation of a solution, generating an initial feasible solution is quite straightforward: any "scrambled" vector containing the numbers from 1 to *N* in random order will suffice. To obtain a set of such scrambled vectors, the following method was applied:

	- 2.1 Set *si*,1 = *i*.
	- 2.2 Set *si*,2 = *U*(0, 1) (random number from the unit uniform distribution).

After this stage, the first column is filled with running numbers and the second with random numbers.
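The two steps amount to a random-key sort, as in this minimal sketch (the vector length and seed are illustrative):

```python
# Sketch of the scrambled-vector generation (Steps 2.1-2.2): pair each
# project index with a uniform random key, then sort by the key to
# obtain a uniformly random permutation.

import random

def scrambled_vector(n, rng=random.Random(42)):
    keys = [(i, rng.random()) for i in range(1, n + 1)]  # Steps 2.1-2.2
    keys.sort(key=lambda pair: pair[1])                  # sort by random key
    return [i for i, _ in keys]

G = scrambled_vector(8)   # a random permutation of 1..8
```

Sorting the first column by the second column yields the scrambled order; repeating the procedure produces the initial population of genotypes.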


#### **6. Mutation Generation**

The representation of the solution vector and the generation of the initial solution set lay the ground for the central part of the CLONALG process: mutation generation and cloning. The mutations are generated by inserting random changes into the genome vector (*G*). Since the genotype vector describes the order of scheduling the projects, the mutation process is carried out by changing this order. This section describes three approaches to the mutation process. The first and second are "traditional" approaches, and the third attempts to exploit clustering techniques to improve the search performance. A fourth approach, a combination of the previous ones, is also presented.

#### *6.1. Minor Mutations*

The most trivial and straightforward approach to changing the order of the vector members is replacement. The simplest way is to randomly choose a project (a member of the genotype vector) and swap it with its neighbor, as depicted in Figure 2. In this case, the fifth location (Project 6) was chosen and swapped with its neighbor in the sixth location (Project 3).

**Figure 2.** A simple mutation.
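A minimal sketch of this adjacent swap (the vector below is illustrative):

```python
# Sketch of the minor (adjacent-swap) mutation of Figure 2: pick a
# random position and swap it with its right neighbor.

import random

def minor_mutation(G, rng=random.Random(0)):
    G = list(G)                      # do not modify the original
    i = rng.randrange(len(G) - 1)    # positions 0 .. n-2
    G[i], G[i + 1] = G[i + 1], G[i]
    return G

before = [1, 2, 4, 5, 6, 3, 7, 8]
after = minor_mutation(before)       # exactly two positions change
```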

Small mutations have an advantage: when in the vicinity of the optimal solution, a small mutation is less likely to cause damage and drift away from the optimum, as visualized in Figure 3. A small mutation, even in the wrong direction, is less likely to corrupt the solution value than a larger mutation in the correct direction.

**Figure 3.** (**a**) A small mutation. (**b**) A large mutation.

#### *6.2. Major Mutations*

Though minor mutations have the advantage of minimal damage near the optimum, they are likely to provide only minimal improvement (i.e., many steps are needed toward the optimum). To illustrate this, let us examine the case depicted in Figure 4.

**Figure 4.** Projects' dependencies.

Project 7 has a high value and therefore should be scheduled as soon as possible. Let us assume a pre-mutated solution vector *G* = (4, 10, 1, 2, 3, 5, 6, 7, 8, 9). A mutation switching Projects 1 and 10 may decrease the total value (as Project 10 is postponed) without expediting the lucrative Project 7. Such a mutation has a high probability of being rejected (it decreases the objective function). To expedite Project 7, several "lucky" small mutations are needed. Such a sequence will indeed appear eventually but may take quite a long time. Figure 5 visually depicts the difference between the mutation types.

**Figure 5.** Comparison between the advantages of (**a**) large and (**b**) small mutations.

To improve this, another version of mutations is suggested: randomly choosing two projects in the vector and switching their locations, as depicted in Figure 6. In this case, the third location (Project 2) and the eighth location (Project 1) were switched.

**Figure 6.** Switch mutation.

While this "mega-mutation" may prove lethal (i.e., significantly reduce the objective function value), it may also obviate the need for a lucky sequence of minor mutations.
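A sketch of this switch mutation (the vector is the illustrative one from above):

```python
# Sketch of the major ("switch") mutation of Figure 6: pick two random
# positions and exchange their projects.

import random

def major_mutation(G, rng=random.Random(1)):
    G = list(G)                          # do not modify the original
    i, j = rng.sample(range(len(G)), 2)  # two distinct positions
    G[i], G[j] = G[j], G[i]
    return G

before = [4, 10, 1, 2, 3, 5, 6, 7, 8, 9]
after = major_mutation(before)           # exactly two positions change
```

In a CLONALG run, both operators would be applied to clones of promising genotypes, with the decoded phenotype's value deciding which clones survive.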

## *6.3. Oriented Mutations*

As claimed by Darwin, in nature, all mutations, whether large or small, are totally random and have no particular direction (unlike Lamarckism, which claims that organisms evolve toward a defined goal). The two mutation types described in Sections 6.1 and 6.2 are Darwinian; that is, they are totally random, and each project has the same probability of being selected. This full randomness has the distinct advantage of being totally unbiased but may prove inefficient. Evolutionary scientists have claimed for decades that a gene cannot be considered simply "bad", "good", or even "helpful" for the organism; rather, a set of genes operating together may prove beneficial [57–59]. For example, a set of sharp incisors and canines is of no use to an herbivorous animal, nor are long intestines and complex stomachs useful to a carnivorous hunter. A gene for sharp teeth contributes only when accompanied by other genes for a carnivorous lifestyle, yielding a cheetah, for example. Without pushing the natural metaphor too far, this observation from the field of "ordinary" evolution can be adapted to evolutionary metaheuristic search. If project X and project Y are both predecessors of project Z (which is very lucrative), then there is no advantage in expediting project X alone, but it is very beneficial to expedite X and Y together. In our example, expediting Project 5 may prove unbeneficial if not accompanied by Projects 6 and 7. A beneficial mutation will include several combined small mutations and thus collectively improve the total value.

The problem, then, is how to recognize whether project X is beneficial together with project Y. The projects have dependencies, and they compete for the same resources. To predict whether two projects should be scheduled together, a technique often used in data science was exploited: the similarity coefficient method (SCM). The SCM was originally used for group technology (GT) applications [60,61], but it is now used for a wide variety of classification and optimization problems [62–64]. The proposed method is, therefore, a combination of three fields: SCM, oriented (Lamarckian) evolution, and PPS. Obviously, the similarity between machines and manufacturing processes (as used in GT) must be adapted to projects.

The challenge is to keep the CLONALG process and its random mutations while inserting "smart" mutations. The basic notion underlying this approach is to increase the probability that clustered projects are moved simultaneously (either postponed or expedited, but together) by resorting to the SCM. The first step is to calculate the similarity between the projects (i.e., the likelihood of their benefiting from being scheduled together). The second step is to generate the mutation in a way that is based on this similarity.

The similarity measure will be as follows:

• Dependent projects: In the example depicted in Figure 4, Projects 5 and 6 have a common dependent, Project 7. This means that it will not be possible to gain the value of Project 7 even if Project 5 is expedited; Projects 6 and 2 need to be completed as well. Any small mutation expediting just one of these projects will leave the others untouched and fail to yield a major gain in value. Furthermore, any mutation that postpones one of these projects will yield a major reduction in the total value. A mutation involving all three projects may enable expediting the lucrative Project 7. The proposed similarity measure is (based on [65])

$$S1_{k,m} = \frac{\sum_{i=1}^{N} d_{k,i} d_{m,i}}{\sum_{i=1}^{N} d_{k,i}(1 - d_{m,i}) + \sum_{i=1}^{N} d_{m,i}(1 - d_{k,i}) + \sum_{i=1}^{N} d_{k,i} d_{m,i}} \tag{11}$$

where the latter is simply the number of projects that depend on both projects *k* and *m*, divided by the number of projects that depend on either of the two. For example, S1<sub>5,6</sub> = 1/(1 + 1 + 1) = 1/3, since only Project 7 depends on both projects, and there are three projects that depend on either 5 or 6.

• Mutual dependencies: When two projects depend on the same (or nearly the same) projects, expediting both together has only a little more impact on the entire schedule than expediting only one (i.e., "two for almost the same price"). Therefore, the second similarity is calculated as follows:

$$S2_{k,m} = \frac{\sum_{i=1}^{N} d_{i,k} d_{i,m}}{\sum_{i=1}^{N} d_{i,k}(1 - d_{i,m}) + \sum_{i=1}^{N} d_{i,m}(1 - d_{i,k}) + \sum_{i=1}^{N} d_{i,k} d_{i,m}} \tag{12}$$

• Resource requirements: Obviously, all projects compete for the same resource pool. Therefore, the third similarity is based on the measure of the level of common resources required by both projects. Two projects that require totally different resources do not compete at all. Projects that compete for the same resource, in which the resource itself is in abundance, will result in a minimal competition. If, however, both have high requirements for a low-level resource, then they are in head-to-head competition. Therefore, the third similarity can be calculated as the ratio between the required combined resources and the availability of these resources:

$$s3_{k,m} = \sum_{j=1}^{R} \frac{q_{j,k} + q_{j,m}}{\sum_{i=1}^{H} r_{j,i}} \tag{13}$$

This similarity measure differs from S1 and S2. First, it measures dissimilarity. Second, the value of *s*3 depends on the number of resources (*R*): the larger the value of *R*, the larger the value of *s*3. This poses a problem for the implementation of the CLONALG method. Therefore, the normalized similarity measure (S3<sub>k,m</sub>) is calculated as follows:

$$S3_{k,m} = \frac{1 - s3_{k,m}}{\max_{\forall n \neq p}(1 - s3_{n,p})} \tag{14}$$

Thus, the similarity measure S3 is a number between 0 and 1, where 1 indicates two non-competing projects; the lower the value, the stronger the competition for resources.

These three similarity measures are utilized to create a similarity coefficient that incorporates all these attributes. The similarity coefficient is, therefore, the following:

$$SC_{k,m} = \alpha_1 S1_{k,m} + \alpha_2 S2_{k,m} + \alpha_3 S3_{k,m} \tag{15}$$

where *α*<sup>1</sup> + *α*<sup>2</sup> + *α*<sup>3</sup> = 1 and *α*1, *α*2, *α*<sup>3</sup> are non-negative.

The similarity coefficient is the basis of the mutation generation algorithm:

	- 3.1 Generate a random number *u* ∼ *U*(0, 1);
	- 3.2 If SC*k*,*<sup>m</sup>* > *<sup>u</sup>*, then add project *<sup>m</sup>* to <sup>Φ</sup>.
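As a minimal illustration, the sketch below computes the S1 measure of Equation (11) on an invented dependency matrix and then applies the probabilistic cluster-building rule of Steps 3.1–3.2; the combined SC matrix here is invented toy data rather than the full mix of Equation (15):

```python
# Sketch of the oriented-mutation machinery: the S1 similarity of
# Equation (11) (a Jaccard index on shared dependents) and the
# cluster-building rule of Steps 3.1-3.2. d[k][i] = 1 iff project i
# depends on project k; all data are invented.

import random

def s1(k, m, d):
    """Shared dependents of k and m divided by dependents of either."""
    both = sum(d[k][i] * d[m][i] for i in range(len(d)))
    either = sum(1 for i in range(len(d)) if d[k][i] or d[m][i])
    return both / either if either else 0.0

def build_cluster(k, SC, rng=random.Random(7)):
    """Add each project m to the moved cluster Phi with probability
    SC[k][m] (Steps 3.1-3.2)."""
    phi = {k}
    for m in range(len(SC)):
        if m != k and SC[k][m] > rng.random():
            phi.add(m)
    return phi

# Projects 0..3: projects 2 and 3 depend on project 0; project 3 also
# depends on project 1.
d = [[0, 0, 1, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
sim = s1(0, 1, d)   # one shared dependent out of two -> 0.5

# An invented combined similarity matrix standing in for Equation (15).
SC = [[1.0, 0.9, 0.1, 0.1],
      [0.9, 1.0, 0.1, 0.1],
      [0.1, 0.1, 1.0, 0.2],
      [0.1, 0.1, 0.2, 1.0]]
phi = build_cluster(0, SC)   # projects with high SC to 0 tend to join
```

The resulting cluster Φ is then moved as a unit (expedited or postponed together), which is precisely the "combined small mutations" effect motivated above.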

#### *6.4. Mixed Mutations*

The basic concept of the oriented mutations is that the advance toward the optimum is not limited to random mutations but also benefits from knowledge and common sense (as expressed in the similarity coefficient matrix). Figure 7a illustrates random mutations, where the new solutions are spread randomly; Figure 7b illustrates oriented mutations, where the new solutions are concentrated in the vicinity of the optima.

**Figure 7.** (**a**) Random mutations vs. (**b**) Oriented mutations.

The main risk of the oriented mutation is that the "orientation" will lower the diversification, or the ability to visit many different regions of the solution space [66]. Lower diversification will interfere with the random search and disturb the metaheuristic process that enables escape from the local optima. Though the oriented approach increases the search intensification, it is essential to find an optimal balance between intensification and diversification [67].

To mitigate this risk, other approaches to the mutation process were examined. These approaches basically tune the level of similarity; that is, in the mutation algorithm depicted in Section 6.3, Step 3 is replaced by the following:

	- 3.1 Generate a random number *u* ∼ *U*(0, 1);
	- 3.2 If *<sup>α</sup>*SC*k*,*<sup>m</sup>* > *<sup>u</sup>*, then add project *<sup>m</sup>* to <sup>Φ</sup>.

where *α* is the tuning parameter. A high value of *α* means that the mutation process relies more on oriented mutations; a low value means it relies on them less. When *α* = 0, the mutation process is the same as that depicted in Section 6.2.

## **7. Computational Results**

#### *7.1. Database*

To examine the performance of the different approaches, a random data set of portfolios was generated. To encompass the wide variety of portfolios, the database was generated by randomly generating different cases varying in size, connectivity, and resources:


For each combination of size, connectivity, and resources, a set of five random portfolios was generated.

The data set was planned for a 5-year horizon (a project scheduled for year 6 is considered undesired and not performed).

## *7.2. Experiment Design*

The purpose of the experiment was to assess the performance of the four approaches to CLONALG mutations (i.e., minor, major, oriented, and mixed mutations). As there was no database of optimal (or even best-known) solutions, the comparison was based on the following measures:


To compare the various methods fairly, they were all run for exactly the same time. The runtime for each problem was set by first running the minor-mutation variant until no improvement was achieved for 20 generations. The elapsed time was measured, and all other methods were then run for the same length of time, thus providing a fair benchmark.
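The timing protocol can be sketched as follows (a Python illustration; `search_step`, a hypothetical callable that runs one generation and returns the best value found so far, stands in for one CLONALG generation):

```python
import time

def run_with_stagnation_limit(search_step, limit=20):
    """Run search_step until no improvement is seen for `limit`
    generations; return the best value and the elapsed wall-clock time,
    which then serves as the budget for the other variants."""
    best, stagnant = float("-inf"), 0
    start = time.perf_counter()
    while stagnant < limit:
        value = search_step()
        if value > best:
            best, stagnant = value, 0
        else:
            stagnant += 1
    return best, time.perf_counter() - start

def run_for_budget(search_step, budget):
    """Run another variant for the same wall-clock budget (seconds)."""
    best = float("-inf")
    start = time.perf_counter()
    while time.perf_counter() - start < budget:
        best = max(best, search_step())
    return best
```

This mirrors the paper's protocol: the stagnation run fixes the time budget, and each competing method is then given that identical budget.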

The mixed method used a value of *α* = 0.5.

## *7.3. Initial Results*

The results of the experiment for small problems are depicted in Table 3. For larger problems, the results are depicted in Table 4.


**Table 3.** Initial results: ratio to optimal solutions.

**Table 4.** Initial results: ratio to best-known solutions.


As can be seen from Tables 3 and 4, the mixed method outperformed all other methods (and relying only on minor mutations proved to underperform). A comparison between the methods is described in Figure 8.

The graph of Figure 8 reveals that the "oriented" (i.e., Lamarckian) mutations indeed improved the performance of the metaheuristic search. However, mixing them with "classic" mutations provided better solutions. The question that arises is the composition of this mix. The experiment was based on the arbitrary setting of *α* = 0.5 (i.e., halfway between oriented mutations and totally non-oriented ones). As the advantage of oriented mutations was established, it is interesting to further explore and find the optimal ratio of this "mix".

To test this point, a second experiment was performed. The same sets of 80 problems were used again with different values of *α*. The results are depicted in Figure 9. From the results, it is evident that the value of *α* had almost no significance as long as oriented mutations were present but did not monopolize the process.

**Figure 9.** Effect of α.

## **8. Summary and Conclusions**

This paper aimed to tackle the problem of project selection subject to resource constraints and technical precedence. To test this novel problem, the research developed a benchmark database of portfolios varying in size, precedence complexity, and resources. As far as we can ascertain, this is a one-of-a-kind database, and one of the outcomes of this research is to set benchmark results. This paper provides an exact formulation and an example for the new problem.

To solve the problem, a practical search approach was developed. This enhancement is applicable to most metaheuristic search techniques: it uses clustering methods that portray the attractive search zones and act as intensifiers. The proposed search was able to generate feasible, meaningful, and highly satisfactory solutions for planning long-horizon problems.

The proposed algorithm has both theoretical and practical implications. The practical implication is its ability to upgrade PPS decision making and base it on solid, exact foundations. The decision-making process should be based less on "gut feelings" and more on exact and well-presented data. Furthermore, the process may make decision makers aware of the impact of various constraints and lead to improved decisions (e.g., the economic benefit of recruiting more engineers of a specific type). The theoretical implications, on the other hand, derive from the metaheuristic approach: the suggested oriented search need not be limited to PPS and can be applied to various scheduling (and perhaps other) problems.

An obvious weakness of the article is its limitation to a specific problem in which the data are deterministic and there is a single objective (maximizing gain value). In reality, the data are often fuzzy or stochastic, and the proposed model does not take this into account. It is worth mentioning that nothing fundamental prevents the proposed metaheuristic search techniques from tackling fuzzy objective functions, and this may be a suitable direction for further research.

Another direction for future research could use the presented insights to develop better algorithms that smartly adapt the mutation type in the different phases of the search, and to develop a technique for optimizing the various parameters of the search to improve its performance.

**Author Contributions:** Conceptualization, R.E. and Y.C.; methodology, R.E. and Y.C.; software, R.E.; analysis, R.E.; writing, R.E. and Y.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **Appendix A. Problem Example**

To help visualize the problem formulation, a miniature PPS is portrayed. Whereas a typical PPS may include hundreds of projects, this one includes only 10.

#### *Appendix A.1. Projects*

The set includes 10 projects with the dependencies included in Figure 4. The dependency matrix is, therefore, the following:


#### *Appendix A.2. Resources*

The planning horizon is limited to 3 years, and in this example the number of resource types is limited to one. Let us assume a constant level of 5 units (e.g., 5 worker-years for each year of the horizon). Including the unbounded extra year, the resource matrix R is of dimensions 4 × 1: *R<sup>T</sup>* = (5, 5, 5, ∞)

As can be seen, the last year has an infinite level of resources, since scheduling a project to year 4 means not performing it at all.

The demand for resources is depicted in Table A1. From this table, the matrix Q can be derived: *Q<sup>T</sup>* = (2, 3, 1, 3, 2, 3, 1, 2, 3, 1)

**Table A1.** Resource demand.


## *Appendix A.3. Planning Horizon*

As mentioned, the planning horizon spreads over 3 years. The first year has a value (depreciation factor) of 1 (no depreciation), the second has a value of 0.8, the third has a value of 0.5, and everything that follows has a value of 0 (i.e., not planned to be developed at all). Therefore, we set *H* = 4 (i.e., the 3 years of the planning horizon plus 1 year for the projects that would not be realized). We also set the following: *Y<sup>T</sup>* = (1, 0.8, 0.5, 0)

## *Appendix A.4. Projects' Values*

The projects' values are depicted in Table A2. Therefore, the vector P is set to *P<sup>T</sup>* = (1, 1, 1, 1, 1, 2, 8, 2, 2, 3).
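For illustration, the value contributed by a schedule of completion years can be computed from *P* and *Y*; the helper below is hypothetical and assumes the objective coefficients follow *P<sup>T</sup>* above:

```python
# Depreciation weights Y[t] per completion year (year 4 = not performed)
# and project values P, both as given in the Appendix A example.
Y = [1.0, 0.8, 0.5, 0.0]
P = [1, 1, 1, 1, 1, 2, 8, 2, 2, 3]

def portfolio_value(completion_years):
    """Objective V = sum over projects of Y[t_i] * P[i], where t_i is the
    completion year (1-4) chosen for project i (hypothetical helper)."""
    return sum(P[i] * Y[t - 1] for i, t in enumerate(completion_years))
```

For example, completing every project in year 1 yields the undepreciated total of the project values, while scheduling a project to year 4 contributes nothing.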



*Appendix A.5. Objective Function*

From the previous notations, the objective function can be derived:

> max *V* = (*X*1,1 + *X*2,1 + *X*3,1 + *X*4,1 + *X*5,1 + 3*X*6,1 + 8*X*7,1 + 2*X*8,1 + 2*X*9,1 + 3*X*10,1) +
> 0.8(*X*1,2 + *X*2,2 + *X*3,2 + *X*4,2 + *X*5,2 + 3*X*6,2 + 8*X*7,2 + 2*X*8,2 + 2*X*9,2 + 3*X*10,2) +
> 0.5(*X*1,3 + *X*2,3 + *X*3,3 + *X*4,3 + *X*5,3 + 3*X*6,3 + 8*X*7,3 + 2*X*8,3 + 2*X*9,3 + 3*X*10,3)
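In compact form, the objective above follows the general pattern of the formulation (a restatement, with *P<sub>i</sub>* the project values and *Y<sub>t</sub>* the depreciation weights):

```latex
\max V \;=\; \sum_{i=1}^{10} \sum_{t=1}^{H} Y_t \, P_i \, X_{i,t}
```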

*Appendix A.6. Single Completion Year Constraints*

To ensure that a project is completed in 1 year only, the following constraints are added:

> *X*1,1 + *X*1,2 + *X*1,3 + *X*1,4 = 1
> *X*2,1 + *X*2,2 + *X*2,3 + *X*2,4 = 1
> *X*3,1 + *X*3,2 + *X*3,3 + *X*3,4 = 1
> *X*4,1 + *X*4,2 + *X*4,3 + *X*4,4 = 1
> *X*5,1 + *X*5,2 + *X*5,3 + *X*5,4 = 1
> *X*6,1 + *X*6,2 + *X*6,3 + *X*6,4 = 1
> *X*7,1 + *X*7,2 + *X*7,3 + *X*7,4 = 1
> *X*8,1 + *X*8,2 + *X*8,3 + *X*8,4 = 1
> *X*9,1 + *X*9,2 + *X*9,3 + *X*9,4 = 1
> *X*10,1 + *X*10,2 + *X*10,3 + *X*10,4 = 1
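The ten equalities above are instances of a single compact constraint (a restatement in the formulation's notation):

```latex
\sum_{t=1}^{H} X_{i,t} = 1 \qquad \forall\, i = 1,\dots,10
```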

*Appendix A.7. Precedence Constraints*

The auxiliary variables *zi*,*<sup>j</sup>* are set as follows:

> *z*1,1 ≤ *x*1,1 + *x*1,2 + *x*1,3 + *x*1,4, *z*1,2 ≤ *x*1,2 + *x*1,3 + *x*1,4, *z*1,3 ≤ *x*1,3 + *x*1,4, *z*1,4 ≤ *x*1,4
> *z*2,1 ≤ *x*2,1 + *x*2,2 + *x*2,3 + *x*2,4, *z*2,2 ≤ *x*2,2 + *x*2,3 + *x*2,4, *z*2,3 ≤ *x*2,3 + *x*2,4, *z*2,4 ≤ *x*2,4
> *z*3,1 ≤ *x*3,1 + *x*3,2 + *x*3,3 + *x*3,4, *z*3,2 ≤ *x*3,2 + *x*3,3 + *x*3,4, *z*3,3 ≤ *x*3,3 + *x*3,4, *z*3,4 ≤ *x*3,4
> *z*4,1 ≤ *x*4,1 + *x*4,2 + *x*4,3 + *x*4,4, *z*4,2 ≤ *x*4,2 + *x*4,3 + *x*4,4, *z*4,3 ≤ *x*4,3 + *x*4,4, *z*4,4 ≤ *x*4,4
> *z*5,1 ≤ *x*5,1 + *x*5,2 + *x*5,3 + *x*5,4, *z*5,2 ≤ *x*5,2 + *x*5,3 + *x*5,4, *z*5,3 ≤ *x*5,3 + *x*5,4, *z*5,4 ≤ *x*5,4
> *z*6,1 ≤ *x*6,1 + *x*6,2 + *x*6,3 + *x*6,4, *z*6,2 ≤ *x*6,2 + *x*6,3 + *x*6,4, *z*6,3 ≤ *x*6,3 + *x*6,4, *z*6,4 ≤ *x*6,4
> *z*7,1 ≤ *x*7,1 + *x*7,2 + *x*7,3 + *x*7,4, *z*7,2 ≤ *x*7,2 + *x*7,3 + *x*7,4, *z*7,3 ≤ *x*7,3 + *x*7,4, *z*7,4 ≤ *x*7,4
> *z*8,1 ≤ *x*8,1 + *x*8,2 + *x*8,3 + *x*8,4, *z*8,2 ≤ *x*8,2 + *x*8,3 + *x*8,4, *z*8,3 ≤ *x*8,3 + *x*8,4, *z*8,4 ≤ *x*8,4
> *z*9,1 ≤ *x*9,1 + *x*9,2 + *x*9,3 + *x*9,4, *z*9,2 ≤ *x*9,2 + *x*9,3 + *x*9,4, *z*9,3 ≤ *x*9,3 + *x*9,4, *z*9,4 ≤ *x*9,4
> *z*10,1 ≤ *x*10,1 + *x*10,2 + *x*10,3 + *x*10,4, *z*10,2 ≤ *x*10,2 + *x*10,3 + *x*10,4, *z*10,3 ≤ *x*10,3 + *x*10,4, *z*10,4 ≤ *x*10,4
> *z*1,1 ≤ *z*1,2 ≤ *z*1,3, *z*2,1 ≤ *z*2,2 ≤ *z*2,3, *z*3,1 ≤ *z*3,2 ≤ *z*3,3, *z*4,1 ≤ *z*4,2 ≤ *z*4,3, *z*5,1 ≤ *z*5,2 ≤ *z*5,3
> *z*6,1 ≤ *z*6,2 ≤ *z*6,3, *z*7,1 ≤ *z*7,2 ≤ *z*7,3, *z*8,1 ≤ *z*8,2 ≤ *z*8,3, *z*9,1 ≤ *z*9,2 ≤ *z*9,3, *z*10,1 ≤ *z*10,2 ≤ *z*10,3
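The pattern of the auxiliary constraints above can be summarized compactly (a restatement, assuming *H* = 4):

```latex
z_{i,t} \le \sum_{\tau = t}^{H} x_{i,\tau} \quad \forall\, i,\; t,
\qquad
z_{i,t} \le z_{i,t+1} \quad \forall\, i,\; t = 1,\dots,H-2
```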

*Appendix A.8. Resource Requirements*

$$\begin{array}{l}
w_{1,1} \le z_{1,1},\; w_{1,2} \le z_{1,2},\; w_{1,3} \le z_{1,3},\; w_{1,4} \le z_{1,4} \\
w_{2,1} \le z_{2,1},\; w_{2,2} \le z_{2,2},\; w_{2,3} \le z_{2,3},\; w_{2,4} \le z_{2,4} \\
w_{3,1} \le z_{3,1},\; w_{3,2} \le z_{3,2},\; w_{3,3} \le z_{3,3},\; w_{3,4} \le z_{3,4} \\
w_{4,1} \le z_{4,1},\; w_{4,2} \le z_{4,2},\; w_{4,3} \le z_{4,3},\; w_{4,4} \le z_{4,4} \\
w_{5,1} \le z_{5,1},\; w_{5,2} \le z_{5,2},\; w_{5,3} \le z_{5,3},\; w_{5,4} \le z_{5,4} \\
w_{6,1} \le 2z_{6,1},\; w_{6,2} \le 2z_{6,2},\; w_{6,3} \le 2z_{6,3},\; w_{6,4} \le 2z_{6,4} \\
w_{7,1} \le 8z_{7,1},\; w_{7,2} \le 8z_{7,2},\; w_{7,3} \le 8z_{7,3},\; w_{7,4} \le 8z_{7,4} \\
w_{8,1} \le 2z_{8,1},\; w_{8,2} \le 2z_{8,2},\; w_{8,3} \le 2z_{8,3},\; w_{8,4} \le 2z_{8,4} \\
w_{9,1} \le 2z_{9,1},\; w_{9,2} \le 2z_{9,2},\; w_{9,3} \le 2z_{9,3},\; w_{9,4} \le 2z_{9,4} \\
w_{10,1} \le 3z_{10,1},\; w_{10,2} \le 3z_{10,2},\; w_{10,3} \le 3z_{10,3},\; w_{10,4} \le 3z_{10,4}
\end{array}$$

*Appendix A.9. Resource Consumption Constraints*

> *w*1,1 + *w*1,2 + *w*1,3 + *w*1,4 = 1
> *w*2,1 + *w*2,2 + *w*2,3 + *w*2,4 = 1
> *w*3,1 + *w*3,2 + *w*3,3 + *w*3,4 = 1
> *w*4,1 + *w*4,2 + *w*4,3 + *w*4,4 = 1
> *w*5,1 + *w*5,2 + *w*5,3 + *w*5,4 = 1
> *w*6,1 + *w*6,2 + *w*6,3 + *w*6,4 = 2
> *w*7,1 + *w*7,2 + *w*7,3 + *w*7,4 = 8
> *w*8,1 + *w*8,2 + *w*8,3 + *w*8,4 = 2
> *w*9,1 + *w*9,2 + *w*9,3 + *w*9,4 = 2
> *w*10,1 + *w*10,2 + *w*10,3 + *w*10,4 = 3

*Appendix A.10. Resource Limitations*

The resource limitation constraints for each year of the planning horizon are as follows (year 4 has no constraint, since its resources are unlimited):

> *w*1,1 + *w*2,1 + *w*3,1 + *w*4,1 + *w*5,1 + *w*6,1 + *w*7,1 + *w*8,1 + *w*9,1 + *w*10,1 ≤ 5
> *w*1,2 + *w*2,2 + *w*3,2 + *w*4,2 + *w*5,2 + *w*6,2 + *w*7,2 + *w*8,2 + *w*9,2 + *w*10,2 ≤ 5
> *w*1,3 + *w*2,3 + *w*3,3 + *w*4,3 + *w*5,3 + *w*6,3 + *w*7,3 + *w*8,3 + *w*9,3 + *w*10,3 ≤ 5
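These yearly limits can be checked programmatically; the sketch below is a hypothetical feasibility check, where `w[i][t]` denotes project *i*'s resource consumption in year *t*:

```python
def resource_feasible(w, limits):
    """Check the yearly resource-limit constraints: the total consumption
    of all projects in year t must not exceed limits[t]. Years beyond
    len(limits) (here, year 4) are unconstrained, as in the example."""
    return all(
        sum(row[t] for row in w) <= limits[t]
        for t in range(len(limits))
    )
```

For the example above, any allocation whose columns for years 1-3 each sum to at most 5 units is feasible.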


As can be seen, even for such a small problem, the formulation is quite large and cumbersome.

#### **References**

